iceberg-cpp
iceberg::DeleteFilter Class Reference

Concrete batch-oriented delete filter for merge-on-read data batches.

#include <delete_filter.h>

Classes

struct  EqDeleteGroup
 
struct  FieldLookupResult
 Field lookup output for current or fallback equality-delete fields.
 

Public Types

using FieldLookup = std::function< Result< std::optional< FieldLookupResult > >(int32_t)>
 Lookup a field by ID, including fields from table schema fallbacks.
 

Public Member Functions

const std::shared_ptr< Schema > & RequiredSchema () const
 Schema required from the underlying data file reader.
 
const std::shared_ptr< Schema > & ExpectedSchema () const
 The original schema requested by the caller, before delete columns were added.
 
void IncrementDeleteCount (int64_t count=1)
 Increment the delete counter by the given count.
 
Result< const PositionDeleteIndex * > DeletedRowPositions () const
 Expose the loaded position delete index for external use.
 
Result< std::function< Result< bool >(const StructLike &)> > EqDeletedRowFilter () const
 Returns a predicate that is true for rows NOT matched by any equality delete.
 
Result< std::function< Result< bool >(const StructLike &)> > FindEqualityDeleteRows () const
 Returns a predicate that is true for rows matched by any equality delete.
 
Result< AliveRowSelection > ComputeAliveRows (const ArrowSchema &batch_schema, const ArrowArray &batch) const
 Compute alive rows relative to the supplied Arrow C Data batch.
 
bool HasPositionDeletes () const
 
bool HasEqualityDeletes () const
 
 DeleteFilter (const DeleteFilter &)=delete
 
DeleteFilter & operator= (const DeleteFilter &)=delete
 

Static Public Member Functions

static Result< FieldLookup > MakeFieldLookup (std::shared_ptr< Schema > table_schema, std::span< const std::shared_ptr< Schema > > schemas={})
 Build a lookup from the current schema and optional table schemas.
 
static Result< FieldLookup > MakeFieldLookup (std::shared_ptr< TableMetadata > table_metadata)
 Build a lookup from table metadata which uses the current schema first, then table metadata schemas as fallback.
 
static Result< std::unique_ptr< DeleteFilter > > Make (std::string file_path, std::span< const std::shared_ptr< DataFile > > delete_files, std::shared_ptr< Schema > table_schema, std::shared_ptr< Schema > requested_schema, std::shared_ptr< FileIO > io, bool need_row_pos_col=true, std::shared_ptr< DeleteCounter > counter=nullptr)
 Create a DeleteFilter with current-schema-only field lookup.
 
static Result< std::unique_ptr< DeleteFilter > > Make (std::string file_path, std::span< const std::shared_ptr< DataFile > > delete_files, std::shared_ptr< TableMetadata > table_metadata, std::shared_ptr< Schema > requested_schema, std::shared_ptr< FileIO > io, bool need_row_pos_col=true, std::shared_ptr< DeleteCounter > counter=nullptr)
 Create a DeleteFilter using table metadata for schema-aware field lookup.
 
static Result< std::unique_ptr< DeleteFilter > > Make (std::string file_path, std::span< const std::shared_ptr< DataFile > > delete_files, std::shared_ptr< Schema > table_schema, std::shared_ptr< Schema > requested_schema, std::shared_ptr< FileIO > io, std::span< const std::shared_ptr< Schema > > schemas, bool need_row_pos_col=true, std::shared_ptr< DeleteCounter > counter=nullptr)
 Create a DeleteFilter with table schemas for dropped equality fields.
 
static Result< std::unique_ptr< DeleteFilter > > Make (std::string file_path, std::span< const std::shared_ptr< DataFile > > delete_files, std::shared_ptr< Schema > requested_schema, std::shared_ptr< FileIO > io, FieldLookup field_lookup, bool need_row_pos_col=true, std::shared_ptr< DeleteCounter > counter=nullptr)
 Create a DeleteFilter with a custom field lookup.
 

Detailed Description

Concrete batch-oriented delete filter for merge-on-read data batches.

Member Function Documentation

◆ ComputeAliveRows()

Result< AliveRowSelection > iceberg::DeleteFilter::ComputeAliveRows ( const ArrowSchema & batch_schema,
const ArrowArray & batch 
) const

Compute alive rows relative to the supplied Arrow C Data batch.

Returns the indices (zero-based, relative to the batch) of rows not matched by any delete. Deleted-row counts are forwarded to the DeleteCounter supplied at construction.
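
The selection described above can be illustrated with a minimal, self-contained sketch. It does not use the iceberg-cpp API: `SketchCounter` and `SketchAliveRows` are hypothetical names, plain standard containers stand in for the Arrow batch and the position delete index, and only position deletes are modelled. The `row_positions` vector plays the role of the `_pos` column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the DeleteCounter passed at construction.
struct SketchCounter {
  std::int64_t deleted = 0;
};

// Returns the zero-based, batch-relative indices of rows whose file position
// is not in `deleted_positions`. `row_positions[i]` is the file position of
// batch row i (the role played by the _pos column).
std::vector<std::int32_t> SketchAliveRows(
    const std::vector<std::int64_t>& row_positions,
    const std::unordered_set<std::int64_t>& deleted_positions,
    SketchCounter* counter) {
  // Indices are emitted as int32_t in this sketch, so the batch must fit.
  assert(row_positions.size() <=
         static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max()));
  std::vector<std::int32_t> alive;
  alive.reserve(row_positions.size());
  for (std::size_t i = 0; i < row_positions.size(); ++i) {
    if (deleted_positions.count(row_positions[i]) != 0) {
      // Deleted row: forward the count to the optional counter.
      if (counter != nullptr) ++counter->deleted;
    } else {
      alive.push_back(static_cast<std::int32_t>(i));
    }
  }
  return alive;
}
```

Note that the returned indices are relative to the batch, not to the file: a batch covering file positions 100 to 103 with positions 101 and 103 deleted yields indices 0 and 2.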

◆ DeletedRowPositions()

Result< const PositionDeleteIndex * > iceberg::DeleteFilter::DeletedRowPositions ( ) const

Expose the loaded position delete index for external use.

Triggers lazy loading of position delete files on first call. Returns nullptr when there are no position deletes. Returns an error if loading fails.

The returned pointer is valid only for the lifetime of this DeleteFilter.
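
The load-on-first-call behaviour and the nullptr convention can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `SketchFilter` and `SketchPositionIndex` are hypothetical types, delete files are modelled as in-memory vectors of positions, and the `Result<>` error path is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for PositionDeleteIndex.
struct SketchPositionIndex {
  std::set<std::int64_t> positions;
  bool IsDeleted(std::int64_t pos) const { return positions.count(pos) != 0; }
};

// Hypothetical filter that owns the index, so any pointer it returns is valid
// only while the filter itself is alive.
class SketchFilter {
 public:
  explicit SketchFilter(std::vector<std::vector<std::int64_t>> delete_files)
      : delete_files_(std::move(delete_files)) {}

  // Builds the index on the first call and reuses it afterwards.
  // Returns nullptr when there are no position deletes.
  const SketchPositionIndex* DeletedRowPositions() const {
    if (delete_files_.empty()) return nullptr;
    if (!index_) {
      auto index = std::make_unique<SketchPositionIndex>();
      for (const auto& file : delete_files_) {
        index->positions.insert(file.begin(), file.end());
      }
      index_ = std::move(index);
      ++load_count_;
    }
    assert(index_ != nullptr);
    return index_.get();
  }

  int load_count() const { return load_count_; }

 private:
  std::vector<std::vector<std::int64_t>> delete_files_;
  mutable std::unique_ptr<SketchPositionIndex> index_;  // lazily populated
  mutable int load_count_ = 0;
};
```

Because the index is owned by the filter, a caller that needs the positions beyond the filter's lifetime would have to copy them rather than keep the pointer.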

◆ EqDeletedRowFilter()

Result< std::function< Result< bool >(const StructLike &)> > iceberg::DeleteFilter::EqDeletedRowFilter ( ) const

Returns a predicate that is true for rows NOT matched by any equality delete.

The returned function is valid for the lifetime of this DeleteFilter and is cached after the first call. When there are no equality deletes, returns a predicate that always returns true (every row is alive).

Note
The returned predicate is NOT thread-safe: it mutates internal projection state on each call. Do not invoke it concurrently from multiple threads.

◆ FindEqualityDeleteRows()

Result< std::function< Result< bool >(const StructLike &)> > iceberg::DeleteFilter::FindEqualityDeleteRows ( ) const

Returns a predicate that is true for rows matched by any equality delete.

Inverse of EqDeletedRowFilter(). When there are no equality deletes, returns a predicate that always returns false (no row is deleted).
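
The relationship between the two predicates can be shown with a small sketch. It is not the iceberg-cpp API: `SketchRow`, `RowPredicate` and both `Sketch...` functions are hypothetical, the equality delete is assumed to be on a single integer `id` column, and the predicates return plain `bool` rather than `Result< bool >`.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for a row; the documented predicates take a StructLike.
struct SketchRow {
  int id;
  std::string name;
};

using RowPredicate = std::function<bool(const SketchRow&)>;

// True for rows matched by an equality delete on `id`.
// With no deletes, the predicate is always false (no row is deleted).
RowPredicate SketchFindEqualityDeleteRows(std::set<int> deleted_ids) {
  if (deleted_ids.empty()) {
    return [](const SketchRow&) { return false; };
  }
  return [ids = std::move(deleted_ids)](const SketchRow& row) {
    return ids.count(row.id) != 0;
  };
}

// True for rows NOT matched by any equality delete: the logical negation of
// the predicate above. With no deletes, it is always true (every row is alive).
RowPredicate SketchEqDeletedRowFilter(std::set<int> deleted_ids) {
  RowPredicate is_deleted =
      SketchFindEqualityDeleteRows(std::move(deleted_ids));
  assert(is_deleted != nullptr);
  return [is_deleted = std::move(is_deleted)](const SketchRow& row) {
    return !is_deleted(row);
  };
}
```

Unlike the documented predicate, these lambdas hold no mutable projection state, so the thread-safety caveat on EqDeletedRowFilter() is not reproduced here.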

◆ IncrementDeleteCount()

void iceberg::DeleteFilter::IncrementDeleteCount ( int64_t  count = 1)

Increment the delete counter by the given count.

Allows callers to record deletes that occur outside ComputeAliveRows (e.g. when applying deletes in a vectorised path).

◆ Make()

Result< std::unique_ptr< DeleteFilter > > iceberg::DeleteFilter::Make ( std::string  file_path,
std::span< const std::shared_ptr< DataFile > >  delete_files,
std::shared_ptr< Schema >  table_schema,
std::shared_ptr< Schema >  requested_schema,
std::shared_ptr< FileIO >  io,
bool  need_row_pos_col = true,
std::shared_ptr< DeleteCounter >  counter = nullptr 
)
static

Create a DeleteFilter with current-schema-only field lookup.

Parameters
need_row_pos_col  If true, _pos is added to RequiredSchema when position deletes are present so ComputeAliveRows can apply them. Pass false when the caller owns position filtering externally (e.g. a vectorised reader that applies the position delete index directly to Arrow column buffers). Note that when need_row_pos_col is false, HasPositionDeletes() may still return true, but ComputeAliveRows will not apply position deletes because _pos is absent from RequiredSchema. The caller is responsible for applying them.
counter  Optional counter incremented for each deleted row.

◆ MakeFieldLookup()

Result< DeleteFilter::FieldLookup > iceberg::DeleteFilter::MakeFieldLookup ( std::shared_ptr< Schema >  table_schema,
std::span< const std::shared_ptr< Schema > >  schemas = {} 
)
static

Build a lookup from the current schema and optional table schemas.

The current table schema is searched first. schemas is the table metadata schema list and may contain table_schema; duplicates of the current schema are ignored, and the remaining fallback schemas are searched from the latest schema ID to the oldest.
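
The search order described above can be sketched as follows. This is only an illustration: `SketchSchema`, `SketchLookup` and `SketchLookupExample` are hypothetical, a schema is reduced to an ID plus a field-ID-to-name map, and the sketch assumes that a duplicate of the current schema is recognised by its schema ID, which the documentation does not specify.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a schema: an ID plus field ID -> field name.
struct SketchSchema {
  std::int32_t schema_id;
  std::map<std::int32_t, std::string> fields;
};

// Searches `current` first, then the remaining schemas from the highest
// schema ID to the lowest. Entries with the same ID as `current` are skipped
// (assumption: duplicates are identified by schema ID).
std::optional<std::string> SketchLookup(const SketchSchema& current,
                                        std::vector<SketchSchema> schemas,
                                        std::int32_t field_id) {
  if (auto it = current.fields.find(field_id); it != current.fields.end()) {
    return it->second;
  }
  std::sort(schemas.begin(), schemas.end(),
            [](const SketchSchema& a, const SketchSchema& b) {
              return a.schema_id > b.schema_id;
            });
  for (const auto& schema : schemas) {
    if (schema.schema_id == current.schema_id) continue;
    if (auto it = schema.fields.find(field_id); it != schema.fields.end()) {
      return it->second;
    }
  }
  return std::nullopt;
}

// Usage: field 2 was dropped from the current schema (ID 3) but still exists
// in older schemas; the newest fallback that contains it (ID 2) wins.
inline void SketchLookupExample() {
  SketchSchema current{3, {{1, "id"}}};
  std::vector<SketchSchema> all = {
      {1, {{1, "id"}, {2, "name_v1"}}},
      {2, {{1, "id"}, {2, "name_v2"}}},
      current,
  };
  assert(SketchLookup(current, all, 2) == std::optional<std::string>("name_v2"));
  assert(!SketchLookup(current, all, 99).has_value());
}
```

Searching newest to oldest means that a dropped equality field resolves to its most recent definition, which is the behaviour the fallback order above implies.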


The documentation for this class was generated from the following files: