iceberg-cpp
Concrete batch-oriented delete filter for merge-on-read data batches.
#include <delete_filter.h>
Classes | |
| struct | EqDeleteGroup |
| struct | FieldLookupResult |
| Field lookup output for current or fallback equality-delete fields. | |
Public Types | |
| using | FieldLookup = std::function< Result< std::optional< FieldLookupResult > >(int32_t)> |
| Lookup a field by ID, including fields from table schema fallbacks. | |
Public Member Functions | |
| const std::shared_ptr< Schema > & | RequiredSchema () const |
| Schema required from the underlying data file reader. | |
| const std::shared_ptr< Schema > & | ExpectedSchema () const |
| The original schema requested by the caller, before delete columns were added. | |
| void | IncrementDeleteCount (int64_t count=1) |
| Increment the delete counter by the given count. | |
| Result< const PositionDeleteIndex * > | DeletedRowPositions () const |
| Expose the loaded position delete index for external use. | |
| Result< std::function< Result< bool >(const StructLike &)> > | EqDeletedRowFilter () const |
| Returns a predicate that is true for rows NOT matched by any equality delete. | |
| Result< std::function< Result< bool >(const StructLike &)> > | FindEqualityDeleteRows () const |
| Returns a predicate that is true for rows matched by any equality delete. | |
| Result< AliveRowSelection > | ComputeAliveRows (const ArrowSchema &batch_schema, const ArrowArray &batch) const |
| Compute alive rows relative to the supplied Arrow C Data batch. | |
| bool | HasPositionDeletes () const |
| bool | HasEqualityDeletes () const |
| DeleteFilter (const DeleteFilter &)=delete | |
| DeleteFilter & | operator= (const DeleteFilter &)=delete |
Static Public Member Functions | |
| static Result< FieldLookup > | MakeFieldLookup (std::shared_ptr< Schema > table_schema, std::span< const std::shared_ptr< Schema > > schemas={}) |
| Build a lookup from the current schema and optional table schemas. | |
| static Result< FieldLookup > | MakeFieldLookup (std::shared_ptr< TableMetadata > table_metadata) |
| Build a lookup from table metadata that searches the current schema first, then falls back to the other schemas in the table metadata. | |
| static Result< std::unique_ptr< DeleteFilter > > | Make (std::string file_path, std::span< const std::shared_ptr< DataFile > > delete_files, std::shared_ptr< Schema > table_schema, std::shared_ptr< Schema > requested_schema, std::shared_ptr< FileIO > io, bool need_row_pos_col=true, std::shared_ptr< DeleteCounter > counter=nullptr) |
| Create a DeleteFilter with current-schema-only field lookup. | |
| static Result< std::unique_ptr< DeleteFilter > > | Make (std::string file_path, std::span< const std::shared_ptr< DataFile > > delete_files, std::shared_ptr< TableMetadata > table_metadata, std::shared_ptr< Schema > requested_schema, std::shared_ptr< FileIO > io, bool need_row_pos_col=true, std::shared_ptr< DeleteCounter > counter=nullptr) |
| Create a DeleteFilter using table metadata for schema-aware field lookup. | |
| static Result< std::unique_ptr< DeleteFilter > > | Make (std::string file_path, std::span< const std::shared_ptr< DataFile > > delete_files, std::shared_ptr< Schema > table_schema, std::shared_ptr< Schema > requested_schema, std::shared_ptr< FileIO > io, std::span< const std::shared_ptr< Schema > > schemas, bool need_row_pos_col=true, std::shared_ptr< DeleteCounter > counter=nullptr) |
| Create a DeleteFilter with table schemas for dropped equality fields. | |
| static Result< std::unique_ptr< DeleteFilter > > | Make (std::string file_path, std::span< const std::shared_ptr< DataFile > > delete_files, std::shared_ptr< Schema > requested_schema, std::shared_ptr< FileIO > io, FieldLookup field_lookup, bool need_row_pos_col=true, std::shared_ptr< DeleteCounter > counter=nullptr) |
| Create a DeleteFilter with a custom field lookup. | |
Concrete batch-oriented delete filter for merge-on-read data batches.
| Result< AliveRowSelection > iceberg::DeleteFilter::ComputeAliveRows | ( | const ArrowSchema & | batch_schema, |
| const ArrowArray & | batch | ||
| ) | const |
Compute alive rows relative to the supplied Arrow C Data batch.
Returns the indices (zero-based, relative to the batch) of rows not matched by any delete. Deleted-row counts are forwarded to the DeleteCounter supplied at construction.
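As a rough illustration of the contract described above, the following standalone sketch computes batch-relative alive indices from a set of deleted file positions and an equality-delete predicate, and forwards the number of deleted rows to a counter. It does not use the iceberg-cpp API: `SketchAliveRows`, `SketchDeleteCounter`, the `positions` vector (standing in for the `_pos` column) and the index-based predicate are hypothetical simplifications. The real method works on Arrow C Data structures and returns a `Result< AliveRowSelection >`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the library's DeleteCounter.
struct SketchDeleteCounter {
  int64_t deleted = 0;
};

// Returns the zero-based, batch-relative indices of rows that survive all
// deletes. positions[i] is the file-level row position of batch row i
// (the role played by the _pos column in this sketch).
std::vector<int32_t> SketchAliveRows(
    const std::vector<int64_t>& positions,
    const std::unordered_set<int64_t>& deleted_positions,
    const std::function<bool(int32_t)>& is_eq_deleted,
    SketchDeleteCounter* counter) {
  assert(static_cast<bool>(is_eq_deleted));  // the predicate must be callable
  std::vector<int32_t> alive;
  alive.reserve(positions.size());
  for (std::size_t i = 0; i < positions.size(); ++i) {
    const auto row = static_cast<int32_t>(i);
    const bool pos_deleted = deleted_positions.count(positions[i]) > 0;
    if (pos_deleted || is_eq_deleted(row)) {
      // Deleted rows are only counted, never returned.
      if (counter != nullptr) ++counter->deleted;
    } else {
      alive.push_back(row);
    }
  }
  return alive;
}
```

Note that the returned indices are relative to the batch, not to the data file, so a caller can use them directly as a selection vector over the batch it passed in.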
| Result< const PositionDeleteIndex * > iceberg::DeleteFilter::DeletedRowPositions | ( | ) | const |
Expose the loaded position delete index for external use.
Triggers lazy loading of position delete files on first call. Returns nullptr when there are no position deletes. Returns an error if loading fails.
The returned pointer is valid only for the lifetime of this DeleteFilter.
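The load-once behaviour described above can be pictured with a small standalone sketch. It is an assumption-laden illustration, not the library's implementation: `SketchLazyPositions`, `SketchPositionIndex` and the `loader` callback are hypothetical names, error handling via `Result` is omitted, and no claim is made about thread safety.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <set>
#include <utility>

// Hypothetical stand-in for PositionDeleteIndex.
using SketchPositionIndex = std::set<int64_t>;

class SketchLazyPositions {
 public:
  // loader reads the position delete files; an empty std::function models
  // "there are no position deletes".
  explicit SketchLazyPositions(std::function<SketchPositionIndex()> loader)
      : loader_(std::move(loader)) {}

  // Loads on first call, then returns the cached index. The pointer is
  // owned by this object and must not outlive it.
  const SketchPositionIndex* DeletedRowPositions() const {
    if (!loader_) return nullptr;
    if (!index_.has_value()) {
      index_ = loader_();
      ++load_calls_;
    }
    return &*index_;
  }

  int load_calls() const { return load_calls_; }

 private:
  std::function<SketchPositionIndex()> loader_;
  mutable std::optional<SketchPositionIndex> index_;  // filled lazily
  mutable int load_calls_ = 0;
};
```

Because the index lives inside the owning object, a caller that needs it beyond the filter's lifetime has to copy it rather than keep the pointer.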
| Result< std::function< Result< bool >(const StructLike &)> > iceberg::DeleteFilter::EqDeletedRowFilter | ( | ) | const |
Returns a predicate that is true for rows NOT matched by any equality delete.
The returned function is valid for the lifetime of this DeleteFilter and is cached after the first call. When there are no equality deletes, returns a predicate that always returns true (every row is alive).
| Result< std::function< Result< bool >(const StructLike &)> > iceberg::DeleteFilter::FindEqualityDeleteRows | ( | ) | const |
Returns a predicate that is true for rows matched by any equality delete.
Inverse of EqDeletedRowFilter(). When there are no equality deletes, returns a predicate that always returns false (no row is deleted).
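The relationship between the two predicates can be sketched in isolation. In this hypothetical example a row is reduced to a single integer key (`RowKey`) and the equality deletes to a set of keys. The real predicates take a `StructLike` and return `Result< bool >`, and none of the `Sketch*` names below exist in the library.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <utility>

// Hypothetical: one integer stands in for a row's equality-field values.
using RowKey = int;
using RowPredicate = std::function<bool(const RowKey&)>;

// True for rows matched by any equality delete. With an empty delete set
// the predicate is always false (no row is deleted).
RowPredicate SketchFindEqualityDeleteRows(std::set<RowKey> deleted_keys) {
  return [keys = std::move(deleted_keys)](const RowKey& key) {
    return keys.count(key) > 0;
  };
}

// True for rows NOT matched by any equality delete: the logical inverse
// of the predicate above. With an empty delete set it is always true.
RowPredicate SketchEqDeletedRowFilter(std::set<RowKey> deleted_keys) {
  RowPredicate is_deleted = SketchFindEqualityDeleteRows(std::move(deleted_keys));
  assert(static_cast<bool>(is_deleted));
  return [is_deleted = std::move(is_deleted)](const RowKey& key) {
    return !is_deleted(key);
  };
}
```

For any key, exactly one of the two predicates returns true, which is why the "alive" filter can be used to keep rows while the "deleted" predicate can be used to extract the rows that were removed.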
| void iceberg::DeleteFilter::IncrementDeleteCount | ( | int64_t | count = 1 | ) |
Increment the delete counter by the given count.
Allows callers to record deletes that occur outside ComputeAliveRows (e.g. when applying deletes in a vectorised path).
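| static Result< std::unique_ptr< DeleteFilter > > iceberg::DeleteFilter::Make | ( | std::string | file_path, |
| std::span< const std::shared_ptr< DataFile > > | delete_files, | ||
| std::shared_ptr< Schema > | table_schema, | ||
| std::shared_ptr< Schema > | requested_schema, | ||
| std::shared_ptr< FileIO > | io, | ||
| bool | need_row_pos_col = true, | ||
| std::shared_ptr< DeleteCounter > | counter = nullptr | ||
| ) |
Create a DeleteFilter with current-schema-only field lookup.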
| need_row_pos_col | If true, _pos is added to RequiredSchema when position deletes are present so ComputeAliveRows can apply them. Pass false when the caller owns position filtering externally (e.g. a vectorised reader that applies the position delete index directly to Arrow column buffers). Note that when need_row_pos_col is false, HasPositionDeletes() may return true but ComputeAliveRows will not apply position deletes because _pos is absent from RequiredSchema. The caller is responsible for applying them. |
| counter | Optional counter incremented for each deleted row. |
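To make the effect of need_row_pos_col concrete, here is a minimal sketch of how a required schema could be derived from the expected schema. It assumes schemas can be reduced to ordered lists of column names and that equality-delete columns are appended before `_pos`; both the ordering and the helper `SketchRequiredColumns` are illustrative assumptions, not documented behaviour of the library.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the columns the data file reader must
// produce, starting from the columns the caller asked for.
std::vector<std::string> SketchRequiredColumns(
    std::vector<std::string> expected,
    const std::vector<std::string>& equality_delete_columns,
    bool has_position_deletes, bool need_row_pos_col) {
  auto add_if_missing = [&expected](const std::string& name) {
    if (std::find(expected.begin(), expected.end(), name) == expected.end()) {
      expected.push_back(name);
    }
  };
  // Columns needed to evaluate equality deletes.
  for (const auto& name : equality_delete_columns) add_if_missing(name);
  // _pos is added only when position deletes exist AND the filter itself
  // is expected to apply them.
  if (has_position_deletes && need_row_pos_col) add_if_missing("_pos");
  return expected;
}
```

With `need_row_pos_col` set to false the `_pos` column is never requested, which mirrors the note above: position deletes may exist, but applying them becomes the caller's job.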
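| static Result< FieldLookup > iceberg::DeleteFilter::MakeFieldLookup | ( | std::shared_ptr< Schema > | table_schema, |
| std::span< const std::shared_ptr< Schema > > | schemas = {} | ||
| ) |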
Build a lookup from the current schema and optional table schemas.
The current table schema is searched first. schemas is the list of schemas from the table metadata and may include table_schema itself; duplicates of the current schema are ignored, and the remaining fallback schemas are searched from the latest schema ID to the oldest.
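The search order can be illustrated with a simplified, standalone lookup. The sketch makes several assumptions that the text above does not confirm: a schema is modelled as an ID plus a field-ID-to-name map, duplicates of the current schema are detected by schema ID, and "latest" is taken to mean the numerically highest schema ID. All `Sketch*` names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified schema: an ID plus a field-ID -> name map.
struct SketchSchema {
  int32_t schema_id;
  std::map<int32_t, std::string> fields;
};

struct SketchLookupResult {
  std::string name;
  int32_t schema_id;  // which schema supplied the field
};

using SketchFieldLookup =
    std::function<std::optional<SketchLookupResult>(int32_t)>;

SketchFieldLookup SketchMakeFieldLookup(SketchSchema current,
                                        std::vector<SketchSchema> schemas) {
  // Ignore entries that duplicate the current schema (assumed: same ID).
  schemas.erase(std::remove_if(schemas.begin(), schemas.end(),
                               [&current](const SketchSchema& s) {
                                 return s.schema_id == current.schema_id;
                               }),
                schemas.end());
  // Fallbacks are searched from the latest schema ID to the oldest.
  std::sort(schemas.begin(), schemas.end(),
            [](const SketchSchema& a, const SketchSchema& b) {
              return a.schema_id > b.schema_id;
            });
  return [current = std::move(current), schemas = std::move(schemas)](
             int32_t field_id) -> std::optional<SketchLookupResult> {
    auto it = current.fields.find(field_id);
    if (it != current.fields.end()) {
      return SketchLookupResult{it->second, current.schema_id};
    }
    for (const auto& schema : schemas) {
      auto found = schema.fields.find(field_id);
      if (found != schema.fields.end()) {
        return SketchLookupResult{found->second, schema.schema_id};
      }
    }
    return std::nullopt;  // the field is unknown to every schema
  };
}
```

Searching newer fallbacks first means that a field dropped from the current schema resolves to its most recent definition, which is the case that matters for equality deletes written against a column that has since been removed.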