iceberg-cpp
A scan task for inserts generated by adding a data file to the table.
#include <table_scan.h>
Public Member Functions | |
| ChangelogOperation | operation () const override |
| const std::shared_ptr< DataFile > & | data_file () const |
| The data file containing the added rows. | |
| const std::vector< std::shared_ptr< DataFile > > & | delete_files () const |
| A list of delete files to apply when reading the data file in this task. | |
| ChangelogScanTask (int32_t change_ordinal, int64_t commit_snapshot_id, std::shared_ptr< DataFile > data_file, std::vector< std::shared_ptr< DataFile > > delete_files={}, std::shared_ptr< Expression > residual_filter=nullptr) | |
| Construct an AddedRowsScanTask. | |
Public Member Functions inherited from iceberg::ChangelogScanTask | |
| ChangelogScanTask (int32_t change_ordinal, int64_t commit_snapshot_id, std::shared_ptr< DataFile > data_file, std::vector< std::shared_ptr< DataFile > > delete_files={}, std::shared_ptr< Expression > residual_filter=nullptr) | |
| Construct an AddedRowsScanTask. | |
| Kind | kind () const override |
| The kind of scan task. | |
| int64_t | size_bytes () const override |
| The number of bytes that should be read by this scan task. | |
| int32_t | files_count () const override |
| The number of files that should be read by this scan task. | |
| int64_t | estimated_row_count () const override |
| The number of rows that should be read by this scan task. | |
| int32_t | change_ordinal () const |
| The position of this change in the changelog order (0-based). | |
| int64_t | commit_snapshot_id () const |
| The snapshot ID that committed this change. | |
| const std::shared_ptr< Expression > & | residual_filter () const |
| Residual filter to apply after reading. | |
Additional Inherited Members | |
Public Types inherited from iceberg::ScanTask | |
| enum class | Kind : uint8_t { kFileScanTask , kChangelogScanTask } |
Protected Attributes inherited from iceberg::ChangelogScanTask | |
| int32_t | change_ordinal_ |
| int64_t | commit_snapshot_id_ |
| std::shared_ptr< DataFile > | data_file_ |
| std::vector< std::shared_ptr< DataFile > > | delete_files_ |
| std::shared_ptr< Expression > | residual_filter_ |
A scan task for inserts generated by adding a data file to the table.
This task represents data files that were added to the table, along with any delete files that should be applied when reading the data.
Added data files may have matching delete files. This may happen if a matching position delete file is committed in the same snapshot or if changes for multiple snapshots are squashed together.
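Suppose snapshot S1 adds data files F1, F2, F3 and a position delete file, D1, that marks particular records in F1 as deleted. A scan for changes generated by S1 should include the following tasks:
AddedRowsScanTask(file=F1, deletes=[D1], snapshot=S1)
AddedRowsScanTask(file=F2, deletes=[], snapshot=S1)
AddedRowsScanTask(file=F3, deletes=[], snapshot=S1)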
Readers consuming these tasks should produce added records with metadata like change ordinal and commit snapshot ID.
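The S1 example can be sketched with stand-in types. This is only an illustration, not the library's planning code: `SketchFile`, `SketchAddedRowsTask`, `PlanAddedRows`, and `PlanExampleS1` are hypothetical names, the snapshot ID `1` is made up, and matching a position delete file to a data file purely by a referenced path is a simplifying assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data or position delete file.
struct SketchFile {
  std::string path;
  std::string referenced_data_file;  // Empty for data files (assumption).
};

// Hypothetical stand-in carrying the fields an added-rows task exposes.
struct SketchAddedRowsTask {
  int32_t change_ordinal;
  int64_t commit_snapshot_id;
  std::shared_ptr<SketchFile> data_file;
  std::vector<std::shared_ptr<SketchFile>> delete_files;
};

// Emits one task per added data file and attaches every position delete
// file that references that data file by path.
inline std::vector<SketchAddedRowsTask> PlanAddedRows(
    int32_t change_ordinal, int64_t snapshot_id,
    const std::vector<std::shared_ptr<SketchFile>>& added_data_files,
    const std::vector<std::shared_ptr<SketchFile>>& position_deletes) {
  std::vector<SketchAddedRowsTask> tasks;
  for (const auto& data_file : added_data_files) {
    assert(data_file != nullptr);  // Sketch-only sanity check.
    std::vector<std::shared_ptr<SketchFile>> matching;
    for (const auto& del : position_deletes) {
      if (del->referenced_data_file == data_file->path) matching.push_back(del);
    }
    tasks.push_back({change_ordinal, snapshot_id, data_file, std::move(matching)});
  }
  return tasks;
}

// Builds the example: S1 adds F1, F2, F3 and D1, where D1 deletes rows in F1.
inline std::vector<SketchAddedRowsTask> PlanExampleS1() {
  auto f1 = std::make_shared<SketchFile>(SketchFile{"F1", ""});
  auto f2 = std::make_shared<SketchFile>(SketchFile{"F2", ""});
  auto f3 = std::make_shared<SketchFile>(SketchFile{"F3", ""});
  auto d1 = std::make_shared<SketchFile>(SketchFile{"D1", "F1"});
  return PlanAddedRows(/*change_ordinal=*/0, /*snapshot_id=*/1, {f1, f2, f3}, {d1});
}
```

Under these assumptions, `PlanExampleS1()` yields three tasks: only the task for F1 carries D1, and all three share the same change ordinal and commit snapshot ID.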
|
inline |
Construct an AddedRowsScanTask.
| change_ordinal | Position in the changelog order (0-based). |
| commit_snapshot_id | The snapshot ID that committed this change. |
| data_file | The data file containing the added rows. |
| delete_files | Delete files that apply to this data file. |
| residual_filter | Optional residual filter to apply after reading. |
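The effect of the two defaulted parameters can be illustrated with a mock that copies the documented constructor shape. `SketchTask`, `SketchDataFile`, and `SketchExpression` are hypothetical stand-ins, not the iceberg-cpp classes, and the ordinal check is an assumption added for the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

struct SketchDataFile {};    // Hypothetical stand-in for DataFile.
struct SketchExpression {};  // Hypothetical stand-in for Expression.

// Mirrors the documented parameter order and defaults only.
class SketchTask {
 public:
  SketchTask(int32_t change_ordinal, int64_t commit_snapshot_id,
             std::shared_ptr<SketchDataFile> data_file,
             std::vector<std::shared_ptr<SketchDataFile>> delete_files = {},
             std::shared_ptr<SketchExpression> residual_filter = nullptr)
      : change_ordinal_(change_ordinal),
        commit_snapshot_id_(commit_snapshot_id),
        data_file_(std::move(data_file)),
        delete_files_(std::move(delete_files)),
        residual_filter_(std::move(residual_filter)) {
    assert(change_ordinal_ >= 0);  // Sketch-only check: ordinals are 0-based.
  }

  int32_t change_ordinal() const { return change_ordinal_; }
  int64_t commit_snapshot_id() const { return commit_snapshot_id_; }
  const std::shared_ptr<SketchDataFile>& data_file() const { return data_file_; }
  const std::vector<std::shared_ptr<SketchDataFile>>& delete_files() const {
    return delete_files_;
  }
  const std::shared_ptr<SketchExpression>& residual_filter() const {
    return residual_filter_;
  }

 private:
  int32_t change_ordinal_;
  int64_t commit_snapshot_id_;
  std::shared_ptr<SketchDataFile> data_file_;
  std::vector<std::shared_ptr<SketchDataFile>> delete_files_;
  std::shared_ptr<SketchExpression> residual_filter_;
};

// Omits the defaulted arguments: no delete files, no residual filter.
inline SketchTask MakeTaskWithoutDeletes() {
  return SketchTask(0, 42, std::make_shared<SketchDataFile>());
}

// Supplies one delete file but still leaves the residual filter defaulted.
inline SketchTask MakeTaskWithOneDelete() {
  return SketchTask(1, 42, std::make_shared<SketchDataFile>(),
                    {std::make_shared<SketchDataFile>()});
}
```

With this shape, a caller that passes only the first three arguments gets an empty delete file list and a null residual filter, so readers of such a task should check `residual_filter()` for null before applying it.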
| const std::vector< std::shared_ptr< DataFile > > & | delete_files () const |
inline |
A list of delete files to apply when reading the data file in this task.
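As a rough sketch of what applying delete files means for a reader, the function below drops rows at deleted positions and tags the survivors with the task's change ordinal and commit snapshot ID. It assumes position deletes have already been decoded into a set of row positions; `SketchChangeRow` and `ReadAddedRows` are hypothetical names, and equality deletes and the residual filter are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical output record: an added row plus changelog metadata.
struct SketchChangeRow {
  std::string value;
  int32_t change_ordinal;
  int64_t commit_snapshot_id;
};

// Skips rows whose 0-based position is marked deleted, then attaches the
// task's change ordinal and commit snapshot ID to every remaining row.
inline std::vector<SketchChangeRow> ReadAddedRows(
    const std::vector<std::string>& rows,
    const std::unordered_set<int64_t>& deleted_positions,
    int32_t change_ordinal, int64_t commit_snapshot_id) {
  assert(change_ordinal >= 0);  // Sketch-only check: ordinals are 0-based.
  std::vector<SketchChangeRow> out;
  for (std::size_t pos = 0; pos < rows.size(); ++pos) {
    if (deleted_positions.count(static_cast<int64_t>(pos)) > 0) continue;
    out.push_back({rows[pos], change_ordinal, commit_snapshot_id});
  }
  return out;
}
```

A row deleted in the same snapshot that added it therefore never appears as an insert in the changelog output.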
| ChangelogOperation | operation () const override |
inline override virtual |
Implements iceberg::ChangelogScanTask.