iceberg-cpp
iceberg::ManifestGroup Class Reference

Coordinates reading manifest files and producing scan tasks.

#include <manifest_group.h>

Inherits iceberg::ErrorCollector.

Public Types

using CreateTasksFunction = std::function< Result< std::vector< std::shared_ptr< ScanTask > > >(std::vector< ManifestEntry > &&, const TaskContext &)>
 

Public Member Functions

 ManifestGroup (ManifestGroup &&) noexcept
 
ManifestGroup & operator= (ManifestGroup &&) noexcept
 
 ManifestGroup (const ManifestGroup &)=delete
 
ManifestGroup & operator= (const ManifestGroup &)=delete
 
ManifestGroup & FilterData (std::shared_ptr< Expression > filter)
 Set a row-level data filter.
 
ManifestGroup & FilterFiles (std::shared_ptr< Expression > filter)
 Set a filter that is evaluated against each DataFile's metadata.
 
ManifestGroup & FilterPartitions (std::shared_ptr< Expression > filter)
 Set a partition filter expression.
 
ManifestGroup & FilterManifestEntries (std::function< bool(const ManifestEntry &)> predicate)
 Set a custom manifest entry filter predicate.
 
ManifestGroup & IgnoreDeleted ()
 Ignore deleted entries in manifests.
 
ManifestGroup & IgnoreExisting ()
 Ignore existing entries in manifests.
 
ManifestGroup & IgnoreResiduals ()
 Ignore residual filter computation.
 
ManifestGroup & Select (std::vector< std::string > columns)
 Select specific columns from manifest entries.
 
ManifestGroup & CaseSensitive (bool case_sensitive)
 Set case sensitivity for column name matching.
 
ManifestGroup & ColumnsToKeepStats (std::unordered_set< int32_t > column_ids)
 Specify columns that should retain their statistics.
 
Result< std::vector< std::shared_ptr< FileScanTask > > > PlanFiles ()
 Plan scan tasks for all matching data files.
 
Result< std::vector< ManifestEntry > > Entries ()
 Get all matching manifest entries.
 
Result< std::vector< std::shared_ptr< ScanTask > > > Plan (const CreateTasksFunction &create_tasks)
 Plan tasks using a custom task creation function.
 
- Public Member Functions inherited from iceberg::ErrorCollector
 ErrorCollector (ErrorCollector &&)=default
 
ErrorCollector & operator= (ErrorCollector &&)=default
 
 ErrorCollector (const ErrorCollector &)=default
 
ErrorCollector & operator= (const ErrorCollector &)=default
 
template<typename... Args>
auto & AddError (this auto &self, ErrorKind kind, const std::format_string< Args... > fmt, Args &&... args)
 Add a specific error and return reference to derived class.
 
auto & AddError (this auto &self, Error err)
 Add an existing error object and return reference to derived class.
 
auto & AddError (this auto &self, std::unexpected< Error > err)
 Add an unexpected result's error and return reference to derived class.
 
bool has_errors () const
 Check if any errors have been collected.
 
size_t error_count () const
 Get the number of errors collected.
 
Status CheckErrors () const
 Check for accumulated errors and return them if any exist.
 
void ClearErrors ()
 Clear all accumulated errors.
 
const std::vector< Error > & errors () const
 Get read-only access to all collected errors.
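The inherited AddError overloads return a reference to the derived class, which lets a fluent setter record a problem and keep the call chain going until the errors are checked in one place. The sketch below illustrates that idea only. It swaps the documented C++23 explicit object parameter (`this auto &self`) for a plain CRTP base so that it compiles as C++17, and `CollectorStub`, `BuilderStub`, and `Limit` are hypothetical names that do not exist in iceberg-cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical CRTP stand-in. The documented class uses a C++23 explicit
// object parameter instead of a template parameter.
template <typename Derived>
class CollectorStub {
 public:
  // Stores the message and returns the derived object so calls can chain.
  Derived& AddError(std::string message) {
    errors_.push_back(std::move(message));
    return static_cast<Derived&>(*this);
  }
  bool has_errors() const { return !errors_.empty(); }
  std::size_t error_count() const { return errors_.size(); }
  void ClearErrors() { errors_.clear(); }

 protected:
  std::vector<std::string> errors_;
};

// Hypothetical builder: invalid input is recorded rather than thrown.
class BuilderStub : public CollectorStub<BuilderStub> {
 public:
  BuilderStub& Limit(int n) {
    if (n < 0) return AddError("limit must be non-negative");
    limit_ = n;
    return *this;
  }
  int limit() const { return limit_; }

 private:
  int limit_ = 0;
};
```

With this stand-in, `builder.Limit(-1).Limit(5)` leaves one recorded error and a limit of 5, and the caller decides when to inspect `has_errors()`.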
 

Static Public Member Functions

static Result< std::unique_ptr< ManifestGroup > > Make (std::shared_ptr< FileIO > io, std::shared_ptr< Schema > schema, std::unordered_map< int32_t, std::shared_ptr< PartitionSpec > > specs_by_id_, std::vector< ManifestFile > manifests)
 Construct a ManifestGroup with a list of manifests.
 
static Result< std::unique_ptr< ManifestGroup > > Make (std::shared_ptr< FileIO > io, std::shared_ptr< Schema > schema, std::unordered_map< int32_t, std::shared_ptr< PartitionSpec > > specs_by_id, std::vector< ManifestFile > data_manifests, std::vector< ManifestFile > delete_manifests)
 Construct a ManifestGroup with pre-separated manifests.
 

Additional Inherited Members

- Protected Attributes inherited from iceberg::ErrorCollector
std::vector< Error > errors_
 

Detailed Description

Coordinates reading manifest files and producing scan tasks.
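The configuration methods return a reference to the ManifestGroup (see the member documentation), so they can be chained before a terminal call such as Entries() or PlanFiles(). The following is a minimal, self-contained sketch of that fluent pattern, not the library's implementation. `SketchGroup`, `SketchEntry`, and `SketchStatus` are hypothetical stand-ins, and the filtering logic is an assumption about what ignoring deleted and existing entries amounts to.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are NOT the iceberg-cpp types.
enum class SketchStatus { kExisting, kAdded, kDeleted };

struct SketchEntry {
  SketchStatus status;
  std::string path;
};

class SketchGroup {
 public:
  explicit SketchGroup(std::vector<SketchEntry> entries)
      : entries_(std::move(entries)) {}

  // Each setter returns *this so that calls can be chained.
  SketchGroup& IgnoreDeleted() {
    ignore_deleted_ = true;
    return *this;
  }
  SketchGroup& IgnoreExisting() {
    ignore_existing_ = true;
    return *this;
  }

  // Terminal call: applies the accumulated settings (assumed behaviour).
  std::vector<SketchEntry> Entries() const {
    std::vector<SketchEntry> out;
    for (const auto& entry : entries_) {
      if (ignore_deleted_ && entry.status == SketchStatus::kDeleted) continue;
      if (ignore_existing_ && entry.status == SketchStatus::kExisting) continue;
      out.push_back(entry);
    }
    return out;
  }

 private:
  std::vector<SketchEntry> entries_;
  bool ignore_deleted_ = false;
  bool ignore_existing_ = false;
};

// Example: keep only newly added entries.
inline std::vector<SketchEntry> AddedOnly() {
  std::vector<SketchEntry> entries = {{SketchStatus::kExisting, "a.parquet"},
                                      {SketchStatus::kAdded, "b.parquet"},
                                      {SketchStatus::kDeleted, "c.parquet"}};
  SketchGroup group(std::move(entries));
  return group.IgnoreDeleted().IgnoreExisting().Entries();
}
```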

Member Function Documentation

◆ ColumnsToKeepStats()

ManifestGroup & iceberg::ManifestGroup::ColumnsToKeepStats ( std::unordered_set< int32_t >  column_ids)

Specify columns that should retain their statistics.

Parameters
column_ids  Field IDs of columns whose statistics should be preserved.
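As an illustration of what preserving statistics for a set of field IDs could look like, the sketch below prunes a per-column statistics map down to the requested IDs. It is a stand-alone example under stated assumptions: `StatsByFieldId` and `KeepStatsFor` are hypothetical, and the idea that statistics for other columns are dropped is an assumption rather than something this page states.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <unordered_set>

// Hypothetical per-column statistic keyed by field ID (for example, a count).
using StatsByFieldId = std::map<std::int32_t, std::int64_t>;

// Returns a copy of `stats` containing only the field IDs listed in `keep`.
inline StatsByFieldId KeepStatsFor(
    const StatsByFieldId& stats,
    const std::unordered_set<std::int32_t>& keep) {
  StatsByFieldId pruned;
  for (const auto& [field_id, value] : stats) {
    if (keep.count(field_id) != 0) pruned.emplace(field_id, value);
  }
  return pruned;
}
```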

◆ FilterManifestEntries()

ManifestGroup & iceberg::ManifestGroup::FilterManifestEntries ( std::function< bool(const ManifestEntry &)>  predicate)

Set a custom manifest entry filter predicate.

Parameters
predicate  A function that returns true if the entry should be included.
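The predicate has the shape `bool(const ManifestEntry &)` and returns true for entries to keep. The sketch below shows that contract with a hypothetical `EntryStub` type, since the fields of the real ManifestEntry are not shown on this page. `FilterEntries` and `HasRecords` are illustrative helpers, not library functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for a manifest entry; not the iceberg-cpp type.
struct EntryStub {
  std::string path;
  std::int64_t record_count;
};

using EntryPredicate = std::function<bool(const EntryStub&)>;

// Keeps the entries for which `predicate` returns true.
inline std::vector<EntryStub> FilterEntries(
    const std::vector<EntryStub>& entries, const EntryPredicate& predicate) {
  std::vector<EntryStub> kept;
  std::copy_if(entries.begin(), entries.end(), std::back_inserter(kept),
               predicate);
  return kept;
}

// Example predicate: include only entries that contain at least one record.
inline bool HasRecords(const EntryStub& entry) {
  return entry.record_count > 0;
}
```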

◆ Make() [1/2]

Result< std::unique_ptr< ManifestGroup > > iceberg::ManifestGroup::Make ( std::shared_ptr< FileIO >  io,
std::shared_ptr< Schema >  schema,
std::unordered_map< int32_t, std::shared_ptr< PartitionSpec > >  specs_by_id,
std::vector< ManifestFile >  data_manifests,
std::vector< ManifestFile >  delete_manifests 
)
static

Construct a ManifestGroup with pre-separated manifests.

Parameters
io  FileIO for reading manifest files.
schema  Current table schema.
specs_by_id  Mapping of partition spec ID to PartitionSpec.
data_manifests  List of data manifest files.
delete_manifests  List of delete manifest files.
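This overload expects the caller to have already separated data manifests from delete manifests. A sketch of such a split is shown below, assuming each manifest carries a content marker that distinguishes the two kinds. `ManifestStub`, `ContentStub`, and `SplitByContent` are hypothetical and only illustrate the preparation step; they are not iceberg-cpp types.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; not the iceberg-cpp ManifestFile type.
enum class ContentStub { kData, kDeletes };

struct ManifestStub {
  std::string path;
  ContentStub content;
};

struct SplitManifests {
  std::vector<ManifestStub> data;
  std::vector<ManifestStub> deletes;
};

// Moves each manifest into the list that matches its content marker.
inline SplitManifests SplitByContent(std::vector<ManifestStub> manifests) {
  SplitManifests out;
  for (auto& manifest : manifests) {
    auto& target =
        (manifest.content == ContentStub::kData) ? out.data : out.deletes;
    target.push_back(std::move(manifest));
  }
  return out;
}
```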

◆ Make() [2/2]

Result< std::unique_ptr< ManifestGroup > > iceberg::ManifestGroup::Make ( std::shared_ptr< FileIO >  io,
std::shared_ptr< Schema >  schema,
std::unordered_map< int32_t, std::shared_ptr< PartitionSpec > >  specs_by_id_,
std::vector< ManifestFile >  manifests 
)
static

Construct a ManifestGroup with a list of manifests.

Parameters
io  FileIO for reading manifest files.
schema  Current table schema.
specs_by_id  Mapping of partition spec ID to PartitionSpec.
manifests  List of manifest files to process.
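Both overloads take a map from partition spec ID to PartitionSpec. One way a caller might build such a map from a list of specs is sketched below. `SpecStub` and `IndexSpecsById` are hypothetical names used for illustration, and the real PartitionSpec interface is not assumed beyond having an integer ID.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a partition spec; not the iceberg-cpp type.
struct SpecStub {
  std::int32_t spec_id;
};

using SpecsById = std::unordered_map<std::int32_t, std::shared_ptr<SpecStub>>;

// Indexes the given specs by their ID; null pointers are skipped.
inline SpecsById IndexSpecsById(
    const std::vector<std::shared_ptr<SpecStub>>& specs) {
  SpecsById by_id;
  for (const auto& spec : specs) {
    if (spec) by_id[spec->spec_id] = spec;
  }
  return by_id;
}
```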

◆ Plan()

Result< std::vector< std::shared_ptr< ScanTask > > > iceberg::ManifestGroup::Plan ( const CreateTasksFunction &  create_tasks)

Plan tasks using a custom task creation function.

Parameters
create_tasks  A function that creates ScanTasks from entries and context.
Returns
A list of ScanTask objects, or error on failure.
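The callback type CreateTasksFunction takes the entries by rvalue reference plus a context, and returns either a task list or an error. The sketch below mirrors that shape with hypothetical stand-ins (`PlanEntryStub`, `PlanContextStub`, `PlanTaskStub`, `ResultStub`), because the real ManifestEntry, TaskContext, ScanTask, and Result definitions are not shown on this page. The rule that an entry with an empty path is an error is invented for the example.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these are the iceberg-cpp types.
struct PlanEntryStub { std::string path; };
struct PlanContextStub {};  // the real TaskContext fields are not shown here
struct PlanTaskStub { std::string path; };

// Minimal result type: either a value or an error message.
template <typename T>
struct ResultStub {
  std::optional<T> value;
  std::string error;
  bool ok() const { return value.has_value(); }
};

using TaskList = std::vector<std::shared_ptr<PlanTaskStub>>;
using CreateTasksStub = std::function<ResultStub<TaskList>(
    std::vector<PlanEntryStub>&&, const PlanContextStub&)>;

// Example callback: one task per entry; an empty path is treated as an error.
inline ResultStub<TaskList> OneTaskPerEntry(
    std::vector<PlanEntryStub>&& entries, const PlanContextStub&) {
  TaskList tasks;
  for (auto& entry : entries) {
    if (entry.path.empty()) return {std::nullopt, "entry has no path"};
    tasks.push_back(
        std::make_shared<PlanTaskStub>(PlanTaskStub{std::move(entry.path)}));
  }
  return {std::move(tasks), ""};
}

// Stand-in for the planning step: hands the entries to the callback.
inline ResultStub<TaskList> PlanWith(std::vector<PlanEntryStub> entries,
                                     const CreateTasksStub& create_tasks) {
  PlanContextStub context;
  return create_tasks(std::move(entries), context);
}
```

Taking the entries by rvalue reference lets the callback move data out of them instead of copying, which is why the example moves each path into its task.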

◆ Select()

ManifestGroup & iceberg::ManifestGroup::Select ( std::vector< std::string >  columns)

Select specific columns from manifest entries.

Parameters
columns  Column names to select from manifest entries.

The documentation for this class was generated from the following files: