API for schema evolution.

#include <update_schema.h>

Inheritance: iceberg::UpdateSchema inherits members from iceberg::PendingUpdate and iceberg::ErrorCollector.

Classes
struct	ApplyResult

struct	Move
	Represents a column move operation within a struct (internal use only).

Public Member Functions
UpdateSchema &	AllowIncompatibleChanges ()
	Allow incompatible changes to the schema.

UpdateSchema &	AddColumn (std::string_view name, std::shared_ptr< Type > type, std::string_view doc="")
	Add a new optional top-level column with documentation.

UpdateSchema &	AddColumn (std::optional< std::string_view > parent, std::string_view name, std::shared_ptr< Type > type, std::string_view doc="")
	Add a new optional column to a nested struct with documentation.

UpdateSchema &	AddRequiredColumn (std::string_view name, std::shared_ptr< Type > type, std::string_view doc="")
	Add a new required top-level column with documentation.

UpdateSchema &	AddRequiredColumn (std::optional< std::string_view > parent, std::string_view name, std::shared_ptr< Type > type, std::string_view doc="")
	Add a new required column to a nested struct with documentation.

UpdateSchema &	RenameColumn (std::string_view name, std::string_view new_name)
	Rename a column in the schema.

UpdateSchema &	UpdateColumn (std::string_view name, std::shared_ptr< PrimitiveType > new_type)
	Update a column in the schema to a new primitive type.

UpdateSchema &	UpdateColumnDoc (std::string_view name, std::string_view new_doc)
	Update the documentation string for a column.

UpdateSchema &	MakeColumnOptional (std::string_view name)
	Update a column to be optional.

UpdateSchema &	RequireColumn (std::string_view name)
	Update a column to be required.

UpdateSchema &	DeleteColumn (std::string_view name)
	Delete a column in the schema.

UpdateSchema &	MoveFirst (std::string_view name)
	Move a column from its current position to the start of the schema or its parent struct.

UpdateSchema &	MoveBefore (std::string_view name, std::string_view before_name)
	Move a column from its current position to directly before a reference column.

UpdateSchema &	MoveAfter (std::string_view name, std::string_view after_name)
	Move a column from its current position to directly after a reference column.

UpdateSchema &	UnionByNameWith (std::shared_ptr< Schema > new_schema)
	Applies all field additions and updates from the provided new schema to the existing schema to create a union schema.

UpdateSchema &	SetIdentifierFields (const std::span< std::string_view > &names)
	Set the identifier fields given a set of field names.

UpdateSchema &	CaseSensitive (bool case_sensitive)
	Determines whether case is considered when comparing column names.

Kind	kind () const final
	Return the kind of this pending update.

bool	IsRetryable () const override
	Schema updates are not retryable.

Result< ApplyResult >	Apply ()
	Apply the pending changes to the original schema and return the result.

Public Member Functions inherited from iceberg::PendingUpdate
virtual Status	Commit ()
	Apply the pending changes and commit.

virtual Status	Finalize (Result< const TableMetadata * > commit_result)
	Finalize the pending update.

	PendingUpdate (const PendingUpdate &)=delete

PendingUpdate &	operator= (const PendingUpdate &)=delete

	PendingUpdate (PendingUpdate &&) noexcept=default

PendingUpdate &	operator= (PendingUpdate &&) noexcept=default

Public Member Functions inherited from iceberg::ErrorCollector
	ErrorCollector (ErrorCollector &&)=default

ErrorCollector &	operator= (ErrorCollector &&)=default

	ErrorCollector (const ErrorCollector &)=default

ErrorCollector &	operator= (const ErrorCollector &)=default

template<typename... Args>
auto &	AddError (this auto &self, ErrorKind kind, const std::format_string< Args... > fmt, Args &&... args)
	Add a specific error and return reference to derived class.

auto &	AddError (this auto &self, Error err)
	Add an existing error object and return reference to derived class.

auto &	AddError (this auto &self, std::unexpected< Error > err)
	Add an unexpected result's error and return reference to derived class.

bool	has_errors () const
	Check if any errors have been collected.

size_t	error_count () const
	Get the number of errors collected.

Status	CheckErrors () const
	Check for accumulated errors and return them if any exist.

void	ClearErrors ()
	Clear all accumulated errors.

const std::vector< Error > &	errors () const
	Get read-only access to all collected errors.

Static Public Member Functions
static Result< std::shared_ptr< UpdateSchema > >	Make (std::shared_ptr< TransactionContext > ctx)

Additional Inherited Members
Public Types inherited from iceberg::PendingUpdate
enum class	Kind : uint8_t { kExpireSnapshots , kSetSnapshot , kUpdateLocation , kUpdatePartitionSpec , kUpdatePartitionStatistics , kUpdateProperties , kUpdateSchema , kUpdateSnapshot , kUpdateSnapshotReference , kUpdateSortOrder , kUpdateStatistics }

Protected Member Functions inherited from iceberg::PendingUpdate
	PendingUpdate (std::shared_ptr< TransactionContext > ctx)

const TableMetadata &	base () const

Protected Attributes inherited from iceberg::PendingUpdate
std::shared_ptr< TransactionContext >	ctx_

Protected Attributes inherited from iceberg::ErrorCollector
std::vector< Error >	errors_

Detailed Description

API for schema evolution.

When committing, these changes will be applied to the current table metadata. Commit conflicts will not be resolved and will result in a CommitFailed error.

TODO(Guotao Yu): Add support for V3 default values when adding columns. Currently, all added columns use null as the default value, but Iceberg V3 supports custom default values for new columns.

Member Function Documentation

◆ AddColumn() [1/2]

UpdateSchema & iceberg::UpdateSchema::AddColumn	(	std::optional< std::string_view >	parent,
		std::string_view	name,
		std::shared_ptr< Type >	type,
		std::string_view	doc = `""`
	)

Add a new optional column to a nested struct with documentation.

The parent name is used to find the parent using Schema::FindFieldByName(). If the parent name is null or empty, the new column will be added to the root as a top-level column. If parent identifies a struct, a new column is added to that struct. If it identifies a list, the column is added to the list element struct, and if it identifies a map, the new column is added to the map's value struct.

The given name is used to name the new column and names containing "." are not handled differently.

If type is a nested type, its field IDs are reassigned when added to the existing schema.

The added column will be optional with a null default value.

Parameters

parent	Name of the parent struct to which the column will be added.
name	Name for the new column.
type	Type for the new column.
doc	Documentation string for the new column.

Returns: Reference to this for method chaining.

Note: InvalidArgument will be reported if parent doesn't identify a struct.
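
The parent-resolution rules above can be sketched with a small stand-alone model. Everything in it (`NodeKind`, `Node`, `Target`, `ResolveTarget`) is hypothetical and is not part of iceberg-cpp; it only illustrates which struct would receive the new column. Rejecting an unknown parent, or a list or map whose element or value is not a struct, is an assumption based on the note that the parent must identify a struct.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified type model (not the iceberg-cpp Type hierarchy).
enum class NodeKind { kPrimitive, kStruct, kList, kMap };

struct Node {
  NodeKind kind;
  // For kList: kind of the element. For kMap: kind of the value. Ignored otherwise.
  NodeKind inner;
};

// Where a new column would be attached for a given parent name.
enum class Target { kRoot, kStruct, kListElement, kMapValue, kInvalid };

Target ResolveTarget(const std::map<std::string, Node>& fields,
                     const std::optional<std::string>& parent) {
  // Null or empty parent: the column becomes a top-level column.
  if (!parent.has_value() || parent->empty()) return Target::kRoot;
  const auto it = fields.find(*parent);
  if (it == fields.end()) return Target::kInvalid;  // assumption: unknown parent is rejected
  const Node& node = it->second;
  if (node.kind == NodeKind::kStruct) return Target::kStruct;
  if (node.kind == NodeKind::kList && node.inner == NodeKind::kStruct) return Target::kListElement;
  if (node.kind == NodeKind::kMap && node.inner == NodeKind::kStruct) return Target::kMapValue;
  return Target::kInvalid;  // corresponds to the documented InvalidArgument case
}

// Usage: a struct, a list of structs, a map with struct values, and a primitive.
void Example() {
  const std::map<std::string, Node> fields = {
      {"location", {NodeKind::kStruct, NodeKind::kPrimitive}},
      {"points", {NodeKind::kList, NodeKind::kStruct}},
      {"props", {NodeKind::kMap, NodeKind::kStruct}},
      {"id", {NodeKind::kPrimitive, NodeKind::kPrimitive}},
  };
  assert(ResolveTarget(fields, std::nullopt) == Target::kRoot);
  assert(ResolveTarget(fields, std::string("location")) == Target::kStruct);
  assert(ResolveTarget(fields, std::string("points")) == Target::kListElement);
  assert(ResolveTarget(fields, std::string("props")) == Target::kMapValue);
  assert(ResolveTarget(fields, std::string("id")) == Target::kInvalid);
}
```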

◆ AddColumn() [2/2]

UpdateSchema & iceberg::UpdateSchema::AddColumn	(	std::string_view	name,
		std::shared_ptr< Type >	type,
		std::string_view	doc = `""`
	)

Add a new optional top-level column with documentation.

Because "." may be interpreted as a column path separator or may be used in field names, it is not allowed in names passed to this method. To add to nested structures or to add fields with names that contain ".", use AddColumn(parent, name, type, doc).

If type is a nested type, its field IDs are reassigned when added to the existing schema.

The added column will be optional with a null default value.

Parameters

name	Name for the new column.
type	Type for the new column.
doc	Documentation string for the new column.

Returns: Reference to this for method chaining.

Note: InvalidArgument will be reported if name contains ".".
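
A minimal sketch of the "." check described above, assuming errors are recorded rather than thrown so that a call chain can continue. `Errors` and `CheckTopLevelName` are hypothetical names for illustration, and the error text is invented; neither reflects the library's actual implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the builder's error list (not iceberg::ErrorCollector).
struct Errors {
  std::vector<std::string> messages;
};

// Returns true if `name` is acceptable for the top-level overload. Otherwise
// records an error and returns false, leaving the decision to fail for later.
bool CheckTopLevelName(const std::string& name, Errors& errors) {
  if (name.find('.') != std::string::npos) {
    errors.messages.push_back("InvalidArgument: ambiguous column name: " + name);
    return false;
  }
  return true;
}

// Usage: a plain name passes, a dotted name is recorded as an error.
void Example() {
  Errors errors;
  assert(CheckTopLevelName("price", errors));
  assert(!CheckTopLevelName("address.zip", errors));
  assert(errors.messages.size() == 1);
}
```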

◆ AddRequiredColumn() [1/2]

UpdateSchema & iceberg::UpdateSchema::AddRequiredColumn	(	std::optional< std::string_view >	parent,
		std::string_view	name,
		std::shared_ptr< Type >	type,
		std::string_view	doc = `""`
	)

Add a new required column to a nested struct with documentation.

Adding a required column without a default is an incompatible change that can break reading older data. To suppress the error reported when an incompatible change is detected, call AllowIncompatibleChanges().

The parent name is used to find the parent using Schema::FindFieldByName(). If the parent name is null or empty, the new column will be added to the root as a top-level column. If parent identifies a struct, a new column is added to that struct. If it identifies a list, the column is added to the list element struct, and if it identifies a map, the new column is added to the map's value struct.

The given name is used to name the new column and names containing "." are not handled differently.

If type is a nested type, its field IDs are reassigned when added to the existing schema.

Parameters

parent	Name of the parent struct to which the column will be added.
name	Name for the new column.
type	Type for the new column.
doc	Documentation string for the new column.

Returns: Reference to this for method chaining.

Note: InvalidArgument will be reported if parent doesn't identify a struct.

◆ AddRequiredColumn() [2/2]

UpdateSchema & iceberg::UpdateSchema::AddRequiredColumn	(	std::string_view	name,
		std::shared_ptr< Type >	type,
		std::string_view	doc = `""`
	)

Add a new required top-level column with documentation.

Adding a required column without a default is an incompatible change that can break reading older data. To suppress the error reported when an incompatible change is detected, call AllowIncompatibleChanges().

Because "." may be interpreted as a column path separator or may be used in field names, it is not allowed in names passed to this method. To add to nested structures or to add fields with names that contain ".", use AddRequiredColumn(parent, name, type, doc).

If type is a nested type, its field IDs are reassigned when added to the existing schema.

Parameters

name	Name for the new column.
type	Type for the new column.
doc	Documentation string for the new column.

Returns: Reference to this for method chaining.

Note: InvalidArgument will be reported if name contains ".".

◆ AllowIncompatibleChanges()

UpdateSchema & iceberg::UpdateSchema::AllowIncompatibleChanges ( )

Allow incompatible changes to the schema.

Incompatible changes can cause failures when attempting to read older data files. For example, adding a required column and attempting to read data files without that column will cause a failure. However, if there are no data files that are not compatible with the change, it can be allowed.

This option allows incompatible changes to be made to a schema. It should be used only when the caller has validated that the change will not break reads of existing data. For example, if a column was added as optional but has always been populated, and data older than the column addition has been deleted from the table, this option can be used with RequireColumn() to mark the column required.

Returns: Reference to this for method chaining.
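
The opt-in behaviour can be sketched with a miniature builder. `MiniUpdate` is hypothetical: its method names mirror the documented ones, but the class, the error text, and the assumption that the opt-in must precede the incompatible call are illustrative only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical miniature builder (not the iceberg-cpp implementation).
class MiniUpdate {
 public:
  MiniUpdate& AllowIncompatibleChanges() {
    allow_incompatible_ = true;
    return *this;
  }

  // Marking a column required is rejected unless the caller opted in first.
  MiniUpdate& RequireColumn(const std::string& name) {
    if (!allow_incompatible_) {
      errors_.push_back("InvalidArgument: incompatible change for column: " + name);
      return *this;
    }
    required_.push_back(name);
    return *this;
  }

  const std::vector<std::string>& required() const { return required_; }
  const std::vector<std::string>& errors() const { return errors_; }

 private:
  bool allow_incompatible_ = false;
  std::vector<std::string> required_;
  std::vector<std::string> errors_;
};

// Usage: the same call fails without the opt-in and succeeds with it.
void Example() {
  MiniUpdate strict;
  strict.RequireColumn("id");
  assert(strict.errors().size() == 1 && strict.required().empty());

  MiniUpdate relaxed;
  relaxed.AllowIncompatibleChanges().RequireColumn("id");
  assert(relaxed.errors().empty() && relaxed.required().size() == 1);
}
```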

◆ Apply()

Result< UpdateSchema::ApplyResult > iceberg::UpdateSchema::Apply ( )

Apply the pending changes to the original schema and return the result.

This does not result in a permanent update.

Returns: The result Schema and last column id when all pending updates are applied.
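
To illustrate that applying does not change the base schema, here is a stand-alone sketch. `Field`, `ApplyResultSketch`, and `PendingAdds` are hypothetical types, and assigning new IDs by incrementing the last column id is an assumption about ID allocation, not a statement about the library's code.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct Field {
  int id;
  std::string name;
};

// Hypothetical analogue of the documented result: a schema plus the last column id.
struct ApplyResultSketch {
  std::vector<Field> fields;
  int last_column_id;
};

class PendingAdds {
 public:
  PendingAdds(std::vector<Field> base, int last_column_id)
      : base_(std::move(base)), last_column_id_(last_column_id) {}

  PendingAdds& AddColumn(std::string name) {
    pending_.push_back(std::move(name));
    return *this;
  }

  // Builds the result from a copy of the base; the base itself is never modified.
  ApplyResultSketch Apply() const {
    ApplyResultSketch result{base_, last_column_id_};
    for (const std::string& name : pending_) {
      result.last_column_id += 1;  // assumption: fresh IDs follow the last column id
      result.fields.push_back(Field{result.last_column_id, name});
    }
    return result;
  }

  const std::vector<Field>& base() const { return base_; }

 private:
  std::vector<Field> base_;
  int last_column_id_;
  std::vector<std::string> pending_;
};

// Usage: two pending additions on a one-column base schema.
void Example() {
  std::vector<Field> base = {Field{1, "id"}};
  PendingAdds update(base, 1);
  update.AddColumn("data").AddColumn("ts");
  const ApplyResultSketch result = update.Apply();
  assert(result.fields.size() == 3);
  assert(result.last_column_id == 3);
  assert(update.base().size() == 1);  // base is unchanged
}
```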

◆ CaseSensitive()

UpdateSchema & iceberg::UpdateSchema::CaseSensitive ( bool case_sensitive )

Determines whether case is considered when comparing column names.

Parameters

case_sensitive	When false, case is ignored in column name comparisons.

Returns: Reference to this for method chaining.
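
A minimal sketch of what a case-insensitive name comparison could look like. `NamesMatch` is a hypothetical helper that folds ASCII letters only; how the library normalizes names (for example for non-ASCII characters) is not stated here and is not assumed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Compares two column names, optionally ignoring ASCII case.
bool NamesMatch(const std::string& a, const std::string& b, bool case_sensitive) {
  if (case_sensitive) return a == b;
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    // Cast to unsigned char first: std::tolower is undefined for negative values.
    const int lhs = std::tolower(static_cast<unsigned char>(a[i]));
    const int rhs = std::tolower(static_cast<unsigned char>(b[i]));
    if (lhs != rhs) return false;
  }
  return true;
}

// Usage: "ID" and "id" match only when case is ignored.
void Example() {
  assert(NamesMatch("ID", "id", /*case_sensitive=*/false));
  assert(!NamesMatch("ID", "id", /*case_sensitive=*/true));
  assert(!NamesMatch("id", "idx", /*case_sensitive=*/false));
}
```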

◆ DeleteColumn()

UpdateSchema & iceberg::UpdateSchema::DeleteColumn ( std::string_view name )

Delete a column in the schema.

The name is used to find the column to delete using Schema::FindFieldByName().

Parameters

name	Name of the column to delete.

Returns: Reference to this for method chaining.

Note: InvalidArgument will be reported if name doesn't identify a column in the schema or if this change conflicts with other additions, renames, or updates.

◆ IsRetryable()

bool iceberg::UpdateSchema::IsRetryable ( ) const

inline override virtual

Schema updates are not retryable.

The update records field IDs, move targets, and last-column-id-derived state from the schema that was current when the builder was created. Replaying after a refresh can apply a different schema evolution than the caller originally authored.

Implements iceberg::PendingUpdate.

◆ kind()

Kind iceberg::UpdateSchema::kind ( ) const

inline final virtual

Return the kind of this pending update.

Implements iceberg::PendingUpdate.

◆ MakeColumnOptional()

UpdateSchema & iceberg::UpdateSchema::MakeColumnOptional ( std::string_view name )

Update a column to be optional.

Parameters

name	Name of the column to mark optional.

Returns: Reference to this for method chaining.

◆ MoveAfter()

UpdateSchema & iceberg::UpdateSchema::MoveAfter	(	std::string_view	name,
		std::string_view	after_name
	)

Move a column from its current position to directly after a reference column.

The name is used to find the column to move using Schema::FindFieldByName(). If the name identifies a nested column, it can only be moved within the nested struct that contains it.

Parameters

name	Name of the column to move.
after_name	Name of the reference column.

Returns: Reference to this for method chaining.

Note: InvalidArgument will be reported if name doesn't identify a column in the schema or if this change conflicts with other changes.

◆ MoveBefore()

UpdateSchema & iceberg::UpdateSchema::MoveBefore	(	std::string_view	name,
		std::string_view	before_name
	)

Move a column from its current position to directly before a reference column.

The name is used to find the column to move using Schema::FindFieldByName(). If the name identifies a nested column, it can only be moved within the nested struct that contains it.

Parameters

name	Name of the column to move.
before_name	Name of the reference column.

Returns: Reference to this for method chaining.

Note: InvalidArgument will be reported if name doesn't identify a column in the schema or if this change conflicts with other changes.

◆ MoveFirst()

UpdateSchema & iceberg::UpdateSchema::MoveFirst ( std::string_view name )

Move a column from its current position to the start of the schema or its parent struct.

Parameters

name	Name of the column to move.

Returns: Reference to this for method chaining.

Note: InvalidArgument will be reported if name doesn't identify a column in the schema or if this change conflicts with other changes.
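
The three move operations (MoveFirst, MoveBefore, MoveAfter) can be pictured as reordering the field list of a single struct. The sketch below works on a plain vector of names; the free functions and the `Columns` alias are hypothetical, and returning false stands in for the InvalidArgument cases (unknown column, or a column moved relative to itself, the latter being an assumption).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using Columns = std::vector<std::string>;

// Removes `name` from `cols`; returns false if it is not present.
bool Take(Columns& cols, const std::string& name) {
  const auto it = std::find(cols.begin(), cols.end(), name);
  if (it == cols.end()) return false;
  cols.erase(it);
  return true;
}

// Names are taken by value so callers may safely pass elements of `cols`.
bool MoveFirst(Columns& cols, std::string name) {
  if (!Take(cols, name)) return false;
  cols.insert(cols.begin(), name);
  return true;
}

bool MoveBefore(Columns& cols, std::string name, std::string ref) {
  if (name == ref) return false;
  if (std::find(cols.begin(), cols.end(), ref) == cols.end()) return false;
  if (!Take(cols, name)) return false;
  cols.insert(std::find(cols.begin(), cols.end(), ref), name);
  return true;
}

bool MoveAfter(Columns& cols, std::string name, std::string ref) {
  if (name == ref) return false;
  if (std::find(cols.begin(), cols.end(), ref) == cols.end()) return false;
  if (!Take(cols, name)) return false;
  cols.insert(std::find(cols.begin(), cols.end(), ref) + 1, name);
  return true;
}

// Usage: each step reorders the same three columns.
void Example() {
  Columns cols = {"id", "data", "ts"};
  assert(MoveFirst(cols, "ts"));
  assert((cols == Columns{"ts", "id", "data"}));
  assert(MoveBefore(cols, "data", "id"));
  assert((cols == Columns{"ts", "data", "id"}));
  assert(MoveAfter(cols, "ts", "id"));
  assert((cols == Columns{"data", "id", "ts"}));
}
```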

◆ RenameColumn()

UpdateSchema & iceberg::UpdateSchema::RenameColumn	(	std::string_view	name,
		std::string_view	new_name
	)

Rename a column in the schema.

The name is used to find the column to rename using Schema::FindFieldByName().

The new name may contain "." and such names are not parsed or handled differently.

Columns may be updated and renamed in the same schema update.

Parameters

name	Name of the column to rename.
new_name	Replacement name for the column.

Returns: Reference to this for method chaining.

Note: InvalidArgument will be reported if name doesn't identify a column in the schema or if this change conflicts with other additions, renames, or updates.

◆ RequireColumn()

UpdateSchema & iceberg::UpdateSchema::RequireColumn ( std::string_view name )

Update a column to be required.

This is an incompatible change that can break reading older data. This method reports an error unless AllowIncompatibleChanges() has been called.

Parameters

name	Name of the column to mark required.

Returns: Reference to this for method chaining.

◆ SetIdentifierFields()

UpdateSchema & iceberg::UpdateSchema::SetIdentifierFields ( const std::span< std::string_view > & names )

Set the identifier fields given a set of field names.

Because identifier fields are unique, duplicated names will be ignored. See Schema::identifier_field_ids() to learn more about Iceberg identifier fields.

Parameters

names	Names of the columns to set as identifier fields.

Returns: Reference to this for method chaining.
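
Ignoring duplicated names can be sketched as an order-preserving de-duplication. `UniqueNames` is a hypothetical helper; keeping the first occurrence of each name is an assumption, since only the fact that duplicates are ignored is documented.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Returns `names` with later duplicates dropped, keeping first-seen order.
std::vector<std::string> UniqueNames(const std::vector<std::string>& names) {
  std::vector<std::string> unique;
  std::set<std::string> seen;
  for (const std::string& name : names) {
    // insert().second is true only the first time a name is seen.
    if (seen.insert(name).second) unique.push_back(name);
  }
  return unique;
}

// Usage: the repeated "id" is kept once.
void Example() {
  const std::vector<std::string> result = UniqueNames({"id", "ts", "id"});
  assert((result == std::vector<std::string>{"id", "ts"}));
}
```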

◆ UnionByNameWith()

UpdateSchema & iceberg::UpdateSchema::UnionByNameWith ( std::shared_ptr< Schema > new_schema )

Applies all field additions and updates from the provided new schema to the existing schema to create a union schema.

For fields with the same canonical name in both schemas, any type change must be a widening that UpdateColumn() supports. Differences in type are ignored if the new type is narrower than the existing type (e.g. long to int, double to float).

A previously required field is made optional, using MakeColumnOptional(), only if it is marked optional in the provided new schema.

Existing field docs are updated, using UpdateColumnDoc(), only with field docs from the provided new schema.

Parameters

new_schema	A schema used in conjunction with the existing schema to create a union schema.

Returns: Reference to this for method chaining.

Note: InvalidState will be reported if errors are encountered while traversing the provided schema. InvalidArgument will be reported if name doesn't identify a column in the schema, if this change introduces a type incompatibility, or if it conflicts with other additions, renames, or updates.
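
The per-field decision described above can be sketched for primitive types. `Prim`, `Action`, `IsWidening`, and `PlanField` are hypothetical; only the int to long and float to double promotions are modeled, inferred from the narrowing examples in the text (long to int, double to float), and the real set of supported promotions may be larger.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical primitive type tags (not the iceberg-cpp type classes).
enum class Prim { kInt, kLong, kFloat, kDouble, kString };

// What the union would do for one field of the incoming schema.
enum class Action { kAdd, kWiden, kKeep, kError };

bool IsWidening(Prim from, Prim to) {
  return (from == Prim::kInt && to == Prim::kLong) ||
         (from == Prim::kFloat && to == Prim::kDouble);
}

Action PlanField(const std::map<std::string, Prim>& existing,
                 const std::string& name, Prim incoming) {
  const auto it = existing.find(name);
  if (it == existing.end()) return Action::kAdd;            // new field: add it
  if (it->second == incoming) return Action::kKeep;         // same type: nothing to do
  if (IsWidening(it->second, incoming)) return Action::kWiden;
  if (IsWidening(incoming, it->second)) return Action::kKeep;  // incoming is narrower: ignored
  return Action::kError;                                    // unrelated types: incompatible
}

// Usage: one field per outcome.
void Example() {
  const std::map<std::string, Prim> existing = {{"id", Prim::kInt},
                                                {"score", Prim::kDouble}};
  assert(PlanField(existing, "id", Prim::kLong) == Action::kWiden);
  assert(PlanField(existing, "score", Prim::kFloat) == Action::kKeep);
  assert(PlanField(existing, "name", Prim::kString) == Action::kAdd);
  assert(PlanField(existing, "id", Prim::kString) == Action::kError);
}
```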

◆ UpdateColumn()

UpdateSchema & iceberg::UpdateSchema::UpdateColumn	(	std::string_view	name,
		std::shared_ptr< PrimitiveType >	new_type
	)

Update a column in the schema to a new primitive type.

The name is used to find the column to update using Schema::FindFieldByName().

Only updates that widen types are allowed.

Columns may be updated and renamed in the same schema update.

Parameters

name	Name of the column to update.
new_type	Replacement type for the column (must be primitive).

Returns: Reference to this for method chaining.

Note: InvalidArgument will be reported if name doesn't identify a column in the schema or if this change introduces a type incompatibility or if it conflicts with other additions, renames, or updates.

◆ UpdateColumnDoc()

UpdateSchema & iceberg::UpdateSchema::UpdateColumnDoc	(	std::string_view	name,
		std::string_view	new_doc
	)

Update the documentation string for a column.

The name is used to find the column to update using Schema::FindFieldByName().

Parameters

name	Name of the column to update the documentation string for.
new_doc	Replacement documentation string for the column.

Returns: Reference to this for method chaining.

Note: InvalidArgument will be reported if name doesn't identify a column in the schema or if the column will be deleted.

The documentation for this class was generated from the following files:

iceberg/update/update_schema.h
iceberg/update/update_schema.cc
