Class StreamingDelete
- All Implemented Interfaces:
DeleteFiles,PendingUpdate<Snapshot>,SnapshotUpdate<DeleteFiles>
Delete implementation that avoids loading full manifests in memory.
This implementation will attempt to commit 5 times before throwing CommitFailedException.
-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionprotected voidAdd a data file to the new snapshot.protected voidadd(DeleteFile file) Add a delete file to the new snapshot.protected voidadd(DeleteFile file, long dataSequenceNumber) Add a delete file to the new snapshot.protected voidadd(ManifestFile manifest) Add all files in a manifest to the new snapshot.protected org.apache.iceberg.DeleteFileIndexaddedDeleteFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, PartitionSet partitionSet, Snapshot parent) Returns matching delete files have been added to the table since a starting snapshot.protected booleanprotected booleanapply()Apply the pending changes and return the uncommitted changes for validation.apply(TableMetadata base, Snapshot snapshot) Apply the update's changes to the given metadata and snapshot.protected booleancaseSensitive(boolean isCaseSensitive) protected voidcleanAll()protected voidcleanUncommitted(Set<ManifestFile> committed) Clean up any uncommitted manifests that were created.voidcommit()Apply the pending changes and commit.protected CommitMetricsprotected TableMetadatacurrent()protected PartitionSpecdataSpec()protected voiddelete(CharSequence path) Add a specific data path to be deleted in the new snapshot.protected voidAdd a specific data file to be deleted in the new snapshot.protected voiddelete(DeleteFile file) Add a specific delete file to be deleted in the new snapshot.protected voiddeleteByRowFilter(Expression expr) Add a filter to match files to delete.deleteFile(CharSequence path) Delete a file path from the underlying table.protected voiddeleteFile(String path) deleteFile(DataFile file) Delete a file tracked by aDataFilefrom the underlying table.Delete files that match anExpressionon data rows from the table.protected booleanprotected booleandeleteWith(Consumer<String> deleteCallback) Set a callback to delete files instead of the table's default.protected voiddropPartition(int specId, StructLike partition) Add a partition tuple to drop from the table during the delete phase.protected voidprotected voidprotected booleanprotected OutputFileprotected ManifestReader<DeleteFile>newDeleteManifestReader(ManifestFile manifest) protected ManifestWriter<DeleteFile>protected OutputFileDeprecated.protected EncryptedOutputFileprotected ManifestReader<DataFile>newManifestReader(ManifestFile manifest) protected ManifestWriter<DataFile>protected RollingManifestWriter<DeleteFile>protected RollingManifestWriter<DataFile>protected StringA string that describes the action that produced the new snapshot.protected TableMetadatarefresh()protected DeleteFilesreportWith(MetricsReporter newReporter) protected ExpressionscanManifestsWith(ExecutorService executorService) Use a particular executor to scan manifests.protected DeleteFilesself()Set a summary property in the snapshot produced by this update.protected voidsetNewDataFilesDataSequenceNumber(long sequenceNumber) protected longCalled to stage a snapshot in table metadata, but not update the current snapshot id.summary()protected Stringprotected voidtargetBranch(String branch) A setter for the target branch on which snapshot producer operation should be performedPerform operations on a particular branchGenerates update event to notify about metadata changesprotected voidvalidate(TableMetadata base, Snapshot parent) Validate the current metadata.protected voidvalidateAddedDataFiles(TableMetadata base, Long startingSnapshotId, Expression conflictDetectionFilter, Snapshot parent) Validates that no files matching a filter have been added to the table since a starting snapshot.protected voidvalidateAddedDataFiles(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent) Validates that no files matching given partitions have been added to the table since a starting snapshot.protected voidvalidateDataFilesExist(TableMetadata base, Long startingSnapshotId, CharSequenceSet requiredDataFiles, boolean skipDeletes, Expression conflictDetectionFilter, Snapshot parent) protected voidvalidateDeletedDataFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Snapshot parent) Validates that no files matching a filter have been deleted from the table since a starting snapshot.protected voidvalidateDeletedDataFiles(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent) Validates that no files matching a filter have been deleted from the table since a starting snapshot.Enables validation that any files that are part of the deletion still exist when committing the operation.protected voidvalidateNoNewDeleteFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Snapshot parent) Validates that no delete files matching a filter have been added to the table since a starting snapshot.protected voidvalidateNoNewDeleteFiles(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent) Validates that no delete files matching a partition set have been added to the table since a starting snapshot.protected voidvalidateNoNewDeletesForDataFiles(TableMetadata base, Long startingSnapshotId, Iterable<DataFile> dataFiles, Snapshot parent) Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.protected voidvalidateNoNewDeletesForDataFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Iterable<DataFile> dataFiles, Snapshot parent) Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.protected ExecutorServiceMethods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitMethods inherited from interface org.apache.iceberg.DeleteFiles
caseSensitiveMethods inherited from interface org.apache.iceberg.PendingUpdate
apply, commit, updateEventMethods inherited from interface org.apache.iceberg.SnapshotUpdate
deleteWith, scanManifestsWith, set, stageOnly
-
Constructor Details
-
StreamingDelete
-
-
Method Details
-
self
-
operation
A string that describes the action that produced the new snapshot.- Returns:
- a string operation
-
deleteFile
Description copied from interface:DeleteFilesDelete a file path from the underlying table.To remove a file from the table, this path must equal a path in the table's metadata. Paths that are different but equivalent will not be removed. For example, file:/path/file.avro is equivalent to file:///path/file.avro, but would not remove the latter path from the table.
- Specified by:
deleteFilein interfaceDeleteFiles- Parameters:
path- a fully-qualified file path to remove from the table- Returns:
- this for method chaining
-
deleteFile
Description copied from interface:DeleteFilesDelete a file tracked by aDataFilefrom the underlying table.- Specified by:
deleteFilein interfaceDeleteFiles- Parameters:
file- a DataFile to remove from the table- Returns:
- this for method chaining
-
deleteFromRowFilter
Description copied from interface:DeleteFilesDelete files that match anExpressionon data rows from the table.A file is selected to be deleted by the expression if it could contain any rows that match the expression (candidate files are selected using an
inclusive projection). These candidate files are deleted if all of the rows in the file must match the expression (the partition data matches the expression'sProjections.strict(PartitionSpec)strict projection}). This guarantees that files are deleted if and only if all rows in the file must match the expression.Files that may contain some rows that match the expression and some rows that do not will result in a
ValidationException.- Specified by:
deleteFromRowFilterin interfaceDeleteFiles- Parameters:
expr- an expression on rows in the table- Returns:
- this for method chaining
-
validateFilesExist
Description copied from interface:DeleteFilesEnables validation that any files that are part of the deletion still exist when committing the operation.- Specified by:
validateFilesExistin interfaceDeleteFiles- Returns:
- this for method chaining
-
toBranch
Description copied from interface:SnapshotUpdatePerform operations on a particular branch- Specified by:
toBranchin interfaceSnapshotUpdate<DeleteFiles>- Parameters:
branch- which is name of SnapshotRef of type branch.
-
validate
Validate the current metadata.Child operations can override this to add custom validation.
- Parameters:
base- current table metadata to validateparent- ending snapshot on the lineage which is being validated
-
set
Description copied from interface:SnapshotUpdateSet a summary property in the snapshot produced by this update.- Parameters:
property- a String property namevalue- a String property value- Returns:
- this for method chaining
-
caseSensitive
-
isCaseSensitive
protected boolean isCaseSensitive() -
dataSpec
-
rowFilter
-
addedDataFiles
-
failAnyDelete
protected void failAnyDelete() -
failMissingDeletePaths
protected void failMissingDeletePaths() -
deleteByRowFilter
Add a filter to match files to delete. A file will be deleted if all of the rows it contains match this or any other filter passed to this method.- Parameters:
expr- an expression to match rows.
-
dropPartition
Add a partition tuple to drop from the table during the delete phase. -
delete
Add a specific data file to be deleted in the new snapshot. -
delete
Add a specific delete file to be deleted in the new snapshot. -
delete
Add a specific data path to be deleted in the new snapshot. -
deletesDataFiles
protected boolean deletesDataFiles() -
deletesDeleteFiles
protected boolean deletesDeleteFiles() -
addsDataFiles
protected boolean addsDataFiles() -
addsDeleteFiles
protected boolean addsDeleteFiles() -
add
Add a data file to the new snapshot. -
add
Add a delete file to the new snapshot. -
add
Add a delete file to the new snapshot. -
add
Add all files in a manifest to the new snapshot. -
validateAddedDataFiles
protected void validateAddedDataFiles(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent) Validates that no files matching given partitions have been added to the table since a starting snapshot.- Parameters:
base- table metadata to validatestartingSnapshotId- id of the snapshot current at the start of the operationpartitionSet- a set of partitions to filter new conflicting data filesparent- ending snapshot on the lineage being validated
-
validateAddedDataFiles
protected void validateAddedDataFiles(TableMetadata base, Long startingSnapshotId, Expression conflictDetectionFilter, Snapshot parent) Validates that no files matching a filter have been added to the table since a starting snapshot.- Parameters:
base- table metadata to validatestartingSnapshotId- id of the snapshot current at the start of the operationconflictDetectionFilter- an expression used to find new conflicting data files
-
validateNoNewDeletesForDataFiles
protected void validateNoNewDeletesForDataFiles(TableMetadata base, Long startingSnapshotId, Iterable<DataFile> dataFiles, Snapshot parent) Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.- Parameters:
base- table metadata to validatestartingSnapshotId- id of the snapshot current at the start of the operationdataFiles- data files to validate have no new row deletesparent- ending snapshot on the branch being validated
-
validateNoNewDeletesForDataFiles
protected void validateNoNewDeletesForDataFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Iterable<DataFile> dataFiles, Snapshot parent) Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.- Parameters:
base- table metadata to validatestartingSnapshotId- id of the snapshot current at the start of the operationdataFilter- a data filterdataFiles- data files to validate have no new row deletesparent- ending snapshot on the branch being validated
-
validateNoNewDeleteFiles
protected void validateNoNewDeleteFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Snapshot parent) Validates that no delete files matching a filter have been added to the table since a starting snapshot.- Parameters:
base- table metadata to validatestartingSnapshotId- id of the snapshot current at the start of the operationdataFilter- an expression used to find new conflicting delete filesparent- ending snapshot on the branch being validated
-
validateNoNewDeleteFiles
protected void validateNoNewDeleteFiles(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent) Validates that no delete files matching a partition set have been added to the table since a starting snapshot.- Parameters:
base- table metadata to validatestartingSnapshotId- id of the snapshot current at the start of the operationpartitionSet- a partition set used to find new conflicting delete filesparent- ending snapshot on the branch being validated
-
addedDeleteFiles
protected org.apache.iceberg.DeleteFileIndex addedDeleteFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, PartitionSet partitionSet, Snapshot parent) Returns matching delete files have been added to the table since a starting snapshot.- Parameters:
base- table metadata to validatestartingSnapshotId- id of the snapshot current at the start of the operationdataFilter- an expression used to find delete filespartitionSet- a partition set used to find delete filesparent- parent snapshot of the branch
-
validateDeletedDataFiles
protected void validateDeletedDataFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Snapshot parent) Validates that no files matching a filter have been deleted from the table since a starting snapshot.- Parameters:
base- table metadata to validatestartingSnapshotId- id of the snapshot current at the start of the operationdataFilter- an expression used to find deleted data filesparent- ending snapshot on the branch being validated
-
validateDeletedDataFiles
protected void validateDeletedDataFiles(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent) Validates that no files matching a filter have been deleted from the table since a starting snapshot.- Parameters:
base- table metadata to validatestartingSnapshotId- id of the snapshot current at the start of the operationpartitionSet- a partition set used to find deleted data filesparent- ending snapshot on the branch being validated
-
setNewDataFilesDataSequenceNumber
protected void setNewDataFilesDataSequenceNumber(long sequenceNumber) -
validateDataFilesExist
protected void validateDataFilesExist(TableMetadata base, Long startingSnapshotId, CharSequenceSet requiredDataFiles, boolean skipDeletes, Expression conflictDetectionFilter, Snapshot parent) -
summary
-
apply
Apply the update's changes to the given metadata and snapshot. Return the new manifest list.- Parameters:
base- the base table metadata to apply changes tosnapshot- snapshot to apply the changes to- Returns:
- a manifest list for the new snapshot.
-
updateEvent
Description copied from interface:PendingUpdateGenerates update event to notify about metadata changes- Returns:
- the generated event
-
cleanUncommitted
Clean up any uncommitted manifests that were created.Manifests may not be committed if apply is called more because a commit conflict has occurred. Implementations may keep around manifests because the same changes will be made by both apply calls. This method instructs the implementation to clean up those manifests and passes the paths of the manifests that were actually committed.
- Parameters:
committed- a set of manifest paths that were actually committed
-
stageOnly
Description copied from interface:SnapshotUpdateCalled to stage a snapshot in table metadata, but not update the current snapshot id.- Specified by:
stageOnlyin interfaceSnapshotUpdate<ThisT>- Returns:
- this for method chaining
-
scanManifestsWith
Description copied from interface:SnapshotUpdateUse a particular executor to scan manifests. The default worker pool will be used by default.- Specified by:
scanManifestsWithin interfaceSnapshotUpdate<ThisT>- Parameters:
executorService- the provided executor- Returns:
- this for method chaining
-
commitMetrics
-
reportWith
-
targetBranch
A setter for the target branch on which snapshot producer operation should be performed- Parameters:
branch- to set as target branch
-
targetBranch
-
workerPool
-
deleteWith
Description copied from interface:SnapshotUpdateSet a callback to delete files instead of the table's default.- Specified by:
deleteWithin interfaceSnapshotUpdate<ThisT>- Parameters:
deleteCallback- a String consumer used to delete locations.- Returns:
- this for method chaining
-
apply
Description copied from interface:PendingUpdateApply the pending changes and return the uncommitted changes for validation.This does not result in a permanent update.
- Specified by:
applyin interfacePendingUpdate<ThisT>- Returns:
- the uncommitted changes that would be committed by calling
PendingUpdate.commit()
-
current
-
refresh
-
commit
public void commit()Description copied from interface:PendingUpdateApply the pending changes and commit.Changes are committed by calling the underlying table's commit method.
Once the commit is successful, the updated table will be refreshed.
- Specified by:
commitin interfacePendingUpdate<ThisT>
-
cleanAll
protected void cleanAll() -
deleteFile
-
manifestListPath
-
newManifestOutput
Deprecated.will be removed in 1.7.0; UseSnapshotProducer.newManifestOutputFile()instead -
newManifestOutputFile
-
newManifestWriter
-
newDeleteManifestWriter
-
newRollingManifestWriter
-
newRollingDeleteManifestWriter
-
newManifestReader
-
newDeleteManifestReader
-
snapshotId
protected long snapshotId() -
canInheritSnapshotId
protected boolean canInheritSnapshotId()
-
SnapshotProducer.newManifestOutputFile()instead