Class BaseRowDelta
- All Implemented Interfaces:
PendingUpdate<Snapshot>
,RowDelta
,SnapshotUpdate<RowDelta>
-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionprotected void
Add a data file to the new snapshot.protected void
add
(DeleteFile file) Add a delete file to the new snapshot.protected void
add
(DeleteFile file, long dataSequenceNumber) Add a delete file to the new snapshot.protected void
add
(ManifestFile manifest) Add all files in a manifest to the new snapshot.addDeletes
(DeleteFile deletes) Add aDeleteFile
to the table.protected org.apache.iceberg.DeleteFileIndex
addedDeleteFiles
(TableMetadata base, Long startingSnapshotId, Expression dataFilter, PartitionSet partitionSet, Snapshot parent) Returns matching delete files have been added to the table since a starting snapshot.Add aDataFile
to the table.protected boolean
protected boolean
apply()
Apply the pending changes and return the uncommitted changes for validation.apply
(TableMetadata base, Snapshot snapshot) Apply the update's changes to the given metadata and snapshot.protected boolean
caseSensitive
(boolean isCaseSensitive) protected void
cleanAll()
protected void
cleanUncommitted
(Set<ManifestFile> committed) Clean up any uncommitted manifests that were created.protected boolean
void
commit()
Apply the pending changes and commit.protected CommitMetrics
conflictDetectionFilter
(Expression newConflictDetectionFilter) Sets a conflict detection filter used to validate concurrently added data and delete files.protected TableMetadata
current()
protected PartitionSpec
dataSpec()
protected void
delete
(CharSequence path) Add a specific data path to be deleted in the new snapshot.protected void
Add a specific data file to be deleted in the new snapshot.protected void
delete
(DeleteFile file) Add a specific delete file to be deleted in the new snapshot.protected void
deleteByRowFilter
(Expression expr) Add a filter to match files to delete.protected void
deleteFile
(String path) protected boolean
protected boolean
deleteWith
(Consumer<String> deleteCallback) Set a callback to delete files instead of the table's default.protected void
dropPartition
(int specId, StructLike partition) Add a partition tuple to drop from the table during the delete phase.protected void
protected void
protected boolean
protected OutputFile
protected ManifestReader
<DeleteFile> newDeleteManifestReader
(ManifestFile manifest) protected ManifestWriter
<DeleteFile> protected EncryptedOutputFile
protected ManifestReader
<DataFile> newManifestReader
(ManifestFile manifest) protected ManifestWriter
<DataFile> protected RollingManifestWriter
<DeleteFile> protected RollingManifestWriter
<DataFile> protected String
A string that describes the action that produced the new snapshot.protected TableOperations
ops()
protected TableMetadata
refresh()
removeDeletes
(DeleteFile deletes) Removes a rewrittenDeleteFile
from the table.removeRows
(DataFile file) Remove aDataFile
from the table.protected RowDelta
reportWith
(MetricsReporter newReporter) protected Expression
scanManifestsWith
(ExecutorService executorService) Use a particular executor to scan manifests.protected BaseRowDelta
self()
Set a summary property in the snapshot produced by this update.protected void
setNewDataFilesDataSequenceNumber
(long sequenceNumber) protected long
Called to stage a snapshot in table metadata, but not update the current snapshot id.summary()
protected String
protected void
targetBranch
(String branch) A setter for the target branch on which snapshot producer operation should be performedPerform operations on a particular branchGenerates update event to notify about metadata changesprotected void
validate
(TableMetadata base, Snapshot parent) Validate the current metadata.protected void
validateAddedDataFiles
(TableMetadata base, Long startingSnapshotId, Expression conflictDetectionFilter, Snapshot parent) Validates that no files matching a filter have been added to the table since a starting snapshot.protected void
validateAddedDataFiles
(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent) Validates that no files matching given partitions have been added to the table since a starting snapshot.protected void
validateAddedDVs
(TableMetadata base, Long startingSnapshotId, Expression conflictDetectionFilter, Snapshot parent) validateDataFilesExist
(Iterable<? extends CharSequence> referencedFiles) Add data file paths that must not be removed by conflicting commits for this RowDelta to succeed.protected void
validateDataFilesExist
(TableMetadata base, Long startingSnapshotId, CharSequenceSet requiredDataFiles, boolean skipDeletes, Expression conflictDetectionFilter, Snapshot parent) protected void
validateDeletedDataFiles
(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Snapshot parent) Validates that no files matching a filter have been deleted from the table since a starting snapshot.protected void
validateDeletedDataFiles
(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent) Validates that no files matching a filter have been deleted from the table since a starting snapshot.Enable validation that referenced data files passed toRowDelta.validateDataFilesExist(Iterable)
have not been removed by a delete operation.validateFromSnapshot
(long snapshotId) Set the snapshot ID used in any reads for this operation.protected void
Enables validation that data files added concurrently do not conflict with this commit's operation.Enables validation that delete files added concurrently do not conflict with this commit's operation.protected void
validateNoNewDeleteFiles
(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Snapshot parent) Validates that no delete files matching a filter have been added to the table since a starting snapshot.protected void
validateNoNewDeleteFiles
(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent) Validates that no delete files matching a partition set have been added to the table since a starting snapshot.protected void
validateNoNewDeletesForDataFiles
(TableMetadata base, Long startingSnapshotId, Iterable<DataFile> dataFiles, Snapshot parent) Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.protected void
validateNoNewDeletesForDataFiles
(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Iterable<DataFile> dataFiles, Snapshot parent) Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.protected ExecutorService
protected List
<ManifestFile> writeDataManifests
(Collection<DataFile> files, Long dataSeq, PartitionSpec spec) protected List
<ManifestFile> writeDataManifests
(Collection<DataFile> files, PartitionSpec spec) protected List
<ManifestFile> writeDeleteManifests
(Collection<DeleteFile> files, PartitionSpec spec) Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.iceberg.PendingUpdate
apply, commit, updateEvent
Methods inherited from interface org.apache.iceberg.RowDelta
caseSensitive
Methods inherited from interface org.apache.iceberg.SnapshotUpdate
deleteWith, scanManifestsWith, set, stageOnly
-
Constructor Details
-
BaseRowDelta
-
-
Method Details
-
self
-
operation
A string that describes the action that produced the new snapshot.- Returns:
- a string operation
-
addRows
Description copied from interface:RowDelta
Add aDataFile
to the table. -
addDeletes
Description copied from interface:RowDelta
Add aDeleteFile
to the table.- Specified by:
addDeletes
in interfaceRowDelta
- Parameters:
deletes
- a delete file of rows to delete- Returns:
- this for method chaining
-
removeRows
Description copied from interface:RowDelta
Remove aDataFile
from the table.- Specified by:
removeRows
in interfaceRowDelta
- Parameters:
file
- a data file- Returns:
- this for method chaining
-
removeDeletes
Description copied from interface:RowDelta
Removes a rewrittenDeleteFile
from the table.- Specified by:
removeDeletes
in interfaceRowDelta
- Parameters:
deletes
- a delete file that can be removed from the table- Returns:
- this for method chaining
-
validateFromSnapshot
Description copied from interface:RowDelta
Set the snapshot ID used in any reads for this operation.Validations will check changes after this snapshot ID. If the from snapshot is not set, all ancestor snapshots through the table's initial snapshot are validated.
- Specified by:
validateFromSnapshot
in interfaceRowDelta
- Parameters:
snapshotId
- a snapshot ID- Returns:
- this for method chaining
-
validateDeletedFiles
Description copied from interface:RowDelta
Enable validation that referenced data files passed toRowDelta.validateDataFilesExist(Iterable)
have not been removed by a delete operation.If a data file has a row deleted using a position delete file, rewriting or overwriting the data file concurrently would un-delete the row. Deleting the data file is normally allowed, but a delete may be part of a transaction that reads and re-appends a row. This method is used to validate deletes for the transaction case.
- Specified by:
validateDeletedFiles
in interfaceRowDelta
- Returns:
- this for method chaining
-
validateDataFilesExist
Description copied from interface:RowDelta
Add data file paths that must not be removed by conflicting commits for this RowDelta to succeed.If any path has been removed by a conflicting commit in the table since the snapshot passed to
RowDelta.validateFromSnapshot(long)
, the operation will fail with aValidationException
.By default, this validation checks only rewrite and overwrite commits. To apply validation to delete commits, call
RowDelta.validateDeletedFiles()
.- Specified by:
validateDataFilesExist
in interfaceRowDelta
- Parameters:
referencedFiles
- file paths that are referenced by a position delete file- Returns:
- this for method chaining
-
conflictDetectionFilter
Description copied from interface:RowDelta
Sets a conflict detection filter used to validate concurrently added data and delete files.If not called, a true literal will be used as the conflict detection filter.
- Specified by:
conflictDetectionFilter
in interfaceRowDelta
- Parameters:
newConflictDetectionFilter
- an expression on rows in the table- Returns:
- this for method chaining
-
validateNoConflictingDataFiles
Description copied from interface:RowDelta
Enables validation that data files added concurrently do not conflict with this commit's operation.This method should be called when the table is queried to determine which files to delete/append. If a concurrent operation commits a new file after the data was read and that file might contain rows matching the specified conflict detection filter, this operation will detect this during retries and fail.
Calling this method is required to maintain serializable isolation for update/delete operations. Otherwise, the isolation level will be snapshot isolation.
Validation uses the conflict detection filter passed to
RowDelta.conflictDetectionFilter(Expression)
and applies to operations that happened after the snapshot passed toRowDelta.validateFromSnapshot(long)
.- Specified by:
validateNoConflictingDataFiles
in interfaceRowDelta
- Returns:
- this for method chaining
-
validateNoConflictingDeleteFiles
Description copied from interface:RowDelta
Enables validation that delete files added concurrently do not conflict with this commit's operation.This method must be called when the table is queried to produce a row delta for UPDATE and MERGE operations independently of the isolation level. Calling this method isn't required for DELETE operations as it is OK to delete a record that is also deleted concurrently.
Validation uses the conflict detection filter passed to
RowDelta.conflictDetectionFilter(Expression)
and applies to operations that happened after the snapshot passed toRowDelta.validateFromSnapshot(long)
.- Specified by:
validateNoConflictingDeleteFiles
in interfaceRowDelta
- Returns:
- this for method chaining
-
toBranch
Description copied from interface:SnapshotUpdate
Perform operations on a particular branch- Specified by:
toBranch
in interfaceSnapshotUpdate<RowDelta>
- Parameters:
branch
- which is name of SnapshotRef of type branch.
-
validate
Validate the current metadata.Child operations can override this to add custom validation.
- Parameters:
base
- current table metadata to validateparent
- ending snapshot on the lineage which is being validated
-
set
Description copied from interface:SnapshotUpdate
Set a summary property in the snapshot produced by this update.- Parameters:
property
- a String property namevalue
- a String property value- Returns:
- this for method chaining
-
caseSensitive
-
isCaseSensitive
protected boolean isCaseSensitive() -
dataSpec
-
rowFilter
-
addedDataFiles
-
failAnyDelete
protected void failAnyDelete() -
failMissingDeletePaths
protected void failMissingDeletePaths() -
deleteByRowFilter
Add a filter to match files to delete. A file will be deleted if all of the rows it contains match this or any other filter passed to this method.- Parameters:
expr
- an expression to match rows.
-
dropPartition
Add a partition tuple to drop from the table during the delete phase. -
delete
Add a specific data file to be deleted in the new snapshot. -
delete
Add a specific delete file to be deleted in the new snapshot. -
delete
Add a specific data path to be deleted in the new snapshot. -
deletesDataFiles
protected boolean deletesDataFiles() -
deletesDeleteFiles
protected boolean deletesDeleteFiles() -
addsDataFiles
protected boolean addsDataFiles() -
addsDeleteFiles
protected boolean addsDeleteFiles() -
add
Add a data file to the new snapshot. -
add
Add a delete file to the new snapshot. -
add
Add a delete file to the new snapshot. -
validateNewDeleteFile
-
add
Add all files in a manifest to the new snapshot. -
validateAddedDataFiles
protected void validateAddedDataFiles(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent) Validates that no files matching given partitions have been added to the table since a starting snapshot.- Parameters:
base
- table metadata to validatestartingSnapshotId
- id of the snapshot current at the start of the operationpartitionSet
- a set of partitions to filter new conflicting data filesparent
- ending snapshot on the lineage being validated
-
validateAddedDataFiles
protected void validateAddedDataFiles(TableMetadata base, Long startingSnapshotId, Expression conflictDetectionFilter, Snapshot parent) Validates that no files matching a filter have been added to the table since a starting snapshot.- Parameters:
base
- table metadata to validatestartingSnapshotId
- id of the snapshot current at the start of the operationconflictDetectionFilter
- an expression used to find new conflicting data files
-
validateNoNewDeletesForDataFiles
protected void validateNoNewDeletesForDataFiles(TableMetadata base, Long startingSnapshotId, Iterable<DataFile> dataFiles, Snapshot parent) Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.- Parameters:
base
- table metadata to validatestartingSnapshotId
- id of the snapshot current at the start of the operationdataFiles
- data files to validate have no new row deletesparent
- ending snapshot on the branch being validated
-
validateNoNewDeletesForDataFiles
protected void validateNoNewDeletesForDataFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Iterable<DataFile> dataFiles, Snapshot parent) Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.- Parameters:
base
- table metadata to validatestartingSnapshotId
- id of the snapshot current at the start of the operationdataFilter
- a data filterdataFiles
- data files to validate have no new row deletesparent
- ending snapshot on the branch being validated
-
validateNoNewDeleteFiles
protected void validateNoNewDeleteFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Snapshot parent) Validates that no delete files matching a filter have been added to the table since a starting snapshot.- Parameters:
base
- table metadata to validatestartingSnapshotId
- id of the snapshot current at the start of the operationdataFilter
- an expression used to find new conflicting delete filesparent
- ending snapshot on the branch being validated
-
validateNoNewDeleteFiles
protected void validateNoNewDeleteFiles(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent) Validates that no delete files matching a partition set have been added to the table since a starting snapshot.- Parameters:
base
- table metadata to validatestartingSnapshotId
- id of the snapshot current at the start of the operationpartitionSet
- a partition set used to find new conflicting delete filesparent
- ending snapshot on the branch being validated
-
addedDeleteFiles
protected org.apache.iceberg.DeleteFileIndex addedDeleteFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, PartitionSet partitionSet, Snapshot parent) Returns matching delete files have been added to the table since a starting snapshot.- Parameters:
base
- table metadata to validatestartingSnapshotId
- id of the snapshot current at the start of the operationdataFilter
- an expression used to find delete filespartitionSet
- a partition set used to find delete filesparent
- parent snapshot of the branch
-
validateDeletedDataFiles
protected void validateDeletedDataFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Snapshot parent) Validates that no files matching a filter have been deleted from the table since a starting snapshot.- Parameters:
base
- table metadata to validatestartingSnapshotId
- id of the snapshot current at the start of the operationdataFilter
- an expression used to find deleted data filesparent
- ending snapshot on the branch being validated
-
validateDeletedDataFiles
protected void validateDeletedDataFiles(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent) Validates that no files matching a filter have been deleted from the table since a starting snapshot.- Parameters:
base
- table metadata to validatestartingSnapshotId
- id of the snapshot current at the start of the operationpartitionSet
- a partition set used to find deleted data filesparent
- ending snapshot on the branch being validated
-
setNewDataFilesDataSequenceNumber
protected void setNewDataFilesDataSequenceNumber(long sequenceNumber) -
validateDataFilesExist
protected void validateDataFilesExist(TableMetadata base, Long startingSnapshotId, CharSequenceSet requiredDataFiles, boolean skipDeletes, Expression conflictDetectionFilter, Snapshot parent) -
validateAddedDVs
protected void validateAddedDVs(TableMetadata base, Long startingSnapshotId, Expression conflictDetectionFilter, Snapshot parent) -
summary
-
apply
Apply the update's changes to the given metadata and snapshot. Return the new manifest list.- Parameters:
base
- the base table metadata to apply changes tosnapshot
- snapshot to apply the changes to- Returns:
- a manifest list for the new snapshot.
-
updateEvent
Description copied from interface:PendingUpdate
Generates update event to notify about metadata changes- Returns:
- the generated event
-
cleanUncommitted
Clean up any uncommitted manifests that were created.Manifests may not be committed if apply is called more because a commit conflict has occurred. Implementations may keep around manifests because the same changes will be made by both apply calls. This method instructs the implementation to clean up those manifests and passes the paths of the manifests that were actually committed.
- Parameters:
committed
- a set of manifest paths that were actually committed
-
stageOnly
Description copied from interface:SnapshotUpdate
Called to stage a snapshot in table metadata, but not update the current snapshot id.- Specified by:
stageOnly
in interfaceSnapshotUpdate<ThisT>
- Returns:
- this for method chaining
-
scanManifestsWith
Description copied from interface:SnapshotUpdate
Use a particular executor to scan manifests. The default worker pool will be used by default.- Specified by:
scanManifestsWith
in interfaceSnapshotUpdate<ThisT>
- Parameters:
executorService
- the provided executor- Returns:
- this for method chaining
-
ops
-
commitMetrics
-
reportWith
-
targetBranch
A setter for the target branch on which snapshot producer operation should be performed- Parameters:
branch
- to set as target branch
-
targetBranch
-
workerPool
-
deleteWith
Description copied from interface:SnapshotUpdate
Set a callback to delete files instead of the table's default.- Specified by:
deleteWith
in interfaceSnapshotUpdate<ThisT>
- Parameters:
deleteCallback
- a String consumer used to delete locations.- Returns:
- this for method chaining
-
apply
Description copied from interface:PendingUpdate
Apply the pending changes and return the uncommitted changes for validation.This does not result in a permanent update.
- Specified by:
apply
in interfacePendingUpdate<ThisT>
- Returns:
- the uncommitted changes that would be committed by calling
PendingUpdate.commit()
-
current
-
refresh
-
commit
public void commit()Description copied from interface:PendingUpdate
Apply the pending changes and commit.Changes are committed by calling the underlying table's commit method.
Once the commit is successful, the updated table will be refreshed.
- Specified by:
commit
in interfacePendingUpdate<ThisT>
-
cleanAll
protected void cleanAll() -
deleteFile
-
manifestListPath
-
newManifestOutputFile
-
newManifestWriter
-
newDeleteManifestWriter
-
newRollingManifestWriter
-
newRollingDeleteManifestWriter
-
newManifestReader
-
newDeleteManifestReader
-
snapshotId
protected long snapshotId() -
canInheritSnapshotId
protected boolean canInheritSnapshotId() -
cleanupAfterCommit
protected boolean cleanupAfterCommit() -
writeDataManifests
-
writeDataManifests
protected List<ManifestFile> writeDataManifests(Collection<DataFile> files, Long dataSeq, PartitionSpec spec) -
writeDeleteManifests
-