Class BaseRowDelta

java.lang.Object
org.apache.iceberg.BaseRowDelta
All Implemented Interfaces:
PendingUpdate<Snapshot>, RowDelta, SnapshotUpdate<RowDelta>

public class BaseRowDelta extends Object implements RowDelta
  • Constructor Details

  • Method Details

    • self

      protected BaseRowDelta self()
    • operation

      protected String operation()
      A string that describes the action that produced the new snapshot.
      Returns:
      a string operation
    • addRows

      public RowDelta addRows(DataFile inserts)
      Description copied from interface: RowDelta
      Add a DataFile to the table.
      Specified by:
      addRows in interface RowDelta
      Parameters:
      inserts - a data file of rows to insert
      Returns:
      this for method chaining
    • addDeletes

      public RowDelta addDeletes(DeleteFile deletes)
      Description copied from interface: RowDelta
      Add a DeleteFile to the table.
      Specified by:
      addDeletes in interface RowDelta
      Parameters:
      deletes - a delete file of rows to delete
      Returns:
      this for method chaining
    • removeRows

      public RowDelta removeRows(DataFile file)
      Description copied from interface: RowDelta
      Remove a DataFile from the table.
      Specified by:
      removeRows in interface RowDelta
      Parameters:
      file - a data file
      Returns:
      this for method chaining
    • removeDeletes

      public RowDelta removeDeletes(DeleteFile deletes)
      Description copied from interface: RowDelta
      Removes a rewritten DeleteFile from the table.
      Specified by:
      removeDeletes in interface RowDelta
      Parameters:
      deletes - a delete file that can be removed from the table
      Returns:
      this for method chaining
    • validateFromSnapshot

      public RowDelta validateFromSnapshot(long snapshotId)
      Description copied from interface: RowDelta
      Set the snapshot ID used in any reads for this operation.

      Validations will check changes after this snapshot ID. If the from snapshot is not set, all ancestor snapshots through the table's initial snapshot are validated.

      Specified by:
      validateFromSnapshot in interface RowDelta
      Parameters:
      snapshotId - a snapshot ID
      Returns:
      this for method chaining
    • validateDeletedFiles

      public RowDelta validateDeletedFiles()
      Description copied from interface: RowDelta
      Enable validation that referenced data files passed to RowDelta.validateDataFilesExist(Iterable) have not been removed by a delete operation.

      If a data file has a row deleted using a position delete file, rewriting or overwriting the data file concurrently would un-delete the row. Deleting the data file is normally allowed, but a delete may be part of a transaction that reads and re-appends a row. This method is used to validate deletes for the transaction case.

      Specified by:
      validateDeletedFiles in interface RowDelta
      Returns:
      this for method chaining
    • validateDataFilesExist

      public RowDelta validateDataFilesExist(Iterable<? extends CharSequence> referencedFiles)
      Description copied from interface: RowDelta
      Add data file paths that must not be removed by conflicting commits for this RowDelta to succeed.

      If any path has been removed by a conflicting commit in the table since the snapshot passed to RowDelta.validateFromSnapshot(long), the operation will fail with a ValidationException.

      By default, this validation checks only rewrite and overwrite commits. To apply validation to delete commits, call RowDelta.validateDeletedFiles().

      Specified by:
      validateDataFilesExist in interface RowDelta
      Parameters:
      referencedFiles - file paths that are referenced by a position delete file
      Returns:
      this for method chaining
    • conflictDetectionFilter

      public RowDelta conflictDetectionFilter(Expression newConflictDetectionFilter)
      Description copied from interface: RowDelta
      Sets a conflict detection filter used to validate concurrently added data and delete files.

      If not called, a true literal will be used as the conflict detection filter.

      Specified by:
      conflictDetectionFilter in interface RowDelta
      Parameters:
      newConflictDetectionFilter - an expression on rows in the table
      Returns:
      this for method chaining
    • validateNoConflictingDataFiles

      public RowDelta validateNoConflictingDataFiles()
      Description copied from interface: RowDelta
      Enables validation that data files added concurrently do not conflict with this commit's operation.

      This method should be called when the table is queried to determine which files to delete/append. If a concurrent operation commits a new file after the data was read and that file might contain rows matching the specified conflict detection filter, this operation will detect this during retries and fail.

      Calling this method is required to maintain serializable isolation for update/delete operations. Otherwise, the isolation level will be snapshot isolation.

      Validation uses the conflict detection filter passed to RowDelta.conflictDetectionFilter(Expression) and applies to operations that happened after the snapshot passed to RowDelta.validateFromSnapshot(long).

      Specified by:
      validateNoConflictingDataFiles in interface RowDelta
      Returns:
      this for method chaining
    • validateNoConflictingDeleteFiles

      public RowDelta validateNoConflictingDeleteFiles()
      Description copied from interface: RowDelta
      Enables validation that delete files added concurrently do not conflict with this commit's operation.

      This method must be called when the table is queried to produce a row delta for UPDATE and MERGE operations independently of the isolation level. Calling this method isn't required for DELETE operations as it is OK to delete a record that is also deleted concurrently.

      Validation uses the conflict detection filter passed to RowDelta.conflictDetectionFilter(Expression) and applies to operations that happened after the snapshot passed to RowDelta.validateFromSnapshot(long).

      Specified by:
      validateNoConflictingDeleteFiles in interface RowDelta
      Returns:
      this for method chaining
    • toBranch

      public RowDelta toBranch(String branch)
      Description copied from interface: SnapshotUpdate
      Perform operations on a particular branch
      Specified by:
      toBranch in interface SnapshotUpdate<RowDelta>
      Parameters:
      branch - which is name of SnapshotRef of type branch.
    • validate

      protected void validate(TableMetadata base, Snapshot parent)
      Validate the current metadata.

      Child operations can override this to add custom validation.

      Parameters:
      base - current table metadata to validate
      parent - ending snapshot on the lineage which is being validated
    • set

      public RowDelta set(String property, String value)
      Description copied from interface: SnapshotUpdate
      Set a summary property in the snapshot produced by this update.
      Parameters:
      property - a String property name
      value - a String property value
      Returns:
      this for method chaining
    • caseSensitive

      public RowDelta caseSensitive(boolean isCaseSensitive)
    • isCaseSensitive

      protected boolean isCaseSensitive()
    • dataSpec

      protected PartitionSpec dataSpec()
    • rowFilter

      protected Expression rowFilter()
    • addedDataFiles

      protected List<DataFile> addedDataFiles()
    • failAnyDelete

      protected void failAnyDelete()
    • failMissingDeletePaths

      protected void failMissingDeletePaths()
    • deleteByRowFilter

      protected void deleteByRowFilter(Expression expr)
      Add a filter to match files to delete. A file will be deleted if all of the rows it contains match this or any other filter passed to this method.
      Parameters:
      expr - an expression to match rows.
    • dropPartition

      protected void dropPartition(int specId, StructLike partition)
      Add a partition tuple to drop from the table during the delete phase.
    • delete

      protected void delete(DataFile file)
      Add a specific data file to be deleted in the new snapshot.
    • delete

      protected void delete(DeleteFile file)
      Add a specific delete file to be deleted in the new snapshot.
    • delete

      protected void delete(CharSequence path)
      Add a specific data path to be deleted in the new snapshot.
    • deletesDataFiles

      protected boolean deletesDataFiles()
    • deletesDeleteFiles

      protected boolean deletesDeleteFiles()
    • addsDataFiles

      protected boolean addsDataFiles()
    • addsDeleteFiles

      protected boolean addsDeleteFiles()
    • add

      protected void add(DataFile file)
      Add a data file to the new snapshot.
    • add

      protected void add(DeleteFile file)
      Add a delete file to the new snapshot.
    • add

      protected void add(DeleteFile file, long dataSequenceNumber)
      Add a delete file to the new snapshot.
    • validateNewDeleteFile

      protected void validateNewDeleteFile(DeleteFile file)
    • add

      protected void add(ManifestFile manifest)
      Add all files in a manifest to the new snapshot.
    • validateAddedDataFiles

      protected void validateAddedDataFiles(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent)
      Validates that no files matching given partitions have been added to the table since a starting snapshot.
      Parameters:
      base - table metadata to validate
      startingSnapshotId - id of the snapshot current at the start of the operation
      partitionSet - a set of partitions to filter new conflicting data files
      parent - ending snapshot on the lineage being validated
    • validateAddedDataFiles

      protected void validateAddedDataFiles(TableMetadata base, Long startingSnapshotId, Expression conflictDetectionFilter, Snapshot parent)
      Validates that no files matching a filter have been added to the table since a starting snapshot.
      Parameters:
      base - table metadata to validate
      startingSnapshotId - id of the snapshot current at the start of the operation
      conflictDetectionFilter - an expression used to find new conflicting data files
    • validateNoNewDeletesForDataFiles

      protected void validateNoNewDeletesForDataFiles(TableMetadata base, Long startingSnapshotId, Iterable<DataFile> dataFiles, Snapshot parent)
      Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.
      Parameters:
      base - table metadata to validate
      startingSnapshotId - id of the snapshot current at the start of the operation
      dataFiles - data files to validate have no new row deletes
      parent - ending snapshot on the branch being validated
    • validateNoNewDeletesForDataFiles

      protected void validateNoNewDeletesForDataFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Iterable<DataFile> dataFiles, Snapshot parent)
      Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.
      Parameters:
      base - table metadata to validate
      startingSnapshotId - id of the snapshot current at the start of the operation
      dataFilter - a data filter
      dataFiles - data files to validate have no new row deletes
      parent - ending snapshot on the branch being validated
    • validateNoNewDeleteFiles

      protected void validateNoNewDeleteFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Snapshot parent)
      Validates that no delete files matching a filter have been added to the table since a starting snapshot.
      Parameters:
      base - table metadata to validate
      startingSnapshotId - id of the snapshot current at the start of the operation
      dataFilter - an expression used to find new conflicting delete files
      parent - ending snapshot on the branch being validated
    • validateNoNewDeleteFiles

      protected void validateNoNewDeleteFiles(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent)
      Validates that no delete files matching a partition set have been added to the table since a starting snapshot.
      Parameters:
      base - table metadata to validate
      startingSnapshotId - id of the snapshot current at the start of the operation
      partitionSet - a partition set used to find new conflicting delete files
      parent - ending snapshot on the branch being validated
    • addedDeleteFiles

      protected org.apache.iceberg.DeleteFileIndex addedDeleteFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, PartitionSet partitionSet, Snapshot parent)
      Returns matching delete files have been added to the table since a starting snapshot.
      Parameters:
      base - table metadata to validate
      startingSnapshotId - id of the snapshot current at the start of the operation
      dataFilter - an expression used to find delete files
      partitionSet - a partition set used to find delete files
      parent - parent snapshot of the branch
    • validateDeletedDataFiles

      protected void validateDeletedDataFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Snapshot parent)
      Validates that no files matching a filter have been deleted from the table since a starting snapshot.
      Parameters:
      base - table metadata to validate
      startingSnapshotId - id of the snapshot current at the start of the operation
      dataFilter - an expression used to find deleted data files
      parent - ending snapshot on the branch being validated
    • validateDeletedDataFiles

      protected void validateDeletedDataFiles(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent)
      Validates that no files matching a filter have been deleted from the table since a starting snapshot.
      Parameters:
      base - table metadata to validate
      startingSnapshotId - id of the snapshot current at the start of the operation
      partitionSet - a partition set used to find deleted data files
      parent - ending snapshot on the branch being validated
    • setNewDataFilesDataSequenceNumber

      protected void setNewDataFilesDataSequenceNumber(long sequenceNumber)
    • validateDataFilesExist

      protected void validateDataFilesExist(TableMetadata base, Long startingSnapshotId, CharSequenceSet requiredDataFiles, boolean skipDeletes, Expression conflictDetectionFilter, Snapshot parent)
    • validateAddedDVs

      protected void validateAddedDVs(TableMetadata base, Long startingSnapshotId, Expression conflictDetectionFilter, Snapshot parent)
    • summary

      protected Map<String,String> summary()
    • apply

      public List<ManifestFile> apply(TableMetadata base, Snapshot snapshot)
      Apply the update's changes to the given metadata and snapshot. Return the new manifest list.
      Parameters:
      base - the base table metadata to apply changes to
      snapshot - snapshot to apply the changes to
      Returns:
      a manifest list for the new snapshot.
    • updateEvent

      public Object updateEvent()
      Description copied from interface: PendingUpdate
      Generates update event to notify about metadata changes
      Returns:
      the generated event
    • cleanUncommitted

      protected void cleanUncommitted(Set<ManifestFile> committed)
      Clean up any uncommitted manifests that were created.

      Manifests may not be committed if apply is called more because a commit conflict has occurred. Implementations may keep around manifests because the same changes will be made by both apply calls. This method instructs the implementation to clean up those manifests and passes the paths of the manifests that were actually committed.

      Parameters:
      committed - a set of manifest paths that were actually committed
    • stageOnly

      public RowDelta stageOnly()
      Description copied from interface: SnapshotUpdate
      Called to stage a snapshot in table metadata, but not update the current snapshot id.
      Specified by:
      stageOnly in interface SnapshotUpdate<ThisT>
      Returns:
      this for method chaining
    • scanManifestsWith

      public RowDelta scanManifestsWith(ExecutorService executorService)
      Description copied from interface: SnapshotUpdate
      Use a particular executor to scan manifests. The default worker pool will be used by default.
      Specified by:
      scanManifestsWith in interface SnapshotUpdate<ThisT>
      Parameters:
      executorService - the provided executor
      Returns:
      this for method chaining
    • ops

      protected TableOperations ops()
    • commitMetrics

      protected CommitMetrics commitMetrics()
    • reportWith

      protected RowDelta reportWith(MetricsReporter newReporter)
    • targetBranch

      protected void targetBranch(String branch)
      A setter for the target branch on which snapshot producer operation should be performed
      Parameters:
      branch - to set as target branch
    • targetBranch

      protected String targetBranch()
    • workerPool

      protected ExecutorService workerPool()
    • deleteWith

      public RowDelta deleteWith(Consumer<String> deleteCallback)
      Description copied from interface: SnapshotUpdate
      Set a callback to delete files instead of the table's default.
      Specified by:
      deleteWith in interface SnapshotUpdate<ThisT>
      Parameters:
      deleteCallback - a String consumer used to delete locations.
      Returns:
      this for method chaining
    • apply

      public Snapshot apply()
      Description copied from interface: PendingUpdate
      Apply the pending changes and return the uncommitted changes for validation.

      This does not result in a permanent update.

      Specified by:
      apply in interface PendingUpdate<ThisT>
      Returns:
      the uncommitted changes that would be committed by calling PendingUpdate.commit()
    • current

      protected TableMetadata current()
    • refresh

      protected TableMetadata refresh()
    • commit

      public void commit()
      Description copied from interface: PendingUpdate
      Apply the pending changes and commit.

      Changes are committed by calling the underlying table's commit method.

      Once the commit is successful, the updated table will be refreshed.

      Specified by:
      commit in interface PendingUpdate<ThisT>
    • cleanAll

      protected void cleanAll()
    • deleteFile

      protected void deleteFile(String path)
    • manifestListPath

      protected OutputFile manifestListPath()
    • newManifestOutputFile

      protected EncryptedOutputFile newManifestOutputFile()
    • newManifestWriter

      protected ManifestWriter<DataFile> newManifestWriter(PartitionSpec spec)
    • newDeleteManifestWriter

      protected ManifestWriter<DeleteFile> newDeleteManifestWriter(PartitionSpec spec)
    • newRollingManifestWriter

      protected RollingManifestWriter<DataFile> newRollingManifestWriter(PartitionSpec spec)
    • newRollingDeleteManifestWriter

      protected RollingManifestWriter<DeleteFile> newRollingDeleteManifestWriter(PartitionSpec spec)
    • newManifestReader

      protected ManifestReader<DataFile> newManifestReader(ManifestFile manifest)
    • newDeleteManifestReader

      protected ManifestReader<DeleteFile> newDeleteManifestReader(ManifestFile manifest)
    • snapshotId

      protected long snapshotId()
    • canInheritSnapshotId

      protected boolean canInheritSnapshotId()
    • cleanupAfterCommit

      protected boolean cleanupAfterCommit()
    • writeDataManifests

      protected List<ManifestFile> writeDataManifests(Collection<DataFile> files, PartitionSpec spec)
    • writeDataManifests

      protected List<ManifestFile> writeDataManifests(Collection<DataFile> files, Long dataSeq, PartitionSpec spec)
    • writeDeleteManifests

      protected List<ManifestFile> writeDeleteManifests(Collection<DeleteFile> files, PartitionSpec spec)