Interface ReplacePartitions

All Superinterfaces:
PendingUpdate<Snapshot>, SnapshotUpdate<ReplacePartitions>
All Known Implementing Classes:
BaseReplacePartitions

public interface ReplacePartitions extends SnapshotUpdate<ReplacePartitions>
API for overwriting files in a table by partition.

This is provided to implement SQL compatible with Hive table operations but is not recommended. Instead, use the overwrite API to explicitly overwrite data.

The default validation mode is idempotent, meaning the overwrite is correct and should be committed regardless of other concurrent changes to the table. Alternatively, this API can be configured to validate that no new data or deletes have been applied since the snapshot that was current when this operation began. To do so, call validateNoConflictingDeletes() and validateNoConflictingData() to ensure that no conflicting delete files or data files, respectively, have been written since the snapshot passed to validateFromSnapshot(long).

This API accumulates file additions and produces a new Snapshot of the table by replacing all existing files in each partition that receives new data with the newly added files. This operation is used to implement dynamic partition replacement.
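
The replacement semantics described above can be illustrated with a small, self-contained model. This is only a sketch and not Iceberg's implementation: the table is assumed to be a plain map from partition name to file paths, and the class and method names (DynamicPartitionReplaceSketch, replacePartitions) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a "table" is a map from partition name to the file paths in that partition.
// None of these names come from the Iceberg API.
final class DynamicPartitionReplaceSketch {

  // Returns a new table state. Every partition present in `added` keeps only the added
  // files; partitions that receive no new files are carried over unchanged.
  static Map<String, List<String>> replacePartitions(
      Map<String, List<String>> current, Map<String, List<String>> added) {
    Map<String, List<String>> next = new TreeMap<>();
    for (Map.Entry<String, List<String>> entry : current.entrySet()) {
      if (!added.containsKey(entry.getKey())) {
        next.put(entry.getKey(), new ArrayList<>(entry.getValue()));
      }
    }
    for (Map.Entry<String, List<String>> entry : added.entrySet()) {
      next.put(entry.getKey(), new ArrayList<>(entry.getValue()));
    }
    return next;
  }

  public static void main(String[] args) {
    Map<String, List<String>> current = new TreeMap<>();
    current.put("day=1", List.of("a.parquet", "b.parquet"));
    current.put("day=2", List.of("c.parquet"));

    Map<String, List<String>> added = new TreeMap<>();
    added.put("day=2", List.of("d.parquet"));
    added.put("day=3", List.of("e.parquet"));

    System.out.println(replacePartitions(current, added));
  }
}
```

In this model, day=1 is left untouched, day=2 keeps only the newly added file, and day=3 is created.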

When committing, these changes will be applied to the latest table snapshot. Commit conflicts will be resolved by applying the changes to the new latest snapshot and reattempting the commit.
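
The retry behavior described above resembles a generic optimistic compare-and-swap loop. The following is a minimal sketch of that general pattern, assuming the table state can be held in an AtomicReference. It does not reflect Iceberg's actual commit code, and OptimisticCommitSketch, commitWithRetry, and maxAttempts are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Generic optimistic-commit sketch; not Iceberg's implementation.
final class OptimisticCommitSketch {

  // Applies `changes` to the latest state. If another writer swapped the state in the
  // meantime, the changes are re-applied to the new latest state and the commit is retried.
  static <S> S commitWithRetry(
      AtomicReference<S> tableState, UnaryOperator<S> changes, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      S base = tableState.get();
      S updated = changes.apply(base);
      // Succeeds only if the state is still the exact object the changes were based on.
      if (tableState.compareAndSet(base, updated)) {
        return updated;
      }
    }
    throw new IllegalStateException("Commit failed after " + maxAttempts + " attempts");
  }

  public static void main(String[] args) {
    AtomicReference<List<String>> table = new AtomicReference<>(List.of("snapshot-1"));
    List<String> committed = commitWithRetry(table, base -> {
      List<String> next = new ArrayList<>(base);
      next.add("replace-partitions");
      return next;
    }, 3);
    System.out.println(committed);
  }
}
```

The sketch bounds the number of attempts so that a persistently contended commit fails with an exception instead of looping forever.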

  • Method Details

    • addFile

      ReplacePartitions addFile(DataFile file)
      Add a DataFile to the table.
      Parameters:
      file - a data file
      Returns:
      this for method chaining
    • validateAppendOnly

      ReplacePartitions validateAppendOnly()
      Validate that no partitions will be replaced and the operation is append-only.
      Returns:
      this for method chaining
    • validateFromSnapshot

      ReplacePartitions validateFromSnapshot(long snapshotId)
      Set the snapshot ID used in validations for this operation.

      All validations will check changes after this snapshot ID. If this is not called, validation will occur from the beginning of the table's history.

      This method should be called before this operation is committed. Validation will detect the conflict and fail if, after the given snapshot ID, a concurrent operation committed a data or delete file, or removed a data file, that might contain rows matching a partition marked for deletion.

      Parameters:
      snapshotId - a snapshot ID; it should be the ID of the snapshot that was current when this operation started to read the table.
      Returns:
      this for method chaining
    • validateNoConflictingDeletes

      ReplacePartitions validateNoConflictingDeletes()
      Enables validation that concurrent deletes do not conflict with this commit's operation.

      Validating concurrent deletes is required during non-idempotent replace partition operations. This checks whether a concurrent operation deleted data in any of the partitions being overwritten; if so, the replace partition operation must be aborted to avoid undeleting rows that were removed concurrently.

      Returns:
      this for method chaining
    • validateNoConflictingData

      ReplacePartitions validateNoConflictingData()
      Enables validation that data added concurrently does not conflict with this commit's operation.

      Validating concurrent data files is required during non-idempotent replace partition operations. This checks whether a concurrent operation inserted data in any of the partitions being overwritten; if so, the replace partition operation must be aborted to avoid removing rows that were added concurrently.

      Returns:
      this for method chaining
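
The interaction between validateFromSnapshot(long), validateNoConflictingData(), and validateNoConflictingDeletes() can be sketched with a toy history scan. This is an illustrative model under simplifying assumptions, not Iceberg's validation logic: the history is assumed to be a linear, oldest-first list, each commit is assumed to touch a single partition, and Commit, Kind, and hasConflict are hypothetical names.

```java
import java.util.List;
import java.util.Set;

// Toy conflict check over a linear, oldest-first commit history; not Iceberg's implementation.
final class ConflictValidationSketch {

  enum Kind { DATA_ADDED, DELETE_ADDED }

  static final class Commit {
    final long snapshotId;
    final Kind kind;
    final String partition;

    Commit(long snapshotId, Kind kind, String partition) {
      this.snapshotId = snapshotId;
      this.kind = kind;
      this.partition = partition;
    }
  }

  // Returns true when a commit made after `fromSnapshotId` has the given kind and touches
  // one of the replaced partitions. A null `fromSnapshotId` checks the whole history,
  // mirroring the documented behavior when no starting snapshot is set.
  static boolean hasConflict(
      List<Commit> history, Long fromSnapshotId, Set<String> replacedPartitions, Kind kind) {
    boolean afterStart = (fromSnapshotId == null);
    for (Commit commit : history) {
      if (afterStart && commit.kind == kind && replacedPartitions.contains(commit.partition)) {
        return true;
      }
      // Position in the history decides what counts as "after", not the numeric ID value.
      if (fromSnapshotId != null && commit.snapshotId == fromSnapshotId) {
        afterStart = true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<Commit> history = List.of(
        new Commit(10L, Kind.DATA_ADDED, "day=1"),
        new Commit(20L, Kind.DELETE_ADDED, "day=2"),
        new Commit(30L, Kind.DATA_ADDED, "day=2"));
    System.out.println(hasConflict(history, 20L, Set.of("day=2"), Kind.DATA_ADDED));
  }
}
```

In this model, a replace of day=2 that started reading at snapshot 20 conflicts with the data added by snapshot 30, while the delete committed in snapshot 20 itself is not treated as concurrent.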