Class ExpireSnapshotsSparkAction

java.lang.Object
org.apache.iceberg.spark.actions.ExpireSnapshotsSparkAction
All Implemented Interfaces:
Action<ExpireSnapshots,ExpireSnapshots.Result>, ExpireSnapshots

public class ExpireSnapshotsSparkAction extends Object implements ExpireSnapshots
An action that performs the same operation as ExpireSnapshots but uses Spark to determine the delta in files between the pre- and post-expiration table metadata. All restrictions of ExpireSnapshots also apply to this action.

This action first uses ExpireSnapshots to expire snapshots and then uses metadata tables to find files that can be safely deleted. It does so by anti-joining two Datasets: one containing all manifest and content files before the expiration and one containing those after it. Files that appear only in the "before" Dataset are the deletion candidates. The snapshot expiration is fully committed before any deletes are issued.
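
The anti-join step can be illustrated with a minimal plain-Java sketch. It is not the action's implementation: the real action operates on Spark Datasets built from metadata tables, whereas this sketch uses in-memory sets. The class name ExpiredFileDiffSketch, the method antiJoin, and the file paths are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: models the "before minus after" file diff with plain sets.
class ExpiredFileDiffSketch {

  // Left anti-join: keep every path in `before` that has no match in `after`.
  static List<String> antiJoin(Set<String> before, Set<String> after) {
    List<String> deletable = new ArrayList<>();
    for (String path : before) {
      if (!after.contains(path)) {
        deletable.add(path);
      }
    }
    return deletable;
  }

  public static void main(String[] args) {
    // Hypothetical files reachable from table metadata before expiration.
    Set<String> before = new LinkedHashSet<>(Arrays.asList(
        "data/a.parquet", "data/b.parquet", "metadata/m1.avro", "metadata/m2.avro"));

    // Hypothetical files still reachable after the expiration is committed.
    Set<String> after = new LinkedHashSet<>(Arrays.asList(
        "data/b.parquet", "metadata/m2.avro"));

    // Only files no longer referenced by any retained snapshot remain.
    System.out.println(antiJoin(before, after));
    // prints [data/a.parquet, metadata/m1.avro]
  }
}
```

Because the "after" set is computed only once the expiration has been committed, a file still referenced by any retained snapshot appears on both sides and is never returned as a deletion candidate.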

This operation performs a shuffle, so its parallelism can be controlled through 'spark.sql.shuffle.partitions'.

Deletes are still performed locally after retrieving the results from the Spark executors.