Package org.apache.iceberg.spark.actions
Class DeleteReachableFilesSparkAction
java.lang.Object
org.apache.iceberg.spark.actions.DeleteReachableFilesSparkAction
- All Implemented Interfaces:
- Action<DeleteReachableFiles,,- DeleteReachableFiles.Result> - DeleteReachableFiles
An implementation of 
DeleteReachableFiles that uses metadata tables in Spark to determine
 which files should be deleted.- 
Nested Class SummaryNested classes/interfaces inherited from interface org.apache.iceberg.actions.DeleteReachableFilesDeleteReachableFiles.Result
- 
Field SummaryFieldsModifier and TypeFieldDescriptionprotected static final org.apache.iceberg.relocated.com.google.common.base.Joinerprotected static final org.apache.iceberg.relocated.com.google.common.base.Splitterprotected static final Stringprotected static final Stringprotected static final Stringprotected static final Stringprotected static final Stringprotected static final Stringstatic final Stringstatic final boolean
- 
Method SummaryModifier and TypeMethodDescriptionprotected org.apache.spark.sql.Dataset<FileInfo>protected org.apache.spark.sql.Dataset<FileInfo>contentFileDS(Table table) protected org.apache.spark.sql.Dataset<FileInfo>contentFileDS(Table table, Set<Long> snapshotIds) protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummarydeleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files) Deletes files and keeps track of how many files were removed for each file type.protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummarydeleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files) deleteWith(Consumer<String> newDeleteFunc) Passes an alternative delete implementation that will be used for files.execute()Executes this action.executeDeleteWith(ExecutorService executorService) Passes an alternative executor service that will be used for files removal.Set theFileIOto be used for files removalprotected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>loadMetadataTable(Table table, MetadataTableType type) protected org.apache.spark.sql.Dataset<FileInfo>manifestDS(Table table) protected org.apache.spark.sql.Dataset<FileInfo>manifestDS(Table table, Set<Long> snapshotIds) protected org.apache.spark.sql.Dataset<FileInfo>manifestListDS(Table table) protected org.apache.spark.sql.Dataset<FileInfo>manifestListDS(Table table, Set<Long> snapshotIds) protected JobGroupInfonewJobGroupInfo(String groupId, String desc) protected TablenewStaticTable(TableMetadata metadata, FileIO io) options()protected org.apache.spark.sql.Dataset<FileInfo>otherMetadataFileDS(Table table) protected DeleteReachableFilesSparkActionself()protected org.apache.spark.sql.SparkSessionspark()protected org.apache.spark.api.java.JavaSparkContextprotected org.apache.spark.sql.Dataset<FileInfo>statisticsFileDS(Table table, Set<Long> snapshotIds) protected <T> TwithJobGroupInfo(JobGroupInfo info, Supplier<T> supplier) 
- 
Field Details- 
STREAM_RESULTS- See Also:
 
- 
STREAM_RESULTS_DEFAULTpublic static final boolean STREAM_RESULTS_DEFAULT- See Also:
 
- 
MANIFEST- See Also:
 
- 
MANIFEST_LIST- See Also:
 
- 
STATISTICS_FILES- See Also:
 
- 
OTHERS- See Also:
 
- 
FILE_PATH- See Also:
 
- 
LAST_MODIFIED- See Also:
 
- 
COMMA_SPLITTERprotected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER
- 
COMMA_JOINERprotected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
 
- 
- 
Method Details- 
self
- 
ioDescription copied from interface:DeleteReachableFilesSet theFileIOto be used for files removal- Specified by:
- ioin interface- DeleteReachableFiles
- Parameters:
- fileIO- FileIO to use for files removal
- Returns:
- this for method chaining
 
- 
deleteWithDescription copied from interface:DeleteReachableFilesPasses an alternative delete implementation that will be used for files.- Specified by:
- deleteWithin interface- DeleteReachableFiles
- Parameters:
- newDeleteFunc- a function that will be called to delete files. The function accepts path to file as an argument.
- Returns:
- this for method chaining
 
- 
executeDeleteWithDescription copied from interface:DeleteReachableFilesPasses an alternative executor service that will be used for files removal. This service will only be used if a custom delete function is provided byDeleteReachableFiles.deleteWith(Consumer)or if the FileIO does notsupport bulk deletes. Otherwise, parallelism should be controlled by the IO specificdeleteFilesmethod.- Specified by:
- executeDeleteWithin interface- DeleteReachableFiles
- Parameters:
- executorService- the service to use
- Returns:
- this for method chaining
 
- 
executeDescription copied from interface:ActionExecutes this action.- Specified by:
- executein interface- Action<DeleteReachableFiles,- DeleteReachableFiles.Result> 
- Returns:
- the result of this action
 
- 
sparkprotected org.apache.spark.sql.SparkSession spark()
- 
sparkContextprotected org.apache.spark.api.java.JavaSparkContext sparkContext()
- 
option
- 
options
- 
options
- 
withJobGroupInfo
- 
newJobGroupInfo
- 
newStaticTable
- 
contentFileDS
- 
contentFileDS
- 
manifestDS
- 
manifestDS
- 
manifestListDS
- 
manifestListDS
- 
statisticsFileDS
- 
otherMetadataFileDS
- 
allReachableOtherMetadataFileDS
- 
loadMetadataTableprotected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type) 
- 
deleteFilesprotected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files) Deletes files and keeps track of how many files were removed for each file type.- Parameters:
- executorService- an executor service to use for parallel deletes
- deleteFunc- a delete func
- files- an iterator of Spark rows of the structure (path: String, type: String)
- Returns:
- stats on which files were deleted
 
- 
deleteFilesprotected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files) 
 
-