Package org.apache.iceberg.spark.actions
Class DeleteReachableFilesSparkAction
java.lang.Object
org.apache.iceberg.spark.actions.DeleteReachableFilesSparkAction
- All Implemented Interfaces:
Action<DeleteReachableFiles,,DeleteReachableFiles.Result> DeleteReachableFiles
An implementation of
DeleteReachableFiles that uses metadata tables in Spark to determine
which files should be deleted.-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.iceberg.actions.DeleteReachableFiles
DeleteReachableFiles.Result -
Field Summary
FieldsModifier and TypeFieldDescriptionprotected static final org.apache.iceberg.relocated.com.google.common.base.Joinerprotected static final org.apache.iceberg.relocated.com.google.common.base.Splitterprotected static final Stringprotected static final Stringprotected static final Stringprotected static final Stringprotected static final Stringprotected static final Stringstatic final Stringstatic final boolean -
Method Summary
Modifier and TypeMethodDescriptionprotected org.apache.spark.sql.Dataset<FileInfo> protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table) protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table, Set<Long> snapshotIds) protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummarydeleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files) Deletes files and keeps track of how many files were removed for each file type.protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummarydeleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files) deleteWith(Consumer<String> newDeleteFunc) Passes an alternative delete implementation that will be used for files.execute()Executes this action.executeDeleteWith(ExecutorService executorService) Passes an alternative executor service that will be used for files removal.Set theFileIOto be used for files removalprotected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type) protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table) protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table, Set<Long> snapshotIds) protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table) protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table, Set<Long> snapshotIds) protected JobGroupInfonewJobGroupInfo(String groupId, String desc) protected TablenewStaticTable(TableMetadata metadata, FileIO io) options()protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table) protected DeleteReachableFilesSparkActionself()protected org.apache.spark.sql.SparkSessionspark()protected org.apache.spark.api.java.JavaSparkContextprotected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table, Set<Long> snapshotIds) protected <T> TwithJobGroupInfo(JobGroupInfo info, Supplier<T> supplier)
-
Field Details
-
STREAM_RESULTS
- See Also:
-
STREAM_RESULTS_DEFAULT
public static final boolean STREAM_RESULTS_DEFAULT- See Also:
-
MANIFEST
- See Also:
-
MANIFEST_LIST
- See Also:
-
STATISTICS_FILES
- See Also:
-
OTHERS
- See Also:
-
FILE_PATH
- See Also:
-
LAST_MODIFIED
- See Also:
-
COMMA_SPLITTER
protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER -
COMMA_JOINER
protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
-
-
Method Details
-
self
-
io
Description copied from interface:DeleteReachableFilesSet theFileIOto be used for files removal- Specified by:
ioin interfaceDeleteReachableFiles- Parameters:
fileIO- FileIO to use for files removal- Returns:
- this for method chaining
-
deleteWith
Description copied from interface:DeleteReachableFilesPasses an alternative delete implementation that will be used for files.- Specified by:
deleteWithin interfaceDeleteReachableFiles- Parameters:
newDeleteFunc- a function that will be called to delete files. The function accepts path to file as an argument.- Returns:
- this for method chaining
-
executeDeleteWith
Description copied from interface:DeleteReachableFilesPasses an alternative executor service that will be used for files removal. This service will only be used if a custom delete function is provided byDeleteReachableFiles.deleteWith(Consumer)or if the FileIO does notsupport bulk deletes. Otherwise, parallelism should be controlled by the IO specificdeleteFilesmethod.- Specified by:
executeDeleteWithin interfaceDeleteReachableFiles- Parameters:
executorService- the service to use- Returns:
- this for method chaining
-
execute
Description copied from interface:ActionExecutes this action.- Specified by:
executein interfaceAction<DeleteReachableFiles,DeleteReachableFiles.Result> - Returns:
- the result of this action
-
spark
protected org.apache.spark.sql.SparkSession spark() -
sparkContext
protected org.apache.spark.api.java.JavaSparkContext sparkContext() -
option
-
options
-
options
-
withJobGroupInfo
-
newJobGroupInfo
-
newStaticTable
-
contentFileDS
-
contentFileDS
-
manifestDS
-
manifestDS
-
manifestListDS
-
manifestListDS
-
statisticsFileDS
-
otherMetadataFileDS
-
allReachableOtherMetadataFileDS
-
loadMetadataTable
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type) -
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files) Deletes files and keeps track of how many files were removed for each file type.- Parameters:
executorService- an executor service to use for parallel deletesdeleteFunc- a delete funcfiles- an iterator of Spark rows of the structure (path: String, type: String)- Returns:
- stats on which files were deleted
-
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files)
-