Package org.apache.iceberg.spark.actions
Class RewriteTablePathSparkAction
java.lang.Object
org.apache.iceberg.spark.actions.RewriteTablePathSparkAction
All Implemented Interfaces:
  Action<RewriteTablePath, RewriteTablePath.Result>, RewriteTablePath
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.iceberg.actions.RewriteTablePath:
  RewriteTablePath.Result
Field Summary
Fields:
  protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
  protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER
  protected static final String FILE_PATH
  protected static final String LAST_MODIFIED
  protected static final String MANIFEST
  protected static final String MANIFEST_LIST
  protected static final String OTHERS
  protected static final String STATISTICS_FILES
Method Summary
Modifier and TypeMethodDescriptionprotected org.apache.spark.sql.Dataset<FileInfo>protected org.apache.spark.sql.Dataset<FileInfo>contentFileDS(Table table) protected org.apache.spark.sql.Dataset<FileInfo>contentFileDS(Table table, Set<Long> snapshotIds) protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummarydeleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files) Deletes files and keeps track of how many files were removed for each file type.protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummarydeleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files) endVersion(String eVersion) Last metadata version to rewrite, identified by name of a metadata.json file in the table's metadata log.execute()Executes this action.protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>loadMetadataTable(Table table, MetadataTableType type) protected org.apache.spark.sql.Dataset<FileInfo>manifestDS(Table table) protected org.apache.spark.sql.Dataset<FileInfo>manifestDS(Table table, Set<Long> snapshotIds) protected org.apache.spark.sql.Dataset<FileInfo>manifestListDS(Table table) protected org.apache.spark.sql.Dataset<FileInfo>manifestListDS(Table table, Set<Long> snapshotIds) protected JobGroupInfonewJobGroupInfo(String groupId, String desc) protected TablenewStaticTable(String metadataFileLocation, FileIO io) protected TablenewStaticTable(TableMetadata metadata, FileIO io) options()protected org.apache.spark.sql.Dataset<FileInfo>otherMetadataFileDS(Table table) rewriteLocationPrefix(String sPrefix, String tPrefix) Configure a source prefix that will be replaced by the specified target prefix in all pathsprotected RewriteTablePathself()protected org.apache.spark.sql.SparkSessionspark()protected org.apache.spark.api.java.JavaSparkContextstagingLocation(String stagingLocation) Custom staging location.startVersion(String sVersion) First metadata version to rewrite, identified by name of a 
metadata.json file in the table's metadata log.protected org.apache.spark.sql.Dataset<FileInfo>statisticsFileDS(Table table, Set<Long> snapshotIds) protected <T> TwithJobGroupInfo(JobGroupInfo info, Supplier<T> supplier)
-
Field Details
MANIFEST
  protected static final String MANIFEST
MANIFEST_LIST
  protected static final String MANIFEST_LIST
STATISTICS_FILES
  protected static final String STATISTICS_FILES
OTHERS
  protected static final String OTHERS
FILE_PATH
  protected static final String FILE_PATH
LAST_MODIFIED
  protected static final String LAST_MODIFIED
COMMA_SPLITTER
  protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER
COMMA_JOINER
  protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
Method Details
-
self
  protected RewriteTablePath self()
rewriteLocationPrefix
Description copied from interface: RewriteTablePath
Configure a source prefix that will be replaced by the specified target prefix in all paths.
Specified by:
  rewriteLocationPrefix in interface RewriteTablePath
Parameters:
  sPrefix - the source prefix to be replaced
  tPrefix - the target prefix
Returns:
  this for method chaining
-
startVersion
Description copied from interface: RewriteTablePath
First metadata version to rewrite, identified by the name of a metadata.json file in the table's metadata log. It is optional; if provided, this action will only rewrite metadata files added after this version.
Specified by:
  startVersion in interface RewriteTablePath
Parameters:
  sVersion - name of a metadata.json file. For example, "00001-8893aa9e-f92e-4443-80e7-cfa42238a654.metadata.json".
Returns:
  this for method chaining
-
endVersion
Description copied from interface: RewriteTablePath
Last metadata version to rewrite, identified by the name of a metadata.json file in the table's metadata log. It is optional; if provided, this action will only rewrite metadata files added before this file, including the file itself.
Specified by:
  endVersion in interface RewriteTablePath
Parameters:
  eVersion - name of a metadata.json file. For example, "00001-8893aa9e-f92e-4443-80e7-cfa42238a654.metadata.json".
Returns:
  this for method chaining
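Together, startVersion and endVersion bound the range of metadata files that are rewritten, which is useful for incremental copies. A minimal sketch in Java, assuming an Iceberg release where SparkActions exposes rewriteTablePath, a running SparkSession, and an already-loaded Table; the bucket paths and the second metadata.json name are illustrative placeholders:

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.actions.RewriteTablePath;
import org.apache.iceberg.spark.actions.SparkActions;

// Incremental rewrite: only metadata added after the start version,
// up to and including the end version, is rewritten.
RewriteTablePath.Result result =
    SparkActions.get()
        .rewriteTablePath(table)
        .rewriteLocationPrefix("s3://source-bucket/warehouse/db/tbl",
                               "s3://target-bucket/warehouse/db/tbl")
        // Start version name is the example from the parameter docs above;
        // the end version name is a made-up placeholder.
        .startVersion("00001-8893aa9e-f92e-4443-80e7-cfa42238a654.metadata.json")
        .endVersion("00002-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.metadata.json")
        .execute();
```

A first full copy would omit both calls; subsequent runs can pass the previous end version as the new start version.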
-
stagingLocation
Description copied from interface: RewriteTablePath
Custom staging location. It is optional; by default, the staging location is a subdirectory under the table's metadata directory.
Specified by:
  stagingLocation in interface RewriteTablePath
Parameters:
  stagingLocation - the staging location
Returns:
  this for method chaining
-
execute
Description copied from interface: Action
Executes this action.
Specified by:
  execute in interface Action<RewriteTablePath, RewriteTablePath.Result>
Returns:
  the result of this action
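The action is typically obtained from SparkActions and configured through the fluent setters above before calling execute(). A hedged end-to-end sketch, assuming a running SparkSession and a Table loaded from a catalog; the cluster and bucket paths are purely illustrative:

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.actions.RewriteTablePath;
import org.apache.iceberg.spark.actions.SparkActions;

public class RewriteTablePathExample {
  // 'table' must be an Iceberg table loaded from a catalog; catalog wiring is omitted.
  static RewriteTablePath.Result rewrite(Table table) {
    return SparkActions.get()
        .rewriteTablePath(table)
        // Every occurrence of the source prefix in table metadata is
        // replaced with the target prefix.
        .rewriteLocationPrefix("hdfs://old-cluster/warehouse/db/tbl",
                               "s3://new-bucket/warehouse/db/tbl")
        // Optional: where rewritten metadata is staged; defaults to a
        // subdirectory under the table's metadata directory.
        .stagingLocation("hdfs://old-cluster/staging/db/tbl")
        .execute();
  }
}
```

Note that the action rewrites paths in metadata; physically copying the staged files to the target prefix is a separate step performed by the caller.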
-
spark
  protected org.apache.spark.sql.SparkSession spark()
sparkContext
  protected org.apache.spark.api.java.JavaSparkContext sparkContext()
option
-
options
-
options
-
withJobGroupInfo
  protected <T> T withJobGroupInfo(JobGroupInfo info, Supplier<T> supplier)
newJobGroupInfo
  protected JobGroupInfo newJobGroupInfo(String groupId, String desc)
newStaticTable
  protected Table newStaticTable(String metadataFileLocation, FileIO io)
newStaticTable
  protected Table newStaticTable(TableMetadata metadata, FileIO io)
contentFileDS
  protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table)
contentFileDS
  protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table, Set<Long> snapshotIds)
manifestDS
  protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table)
manifestDS
  protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table, Set<Long> snapshotIds)
manifestListDS
  protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table)
manifestListDS
  protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table, Set<Long> snapshotIds)
statisticsFileDS
  protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table, Set<Long> snapshotIds)
otherMetadataFileDS
  protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table)
allReachableOtherMetadataFileDS
  protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS(Table table)
loadMetadataTable
  protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files)
Deletes files and keeps track of how many files were removed for each file type.
Parameters:
  executorService - an executor service to use for parallel deletes
  deleteFunc - a delete function
  files - an iterator of Spark rows of the structure (path: String, type: String)
Returns:
  stats on which files were deleted
-
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files)
-