Package org.apache.iceberg.spark.actions
Class MigrateTableSparkAction
java.lang.Object
org.apache.iceberg.spark.actions.MigrateTableSparkAction
- All Implemented Interfaces:
Action<MigrateTable,,MigrateTable.Result> MigrateTable
Takes a Spark table in the source catalog and attempts to transform it into an Iceberg table in
the same location with the same identifier. Once complete the identifier which previously
referred to a non-Iceberg table will refer to the newly migrated Iceberg table.
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.iceberg.actions.MigrateTable
MigrateTable.Result -
Field Summary
FieldsModifier and TypeFieldDescriptionprotected static final org.apache.iceberg.relocated.com.google.common.base.Joinerprotected static final org.apache.iceberg.relocated.com.google.common.base.Splitterprotected static final Stringprotected static final Stringprotected static final Stringprotected static final Stringprotected static final Stringprotected static final Stringprotected static final Stringprotected static final String -
Method Summary
Modifier and TypeMethodDescriptionprotected org.apache.spark.sql.Dataset<FileInfo>backupTableName(String tableName) Sets a table name for the backup of the original table.protected org.apache.spark.sql.connector.catalog.StagingTableCatalogcheckDestinationCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog) protected org.apache.spark.sql.connector.catalog.TableCatalogcheckSourceCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog) protected org.apache.spark.sql.Dataset<FileInfo>contentFileDS(Table table) protected org.apache.spark.sql.Dataset<FileInfo>contentFileDS(Table table, Set<Long> snapshotIds) protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummarydeleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files) Deletes files and keeps track of how many files were removed for each file type.protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummarydeleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files) protected org.apache.spark.sql.connector.catalog.StagingTableCatalogprotected org.apache.spark.sql.connector.catalog.IdentifierDrops the backup of the original table after a successful migrationprotected voidensureNameMappingPresent(Table table) execute()Executes this action.executeWith(ExecutorService service) Sets the executor service to use for parallel file reading.protected StringgetMetadataLocation(Table table) protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>loadMetadataTable(Table table, MetadataTableType type) protected org.apache.spark.sql.Dataset<FileInfo>manifestDS(Table table) protected org.apache.spark.sql.Dataset<FileInfo>manifestDS(Table table, Set<Long> snapshotIds) protected org.apache.spark.sql.Dataset<FileInfo>manifestListDS(Table table) protected org.apache.spark.sql.Dataset<FileInfo>manifestListDS(Table table, Set<Long> snapshotIds) protected JobGroupInfonewJobGroupInfo(String groupId, String desc) protected TablenewStaticTable(TableMetadata metadata, FileIO io) options()protected org.apache.spark.sql.Dataset<FileInfo>otherMetadataFileDS(Table table) protected MigrateTableSparkActionself()protected voidsetProperties(Map<String, String> properties) protected voidsetProperty(String key, String value) protected org.apache.spark.sql.connector.catalog.TableCatalogprotected org.apache.spark.sql.connector.catalog.Identifierprotected Stringprotected org.apache.spark.sql.SparkSessionspark()protected org.apache.spark.api.java.JavaSparkContextprotected StagedSparkTableprotected org.apache.spark.sql.Dataset<FileInfo>statisticsFileDS(Table table, Set<Long> snapshotIds) tableProperties(Map<String, String> properties) Sets table properties in the newly created Iceberg table.tableProperty(String property, String value) Sets a table property in the newly created Iceberg table.protected org.apache.spark.sql.catalyst.catalog.CatalogTableprotected <T> TwithJobGroupInfo(JobGroupInfo info, Supplier<T> supplier)
-
Field Details
-
LOCATION
- See Also:
-
ICEBERG_METADATA_FOLDER
- See Also:
-
EXCLUDED_PROPERTIES
-
MANIFEST
- See Also:
-
MANIFEST_LIST
- See Also:
-
STATISTICS_FILES
- See Also:
-
OTHERS
- See Also:
-
FILE_PATH
- See Also:
-
LAST_MODIFIED
- See Also:
-
COMMA_SPLITTER
protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER -
COMMA_JOINER
protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
-
-
Method Details
-
self
-
destCatalog
protected org.apache.spark.sql.connector.catalog.StagingTableCatalog destCatalog() -
destTableIdent
protected org.apache.spark.sql.connector.catalog.Identifier destTableIdent() -
tableProperties
Description copied from interface:MigrateTableSets table properties in the newly created Iceberg table. Any properties with the same key name will be overwritten.- Specified by:
tablePropertiesin interfaceMigrateTable- Parameters:
properties- a map of properties to set- Returns:
- this for method chaining
-
tableProperty
Description copied from interface:MigrateTableSets a table property in the newly created Iceberg table. Any properties with the same key will be overwritten.- Specified by:
tablePropertyin interfaceMigrateTable- Parameters:
property- a table property namevalue- a table property value- Returns:
- this for method chaining
-
dropBackup
Description copied from interface:MigrateTableDrops the backup of the original table after a successful migration- Specified by:
dropBackupin interfaceMigrateTable- Returns:
- this for method chaining
-
backupTableName
Description copied from interface:MigrateTableSets a table name for the backup of the original table.- Specified by:
backupTableNamein interfaceMigrateTable- Parameters:
tableName- the table name for backup- Returns:
- this for method chaining
-
executeWith
Description copied from interface:MigrateTableSets the executor service to use for parallel file reading. The default is not using executor service.- Specified by:
executeWithin interfaceMigrateTable- Parameters:
service- executor service- Returns:
- this for method chaining
-
execute
Description copied from interface:ActionExecutes this action.- Specified by:
executein interfaceAction<MigrateTable,MigrateTable.Result> - Returns:
- the result of this action
-
destTableProps
-
checkSourceCatalog
protected org.apache.spark.sql.connector.catalog.TableCatalog checkSourceCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog) -
sourceTableLocation
-
v1SourceTable
protected org.apache.spark.sql.catalyst.catalog.CatalogTable v1SourceTable() -
sourceCatalog
protected org.apache.spark.sql.connector.catalog.TableCatalog sourceCatalog() -
sourceTableIdent
protected org.apache.spark.sql.connector.catalog.Identifier sourceTableIdent() -
setProperties
-
setProperty
-
additionalProperties
-
checkDestinationCatalog
protected org.apache.spark.sql.connector.catalog.StagingTableCatalog checkDestinationCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog) -
stageDestTable
-
ensureNameMappingPresent
-
getMetadataLocation
-
spark
protected org.apache.spark.sql.SparkSession spark() -
sparkContext
protected org.apache.spark.api.java.JavaSparkContext sparkContext() -
option
-
options
-
options
-
withJobGroupInfo
-
newJobGroupInfo
-
newStaticTable
-
contentFileDS
-
contentFileDS
-
manifestDS
-
manifestDS
-
manifestListDS
-
manifestListDS
-
statisticsFileDS
-
otherMetadataFileDS
-
allReachableOtherMetadataFileDS
-
loadMetadataTable
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type) -
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files) Deletes files and keeps track of how many files were removed for each file type.- Parameters:
executorService- an executor service to use for parallel deletesdeleteFunc- a delete funcfiles- an iterator of Spark rows of the structure (path: String, type: String)- Returns:
- stats on which files were deleted
-
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files)
-