Package org.apache.iceberg.spark.actions
Class MigrateTableSparkAction
- java.lang.Object
-
- org.apache.iceberg.spark.actions.MigrateTableSparkAction
-
- All Implemented Interfaces:
Action<MigrateTable,MigrateTable.Result>, MigrateTable
public class MigrateTableSparkAction extends java.lang.Object implements MigrateTable
Takes a Spark table in the source catalog and attempts to transform it into an Iceberg table in the same location with the same identifier. Once complete the identifier which previously referred to a non-Iceberg table will refer to the newly migrated Iceberg table.
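A minimal usage sketch of this action (the table identifier and property below are hypothetical; `SparkActions` is the standard entry point for Spark actions in Iceberg):

```java
import org.apache.iceberg.actions.MigrateTable;
import org.apache.iceberg.spark.actions.SparkActions;

// Migrate the Spark/Hive table "db.logs" in place. After execute(), the
// identifier "db.logs" refers to the newly created Iceberg table, which
// reuses the existing data files at the same location.
MigrateTable.Result result = SparkActions.get()
    .migrateTable("db.logs")                       // hypothetical identifier
    .tableProperty("migrated-by", "spark-action")  // optional property on the new table
    .execute();

System.out.println("Migrated data files: " + result.migratedDataFilesCount());
```

Requires a running `SparkSession` with the Iceberg Spark runtime on the classpath; `migratedDataFilesCount()` comes from `MigrateTable.Result`.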
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.apache.iceberg.actions.MigrateTable
MigrateTable.Result
-
-
Field Summary
Modifier and Type / Field / Description
protected static org.apache.iceberg.relocated.com.google.common.base.Joiner
COMMA_JOINER
protected static org.apache.iceberg.relocated.com.google.common.base.Splitter
COMMA_SPLITTER
protected static java.util.List<java.lang.String>
EXCLUDED_PROPERTIES
protected static java.lang.String
FILE_PATH
protected static java.lang.String
ICEBERG_METADATA_FOLDER
protected static java.lang.String
LAST_MODIFIED
protected static java.lang.String
LOCATION
protected static java.lang.String
MANIFEST
protected static java.lang.String
MANIFEST_LIST
protected static java.lang.String
OTHERS
protected static java.lang.String
STATISTICS_FILES
-
Method Summary
Modifier and Type / Method / Description
protected java.util.Map<java.lang.String,java.lang.String>
additionalProperties()
protected org.apache.spark.sql.Dataset<FileInfo>
allReachableOtherMetadataFileDS(Table table)
MigrateTableSparkAction
backupTableName(java.lang.String tableName)
Sets a table name for the backup of the original table.
protected org.apache.spark.sql.connector.catalog.StagingTableCatalog
checkDestinationCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
protected org.apache.spark.sql.connector.catalog.TableCatalog
checkSourceCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
protected org.apache.spark.sql.Dataset<FileInfo>
contentFileDS(Table table)
protected org.apache.spark.sql.Dataset<FileInfo>
contentFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary
deleteFiles(java.util.concurrent.ExecutorService executorService, java.util.function.Consumer<java.lang.String> deleteFunc, java.util.Iterator<FileInfo> files)
Deletes files and keeps track of how many files were removed for each file type.
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary
deleteFiles(SupportsBulkOperations io, java.util.Iterator<FileInfo> files)
protected org.apache.spark.sql.connector.catalog.StagingTableCatalog
destCatalog()
protected org.apache.spark.sql.connector.catalog.Identifier
destTableIdent()
protected java.util.Map<java.lang.String,java.lang.String>
destTableProps()
MigrateTableSparkAction
dropBackup()
Drops the backup of the original table after a successful migration.
protected void
ensureNameMappingPresent(Table table)
MigrateTable.Result
execute()
Executes this action.
MigrateTableSparkAction
executeWith(java.util.concurrent.ExecutorService service)
Sets the executor service to use for parallel file reading.
protected java.lang.String
getMetadataLocation(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
loadMetadataTable(Table table, MetadataTableType type)
protected org.apache.spark.sql.Dataset<FileInfo>
manifestDS(Table table)
protected org.apache.spark.sql.Dataset<FileInfo>
manifestDS(Table table, java.util.Set<java.lang.Long> snapshotIds)
protected org.apache.spark.sql.Dataset<FileInfo>
manifestListDS(Table table)
protected org.apache.spark.sql.Dataset<FileInfo>
manifestListDS(Table table, java.util.Set<java.lang.Long> snapshotIds)
protected JobGroupInfo
newJobGroupInfo(java.lang.String groupId, java.lang.String desc)
protected Table
newStaticTable(TableMetadata metadata, FileIO io)
ThisT
option(java.lang.String name, java.lang.String value)
protected java.util.Map<java.lang.String,java.lang.String>
options()
ThisT
options(java.util.Map<java.lang.String,java.lang.String> newOptions)
protected org.apache.spark.sql.Dataset<FileInfo>
otherMetadataFileDS(Table table)
protected MigrateTableSparkAction
self()
protected void
setProperties(java.util.Map<java.lang.String,java.lang.String> properties)
protected void
setProperty(java.lang.String key, java.lang.String value)
protected org.apache.spark.sql.connector.catalog.TableCatalog
sourceCatalog()
protected org.apache.spark.sql.connector.catalog.Identifier
sourceTableIdent()
protected java.lang.String
sourceTableLocation()
protected org.apache.spark.sql.SparkSession
spark()
protected org.apache.spark.api.java.JavaSparkContext
sparkContext()
protected StagedSparkTable
stageDestTable()
protected org.apache.spark.sql.Dataset<FileInfo>
statisticsFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)
MigrateTableSparkAction
tableProperties(java.util.Map<java.lang.String,java.lang.String> properties)
Sets table properties in the newly created Iceberg table.
MigrateTableSparkAction
tableProperty(java.lang.String property, java.lang.String value)
Sets a table property in the newly created Iceberg table.
protected org.apache.spark.sql.catalyst.catalog.CatalogTable
v1SourceTable()
protected <T> T
withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)
-
-
-
Field Detail
-
LOCATION
protected static final java.lang.String LOCATION
- See Also:
- Constant Field Values
-
ICEBERG_METADATA_FOLDER
protected static final java.lang.String ICEBERG_METADATA_FOLDER
- See Also:
- Constant Field Values
-
EXCLUDED_PROPERTIES
protected static final java.util.List<java.lang.String> EXCLUDED_PROPERTIES
-
MANIFEST
protected static final java.lang.String MANIFEST
- See Also:
- Constant Field Values
-
MANIFEST_LIST
protected static final java.lang.String MANIFEST_LIST
- See Also:
- Constant Field Values
-
STATISTICS_FILES
protected static final java.lang.String STATISTICS_FILES
- See Also:
- Constant Field Values
-
OTHERS
protected static final java.lang.String OTHERS
- See Also:
- Constant Field Values
-
FILE_PATH
protected static final java.lang.String FILE_PATH
- See Also:
- Constant Field Values
-
LAST_MODIFIED
protected static final java.lang.String LAST_MODIFIED
- See Also:
- Constant Field Values
-
COMMA_SPLITTER
protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER
-
COMMA_JOINER
protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
-
-
Method Detail
-
self
protected MigrateTableSparkAction self()
-
destCatalog
protected org.apache.spark.sql.connector.catalog.StagingTableCatalog destCatalog()
-
destTableIdent
protected org.apache.spark.sql.connector.catalog.Identifier destTableIdent()
-
tableProperties
public MigrateTableSparkAction tableProperties(java.util.Map<java.lang.String,java.lang.String> properties)
Description copied from interface: MigrateTable
Sets table properties in the newly created Iceberg table. Any properties with the same key name will be overwritten.
- Specified by:
tableProperties in interface MigrateTable
- Parameters:
properties - a map of properties to set
- Returns:
- this for method chaining
-
tableProperty
public MigrateTableSparkAction tableProperty(java.lang.String property, java.lang.String value)
Description copied from interface: MigrateTable
Sets a table property in the newly created Iceberg table. Any properties with the same key will be overwritten.
- Specified by:
tableProperty in interface MigrateTable
- Parameters:
property - a table property name
value - a table property value
- Returns:
- this for method chaining
-
dropBackup
public MigrateTableSparkAction dropBackup()
Description copied from interface: MigrateTable
Drops the backup of the original table after a successful migration.
- Specified by:
dropBackup in interface MigrateTable
- Returns:
- this for method chaining
-
backupTableName
public MigrateTableSparkAction backupTableName(java.lang.String tableName)
Description copied from interface: MigrateTable
Sets a table name for the backup of the original table.
- Specified by:
backupTableName in interface MigrateTable
- Parameters:
tableName - the table name for the backup
- Returns:
- this for method chaining
-
executeWith
public MigrateTableSparkAction executeWith(java.util.concurrent.ExecutorService service)
Description copied from interface: MigrateTable
Sets the executor service to use for parallel file reading. By default, no executor service is used and files are read sequentially.
- Specified by:
executeWith in interface MigrateTable
- Parameters:
service - the executor service
- Returns:
- this for method chaining
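The builder methods above compose into a single chained call. A hedged sketch combining backup naming, automatic backup cleanup, and a custom executor service (the identifiers are hypothetical):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.iceberg.actions.MigrateTable;
import org.apache.iceberg.spark.actions.SparkActions;

// Thread pool used by the action for parallel file reading.
ExecutorService pool = Executors.newFixedThreadPool(4);

MigrateTable.Result result = SparkActions.get()
    .migrateTable("db.events")             // hypothetical identifier
    .backupTableName("db.events_backup")   // where the original table is kept
    .dropBackup()                          // remove the backup automatically on success
    .executeWith(pool)
    .execute();

pool.shutdown();
```

Note that `dropBackup()` only removes the backup after a successful migration; on failure the original table remains available under the backup name.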
-
execute
public MigrateTable.Result execute()
Description copied from interface: Action
Executes this action.
- Specified by:
execute in interface Action<MigrateTable,MigrateTable.Result>
- Returns:
- the result of this action
-
destTableProps
protected java.util.Map<java.lang.String,java.lang.String> destTableProps()
-
checkSourceCatalog
protected org.apache.spark.sql.connector.catalog.TableCatalog checkSourceCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
-
sourceTableLocation
protected java.lang.String sourceTableLocation()
-
v1SourceTable
protected org.apache.spark.sql.catalyst.catalog.CatalogTable v1SourceTable()
-
sourceCatalog
protected org.apache.spark.sql.connector.catalog.TableCatalog sourceCatalog()
-
sourceTableIdent
protected org.apache.spark.sql.connector.catalog.Identifier sourceTableIdent()
-
setProperties
protected void setProperties(java.util.Map<java.lang.String,java.lang.String> properties)
-
setProperty
protected void setProperty(java.lang.String key, java.lang.String value)
-
additionalProperties
protected java.util.Map<java.lang.String,java.lang.String> additionalProperties()
-
checkDestinationCatalog
protected org.apache.spark.sql.connector.catalog.StagingTableCatalog checkDestinationCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
-
stageDestTable
protected StagedSparkTable stageDestTable()
-
ensureNameMappingPresent
protected void ensureNameMappingPresent(Table table)
-
getMetadataLocation
protected java.lang.String getMetadataLocation(Table table)
-
spark
protected org.apache.spark.sql.SparkSession spark()
-
sparkContext
protected org.apache.spark.api.java.JavaSparkContext sparkContext()
-
option
public ThisT option(java.lang.String name, java.lang.String value)
-
options
public ThisT options(java.util.Map<java.lang.String,java.lang.String> newOptions)
-
options
protected java.util.Map<java.lang.String,java.lang.String> options()
-
withJobGroupInfo
protected <T> T withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)
-
newJobGroupInfo
protected JobGroupInfo newJobGroupInfo(java.lang.String groupId, java.lang.String desc)
-
newStaticTable
protected Table newStaticTable(TableMetadata metadata, FileIO io)
-
contentFileDS
protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)
-
manifestDS
protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table, java.util.Set<java.lang.Long> snapshotIds)
-
manifestListDS
protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table, java.util.Set<java.lang.Long> snapshotIds)
-
statisticsFileDS
protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)
-
otherMetadataFileDS
protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table)
-
allReachableOtherMetadataFileDS
protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS(Table table)
-
loadMetadataTable
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
-
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(java.util.concurrent.ExecutorService executorService, java.util.function.Consumer<java.lang.String> deleteFunc, java.util.Iterator<FileInfo> files)
Deletes files and keeps track of how many files were removed for each file type.
- Parameters:
executorService - an executor service to use for parallel deletes
deleteFunc - a delete function
files - an iterator of Spark rows of the structure (path: String, type: String)
- Returns:
- stats on which files were deleted
-
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, java.util.Iterator<FileInfo> files)
-
-