Package org.apache.iceberg.spark.actions
Class MigrateTableSparkAction
- java.lang.Object
  - org.apache.iceberg.spark.actions.MigrateTableSparkAction

All Implemented Interfaces:
- Action<MigrateTable,MigrateTable.Result>
- MigrateTable
public class MigrateTableSparkAction
extends java.lang.Object
implements MigrateTable

Takes a Spark table in the source catalog and attempts to transform it into an Iceberg table in the same location with the same identifier. Once complete, the identifier that previously referred to a non-Iceberg table will refer to the newly migrated Iceberg table.
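
For illustration, a minimal usage sketch. It assumes an active SparkSession and that the action is obtained through SparkActions (the usual entry point in this module, not documented on this page); the identifier "db.sample" is a placeholder for an existing non-Iceberg Spark table.

import org.apache.iceberg.actions.MigrateTable;
import org.apache.iceberg.spark.actions.SparkActions;

// Migrate the existing Spark table "db.sample" in place; after execute(),
// the identifier "db.sample" resolves to the newly created Iceberg table.
MigrateTable.Result result =
    SparkActions.get()              // uses the active SparkSession
        .migrateTable("db.sample")
        .execute();

// migratedDataFilesCount() is defined on MigrateTable.Result.
long migrated = result.migratedDataFilesCount();
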
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.iceberg.actions.MigrateTable:
- MigrateTable.Result
 
Field Summary

Fields
- protected static org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
- protected static org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER
- protected static java.util.List<java.lang.String> EXCLUDED_PROPERTIES
- protected static java.lang.String FILE_PATH
- protected static java.lang.String ICEBERG_METADATA_FOLDER
- protected static java.lang.String LAST_MODIFIED
- protected static java.lang.String LOCATION
- protected static java.lang.String MANIFEST
- protected static java.lang.String MANIFEST_LIST
- protected static java.lang.String OTHERS
- protected static java.lang.String STATISTICS_FILES
Method Summary
All Methods | Instance Methods | Concrete Methods

- protected java.util.Map<java.lang.String,java.lang.String> additionalProperties()
- protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS(Table table)
- MigrateTableSparkAction backupTableName(java.lang.String tableName): Sets a table name for the backup of the original table.
- protected org.apache.spark.sql.connector.catalog.StagingTableCatalog checkDestinationCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
- protected org.apache.spark.sql.connector.catalog.TableCatalog checkSourceCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
- protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table)
- protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)
- protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(java.util.concurrent.ExecutorService executorService, java.util.function.Consumer<java.lang.String> deleteFunc, java.util.Iterator<FileInfo> files): Deletes files and keeps track of how many files were removed for each file type.
- protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, java.util.Iterator<FileInfo> files)
- protected org.apache.spark.sql.connector.catalog.StagingTableCatalog destCatalog()
- protected org.apache.spark.sql.connector.catalog.Identifier destTableIdent()
- protected java.util.Map<java.lang.String,java.lang.String> destTableProps()
- MigrateTableSparkAction dropBackup(): Drops the backup of the original table after a successful migration.
- protected void ensureNameMappingPresent(Table table)
- MigrateTable.Result execute(): Executes this action.
- MigrateTableSparkAction executeWith(java.util.concurrent.ExecutorService service): Sets the executor service to use for parallel file reading.
- protected java.lang.String getMetadataLocation(Table table)
- protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
- protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table)
- protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table, java.util.Set<java.lang.Long> snapshotIds)
- protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table)
- protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table, java.util.Set<java.lang.Long> snapshotIds)
- protected JobGroupInfo newJobGroupInfo(java.lang.String groupId, java.lang.String desc)
- protected Table newStaticTable(TableMetadata metadata, FileIO io)
- ThisT option(java.lang.String name, java.lang.String value)
- protected java.util.Map<java.lang.String,java.lang.String> options()
- ThisT options(java.util.Map<java.lang.String,java.lang.String> newOptions)
- protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table)
- protected MigrateTableSparkAction self()
- protected void setProperties(java.util.Map<java.lang.String,java.lang.String> properties)
- protected void setProperty(java.lang.String key, java.lang.String value)
- protected org.apache.spark.sql.connector.catalog.TableCatalog sourceCatalog()
- protected org.apache.spark.sql.connector.catalog.Identifier sourceTableIdent()
- protected java.lang.String sourceTableLocation()
- protected org.apache.spark.sql.SparkSession spark()
- protected org.apache.spark.api.java.JavaSparkContext sparkContext()
- protected StagedSparkTable stageDestTable()
- protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)
- MigrateTableSparkAction tableProperties(java.util.Map<java.lang.String,java.lang.String> properties): Sets table properties in the newly created Iceberg table.
- MigrateTableSparkAction tableProperty(java.lang.String property, java.lang.String value): Sets a table property in the newly created Iceberg table.
- protected org.apache.spark.sql.catalyst.catalog.CatalogTable v1SourceTable()
- protected <T> T withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)
 
Field Detail

LOCATION
protected static final java.lang.String LOCATION
See Also: Constant Field Values

ICEBERG_METADATA_FOLDER
protected static final java.lang.String ICEBERG_METADATA_FOLDER
See Also: Constant Field Values

EXCLUDED_PROPERTIES
protected static final java.util.List<java.lang.String> EXCLUDED_PROPERTIES

MANIFEST
protected static final java.lang.String MANIFEST
See Also: Constant Field Values

MANIFEST_LIST
protected static final java.lang.String MANIFEST_LIST
See Also: Constant Field Values

STATISTICS_FILES
protected static final java.lang.String STATISTICS_FILES
See Also: Constant Field Values

OTHERS
protected static final java.lang.String OTHERS
See Also: Constant Field Values

FILE_PATH
protected static final java.lang.String FILE_PATH
See Also: Constant Field Values

LAST_MODIFIED
protected static final java.lang.String LAST_MODIFIED
See Also: Constant Field Values

COMMA_SPLITTER
protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER

COMMA_JOINER
protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
 
Method Detail

self
protected MigrateTableSparkAction self()

destCatalog
protected org.apache.spark.sql.connector.catalog.StagingTableCatalog destCatalog()

destTableIdent
protected org.apache.spark.sql.connector.catalog.Identifier destTableIdent()
tableProperties
public MigrateTableSparkAction tableProperties(java.util.Map<java.lang.String,java.lang.String> properties)
Description copied from interface: MigrateTable
Sets table properties in the newly created Iceberg table. Any properties with the same key name will be overwritten.
Specified by:
- tableProperties in interface MigrateTable
Parameters:
- properties - a map of properties to set
Returns:
- this for method chaining
 
tableProperty
public MigrateTableSparkAction tableProperty(java.lang.String property, java.lang.String value)
Description copied from interface: MigrateTable
Sets a table property in the newly created Iceberg table. Any properties with the same key will be overwritten.
Specified by:
- tableProperty in interface MigrateTable
Parameters:
- property - a table property name
- value - a table property value
Returns:
- this for method chaining
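
A sketch of how the two property setters compose (the table name, keys, and values below are purely illustrative):

java.util.Map<String, String> props = new java.util.HashMap<>();
props.put("format-version", "2");
props.put("owner", "data-eng");

SparkActions.get()
    .migrateTable("db.sample")
    .tableProperties(props)                  // bulk properties
    .tableProperty("comment", "migrated")    // single property; overwrites an existing value for the key
    .execute();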
 
dropBackup
public MigrateTableSparkAction dropBackup()
Description copied from interface: MigrateTable
Drops the backup of the original table after a successful migration.
Specified by:
- dropBackup in interface MigrateTable
Returns:
- this for method chaining
 
backupTableName
public MigrateTableSparkAction backupTableName(java.lang.String tableName)
Description copied from interface: MigrateTable
Sets a table name for the backup of the original table.
Specified by:
- backupTableName in interface MigrateTable
Parameters:
- tableName - the table name for backup
Returns:
- this for method chaining
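
A sketch of the backup-related options (table and backup names are placeholders); dropBackup() is intended for the case where keeping the renamed original table is not needed once the migration has succeeded:

// Keep the original table under an explicit backup name.
SparkActions.get()
    .migrateTable("db.sample")
    .backupTableName("db.sample_backup")
    .execute();

// Alternatively, drop the backup automatically after a successful migration.
SparkActions.get()
    .migrateTable("db.other_table")
    .dropBackup()
    .execute();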
 
executeWith
public MigrateTableSparkAction executeWith(java.util.concurrent.ExecutorService service)
Description copied from interface: MigrateTable
Sets the executor service to use for parallel file reading. By default, no executor service is used.
Specified by:
- executeWith in interface MigrateTable
Parameters:
- service - executor service
Returns:
- this for method chaining
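
A sketch of supplying an executor service for parallel file reading (the pool size is arbitrary):

java.util.concurrent.ExecutorService pool =
    java.util.concurrent.Executors.newFixedThreadPool(8);
try {
  SparkActions.get()
      .migrateTable("db.sample")
      .executeWith(pool)    // data files are read/imported using the supplied pool
      .execute();
} finally {
  pool.shutdown();
}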
 
execute
public MigrateTable.Result execute()
Description copied from interface: Action
Executes this action.
Specified by:
- execute in interface Action<MigrateTable,MigrateTable.Result>
Returns:
- the result of this action
 
destTableProps
protected java.util.Map<java.lang.String,java.lang.String> destTableProps()

checkSourceCatalog
protected org.apache.spark.sql.connector.catalog.TableCatalog checkSourceCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)

sourceTableLocation
protected java.lang.String sourceTableLocation()

v1SourceTable
protected org.apache.spark.sql.catalyst.catalog.CatalogTable v1SourceTable()

sourceCatalog
protected org.apache.spark.sql.connector.catalog.TableCatalog sourceCatalog()

sourceTableIdent
protected org.apache.spark.sql.connector.catalog.Identifier sourceTableIdent()

setProperties
protected void setProperties(java.util.Map<java.lang.String,java.lang.String> properties)

setProperty
protected void setProperty(java.lang.String key, java.lang.String value)

additionalProperties
protected java.util.Map<java.lang.String,java.lang.String> additionalProperties()

checkDestinationCatalog
protected org.apache.spark.sql.connector.catalog.StagingTableCatalog checkDestinationCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)

stageDestTable
protected StagedSparkTable stageDestTable()

ensureNameMappingPresent
protected void ensureNameMappingPresent(Table table)

getMetadataLocation
protected java.lang.String getMetadataLocation(Table table)

spark
protected org.apache.spark.sql.SparkSession spark()

sparkContext
protected org.apache.spark.api.java.JavaSparkContext sparkContext()

option
public ThisT option(java.lang.String name, java.lang.String value)

options
public ThisT options(java.util.Map<java.lang.String,java.lang.String> newOptions)

options
protected java.util.Map<java.lang.String,java.lang.String> options()

withJobGroupInfo
protected <T> T withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)

newJobGroupInfo
protected JobGroupInfo newJobGroupInfo(java.lang.String groupId, java.lang.String desc)

newStaticTable
protected Table newStaticTable(TableMetadata metadata, FileIO io)

contentFileDS
protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)

manifestDS
protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table, java.util.Set<java.lang.Long> snapshotIds)

manifestListDS
protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table, java.util.Set<java.lang.Long> snapshotIds)

statisticsFileDS
protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)

otherMetadataFileDS
protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table)

allReachableOtherMetadataFileDS
protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS(Table table)

loadMetadataTable
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(java.util.concurrent.ExecutorService executorService, java.util.function.Consumer<java.lang.String> deleteFunc, java.util.Iterator<FileInfo> files)
Deletes files and keeps track of how many files were removed for each file type.
Parameters:
- executorService - an executor service to use for parallel deletes
- deleteFunc - a delete function
- files - an iterator of Spark rows of the structure (path: String, type: String)
Returns:
- stats on which files were deleted
 
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, java.util.Iterator<FileInfo> files)
 