Package org.apache.iceberg.spark.actions
Class SnapshotTableSparkAction
java.lang.Object
org.apache.iceberg.spark.actions.SnapshotTableSparkAction
- All Implemented Interfaces:
 Action<SnapshotTable,,SnapshotTable.Result> SnapshotTable
Creates a new Iceberg table based on a source Spark table. The new Iceberg table will have a
 different data and metadata directory allowing it to exist independently of the source table.
- 
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.iceberg.actions.SnapshotTable
SnapshotTable.Result - 
Field Summary
FieldsModifier and TypeFieldDescriptionprotected static final org.apache.iceberg.relocated.com.google.common.base.Joinerprotected static final org.apache.iceberg.relocated.com.google.common.base.Splitterprotected static final Stringprotected static final Stringprotected static final Stringprotected static final Stringprotected static final Stringprotected static final Stringprotected static final Stringprotected static final String - 
Method Summary
Modifier and TypeMethodDescriptionprotected org.apache.spark.sql.Dataset<FileInfo>Sets the table identifier for the newly created Iceberg table.protected org.apache.spark.sql.connector.catalog.StagingTableCatalogcheckDestinationCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog) protected org.apache.spark.sql.connector.catalog.TableCatalogcheckSourceCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog) protected org.apache.spark.sql.Dataset<FileInfo>contentFileDS(Table table) protected org.apache.spark.sql.Dataset<FileInfo>contentFileDS(Table table, Set<Long> snapshotIds) protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummarydeleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files) Deletes files and keeps track of how many files were removed for each file type.protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummarydeleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files) protected org.apache.spark.sql.connector.catalog.StagingTableCatalogprotected org.apache.spark.sql.connector.catalog.Identifierprotected voidensureNameMappingPresent(Table table) execute()Executes this action.executeWith(ExecutorService service) Sets the executor service to use for parallel file reading.protected StringgetMetadataLocation(Table table) protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>loadMetadataTable(Table table, MetadataTableType type) protected org.apache.spark.sql.Dataset<FileInfo>manifestDS(Table table) protected org.apache.spark.sql.Dataset<FileInfo>manifestDS(Table table, Set<Long> snapshotIds) protected org.apache.spark.sql.Dataset<FileInfo>manifestListDS(Table table) protected org.apache.spark.sql.Dataset<FileInfo>manifestListDS(Table table, Set<Long> snapshotIds) protected JobGroupInfonewJobGroupInfo(String groupId, String desc) protected TablenewStaticTable(TableMetadata metadata, FileIO io) options()protected org.apache.spark.sql.Dataset<FileInfo>otherMetadataFileDS(Table table) protected SnapshotTableSparkActionself()protected voidsetProperties(Map<String, String> properties) protected voidsetProperty(String key, String value) protected org.apache.spark.sql.connector.catalog.TableCatalogprotected org.apache.spark.sql.connector.catalog.Identifierprotected Stringprotected org.apache.spark.sql.SparkSessionspark()protected org.apache.spark.api.java.JavaSparkContextprotected StagedSparkTableprotected org.apache.spark.sql.Dataset<FileInfo>statisticsFileDS(Table table, Set<Long> snapshotIds) tableLocation(String location) Sets the table location for the newly created Iceberg table.tableProperties(Map<String, String> properties) Sets table properties in the newly created Iceberg table.tableProperty(String property, String value) Sets a table property in the newly created Iceberg table.protected org.apache.spark.sql.catalyst.catalog.CatalogTableprotected <T> TwithJobGroupInfo(JobGroupInfo info, Supplier<T> supplier)  
- 
Field Details
- 
LOCATION
- See Also:
 
 - 
ICEBERG_METADATA_FOLDER
- See Also:
 
 - 
EXCLUDED_PROPERTIES
 - 
MANIFEST
- See Also:
 
 - 
MANIFEST_LIST
- See Also:
 
 - 
STATISTICS_FILES
- See Also:
 
 - 
OTHERS
- See Also:
 
 - 
FILE_PATH
- See Also:
 
 - 
LAST_MODIFIED
- See Also:
 
 - 
COMMA_SPLITTER
protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER - 
COMMA_JOINER
protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER 
 - 
 - 
Method Details
- 
self
 - 
destCatalog
protected org.apache.spark.sql.connector.catalog.StagingTableCatalog destCatalog() - 
destTableIdent
protected org.apache.spark.sql.connector.catalog.Identifier destTableIdent() - 
as
Description copied from interface:SnapshotTableSets the table identifier for the newly created Iceberg table.- Specified by:
 asin interfaceSnapshotTable- Parameters:
 ident- the destination table identifier- Returns:
 - this for method chaining
 
 - 
tableProperties
Description copied from interface:SnapshotTableSets table properties in the newly created Iceberg table. Any properties with the same key name will be overwritten.- Specified by:
 tablePropertiesin interfaceSnapshotTable- Parameters:
 properties- a map of properties to be included- Returns:
 - this for method chaining
 
 - 
tableProperty
Description copied from interface:SnapshotTableSets a table property in the newly created Iceberg table. Any properties with the same key name will be overwritten.- Specified by:
 tablePropertyin interfaceSnapshotTable- Parameters:
 property- the key of the property to addvalue- the value of the property to add- Returns:
 - this for method chaining
 
 - 
executeWith
Description copied from interface:SnapshotTableSets the executor service to use for parallel file reading. The default is not using executor service.- Specified by:
 executeWithin interfaceSnapshotTable- Parameters:
 service- executor service- Returns:
 - this for method chaining
 
 - 
execute
Description copied from interface:ActionExecutes this action.- Specified by:
 executein interfaceAction<SnapshotTable,SnapshotTable.Result> - Returns:
 - the result of this action
 
 - 
destTableProps
 - 
checkSourceCatalog
protected org.apache.spark.sql.connector.catalog.TableCatalog checkSourceCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)  - 
tableLocation
Description copied from interface:SnapshotTableSets the table location for the newly created Iceberg table.- Specified by:
 tableLocationin interfaceSnapshotTable- Parameters:
 location- a table location- Returns:
 - this for method chaining
 
 - 
sourceTableLocation
 - 
v1SourceTable
protected org.apache.spark.sql.catalyst.catalog.CatalogTable v1SourceTable() - 
sourceCatalog
protected org.apache.spark.sql.connector.catalog.TableCatalog sourceCatalog() - 
sourceTableIdent
protected org.apache.spark.sql.connector.catalog.Identifier sourceTableIdent() - 
setProperties
 - 
setProperty
 - 
additionalProperties
 - 
checkDestinationCatalog
protected org.apache.spark.sql.connector.catalog.StagingTableCatalog checkDestinationCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)  - 
stageDestTable
 - 
ensureNameMappingPresent
 - 
getMetadataLocation
 - 
spark
protected org.apache.spark.sql.SparkSession spark() - 
sparkContext
protected org.apache.spark.api.java.JavaSparkContext sparkContext() - 
option
 - 
options
 - 
options
 - 
withJobGroupInfo
 - 
newJobGroupInfo
 - 
newStaticTable
 - 
contentFileDS
 - 
contentFileDS
 - 
manifestDS
 - 
manifestDS
 - 
manifestListDS
 - 
manifestListDS
 - 
statisticsFileDS
 - 
otherMetadataFileDS
 - 
allReachableOtherMetadataFileDS
 - 
loadMetadataTable
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)  - 
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files) Deletes files and keeps track of how many files were removed for each file type.- Parameters:
 executorService- an executor service to use for parallel deletesdeleteFunc- a delete funcfiles- an iterator of Spark rows of the structure (path: String, type: String)- Returns:
 - stats on which files were deleted
 
 - 
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files)  
 -