Class BaseMigrateTableSparkAction

  • All Implemented Interfaces:
    Action<MigrateTable,​MigrateTable.Result>, MigrateTable

    public class BaseMigrateTableSparkAction
    extends java.lang.Object
    implements MigrateTable
    Takes a Spark table in the source catalog and attempts to transform it into an Iceberg table in the same location with the same identifier. Once complete the identifier which previously referred to a non-Iceberg table will refer to the newly migrated Iceberg table.
    • Field Detail

      • ICEBERG_METADATA_FOLDER

        protected static final java.lang.String ICEBERG_METADATA_FOLDER
        See Also:
        Constant Field Values
      • EXCLUDED_PROPERTIES

        protected static final java.util.List<java.lang.String> EXCLUDED_PROPERTIES
    • Constructor Detail

      • BaseMigrateTableSparkAction

        public BaseMigrateTableSparkAction​(org.apache.spark.sql.SparkSession spark,
                                           org.apache.spark.sql.connector.catalog.CatalogPlugin sourceCatalog,
                                           org.apache.spark.sql.connector.catalog.Identifier sourceTableIdent)
    • Method Detail

      • destCatalog

        protected org.apache.spark.sql.connector.catalog.StagingTableCatalog destCatalog()
      • destTableIdent

        protected org.apache.spark.sql.connector.catalog.Identifier destTableIdent()
      • tableProperties

        public MigrateTable tableProperties​(java.util.Map<java.lang.String,​java.lang.String> properties)
        Description copied from interface: MigrateTable
        Sets table properties in the newly created Iceberg table. Any properties with the same key name will be overwritten.
        Specified by:
        tableProperties in interface MigrateTable
        Parameters:
        properties - a map of properties to set
        Returns:
        this for method chaining
      • tableProperty

        public MigrateTable tableProperty​(java.lang.String property,
                                          java.lang.String value)
        Description copied from interface: MigrateTable
        Sets a table property in the newly created Iceberg table. Any properties with the same key will be overwritten.
        Specified by:
        tableProperty in interface MigrateTable
        Parameters:
        property - a table property name
        value - a table property value
        Returns:
        this for method chaining
      • destTableProps

        protected java.util.Map<java.lang.String,​java.lang.String> destTableProps()
      • checkSourceCatalog

        protected org.apache.spark.sql.connector.catalog.TableCatalog checkSourceCatalog​(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
      • sourceTableLocation

        protected java.lang.String sourceTableLocation()
      • v1SourceTable

        protected org.apache.spark.sql.catalyst.catalog.CatalogTable v1SourceTable()
      • sourceCatalog

        protected org.apache.spark.sql.connector.catalog.TableCatalog sourceCatalog()
      • sourceTableIdent

        protected org.apache.spark.sql.connector.catalog.Identifier sourceTableIdent()
      • setProperties

        protected void setProperties​(java.util.Map<java.lang.String,​java.lang.String> properties)
      • setProperty

        protected void setProperty​(java.lang.String key,
                                   java.lang.String value)
      • additionalProperties

        protected java.util.Map<java.lang.String,​java.lang.String> additionalProperties()
      • checkDestinationCatalog

        protected org.apache.spark.sql.connector.catalog.StagingTableCatalog checkDestinationCatalog​(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
      • ensureNameMappingPresent

        protected void ensureNameMappingPresent​(Table table)
      • getMetadataLocation

        protected java.lang.String getMetadataLocation​(Table table)
      • spark

        protected org.apache.spark.sql.SparkSession spark()
      • sparkContext

        protected org.apache.spark.api.java.JavaSparkContext sparkContext()
      • option

        public ThisT option​(java.lang.String name,
                            java.lang.String value)
        Description copied from interface: Action
        Configures this action with an extra option.

        Certain actions allow users to control internal details of their execution via options.

        Specified by:
        option in interface Action<ThisT,​R>
        Parameters:
        name - an option name
        value - an option value
        Returns:
        this for method chaining
      • options

        public ThisT options​(java.util.Map<java.lang.String,​java.lang.String> newOptions)
        Description copied from interface: Action
        Configures this action with extra options.

        Certain actions allow users to control internal details of their execution via options.

        Specified by:
        options in interface Action<ThisT,​R>
        Parameters:
        newOptions - a map of extra options
        Returns:
        this for method chaining
      • options

        protected java.util.Map<java.lang.String,​java.lang.String> options()
      • withJobGroupInfo

        protected <T> T withJobGroupInfo​(JobGroupInfo info,
                                         java.util.function.Supplier<T> supplier)
      • newJobGroupInfo

        protected JobGroupInfo newJobGroupInfo​(java.lang.String groupId,
                                               java.lang.String desc)
      • buildValidDataFileDF

        protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidDataFileDF​(Table table)
      • buildManifestFileDF

        protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildManifestFileDF​(Table table)
      • buildManifestListDF

        protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildManifestListDF​(Table table)
      • buildOtherMetadataFileDF

        protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildOtherMetadataFileDF​(Table table)
      • buildValidMetadataFileDF

        protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidMetadataFileDF​(Table table)
      • loadMetadataTable

        protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable​(Table table,
                                                                                           MetadataTableType type)