Package org.apache.iceberg.spark
Class Spark3Util
- java.lang.Object
  - org.apache.iceberg.spark.Spark3Util

public class Spark3Util extends java.lang.Object
-
Nested Class Summary
- static class Spark3Util.CatalogAndIdentifier - This mimics a class inside of Spark which is private inside of LookupCatalog.
- static class Spark3Util.DescribeSchemaVisitor
-
Method Summary
All methods are static and concrete.

- static UpdateProperties applyPropertyChanges(UpdateProperties pendingUpdate, java.util.List<org.apache.spark.sql.connector.catalog.TableChange> changes) - Applies a list of Spark table changes to an UpdateProperties operation.
- static UpdateSchema applySchemaChanges(UpdateSchema pendingUpdate, java.util.List<org.apache.spark.sql.connector.catalog.TableChange> changes) - Applies a list of Spark table changes to an UpdateSchema operation.
- static int batchSize(java.util.Map<java.lang.String,java.lang.String> properties, org.apache.spark.sql.util.CaseInsensitiveStringMap readOptions)
- static Distribution buildRequiredDistribution(Table table)
- static SortOrder[] buildRequiredOrdering(Distribution distribution, Table table)
- static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(java.lang.String description, org.apache.spark.sql.SparkSession spark, java.lang.String name)
- static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(java.lang.String description, org.apache.spark.sql.SparkSession spark, java.lang.String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
- static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.lang.String name)
- static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.lang.String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
- static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.util.List<java.lang.String> nameParts)
- static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.util.List<java.lang.String> nameParts, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog) - A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply; attempts to find the catalog and identifier that a multipart identifier represents.
- static SortOrder[] convert(SortOrder sortOrder)
- static java.lang.String describe(Expression expr)
- static java.lang.String describe(Schema schema)
- static java.lang.String describe(SortOrder order)
- static java.lang.String describe(Type type)
- static DistributionMode distributionModeFor(Table table)
- static java.util.List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark, org.apache.hadoop.fs.Path rootPath, java.lang.String format) - Use Spark to list all partitions in the table.
- static TableIdentifier identifierToTableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
- static boolean isLocalityEnabled(FileIO io, java.lang.String location, org.apache.spark.sql.util.CaseInsensitiveStringMap readOptions)
- static boolean isVectorizationEnabled(FileFormat fileFormat, java.util.Map<java.lang.String,java.lang.String> properties, org.apache.spark.sql.RuntimeConfig sessionConf, org.apache.spark.sql.util.CaseInsensitiveStringMap readOptions)
- static java.lang.Boolean propertyAsBoolean(org.apache.spark.sql.util.CaseInsensitiveStringMap options, java.lang.String property, java.lang.Boolean defaultValue)
- static java.lang.Integer propertyAsInt(org.apache.spark.sql.util.CaseInsensitiveStringMap options, java.lang.String property, java.lang.Integer defaultValue)
- static java.lang.Long propertyAsLong(org.apache.spark.sql.util.CaseInsensitiveStringMap options, java.lang.String property, java.lang.Long defaultValue)
- static java.util.Map<java.lang.String,java.lang.String> rebuildCreateProperties(java.util.Map<java.lang.String,java.lang.String> createProperties)
- static Table toIcebergTable(org.apache.spark.sql.connector.catalog.Table table)
- static Term toIcebergTerm(org.apache.spark.sql.connector.expressions.Transform transform)
- static PartitionSpec toPartitionSpec(Schema schema, org.apache.spark.sql.connector.expressions.Transform[] partitioning) - Converts Spark transforms into a PartitionSpec.
- static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(PartitionSpec spec) - Converts a PartitionSpec to Spark transforms.
- static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
-
Method Detail
-
rebuildCreateProperties
public static java.util.Map<java.lang.String,java.lang.String> rebuildCreateProperties(java.util.Map<java.lang.String,java.lang.String> createProperties)
-
applyPropertyChanges
public static UpdateProperties applyPropertyChanges(UpdateProperties pendingUpdate, java.util.List<org.apache.spark.sql.connector.catalog.TableChange> changes)
Applies a list of Spark table changes to an UpdateProperties operation.
Parameters:
pendingUpdate - an uncommitted UpdateProperties operation to configure
changes - a list of Spark table changes
Returns:
the UpdateProperties operation configured with the changes
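A typical use is bridging Spark's ALTER TABLE ... SET TBLPROPERTIES path to an Iceberg commit. A minimal sketch, assuming an already-loaded Iceberg Table handle; the property keys shown are hypothetical:

```java
import java.util.List;

import org.apache.iceberg.Table;
import org.apache.iceberg.UpdateProperties;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.connector.catalog.TableChange;

class ApplyPropertyChangesExample {
  static void run(Table table) {
    // Hypothetical property changes, built with Spark's TableChange factories.
    List<TableChange> changes = List.of(
        TableChange.setProperty("write.format.default", "parquet"),
        TableChange.removeProperty("obsolete-key"));

    // Configure the pending update with the Spark changes, then commit once.
    UpdateProperties pendingUpdate = table.updateProperties();
    Spark3Util.applyPropertyChanges(pendingUpdate, changes).commit();
  }
}
```

Note that the method only configures the pending operation; the caller decides when to commit.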
-
applySchemaChanges
public static UpdateSchema applySchemaChanges(UpdateSchema pendingUpdate, java.util.List<org.apache.spark.sql.connector.catalog.TableChange> changes)
Applies a list of Spark table changes to an UpdateSchema operation.
Parameters:
pendingUpdate - an uncommitted UpdateSchema operation to configure
changes - a list of Spark table changes
Returns:
the UpdateSchema operation configured with the changes
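The schema variant works the same way with column-level changes. A sketch, assuming a loaded Iceberg Table; the column names are hypothetical:

```java
import java.util.List;

import org.apache.iceberg.Table;
import org.apache.iceberg.UpdateSchema;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.connector.catalog.TableChange;
import org.apache.spark.sql.types.DataTypes;

class ApplySchemaChangesExample {
  static void run(Table table) {
    // Hypothetical column changes expressed with Spark's TableChange factories.
    List<TableChange> changes = List.of(
        TableChange.addColumn(new String[] {"category"}, DataTypes.StringType),
        TableChange.renameColumn(new String[] {"ts"}, "event_ts"));

    // Configure the pending schema update, then commit it.
    UpdateSchema pendingUpdate = table.updateSchema();
    Spark3Util.applySchemaChanges(pendingUpdate, changes).commit();
  }
}
```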
-
toIcebergTable
public static Table toIcebergTable(org.apache.spark.sql.connector.catalog.Table table)
-
toTransforms
public static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(PartitionSpec spec)
Converts a PartitionSpec to Spark transforms.
Parameters:
spec - a PartitionSpec
Returns:
an array of Transforms
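For illustration, a sketch that builds a spec with Iceberg's PartitionSpec builder and converts it; the schema and partition fields are hypothetical:

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.connector.expressions.Transform;

class ToTransformsExample {
  static void run() {
    // Hypothetical schema, partitioned by day(ts) and bucket(16, id).
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.required(2, "ts", Types.TimestampType.withZone()));
    PartitionSpec spec = PartitionSpec.builderFor(schema)
        .day("ts")
        .bucket("id", 16)
        .build();

    // Each Iceberg partition field becomes a Spark Transform.
    Transform[] transforms = Spark3Util.toTransforms(spec);
    for (Transform t : transforms) {
      System.out.println(t.describe());
    }
  }
}
```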
-
buildRequiredDistribution
public static Distribution buildRequiredDistribution(Table table)
-
buildRequiredOrdering
public static SortOrder[] buildRequiredOrdering(Distribution distribution, Table table)
-
distributionModeFor
public static DistributionMode distributionModeFor(Table table)
-
toIcebergTerm
public static Term toIcebergTerm(org.apache.spark.sql.connector.expressions.Transform transform)
-
toPartitionSpec
public static PartitionSpec toPartitionSpec(Schema schema, org.apache.spark.sql.connector.expressions.Transform[] partitioning)
Converts Spark transforms into a PartitionSpec.
Parameters:
schema - the table schema
partitioning - Spark Transforms
Returns:
a PartitionSpec
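This is the reverse direction: Spark-side partitioning expressions (for example, from a CREATE TABLE ... PARTITIONED BY clause) mapped onto an Iceberg spec. A sketch with a hypothetical schema:

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.connector.expressions.Expressions;
import org.apache.spark.sql.connector.expressions.Transform;

class ToPartitionSpecExample {
  static void run() {
    // Hypothetical Iceberg schema the transforms refer to.
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.required(2, "ts", Types.TimestampType.withZone()));

    // Spark partitioning expressions, built with Spark's Expressions factories.
    Transform[] partitioning = new Transform[] {
        Expressions.days("ts"),
        Expressions.bucket(16, "id")
    };

    PartitionSpec spec = Spark3Util.toPartitionSpec(schema, partitioning);
  }
}
```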
-
describe
public static java.lang.String describe(Expression expr)
-
describe
public static java.lang.String describe(Schema schema)
-
describe
public static java.lang.String describe(Type type)
-
describe
public static java.lang.String describe(SortOrder order)
-
isLocalityEnabled
public static boolean isLocalityEnabled(FileIO io, java.lang.String location, org.apache.spark.sql.util.CaseInsensitiveStringMap readOptions)
-
isVectorizationEnabled
public static boolean isVectorizationEnabled(FileFormat fileFormat, java.util.Map<java.lang.String,java.lang.String> properties, org.apache.spark.sql.RuntimeConfig sessionConf, org.apache.spark.sql.util.CaseInsensitiveStringMap readOptions)
-
batchSize
public static int batchSize(java.util.Map<java.lang.String,java.lang.String> properties, org.apache.spark.sql.util.CaseInsensitiveStringMap readOptions)
-
propertyAsLong
public static java.lang.Long propertyAsLong(org.apache.spark.sql.util.CaseInsensitiveStringMap options, java.lang.String property, java.lang.Long defaultValue)
-
propertyAsInt
public static java.lang.Integer propertyAsInt(org.apache.spark.sql.util.CaseInsensitiveStringMap options, java.lang.String property, java.lang.Integer defaultValue)
-
propertyAsBoolean
public static java.lang.Boolean propertyAsBoolean(org.apache.spark.sql.util.CaseInsensitiveStringMap options, java.lang.String property, java.lang.Boolean defaultValue)
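The three propertyAs* helpers read typed values from read options, falling back to the given default when the key is absent. A sketch with hypothetical option keys and values:

```java
import java.util.Map;

import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.util.CaseInsensitiveStringMap;

class ReadOptionExample {
  static void run() {
    // Hypothetical read options, as they might arrive from a DataFrame reader.
    CaseInsensitiveStringMap options = new CaseInsensitiveStringMap(
        Map.of("snapshot-id", "10963874102873",
               "vectorization-enabled", "true"));

    Long snapshotId = Spark3Util.propertyAsLong(options, "snapshot-id", null);
    Boolean vectorized =
        Spark3Util.propertyAsBoolean(options, "vectorization-enabled", Boolean.FALSE);
    // Key is absent, so the supplied default is returned.
    Integer batchSize = Spark3Util.propertyAsInt(options, "batch-size", 5000);
  }
}
```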
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.lang.String name) throws org.apache.spark.sql.catalyst.parser.ParseException
- Throws:
org.apache.spark.sql.catalyst.parser.ParseException
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.lang.String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog) throws org.apache.spark.sql.catalyst.parser.ParseException
- Throws:
org.apache.spark.sql.catalyst.parser.ParseException
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(java.lang.String description, org.apache.spark.sql.SparkSession spark, java.lang.String name)
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(java.lang.String description, org.apache.spark.sql.SparkSession spark, java.lang.String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.util.List<java.lang.String> nameParts)
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.util.List<java.lang.String> nameParts, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.
Parameters:
spark - Spark session to use for resolution
nameParts - multipart identifier representing a table
defaultCatalog - catalog to use if none is specified
Returns:
the CatalogPlugin and Identifier for the table
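A sketch of resolving a multipart name against the session's configured catalogs; the catalog and table names here are hypothetical:

```java
import java.util.Arrays;

import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.connector.catalog.CatalogPlugin;
import org.apache.spark.sql.connector.catalog.Identifier;

class CatalogAndIdentifierExample {
  static void run(SparkSession spark) {
    // Resolve "my_catalog.db.tbl"; if no catalog part matches a configured
    // catalog, resolution falls back to the session's current catalog.
    Spark3Util.CatalogAndIdentifier resolved =
        Spark3Util.catalogAndIdentifier(spark, Arrays.asList("my_catalog", "db", "tbl"));

    CatalogPlugin catalog = resolved.catalog();
    Identifier identifier = resolved.identifier();
  }
}
```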
-
identifierToTableIdentifier
public static TableIdentifier identifierToTableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
-
getPartitions
public static java.util.List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark, org.apache.hadoop.fs.Path rootPath, java.lang.String format)
Use Spark to list all partitions in the table.
Parameters:
spark - a Spark session
rootPath - the root path of the table
format - format of the data files
Returns:
all of the table's partitions
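This is commonly used when importing an existing path-based table. A sketch, assuming a Hive-style partitioned Parquet layout at a hypothetical path; the SparkPartition getters shown are assumptions about its accessor names:

```java
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.spark.SparkTableUtil;
import org.apache.spark.sql.SparkSession;

class GetPartitionsExample {
  static void run(SparkSession spark) {
    // List every partition under the table's root path.
    List<SparkTableUtil.SparkPartition> partitions =
        Spark3Util.getPartitions(spark, new Path("s3://bucket/warehouse/db/tbl"), "parquet");

    for (SparkTableUtil.SparkPartition partition : partitions) {
      System.out.println(partition.getValues() + " -> " + partition.getUri());
    }
  }
}
```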
-
toV1TableIdentifier
public static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
-
-