Package org.apache.iceberg.spark
Class Spark3Util
- java.lang.Object
  - org.apache.iceberg.spark.Spark3Util

public class Spark3Util extends java.lang.Object
-
Nested Class Summary
- static class Spark3Util.CatalogAndIdentifier - This mimics a class inside of Spark which is private inside of LookupCatalog.
- static class Spark3Util.DescribeSchemaVisitor
-
Method Summary
All methods are static and concrete.

- static UpdateProperties applyPropertyChanges(UpdateProperties pendingUpdate, java.util.List<org.apache.spark.sql.connector.catalog.TableChange> changes) - Applies a list of Spark table changes to an UpdateProperties operation.
- static UpdateSchema applySchemaChanges(UpdateSchema pendingUpdate, java.util.List<org.apache.spark.sql.connector.catalog.TableChange> changes) - Applies a list of Spark table changes to an UpdateSchema operation.
- static int batchSize(java.util.Map<java.lang.String,java.lang.String> properties, org.apache.spark.sql.util.CaseInsensitiveStringMap readOptions)
- static Distribution buildRequiredDistribution(Table table)
- static SortOrder[] buildRequiredOrdering(Distribution distribution, Table table)
- static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(java.lang.String description, org.apache.spark.sql.SparkSession spark, java.lang.String name)
- static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(java.lang.String description, org.apache.spark.sql.SparkSession spark, java.lang.String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
- static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.lang.String name)
- static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.lang.String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
- static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.util.List<java.lang.String> nameParts)
- static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.util.List<java.lang.String> nameParts, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog) - A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply; attempts to find the catalog and identifier that a multipart identifier represents.
- static SortOrder[] convert(SortOrder sortOrder)
- static java.lang.String describe(Expression expr)
- static java.lang.String describe(Schema schema)
- static java.lang.String describe(SortOrder order)
- static java.lang.String describe(Type type)
- static DistributionMode distributionModeFor(Table table)
- static java.util.List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark, org.apache.hadoop.fs.Path rootPath, java.lang.String format) - Use Spark to list all partitions in the table.
- static TableIdentifier identifierToTableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
- static boolean isLocalityEnabled(FileIO io, java.lang.String location, org.apache.spark.sql.util.CaseInsensitiveStringMap readOptions)
- static boolean isVectorizationEnabled(FileFormat fileFormat, java.util.Map<java.lang.String,java.lang.String> properties, org.apache.spark.sql.RuntimeConfig sessionConf, org.apache.spark.sql.util.CaseInsensitiveStringMap readOptions)
- static java.lang.Boolean propertyAsBoolean(org.apache.spark.sql.util.CaseInsensitiveStringMap options, java.lang.String property, java.lang.Boolean defaultValue)
- static java.lang.Integer propertyAsInt(org.apache.spark.sql.util.CaseInsensitiveStringMap options, java.lang.String property, java.lang.Integer defaultValue)
- static java.lang.Long propertyAsLong(org.apache.spark.sql.util.CaseInsensitiveStringMap options, java.lang.String property, java.lang.Long defaultValue)
- static java.util.Map<java.lang.String,java.lang.String> rebuildCreateProperties(java.util.Map<java.lang.String,java.lang.String> createProperties)
- static Table toIcebergTable(org.apache.spark.sql.connector.catalog.Table table)
- static Term toIcebergTerm(org.apache.spark.sql.connector.expressions.Transform transform)
- static PartitionSpec toPartitionSpec(Schema schema, org.apache.spark.sql.connector.expressions.Transform[] partitioning) - Converts Spark transforms into a PartitionSpec.
- static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(PartitionSpec spec) - Converts a PartitionSpec to Spark transforms.
- static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
-
Method Detail
-
rebuildCreateProperties
public static java.util.Map<java.lang.String,java.lang.String> rebuildCreateProperties(java.util.Map<java.lang.String,java.lang.String> createProperties)
-
applyPropertyChanges
public static UpdateProperties applyPropertyChanges(UpdateProperties pendingUpdate, java.util.List<org.apache.spark.sql.connector.catalog.TableChange> changes)
Applies a list of Spark table changes to an UpdateProperties operation.
Parameters:
pendingUpdate - an uncommitted UpdateProperties operation to configure
changes - a list of Spark table changes
Returns:
the UpdateProperties operation configured with the changes
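A typical use is bridging Spark's ALTER TABLE ... SET TBLPROPERTIES path to an Iceberg commit. A minimal sketch, assuming an already-loaded Iceberg Table handle; the property keys shown are hypothetical:

```java
import java.util.List;

import org.apache.iceberg.Table;
import org.apache.iceberg.UpdateProperties;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.connector.catalog.TableChange;

class ApplyPropertyChangesExample {
  static void run(Table table) {
    // Hypothetical property changes, built with Spark's TableChange factories.
    List<TableChange> changes = List.of(
        TableChange.setProperty("write.format.default", "parquet"),
        TableChange.removeProperty("obsolete-key"));

    // Configure the pending update with the Spark changes, then commit once.
    UpdateProperties pendingUpdate = table.updateProperties();
    Spark3Util.applyPropertyChanges(pendingUpdate, changes).commit();
  }
}
```

Note that the method only configures the pending operation; the caller decides when to commit.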
-
applySchemaChanges
public static UpdateSchema applySchemaChanges(UpdateSchema pendingUpdate, java.util.List<org.apache.spark.sql.connector.catalog.TableChange> changes)
Applies a list of Spark table changes to an UpdateSchema operation.
Parameters:
pendingUpdate - an uncommitted UpdateSchema operation to configure
changes - a list of Spark table changes
Returns:
the UpdateSchema operation configured with the changes
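The schema variant works the same way with column-level changes. A sketch, assuming a loaded Iceberg Table; the column names are hypothetical:

```java
import java.util.List;

import org.apache.iceberg.Table;
import org.apache.iceberg.UpdateSchema;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.connector.catalog.TableChange;
import org.apache.spark.sql.types.DataTypes;

class ApplySchemaChangesExample {
  static void run(Table table) {
    // Hypothetical column changes expressed with Spark's TableChange factories.
    List<TableChange> changes = List.of(
        TableChange.addColumn(new String[] {"category"}, DataTypes.StringType),
        TableChange.renameColumn(new String[] {"ts"}, "event_ts"));

    // Configure the pending schema update, then commit it.
    UpdateSchema pendingUpdate = table.updateSchema();
    Spark3Util.applySchemaChanges(pendingUpdate, changes).commit();
  }
}
```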
-
toIcebergTable
public static Table toIcebergTable(org.apache.spark.sql.connector.catalog.Table table)
-
toTransforms
public static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(PartitionSpec spec)
Converts a PartitionSpec to Spark transforms.
Parameters:
spec - a PartitionSpec
Returns:
an array of Transforms
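For illustration, a sketch that builds a spec with Iceberg's PartitionSpec builder and converts it; the schema and partition fields are hypothetical:

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.connector.expressions.Transform;

class ToTransformsExample {
  static void run() {
    // Hypothetical schema, partitioned by day(ts) and bucket(16, id).
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.required(2, "ts", Types.TimestampType.withZone()));
    PartitionSpec spec = PartitionSpec.builderFor(schema)
        .day("ts")
        .bucket("id", 16)
        .build();

    // Each Iceberg partition field becomes a Spark Transform.
    Transform[] transforms = Spark3Util.toTransforms(spec);
    for (Transform t : transforms) {
      System.out.println(t.describe());
    }
  }
}
```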
-
buildRequiredDistribution
public static Distribution buildRequiredDistribution(Table table)
-
buildRequiredOrdering
public static SortOrder[] buildRequiredOrdering(Distribution distribution, Table table)
-
distributionModeFor
public static DistributionMode distributionModeFor(Table table)
-
toIcebergTerm
public static Term toIcebergTerm(org.apache.spark.sql.connector.expressions.Transform transform)
-
toPartitionSpec
public static PartitionSpec toPartitionSpec(Schema schema, org.apache.spark.sql.connector.expressions.Transform[] partitioning)
Converts Spark transforms into a PartitionSpec.
Parameters:
schema - the table schema
partitioning - Spark Transforms
Returns:
a PartitionSpec
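This is the reverse direction: Spark-side partitioning expressions (for example, from a CREATE TABLE ... PARTITIONED BY clause) mapped onto an Iceberg spec. A sketch with a hypothetical schema:

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.connector.expressions.Expressions;
import org.apache.spark.sql.connector.expressions.Transform;

class ToPartitionSpecExample {
  static void run() {
    // Hypothetical Iceberg schema the transforms refer to.
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.required(2, "ts", Types.TimestampType.withZone()));

    // Spark partitioning expressions, built with Spark's Expressions factories.
    Transform[] partitioning = new Transform[] {
        Expressions.days("ts"),
        Expressions.bucket(16, "id")
    };

    PartitionSpec spec = Spark3Util.toPartitionSpec(schema, partitioning);
  }
}
```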
-
describe
public static java.lang.String describe(Expression expr)
-
describe
public static java.lang.String describe(Schema schema)
-
describe
public static java.lang.String describe(Type type)
-
describe
public static java.lang.String describe(SortOrder order)
-
isLocalityEnabled
public static boolean isLocalityEnabled(FileIO io, java.lang.String location, org.apache.spark.sql.util.CaseInsensitiveStringMap readOptions)
-
isVectorizationEnabled
public static boolean isVectorizationEnabled(FileFormat fileFormat, java.util.Map<java.lang.String,java.lang.String> properties, org.apache.spark.sql.RuntimeConfig sessionConf, org.apache.spark.sql.util.CaseInsensitiveStringMap readOptions)
-
batchSize
public static int batchSize(java.util.Map<java.lang.String,java.lang.String> properties, org.apache.spark.sql.util.CaseInsensitiveStringMap readOptions)
-
propertyAsLong
public static java.lang.Long propertyAsLong(org.apache.spark.sql.util.CaseInsensitiveStringMap options, java.lang.String property, java.lang.Long defaultValue)
-
propertyAsInt
public static java.lang.Integer propertyAsInt(org.apache.spark.sql.util.CaseInsensitiveStringMap options, java.lang.String property, java.lang.Integer defaultValue)
-
propertyAsBoolean
public static java.lang.Boolean propertyAsBoolean(org.apache.spark.sql.util.CaseInsensitiveStringMap options, java.lang.String property, java.lang.Boolean defaultValue)
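The three propertyAs* helpers read typed values from read options, falling back to the given default when the key is absent. A sketch with hypothetical option keys and values:

```java
import java.util.Map;

import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.util.CaseInsensitiveStringMap;

class ReadOptionExample {
  static void run() {
    // Hypothetical read options, as they might arrive from a DataFrame reader.
    CaseInsensitiveStringMap options = new CaseInsensitiveStringMap(
        Map.of("snapshot-id", "10963874102873",
               "vectorization-enabled", "true"));

    Long snapshotId = Spark3Util.propertyAsLong(options, "snapshot-id", null);
    Boolean vectorized =
        Spark3Util.propertyAsBoolean(options, "vectorization-enabled", Boolean.FALSE);
    // Key is absent, so the supplied default is returned.
    Integer batchSize = Spark3Util.propertyAsInt(options, "batch-size", 5000);
  }
}
```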
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.lang.String name) throws org.apache.spark.sql.catalyst.parser.ParseException
- Throws:
org.apache.spark.sql.catalyst.parser.ParseException
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.lang.String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog) throws org.apache.spark.sql.catalyst.parser.ParseException
- Throws:
org.apache.spark.sql.catalyst.parser.ParseException
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(java.lang.String description, org.apache.spark.sql.SparkSession spark, java.lang.String name)
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(java.lang.String description, org.apache.spark.sql.SparkSession spark, java.lang.String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.util.List<java.lang.String> nameParts)
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.util.List<java.lang.String> nameParts, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.
Parameters:
spark - Spark session to use for resolution
nameParts - multipart identifier representing a table
defaultCatalog - catalog to use if none is specified
Returns:
the CatalogPlugin and Identifier for the table
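A sketch of resolving a multipart name against the session's configured catalogs; the catalog and table names here are hypothetical:

```java
import java.util.Arrays;

import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.connector.catalog.CatalogPlugin;
import org.apache.spark.sql.connector.catalog.Identifier;

class CatalogAndIdentifierExample {
  static void run(SparkSession spark) {
    // Resolve "my_catalog.db.tbl"; if no catalog part matches a configured
    // catalog, resolution falls back to the session's current catalog.
    Spark3Util.CatalogAndIdentifier resolved =
        Spark3Util.catalogAndIdentifier(spark, Arrays.asList("my_catalog", "db", "tbl"));

    CatalogPlugin catalog = resolved.catalog();
    Identifier identifier = resolved.identifier();
  }
}
```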
-
identifierToTableIdentifier
public static TableIdentifier identifierToTableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
-
getPartitions
public static java.util.List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark, org.apache.hadoop.fs.Path rootPath, java.lang.String format)
Use Spark to list all partitions in the table.
Parameters:
spark - a Spark session
rootPath - the root path of the table
format - format of the data files
Returns:
all of the table's partitions
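This is commonly used when importing an existing path-based table. A sketch, assuming a Hive-style partitioned Parquet layout at a hypothetical path; the SparkPartition getters shown are assumptions about its accessor names:

```java
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.spark.SparkTableUtil;
import org.apache.spark.sql.SparkSession;

class GetPartitionsExample {
  static void run(SparkSession spark) {
    // List every partition under the table's root path.
    List<SparkTableUtil.SparkPartition> partitions =
        Spark3Util.getPartitions(spark, new Path("s3://bucket/warehouse/db/tbl"), "parquet");

    for (SparkTableUtil.SparkPartition partition : partitions) {
      System.out.println(partition.getValues() + " -> " + partition.getUri());
    }
  }
}
```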
-
toV1TableIdentifier
public static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
-
-