Package org.apache.iceberg.spark
Class SparkUtil

java.lang.Object
    org.apache.iceberg.spark.SparkUtil

public class SparkUtil extends java.lang.Object
Field Summary

Modifier and Type               Field
static java.lang.String         HANDLE_TIMESTAMP_WITHOUT_TIMEZONE
static java.lang.String         TIMESTAMP_WITHOUT_TIMEZONE_ERROR
static java.lang.String         USE_TIMESTAMP_WITHOUT_TIME_ZONE_IN_NEW_TABLES
Method Summary

All Methods  Static Methods  Concrete Methods

static boolean canHandleTimestampWithoutZone(java.util.Map<java.lang.String,java.lang.String> readerConfig, org.apache.spark.sql.RuntimeConfig sessionConf)
    Allow reading/writing timestamp without time zone as timestamp with time zone.

static <C,T> Pair<C,T> catalogAndIdentifier(java.util.List<java.lang.String> nameParts, java.util.function.Function<java.lang.String,C> catalogProvider, java.util.function.BiFunction<java.lang.String[],java.lang.String,T> identiferProvider, C currentCatalog, java.lang.String[] currentNamespace)
    A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.

static org.apache.hadoop.conf.Configuration hadoopConfCatalogOverrides(org.apache.spark.sql.SparkSession spark, java.lang.String catalogName)
    Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession, which can be set via `spark.sql.catalog.$catalogName.hadoop.*`. Mirrors the override of Hadoop configurations for a given Spark session using `spark.hadoop.*`.

static boolean hasTimestampWithoutZone(Schema schema)
    Checks whether the table schema has a timestamp without timezone column.

static FileIO serializableFileIO(Table table)

static boolean useTimestampWithoutZoneInNewTables(org.apache.spark.sql.RuntimeConfig sessionConf)
    Checks whether the Spark session config contains a USE_TIMESTAMP_WITHOUT_TIME_ZONE_IN_NEW_TABLES property.

static void validatePartitionTransforms(PartitionSpec spec)
    Checks whether the partition transforms in a spec can be used to write data.
Field Detail

HANDLE_TIMESTAMP_WITHOUT_TIMEZONE

public static final java.lang.String HANDLE_TIMESTAMP_WITHOUT_TIMEZONE

See Also:
    Constant Field Values

TIMESTAMP_WITHOUT_TIMEZONE_ERROR

public static final java.lang.String TIMESTAMP_WITHOUT_TIMEZONE_ERROR

USE_TIMESTAMP_WITHOUT_TIME_ZONE_IN_NEW_TABLES

public static final java.lang.String USE_TIMESTAMP_WITHOUT_TIME_ZONE_IN_NEW_TABLES

See Also:
    Constant Field Values
Method Detail

validatePartitionTransforms

public static void validatePartitionTransforms(PartitionSpec spec)

Check whether the partition transforms in a spec can be used to write data.

Parameters:
    spec - a PartitionSpec
Throws:
    java.lang.UnsupportedOperationException - if the spec contains unknown partition transforms
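The check described above can be sketched in plain Java. This is an illustrative sketch, not Iceberg's actual implementation: transforms are modeled as plain names, and the set of supported transform names is an assumption for the example.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: reject a spec whose transforms are not in a known set,
// mirroring the UnsupportedOperationException contract documented above.
public class TransformValidation {
    // Illustrative set of transform names; the real check inspects transform types.
    private static final Set<String> KNOWN =
        Set.of("identity", "bucket", "truncate", "year", "month", "day", "hour", "void");

    public static void validatePartitionTransforms(List<String> transforms) {
        for (String t : transforms) {
            if (!KNOWN.contains(t)) {
                throw new UnsupportedOperationException(
                    "Cannot write using unsupported transforms: " + t);
            }
        }
    }
}
```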
catalogAndIdentifier

public static <C,T> Pair<C,T> catalogAndIdentifier(java.util.List<java.lang.String> nameParts, java.util.function.Function<java.lang.String,C> catalogProvider, java.util.function.BiFunction<java.lang.String[],java.lang.String,T> identiferProvider, C currentCatalog, java.lang.String[] currentNamespace)

A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.

Parameters:
    nameParts - multipart identifier representing a table
Returns:
    the CatalogPlugin and Identifier for the table
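The resolution strategy can be sketched with plain java.util.function types. This is a simplified sketch, not Iceberg's actual code: the precise fallback rules are assumptions for the example (a single-part name resolves in the current catalog and namespace; otherwise the first part is tried as a catalog name, falling back to the current catalog if the provider cannot load it).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Sketch of multipart-identifier resolution under the assumed rules above.
public class CatalogResolution {
    public static <C, T> Map.Entry<C, T> catalogAndIdentifier(
            List<String> nameParts,
            Function<String, C> catalogProvider,
            BiFunction<String[], String, T> identifierProvider,
            C currentCatalog,
            String[] currentNamespace) {
        String last = nameParts.get(nameParts.size() - 1);
        if (nameParts.size() == 1) {
            // Single-part name: current catalog, current namespace
            return new SimpleEntry<>(currentCatalog,
                identifierProvider.apply(currentNamespace, last));
        }
        try {
            // Try the first part as a catalog name
            C catalog = catalogProvider.apply(nameParts.get(0));
            String[] namespace = nameParts.subList(1, nameParts.size() - 1).toArray(new String[0]);
            return new SimpleEntry<>(catalog, identifierProvider.apply(namespace, last));
        } catch (RuntimeException e) {
            // Fall back: treat the whole name as an identifier in the current catalog
            String[] namespace = nameParts.subList(0, nameParts.size() - 1).toArray(new String[0]);
            return new SimpleEntry<>(currentCatalog, identifierProvider.apply(namespace, last));
        }
    }
}
```

The generic parameters mirror the real signature: C is whatever the caller uses to represent a catalog, T an identifier, so the same logic can serve both Spark's CatalogPlugin/Identifier types and test doubles.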
hasTimestampWithoutZone

public static boolean hasTimestampWithoutZone(Schema schema)

Checks whether the table schema contains a timestamp without timezone column.

Parameters:
    schema - table schema to check for a timestamp without timezone column
Returns:
    true if the schema contains a timestamp field without a timezone
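As an illustration of the check, a schema can be modeled as column-name to type-name pairs, where "timestamp" denotes timestamp without zone and "timestamptz" timestamp with zone (mirroring Iceberg's type names). This is a sketch only; the real Schema traversal also descends into nested struct, list, and map types.

```java
import java.util.Map;

// Sketch: flag any top-level column whose type is timestamp-without-zone.
public class TimestampCheck {
    public static boolean hasTimestampWithoutZone(Map<String, String> schema) {
        return schema.values().stream().anyMatch("timestamp"::equals);
    }
}
```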
canHandleTimestampWithoutZone

public static boolean canHandleTimestampWithoutZone(java.util.Map<java.lang.String,java.lang.String> readerConfig, org.apache.spark.sql.RuntimeConfig sessionConf)

Allow reading/writing timestamp without time zone as timestamp with time zone. Generally this is not safe: timestamp without time zone has wall-clock semantics (no matter the reader/writer timezone, 3 PM should always be read as 3 PM), while timestamp with time zone has instant semantics (the timestamp is adjusted so that the corresponding time in the reader's timezone is displayed). When set to false (the default), reading timestamp without time zone fields throws the runtime exception "Spark does not support timestamp without time zone fields".

Parameters:
    readerConfig - table read options
    sessionConf - Spark session configuration
Returns:
    true if reading timestamps without timezone is allowed
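Since the method takes both per-read options and the session configuration, a natural reading is that a read option overrides the session-level setting. The sketch below assumes that precedence and uses assumed property names; plain Maps stand in for the Spark RuntimeConfig to keep it self-contained.

```java
import java.util.Map;

// Sketch of the assumed precedence: reader option, then session conf, then false.
public class TimestampHandling {
    // Property names are assumptions for this example, not confirmed constants.
    static final String READ_OPTION = "handle-timestamp-without-timezone";
    static final String SESSION_CONF = "spark.sql.iceberg.handle-timestamp-without-timezone";

    public static boolean canHandleTimestampWithoutZone(
            Map<String, String> readerConfig, Map<String, String> sessionConf) {
        // Reader option wins if present; otherwise fall back to the session conf.
        String value = readerConfig.getOrDefault(READ_OPTION,
            sessionConf.getOrDefault(SESSION_CONF, "false"));
        return Boolean.parseBoolean(value);
    }
}
```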
useTimestampWithoutZoneInNewTables

public static boolean useTimestampWithoutZoneInNewTables(org.apache.spark.sql.RuntimeConfig sessionConf)

Check whether the Spark session config contains a USE_TIMESTAMP_WITHOUT_TIME_ZONE_IN_NEW_TABLES property (default: false). If true, all timestamp fields in new tables will be stored as Types.TimestampType.withoutZone(); otherwise Types.TimestampType.withZone() will be used.

Parameters:
    sessionConf - a Spark runtime config
Returns:
    true if the session config has the USE_TIMESTAMP_WITHOUT_TIME_ZONE_IN_NEW_TABLES property and it is set to true
hadoopConfCatalogOverrides

public static org.apache.hadoop.conf.Configuration hadoopConfCatalogOverrides(org.apache.spark.sql.SparkSession spark, java.lang.String catalogName)

Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession, which can be set via `spark.sql.catalog.$catalogName.hadoop.*`. Mirrors the override of Hadoop configurations for a given Spark session using `spark.hadoop.*`. The SparkCatalog allows Hadoop configurations to be overridden per catalog by setting them on the SQLConf. For example, the following adds the property "fs.default.name" with value "hdfs://hanksnamenode:8020" to the catalog's Hadoop configuration:

    SparkSession.builder()
        .config(s"spark.sql.catalog.$catalogName.hadoop.fs.default.name", "hdfs://hanksnamenode:8020")
        .getOrCreate()

Parameters:
    spark - the current Spark session
    catalogName - name of the catalog to find overrides for
Returns:
    the Hadoop Configuration that should be used for this catalog, with catalog-specific overrides applied
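The override extraction amounts to prefix filtering: strip `spark.sql.catalog.<catalogName>.hadoop.` from matching session settings and apply the remainder as Hadoop configuration keys. The sketch below uses plain Maps in place of SparkSession and Hadoop's Configuration to stay self-contained; it is an illustration of the mechanism, not Iceberg's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: collect per-catalog Hadoop overrides by stripping the catalog prefix.
public class CatalogHadoopOverrides {
    public static Map<String, String> hadoopConfCatalogOverrides(
            Map<String, String> sessionConf, String catalogName) {
        String prefix = "spark.sql.catalog." + catalogName + ".hadoop.";
        Map<String, String> hadoopConf = new HashMap<>();
        for (Map.Entry<String, String> e : sessionConf.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                // "spark.sql.catalog.hank.hadoop.fs.default.name" -> "fs.default.name"
                hadoopConf.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return hadoopConf;
    }
}
```

Settings for other catalogs, and plain `spark.hadoop.*` settings, are left untouched; only keys under the named catalog's prefix are carried over.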