Package org.apache.iceberg.spark
Class SparkUtil

java.lang.Object
    org.apache.iceberg.spark.SparkUtil

public class SparkUtil extends java.lang.Object
Field Summary

Modifier and Type               Field
static java.lang.String         HANDLE_TIMESTAMP_WITHOUT_TIMEZONE
static java.lang.String         TIMESTAMP_WITHOUT_TIMEZONE_ERROR
static java.lang.String         USE_TIMESTAMP_WITHOUT_TIME_ZONE_IN_NEW_TABLES
Method Summary

All Methods  Static Methods  Concrete Methods

static boolean canHandleTimestampWithoutZone(java.util.Map<java.lang.String,java.lang.String> readerConfig, org.apache.spark.sql.RuntimeConfig sessionConf)
    Allow reading/writing timestamp without time zone as timestamp with time zone.

static <C,T> Pair<C,T> catalogAndIdentifier(java.util.List<java.lang.String> nameParts, java.util.function.Function<java.lang.String,C> catalogProvider, java.util.function.BiFunction<java.lang.String[],java.lang.String,T> identiferProvider, C currentCatalog, java.lang.String[] currentNamespace)
    A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.

static org.apache.hadoop.conf.Configuration hadoopConfCatalogOverrides(org.apache.spark.sql.SparkSession spark, java.lang.String catalogName)
    Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession, which can be set via `spark.sql.catalog.$catalogName.hadoop.*`. Mirrors the override of Hadoop configurations for a given Spark session using `spark.hadoop.*`.

static boolean hasTimestampWithoutZone(Schema schema)
    Checks whether the table schema has a timestamp without timezone column.

static FileIO serializableFileIO(Table table)

static boolean useTimestampWithoutZoneInNewTables(org.apache.spark.sql.RuntimeConfig sessionConf)
    Checks whether the Spark session config contains a USE_TIMESTAMP_WITHOUT_TIME_ZONE_IN_NEW_TABLES property.

static void validatePartitionTransforms(PartitionSpec spec)
    Checks whether the partition transforms in a spec can be used to write data.
Field Detail

HANDLE_TIMESTAMP_WITHOUT_TIMEZONE

public static final java.lang.String HANDLE_TIMESTAMP_WITHOUT_TIMEZONE

See Also:
    Constant Field Values

TIMESTAMP_WITHOUT_TIMEZONE_ERROR

public static final java.lang.String TIMESTAMP_WITHOUT_TIMEZONE_ERROR

USE_TIMESTAMP_WITHOUT_TIME_ZONE_IN_NEW_TABLES

public static final java.lang.String USE_TIMESTAMP_WITHOUT_TIME_ZONE_IN_NEW_TABLES

See Also:
    Constant Field Values
Method Detail

validatePartitionTransforms

public static void validatePartitionTransforms(PartitionSpec spec)

Check whether the partition transforms in a spec can be used to write data.

Parameters:
    spec - a PartitionSpec
Throws:
    java.lang.UnsupportedOperationException - if the spec contains unknown partition transforms
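The check described above can be sketched in plain Java. This is an illustrative sketch, not Iceberg's actual implementation: transforms are modeled as plain names, and the set of supported transform names is an assumption for the example.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: reject a spec whose transforms are not in a known set,
// mirroring the UnsupportedOperationException contract documented above.
public class TransformValidation {
    // Illustrative set of transform names; the real check inspects transform types.
    private static final Set<String> KNOWN =
        Set.of("identity", "bucket", "truncate", "year", "month", "day", "hour", "void");

    public static void validatePartitionTransforms(List<String> transforms) {
        for (String t : transforms) {
            if (!KNOWN.contains(t)) {
                throw new UnsupportedOperationException(
                    "Cannot write using unsupported transforms: " + t);
            }
        }
    }
}
```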
catalogAndIdentifier

public static <C,T> Pair<C,T> catalogAndIdentifier(java.util.List<java.lang.String> nameParts, java.util.function.Function<java.lang.String,C> catalogProvider, java.util.function.BiFunction<java.lang.String[],java.lang.String,T> identiferProvider, C currentCatalog, java.lang.String[] currentNamespace)

A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.

Parameters:
    nameParts - multipart identifier representing a table
Returns:
    the CatalogPlugin and Identifier for the table
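The resolution strategy can be sketched with plain java.util.function types. This is a simplified sketch, not Iceberg's actual code: the precise fallback rules are assumptions for the example (a single-part name resolves in the current catalog and namespace; otherwise the first part is tried as a catalog name, falling back to the current catalog if the provider cannot load it).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Sketch of multipart-identifier resolution under the assumed rules above.
public class CatalogResolution {
    public static <C, T> Map.Entry<C, T> catalogAndIdentifier(
            List<String> nameParts,
            Function<String, C> catalogProvider,
            BiFunction<String[], String, T> identifierProvider,
            C currentCatalog,
            String[] currentNamespace) {
        String last = nameParts.get(nameParts.size() - 1);
        if (nameParts.size() == 1) {
            // Single-part name: current catalog, current namespace
            return new SimpleEntry<>(currentCatalog,
                identifierProvider.apply(currentNamespace, last));
        }
        try {
            // Try the first part as a catalog name
            C catalog = catalogProvider.apply(nameParts.get(0));
            String[] namespace = nameParts.subList(1, nameParts.size() - 1).toArray(new String[0]);
            return new SimpleEntry<>(catalog, identifierProvider.apply(namespace, last));
        } catch (RuntimeException e) {
            // Fall back: treat the whole name as an identifier in the current catalog
            String[] namespace = nameParts.subList(0, nameParts.size() - 1).toArray(new String[0]);
            return new SimpleEntry<>(currentCatalog, identifierProvider.apply(namespace, last));
        }
    }
}
```

The generic parameters mirror the real signature: C is whatever the caller uses to represent a catalog, T an identifier, so the same logic can serve both Spark's CatalogPlugin/Identifier types and test doubles.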
hasTimestampWithoutZone

public static boolean hasTimestampWithoutZone(Schema schema)

Checks whether the table schema contains a timestamp without timezone column.

Parameters:
    schema - table schema to check for a timestamp without timezone column
Returns:
    true if the schema contains a timestamp field without a timezone
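As an illustration of the check, a schema can be modeled as column-name to type-name pairs, where "timestamp" denotes timestamp without zone and "timestamptz" timestamp with zone (mirroring Iceberg's type names). This is a sketch only; the real Schema traversal also descends into nested struct, list, and map types.

```java
import java.util.Map;

// Sketch: flag any top-level column whose type is timestamp-without-zone.
public class TimestampCheck {
    public static boolean hasTimestampWithoutZone(Map<String, String> schema) {
        return schema.values().stream().anyMatch("timestamp"::equals);
    }
}
```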
canHandleTimestampWithoutZone

public static boolean canHandleTimestampWithoutZone(java.util.Map<java.lang.String,java.lang.String> readerConfig, org.apache.spark.sql.RuntimeConfig sessionConf)

Allow reading/writing timestamp without time zone as timestamp with time zone. Generally this is not safe: timestamp without time zone has wall-clock semantics (no matter the reader/writer timezone, 3 PM should always be read as 3 PM), while timestamp with time zone has instant semantics (the timestamp is adjusted so that the corresponding time in the reader's timezone is displayed). When set to false (the default), reading timestamp without time zone fields throws the runtime exception "Spark does not support timestamp without time zone fields".

Parameters:
    readerConfig - table read options
    sessionConf - Spark session configuration
Returns:
    true if reading timestamps without timezone is allowed
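Since the method takes both per-read options and the session configuration, a natural reading is that a read option overrides the session-level setting. The sketch below assumes that precedence and uses assumed property names; plain Maps stand in for the Spark RuntimeConfig to keep it self-contained.

```java
import java.util.Map;

// Sketch of the assumed precedence: reader option, then session conf, then false.
public class TimestampHandling {
    // Property names are assumptions for this example, not confirmed constants.
    static final String READ_OPTION = "handle-timestamp-without-timezone";
    static final String SESSION_CONF = "spark.sql.iceberg.handle-timestamp-without-timezone";

    public static boolean canHandleTimestampWithoutZone(
            Map<String, String> readerConfig, Map<String, String> sessionConf) {
        // Reader option wins if present; otherwise fall back to the session conf.
        String value = readerConfig.getOrDefault(READ_OPTION,
            sessionConf.getOrDefault(SESSION_CONF, "false"));
        return Boolean.parseBoolean(value);
    }
}
```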
useTimestampWithoutZoneInNewTables

public static boolean useTimestampWithoutZoneInNewTables(org.apache.spark.sql.RuntimeConfig sessionConf)

Check whether the Spark session config contains a USE_TIMESTAMP_WITHOUT_TIME_ZONE_IN_NEW_TABLES property (default: false). If true, all timestamp fields in new tables will be stored as Types.TimestampType.withoutZone(); otherwise Types.TimestampType.withZone() will be used.

Parameters:
    sessionConf - a Spark runtime config
Returns:
    true if the session config has the USE_TIMESTAMP_WITHOUT_TIME_ZONE_IN_NEW_TABLES property and it is set to true
hadoopConfCatalogOverrides

public static org.apache.hadoop.conf.Configuration hadoopConfCatalogOverrides(org.apache.spark.sql.SparkSession spark, java.lang.String catalogName)

Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession, which can be set via `spark.sql.catalog.$catalogName.hadoop.*`. Mirrors the override of Hadoop configurations for a given Spark session using `spark.hadoop.*`. The SparkCatalog allows Hadoop configurations to be overridden per catalog by setting them on the SQLConf. For example, the following adds the property "fs.default.name" with value "hdfs://hanksnamenode:8020" to the catalog's Hadoop configuration:

    SparkSession.builder()
        .config(s"spark.sql.catalog.$catalogName.hadoop.fs.default.name", "hdfs://hanksnamenode:8020")
        .getOrCreate()

Parameters:
    spark - the current Spark session
    catalogName - name of the catalog to find overrides for
Returns:
    the Hadoop Configuration that should be used for this catalog, with catalog-specific overrides applied
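The override extraction amounts to prefix filtering: strip `spark.sql.catalog.<catalogName>.hadoop.` from matching session settings and apply the remainder as Hadoop configuration keys. The sketch below uses plain Maps in place of SparkSession and Hadoop's Configuration to stay self-contained; it is an illustration of the mechanism, not Iceberg's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: collect per-catalog Hadoop overrides by stripping the catalog prefix.
public class CatalogHadoopOverrides {
    public static Map<String, String> hadoopConfCatalogOverrides(
            Map<String, String> sessionConf, String catalogName) {
        String prefix = "spark.sql.catalog." + catalogName + ".hadoop.";
        Map<String, String> hadoopConf = new HashMap<>();
        for (Map.Entry<String, String> e : sessionConf.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                // "spark.sql.catalog.hank.hadoop.fs.default.name" -> "fs.default.name"
                hadoopConf.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return hadoopConf;
    }
}
```

Settings for other catalogs, and plain `spark.hadoop.*` settings, are left untouched; only keys under the named catalog's prefix are carried over.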