Package org.apache.iceberg.spark.actions
Class ComputePartitionStatsSparkAction
java.lang.Object
org.apache.iceberg.spark.actions.ComputePartitionStatsSparkAction
- All Implemented Interfaces:
Action<ComputePartitionStats,ComputePartitionStats.Result>, ComputePartitionStats
Computes partition stats incrementally, from the most recent snapshot that already has a
partition stats file up to the given snapshot (the current snapshot if none is specified),
and writes the combined result into a new PartitionStatisticsFile after merging the stats
for the given snapshot. Performs a full computation if no previous statistics file exists.
Also registers the PartitionStatisticsFile in the table metadata.
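For example, the action is normally obtained through SparkActions and chained fluently. A minimal sketch, assuming an active SparkSession, a loaded Table bound to the variable table, and that SparkActions exposes a computePartitionStats(Table) factory like the other Spark actions:

import org.apache.iceberg.PartitionStatisticsFile;
import org.apache.iceberg.actions.ComputePartitionStats;
import org.apache.iceberg.spark.actions.SparkActions;

// Compute partition stats up to the current snapshot. If a previous
// PartitionStatisticsFile exists, only the snapshots committed after it are
// processed and merged; otherwise a full computation runs.
ComputePartitionStats.Result result =
    SparkActions.get()                 // uses the active SparkSession
        .computePartitionStats(table)  // 'table' is a loaded org.apache.iceberg.Table
        .execute();

// The written file is also registered in the table metadata by the action.
PartitionStatisticsFile statsFile = result.statisticsFile();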
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.iceberg.actions.ComputePartitionStats
ComputePartitionStats.Result
Field Summary
Fields
Modifier and Type  Field
protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner  COMMA_JOINER
protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter  COMMA_SPLITTER
protected static final String  FILE_PATH
protected static final String  LAST_MODIFIED
protected static final String  MANIFEST
protected static final String  MANIFEST_LIST
protected static final String  OTHERS
protected static final String  STATISTICS_FILES
Method Summary
Modifier and Type  Method  Description
protected org.apache.spark.sql.Dataset<FileInfo>  allReachableOtherMetadataFileDS(Table table)
protected org.apache.spark.sql.Dataset<FileInfo>  contentFileDS(Table table)
protected org.apache.spark.sql.Dataset<FileInfo>  contentFileDS(Table table, Set<Long> snapshotIds)
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary  deleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files)  Deletes files and keeps track of how many files were removed for each file type.
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary  deleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files)
ComputePartitionStats.Result  execute()  Executes this action.
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>  loadMetadataTable(Table table, MetadataTableType type)
protected org.apache.spark.sql.Dataset<FileInfo>  manifestDS(Table table)
protected org.apache.spark.sql.Dataset<FileInfo>  manifestDS(Table table, Set<Long> snapshotIds)
protected org.apache.spark.sql.Dataset<FileInfo>  manifestListDS(Table table)
protected org.apache.spark.sql.Dataset<FileInfo>  manifestListDS(Table table, Set<Long> snapshotIds)
protected JobGroupInfo  newJobGroupInfo(String groupId, String desc)
protected Table  newStaticTable(String metadataFileLocation, FileIO io)
protected Table  newStaticTable(TableMetadata metadata, FileIO io)
ComputePartitionStatsSparkAction  option(String name, String value)
protected Map<String,String>  options()
ComputePartitionStatsSparkAction  options(Map<String,String> newOptions)
protected org.apache.spark.sql.Dataset<FileInfo>  otherMetadataFileDS(Table table)
protected ComputePartitionStatsSparkAction  self()
ComputePartitionStatsSparkAction  snapshot(long newSnapshotId)  Choose the table snapshot to compute partition stats.
protected org.apache.spark.sql.SparkSession  spark()
protected org.apache.spark.api.java.JavaSparkContext  sparkContext()
protected org.apache.spark.sql.Dataset<FileInfo>  statisticsFileDS(Table table, Set<Long> snapshotIds)
protected <T> T  withJobGroupInfo(JobGroupInfo info, Supplier<T> supplier)
Field Details
MANIFEST
protected static final String MANIFEST
- See Also:
  - Constant Field Values

MANIFEST_LIST
protected static final String MANIFEST_LIST
- See Also:
  - Constant Field Values

STATISTICS_FILES
protected static final String STATISTICS_FILES
- See Also:
  - Constant Field Values

OTHERS
protected static final String OTHERS
- See Also:
  - Constant Field Values

FILE_PATH
protected static final String FILE_PATH
- See Also:
  - Constant Field Values

LAST_MODIFIED
protected static final String LAST_MODIFIED
- See Also:
  - Constant Field Values

COMMA_SPLITTER
protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER

COMMA_JOINER
protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
Method Details
self
protected ComputePartitionStatsSparkAction self()
snapshot
public ComputePartitionStatsSparkAction snapshot(long newSnapshotId)
Description copied from interface: ComputePartitionStats
Choose the table snapshot to compute partition stats.
- Specified by:
  snapshot in interface ComputePartitionStats
- Parameters:
  newSnapshotId - long ID of the snapshot for which stats need to be computed
- Returns:
  this for method chaining
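For example, pinning the computation to an older snapshot instead of the current one (a sketch; the snapshot ID is illustrative):

// Compute and register partition stats as of a specific snapshot.
// The ID below is illustrative only.
long snapshotId = 4358109269980913736L;

ComputePartitionStats.Result result =
    SparkActions.get()
        .computePartitionStats(table)
        .snapshot(snapshotId)  // returns this, so the call chains
        .execute();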
execute
public ComputePartitionStats.Result execute()
Description copied from interface: Action
Executes this action.
- Specified by:
  execute in interface Action<ComputePartitionStats, ComputePartitionStats.Result>
- Returns:
  the result of this action
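The Result carries the PartitionStatisticsFile that was written; a sketch of inspecting it, assuming the standard PartitionStatisticsFile accessors snapshotId(), path(), and fileSizeInBytes():

PartitionStatisticsFile statsFile = result.statisticsFile();
if (statsFile != null) { // may be absent, e.g. for a table with no snapshots
  System.out.printf(
      "partition stats for snapshot %d at %s (%d bytes)%n",
      statsFile.snapshotId(), statsFile.path(), statsFile.fileSizeInBytes());
}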
spark
protected org.apache.spark.sql.SparkSession spark()
sparkContext
protected org.apache.spark.api.java.JavaSparkContext sparkContext()
option
public ComputePartitionStatsSparkAction option(String name, String value)

options
public ComputePartitionStatsSparkAction options(Map<String,String> newOptions)

options
protected Map<String,String> options()
withJobGroupInfo
protected <T> T withJobGroupInfo(JobGroupInfo info, Supplier<T> supplier)

newJobGroupInfo
protected JobGroupInfo newJobGroupInfo(String groupId, String desc)

newStaticTable
protected Table newStaticTable(String metadataFileLocation, FileIO io)

newStaticTable
protected Table newStaticTable(TableMetadata metadata, FileIO io)
contentFileDS
protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table)

contentFileDS
protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table, Set<Long> snapshotIds)

manifestDS
protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table)

manifestDS
protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table, Set<Long> snapshotIds)

manifestListDS
protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table)

manifestListDS
protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table, Set<Long> snapshotIds)

statisticsFileDS
protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table, Set<Long> snapshotIds)

otherMetadataFileDS
protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table)

allReachableOtherMetadataFileDS
protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS(Table table)
loadMetadataTable
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files)
Deletes files and keeps track of how many files were removed for each file type.
- Parameters:
  executorService - an executor service to use for parallel deletes
  deleteFunc - a delete func
  files - an iterator of Spark rows of the structure (path: String, type: String)
- Returns:
  stats on which files were deleted
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files)
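Because deleteFiles is protected, it is callable only from an action implementation. A hypothetical sketch of the executor-based overload under that assumption; the pool size, the table() accessor, and the totalFilesCount() summary accessor are illustrative assumptions, not documented API:

import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper inside a BaseSparkAction subclass: delete files in
// parallel while the returned summary tallies removals per file type.
void deleteInParallel(Iterator<FileInfo> filesToDelete) {
  ExecutorService pool = Executors.newFixedThreadPool(8); // illustrative size
  try {
    BaseSparkAction.DeleteSummary summary =
        deleteFiles(
            pool,
            path -> table().io().deleteFile(path), // delete func (illustrative)
            filesToDelete);                        // rows of (path, type)
    System.out.printf("deleted %d files%n", summary.totalFilesCount());
  } finally {
    pool.shutdown();
  }
}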