java.lang.Object

org.apache.iceberg.SnapshotScan<ThisT,T,G>

Type Parameters:: ThisT - actual BaseScan implementation class type; T - type of ScanTask returned; G - type of ScanTaskGroup returned

All Implemented Interfaces:: Scan<ThisT,T,G>

Direct Known Subclasses:: AllDataFilesTable.AllDataFilesTableScan, AllDeleteFilesTable.AllDeleteFilesTableScan, AllFilesTable.AllFilesTableScan, AllManifestsTable.AllManifestsTableScan, DataFilesTable.DataFilesTableScan, DataTableScan, DeleteFilesTable.DeleteFilesTableScan, FilesTable.FilesTableScan, PositionDeletesTable.PositionDeletesBatchScan, SparkDistributedDataScan

public abstract class SnapshotScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>> extends Object

This is a common base class to share code between different BaseScan implementations that handle scans of a particular snapshot.

Field Summary

Fields

Modifier and Type

Field

Description

protected static final List<String>

DELETE_SCAN_COLUMNS

protected static final List<String>

DELETE_SCAN_WITH_STATS_COLUMNS

protected static final boolean

PLAN_SCANS_WITH_WORKER_POOL

protected static final List<String>

SCAN_COLUMNS

protected static final List<String>

SCAN_WITH_STATS_COLUMNS
Constructor Summary

Constructors

Modifier

Constructor

Description

protected

SnapshotScan(Table table, Schema schema, org.apache.iceberg.TableScanContext context)
Method Summary

Modifier and Type

Method

Description

ThisT

asOfTime(long timestampMillis)

ThisT

caseSensitive(boolean caseSensitive)

Create a new scan from this that, if data columns where selected via Scan.select(java.util.Collection), controls whether the match to the schema will be done with case sensitivity.

protected Set<Integer>

columnsToKeepStats()

protected org.apache.iceberg.TableScanContext

context()

protected abstract CloseableIterable<T>

doPlanFiles()

Expression

filter()

Returns this scan's filter Expression.

ThisT

filter(Expression expr)

Create a new scan from the results of this filtered by the Expression.

ThisT

ignoreResiduals()

Create a new scan from this that applies data filtering to files but not to rows in those files.

ThisT

includeColumnStats()

Create a new scan from this that loads the column stats with each data file.

ThisT

includeColumnStats(Collection<String> requestedColumns)

Create a new scan from this that loads the column stats for the specific columns with each data file.

protected FileIO

io()

boolean

isCaseSensitive()

Returns whether this scan is case-sensitive with respect to column names.

ThisT

metricsReporter(MetricsReporter reporter)

Create a new scan that will report scan metrics to the provided reporter in addition to reporters maintained by the scan.

protected abstract ThisT

newRefinedScan(Table newTable, Schema newSchema, org.apache.iceberg.TableScanContext newContext)

ThisT

option(String property, String value)

Create a new scan from this scan's configuration that will override the Table's behavior based on the incoming pair.

protected Map<String,String>

options()

protected ExecutorService

planExecutor()

CloseableIterable<T>

planFiles()

Plan tasks for this scan where each task reads a single file.

ThisT

planWith(ExecutorService executorService)

Create a new scan to use a particular executor to plan.

ThisT

project(Schema projectedSchema)

Create a new scan from this with the schema as its projection.

protected Expression

residualFilter()

protected List<String>

scanColumns()

protected ScanMetrics

scanMetrics()

Schema

schema()

Returns this scan's projection Schema.

ThisT

select(Collection<String> columns)

Create a new scan from this that will read the given data columns.

protected boolean

shouldIgnoreResiduals()

protected boolean

shouldPlanWithExecutor()

protected boolean

shouldReturnColumnStats()

Snapshot

snapshot()

protected Long

snapshotId()

int

splitLookback()

Returns the split lookback for this scan.

long

splitOpenFileCost()

Returns the split open file cost for this scan.

Table

table()

protected Schema

tableSchema()

long

targetSplitSize()

Returns the target split size for this scan.

String

toString()

ThisT

useRef(String name)

ThisT

useSnapshot(long scanSnapshotId)

protected boolean

useSnapshotSchema()

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface org.apache.iceberg.Scan
planTasks, select

Field Details
- SCAN_COLUMNS
  
  protected static final List<String> SCAN_COLUMNS
- SCAN_WITH_STATS_COLUMNS
  
  protected static final List<String> SCAN_WITH_STATS_COLUMNS
- DELETE_SCAN_COLUMNS
  
  protected static final List<String> DELETE_SCAN_COLUMNS
- DELETE_SCAN_WITH_STATS_COLUMNS
  
  protected static final List<String> DELETE_SCAN_WITH_STATS_COLUMNS
- PLAN_SCANS_WITH_WORKER_POOL
  
  protected static final boolean PLAN_SCANS_WITH_WORKER_POOL
Constructor Details
- SnapshotScan
  
  protected SnapshotScan(Table table, Schema schema, org.apache.iceberg.TableScanContext context)
Method Details
- snapshotId
  
  protected Long snapshotId()
- doPlanFiles
  
  protected abstract CloseableIterable<T> doPlanFiles()
- useSnapshotSchema
  
  protected boolean useSnapshotSchema()
- scanMetrics
  
  protected ScanMetrics scanMetrics()
- useSnapshot
  
  public ThisT useSnapshot(long scanSnapshotId)
- useRef
  
  public ThisT useRef(String name)
- asOfTime
  
  public ThisT asOfTime(long timestampMillis)
- planFiles
  
  public CloseableIterable<T> planFiles()
  
  Description copied from interface: Scan
  
  Plan tasks for this scan where each task reads a single file.
  Use Scan.planTasks() for planning balanced tasks where each task will read either a single file, a part of a file, or multiple files.
  
  Returns:
  
  an Iterable of tasks scanning entire files required by this scan
- snapshot
  
  public Snapshot snapshot()
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object
- table
  
  public Table table()
- io
  
  protected FileIO io()
- tableSchema
  
  protected Schema tableSchema()
- context
  
  protected org.apache.iceberg.TableScanContext context()
- options
  
  protected Map<String,String> options()
- scanColumns
  
  protected List<String> scanColumns()
- shouldReturnColumnStats
  
  protected boolean shouldReturnColumnStats()
- columnsToKeepStats
  
  protected Set<Integer> columnsToKeepStats()
- shouldIgnoreResiduals
  
  protected boolean shouldIgnoreResiduals()
- residualFilter
  
  protected Expression residualFilter()
- shouldPlanWithExecutor
  
  protected boolean shouldPlanWithExecutor()
- planExecutor
  
  protected ExecutorService planExecutor()
- newRefinedScan
  
  protected abstract ThisT newRefinedScan(Table newTable, Schema newSchema, org.apache.iceberg.TableScanContext newContext)
- option
  
  public ThisT option(String property, String value)
  
  Description copied from interface: Scan
  
  Create a new scan from this scan's configuration that will override the Table's behavior based on the incoming pair. Unknown properties will be ignored.
  
  Specified by:
  
  option in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
  
  Parameters:
  
  property - name of the table property to be overridden
  
  value - value to override with
  
  Returns:
  
  a new scan based on this with overridden behavior
- project
  
  public ThisT project(Schema projectedSchema)
  
  Description copied from interface: Scan
  
  Create a new scan from this with the schema as its projection.
  
  Specified by:
  
  project in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
  
  Parameters:
  
  projectedSchema - a projection schema
  
  Returns:
  
  a new scan based on this with the given projection
- caseSensitive
  
  public ThisT caseSensitive(boolean caseSensitive)
  
  Description copied from interface: Scan
  
  Create a new scan from this that, if data columns where selected via Scan.select(java.util.Collection), controls whether the match to the schema will be done with case sensitivity. Default is true.
  
  Specified by:
  
  caseSensitive in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
  
  Returns:
  
  a new scan based on this with case sensitivity as stated
- isCaseSensitive
  
  public boolean isCaseSensitive()
  
  Description copied from interface: Scan
  
  Returns whether this scan is case-sensitive with respect to column names.
  
  Specified by:
  
  isCaseSensitive in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
  
  Returns:
  
  true if case-sensitive, false otherwise.
- includeColumnStats
  
  public ThisT includeColumnStats()
  
  Description copied from interface: Scan
  
  Create a new scan from this that loads the column stats with each data file.
  Column stats include: value count, null value count, lower bounds, and upper bounds.
  
  Specified by:
  
  includeColumnStats in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
  
  Returns:
  
  a new scan based on this that loads column stats.
- includeColumnStats
  
  public ThisT includeColumnStats(Collection<String> requestedColumns)
  
  Description copied from interface: Scan
  
  Create a new scan from this that loads the column stats for the specific columns with each data file.
  Column stats include: value count, null value count, lower bounds, and upper bounds.
  
  Specified by:
  
  includeColumnStats in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
  
  Parameters:
  
  requestedColumns - column names for which to keep the stats.
  
  Returns:
  
  a new scan based on this that loads column stats for specific columns.
- select
  
  public ThisT select(Collection<String> columns)
  
  Description copied from interface: Scan
  
  Create a new scan from this that will read the given data columns. This produces an expected schema that includes all fields that are either selected or used by this scan's filter expression.
  
  Specified by:
  
  select in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
  
  Parameters:
  
  columns - column names from the table's schema
  
  Returns:
  
  a new scan based on this with the given projection columns
- filter
  
  public ThisT filter(Expression expr)
  
  Description copied from interface: Scan
  
  Create a new scan from the results of this filtered by the Expression.
  
  Specified by:
  
  filter in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
  
  Parameters:
  
  expr - a filter expression
  
  Returns:
  
  a new scan based on this with results filtered by the expression
- filter
  
  public Expression filter()
  
  Description copied from interface: Scan
  
  Returns this scan's filter Expression.
  
  Specified by:
  
  filter in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
  
  Returns:
  
  this scan's filter expression
- ignoreResiduals
  
  public ThisT ignoreResiduals()
  
  Description copied from interface: Scan
  
  Create a new scan from this that applies data filtering to files but not to rows in those files.
  
  Specified by:
  
  ignoreResiduals in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
  
  Returns:
  
  a new scan based on this that does not filter rows in files.
- planWith
  
  public ThisT planWith(ExecutorService executorService)
  
  Description copied from interface: Scan
  
  Create a new scan to use a particular executor to plan. The default worker pool will be used by default.
  
  Specified by:
  
  planWith in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
  
  Parameters:
  
  executorService - the provided executor
  
  Returns:
  
  a table scan that uses the provided executor to access manifests
- schema
  
  public Schema schema()
  
  Description copied from interface: Scan
  
  Returns this scan's projection Schema.
  If the projection schema was set directly using Scan.project(Schema), returns that schema.
  If the projection schema was set by calling Scan.select(Collection), returns a projection schema that includes the selected data fields and any fields used in the filter expression.
  
  Specified by:
  
  schema in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
  
  Returns:
  
  this scan's projection schema
- targetSplitSize
  
  public long targetSplitSize()
  
  Description copied from interface: Scan
  
  Returns the target split size for this scan.
  
  Specified by:
  
  targetSplitSize in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
- splitLookback
  
  public int splitLookback()
  
  Description copied from interface: Scan
  
  Returns the split lookback for this scan.
  
  Specified by:
  
  splitLookback in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
- splitOpenFileCost
  
  public long splitOpenFileCost()
  
  Description copied from interface: Scan
  
  Returns the split open file cost for this scan.
  
  Specified by:
  
  splitOpenFileCost in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
- metricsReporter
  
  public ThisT metricsReporter(MetricsReporter reporter)
  
  Description copied from interface: Scan
  
  Create a new scan that will report scan metrics to the provided reporter in addition to reporters maintained by the scan.
  
  Specified by:
  
  metricsReporter in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>

Class SnapshotScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>

Field Summary

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Methods inherited from interface org.apache.iceberg.Scan

Field Details

SCAN_COLUMNS

SCAN_WITH_STATS_COLUMNS

DELETE_SCAN_COLUMNS

DELETE_SCAN_WITH_STATS_COLUMNS

PLAN_SCANS_WITH_WORKER_POOL

Constructor Details

SnapshotScan

Method Details

snapshotId

doPlanFiles

useSnapshotSchema

scanMetrics

useSnapshot

useRef

asOfTime

planFiles

snapshot

toString

table

io

tableSchema

context

options

scanColumns

shouldReturnColumnStats

columnsToKeepStats

shouldIgnoreResiduals

residualFilter

shouldPlanWithExecutor

planExecutor

newRefinedScan

option

project

caseSensitive

isCaseSensitive

includeColumnStats

includeColumnStats

select

filter

filter

ignoreResiduals

planWith

schema

targetSplitSize

splitLookback

splitOpenFileCost

metricsReporter