java.lang.Object

org.apache.iceberg.spark.data.vectorized.ColumnarBatchUtil

public class ColumnarBatchUtil extends Object

Method Summary

Modifier and Type

Method

Description

static boolean[]

buildIsDeleted(org.apache.spark.sql.vectorized.ColumnVector[] columnVectors, DeleteFilter<org.apache.spark.sql.catalyst.InternalRow> deletes, long rowStartPosInBatch, int batchSize)

Builds a boolean array to indicate if a row is deleted or not.

static Pair<int[],Integer>

buildRowIdMapping(org.apache.spark.sql.vectorized.ColumnVector[] columnVectors, DeleteFilter<org.apache.spark.sql.catalyst.InternalRow> deletes, long rowStartPosInBatch, int batchSize)

Builds a row ID mapping inside a batch to skip deleted rows.

static org.apache.spark.sql.vectorized.ColumnVector[]

removeExtraColumns(DeleteFilter<org.apache.spark.sql.catalyst.InternalRow> deletes, org.apache.spark.sql.vectorized.ColumnVector[] columnVectors)

Removes extra column vectors added for processing equality delete filters that are not part of the final query output.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- buildRowIdMapping
  
  public static Pair<int[],Integer> buildRowIdMapping(org.apache.spark.sql.vectorized.ColumnVector[] columnVectors, DeleteFilter<org.apache.spark.sql.catalyst.InternalRow> deletes, long rowStartPosInBatch, int batchSize)
  Builds a row ID mapping inside a batch to skip deleted rows.
  Initial state Data values: [v0, v1, v2, v3, v4, v5, v6, v7] Row ID mapping: [0, 1, 2, 3, 4, 5, 6, 7] Apply position deletes Position deletes: 2, 6 Row ID mapping: [0, 1, 3, 4, 5, 7, -, -] (6 live records) Apply equality deletes Equality deletes: v1, v2, v3 Row ID mapping: [0, 4, 5, 7, -, -, -, -] (4 live records)
  Parameters:
  
  columnVectors - the array of column vectors for the batch
  
  deletes - the delete filter containing delete information
  
  rowStartPosInBatch - the starting position of the row in the batch
  
  batchSize - the size of the batch
  
  Returns:
  
  the mapping array and the number of live rows, or null if nothing is deleted
- buildIsDeleted
  
  public static boolean[] buildIsDeleted(org.apache.spark.sql.vectorized.ColumnVector[] columnVectors, DeleteFilter<org.apache.spark.sql.catalyst.InternalRow> deletes, long rowStartPosInBatch, int batchSize)
  Builds a boolean array to indicate if a row is deleted or not.
  Initial state Data values: [v0, v1, v2, v3, v4, v5, v6, v7] Is deleted array: [F, F, F, F, F, F, F, F] Apply position deletes Position deletes: 2, 6 Is deleted array: [F, F, T, F, F, F, T, F] (6 live records) Apply equality deletes Equality deletes: v1, v2, v3 Is deleted array: [F, T, T, T, F, F, T, F] (4 live records)
  Parameters:
  
  columnVectors - the array of column vectors for the batch.
  
  deletes - the delete filter containing information about which rows should be deleted.
  
  rowStartPosInBatch - the starting position of the row in the batch, used to calculate the absolute position of the rows in the context of the entire dataset.
  
  batchSize - the number of rows in the current batch.
  
  Returns:
  
  an array of boolean values to indicate if a row is deleted or not
- removeExtraColumns
  
  public static org.apache.spark.sql.vectorized.ColumnVector[] removeExtraColumns(DeleteFilter<org.apache.spark.sql.catalyst.InternalRow> deletes, org.apache.spark.sql.vectorized.ColumnVector[] columnVectors)
  
  Removes extra column vectors added for processing equality delete filters that are not part of the final query output.
  During query execution, additional columns may be included in the schema to evaluate equality delete filters. For example, if the table schema contains columns C1, C2, C3, C4, and C5, and the query is 'SELECT C5 FROM table'. While equality delete filters are applied on C3 and C4, the processing schema includes C5, C3, and C4. These extra columns (C3 and C4) are needed to identify rows to delete but are not included in the final result.
  This method removes the extra column vectors from the end of column vectors array, ensuring only the expected column vectors remain.
  
  Parameters:
  
  deletes - the delete filter containing delete information.
  
  columnVectors - the array of column vectors representing query result data
  
  Returns:
  
  a new column vectors array with extra column vectors removed, or the original column vectors array if no extra column vectors are found

Class ColumnarBatchUtil

Method Summary

Methods inherited from class java.lang.Object

Method Details

buildRowIdMapping

buildIsDeleted

removeExtraColumns