Package org.apache.iceberg.spark
Class ChangelogIterator
- java.lang.Object
-
- org.apache.iceberg.spark.ChangelogIterator
-
- All Implemented Interfaces:
java.util.Iterator<org.apache.spark.sql.Row>
- Direct Known Subclasses:
ComputeUpdateIterator, RemoveNetCarryoverIterator
public abstract class ChangelogIterator extends java.lang.Object implements java.util.Iterator<org.apache.spark.sql.Row>
An iterator that transforms rows from changelog tables within a single Spark task.
-
-
Field Summary
Fields
Modifier and Type | Field | Description
protected static java.lang.String
DELETE
protected static java.lang.String
INSERT
protected static java.lang.String
UPDATE_AFTER
protected static java.lang.String
UPDATE_BEFORE
-
Constructor Summary
Constructors
Modifier | Constructor | Description
protected
ChangelogIterator(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType)
-
Method Summary
Modifier and Type | Method | Description
protected java.lang.String
changeType(org.apache.spark.sql.Row row)
protected int
changeTypeIndex()
static java.util.Iterator<org.apache.spark.sql.Row>
computeUpdates(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType, java.lang.String[] identifierFields)
Creates an iterator composing RemoveCarryoverIterator and ComputeUpdateIterator to remove carry-over rows and compute update rows.
protected static int[]
generateIndicesToIdentifySameRow(int totalColumnCount, java.util.Set<java.lang.Integer> metadataColumnIndices)
protected boolean
isDifferentValue(org.apache.spark.sql.Row currentRow, org.apache.spark.sql.Row nextRow, int idx)
protected boolean
isSameRecord(org.apache.spark.sql.Row currentRow, org.apache.spark.sql.Row nextRow, int[] indicesToIdentifySameRow)
static java.util.Iterator<org.apache.spark.sql.Row>
removeCarryovers(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType)
Creates an iterator that removes carry-over rows from a changelog table.
static java.util.Iterator<org.apache.spark.sql.Row>
removeNetCarryovers(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType)
protected java.util.Iterator<org.apache.spark.sql.Row>
rowIterator()
protected org.apache.spark.sql.types.StructType
rowType()
-
-
-
Method Detail
-
changeTypeIndex
protected int changeTypeIndex()
-
rowType
protected org.apache.spark.sql.types.StructType rowType()
-
changeType
protected java.lang.String changeType(org.apache.spark.sql.Row row)
-
rowIterator
protected java.util.Iterator<org.apache.spark.sql.Row> rowIterator()
-
computeUpdates
public static java.util.Iterator<org.apache.spark.sql.Row> computeUpdates(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType, java.lang.String[] identifierFields)
Creates an iterator composing RemoveCarryoverIterator and ComputeUpdateIterator to remove carry-over rows and compute update rows.
- Parameters:
rowIterator - the iterator of rows from a changelog table
rowType - the schema of the rows
identifierFields - the names of the identifier columns, which determine if rows are the same
- Returns:
a new iterator instance
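The update-computation step can be illustrated with a minimal, dependency-free sketch. It is not the Iceberg implementation. It assumes that the input is already sorted so that rows sharing identifier values are adjacent with the DELETE first, that carry-over rows have already been removed, and that a DELETE followed by an INSERT with equal identifier values is relabeled as an UPDATE_BEFORE / UPDATE_AFTER pair. The `ChangeRow` class, the `ComputeUpdatesSketch` class, the use of column indices in place of identifier field names, and the string values of the change-type constants are all hypothetical stand-ins for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Illustrative sketch only; plain Java objects stand in for Spark rows.
class ComputeUpdatesSketch {
  // Assumed string values; the page lists the constant names but not their values.
  static final String DELETE = "DELETE";
  static final String INSERT = "INSERT";
  static final String UPDATE_BEFORE = "UPDATE_BEFORE";
  static final String UPDATE_AFTER = "UPDATE_AFTER";

  /** Hypothetical stand-in for a changelog row: a change type plus data values. */
  static final class ChangeRow {
    final String changeType;
    final Object[] data;

    ChangeRow(String changeType, Object... data) {
      this.changeType = changeType;
      this.data = data;
    }
  }

  /** Returns true when both rows hold equal values at every identifier index. */
  static boolean sameIdentifier(ChangeRow a, ChangeRow b, int[] identifierIndices) {
    for (int idx : identifierIndices) {
      if (!Objects.equals(a.data[idx], b.data[idx])) {
        return false;
      }
    }
    return true;
  }

  /** Relabels adjacent DELETE/INSERT pairs with equal identifiers as update rows. */
  static List<ChangeRow> computeUpdates(List<ChangeRow> sortedRows, int[] identifierIndices) {
    List<ChangeRow> result = new ArrayList<>();
    int i = 0;
    while (i < sortedRows.size()) {
      ChangeRow current = sortedRows.get(i);
      ChangeRow next = i + 1 < sortedRows.size() ? sortedRows.get(i + 1) : null;
      if (next != null
          && DELETE.equals(current.changeType)
          && INSERT.equals(next.changeType)
          && sameIdentifier(current, next, identifierIndices)) {
        result.add(new ChangeRow(UPDATE_BEFORE, current.data));
        result.add(new ChangeRow(UPDATE_AFTER, next.data));
        i += 2; // both rows of the pair are consumed
      } else {
        result.add(current); // unpaired rows pass through unchanged
        i++;
      }
    }
    return result;
  }
}
```

In this sketch, a DELETE of `(1, "a")` followed by an INSERT of `(1, "b")` with column 0 as the identifier becomes an update pair, while an unpaired INSERT is emitted as-is.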
-
removeCarryovers
public static java.util.Iterator<org.apache.spark.sql.Row> removeCarryovers(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType)
Creates an iterator that removes carry-over rows from a changelog table.
- Parameters:
rowIterator - the iterator of rows from a changelog table
rowType - the schema of the rows
- Returns:
a new iterator instance
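The idea of carry-over removal can be sketched without Spark or Iceberg on the classpath. The sketch below is a simplification, not the library's algorithm. It assumes that a carry-over is a DELETE row immediately followed by an INSERT row with identical data values (as can happen when a data file is rewritten and unchanged rows are copied), and that the input is sorted so such pairs are adjacent. The `ChangeRow` and `CarryoverSketch` names and the string values of the change types are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; plain Java objects stand in for Spark rows.
class CarryoverSketch {
  // Assumed string values; the page lists the constant names but not their values.
  static final String DELETE = "DELETE";
  static final String INSERT = "INSERT";

  /** Hypothetical stand-in for a changelog row: a change type plus data values. */
  static final class ChangeRow {
    final String changeType;
    final Object[] data;

    ChangeRow(String changeType, Object... data) {
      this.changeType = changeType;
      this.data = data;
    }
  }

  /** Drops adjacent DELETE/INSERT pairs whose data values are all equal. */
  static List<ChangeRow> removeCarryovers(List<ChangeRow> sortedRows) {
    List<ChangeRow> result = new ArrayList<>();
    int i = 0;
    while (i < sortedRows.size()) {
      ChangeRow current = sortedRows.get(i);
      ChangeRow next = i + 1 < sortedRows.size() ? sortedRows.get(i + 1) : null;
      boolean isCarryover =
          next != null
              && DELETE.equals(current.changeType)
              && INSERT.equals(next.changeType)
              && Arrays.equals(current.data, next.data);
      if (isCarryover) {
        i += 2; // skip both rows: the pair represents no net change
      } else {
        result.add(current);
        i++;
      }
    }
    return result;
  }
}
```

This pairwise version handles only one DELETE/INSERT pair at a time; runs of several identical deleted rows would need extra bookkeeping that is left out here.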
-
removeNetCarryovers
public static java.util.Iterator<org.apache.spark.sql.Row> removeNetCarryovers(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType)
-
isSameRecord
protected boolean isSameRecord(org.apache.spark.sql.Row currentRow, org.apache.spark.sql.Row nextRow, int[] indicesToIdentifySameRow)
-
isDifferentValue
protected boolean isDifferentValue(org.apache.spark.sql.Row currentRow, org.apache.spark.sql.Row nextRow, int idx)
-
generateIndicesToIdentifySameRow
protected static int[] generateIndicesToIdentifySameRow(int totalColumnCount, java.util.Set<java.lang.Integer> metadataColumnIndices)