java.lang.Object
- org.apache.iceberg.spark.ChangelogIterator

All Implemented Interfaces:

java.util.Iterator<org.apache.spark.sql.Row>

Direct Known Subclasses:

ComputeUpdateIterator, RemoveNetCarryoverIterator
```
public abstract class ChangelogIterator
extends java.lang.Object
implements java.util.Iterator<org.apache.spark.sql.Row>
```
An iterator that transforms rows from changelog tables within a single Spark task.

Field Summary

Fields
Modifier and Type	Field	Description
`protected static java.lang.String`	`DELETE`
`protected static java.lang.String`	`INSERT`
`protected static java.lang.String`	`UPDATE_AFTER`
`protected static java.lang.String`	`UPDATE_BEFORE`

Constructor Summary

Constructors
Modifier	Constructor	Description
`protected`	`ChangelogIterator(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType)`

Method Summary

Modifier and Type	Method	Description
`protected java.lang.String`	`changeType(org.apache.spark.sql.Row row)`
`protected int`	`changeTypeIndex()`
`static java.util.Iterator<org.apache.spark.sql.Row>`	`computeUpdates(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType, java.lang.String[] identifierFields)`	Creates an iterator that composes `RemoveCarryoverIterator` and `ComputeUpdateIterator` to remove carry-over rows and compute update rows.
`protected static int[]`	`generateIndicesToIdentifySameRow(int totalColumnCount, java.util.Set<java.lang.Integer> metadataColumnIndices)`
`protected boolean`	`isDifferentValue(org.apache.spark.sql.Row currentRow, org.apache.spark.sql.Row nextRow, int idx)`
`protected boolean`	`isSameRecord(org.apache.spark.sql.Row currentRow, org.apache.spark.sql.Row nextRow, int[] indicesToIdentifySameRow)`
`static java.util.Iterator<org.apache.spark.sql.Row>`	`removeCarryovers(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType)`	Creates an iterator that removes carry-over rows from a changelog table.
`static java.util.Iterator<org.apache.spark.sql.Row>`	`removeNetCarryovers(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType)`
`protected java.util.Iterator<org.apache.spark.sql.Row>`	`rowIterator()`
`protected org.apache.spark.sql.types.StructType`	`rowType()`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface java.util.Iterator
forEachRemaining, hasNext, next, remove

Field Detail

DELETE

protected static final java.lang.String DELETE

INSERT

protected static final java.lang.String INSERT

UPDATE_BEFORE

protected static final java.lang.String UPDATE_BEFORE

UPDATE_AFTER

protected static final java.lang.String UPDATE_AFTER

Constructor Detail

ChangelogIterator

protected ChangelogIterator(java.util.Iterator<org.apache.spark.sql.Row> rowIterator,
                            org.apache.spark.sql.types.StructType rowType)

Method Detail

changeTypeIndex
```
protected int changeTypeIndex()
```

rowType

protected org.apache.spark.sql.types.StructType rowType()

changeType

protected java.lang.String changeType(org.apache.spark.sql.Row row)

rowIterator

protected java.util.Iterator<org.apache.spark.sql.Row> rowIterator()

computeUpdates

public static java.util.Iterator<org.apache.spark.sql.Row> computeUpdates(java.util.Iterator<org.apache.spark.sql.Row> rowIterator,
                                                                          org.apache.spark.sql.types.StructType rowType,
                                                                          java.lang.String[] identifierFields)

Creates an iterator that composes RemoveCarryoverIterator and ComputeUpdateIterator to remove carry-over rows and compute update rows.

Parameters:

- rowIterator - the iterator of rows from a changelog table
- rowType - the schema of the rows
- identifierFields - the names of the identifier columns, which determine if rows are the same

Returns:

- a new iterator instance
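To illustrate what "compute update rows" can mean, here is a minimal, library-free sketch. It is not the Iceberg implementation. It assumes a hypothetical row layout (a `List<Object>` whose last element is the change type), assumes the four constants hold the strings shown, and assumes the input is already sorted so that a DELETE and the INSERT for the same identifier values sit next to each other. Under those assumptions, an adjacent DELETE/INSERT pair that agrees on the identifier columns is relabeled as UPDATE_BEFORE/UPDATE_AFTER.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Simplified model of update computation; class and method names are hypothetical.
class UpdateSketch {
    // Assumed string values for the change types.
    static final String DELETE = "DELETE";
    static final String INSERT = "INSERT";
    static final String UPDATE_BEFORE = "UPDATE_BEFORE";
    static final String UPDATE_AFTER = "UPDATE_AFTER";

    // Assumed layout: the change type is the last element of each row.
    static String changeType(List<Object> row) {
        return (String) row.get(row.size() - 1);
    }

    // True when both rows hold equal values at every identifier index.
    static boolean sameIdentifiers(List<Object> a, List<Object> b, int[] idIndices) {
        for (int idx : idIndices) {
            if (!Objects.equals(a.get(idx), b.get(idx))) {
                return false;
            }
        }
        return true;
    }

    // Returns a copy of the row with its change type replaced.
    static List<Object> withType(List<Object> row, String type) {
        List<Object> copy = new ArrayList<>(row);
        copy.set(copy.size() - 1, type);
        return copy;
    }

    static List<List<Object>> computeUpdates(List<List<Object>> rows, int[] idIndices) {
        List<List<Object>> out = new ArrayList<>();
        int i = 0;
        while (i < rows.size()) {
            List<Object> current = rows.get(i);
            if (i + 1 < rows.size()) {
                List<Object> next = rows.get(i + 1);
                boolean isUpdate = DELETE.equals(changeType(current))
                        && INSERT.equals(changeType(next))
                        && sameIdentifiers(current, next, idIndices);
                if (isUpdate) {
                    // Relabel the pair and skip past both rows.
                    out.add(withType(current, UPDATE_BEFORE));
                    out.add(withType(next, UPDATE_AFTER));
                    i += 2;
                    continue;
                }
            }
            out.add(current);
            i++;
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = List.of(
                List.of(1, "a", DELETE),
                List.of(1, "b", INSERT),
                List.of(2, "c", INSERT));
        // Column 0 plays the role of the identifier field.
        System.out.println(computeUpdates(rows, new int[] {0}));
        // prints [[1, a, UPDATE_BEFORE], [1, b, UPDATE_AFTER], [2, c, INSERT]]
    }
}
```

The sketch takes identifier positions as indices to stay self-contained, whereas the documented method takes column names in `identifierFields` together with a `StructType`.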

removeCarryovers

public static java.util.Iterator<org.apache.spark.sql.Row> removeCarryovers(java.util.Iterator<org.apache.spark.sql.Row> rowIterator,
                                                                            org.apache.spark.sql.types.StructType rowType)

Creates an iterator that removes carry-over rows from a changelog table.

Parameters:

- rowIterator - the iterator of rows from a changelog table
- rowType - the schema of the rows

Returns:

- a new iterator instance
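As background, a carry-over row is commonly understood as a DELETE/INSERT pair with identical data values, which appears when a data file is rewritten and an unchanged row is copied into the new file. The sketch below models removing such pairs with plain Java collections. It is not the Iceberg implementation: the row layout (a `List<Object>` with the change type as the last element), the class name, and the assumption that a carried-over DELETE is immediately followed by its matching INSERT are all simplifications made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, library-free model of carry-over removal; names are hypothetical.
class CarryoverSketch {
    // Assumed layout: data columns first, change type ("DELETE" or "INSERT") last.
    static String changeType(List<Object> row) {
        return (String) row.get(row.size() - 1);
    }

    // All columns except the trailing change type.
    static List<Object> data(List<Object> row) {
        return row.subList(0, row.size() - 1);
    }

    static List<List<Object>> removeCarryovers(List<List<Object>> rows) {
        List<List<Object>> result = new ArrayList<>();
        // A DELETE held back until we know whether an identical INSERT follows it.
        List<Object> pendingDelete = null;
        for (List<Object> row : rows) {
            if (pendingDelete != null) {
                boolean carryover = "INSERT".equals(changeType(row))
                        && data(pendingDelete).equals(data(row));
                if (carryover) {
                    // Identical DELETE/INSERT pair: drop both rows.
                    pendingDelete = null;
                    continue;
                }
                // The held DELETE was a real change, so emit it.
                result.add(pendingDelete);
                pendingDelete = null;
            }
            if ("DELETE".equals(changeType(row))) {
                pendingDelete = row;
            } else {
                result.add(row);
            }
        }
        if (pendingDelete != null) {
            result.add(pendingDelete);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = List.of(
                List.of(1, "a", "DELETE"),
                List.of(1, "a", "INSERT"),
                List.of(2, "b", "DELETE"),
                List.of(3, "c", "INSERT"));
        System.out.println(removeCarryovers(rows));
        // prints [[2, b, DELETE], [3, c, INSERT]]
    }
}
```

This model handles only one pending DELETE at a time, so it does not cover inputs with several identical deleted rows in a row.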

removeNetCarryovers

public static java.util.Iterator<org.apache.spark.sql.Row> removeNetCarryovers(java.util.Iterator<org.apache.spark.sql.Row> rowIterator,
                                                                               org.apache.spark.sql.types.StructType rowType)

isSameRecord

protected boolean isSameRecord(org.apache.spark.sql.Row currentRow,
                               org.apache.spark.sql.Row nextRow,
                               int[] indicesToIdentifySameRow)

isDifferentValue

protected boolean isDifferentValue(org.apache.spark.sql.Row currentRow,
                                   org.apache.spark.sql.Row nextRow,
                                   int idx)

generateIndicesToIdentifySameRow

protected static int[] generateIndicesToIdentifySameRow(int totalColumnCount,
                                                        java.util.Set<java.lang.Integer> metadataColumnIndices)

Class ChangelogIterator