Class VectorizedArrowReader
java.lang.Object
org.apache.iceberg.arrow.vectorized.VectorizedArrowReader
- All Implemented Interfaces:
VectorizedReader<VectorHolder>
- Direct Known Subclasses:
VectorizedArrowReader.ConstantVectorReader, VectorizedArrowReader.DeletedVectorReader
VectorReader(s) that read a batch of values into Arrow vectors. The reader also allocates the appropriate kind of Arrow vector for the corresponding Iceberg/Parquet data type.
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
static class VectorizedArrowReader.ConstantVectorReader - A dummy vector reader that does not actually read files; instead, it returns a dummy VectorHolder indicating the constant value to use for this column.
static class VectorizedArrowReader.DeletedVectorReader - A dummy vector reader that does not actually read files.
-
Field Summary
Fields
static final int DEFAULT_BATCH_SIZE
-
Constructor Summary
Constructors
VectorizedArrowReader(org.apache.parquet.column.ColumnDescriptor desc, Types.NestedField icebergField, org.apache.arrow.memory.BufferAllocator ra, boolean setArrowValidityVector)
-
Method Summary
Modifier and Type, Method, Description
void close() - Release any resources allocated.
protected Types.NestedField icebergField()
static VectorizedArrowReader lastUpdated(Long baseRowId, Long fileLastUpdated, VectorizedArrowReader seqReader)
static VectorizedArrowReader nulls()
static VectorizedArrowReader positions()
static VectorizedArrowReader positionsWithSetArrowValidityVector()
VectorHolder read(VectorHolder reuse, int numValsToRead) - Reads a batch of type T (VectorHolder for this reader) containing numValsToRead rows.
static VectorizedReader<?> replaceWithMetadataReader(Types.NestedField icebergField, VectorizedReader<?> reader, Map<Integer, ?> idToConstant, boolean setArrowValidityVector)
static VectorizedArrowReader rowIds(Long baseRowId, VectorizedArrowReader idReader)
void setBatchSize(int batchSize)
void setRowGroupInfo(org.apache.parquet.column.page.PageReadStore source, Map<org.apache.parquet.hadoop.metadata.ColumnPath, org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metadata) - Sets the row group information to be used with this reader.
String toString()
-
Field Details
-
DEFAULT_BATCH_SIZE
public static final int DEFAULT_BATCH_SIZE
-
Constructor Details
-
VectorizedArrowReader
public VectorizedArrowReader(org.apache.parquet.column.ColumnDescriptor desc, Types.NestedField icebergField, org.apache.arrow.memory.BufferAllocator ra, boolean setArrowValidityVector)
-
-
Method Details
-
icebergField
-
setBatchSize
public void setBatchSize(int batchSize)
- Specified by:
setBatchSize in interface VectorizedReader<VectorHolder>
-
read
Description copied from interface: VectorizedReader
Reads a batch of type T (VectorHolder for this reader) containing numValsToRead rows.
- Specified by:
read in interface VectorizedReader<VectorHolder>
- Parameters:
reuse - container for the last batch, to be reused for the next batch
numValsToRead - number of rows to read
- Returns:
batch of records of type T
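The reuse parameter can be illustrated with a minimal, self-contained sketch. It does not use Iceberg or Arrow: Holder and ToyReader are hypothetical stand-ins for VectorHolder and the reader, and the in-memory int[] column is an assumption made only so the example runs on its own. It shows the general idea of passing the previous batch container back into read so that it can be refilled instead of reallocated.

```java
final class ReuseSketch {
    /** Hypothetical batch container; a stand-in for VectorHolder, not the real class. */
    static final class Holder {
        final int[] values;
        int count; // number of valid entries in values
        Holder(int capacity) { values = new int[capacity]; }
    }

    /** Hypothetical reader over an in-memory column; not Iceberg's VectorizedArrowReader. */
    static final class ToyReader {
        private final int[] column;
        private final int batchSize;
        private int position = 0;
        int allocations = 0; // counts how often a new Holder had to be created

        ToyReader(int[] column, int batchSize) {
            this.column = column;
            this.batchSize = batchSize;
        }

        /** Fills the reused holder when it is large enough, otherwise allocates a new one. */
        Holder read(Holder reuse, int numValsToRead) {
            Holder holder = reuse;
            if (holder == null || holder.values.length < numValsToRead) {
                holder = new Holder(Math.max(batchSize, numValsToRead));
                allocations++;
            }
            int n = Math.min(numValsToRead, column.length - position);
            System.arraycopy(column, position, holder.values, 0, n);
            holder.count = n;
            position += n;
            return holder;
        }
    }

    /** Reads the whole column in batches; returns {sum of values, holder allocations}. */
    static long[] readAll(int[] column, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        ToyReader reader = new ToyReader(column, batchSize);
        Holder holder = null; // nothing to reuse before the first batch
        long total = 0;
        int remaining = column.length;
        while (remaining > 0) {
            int toRead = Math.min(batchSize, remaining);
            holder = reader.read(holder, toRead); // hand the last batch back for reuse
            for (int i = 0; i < holder.count; i++) {
                total += holder.values[i];
            }
            remaining -= holder.count;
        }
        return new long[] {total, reader.allocations};
    }

    public static void main(String[] args) {
        long[] result = readAll(new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 4);
        System.out.println("sum=" + result[0] + ", allocations=" + result[1]);
    }
}
```

In this sketch the caller must finish with a batch before requesting the next one, because the same container is overwritten on each call.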
-
setRowGroupInfo
public void setRowGroupInfo(org.apache.parquet.column.page.PageReadStore source, Map<org.apache.parquet.hadoop.metadata.ColumnPath, org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metadata)
Description copied from interface: VectorizedReader
Sets the row group information to be used with this reader.
- Specified by:
setRowGroupInfo in interface VectorizedReader<VectorHolder>
- Parameters:
source - row group information for all the columns
metadata - map of ColumnPath -> ColumnChunkMetaData for the row group
-
close
public void close()
Description copied from interface: VectorizedReader
Release any resources allocated.
- Specified by:
close in interface VectorizedReader<VectorHolder>
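The four interface methods (setBatchSize, setRowGroupInfo, read, close) suggest a calling order, sketched below with toy types. This is an assumption about typical usage, not a statement of how Iceberg drives the reader: ToyReader is a hypothetical, simplified mirror of the interface, and an int[] stands in for both the row group (PageReadStore plus column chunk metadata) and the batch container.

```java
import java.util.ArrayList;
import java.util.List;

final class LifecycleSketch {
    /** Hypothetical, simplified mirror of the four lifecycle methods; not the Iceberg interface. */
    interface ToyReader<T> {
        void setBatchSize(int batchSize);
        void setRowGroupInfo(int[] rowGroup); // stand-in for PageReadStore + metadata map
        T read(T reuse, int numValsToRead);
        void close();
    }

    /** Toy implementation that records every call so the order can be inspected. */
    static final class LoggingReader implements ToyReader<int[]> {
        final List<String> log = new ArrayList<>();
        private int[] rowGroup = new int[0];
        private int position;
        private int batchSize;

        @Override
        public void setBatchSize(int batchSize) {
            this.batchSize = batchSize;
            log.add("setBatchSize(" + batchSize + ")");
        }

        @Override
        public void setRowGroupInfo(int[] rowGroup) {
            this.rowGroup = rowGroup;
            this.position = 0; // a new row group restarts the read position
            log.add("setRowGroupInfo(" + rowGroup.length + " rows)");
        }

        @Override
        public int[] read(int[] reuse, int numValsToRead) {
            int[] out = (reuse != null && reuse.length >= numValsToRead)
                ? reuse
                : new int[Math.max(batchSize, numValsToRead)];
            System.arraycopy(rowGroup, position, out, 0, numValsToRead);
            position += numValsToRead;
            log.add("read(" + numValsToRead + ")");
            return out;
        }

        @Override
        public void close() {
            log.add("close()");
        }
    }

    /** Drives the reader over all row groups and returns the recorded call order. */
    static List<String> scan(int[][] rowGroups, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        LoggingReader reader = new LoggingReader();
        reader.setBatchSize(batchSize); // once, before any reads
        int[] batch = null;
        try {
            for (int[] rowGroup : rowGroups) {
                reader.setRowGroupInfo(rowGroup); // once per row group
                int remaining = rowGroup.length;
                while (remaining > 0) {
                    int n = Math.min(batchSize, remaining);
                    batch = reader.read(batch, n); // reuse the previous container
                    remaining -= n;
                }
            }
        } finally {
            reader.close(); // release resources even if a read fails
        }
        return reader.log;
    }

    public static void main(String[] args) {
        scan(new int[][] {{1, 2, 3}, {4, 5}}, 2).forEach(System.out::println);
    }
}
```

Calling close in a finally block is a design choice of the sketch; it keeps resource release independent of whether every batch was read successfully.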
-
toString
-
nulls
-
positions
-
positionsWithSetArrowValidityVector
-
rowIds
-
lastUpdated
public static VectorizedArrowReader lastUpdated(Long baseRowId, Long fileLastUpdated, VectorizedArrowReader seqReader)
-
replaceWithMetadataReader
public static VectorizedReader<?> replaceWithMetadataReader(Types.NestedField icebergField, VectorizedReader<?> reader, Map<Integer, ?> idToConstant, boolean setArrowValidityVector)
-