Class VectorizedArrowReader
java.lang.Object
org.apache.iceberg.arrow.vectorized.VectorizedArrowReader
- All Implemented Interfaces:
VectorizedReader<VectorHolder>
- Direct Known Subclasses:
VectorizedArrowReader.ConstantVectorReader, VectorizedArrowReader.DeletedVectorReader
A VectorizedReader that reads a batch of values into Arrow vectors. It
also allocates the right kind of Arrow vector for the corresponding
Iceberg/Parquet data type.
-
Nested Class Summary
Nested Classes
- static class VectorizedArrowReader.ConstantVectorReader
A dummy vector reader that does not actually read files. Instead, it returns a dummy VectorHolder indicating the constant value to use for this column.
- static class VectorizedArrowReader.DeletedVectorReader
A dummy vector reader that does not actually read files.
-
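The constant-reader idea described above can be illustrated without any Iceberg or Arrow dependency. The following is a minimal sketch, not the library's implementation: ConstantHolderSketch and ConstantReaderSketch are hypothetical names invented for this example, and the holder stores only a value and a row count rather than wrapping an Arrow vector.

```java
// Hypothetical stand-in for a holder that represents a constant column.
// It records one value and a row count instead of a materialized vector.
final class ConstantHolderSketch<T> {
  private final T constant;
  private final int numRows;

  ConstantHolderSketch(T constant, int numRows) {
    this.constant = constant;
    this.numRows = numRows;
  }

  int numRows() {
    return numRows;
  }

  // Every valid row position resolves to the same constant.
  T valueAt(int row) {
    if (row < 0 || row >= numRows) {
      throw new IndexOutOfBoundsException("row " + row + " is outside [0, " + numRows + ")");
    }
    return constant;
  }
}

// Hypothetical reader that never touches a file: each "read" only produces a
// holder describing the constant for the requested number of rows.
final class ConstantReaderSketch<T> {
  private final T constant;

  ConstantReaderSketch(T constant) {
    this.constant = constant;
  }

  // The reuse argument is ignored in this sketch because the holder is tiny.
  ConstantHolderSketch<T> read(ConstantHolderSketch<T> reuse, int numValsToRead) {
    return new ConstantHolderSketch<>(constant, numValsToRead);
  }

  public static void main(String[] args) {
    ConstantHolderSketch<String> holder = new ConstantReaderSketch<>("us-east").read(null, 3);
    System.out.println(holder.numRows() + " rows of " + holder.valueAt(0));
  }
}
```

The point of the design is that a column whose value is identical for every row (for example, one derived from file-level metadata) costs nothing per row to "read".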
Field Summary
Fields
- static final int DEFAULT_BATCH_SIZE
-
Constructor Summary
Constructors
- VectorizedArrowReader(org.apache.parquet.column.ColumnDescriptor desc, Types.NestedField icebergField, org.apache.arrow.memory.BufferAllocator ra, boolean setArrowValidityVector)
-
Method Summary
Methods (modifier and type, method, description)
- void close()
Release any resources allocated.
- protected Types.NestedField icebergField()
- static VectorizedArrowReader lastUpdated(Long baseRowId, Long fileLastUpdated, VectorizedArrowReader seqReader)
- static VectorizedArrowReader nulls()
- static VectorizedArrowReader positions()
- static VectorizedArrowReader positionsWithSetArrowValidityVector()
- VectorHolder read(VectorHolder reuse, int numValsToRead)
Reads a batch of type T and of size numRows.
- static VectorizedReader<?> replaceWithMetadataReader(Types.NestedField icebergField, VectorizedReader<?> reader, Map<Integer, ?> idToConstant, boolean setArrowValidityVector)
- static VectorizedArrowReader rowIds(Long baseRowId, VectorizedArrowReader idReader)
- void setBatchSize(int batchSize)
- void setRowGroupInfo(org.apache.parquet.column.page.PageReadStore source, Map<org.apache.parquet.hadoop.metadata.ColumnPath, org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metadata)
Sets the row group information to be used with this reader.
- String toString()
-
Field Details
-
DEFAULT_BATCH_SIZE
public static final int DEFAULT_BATCH_SIZE
-
-
Constructor Details
-
VectorizedArrowReader
public VectorizedArrowReader(org.apache.parquet.column.ColumnDescriptor desc, Types.NestedField icebergField, org.apache.arrow.memory.BufferAllocator ra, boolean setArrowValidityVector)
-
-
Method Details
-
icebergField
protected Types.NestedField icebergField()
-
setBatchSize
public void setBatchSize(int batchSize)
- Specified by:
setBatchSize in interface VectorizedReader<VectorHolder>
-
read
public VectorHolder read(VectorHolder reuse, int numValsToRead)
Description copied from interface: VectorizedReader
Reads a batch of type T and of size numRows. For this class, T is VectorHolder and the size is given by numValsToRead.
- Specified by:
read in interface VectorizedReader<VectorHolder>
- Parameters:
reuse - container for the last batch, to be reused for the next batch
numValsToRead - number of rows to read
- Returns:
batch of records of type T
-
setRowGroupInfo
public void setRowGroupInfo(org.apache.parquet.column.page.PageReadStore source, Map<org.apache.parquet.hadoop.metadata.ColumnPath, org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metadata)
Description copied from interface: VectorizedReader
Sets the row group information to be used with this reader.
- Specified by:
setRowGroupInfo in interface VectorizedReader<VectorHolder>
- Parameters:
source - row group information for all the columns
metadata - map of ColumnPath -> ColumnChunkMetaData for the row group
-
close
public void close()
Description copied from interface: VectorizedReader
Release any resources allocated.
- Specified by:
close in interface VectorizedReader<VectorHolder>
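The four interface methods documented above suggest a calling order: configure the batch size, hand over the row group, read batches while reusing the previous container, then close. The sketch below illustrates that order under loud assumptions: BatchReaderSketch and ArrayBatchReader are hypothetical stand-ins written for this example, the row group is a plain int[] rather than a Parquet PageReadStore, and a batch is an int[] rather than a VectorHolder. It is not the Iceberg API and does not show how Iceberg itself drives the reader.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring the four method names documented above.
// NOT the Iceberg VectorizedReader interface: types are simplified to int[].
interface BatchReaderSketch extends AutoCloseable {
  void setBatchSize(int batchSize);

  void setRowGroupInfo(int[] rowGroup);

  int[] read(int[] reuse, int numValsToRead);

  @Override
  void close();
}

// Toy implementation that serves batches from an in-memory array.
final class ArrayBatchReader implements BatchReaderSketch {
  private int batchSize = 4; // arbitrary default chosen for this sketch
  private int[] rowGroup = new int[0];
  private int position = 0;
  private boolean closed = false;

  @Override
  public void setBatchSize(int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    this.batchSize = batchSize;
  }

  @Override
  public void setRowGroupInfo(int[] rowGroup) {
    this.rowGroup = rowGroup;
    this.position = 0; // a new row group restarts reading from its first value
  }

  @Override
  public int[] read(int[] reuse, int numValsToRead) {
    if (closed) {
      throw new IllegalStateException("reader is closed");
    }
    int n = Math.min(Math.min(numValsToRead, batchSize), rowGroup.length - position);
    // Reuse the previous container when it already has the right capacity.
    int[] batch = (reuse != null && reuse.length == n) ? reuse : new int[n];
    System.arraycopy(rowGroup, position, batch, 0, n);
    position += n;
    return batch;
  }

  @Override
  public void close() {
    closed = true;
    rowGroup = new int[0]; // drop the reference so the data can be collected
  }
}

final class ReaderLifecycleSketch {
  // Drives the reader through the full lifecycle and collects every value.
  static List<Integer> readAll(int[] rowGroup, int batchSize) {
    List<Integer> out = new ArrayList<>();
    try (ArrayBatchReader reader = new ArrayBatchReader()) {
      reader.setBatchSize(batchSize);
      reader.setRowGroupInfo(rowGroup);
      int remaining = rowGroup.length;
      int[] batch = null;
      while (remaining > 0) {
        int n = Math.min(batchSize, remaining);
        batch = reader.read(batch, n); // pass the last batch back in for reuse
        for (int v : batch) {
          out.add(v);
        }
        remaining -= n;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(readAll(new int[] {1, 2, 3, 4, 5}, 2)); // prints [1, 2, 3, 4, 5]
  }
}
```

Passing the previous batch back as reuse lets an implementation avoid allocating a new container per batch, which is the purpose the reuse parameter description points at.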
-
toString
public String toString()
-
nulls
public static VectorizedArrowReader nulls()
-
positions
public static VectorizedArrowReader positions()
-
positionsWithSetArrowValidityVector
public static VectorizedArrowReader positionsWithSetArrowValidityVector()
-
rowIds
public static VectorizedArrowReader rowIds(Long baseRowId, VectorizedArrowReader idReader)
-
lastUpdated
public static VectorizedArrowReader lastUpdated(Long baseRowId, Long fileLastUpdated, VectorizedArrowReader seqReader)
-
replaceWithMetadataReader
public static VectorizedReader<?> replaceWithMetadataReader(Types.NestedField icebergField, VectorizedReader<?> reader, Map<Integer, ?> idToConstant, boolean setArrowValidityVector)
-