Class ArrowReader
- All Implemented Interfaces:
Closeable, AutoCloseable
Vectorized reader that returns an iterator of ColumnarBatch. See open(CloseableIterable) to learn about the behavior of the iterator.
The following Iceberg data types are supported and have been tested:
- Iceberg: Types.BooleanType, Arrow: Types.MinorType.BIT
- Iceberg: Types.IntegerType, Arrow: Types.MinorType.INT
- Iceberg: Types.LongType, Arrow: Types.MinorType.BIGINT
- Iceberg: Types.FloatType, Arrow: Types.MinorType.FLOAT4
- Iceberg: Types.DoubleType, Arrow: Types.MinorType.FLOAT8
- Iceberg: Types.DecimalType, Arrow: Types.MinorType.DECIMAL
- Iceberg: Types.StringType, Arrow: Types.MinorType.VARCHAR
- Iceberg: Types.TimestampType (both with and without timezone), Arrow: Types.MinorType.TIMESTAMPMICRO and Types.MinorType.TIMESTAMPMICROTZ
- Iceberg: Types.TimestampNanoType (both with and without timezone), Arrow: Types.MinorType.TIMESTAMPNANO and Types.MinorType.TIMESTAMPNANOTZ
- Iceberg: Types.BinaryType, Arrow: Types.MinorType.VARBINARY
- Iceberg: Types.FixedType, Arrow: Types.MinorType.FIXEDSIZEBINARY
- Iceberg: Types.DateType, Arrow: Types.MinorType.DATEDAY
- Iceberg: Types.TimeType, Arrow: Types.MinorType.TIMEMICRO
- Iceberg: Types.UUIDType, Arrow: Types.MinorType.FIXEDSIZEBINARY(16)
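The mapping above can be read as a simple lookup table. The following is a minimal sketch that restates it with plain strings; the class name SupportedTypeTable, the method arrowTypeFor, and the key labels (including "TimestampType.withZone") are hypothetical and are not part of the Iceberg or Arrow API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative lookup only: plain strings, not Iceberg or Arrow API calls. */
final class SupportedTypeTable {
    // Hypothetical labels for the Iceberg types listed in the documentation.
    private static final Map<String, String> ICEBERG_TO_ARROW = new LinkedHashMap<>();

    static {
        ICEBERG_TO_ARROW.put("BooleanType", "BIT");
        ICEBERG_TO_ARROW.put("IntegerType", "INT");
        ICEBERG_TO_ARROW.put("LongType", "BIGINT");
        ICEBERG_TO_ARROW.put("FloatType", "FLOAT4");
        ICEBERG_TO_ARROW.put("DoubleType", "FLOAT8");
        ICEBERG_TO_ARROW.put("DecimalType", "DECIMAL");
        ICEBERG_TO_ARROW.put("StringType", "VARCHAR");
        ICEBERG_TO_ARROW.put("TimestampType.withoutZone", "TIMESTAMPMICRO");
        ICEBERG_TO_ARROW.put("TimestampType.withZone", "TIMESTAMPMICROTZ");
        ICEBERG_TO_ARROW.put("BinaryType", "VARBINARY");
        ICEBERG_TO_ARROW.put("FixedType", "FIXEDSIZEBINARY");
        ICEBERG_TO_ARROW.put("DateType", "DATEDAY");
        ICEBERG_TO_ARROW.put("TimeType", "TIMEMICRO");
        ICEBERG_TO_ARROW.put("UUIDType", "FIXEDSIZEBINARY");
    }

    private SupportedTypeTable() {}

    /** Returns the Arrow minor type name for a label, or empty if the type is not listed. */
    static Optional<String> arrowTypeFor(String icebergType) {
        return Optional.ofNullable(ICEBERG_TO_ARROW.get(icebergType));
    }

    public static void main(String[] args) {
        System.out.println(arrowTypeFor("LongType").orElse("unsupported"));
        System.out.println(arrowTypeFor("ListType").orElse("unsupported"));
    }
}
```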
Features that don't work in this implementation:
- Type promotion: In case of type promotion, the Arrow vector corresponding to the data type in the parquet file is returned instead of the data type in the latest schema. See https://github.com/apache/iceberg/issues/2483.
- Columns with constant values are physically encoded as a dictionary. The Arrow vector type is int32 instead of the type as per the schema. See https://github.com/apache/iceberg/issues/2484.
- Data types: Types.ListType, Types.MapType and Types.StructType. See https://github.com/apache/iceberg/issues/2485 and https://github.com/apache/iceberg/issues/2486.
- Delete files are not supported. See https://github.com/apache/iceberg/issues/2487.
Constructor Summary
Constructors
Constructor: ArrowReader(TableScan scan, int batchSize, boolean reuseContainers)
Description: Create a new instance of the reader.
Method Summary
Modifier and Type, Method, Description
- void close(): Close all the registered resources.
- open(CloseableIterable<CombinedScanTask> tasks): Returns a new iterator of ColumnarBatch objects.
Methods inherited from class org.apache.iceberg.io.CloseableGroup
addCloseable, addCloseable, setSuppressCloseFailure
Constructor Details
ArrowReader
Create a new instance of the reader.
- Parameters:
scan - the table scan object.
batchSize - the maximum number of rows per Arrow batch.
reuseContainers - whether to reuse Arrow vectors when iterating through the data. If set to false, every Iterator.next() call creates new instances of Arrow vectors. If set to true, the Arrow vectors in the previous Iterator.next() may be reused for the data returned in the current Iterator.next(). This option avoids repeated memory allocation. Irrespective of the value of reuseContainers, the Arrow vectors in the previous Iterator.next() call are closed before creating new instances in the current Iterator.next().
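The ownership rule described for reuseContainers can be illustrated with a small model. This is a sketch only: FakeBatch and OwningIterator are hypothetical stand-ins written for this example, not Iceberg or Arrow classes, and they model just the "previous batch is closed on the next call" behavior.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Stand-in model of batch ownership; none of these classes are Iceberg or Arrow types. */
final class BatchOwnershipSketch {

    /** Hypothetical stand-in for a batch backed by closeable vectors. */
    static final class FakeBatch {
        private int[] values;
        private boolean closed;

        FakeBatch(int[] values) {
            this.values = values;
        }

        int[] values() {
            if (closed) {
                throw new IllegalStateException("batch already closed");
            }
            return values;
        }

        boolean isClosed() {
            return closed;
        }

        void close() {
            closed = true;
            values = null;
        }
    }

    /** Iterator that owns its batches: the previous batch is closed on every next(). */
    static final class OwningIterator implements Iterator<FakeBatch> {
        private final Iterator<int[]> source;
        private FakeBatch previous;

        OwningIterator(List<int[]> chunks) {
            this.source = chunks.iterator();
        }

        @Override
        public boolean hasNext() {
            return source.hasNext();
        }

        @Override
        public FakeBatch next() {
            if (previous != null) {
                previous.close();
            }
            previous = new FakeBatch(source.next());
            return previous;
        }
    }

    /** Safe pattern: each batch is fully consumed before the iterator advances. */
    static int sumAll(List<int[]> chunks) {
        OwningIterator it = new OwningIterator(chunks);
        int total = 0;
        while (it.hasNext()) {
            for (int v : it.next().values()) {
                total += v;
            }
        }
        return total;
    }

    /** Unsafe pattern: a batch retained across next() has been closed by the iterator. */
    static boolean retainedBatchIsClosed(List<int[]> chunks) {
        OwningIterator it = new OwningIterator(chunks);
        FakeBatch first = it.next();
        it.next();
        return first.isClosed();
    }

    public static void main(String[] args) {
        List<int[]> chunks = Arrays.asList(new int[] {1, 2}, new int[] {3});
        System.out.println(sumAll(chunks));
        System.out.println(retainedBatchIsClosed(chunks));
    }
}
```

In this model, a caller that needs data beyond the current iteration would copy the values out of the batch before calling next() again.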
Method Details
open
Returns a new iterator of ColumnarBatch objects.
Note that the reader owns the ColumnarBatch objects and takes care of closing them. The caller should not hold onto a ColumnarBatch or try to close it.
If reuseContainers is false, the Arrow vectors in the previous ColumnarBatch are closed before returning the next ColumnarBatch object. This implies that the caller should either use the ColumnarBatch or transfer the ownership of the ColumnarBatch before getting the next ColumnarBatch.
If reuseContainers is true, the Arrow vectors in the previous ColumnarBatch may be reused for the next ColumnarBatch. This implies that the caller should either use the ColumnarBatch or deep copy the ColumnarBatch before getting the next ColumnarBatch.
This method works only when the following conditions are true:
- At least one column is queried,
- There are no delete files, and
- Supported data types are queried (see SUPPORTED_TYPES).
Otherwise, an UnsupportedOperationException is thrown.
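The three conditions can be sketched as a validation step that runs before any data is read. The class OpenPreconditionsSketch, its validate method, the string type labels, and the exception messages are assumptions made for this example; they are not how the reader itself is implemented.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative precondition check; type labels and messages are hypothetical. */
final class OpenPreconditionsSketch {
    // Hypothetical labels standing in for the supported types listed in the documentation.
    private static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList(
        "boolean", "int", "long", "float", "double", "decimal", "string",
        "timestamp", "timestamp_ns", "binary", "fixed", "date", "time", "uuid"));

    private OpenPreconditionsSketch() {}

    /** Throws UnsupportedOperationException if any of the three conditions is violated. */
    static void validate(List<String> projectedColumnTypes, int deleteFileCount) {
        if (projectedColumnTypes.isEmpty()) {
            throw new UnsupportedOperationException("At least one column must be queried");
        }
        if (deleteFileCount > 0) {
            throw new UnsupportedOperationException("Delete files are not supported");
        }
        for (String type : projectedColumnTypes) {
            if (!SUPPORTED.contains(type)) {
                throw new UnsupportedOperationException("Unsupported data type: " + type);
            }
        }
    }

    /** Convenience wrapper that reports whether a projection would be accepted. */
    static boolean isAccepted(List<String> projectedColumnTypes, int deleteFileCount) {
        try {
            validate(projectedColumnTypes, deleteFileCount);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAccepted(Arrays.asList("int", "string"), 0));
        System.out.println(isAccepted(Collections.<String>emptyList(), 0));
        System.out.println(isAccepted(Arrays.asList("list"), 0));
    }
}
```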
close
Description copied from class: CloseableGroup
Close all the registered resources. The close method of each resource is called only once. A checked exception from an AutoCloseable is wrapped in a runtime exception.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Overrides:
close in class CloseableGroup
- Throws:
IOException
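The close-once and exception-wrapping behavior described for close can be modeled in a few lines. This is a sketch under stated assumptions: CloseOnceGroupSketch and its register method are hypothetical names for this example and do not reproduce the CloseableGroup implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of a group that closes each registered resource exactly once. */
final class CloseOnceGroupSketch implements AutoCloseable {
    private final Deque<AutoCloseable> closeables = new ArrayDeque<>();

    void register(AutoCloseable closeable) {
        closeables.add(closeable);
    }

    @Override
    public void close() {
        while (!closeables.isEmpty()) {
            // Remove before closing so a second close() call never sees this resource again.
            AutoCloseable next = closeables.removeFirst();
            try {
                next.close();
            } catch (RuntimeException e) {
                throw e;
            } catch (Exception e) {
                // Checked exceptions are wrapped in a runtime exception.
                throw new RuntimeException(e);
            }
        }
    }

    /** Registers the given number of resources, calls close() repeatedly, and counts closes. */
    static int countCloses(int resources, int closeCalls) {
        int[] counter = {0};
        CloseOnceGroupSketch group = new CloseOnceGroupSketch();
        for (int i = 0; i < resources; i++) {
            group.register(() -> counter[0]++);
        }
        for (int i = 0; i < closeCalls; i++) {
            group.close();
        }
        return counter[0];
    }

    /** Returns true if a checked exception surfaces wrapped in a RuntimeException. */
    static boolean wrapsCheckedException() {
        CloseOnceGroupSketch group = new CloseOnceGroupSketch();
        group.register(() -> {
            throw new Exception("boom");
        });
        try {
            group.close();
            return false;
        } catch (RuntimeException e) {
            return e.getCause() != null && "boom".equals(e.getCause().getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println(countCloses(3, 2));
        System.out.println(wrapsCheckedException());
    }
}
```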