public class ArrowReader extends CloseableGroup
Vectorized reader that returns an iterator of ColumnarBatch. See open(CloseableIterable) to learn about the behavior of the iterator.
The following Iceberg data types are supported and have been tested:
Types.BooleanType, Arrow: Types.MinorType.BIT
Types.IntegerType, Arrow: Types.MinorType.INT
Types.LongType, Arrow: Types.MinorType.BIGINT
Types.FloatType, Arrow: Types.MinorType.FLOAT4
Types.DoubleType, Arrow: Types.MinorType.FLOAT8
Types.StringType, Arrow: Types.MinorType.VARCHAR
Types.TimestampType (both with and without timezone), Arrow: Types.MinorType.TIMEMICRO
Types.BinaryType, Arrow: Types.MinorType.VARBINARY
Types.DateType, Arrow: Types.MinorType.DATEDAY
Types.TimeType, Arrow: Types.MinorType.TIMEMICRO
Types.UUIDType, Arrow: Types.MinorType.FIXEDSIZEBINARY(16)
Features that don't work in this implementation:
Types.ListType, Types.MapType, Types.StructType, Types.FixedType and Types.DecimalType. See https://github.com/apache/iceberg/issues/2485 and https://github.com/apache/iceberg/issues/2486.
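The supported and unsupported types listed above lend themselves to a pre-flight check before a scan is opened. The following is a minimal sketch of such a check, not part of the library: the class `ArrowTypeSupport`, its `unsupported` method, and the use of plain strings in place of the real Iceberg and Arrow type classes are all assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ArrowTypeSupport {

  // Mapping copied from the list of supported types in this page. Plain strings
  // stand in for the real Iceberg and Arrow classes (an assumption of this sketch).
  public static final Map<String, String> ICEBERG_TO_ARROW = new LinkedHashMap<>();

  static {
    ICEBERG_TO_ARROW.put("BooleanType", "BIT");
    ICEBERG_TO_ARROW.put("IntegerType", "INT");
    ICEBERG_TO_ARROW.put("LongType", "BIGINT");
    ICEBERG_TO_ARROW.put("FloatType", "FLOAT4");
    ICEBERG_TO_ARROW.put("DoubleType", "FLOAT8");
    ICEBERG_TO_ARROW.put("StringType", "VARCHAR");
    ICEBERG_TO_ARROW.put("TimestampType", "TIMEMICRO");
    ICEBERG_TO_ARROW.put("BinaryType", "VARBINARY");
    ICEBERG_TO_ARROW.put("DateType", "DATEDAY");
    ICEBERG_TO_ARROW.put("TimeType", "TIMEMICRO");
    ICEBERG_TO_ARROW.put("UUIDType", "FIXEDSIZEBINARY(16)");
  }

  /** Returns the column types that have no entry in the mapping, in input order. */
  public static List<String> unsupported(List<String> columnTypes) {
    List<String> result = new ArrayList<>();
    for (String type : columnTypes) {
      if (!ICEBERG_TO_ARROW.containsKey(type)) {
        result.add(type);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Hypothetical column types of a table to be scanned.
    List<String> columns = Arrays.asList("LongType", "DecimalType", "StringType", "MapType");
    System.out.println("Unsupported: " + unsupported(columns));
  }
}
```

A caller could run a check like this over the projected columns and fall back to a non-vectorized read path when the returned list is not empty.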
Constructor and Description |
---|
ArrowReader(TableScan scan, int batchSize, boolean reuseContainers) Create a new instance of the reader. |
Modifier and Type | Method and Description |
---|---|
void | close() Close all the registered resources. |
CloseableIterator<ColumnarBatch> | open(CloseableIterable<CombinedScanTask> tasks) Returns a new iterator of ColumnarBatch objects. |
Methods inherited from class CloseableGroup: addCloseable, addCloseable, setSuppressCloseFailure
public ArrowReader(TableScan scan, int batchSize, boolean reuseContainers)
Parameters:
scan - the table scan object.
batchSize - the maximum number of rows per Arrow batch.
reuseContainers - whether to reuse Arrow vectors when iterating through the data. If set to false, every Iterator.next() call creates new instances of Arrow vectors. If set to true, the Arrow vectors in the previous Iterator.next() may be reused for the data returned in the current Iterator.next(). Setting it to true avoids allocating memory repeatedly. Irrespective of the value of reuseContainers, the Arrow vectors in the previous Iterator.next() call are closed before creating new instances in the current Iterator.next() call.
public CloseableIterator<ColumnarBatch> open(CloseableIterable<CombinedScanTask> tasks)
Returns a new iterator of ColumnarBatch objects.
Note that the reader owns the ColumnarBatch objects and takes care of closing them. The caller should not hold onto a ColumnarBatch or try to close it.
If reuseContainers is false, the Arrow vectors in the previous ColumnarBatch are closed before returning the next ColumnarBatch object. This implies that the caller should either use the ColumnarBatch or transfer the ownership of the ColumnarBatch before getting the next ColumnarBatch.
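The close-on-advance contract described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the reader's implementation: `Batch` and `OwningIterator` are hypothetical stand-ins built on plain arrays, and no Iceberg or Arrow classes are used.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class ClosedOnAdvanceDemo {

  /** Hypothetical stand-in for a batch whose backing memory can be released. */
  public static class Batch {
    private int[] values;

    Batch(int[] values) {
      this.values = values;
    }

    public boolean isClosed() {
      return values == null;
    }

    public int sum() {
      if (isClosed()) {
        throw new IllegalStateException("batch already closed");
      }
      int total = 0;
      for (int v : values) {
        total += v;
      }
      return total;
    }

    void close() {
      values = null; // simulate releasing the underlying vectors
    }
  }

  /** Iterator that owns its batches: advancing closes the previously returned one. */
  public static class OwningIterator implements Iterator<Batch> {
    private final int[][] data;
    private int index = 0;
    private Batch previous;

    public OwningIterator(int[][] data) {
      this.data = data;
    }

    @Override
    public boolean hasNext() {
      return index < data.length;
    }

    @Override
    public Batch next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      if (previous != null) {
        previous.close(); // the caller must be done with it by now
      }
      previous = new Batch(data[index++]);
      return previous;
    }
  }

  /** Safe pattern: each batch is fully consumed before the next one is requested. */
  public static int sumAll(int[][] data) {
    int total = 0;
    Iterator<Batch> it = new OwningIterator(data);
    while (it.hasNext()) {
      total += it.next().sum();
    }
    return total;
  }

  /** Shows that a batch held across a next() call is no longer usable. */
  public static boolean firstBatchClosedAfterAdvance(int[][] data) {
    OwningIterator it = new OwningIterator(data);
    Batch first = it.next();
    it.next();
    return first.isClosed();
  }

  public static void main(String[] args) {
    int[][] data = {{1, 2}, {3, 4}};
    System.out.println("sum = " + sumAll(data));
    System.out.println("first closed = " + firstBatchClosedAfterAdvance(data));
  }
}
```

In this model, any value needed after advancing has to be extracted from the batch first, which mirrors the advice to use the batch or transfer ownership before calling next() again.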
If reuseContainers is true, the Arrow vectors in the previous ColumnarBatch may be reused for the next ColumnarBatch. This implies that the caller should either use the ColumnarBatch or deep copy the ColumnarBatch before getting the next ColumnarBatch.
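Why a deep copy is needed when containers are reused can be shown with a self-contained simulation. The sketch below is an illustration only: `ReusingIterator` is a hypothetical iterator that overwrites a single shared array on every call, standing in for reused Arrow vectors; it does not use the Iceberg or Arrow APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseHazardDemo {

  /** Hypothetical iterator that reuses one buffer across next() calls. */
  public static class ReusingIterator implements Iterator<int[]> {
    private final int[] buffer = new int[2]; // shared and overwritten on every next()
    private final int batches;
    private int produced = 0;

    public ReusingIterator(int batches) {
      this.batches = batches;
    }

    @Override
    public boolean hasNext() {
      return produced < batches;
    }

    @Override
    public int[] next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      buffer[0] = produced * 2;
      buffer[1] = produced * 2 + 1;
      produced++;
      return buffer; // same array instance every time
    }
  }

  /** Unsafe: keeps references to the shared buffer, so every entry aliases the last batch. */
  public static List<int[]> collectWithoutCopy(int batches) {
    List<int[]> result = new ArrayList<>();
    Iterator<int[]> it = new ReusingIterator(batches);
    while (it.hasNext()) {
      result.add(it.next());
    }
    return result;
  }

  /** Safe: copies each batch before advancing, so earlier batches keep their values. */
  public static List<int[]> collectWithCopy(int batches) {
    List<int[]> result = new ArrayList<>();
    Iterator<int[]> it = new ReusingIterator(batches);
    while (it.hasNext()) {
      int[] batch = it.next();
      result.add(Arrays.copyOf(batch, batch.length));
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println("without copy, first batch: "
        + Arrays.toString(collectWithoutCopy(3).get(0)));
    System.out.println("with copy, first batch: "
        + Arrays.toString(collectWithCopy(3).get(0)));
  }
}
```

The copy costs extra memory per batch, which is the allocation that reuse is meant to avoid, so copying is worthwhile only for batches that must outlive the next call.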
This method works only when the following conditions are true:
The data to be read contains only supported data types (see SUPPORTED_TYPES).
Otherwise, an UnsupportedOperationException is thrown.
public void close() throws java.io.IOException
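Description copied from class: CloseableGroup
Close all the registered resources.
Specified by: close in interface java.io.Closeable
Specified by: close in interface java.lang.AutoCloseable
Overrides: close in class CloseableGroup
Throws: java.io.IOException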