Class ColumnarBatchReader

  • All Implemented Interfaces:
    VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>

    public class ColumnarBatchReader
    extends java.lang.Object
    implements VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
    A VectorizedReader that returns Spark's ColumnarBatch to support Spark's vectorized read path. Each returned ColumnarBatch is built from the Arrow vectors populated by delegating read calls to the underlying VectorReader(s).
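
The delegation described above can be illustrated with a minimal, self-contained sketch. It does not use the Iceberg, Spark, or Arrow classes: ColumnReader, Batch, BatchReader, and constant are hypothetical stand-ins invented for this example, and plain int arrays take the place of Arrow vectors. It only shows the general shape of a batch reader that asks one reader per column for its values and wraps the results in a single batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CompositeReaderSketch {
    /** Hypothetical stand-in for a per-column vectorized reader. */
    interface ColumnReader {
        int[] read(int numRowsToRead);
    }

    /** Hypothetical stand-in for a columnar batch: one array per column. */
    static final class Batch {
        final List<int[]> columns;
        final int numRows;

        Batch(List<int[]> columns, int numRows) {
            this.columns = columns;
            this.numRows = numRows;
        }
    }

    /** Builds each batch by delegating to one reader per column. */
    static final class BatchReader {
        private final List<ColumnReader> readers;

        BatchReader(List<ColumnReader> readers) {
            this.readers = readers;
        }

        Batch read(int numRowsToRead) {
            List<int[]> columns = new ArrayList<>(readers.size());
            for (ColumnReader reader : readers) {
                // Each column reader fills its own vector for this batch.
                columns.add(reader.read(numRowsToRead));
            }
            return new Batch(columns, numRowsToRead);
        }
    }

    /** Toy column reader that fills every row with the same value. */
    static ColumnReader constant(int value) {
        return numRows -> {
            int[] column = new int[numRows];
            Arrays.fill(column, value);
            return column;
        };
    }

    public static void main(String[] args) {
        BatchReader reader = new BatchReader(Arrays.asList(constant(1), constant(2)));
        Batch batch = reader.read(3);
        System.out.println(batch.columns.size() + " columns x " + batch.numRows + " rows");
        System.out.println(Arrays.toString(batch.columns.get(1)));
    }
}
```

The real constructor takes a java.util.List<VectorizedReader<?>>, so the per-column readers there share the same interface as the batch reader itself; the sketch uses a separate ColumnReader type only to keep the example short.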
    • Method Summary

      Modifier and Type Method Description
      void close()
      Releases any resources allocated.
      org.apache.spark.sql.vectorized.ColumnarBatch read​(org.apache.spark.sql.vectorized.ColumnarBatch reuse, int numRowsToRead)
      Reads a batch of type ColumnarBatch containing numRowsToRead rows
      void setBatchSize​(int batchSize)  
      void setRowGroupInfo​(org.apache.parquet.column.page.PageReadStore pageStore, java.util.Map<org.apache.parquet.hadoop.metadata.ColumnPath,​org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metaData, long rowPosition)
      Sets the row group information to be used with this reader
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • ColumnarBatchReader

        public ColumnarBatchReader​(java.util.List<VectorizedReader<?>> readers)
    • Method Detail

      • setRowGroupInfo

        public final void setRowGroupInfo​(org.apache.parquet.column.page.PageReadStore pageStore,
                                          java.util.Map<org.apache.parquet.hadoop.metadata.ColumnPath,​org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metaData,
                                          long rowPosition)
        Description copied from interface: VectorizedReader
        Sets the row group information to be used with this reader
        Specified by:
        setRowGroupInfo in interface VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
        Parameters:
        pageStore - row group information for all the columns
        metaData - map of ColumnPath -> ColumnChunkMetaData for the row group
        rowPosition - the row group's row offset in the Parquet file
      • read

        public final org.apache.spark.sql.vectorized.ColumnarBatch read​(org.apache.spark.sql.vectorized.ColumnarBatch reuse,
                                                                        int numRowsToRead)
        Description copied from interface: VectorizedReader
        Reads a batch of type T (here, ColumnarBatch) containing numRowsToRead rows
        Specified by:
        read in interface VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
        Parameters:
        reuse - the container of the previous batch, to be reused for the next batch
        numRowsToRead - number of rows to read
        Returns:
        batch of records of type T (here, ColumnarBatch)
      • close

        public void close()
        Description copied from interface: VectorizedReader
        Releases any resources allocated.
        Specified by:
        close in interface VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
      • setBatchSize

        public void setBatchSize​(int batchSize)
        Specified by:
        setBatchSize in interface VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
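
The four methods above suggest a call order of setBatchSize, then setRowGroupInfo, then repeated read calls, then close. The sketch below illustrates that order under stated assumptions. It is not the Iceberg API: BatchSource, RowPositionSource, and drain are hypothetical names, setRowGroupInfo is reduced to a row count and a row position, and the batch is a plain long array. The call order itself is an assumption inferred from the method descriptions, not something this page states.

```java
import java.util.ArrayList;
import java.util.List;

public class ReaderLifecycleSketch {
    /** Hypothetical stand-in that mirrors the four methods on this page. */
    interface BatchSource {
        void setBatchSize(int batchSize);
        void setRowGroupInfo(long rowCount, long rowPosition);
        long[] read(long[] reuse, int numRowsToRead);
        void close();
    }

    /** Toy source whose batches contain absolute row positions in the file. */
    static final class RowPositionSource implements BatchSource {
        private int batchSize;
        private long nextPosition;
        private boolean closed;

        @Override
        public void setBatchSize(int batchSize) {
            this.batchSize = batchSize;
        }

        @Override
        public void setRowGroupInfo(long rowCount, long rowPosition) {
            // Start numbering at the row group's offset within the file.
            this.nextPosition = rowPosition;
        }

        @Override
        public long[] read(long[] reuse, int numRowsToRead) {
            if (closed) {
                throw new IllegalStateException("reader is closed");
            }
            if (numRowsToRead > batchSize) {
                throw new IllegalArgumentException("numRowsToRead exceeds batch size");
            }
            // Reuse the previous container when it has the right size.
            long[] out = (reuse != null && reuse.length == numRowsToRead) ? reuse : new long[numRowsToRead];
            for (int i = 0; i < numRowsToRead; i++) {
                out[i] = nextPosition++;
            }
            return out;
        }

        @Override
        public void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Reads one row group in batches and returns the size of each batch. */
    static List<Integer> drain(BatchSource source, long rowCount, long rowPosition, int batchSize) {
        List<Integer> sizes = new ArrayList<>();
        source.setBatchSize(batchSize);
        source.setRowGroupInfo(rowCount, rowPosition);
        long[] reuse = null;
        long remaining = rowCount;
        try {
            while (remaining > 0) {
                // The last batch may be smaller than the configured batch size.
                int numRowsToRead = (int) Math.min(remaining, batchSize);
                reuse = source.read(reuse, numRowsToRead);
                sizes.add(reuse.length);
                remaining -= numRowsToRead;
            }
        } finally {
            source.close();
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 10 rows starting at file row 100, read 4 at a time.
        System.out.println(drain(new RowPositionSource(), 10, 100, 4)); // prints [4, 4, 2]
    }
}
```

Passing the previous batch back as reuse lets the reader avoid allocating a new container per call, which matches the purpose of the reuse parameter described under read.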