Interface AddedRowsScanTask

All Superinterfaces:
ChangelogScanTask, ContentScanTask<DataFile>, PartitionScanTask, ScanTask, Serializable

public interface AddedRowsScanTask extends ChangelogScanTask, ContentScanTask<DataFile>
A scan task for inserts generated by adding a data file to the table.

Note that added data files may have matching delete files. This may happen if a matching position delete file is committed in the same snapshot or if changes for multiple snapshots are squashed together.

Suppose snapshot S1 adds data files F1, F2, F3 and a position delete file, D1, that marks particular records in F1 as deleted. A scan for changes generated by S1 should include the following tasks:

  • AddedRowsScanTask(file=F1, deletes=[D1], snapshot=S1)
  • AddedRowsScanTask(file=F2, deletes=[], snapshot=S1)
  • AddedRowsScanTask(file=F3, deletes=[], snapshot=S1)

Readers consuming these tasks should produce the added records together with metadata such as the change ordinal and the commit snapshot ID.
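The S1 example above can be sketched in plain Java. This is a minimal, self-contained illustration and does not use the Iceberg API: every type here (`SketchDataFile`, `SketchPositionDeletes`, `SketchAddedRowsTask`, `ChangeRow`) is a hypothetical stand-in, and the row values, snapshot ID and change ordinal are made up. It assumes a reader skips the positions that a matching delete file marks as deleted and tags each remaining row with the task's metadata.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for illustration only; these are NOT the Iceberg classes.
final class AddedRowsSketch {

  /** A data file reduced to a name and its rows in position order. */
  static final class SketchDataFile {
    final String name;
    final List<String> rows;

    SketchDataFile(String name, List<String> rows) {
      this.name = name;
      this.rows = rows;
    }
  }

  /** A position delete file reduced to its target data file and the deleted positions. */
  static final class SketchPositionDeletes {
    final String targetFile;
    final Set<Integer> positions;

    SketchPositionDeletes(String targetFile, Set<Integer> positions) {
      this.targetFile = targetFile;
      this.positions = positions;
    }
  }

  /** Mirrors AddedRowsScanTask(file=..., deletes=[...], snapshot=...) from the example. */
  static final class SketchAddedRowsTask {
    final SketchDataFile file;
    final List<SketchPositionDeletes> deletes;
    final int changeOrdinal;
    final long commitSnapshotId;

    SketchAddedRowsTask(SketchDataFile file, List<SketchPositionDeletes> deletes,
        int changeOrdinal, long commitSnapshotId) {
      this.file = file;
      this.deletes = deletes;
      this.changeOrdinal = changeOrdinal;
      this.commitSnapshotId = commitSnapshotId;
    }
  }

  /** One output record: the added row plus changelog metadata. */
  static final class ChangeRow {
    final String row;
    final int changeOrdinal;
    final long commitSnapshotId;

    ChangeRow(String row, int changeOrdinal, long commitSnapshotId) {
      this.row = row;
      this.changeOrdinal = changeOrdinal;
      this.commitSnapshotId = commitSnapshotId;
    }
  }

  /** Emits every row of the task's data file that no matching delete file removes. */
  static List<ChangeRow> read(SketchAddedRowsTask task) {
    List<ChangeRow> out = new ArrayList<>();
    for (int pos = 0; pos < task.file.rows.size(); pos++) {
      boolean deleted = false;
      for (SketchPositionDeletes d : task.deletes) {
        if (d.targetFile.equals(task.file.name) && d.positions.contains(pos)) {
          deleted = true;
          break;
        }
      }
      if (!deleted) {
        out.add(new ChangeRow(task.file.rows.get(pos), task.changeOrdinal, task.commitSnapshotId));
      }
    }
    return out;
  }

  /** Builds the three tasks of the S1 example; D1 marks position 1 of F1 as deleted. */
  static List<SketchAddedRowsTask> exampleTasks() {
    long s1 = 1L; // made-up snapshot ID standing in for S1
    SketchDataFile f1 = new SketchDataFile("F1", List.of("a", "b", "c"));
    SketchDataFile f2 = new SketchDataFile("F2", List.of("d"));
    SketchDataFile f3 = new SketchDataFile("F3", List.of("e", "f"));
    SketchPositionDeletes d1 = new SketchPositionDeletes("F1", Set.of(1));
    return List.of(
        new SketchAddedRowsTask(f1, List.of(d1), 0, s1),
        new SketchAddedRowsTask(f2, List.of(), 0, s1),
        new SketchAddedRowsTask(f3, List.of(), 0, s1));
  }

  public static void main(String[] args) {
    for (SketchAddedRowsTask task : exampleTasks()) {
      for (ChangeRow r : read(task)) {
        System.out.println("INSERT " + r.row + " ordinal=" + r.changeOrdinal
            + " snapshot=" + r.commitSnapshotId);
      }
    }
  }
}
```

In this sketch the task for F1 carries D1, so the row at position 1 is dropped before any insert is emitted, while the tasks for F2 and F3 pass all of their rows through unchanged.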

  • Method Details

    • deletes

      List<DeleteFile> deletes()
      A list of delete files to apply when reading the data file in this task.
      Returns:
      a list of delete files to apply
    • operation

      default ChangelogOperation operation()
      Description copied from interface: ChangelogScanTask
      Returns the type of changes produced by this task (for example, insert or delete).
      Specified by:
      operation in interface ChangelogScanTask
    • sizeBytes

      default long sizeBytes()
      Description copied from interface: ScanTask
      The number of bytes that should be read by this scan task.
      Specified by:
      sizeBytes in interface ContentScanTask<DataFile>
      Specified by:
      sizeBytes in interface ScanTask
      Returns:
      the total number of bytes to read
    • filesCount

      default int filesCount()
      Description copied from interface: ScanTask
      The number of files that will be opened by this scan task.
      Specified by:
      filesCount in interface ScanTask
      Returns:
      the number of files to open
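
The page does not say how sizeBytes() and filesCount() are computed for a task that carries delete files. The sketch below shows one plausible accounting, offered as an assumption rather than as the library's implementation: the data file counts as one file, each delete file counts as one more, and the bytes to read are the data bytes plus the size of every delete file. `SketchFile` and both helper methods are hypothetical names, not part of the Iceberg API.

```java
import java.util.List;

// Hypothetical helper for illustration; assumed accounting, NOT the Iceberg implementation.
final class ScanTaskCostSketch {

  /** A file reduced to its path and size in bytes. */
  static final class SketchFile {
    final String path;
    final long sizeInBytes;

    SketchFile(String path, long sizeInBytes) {
      this.path = path;
      this.sizeInBytes = sizeInBytes;
    }
  }

  /** Bytes to read: the data bytes of the task plus the full size of each delete file. */
  static long sizeBytes(long dataBytesToRead, List<SketchFile> deletes) {
    long total = dataBytesToRead;
    for (SketchFile delete : deletes) {
      total += delete.sizeInBytes;
    }
    return total;
  }

  /** Files to open: the single data file plus one per delete file. */
  static int filesCount(List<SketchFile> deletes) {
    return 1 + deletes.size();
  }

  public static void main(String[] args) {
    // Made-up sizes: a task reading 1000 data bytes with one 200-byte delete file.
    List<SketchFile> deletes = List.of(new SketchFile("D1", 200L));
    System.out.println("sizeBytes=" + sizeBytes(1000L, deletes));
    System.out.println("filesCount=" + filesCount(deletes));
  }
}
```

Under this assumption, a task with an empty deletes list costs exactly its data bytes and opens exactly one file, which matches the tasks for F2 and F3 in the example above.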