Class BaseDeleteLoader

java.lang.Object
org.apache.iceberg.data.BaseDeleteLoader
All Implemented Interfaces:
DeleteLoader

public class BaseDeleteLoader extends Object implements DeleteLoader
  • Constructor Details

  • Method Details

    • canCache

      protected boolean canCache(long size)
      Checks if the given number of bytes can be cached.

      Implementations should override this method if they support caching. The provided size should serve as a guideline for deciding whether the value is eligible for caching. For instance, rejecting values that are too large may improve cache performance and utilization.

    • getOrLoad

      protected <V> V getOrLoad(String key, Supplier<V> valueSupplier, long valueSize)
      Gets the cached value for the key or populates the cache with a new mapping.

      If the value for the specified key is in the cache, it should be returned. If the value is not in the cache, implementations should compute the value using the provided supplier, cache it, and then return it.

      This method will be called only if canCache(long) returned true.
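
      The contract between canCache(long) and getOrLoad(String, Supplier, long) can be illustrated with a minimal standalone sketch. It does not extend the real BaseDeleteLoader; the class name SizeBoundedDeleteCache, the maxEntrySize limit, and the load helper are hypothetical names introduced only for illustration, and the map-backed cache is an assumption rather than anything the library prescribes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical standalone sketch; it only mirrors the two caching hooks
// described above and does not extend the real BaseDeleteLoader.
class SizeBoundedDeleteCache {
  private final long maxEntrySize; // assumed per-entry limit, in bytes
  private final Map<String, Object> cache = new ConcurrentHashMap<>();

  SizeBoundedDeleteCache(long maxEntrySize) {
    this.maxEntrySize = maxEntrySize;
  }

  // Mirrors canCache(long): values above the per-entry limit are not eligible.
  boolean canCache(long size) {
    return size <= maxEntrySize;
  }

  // Mirrors getOrLoad(String, Supplier, long): return the cached value, or
  // compute it with the supplier, cache it, and return it. This simple map
  // ignores valueSize; a weighted cache could use it for eviction.
  @SuppressWarnings("unchecked")
  <V> V getOrLoad(String key, Supplier<V> valueSupplier, long valueSize) {
    return (V) cache.computeIfAbsent(key, ignored -> valueSupplier.get());
  }

  // Caller-side pattern: getOrLoad is used only when canCache returned true;
  // otherwise the value is computed directly and never stored.
  <V> V load(String key, Supplier<V> valueSupplier, long valueSize) {
    return canCache(valueSize)
        ? getOrLoad(key, valueSupplier, valueSize)
        : valueSupplier.get();
  }
}
```

      In this sketch, a value whose size exceeds the limit is recomputed on every call, while an eligible value is computed once per key.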

    • loadEqualityDeletes

      public StructLikeSet loadEqualityDeletes(Iterable<DeleteFile> deleteFiles, Schema projection)
      Description copied from interface: DeleteLoader
      Loads the content of equality delete files into a set.
      Specified by:
      loadEqualityDeletes in interface DeleteLoader
      Parameters:
      deleteFiles - equality delete files
      projection - a projection of columns to load
      Returns:
      a set of equality deletes
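
      As a rough illustration of loading projected delete rows into a set, the following standalone sketch models each delete row as a map from column name to value. It does not use the Iceberg types named above (StructLikeSet, DeleteFile, Schema); the class EqualityDeleteSetSketch and its load method are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical standalone sketch; rows are plain maps, not Iceberg records.
class EqualityDeleteSetSketch {
  // Projects each delete row onto the requested columns and collects the
  // projected tuples into a set, so duplicate deletes collapse to one entry.
  static Set<List<Object>> load(List<Map<String, Object>> deleteRows, List<String> projection) {
    Set<List<Object>> deletes = new HashSet<>();
    for (Map<String, Object> row : deleteRows) {
      List<Object> key = new ArrayList<>(projection.size());
      for (String column : projection) {
        key.add(row.get(column));
      }
      deletes.add(key);
    }
    return deletes;
  }
}
```

      A data row would then count as deleted when its values for the projected columns form a tuple contained in the resulting set.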

    • loadPositionDeletes

      public PositionDeleteIndex loadPositionDeletes(Iterable<DeleteFile> deleteFiles, CharSequence filePath)
      Loads the content of a deletion vector or position delete files for a given data file path into a position index.

      The deletion vector is currently loaded without caching, as the existing Puffin reader requires at least three requests to fetch the entire file. Caching a single deletion vector may be useful only when multiple data file splits are processed on the same node, which is unlikely because task locality is not guaranteed.

      For position delete files, however, there is no efficient way to read deletes for a particular data file. Therefore, caching may be more effective as such delete files potentially apply to many data files, especially in unpartitioned tables and tables with deep partitions. If a position delete file qualifies for caching, this method will attempt to cache a position index for each referenced data file.

      Specified by:
      loadPositionDeletes in interface DeleteLoader
      Parameters:
      deleteFiles - a deletion vector or position delete files
      filePath - the data file path for which to load deletes
      Returns:
      a position delete index for the provided data file path
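
      The idea of building a position index for each referenced data file in a single pass over a position delete file can be sketched as follows. This is a standalone illustration under stated assumptions: PositionDeleteGrouping, PositionDelete, and indexByDataFile are hypothetical names, and a sorted set of positions stands in for the real PositionDeleteIndex.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical standalone sketch; it does not use the Iceberg classes.
class PositionDeleteGrouping {
  // Stand-in for one row of a position delete file: (data file path, row position).
  static final class PositionDelete {
    final String filePath;
    final long pos;

    PositionDelete(String filePath, long pos) {
      this.filePath = filePath;
      this.pos = pos;
    }
  }

  // Reads the delete rows once and builds one sorted position set per
  // referenced data file, so each per-file index could be cached separately.
  static Map<String, SortedSet<Long>> indexByDataFile(List<PositionDelete> deletes) {
    Map<String, SortedSet<Long>> indexes = new HashMap<>();
    for (PositionDelete delete : deletes) {
      indexes.computeIfAbsent(delete.filePath, ignored -> new TreeSet<>()).add(delete.pos);
    }
    return indexes;
  }
}
```

      Because the delete file is scanned only once, a later request for a different data file covered by the same delete file could be answered from the already built map instead of rereading the file.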