Class BaseDeleteLoader
- All Implemented Interfaces:
DeleteLoader
-
Constructor Summary
Constructors
BaseDeleteLoader(Function<DeleteFile, InputFile> loadInputFile)
BaseDeleteLoader(Function<DeleteFile, InputFile> loadInputFile, ExecutorService workerPool)
-
Method Summary
Modifier and Type, Method, Description
protected boolean canCache(long size): Checks if the given number of bytes can be cached.
protected <V> V getOrLoad: Gets the cached value for the key or populates the cache with a new mapping.
loadEqualityDeletes(Iterable<DeleteFile> deleteFiles, Schema projection): Loads the content of equality delete files into a set.
PositionDeleteIndex loadPositionDeletes(Iterable<DeleteFile> deleteFiles, CharSequence filePath): Loads the content of a deletion vector or position delete files for a given data file path into a position index.
-
Constructor Details
-
BaseDeleteLoader(Function<DeleteFile, InputFile> loadInputFile)
-
BaseDeleteLoader(Function<DeleteFile, InputFile> loadInputFile, ExecutorService workerPool)
-
-
Method Details
-
canCache
protected boolean canCache(long size)
Checks if the given number of bytes can be cached.
Implementations should override this method if they support caching. It is also recommended to use the provided size as a guideline to decide whether the value is eligible for caching. For instance, it may be beneficial to reject values that are too large, in order to improve cache performance and utilization.
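The size-based guideline can be sketched as follows. This is a minimal illustration, not the library's implementation: SizeLimitedLoaderSketch is a hypothetical stand-in that does not extend BaseDeleteLoader, and the maxEntrySizeBytes threshold is an assumed configuration value.

```java
// Hypothetical stand-in for a caching subclass; it does not extend the real BaseDeleteLoader.
class SizeLimitedLoaderSketch {
  // Assumed threshold: values larger than this are never cached.
  private final long maxEntrySizeBytes;

  SizeLimitedLoaderSketch(long maxEntrySizeBytes) {
    this.maxEntrySizeBytes = maxEntrySizeBytes;
  }

  // Same shape as canCache(long size): accept only non-negative sizes within the limit.
  protected boolean canCache(long size) {
    return size >= 0 && size <= maxEntrySizeBytes;
  }
}
```

With a limit of 1024 bytes, a 512-byte value would be eligible for caching while a 4096-byte value would be loaded without caching.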
-
getOrLoad
Gets the cached value for the key or populates the cache with a new mapping.
If the value for the specified key is in the cache, it should be returned. If the value is not in the cache, implementations should compute the value using the provided supplier, cache it, and then return it.
This method will be called only if canCache(long) returned true.
-
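The get-or-populate contract described for getOrLoad can be sketched with a concurrent map. This is only an illustration under stated assumptions: CachingLoaderSketch is a hypothetical class, and the parameter list used here (a String key and a Supplier) is assumed from the description, since the real signature is not shown on this page.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in; the getOrLoad parameter list below is an assumption.
class CachingLoaderSketch {
  private final Map<String, Object> cache = new ConcurrentHashMap<>();

  // Returns the cached value for the key, or computes it with the supplier,
  // stores it, and returns it.
  @SuppressWarnings("unchecked")
  protected <V> V getOrLoad(String key, Supplier<V> valueSupplier) {
    return (V) cache.computeIfAbsent(key, ignored -> valueSupplier.get());
  }
}
```

A second call with the same key returns the stored value without invoking the supplier again. A production cache would typically also bound its total size and evict entries, which this sketch leaves out.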
loadEqualityDeletes
Description copied from interface: DeleteLoader
Loads the content of equality delete files into a set.
- Specified by: loadEqualityDeletes in interface DeleteLoader
- Parameters:
deleteFiles - equality delete files
projection - a projection of columns to load
- Returns:
- a set of equality deletes
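The idea of loading projected delete rows into a set can be sketched without the library types. Everything here is a simplified assumption for illustration: rows are modeled as Map<String, Object>, the projection as a list of column names, and EqualityDeleteSketch is a hypothetical class rather than the actual implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: each delete file is a list of rows, each row a column-to-value map.
class EqualityDeleteSketch {
  // Projects a row onto the given columns, producing a key usable in a hash set.
  static List<Object> project(Map<String, Object> row, List<String> columns) {
    List<Object> key = new ArrayList<>(columns.size());
    for (String column : columns) {
      key.add(row.get(column));
    }
    return key;
  }

  // Collects the projected rows of every delete file into one set.
  static Set<List<Object>> loadEqualityDeletes(
      Iterable<List<Map<String, Object>>> deleteFiles, List<String> projection) {
    Set<List<Object>> deletes = new HashSet<>();
    for (List<Map<String, Object>> file : deleteFiles) {
      for (Map<String, Object> row : file) {
        deletes.add(project(row, projection));
      }
    }
    return deletes;
  }
}
```

A reader can then drop any data row whose projection onto the same columns is contained in the returned set.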
-
loadPositionDeletes
public PositionDeleteIndex loadPositionDeletes(Iterable<DeleteFile> deleteFiles, CharSequence filePath)
Loads the content of a deletion vector or position delete files for a given data file path into a position index.
The deletion vector is currently loaded without caching as the existing Puffin reader requires at least 3 requests to fetch the entire file. Caching a single deletion vector may only be useful when multiple data file splits are processed on the same node, which is unlikely as task locality is not guaranteed.
For position delete files, however, there is no efficient way to read deletes for a particular data file. Therefore, caching may be more effective as such delete files potentially apply to many data files, especially in unpartitioned tables and tables with deep partitions. If a position delete file qualifies for caching, this method will attempt to cache a position index for each referenced data file.
- Specified by: loadPositionDeletes in interface DeleteLoader
- Parameters:
deleteFiles - a deletion vector or position delete files
filePath - the data file path for which to load deletes
- Returns:
- a position delete index for the provided data file path
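The per-data-file caching described above for position delete files can be sketched as follows. This is a simplified model under stated assumptions, not the actual implementation: PositionDeleteCacheSketch, its Row type, and the fullScans counter are hypothetical, a delete file is represented as an in-memory list of (file path, position) rows, and a SortedSet<Long> stands in for PositionDeleteIndex.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical model; a real position delete file stores its rows on disk.
class PositionDeleteCacheSketch {
  static final class Row {
    final String filePath;
    final long pos;

    Row(String filePath, long pos) {
      this.filePath = filePath;
      this.pos = pos;
    }
  }

  // delete file name -> (data file path -> deleted positions)
  private final Map<String, Map<String, SortedSet<Long>>> cache = new HashMap<>();
  int fullScans = 0; // counts how often a delete file had to be read in full

  SortedSet<Long> loadPositionDeletes(String deleteFileName, List<Row> rows, String dataFilePath) {
    Map<String, SortedSet<Long>> indexes = cache.get(deleteFileName);
    if (indexes == null) {
      // Reading only one data file's deletes is not possible, so scan the whole
      // delete file once and keep an index for every data file it references.
      fullScans++;
      indexes = new HashMap<>();
      for (Row row : rows) {
        indexes.computeIfAbsent(row.filePath, ignored -> new TreeSet<>()).add(row.pos);
      }
      cache.put(deleteFileName, indexes);
    }
    return indexes.getOrDefault(dataFilePath, Collections.emptySortedSet());
  }
}
```

After the first call for one data file, later calls for other data files referenced by the same delete file are served from the cached indexes without another full scan.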
-