Package org.apache.iceberg.actions
Class SizeBasedDataRewriter
- java.lang.Object
- org.apache.iceberg.actions.SizeBasedFileRewriter<FileScanTask,DataFile>
- org.apache.iceberg.actions.SizeBasedDataRewriter
- All Implemented Interfaces:
FileRewriter<FileScanTask,DataFile>
public abstract class SizeBasedDataRewriter extends SizeBasedFileRewriter<FileScanTask,DataFile>
-
-
Field Summary
Fields
static java.lang.String DELETE_FILE_THRESHOLD
The minimum number of deletes that must be associated with a data file for it to be considered for rewriting.
static int DELETE_FILE_THRESHOLD_DEFAULT
-
Fields inherited from class org.apache.iceberg.actions.SizeBasedFileRewriter
MAX_FILE_GROUP_SIZE_BYTES, MAX_FILE_GROUP_SIZE_BYTES_DEFAULT, MAX_FILE_SIZE_BYTES, MAX_FILE_SIZE_DEFAULT_RATIO, MIN_FILE_SIZE_BYTES, MIN_FILE_SIZE_DEFAULT_RATIO, MIN_INPUT_FILES, MIN_INPUT_FILES_DEFAULT, REWRITE_ALL, REWRITE_ALL_DEFAULT, TARGET_FILE_SIZE_BYTES
-
-
Constructor Summary
Constructors
protected SizeBasedDataRewriter(Table table)
-
Method Summary
Methods
protected long defaultTargetFileSize()
protected java.lang.Iterable<java.util.List<FileScanTask>> filterFileGroups(java.util.List<java.util.List<FileScanTask>> groups)
protected java.lang.Iterable<FileScanTask> filterFiles(java.lang.Iterable<FileScanTask> tasks)
void init(java.util.Map<java.lang.String,java.lang.String> options)
Initializes this rewriter using provided options.
java.util.Set<java.lang.String> validOptions()
Returns a set of supported options for this rewriter.
-
Methods inherited from class org.apache.iceberg.actions.SizeBasedFileRewriter
enoughContent, enoughInputFiles, inputSize, numOutputFiles, outputSpec, outputSpecId, planFileGroups, splitSize, table, tooMuchContent, writeMaxFileSize, wronglySized
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.iceberg.actions.FileRewriter
description, rewrite
-
-
-
-
Field Detail
-
DELETE_FILE_THRESHOLD
public static final java.lang.String DELETE_FILE_THRESHOLD
The minimum number of deletes that must be associated with a data file for it to be considered for rewriting. If a data file has this number of deletes or more, it will be rewritten regardless of the file size limits set by SizeBasedFileRewriter.MIN_FILE_SIZE_BYTES and SizeBasedFileRewriter.MAX_FILE_SIZE_BYTES. If a file group contains a file that satisfies this condition, the file group will be rewritten regardless of the minimum number of files per group set by SizeBasedFileRewriter.MIN_INPUT_FILES.
Defaults to Integer.MAX_VALUE, which means this feature is not enabled by default.
- See Also:
- Constant Field Values
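The rule described for DELETE_FILE_THRESHOLD can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Iceberg's implementation: the FileInfo class and the tooManyDeletes, shouldRewriteFile, and shouldRewriteGroup helpers are hypothetical names, and the group check is simplified to the file-count condition only.

```java
import java.util.Collections;
import java.util.List;

public class DeleteThresholdSketch {

  // Hypothetical stand-in for a data file and the deletes associated with it.
  static final class FileInfo {
    final long sizeBytes;
    final int deleteCount;

    FileInfo(long sizeBytes, int deleteCount) {
      this.sizeBytes = sizeBytes;
      this.deleteCount = deleteCount;
    }
  }

  // True when the file has at least `threshold` deletes.
  static boolean tooManyDeletes(FileInfo file, int threshold) {
    return file.deleteCount >= threshold;
  }

  // A file is a rewrite candidate if its size is outside [minBytes, maxBytes]
  // or if it meets the delete threshold, whatever its size.
  static boolean shouldRewriteFile(FileInfo file, long minBytes, long maxBytes, int threshold) {
    boolean wronglySized = file.sizeBytes < minBytes || file.sizeBytes > maxBytes;
    return wronglySized || tooManyDeletes(file, threshold);
  }

  // A group is rewritten if it has enough input files or if any file in it
  // meets the delete threshold, whatever the group's file count.
  // Simplification: other group-level conditions are left out of this sketch.
  static boolean shouldRewriteGroup(List<FileInfo> group, int minInputFiles, int threshold) {
    boolean enoughInputFiles = group.size() >= minInputFiles;
    boolean anyTooManyDeletes = group.stream().anyMatch(f -> tooManyDeletes(f, threshold));
    return enoughInputFiles || anyTooManyDeletes;
  }

  public static void main(String[] args) {
    long mib = 1L << 20;
    FileInfo wellSized = new FileInfo(400 * mib, 3); // 400 MiB, 3 deletes
    List<FileInfo> group = Collections.singletonList(wellSized);

    // With the default threshold the feature is effectively off.
    System.out.println(shouldRewriteFile(wellSized, 300 * mib, 700 * mib, Integer.MAX_VALUE)); // false
    System.out.println(shouldRewriteGroup(group, 5, Integer.MAX_VALUE)); // false

    // With a threshold of 2 the same file and group qualify.
    System.out.println(shouldRewriteFile(wellSized, 300 * mib, 700 * mib, 2)); // true
    System.out.println(shouldRewriteGroup(group, 5, 2)); // true
  }
}
```

Because the default is Integer.MAX_VALUE, the delete condition in this sketch never fires unless a lower threshold is configured, which matches the "not enabled by default" behavior described above.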
-
DELETE_FILE_THRESHOLD_DEFAULT
public static final int DELETE_FILE_THRESHOLD_DEFAULT
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
SizeBasedDataRewriter
protected SizeBasedDataRewriter(Table table)
-
-
Method Detail
-
validOptions
public java.util.Set<java.lang.String> validOptions()
Description copied from interface: FileRewriter
Returns a set of supported options for this rewriter. Only options specified in this set will be accepted at runtime. Any other options will be rejected.
- Specified by:
validOptions in interface FileRewriter<FileScanTask,DataFile>
- Overrides:
validOptions in class SizeBasedFileRewriter<FileScanTask,DataFile>
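The accept-or-reject behavior described for validOptions can be illustrated with a small standalone check. This is only a sketch of the idea, not how Iceberg performs the validation: the unsupportedOptions and validate helpers are hypothetical, and the option key strings used in main are assumed for illustration since this page does not list the constant values.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ValidOptionsSketch {

  // Returns the provided option keys that are not in the supported set.
  static Set<String> unsupportedOptions(Set<String> validOptions, Map<String, String> options) {
    Set<String> invalid = new HashSet<>(options.keySet());
    invalid.removeAll(validOptions);
    return invalid;
  }

  // Rejects the options if any key is unsupported.
  static void validate(Set<String> validOptions, Map<String, String> options) {
    Set<String> invalid = unsupportedOptions(validOptions, options);
    if (!invalid.isEmpty()) {
      throw new IllegalArgumentException("Unsupported options: " + invalid);
    }
  }

  public static void main(String[] args) {
    // Assumed key strings, used here only as example values.
    Set<String> valid = new HashSet<>();
    valid.add("delete-file-threshold");
    valid.add("min-input-files");

    Map<String, String> options = new HashMap<>();
    options.put("delete-file-threshold", "2");
    validate(valid, options); // accepted: every key is supported

    options.put("not-a-real-option", "x");
    try {
      validate(valid, options);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```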
-
init
public void init(java.util.Map<java.lang.String,java.lang.String> options)
Description copied from interface: FileRewriter
Initializes this rewriter using provided options.
- Specified by:
init in interface FileRewriter<FileScanTask,DataFile>
- Overrides:
init in class SizeBasedFileRewriter<FileScanTask,DataFile>
- Parameters:
options - options to initialize this rewriter
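As a rough picture of what initializing from a string-to-string options map involves, the sketch below reads an integer option and falls back to a default when the key is absent. It is an assumption-laden illustration, not the actual init logic: the key string "delete-file-threshold", the deleteFileThreshold helper, and the non-negative check are all hypothetical. Only the Integer.MAX_VALUE default comes from the field description on this page.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class InitOptionsSketch {

  // Assumed key string; the page documents the constant name but not its value.
  static final String DELETE_FILE_THRESHOLD = "delete-file-threshold";

  // Documented default: the feature is off unless a lower value is supplied.
  static final int DELETE_FILE_THRESHOLD_DEFAULT = Integer.MAX_VALUE;

  // Reads the threshold from the options map, using the default when absent.
  static int deleteFileThreshold(Map<String, String> options) {
    String value = options.get(DELETE_FILE_THRESHOLD);
    int threshold = value == null ? DELETE_FILE_THRESHOLD_DEFAULT : Integer.parseInt(value);
    // Assumed sanity check: a negative threshold has no sensible meaning here.
    if (threshold < 0) {
      throw new IllegalArgumentException(
          "'" + DELETE_FILE_THRESHOLD + "' must be >= 0, got " + threshold);
    }
    return threshold;
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    System.out.println(deleteFileThreshold(Collections.emptyMap())); // 2147483647

    options.put(DELETE_FILE_THRESHOLD, "2");
    System.out.println(deleteFileThreshold(options)); // 2
  }
}
```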
-
filterFiles
protected java.lang.Iterable<FileScanTask> filterFiles(java.lang.Iterable<FileScanTask> tasks)
- Specified by:
filterFiles in class SizeBasedFileRewriter<FileScanTask,DataFile>
-
filterFileGroups
protected java.lang.Iterable<java.util.List<FileScanTask>> filterFileGroups(java.util.List<java.util.List<FileScanTask>> groups)
- Specified by:
filterFileGroups in class SizeBasedFileRewriter<FileScanTask,DataFile>
-
defaultTargetFileSize
protected long defaultTargetFileSize()
- Specified by:
defaultTargetFileSize in class SizeBasedFileRewriter<FileScanTask,DataFile>
-
-