public interface RewriteDataFiles extends SnapshotUpdate<RewriteDataFiles,RewriteDataFiles.Result>
Modifier and Type | Interface and Description |
---|---|
static interface | RewriteDataFiles.FileGroupFailureResult: For a file group that failed to rewrite. |
static interface | RewriteDataFiles.FileGroupInfo: A description of a file group, when it was processed, and within which partition. |
static interface | RewriteDataFiles.FileGroupRewriteResult: For a particular file group, the number of files which are newly created and the number of files which were formerly part of the table but have been rewritten. |
static interface | RewriteDataFiles.Result: A map of file group information to the results of rewriting that file group. |

Modifier and Type | Field and Description |
---|---|
static java.lang.String | MAX_CONCURRENT_FILE_GROUP_REWRITES: The max number of file groups to be simultaneously rewritten by the rewrite strategy. |
static int | MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT |
static java.lang.String | MAX_FILE_GROUP_SIZE_BYTES: The entire rewrite operation is broken down into pieces based on partitioning and within partitions based on size into groups. |
static long | MAX_FILE_GROUP_SIZE_BYTES_DEFAULT |
static java.lang.String | PARTIAL_PROGRESS_ENABLED: Enable committing groups of files (see max-file-group-size-bytes) prior to the entire rewrite completing. |
static boolean | PARTIAL_PROGRESS_ENABLED_DEFAULT |
static java.lang.String | PARTIAL_PROGRESS_MAX_COMMITS: The maximum number of Iceberg commits that this rewrite is allowed to produce if partial progress is enabled. |
static int | PARTIAL_PROGRESS_MAX_COMMITS_DEFAULT |
static java.lang.String | REWRITE_JOB_ORDER: Forces the rewrite job order based on the value. |
static java.lang.String | REWRITE_JOB_ORDER_DEFAULT |
static java.lang.String | TARGET_FILE_SIZE_BYTES: The output file size that this rewrite strategy will attempt to generate when rewriting files. |
static java.lang.String | USE_STARTING_SEQUENCE_NUMBER: Whether the compaction should use the sequence number of the snapshot at compaction start time for new data files, instead of using the sequence number of the newly produced snapshot. |
static boolean | USE_STARTING_SEQUENCE_NUMBER_DEFAULT |

Modifier and Type | Method and Description |
---|---|
default RewriteDataFiles | binPack(): Choose BINPACK as a strategy for this rewrite operation. |
RewriteDataFiles | filter(Expression expression): A user provided filter for determining which files will be considered by the rewrite strategy. |
default RewriteDataFiles | sort(): Choose SORT as a strategy for this rewrite operation using the table's sortOrder. |
default RewriteDataFiles | sort(SortOrder sortOrder): Choose SORT as a strategy for this rewrite operation and manually specify the sortOrder to use. |
default RewriteDataFiles | zOrder(java.lang.String... columns): Choose Z-ORDER as a strategy for this rewrite operation with a specified list of columns to use. |

Methods inherited from interface SnapshotUpdate: snapshotProperty
static final java.lang.String PARTIAL_PROGRESS_ENABLED
The default is false, which produces a single commit when the entire job has completed.
static final boolean PARTIAL_PROGRESS_ENABLED_DEFAULT
static final java.lang.String PARTIAL_PROGRESS_MAX_COMMITS
static final int PARTIAL_PROGRESS_MAX_COMMITS_DEFAULT
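The effect of the two partial progress options can be sketched in plain Java. This is only an illustration under stated assumptions, not Iceberg's implementation: `PartialProgressSketch`, `groupsPerCommit`, and `commitBatches` are hypothetical names, and the ceiling-division batching is one simple way to keep the number of commits within the configured maximum.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Iceberg): splits rewritten file groups into commit batches.
final class PartialProgressSketch {
  private PartialProgressSketch() {}

  // Ceiling division, so that groupCount groups fit into at most maxCommits commits.
  static int groupsPerCommit(int groupCount, int maxCommits) {
    if (groupCount < 0 || maxCommits <= 0) {
      throw new IllegalArgumentException("groupCount must be >= 0 and maxCommits must be > 0");
    }
    return (groupCount + maxCommits - 1) / maxCommits;
  }

  // With partial progress disabled, every group lands in one commit at the end of the job.
  // With it enabled, groups are committed in batches, never exceeding maxCommits commits.
  static <T> List<List<T>> commitBatches(List<T> groups, boolean partialProgress, int maxCommits) {
    List<List<T>> batches = new ArrayList<>();
    if (groups.isEmpty()) {
      return batches;
    }
    int perCommit = partialProgress ? groupsPerCommit(groups.size(), maxCommits) : groups.size();
    for (int start = 0; start < groups.size(); start += perCommit) {
      int end = Math.min(start + perCommit, groups.size());
      batches.add(new ArrayList<>(groups.subList(start, end)));
    }
    return batches;
  }

  public static void main(String[] args) {
    List<String> groups = List.of("g1", "g2", "g3", "g4", "g5", "g6", "g7");
    System.out.println(commitBatches(groups, false, 3)); // one batch holding all seven groups
    System.out.println(commitBatches(groups, true, 3));  // three batches of sizes 3, 3 and 1
  }
}
```

In this sketch a failed batch would only affect the groups in that batch, which is the trade-off partial progress offers: more commits in exchange for keeping work that already succeeded.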
static final java.lang.String MAX_FILE_GROUP_SIZE_BYTES
The entire rewrite operation is broken down into pieces based on partitioning and, within partitions, based on size into groups. The largest amount of data compacted in a single group is controlled by MAX_FILE_GROUP_SIZE_BYTES. This helps with breaking down the rewriting of very large
partitions, which may not be rewritable otherwise due to the resource constraints of the
cluster. For example, a sort-based rewrite may not scale to terabyte-sized partitions; those
partitions need to be worked on in small subsections to avoid exhausting resources.
When grouping files, the underlying rewrite strategy will use this value to limit the files which will be included in a single file group. A group will be processed by a single framework "action". For example, in Spark this means that each group would be rewritten in its own Spark action. A group will never contain files for multiple output partitions.
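The grouping rule described above can be sketched in plain Java. This is a simplified illustration, not the planner Iceberg uses: `FileGroupSketch`, `FileInfo`, and `planGroups` are hypothetical names, and the greedy first-come packing is an assumption. It shows only the two stated constraints, namely that a group is bounded by the maximum group size and never mixes partitions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Iceberg code) of size-bounded, per-partition file grouping.
final class FileGroupSketch {
  private FileGroupSketch() {}

  // Minimal stand-in for a data file: its partition, path and size in bytes.
  static final class FileInfo {
    final String partition;
    final String path;
    final long sizeBytes;

    FileInfo(String partition, String path, long sizeBytes) {
      this.partition = partition;
      this.path = path;
      this.sizeBytes = sizeBytes;
    }
  }

  static List<List<FileInfo>> planGroups(List<FileInfo> files, long maxGroupSizeBytes) {
    // First split by partition, so that no group spans multiple partitions.
    Map<String, List<FileInfo>> byPartition = new LinkedHashMap<>();
    for (FileInfo file : files) {
      byPartition.computeIfAbsent(file.partition, key -> new ArrayList<>()).add(file);
    }

    // Then greedily fill groups within each partition up to the size limit.
    List<List<FileInfo>> groups = new ArrayList<>();
    for (List<FileInfo> partitionFiles : byPartition.values()) {
      List<FileInfo> current = new ArrayList<>();
      long currentBytes = 0;
      for (FileInfo file : partitionFiles) {
        // Close the current group if adding this file would exceed the limit.
        // A single file larger than the limit still gets a group of its own.
        if (!current.isEmpty() && currentBytes + file.sizeBytes > maxGroupSizeBytes) {
          groups.add(current);
          current = new ArrayList<>();
          currentBytes = 0;
        }
        current.add(file);
        currentBytes += file.sizeBytes;
      }
      if (!current.isEmpty()) {
        groups.add(current);
      }
    }
    return groups;
  }

  public static void main(String[] args) {
    List<FileInfo> files = List.of(
        new FileInfo("day=1", "a.parquet", 40),
        new FileInfo("day=2", "b.parquet", 30),
        new FileInfo("day=1", "c.parquet", 40),
        new FileInfo("day=1", "d.parquet", 40));
    for (List<FileInfo> group : planGroups(files, 100)) {
      System.out.println(group.get(0).partition + " -> " + group.size() + " file(s)");
    }
  }
}
```

Each returned list corresponds to one unit of work; in the Spark case described above, each would be rewritten in its own action.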
static final long MAX_FILE_GROUP_SIZE_BYTES_DEFAULT
static final java.lang.String MAX_CONCURRENT_FILE_GROUP_REWRITES
static final int MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT
static final java.lang.String TARGET_FILE_SIZE_BYTES
static final java.lang.String USE_STARTING_SEQUENCE_NUMBER
This avoids commit conflicts with updates that add newer equality deletes at a higher sequence number.
Defaults to true.
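Why the starting sequence number matters can be illustrated with a small sketch. It assumes the rule from the Iceberg table specification that an equality delete applies only to data files whose data sequence number is strictly lower than the delete's. The class and method names are hypothetical, and the sequence numbers are made up for the example.

```java
// Hypothetical sketch (not Iceberg code) of how sequence numbers interact with equality deletes.
final class SequenceNumberSketch {
  private SequenceNumberSketch() {}

  // Assumed rule: an equality delete applies to data files with a strictly lower sequence number.
  static boolean equalityDeleteApplies(long dataFileSeq, long deleteFileSeq) {
    return dataFileSeq < deleteFileSeq;
  }

  // Sequence number assigned to the rewritten files, depending on the option.
  static long rewrittenFileSeq(boolean useStartingSequenceNumber, long startingSeq, long commitSeq) {
    return useStartingSequenceNumber ? startingSeq : commitSeq;
  }

  public static void main(String[] args) {
    long startingSeq = 5; // snapshot the compaction started from
    long deleteSeq = 6;   // equality delete committed while the compaction was running
    long commitSeq = 7;   // snapshot produced by the compaction commit

    // Keeping the starting sequence number: the concurrent delete still applies to the new files.
    System.out.println(equalityDeleteApplies(rewrittenFileSeq(true, startingSeq, commitSeq), deleteSeq));  // true

    // Using the new snapshot's sequence number: the delete would no longer apply,
    // so the commit would have to be rejected as a conflict to avoid resurrecting deleted rows.
    System.out.println(equalityDeleteApplies(rewrittenFileSeq(false, startingSeq, commitSeq), deleteSeq)); // false
  }
}
```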
static final boolean USE_STARTING_SEQUENCE_NUMBER_DEFAULT
static final java.lang.String REWRITE_JOB_ORDER
Defaults to none.
static final java.lang.String REWRITE_JOB_ORDER_DEFAULT
default RewriteDataFiles binPack()
default RewriteDataFiles sort()
default RewriteDataFiles sort(SortOrder sortOrder)
sortOrder - user defined sortOrder
default RewriteDataFiles zOrder(java.lang.String... columns)
columns - Columns to be used to generate Z-Values
RewriteDataFiles filter(Expression expression)
expression - An Iceberg expression used to determine which files will be considered for rewriting