Class PartitionKeySelector

java.lang.Object
org.apache.iceberg.flink.sink.PartitionKeySelector
All Implemented Interfaces:
Serializable, org.apache.flink.api.common.functions.Function, org.apache.flink.api.java.functions.KeySelector<org.apache.flink.table.data.RowData,String>

@Internal public class PartitionKeySelector extends Object implements org.apache.flink.api.java.functions.KeySelector<org.apache.flink.table.data.RowData,String>
Creates a KeySelector that shuffles records by partition key, so that each partition or bucket is written by only one task. This greatly reduces the number of small files produced by the partitioned fanout write policy in FlinkSink.
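The effect of shuffling by partition key can be illustrated with a small, dependency-free sketch. Everything in it is hypothetical and chosen for illustration: the ShuffleSketch and Event types, the "region=<value>" key format, and the hash-modulo routing are assumptions, not the Iceberg or Flink implementation. It shows only the property the description relies on: rows with equal keys always reach the same writer task.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: none of these types are Flink or Iceberg classes.
final class ShuffleSketch {

    // Hypothetical stand-in for a row with one partition column.
    static final class Event {
        final String region;
        final long value;

        Event(String region, long value) {
            this.region = region;
            this.value = value;
        }
    }

    // Plays the role of getKey: maps a row to a partition key string.
    // The "region=<value>" format is an assumption made for this sketch.
    static String partitionKey(Event event) {
        return "region=" + event.region;
    }

    // Simplified routing: hash of the key modulo the number of writer tasks.
    // Flink's own key-to-task assignment is more involved; this only keeps the
    // property that equal keys always map to the same task.
    static int taskFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Records which writer tasks receive rows for each partition key.
    static Map<String, Set<Integer>> route(List<Event> events, int parallelism) {
        Map<String, Set<Integer>> tasksByKey = new LinkedHashMap<>();
        for (Event event : events) {
            String key = partitionKey(event);
            tasksByKey.computeIfAbsent(key, k -> new LinkedHashSet<>())
                      .add(taskFor(key, parallelism));
        }
        return tasksByKey;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("eu", 1), new Event("us", 2), new Event("eu", 3), new Event("us", 4));
        // Each partition key is listed with the single task that receives its rows.
        route(events, 4).forEach((key, tasks) -> System.out.println(key + " -> tasks " + tasks));
    }
}
```

Because every row of a given partition lands on one task, only that task opens a file for the partition. Without the shuffle, each parallel writer could open its own file for every partition it happens to see.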
  • Constructor Details

    • PartitionKeySelector

      public PartitionKeySelector(PartitionSpec spec, Schema schema, org.apache.flink.table.types.logical.RowType flinkSchema)
  • Method Details

    • getKey

      public String getKey(org.apache.flink.table.data.RowData row)
      Specified by:
      getKey in interface org.apache.flink.api.java.functions.KeySelector<org.apache.flink.table.data.RowData,String>