Class DynamicIcebergSink.Builder<T>

java.lang.Object
org.apache.iceberg.flink.sink.dynamic.DynamicIcebergSink.Builder<T>
Enclosing class:
DynamicIcebergSink

public static class DynamicIcebergSink.Builder<T> extends Object
  • Method Details

    • forInput

      public DynamicIcebergSink.Builder<T> forInput(org.apache.flink.streaming.api.datastream.DataStream<T> inputStream)
    • generator

      public DynamicIcebergSink.Builder<T> generator(DynamicRecordGenerator<T> inputGenerator)
    • catalogLoader

      public DynamicIcebergSink.Builder<T> catalogLoader(CatalogLoader newCatalogLoader)
      The catalog loader is used to lazily load tables in the DynamicCommitter. The loader is required because Table is not serializable, so a table loaded in the builder cannot simply be reused in the remote task managers.
      Parameters:
      newCatalogLoader - the loader used to load Iceberg tables inside tasks.
      Returns:
      this DynamicIcebergSink.Builder for method chaining.
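      As a sketch, a serializable loader can be created via the static factories on CatalogLoader; the catalog name, metastore URI, and properties below are placeholder assumptions, not values from this API:

```java
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.flink.CatalogLoader;

// Hypothetical catalog configuration; adjust for your environment.
Map<String, String> catalogProps = Map.of(
    "type", "hive",
    "uri", "thrift://metastore:9083");

// CatalogLoader is Serializable, so it can be shipped to task managers,
// where the DynamicCommitter uses it to load tables lazily.
CatalogLoader catalogLoader =
    CatalogLoader.hive("my_catalog", new Configuration(), catalogProps);
```

      A Hadoop- or custom-catalog loader can be built the same way with CatalogLoader.hadoop or CatalogLoader.custom.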
    • set

      public DynamicIcebergSink.Builder<T> set(String property, String value)
      Sets a single write property for the sink. See FlinkWriteOptions for the supported properties.
    • setAll

      public DynamicIcebergSink.Builder<T> setAll(Map<String,String> properties)
      Sets multiple write properties for the sink at once. See FlinkWriteOptions for the supported properties.
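      For illustration, write options can be supplied individually or in bulk; the property keys below are believed to come from FlinkWriteOptions, and the values are examples only:

```java
import java.util.Map;

// Set a single write property (key from FlinkWriteOptions).
builder.set("write-format", "parquet");

// Set several properties at once.
builder.setAll(Map.of(
    "write-format", "parquet",
    "target-file-size-bytes", "134217728")); // 128 MiB target data files
```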
    • overwrite

      public DynamicIcebergSink.Builder<T> overwrite(boolean newOverwrite)
    • flinkConf

      public DynamicIcebergSink.Builder<T> flinkConf(org.apache.flink.configuration.ReadableConfig config)
    • writeParallelism

      public DynamicIcebergSink.Builder<T> writeParallelism(int newWriteParallelism)
      Configures the parallelism of the Iceberg stream writer.
      Parameters:
      newWriteParallelism - the number of parallel Iceberg stream writers.
      Returns:
      this DynamicIcebergSink.Builder for method chaining.
    • uidPrefix

      public DynamicIcebergSink.Builder<T> uidPrefix(String newPrefix)
      Set the uid prefix for IcebergSink operators. Note that IcebergSink internally consists of multiple operators (such as writer, committer, and aggregator). The actual operator uid is formed by appending a suffix to this prefix, e.g. "uidPrefix-writer".

      If provided, this prefix is also applied to operator names.

      Flink auto-generates operator uids if they are not set explicitly. It is a recommended best practice to set uids for all operators before deploying to production. Flink provides the option pipeline.auto-generate-uid=false to disable auto-generation and force explicit setting of all operator uids.

      Be careful when setting this for an existing job, because the operator uid changes from an auto-generated one to this new value. When deploying the change with a checkpoint, Flink will not be able to restore the previous IcebergSink operator state (more specifically, the committer operator state). You need to use --allowNonRestoredState to ignore the previous sink state. During restore, the IcebergSink state is used to check whether the last commit actually succeeded, so --allowNonRestoredState can lead to data loss if the Iceberg commit failed in the last completed checkpoint.

      Parameters:
      newPrefix - prefix for Flink sink operator uid and name
      Returns:
      this DynamicIcebergSink.Builder for method chaining.
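      A sketch of setting an explicit uid prefix, together with disabling Flink's uid auto-generation as recommended above; the prefix "events-sink" is a made-up example:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

Configuration flinkConf = new Configuration();
// Force explicit uids so any operator without one fails fast at submission.
flinkConf.setString("pipeline.auto-generate-uid", "false");
StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment(flinkConf);

// Sink operators get uids and names like "events-sink-writer",
// "events-sink-committer", etc.
builder.uidPrefix("events-sink");
```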
    • snapshotProperties

      public DynamicIcebergSink.Builder<T> snapshotProperties(Map<String,String> properties)
    • setSnapshotProperty

      public DynamicIcebergSink.Builder<T> setSnapshotProperty(String property, String value)
    • toBranch

      public DynamicIcebergSink.Builder<T> toBranch(String branch)
    • immediateTableUpdate

      public DynamicIcebergSink.Builder<T> immediateTableUpdate(boolean newImmediateUpdate)
    • cacheMaxSize

      public DynamicIcebergSink.Builder<T> cacheMaxSize(int maxSize)
      Sets the maximum size of the caches used in the dynamic sink for table data and serializers.
    • cacheRefreshMs

      public DynamicIcebergSink.Builder<T> cacheRefreshMs(long refreshMs)
      Sets the maximum interval between cache item refreshes.
    • inputSchemasPerTableCacheMaxSize

      public DynamicIcebergSink.Builder<T> inputSchemasPerTableCacheMaxSize(int inputSchemasPerTableCacheMaxSize)
      Sets the maximum number of input Schema objects cached per Iceberg table. The cache improves dynamic sink performance by storing Schema comparison results.
    • append

      public org.apache.flink.streaming.api.datastream.DataStreamSink<org.apache.iceberg.flink.sink.dynamic.DynamicRecordInternal> append()
      Appends the Iceberg sink operators that write records to the Iceberg table.
      Returns:
      the DataStreamSink for this sink.
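      Putting the builder methods above together, a typical wiring looks roughly like the following; the input stream, the Event type, the EventRecordGenerator (a DynamicRecordGenerator<Event> implementation), and the entry point via DynamicIcebergSink.forInput are assumptions for illustration:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.iceberg.flink.CatalogLoader;
import org.apache.iceberg.flink.sink.dynamic.DynamicIcebergSink;

// Assumed to exist: an input stream of application records and a
// serializable catalog loader (see catalogLoader above).
DataStream<Event> events = ...;
CatalogLoader catalogLoader = ...;

DynamicIcebergSink.forInput(events)
    .generator(new EventRecordGenerator()) // hypothetical DynamicRecordGenerator<Event>
    .catalogLoader(catalogLoader)          // loads tables lazily inside tasks
    .writeParallelism(4)                   // four parallel stream writers
    .cacheMaxSize(100)                     // table data / serializer cache size
    .cacheRefreshMs(60_000L)               // refresh cached items at most every minute
    .uidPrefix("events-sink")              // stable operator uids for upgrades
    .append();                             // attach the sink operators
```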