Iceberg Java API

Tables

The main purpose of the Iceberg API is to manage table metadata, such as the schema and partition spec, along with the metadata and data files that store table data.

Table metadata and operations are accessed through the Table interface. This interface returns table information.

Table metadata

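The Table interface provides access to the table metadata through schema (the current table schema), spec (the current partition spec), properties (a map of key-value properties), currentSnapshot (the current table snapshot), snapshots (all valid snapshots for the table), snapshot(id) (a specific snapshot by ID), and location (the table’s base location).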

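Tables also provide refresh to update the table to the latest version, and expose helpers such as io, which returns the FileIO used to read and write table files, and locationProvider, which returns a LocationProvider used to create paths for data and metadata files.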

Scanning

Iceberg table scans start by creating a TableScan object with newScan.

TableScan scan = table.newScan();

To configure a scan, call filter and select on the TableScan to get a new TableScan with those changes.

TableScan filteredScan = scan.filter(Expressions.equal("id", 5));

Calls to configuration methods create a new TableScan so that each TableScan is immutable and won’t change unexpectedly if shared across threads.
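
The immutability described above can be illustrated without any Iceberg dependency. The following is a minimal sketch, not Iceberg's implementation: ScanConfigSketch and its fields are hypothetical names, and filters are represented as plain strings rather than Expression objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a scan configuration; not an Iceberg class.
final class ScanConfigSketch {
  private final List<String> filters; // filter descriptions added so far
  private final List<String> columns; // projected column names; empty means all

  ScanConfigSketch() {
    this(Collections.<String>emptyList(), Collections.<String>emptyList());
  }

  private ScanConfigSketch(List<String> filters, List<String> columns) {
    // Defensive copies wrapped as read-only lists keep each instance immutable.
    this.filters = Collections.unmodifiableList(new ArrayList<>(filters));
    this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
  }

  // Returns a new instance with one more filter; this instance is unchanged.
  ScanConfigSketch filter(String expr) {
    List<String> next = new ArrayList<>(filters);
    next.add(expr);
    return new ScanConfigSketch(next, columns);
  }

  // Returns a new instance with the given projection; this instance is unchanged.
  ScanConfigSketch select(String... names) {
    return new ScanConfigSketch(filters, Arrays.asList(names));
  }

  List<String> filters() { return filters; }

  List<String> columns() { return columns; }
}
```

With this shape, a shared base configuration can be handed to another thread while a derived configuration is built from it, because deriving never modifies the base.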

When a scan is configured, planFiles, planTasks, and schema are used to return files, tasks, and the read projection.

TableScan scan = table.newScan()
    .filter(Expressions.equal("id", 5))
    .select("id", "data");

Schema projection = scan.schema();
Iterable<CombinedScanTask> tasks = scan.planTasks();

Use asOfTime or useSnapshot to configure the table snapshot for time travel queries.
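
One way to picture a time travel lookup by timestamp is as a search through the table's snapshot history. The sketch below is an assumption about the general idea, not Iceberg's code: SnapshotEntrySketch and TimeTravelSketch are hypothetical names, and the log is assumed to be ordered oldest first.

```java
import java.util.List;

// Hypothetical snapshot log entry: a snapshot ID and its commit time in milliseconds.
final class SnapshotEntrySketch {
  final long snapshotId;
  final long timestampMillis;

  SnapshotEntrySketch(long snapshotId, long timestampMillis) {
    this.snapshotId = snapshotId;
    this.timestampMillis = timestampMillis;
  }
}

final class TimeTravelSketch {
  // Returns the ID of the latest snapshot committed at or before the given time.
  // Assumes the log is ordered oldest first; throws if no snapshot is old enough.
  static long snapshotIdAsOf(List<SnapshotEntrySketch> log, long timestampMillis) {
    Long found = null;
    for (SnapshotEntrySketch entry : log) {
      if (entry.timestampMillis <= timestampMillis) {
        found = entry.snapshotId; // later matches overwrite earlier ones
      }
    }
    if (found == null) {
      throw new IllegalArgumentException("No snapshot at or before " + timestampMillis);
    }
    return found;
  }
}
```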

Update operations

Table also exposes operations that update the table. These operations use a builder pattern, PendingUpdate, that commits when PendingUpdate#commit is called.

For example, updating the table schema is done by calling updateSchema, adding updates to the builder, and finally calling commit to commit the pending changes to the table:

table.updateSchema()
    .addColumn("count", Types.LongType.get())
    .commit();
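
The pending-update pattern can be sketched in plain Java to show why nothing changes until commit is called. This is a simplified illustration under stated assumptions: TableSketch and SchemaUpdateSketch are hypothetical classes, and a schema is reduced to a list of column names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a table that only tracks column names; not an Iceberg class.
final class TableSketch {
  private List<String> columns = new ArrayList<>();

  // Returns a copy so callers cannot change the table without committing.
  List<String> columns() { return new ArrayList<>(columns); }

  // Starts a pending schema update; the table is unchanged until commit() runs.
  SchemaUpdateSketch updateSchema() { return new SchemaUpdateSketch(this); }

  void replaceColumns(List<String> updated) { this.columns = new ArrayList<>(updated); }
}

final class SchemaUpdateSketch {
  private final TableSketch table;
  private final List<String> added = new ArrayList<>();

  SchemaUpdateSketch(TableSketch table) { this.table = table; }

  // Records the change on the builder and returns it for chaining.
  SchemaUpdateSketch addColumn(String name) {
    added.add(name);
    return this;
  }

  // Applies all recorded changes to the table in one step.
  void commit() {
    List<String> updated = table.columns();
    updated.addAll(added);
    table.replaceColumns(updated);
  }
}
```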

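Available operations to update a table include updateSchema, updateProperties, updateLocation, newAppend, newFastAppend, newOverwrite, newDelete, newRewrite, newTransaction, rewriteManifests, manageSnapshots, and expireSnapshots.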

Transactions

Transactions are used to commit multiple table changes in a single atomic operation. A transaction is used to create individual operations using factory methods, like newAppend, just like working with a Table. Operations created by a transaction are committed as a group when commitTransaction is called.

For example, deleting and appending a file in the same transaction:

Transaction t = table.newTransaction();

// commit operations to the transaction
// (filter is an Expression and data is a DataFile, both defined elsewhere)
t.newDelete().deleteFromRowFilter(filter).commit();
t.newAppend().appendFile(data).commit();

// commit all the changes to the table
t.commitTransaction();
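
The grouping behavior can be sketched with a staged working copy that is published in one step. This is a simplified model, not Iceberg's implementation: FileTableSketch and TransactionSketch are hypothetical, files are plain names, and the per-operation commit calls are folded into the delete and append methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a table holding a list of file names; not an Iceberg class.
final class FileTableSketch {
  private List<String> files = new ArrayList<>();
  private int commits = 0; // number of table-level commits so far

  List<String> files() { return new ArrayList<>(files); }

  int commits() { return commits; }

  TransactionSketch newTransaction() { return new TransactionSketch(this); }

  // Swaps in the new file list as a single table commit.
  void commitFiles(List<String> updated) {
    this.files = new ArrayList<>(updated);
    this.commits += 1;
  }
}

final class TransactionSketch {
  private final FileTableSketch table;
  private final List<String> staged; // working copy; the table itself is untouched

  TransactionSketch(FileTableSketch table) {
    this.table = table;
    this.staged = table.files();
  }

  // Each operation updates only the staged copy.
  TransactionSketch delete(String file) {
    staged.remove(file);
    return this;
  }

  TransactionSketch append(String file) {
    staged.add(file);
    return this;
  }

  // Publishes every staged change in one table commit.
  void commitTransaction() { table.commitFiles(staged); }
}
```

Readers of the table in this sketch see either the state before the transaction or the state after it, never the intermediate state with the file deleted but the replacement not yet appended.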

Types

Iceberg data types are located in the org.apache.iceberg.types package.

Primitives

Primitive type instances are available from static methods in each type class. Types without parameters use get, and types like decimal use factory methods:

Types.IntegerType.get()    // int
Types.DoubleType.get()     // double
Types.DecimalType.of(9, 2) // decimal(9, 2)

Nested types

Structs, maps, and lists are created using factory methods in type classes.

Like struct fields, map keys or values and list elements are tracked as nested fields. Nested fields track field IDs and nullability.

Struct fields are created using NestedField.optional or NestedField.required. Map value and list element nullability is set in the map and list factory methods.

// struct<1 id: int, 2 data: optional string>
Types.StructType struct = Types.StructType.of(
    Types.NestedField.required(1, "id", Types.IntegerType.get()),
    Types.NestedField.optional(2, "data", Types.StringType.get())
  );

// map<1 key: int, 2 value: optional string>
// arguments are key ID, value ID, key type, value type
Types.MapType map = Types.MapType.ofOptional(
    1, 2,
    Types.IntegerType.get(),
    Types.StringType.get()
  );

// array<1 element: int>
Types.ListType list = Types.ListType.ofRequired(1, Types.IntegerType.get());

Expressions

Iceberg’s expressions are used to configure table scans. To create expressions, use the factory methods in Expressions.

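Supported predicate expressions include isNull, notNull, equal, notEqual, lessThan, lessThanOrEqual, greaterThan, greaterThanOrEqual, in, notIn, and startsWith.

Supported expression operations are and, or, and not.

Constant expressions are alwaysTrue and alwaysFalse.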

Expression binding

When created, expressions are unbound. Before an expression is used, it will be bound to a data type to find the field ID the expression name represents, and to convert predicate literals.

For example, before using the expression lessThan("x", 10), Iceberg needs to determine which column "x" refers to and convert 10 to that column’s data type.

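The expression could be bound to the type struct<1 x: long, 2 y: long> or to struct<11 x: int, 12 y: int>. In the first case "x" resolves to field ID 1 and 10 is converted to a long; in the second, "x" resolves to field ID 11 and 10 remains an int.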
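
The two steps of binding, resolving a name to a field ID and converting the literal, can be sketched in plain Java. This is an illustration only, not Iceberg's binding code: FieldSketch, BoundLessThanSketch, and BindingSketch are hypothetical, a struct is modeled as a map from column name to field, and only int and long columns are handled.

```java
import java.util.Map;

// Hypothetical field description: a field ID plus a type name ("int" or "long").
final class FieldSketch {
  final int id;
  final String type;

  FieldSketch(int id, String type) {
    this.id = id;
    this.type = type;
  }
}

// Hypothetical result of binding: the resolved field ID and the converted literal.
final class BoundLessThanSketch {
  final int fieldId;
  final Object literal;

  BoundLessThanSketch(int fieldId, Object literal) {
    this.fieldId = fieldId;
    this.literal = literal;
  }
}

final class BindingSketch {
  // Binds lessThan(name, literal) to a struct modeled as column name -> field.
  static BoundLessThanSketch bindLessThan(Map<String, FieldSketch> struct, String name, int literal) {
    FieldSketch field = struct.get(name);
    if (field == null) {
      throw new IllegalArgumentException("Cannot find field: " + name);
    }
    // Convert the literal to the column's type; if/else avoids numeric promotion
    // that a conditional expression mixing Long and Integer would apply.
    Object converted;
    if ("long".equals(field.type)) {
      converted = Long.valueOf(literal);
    } else {
      converted = Integer.valueOf(literal);
    }
    return new BoundLessThanSketch(field.id, converted);
  }
}
```

Binding the same unbound expression against the two struct types from the example yields different field IDs (1 versus 11) and different literal types (Long versus Integer).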

Expression example

table.newScan()
    .filter(Expressions.greaterThanOrEqual("x", 5))
    .filter(Expressions.lessThan("x", 10));

Modules

Iceberg table support is organized in library modules:

The Iceberg project also has modules for adding Iceberg support to processing engines: