Apache Iceberg is a new table format for large, slow-moving tabular data. It is designed to improve on the de facto standard table layout built into Hive, Presto, and Spark.

Iceberg Overview

Iceberg tracks individual data files in a table instead of directories. This allows writers to create data files in place and add them to the table only in an explicit commit.

Table state is maintained in metadata files. All changes to table state create a new metadata file and replace the old metadata with an atomic operation. The table metadata file tracks the table schema, partitioning config, other properties, and snapshots of the table contents.

The atomic transitions from one table metadata file to the next provide snapshot isolation. Readers use the table state (snapshot) that was current when they loaded the table metadata, and they are not affected by later changes until they refresh and pick up a new metadata location.
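The commit and isolation behavior described above can be sketched with a toy in-memory model. This is a minimal illustration under stated assumptions, not Iceberg's API: the names `Catalog`, `TableMetadata`, `load`, and `commit` are hypothetical, and a lock stands in for whatever atomic swap a real metastore or file system provides.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMetadata:
    """Immutable table state: one version of a (hypothetical) metadata file."""
    version: int
    data_files: frozenset


class Catalog:
    """Holds a pointer to the current metadata; stands in for a metastore."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = TableMetadata(version=0, data_files=frozenset())

    def load(self):
        # Readers receive an immutable view of the table state.
        return self._current

    def commit(self, base, new_files):
        """Atomically replace `base` with a new metadata version.

        Returns False if another writer committed first, i.e. `base` is stale.
        """
        with self._lock:
            if self._current is not base:
                return False
            self._current = TableMetadata(
                version=base.version + 1,
                data_files=base.data_files | frozenset(new_files),
            )
            return True


catalog = Catalog()
reader_view = catalog.load()       # a reader loads the table before any commit

writer_base = catalog.load()
committed = catalog.commit(writer_base, ["data/a.parquet"])   # succeeds
stale = catalog.commit(writer_base, ["data/b.parquet"])       # base is outdated
refreshed = catalog.load()         # a refresh picks up the new metadata
```

In this sketch `reader_view` still sees an empty table after the commit, while `refreshed` sees the new file. The second commit is rejected because it was based on metadata that is no longer current; a real writer would typically reload and retry.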

A snapshot is the complete set of data files in the table at some point in time. Snapshots are listed in the metadata file, but the files in a snapshot are stored across a separate set of manifest files.

Data files in snapshots are tracked in one or more manifest files. Each manifest contains one row per data file, holding the file's location, its partition data, and its metrics. A snapshot is the union of all files in its manifests. Manifest files can be shared between snapshots to avoid rewriting metadata that is slow-changing.
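The relationship between snapshots, manifests, and data files can be illustrated with a small data model. This is a simplified sketch, not Iceberg's actual classes or file layout: `DataFile`, `Manifest`, and `Snapshot` are hypothetical names, the partition data is reduced to a string, and `record_count` stands in for the per-file metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    partition: str      # simplified partition data, e.g. "day=1"
    record_count: int   # one example of a per-file metric


@dataclass(frozen=True)
class Manifest:
    entries: tuple      # one DataFile row per tracked data file


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    manifests: tuple    # the manifests that make up this snapshot

    def data_files(self):
        # A snapshot is the union of all files in its manifests.
        return {f for m in self.manifests for f in m.entries}


m1 = Manifest((DataFile("data/a.parquet", "day=1", 100),
               DataFile("data/b.parquet", "day=1", 250)))
s1 = Snapshot(1, (m1,))

# An append writes one new manifest and reuses m1 unchanged.
m2 = Manifest((DataFile("data/c.parquet", "day=2", 75),))
s2 = Snapshot(2, (m1, m2))

paths_s1 = sorted(f.path for f in s1.data_files())
paths_s2 = sorted(f.path for f in s2.data_files())
```

Here the second snapshot adds a file by writing only `m2`; the rows in `m1` are shared by both snapshots rather than rewritten.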

Design benefits

This design addresses specific problems with the Hive layout: file listing is no longer used to plan jobs, and files are created in place without renaming.

This also provides improved guarantees and performance:

Why a new table format?

The central metastore can be a scale bottleneck and the file system doesn’t—and shouldn’t—provide transactions to isolate concurrent reads and writes.

There are several problems with the current format:

The current format’s dependence on listing and rename cannot be changed, so a new format is needed.

Other design goals

In addition to changes in how table contents are tracked, Iceberg’s design improves a few other areas: