Lakehouse Catalogs & Governance Last updated: May 29, 2026

Single Source of Truth (SSOT)

A data design principle where a central repository serves as the definitive reference for the state and metadata definitions of all tables across an enterprise.

Tags: single source of truth, ssot, catalog source of truth, data lakehouse ssot

In data lakehouse architectures, a Single Source of Truth (SSOT) is the architectural goal of having one repository define the exact composition, schema, and location of a table's valid data files. In legacy data lakes, separate query engines often maintained their own copies of table definitions, which led to synchronization errors, stale schema configurations, and inconsistent data.

How Apache Iceberg Achieves SSOT

Apache Iceberg implements a Single Source of Truth at two key levels:

  1. Catalog Level: The catalog (such as Apache Polaris, Project Nessie, or the AWS Glue Data Catalog) holds a single reference pointer, updated atomically, to the current .metadata.json file for every table. Whichever engine queries the table, it must retrieve this pointer from the central catalog, so all engines see the same snapshot of the data at any given moment.
  2. Metadata File Level: The table's state is recorded inside its metadata files, including schema definitions, partition specifications, and snapshots. Because the schema is explicitly linked to the snapshot history in the metadata file, engines cannot misinterpret column layouts or scan files that are no longer part of the current snapshot.
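
The two levels above can be sketched in a few lines of Python. This is an illustrative toy, not Iceberg's or any catalog's actual implementation: `ToyCatalog`, `CommitConflict`, `make_metadata`, `resolve`, the `db.users` table, and the `v1`/`v2` file names are all hypothetical. The metadata dictionaries are a heavily trimmed stand-in for a real metadata file (history, partition specs, and manifest lists are omitted), although the key names used (`current-schema-id`, `schemas`, `current-snapshot-id`, `snapshots`) follow the Iceberg table specification.

```python
import threading


class CommitConflict(Exception):
    """The catalog pointer moved after the writer last read it."""


class ToyCatalog:
    """Hypothetical in-memory catalog: table name -> current metadata file location."""

    def __init__(self):
        self._pointers = {}
        self._lock = threading.Lock()

    def current_metadata(self, table):
        with self._lock:
            return self._pointers.get(table)

    def swap(self, table, expected, new):
        # Compare-and-swap: the pointer moves only if it still equals `expected`.
        with self._lock:
            if self._pointers.get(table) != expected:
                raise CommitConflict(f"{table}: pointer is no longer {expected}")
            self._pointers[table] = new


def make_metadata(schema_id, columns, snapshot_id):
    """Build a trimmed stand-in for a table metadata file (assumed shape)."""
    return {
        "current-schema-id": schema_id,
        "schemas": [{"schema-id": schema_id, "fields": [{"name": c} for c in columns]}],
        "current-snapshot-id": snapshot_id,
        "snapshots": [{"snapshot-id": snapshot_id, "schema-id": schema_id}],
    }


def resolve(catalog, storage, table):
    """What every engine does: ask the catalog, then read the file it points to."""
    metadata = storage[catalog.current_metadata(table)]
    snapshot = next(s for s in metadata["snapshots"]
                    if s["snapshot-id"] == metadata["current-snapshot-id"])
    # The snapshot names its schema, so columns are never guessed by the engine.
    schema = next(s for s in metadata["schemas"]
                  if s["schema-id"] == snapshot["schema-id"])
    return snapshot["snapshot-id"], [f["name"] for f in schema["fields"]]


# Simulated object store: file location -> parsed metadata file.
storage = {
    "v1.metadata.json": make_metadata(0, ["id", "name"], 100),
    "v2.metadata.json": make_metadata(1, ["id", "name", "email"], 101),
}

catalog = ToyCatalog()
catalog.swap("db.users", None, "v1.metadata.json")                 # create the table
before = resolve(catalog, storage, "db.users")
catalog.swap("db.users", "v1.metadata.json", "v2.metadata.json")   # commit a change
engine_a = resolve(catalog, storage, "db.users")
engine_b = resolve(catalog, storage, "db.users")
```

After the commit, `engine_a` and `engine_b` resolve to the same snapshot and column list because neither holds its own copy of the table definition. A writer that still believes `v1.metadata.json` is current would get a `CommitConflict` from `swap` and would have to re-read the pointer before retrying.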

Architectural Benefits

Establishing a central source of truth resolves several operational challenges:

📚 Go Deeper on Apache Iceberg

Alex Merced has authored three hands-on books covering Apache Iceberg, the Agentic Lakehouse, and modern data architecture. Pick up a copy to master the full ecosystem.
