Lakehouse Catalogs & Governance
Last updated: May 29, 2026

Catalog Federation

The capability of a query engine to connect to multiple disjoint metadata catalogs simultaneously, presenting them as a single logical database hierarchy.

catalog federation · federated queries · multi-catalog query · data federation


Catalog federation is the ability of an analytical query engine to query multiple independent catalogs at the same time. In large enterprises, tables are rarely registered in a single catalog: a company might keep active transaction records in AWS Glue, development data in Project Nessie, and tables shared with partners in Apache Polaris. Catalog federation lets users query and join tables across these catalogs without first migrating their metadata into any one of them.
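To make "without migrating the metadata" concrete, here is a minimal Python sketch, assuming each catalog can be modeled as a client object that answers table lookups. The class names (`InMemoryCatalog`, `FederatedCatalog`), the aliases, and the `s3://example-bucket/...` metadata locations are all hypothetical; real engines talk to Glue, Nessie, or Polaris through their own connectors. The federation layer here only holds references to the catalogs and routes each lookup to the catalog that owns the table.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryCatalog:
    """Hypothetical stand-in for one catalog client (Glue, Nessie, Polaris, ...)."""

    name: str
    # Maps "namespace.table" to the location of that table's metadata file.
    tables: dict[str, str] = field(default_factory=dict)

    def load_table(self, identifier: str) -> str:
        # Each catalog answers only for the tables it owns.
        return self.tables[identifier]


class FederatedCatalog:
    """Exposes several catalogs as top-level namespaces without copying their metadata."""

    def __init__(self) -> None:
        self._catalogs: dict[str, InMemoryCatalog] = {}

    def register(self, alias: str, catalog: InMemoryCatalog) -> None:
        # Store only a reference; the catalog remains the owner of its metadata.
        self._catalogs[alias] = catalog

    def load_table(self, qualified_name: str) -> str:
        # "glue_prod.db.customers" -> alias "glue_prod", identifier "db.customers"
        alias, _, identifier = qualified_name.partition(".")
        if alias not in self._catalogs:
            raise KeyError(f"unknown catalog: {alias}")
        # Delegate the lookup to the catalog that owns the table.
        return self._catalogs[alias].load_table(identifier)

    def list_tables(self) -> list[str]:
        # One logical hierarchy: <catalog alias>.<namespace>.<table>
        return sorted(
            f"{alias}.{identifier}"
            for alias, catalog in self._catalogs.items()
            for identifier in catalog.tables
        )


# Example wiring with hypothetical metadata locations.
federated = FederatedCatalog()
federated.register("glue_prod", InMemoryCatalog(
    "glue", {"db.customers": "s3://example-bucket/customers/metadata.json"}))
federated.register("nessie_dev", InMemoryCatalog(
    "nessie", {"db.orders": "s3://example-bucket/orders/metadata.json"}))
federated.register("polaris_shared", InMemoryCatalog(
    "polaris", {"partners.accounts": "s3://example-bucket/accounts/metadata.json"}))

print(federated.list_tables())
print(federated.load_table("nessie_dev.db.orders"))
```

The design choice worth noting is that `FederatedCatalog` is only a router: it never copies table metadata, so in this sketch each catalog stays the single source of truth for its own tables.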

How Catalog Federation Works

The query engine acts as the coordinator across the federated catalogs:

  1. Multiple Connections: The engine’s coordinator establishes client connections to each configured catalog provider (e.g., AWS Glue, JDBC, or REST catalogs).
  2. Unified Namespace: The engine represents these catalogs as top-level namespaces within its logical database structure (for example, glue_prod, nessie_dev, and polaris_shared).
  3. Cross-Catalog Join Resolution: When a user runs a cross-catalog query, the engine reads table metadata from each catalog, plans parallel scans of the tables' data files, and executes the join in its own memory space, so the catalogs only need to serve metadata:
```sql
/* Join a table registered in Glue with the main branch of a table in Nessie */
SELECT c.name, o.amount
FROM glue_prod.db.customers c
JOIN nessie_dev.db.orders AT BRANCH main o ON c.id = o.customer_id;
```
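The three steps above can be sketched in Python, purely as an illustration of the coordinator's role. Everything here is hypothetical: the catalogs are plain dictionaries mapping table names to data-file lists (the Nessie-style one keyed by branch), "object storage" is another dictionary, and the join is a simple in-memory hash join. A real engine reads Iceberg metadata and Parquet files through its own connectors and may pick a different join strategy. The sketch mirrors the SQL query: it resolves `glue_prod.db.customers` and `nessie_dev.db.orders` at branch `main`, scans their files in parallel, and joins on `c.id = o.customer_id`.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical catalogs: each maps a table to the data files its current metadata lists.
# The Nessie-style catalog is additionally keyed by branch name.
GLUE_PROD = {"db.customers": ["customers-0.parquet"]}
NESSIE_DEV = {"main": {"db.orders": ["orders-0.parquet", "orders-1.parquet"]}}
CATALOGS = {"glue_prod": GLUE_PROD, "nessie_dev": NESSIE_DEV}

# Hypothetical object storage (file name -> rows); catalogs never hold the rows themselves.
STORAGE = {
    "customers-0.parquet": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}],
    "orders-0.parquet": [{"customer_id": 1, "amount": 40.0}],
    "orders-1.parquet": [
        {"customer_id": 2, "amount": 15.5},
        {"customer_id": 3, "amount": 7.0},  # no matching customer: dropped by the inner join
        {"customer_id": 1, "amount": 9.5},
    ],
}


def resolve_files(qualified_name, branch=None):
    """Route a three-part name to its catalog and read the table's data-file list."""
    alias, _, identifier = qualified_name.partition(".")
    catalog = CATALOGS[alias]
    if branch is not None:
        catalog = catalog[branch]  # e.g. AT BRANCH main
    return catalog[identifier]


def read_file(path):
    # Stand-in for one scan task reading a data file from storage.
    return STORAGE[path]


def scan(files):
    # One scan task per data file, run in parallel; map() keeps results in file order.
    with ThreadPoolExecutor() as pool:
        chunks = list(pool.map(read_file, files))
    return [row for chunk in chunks for row in chunk]


def join_customers_orders():
    # SELECT c.name, o.amount FROM glue_prod.db.customers c
    # JOIN nessie_dev.db.orders AT BRANCH main o ON c.id = o.customer_id
    customers = scan(resolve_files("glue_prod.db.customers"))
    orders = scan(resolve_files("nessie_dev.db.orders", branch="main"))
    # Hash join in the engine's own memory: build on customers, probe with orders.
    name_by_id = {c["id"]: c["name"] for c in customers}
    return [
        (name_by_id[o["customer_id"]], o["amount"])
        for o in orders
        if o["customer_id"] in name_by_id
    ]


print(join_customers_orders())
```

Note that neither catalog dictionary ever sees a data row: the catalogs only answer "which files make up this table?", and all scanning and joining happens in the coordinator's code.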

Advantages of Federation

📚 Go Deeper on Apache Iceberg

Alex Merced has authored three hands-on books covering Apache Iceberg, the Agentic Lakehouse, and modern data architecture. Pick up a copy to master the full ecosystem.
