Iceberg Hive Catalog Lock Manager
The Iceberg Hive Catalog Lock Manager is the mechanism used to coordinate concurrent updates when using a Hive Metastore (HMS) as the catalog. Because standard Hive Metastores do not have native atomic compare-and-swap features for tables that bypass the Hive engine, Iceberg uses Hive’s internal transactional locking APIs to serialize writes and prevent engines from overwriting each other’s commits.
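To make the motivation concrete, here is a minimal sketch of the lost-update problem that serialization is meant to prevent. The `table` dictionary and `unsafe_commit` function are hypothetical stand-ins and not part of any Iceberg or Hive API; the sketch only assumes that a commit amounts to replacing a pointer to the current metadata file.

```python
# Hypothetical in-memory stand-in for a metastore table entry (not a real HMS API).
table = {"metadata_location": "v1.metadata.json"}


def unsafe_commit(base, new_location):
    """Blind write: nothing checks that the pointer still equals `base`."""
    table["metadata_location"] = new_location


# Two writers both read the same base version before either one commits.
base_a = table["metadata_location"]
base_b = table["metadata_location"]

unsafe_commit(base_a, "v2-writer-a.metadata.json")
unsafe_commit(base_b, "v2-writer-b.metadata.json")  # silently replaces writer A's commit

# Writer A's commit is no longer reachable from the table pointer.
lost_update = table["metadata_location"] == "v2-writer-b.metadata.json"
```

With no lock and no check against the base version, the second write wins and the first commit is lost without either writer seeing an error.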
Locking Mechanism
When an Iceberg client commits to a Hive-backed table, it interacts with the Hive Metastore lock service:
- Lock Request: The client requests an exclusive write lock on the target table using the HMS client API (lock call).
- HMS Lock DB: The Hive Metastore records the lock request in its backend relational database (typically using tables like HIVE_LOCKS).
- Polling: The Iceberg client blocks and polls the metastore, waiting until the lock status changes to ACQUIRED.
- Metadata Update: Once the lock is acquired, the client retrieves the current table parameters, validates that the table's metadata location still matches the version the commit was based on (that is, no other writer has committed in the meantime), updates the table parameters to point to the new metadata JSON path, and commits.
- Unlock: The client sends an unlock command to HMS, releasing the table lock for subsequent writers.
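The steps above can be sketched as a small simulation. This is not Iceberg's implementation: `FakeMetastore` is a hypothetical in-memory stand-in whose `lock`, `check_lock`, and `unlock` methods are only loosely modeled on the metastore client's lock calls, and the `timeout_s` and `poll_s` parameters are illustrative assumptions.

```python
import threading
import time


class FakeMetastore:
    """Hypothetical in-memory stand-in for HMS; it only mimics the lock/update/unlock flow."""

    def __init__(self, metadata_location):
        self.params = {"metadata_location": metadata_location}
        self._guard = threading.Lock()  # protects the fake lock table
        self._holder = None             # id of the request holding the table lock
        self._next_id = 0

    def lock(self):
        """Register a lock request and return (lock_id, state)."""
        with self._guard:
            self._next_id += 1
            lock_id = self._next_id
            if self._holder is None:
                self._holder = lock_id
                return lock_id, "ACQUIRED"
            return lock_id, "WAITING"

    def check_lock(self, lock_id):
        """Poll a pending request; grant it once the table lock is free."""
        with self._guard:
            if self._holder is None:
                self._holder = lock_id
            return "ACQUIRED" if self._holder == lock_id else "WAITING"

    def unlock(self, lock_id):
        """Release the table lock if this request holds it."""
        with self._guard:
            if self._holder == lock_id:
                self._holder = None


def commit(hms, base_location, new_location, timeout_s=5.0, poll_s=0.01):
    """Lock, poll until ACQUIRED, validate the base pointer, swap it, always unlock."""
    lock_id, state = hms.lock()
    try:
        deadline = time.monotonic() + timeout_s
        while state != "ACQUIRED":
            if time.monotonic() > deadline:
                raise TimeoutError("timed out waiting for table lock")
            time.sleep(poll_s)
            state = hms.check_lock(lock_id)
        if hms.params["metadata_location"] != base_location:
            return False  # another writer committed first; caller should refresh and retry
        hms.params["metadata_location"] = new_location
        return True
    finally:
        hms.unlock(lock_id)


hms = FakeMetastore("v1.metadata.json")
first = commit(hms, "v1.metadata.json", "v2.metadata.json")        # succeeds
stale = commit(hms, "v1.metadata.json", "v2-other.metadata.json")  # rejected: stale base
```

The `finally` clause matters in this sketch: a writer that fails validation or times out still releases its request, so later writers are not left waiting on an abandoned lock.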
Configuration Properties
If the Hive Metastore is configured with ACID transactions enabled, Iceberg uses the metastore’s database locks automatically. If HMS transactions are disabled, developers can configure alternative lock manager implementations (such as ZooKeeper lock managers) via Hadoop configuration properties:
# Example Hadoop configuration properties for ZooKeeper lock coordination (illustrative; confirm the key and class names against your Iceberg release)
iceberg.catalog.hive.lock-manager = org.apache.iceberg.util.LockManagers$ZooKeeperLockManager
iceberg.catalog.hive.lock.zk.connect-string = localhost:2181
Using ZooKeeper offloads lock coordination from the Hive Metastore's backing database, which can relieve contention on its lock tables in high-concurrency environments.
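As a rough illustration of how a client might interpret such settings, the sketch below parses the example properties and decides which lock backend to use. The functions `parse_properties` and `resolve_lock_backend` are hypothetical helpers, not Iceberg APIs. Two behaviors are assumptions made for this example: falling back to the metastore's own locks when no lock manager is configured, and requiring a connect string for an external manager.

```python
def parse_properties(text):
    """Parse simple `key = value` lines, skipping blank lines and comment lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("/*"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def resolve_lock_backend(props):
    """Return (backend, settings); assumed fallback is the metastore's own locks."""
    manager = props.get("iceberg.catalog.hive.lock-manager")
    if manager is None:
        return "hms", {}
    connect = props.get("iceberg.catalog.hive.lock.zk.connect-string")
    if not connect:
        # Assumption for this sketch: an external manager needs a connect string.
        raise ValueError("external lock manager configured without a connect string")
    return manager, {"connect-string": connect}


# Property keys and values copied from the example above.
EXAMPLE = """
iceberg.catalog.hive.lock-manager = org.apache.iceberg.util.LockManagers$ZooKeeperLockManager
iceberg.catalog.hive.lock.zk.connect-string = localhost:2181
"""

backend, settings = resolve_lock_backend(parse_properties(EXAMPLE))
```

Failing fast on an incomplete configuration keeps a misconfigured writer from silently committing without any lock coordination.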