Iceberg Optimistic Concurrency Control (OCC)
Iceberg Optimistic Concurrency Control (OCC) is the concurrency model defined by the Apache Iceberg specification to manage concurrent write operations. Rather than placing locks on tables during the entire write transaction (pessimistic locking), OCC assumes that conflicts between concurrent writers are rare. It allows multiple writers to plan, write, and stage data files in isolation, coordinating conflicts only during the final commit phase.
The OCC Workflow
When a writer initiates a transaction on an Iceberg table, it follows a structured sequence:
- Read Table State: The writer reads the current table metadata, establishing a base snapshot ID.
- Write and Stage Files: The writer writes new data or delete files and generates manifest files in isolation.
- Attempt Commit: The writer attempts to swap the catalogβs pointer from the base metadata file to a new metadata file containing the updated snapshot.
- Conflict Validation:
- If no other writer committed changes since the base snapshot was read, the pointer swap succeeds.
- If another writer committed a new snapshot in the meantime, the commit fails, and the writer initiates a retry loop.
Conflict Resolution and Commit Retries
When a commit fails due to a concurrent write, Iceberg query engines attempt to resolve the conflict automatically without failing the user query:
/* Query engines execute retry loops transparently behind the scenes */
INSERT INTO sales.transactions VALUES (101, 'US', 250.00);
During a retry, the engine reads the newly committed snapshot, checks if the changes overlap with the staged files, and, if safe (e.g. they write to different partitions), writes a new metadata file incorporating both changes and retries the pointer swap. If the conflict is irreconcilable (such as two transactions updating the same row), the transaction fails, preventing data corruption.