Iceberg Spec V3 File Encryption
Iceberg Spec V3 File Encryption refers to the native encryption support introduced in Version 3 of the Apache Iceberg table format. It defines an envelope encryption model built on AES-GCM (Galois/Counter Mode) to protect both table data and critical metadata files stored in cloud environments. This provides data confidentiality and tamper detection at the file layer.
Envelope Encryption Architecture
To protect files without turning the key service into a performance bottleneck, Iceberg uses a multi-tiered key hierarchy in which file contents are encrypted locally and only small keys are ever sent to the KMS:
- Table Master Key (TMK): Managed securely inside an external Key Management Service (KMS) like AWS KMS or HashiCorp Vault.
- Key Encryption Keys (KEKs): Generated per transaction or table scope, encrypted by the TMK, and stored within the table metadata.
- Data Encryption Keys (DEKs): Unique keys generated with a cryptographically secure random number generator, one per file. Each data file, delete file, and manifest is encrypted with its own DEK. The DEK itself is encrypted using the table KEK and stored in the metadata entry that tracks the file (for a data or delete file, its manifest entry).
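The three tiers above can be sketched in a few lines. This is a minimal illustration rather than Iceberg's implementation: it assumes the third-party Python `cryptography` package, and `InMemoryKms`, `wrap_key`, and `unwrap_key` are hypothetical names. The in-memory class stands in for a real KMS, and the 128-bit key size and the `nonce || ciphertext || tag` layout of a wrapped key are assumptions made for the example.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # standard GCM nonce length in bytes


def wrap_key(wrapping_key: bytes, key_to_wrap: bytes) -> bytes:
    """Encrypt one key under another; returns nonce || ciphertext || tag."""
    nonce = os.urandom(NONCE_LEN)
    return nonce + AESGCM(wrapping_key).encrypt(nonce, key_to_wrap, None)


def unwrap_key(wrapping_key: bytes, wrapped: bytes) -> bytes:
    """Reverse wrap_key; raises InvalidTag if the wrapped key was altered."""
    nonce, ciphertext = wrapped[:NONCE_LEN], wrapped[NONCE_LEN:]
    return AESGCM(wrapping_key).decrypt(nonce, ciphertext, None)


class InMemoryKms:
    """Hypothetical stand-in for an external KMS: the master key never leaves it."""

    def __init__(self) -> None:
        self._master_key = AESGCM.generate_key(bit_length=128)

    def wrap(self, key: bytes) -> bytes:
        return wrap_key(self._master_key, key)

    def unwrap(self, wrapped: bytes) -> bytes:
        return unwrap_key(self._master_key, wrapped)


kms = InMemoryKms()

# Tier 2: a KEK is generated locally; only its wrapped form is persisted.
kek = AESGCM.generate_key(bit_length=128)
wrapped_kek = kms.wrap(kek)        # would be stored in table metadata

# Tier 3: a fresh DEK per file, wrapped by the KEK without calling the KMS.
dek = AESGCM.generate_key(bit_length=128)
wrapped_dek = wrap_key(kek, dek)   # would be stored with the file's metadata entry

# Reader side: one KMS call recovers the KEK, which can then unwrap many DEKs.
recovered_kek = kms.unwrap(wrapped_kek)
recovered_dek = unwrap_key(recovered_kek, wrapped_dek)
```

In this sketch the KMS is contacted once per KEK rather than once per file, which is the property that keeps the master key service off the hot path.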
AES-GCM Stream and Additional Authenticated Data (AAD)
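For non-columnar files such as manifest lists and manifest files, Iceberg specifies the AES-GCM Stream format. This format divides a file into fixed-size blocks (the final block may be shorter) and encrypts each block individually, so a reader can decrypt and verify one block without processing the whole file.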
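The block arithmetic can be illustrated with a short sketch. The constants are assumptions for the example and should be checked against the AES-GCM Stream specification: an 8-byte header (a 4-byte magic string plus a 4-byte block length), a 12-byte nonce and 16-byte tag per block, and a 1 MiB plaintext block size. The function names are hypothetical, and empty files are not modeled.

```python
HEADER_LEN = 8             # assumed: 4-byte magic + 4-byte plaintext block length
NONCE_LEN = 12             # per-block GCM nonce
TAG_LEN = 16               # per-block GCM authentication tag
BLOCK_SIZE = 1024 * 1024   # assumed plaintext block size (1 MiB)


def block_count(plain_length: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed; the last one may be shorter than block_size."""
    return (plain_length + block_size - 1) // block_size


def encrypted_length(plain_length: int, block_size: int = BLOCK_SIZE) -> int:
    """Total stream size: header, plaintext bytes, and nonce + tag per block."""
    blocks = block_count(plain_length, block_size)
    return HEADER_LEN + plain_length + blocks * (NONCE_LEN + TAG_LEN)


def block_for_offset(plain_offset: int, block_size: int = BLOCK_SIZE) -> tuple:
    """Map a plaintext offset to (block ordinal, offset inside that block)."""
    return divmod(plain_offset, block_size)
```

Because every block has the same fixed overhead, a reader can translate a plaintext offset into a block ordinal with simple division and fetch only that block.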
The encryption process includes two security features:
- Authentication Tags: Every block contains a 16-byte authentication tag generated by the GCM cipher. This tag allows readers to verify that the block has not been altered or tampered with.
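- Additional Authenticated Data (AAD): Iceberg supplies AAD (incorporating parameters like file path and block offset) when encrypting each block, which binds the block to its file and position. The AAD is not stored in the ciphertext, but the authentication tag only verifies if the reader supplies the same value. This protects against block-swapping attacks, where an attacker attempts to replace an encrypted block with a different block from the same table.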
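The two protections above can be demonstrated with a small sketch. It assumes the third-party Python `cryptography` package, and it assumes a block AAD made of a file-level prefix followed by the block ordinal as a 4-byte little-endian integer; the helper names and the prefix value are hypothetical and chosen only for illustration.

```python
import os
import struct
from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12


def block_aad(aad_prefix: bytes, ordinal: int) -> bytes:
    """Assumed AAD layout: file-level prefix + 4-byte little-endian block ordinal."""
    return aad_prefix + struct.pack("<I", ordinal)


def encrypt_block(key: bytes, aad_prefix: bytes, ordinal: int, plain: bytes) -> bytes:
    """Returns nonce || ciphertext || 16-byte tag for one block."""
    nonce = os.urandom(NONCE_LEN)
    return nonce + AESGCM(key).encrypt(nonce, plain, block_aad(aad_prefix, ordinal))


def decrypt_block(key: bytes, aad_prefix: bytes, ordinal: int, block: bytes) -> bytes:
    """Raises InvalidTag if the block was modified or sits at the wrong position."""
    nonce, body = block[:NONCE_LEN], block[NONCE_LEN:]
    return AESGCM(key).decrypt(nonce, body, block_aad(aad_prefix, ordinal))


key = AESGCM.generate_key(bit_length=128)
prefix = b"hypothetical-file-id"  # illustrative value only
plain_blocks = [b"block zero", b"block one"]
blocks = [encrypt_block(key, prefix, i, p) for i, p in enumerate(plain_blocks)]

# Honest read: a block decrypts when presented at its own position.
first = decrypt_block(key, prefix, 0, blocks[0])

# Attack 1: a valid block from position 1 is placed at position 0.
try:
    decrypt_block(key, prefix, 0, blocks[1])
    swap_detected = False
except InvalidTag:
    swap_detected = True

# Attack 2: a single bit of the ciphertext is flipped.
tampered = bytearray(blocks[0])
tampered[NONCE_LEN] ^= 0x01
try:
    decrypt_block(key, prefix, 0, bytes(tampered))
    tamper_detected = False
except InvalidTag:
    tamper_detected = True
```

Both attacks fail for the same reason: the tag is computed over the ciphertext and the AAD together, so a block that is altered, or that is genuine but presented under a different file prefix or ordinal, no longer verifies.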
Structural Metadata Protection
A primary objective of V3 file encryption is protecting table metadata. Because manifest files store sensitive information like column statistics, file paths, and partition values, encrypting manifests prevents unauthorized parties from inferring table structure or reading statistics without the correct decryption keys.