Table Format Maintenance & Operations | Last updated: May 29, 2026

Iceberg Bin-Packing Compaction

A fast, non-sorting compaction strategy in Apache Iceberg that combines small data files into larger files to reduce read amplification.

binpack compaction, iceberg binpacking, rewrite data files binpack

Bin-packing is the default strategy Apache Iceberg uses to consolidate small data files. Tables fed by streaming pipelines or small write batches accumulate many small files in storage. The bin-packing algorithm combines these small files into larger ones (close to a target size such as 512 MB) without sorting the rows or changing their physical order.
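
As a rough illustration of how a bin-pack rewrite might be requested, the sketch below builds a Spark SQL call to Iceberg's rewrite_data_files procedure with the strategy set to binpack. The catalog name my_catalog, the table db.events, and the helper build_binpack_call are hypothetical, and the option values are examples rather than recommendations.

```python
def build_binpack_call(catalog, table, target_bytes=512 * 1024 * 1024, min_input_files=5):
    """Return a Spark SQL CALL statement for a bin-pack rewrite.

    The catalog and table names are inserted as-is (no escaping), so this
    helper is only suitable for trusted, illustrative input.
    """
    options = {
        "target-file-size-bytes": str(target_bytes),
        "min-input-files": str(min_input_files),
    }
    # Spark's map() function takes alternating key, value arguments.
    option_args = ", ".join(f"'{key}', '{value}'" for key, value in options.items())
    return (
        f"CALL {catalog}.system.rewrite_data_files("
        f"table => '{table}', "
        f"strategy => 'binpack', "
        f"options => map({option_args}))"
    )


# Hypothetical catalog and table names, used only for illustration.
statement = build_binpack_call("my_catalog", "db.events")
print(statement)
```

With a SparkSession that has an Iceberg catalog configured, the resulting string could be passed to spark.sql(statement). Building the statement separately keeps the example runnable without a cluster.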

Algorithm and Efficiency

The bin-packing algorithm groups files using a greedy approach:

  1. Sizing Boundaries: The system defines a minimum and maximum size threshold for files (e.g. min 100 MB, max 800 MB).
  2. Identifying Candidates: The optimizer scans manifests to find active files that fall below the minimum size.
  3. Greedy Packing: It packs these small candidate files into “bins” (representing target files) until the target size is reached.
  4. Consolidated Writes: The engine rewrites the contents of each bin into a single large file.
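
The four steps above can be sketched in a few lines of Python. This is a simplified next-fit packer written for illustration, not Iceberg's actual implementation: the DataFile class, the function names, and the 100 MB and 128 MB thresholds are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a data file entry read from a manifest."""
    path: str
    size_bytes: int


def select_candidates(files, min_bytes):
    """Step 2: keep only files smaller than the minimum size threshold."""
    return [f for f in files if f.size_bytes < min_bytes]


def pack_bins(candidates, target_bytes):
    """Step 3: greedily fill a bin, closing it before it would exceed the target.

    Files are taken in their existing order; nothing is sorted.
    """
    bins = []
    current, current_size = [], 0
    for f in candidates:
        if current and current_size + f.size_bytes > target_bytes:
            bins.append(current)
            current, current_size = [], 0
        current.append(f)
        current_size += f.size_bytes
    if current:
        bins.append(current)
    return bins


# Illustrative input: four small files and one file that is already large.
files = [
    DataFile("a.parquet", 40 * MB),
    DataFile("b.parquet", 60 * MB),
    DataFile("c.parquet", 30 * MB),
    DataFile("d.parquet", 90 * MB),
    DataFile("e.parquet", 900 * MB),
]

candidates = select_candidates(files, min_bytes=100 * MB)
bins = pack_bins(candidates, target_bytes=128 * MB)

# Step 4 would rewrite each bin into one output file; here we only report sizes.
bin_sizes_mb = [sum(f.size_bytes for f in b) // MB for b in bins]
print(bin_sizes_mb)  # [100, 120]
```

In this run the 900 MB file is left alone because it is above the minimum, and the four small files end up in two bins of 100 MB and 120 MB.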

When to Use Bin-Packing

Bin-packing is the fastest compaction strategy because it avoids the overhead of sorting rows.


← Back to Iceberg Knowledge Base