Table Format Maintenance & Operations Last updated: May 29, 2026

Iceberg Spark Procedure remove_orphan_files

A Spark SQL procedure in Apache Iceberg used to identify and delete files in table storage that are not referenced in any metadata snapshot.



The Iceberg Spark procedure remove_orphan_files is a storage maintenance operation run through Spark SQL. Failed transactions, aborted compaction jobs, or client crashes can leave data or metadata files in the table’s directory that were never committed to the catalog. These untracked files are called orphan files. The procedure lists the files under the table’s storage location, compares them against the files referenced by the table’s metadata, and deletes any that are unreferenced.
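To make the "referenced" side of that comparison concrete, the query below is a minimal sketch that lists data files still tracked by the table through Iceberg's all_data_files metadata table. It assumes the prod catalog and db.web_logs table used in this article's example. It is an illustration only, not the query the procedure itself runs: the procedure also treats manifests, manifest lists, and metadata files as referenced.

```sql
-- Illustration only: data files referenced by snapshots that the table
-- metadata still tracks. Catalog and table names (prod, db.web_logs)
-- are taken from this article's example and may differ in your setup.
SELECT DISTINCT file_path
FROM prod.db.web_logs.all_data_files;
```

Any file under the table location that is absent from the referenced set, and older than the age cutoff, is what the procedure would consider an orphan.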

Syntax and Parameters

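The procedure is invoked with Spark SQL CALL syntax. Its older_than argument acts as a safety filter: only files older than the given timestamp are candidates for deletion, so files belonging to active, in-progress write transactions are left alone. If older_than is omitted, it defaults to three days before the current time. For example: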

/* Remove orphan files that were created more than 3 days ago */
CALL prod.system.remove_orphan_files(
    table => 'db.web_logs',
    older_than => TIMESTAMP '2026-05-26 14:00:00.000'
);
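Before deleting anything, it can help to preview what would be removed. The call below is a minimal sketch, assuming the same prod catalog and db.web_logs table as the example above. It uses the procedure's dry_run argument, which returns the candidate orphan files without deleting them.

```sql
-- Preview only: list files that would be treated as orphans, delete nothing.
-- The timestamp is the same illustrative cutoff used in the example above.
CALL prod.system.remove_orphan_files(
    table => 'db.web_logs',
    older_than => TIMESTAMP '2026-05-26 14:00:00.000',
    dry_run => true
);
```

If the returned list looks right, running the same call without dry_run performs the actual deletion.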

Safety and Configuration

