
Deleting orphan files

Amazon Glue Data Catalog lets you remove orphan files from your Iceberg tables. An orphan file is a file that exists in Amazon S3 under the specified table location, is not tracked by the Iceberg table metadata, and is older than your configured age limit. Orphan files accumulate over time as a result of operations such as compaction, partition drops, and table rewrites, and they consume storage space unnecessarily.

The orphan file deletion optimizer in Amazon Glue scans the table metadata and the actual data files, identifies the orphan files, and deletes them to reclaim storage space.
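
The three conditions that make a file an orphan can be illustrated with a small, self-contained sketch. This is not how the Amazon Glue optimizer is implemented; it only models the selection rule described above. The `StoredObject` type, the `find_orphans` function, and the sample keys are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class StoredObject:
    """A file found under the table location (hypothetical type for this sketch)."""
    key: str
    last_modified: datetime


def find_orphans(objects, tracked_keys, table_prefix, max_age, now):
    """Return the keys that satisfy all three orphan conditions."""
    cutoff = now - max_age
    return [
        obj.key
        for obj in objects
        if obj.key.startswith(table_prefix)   # lives under the table location
        and obj.key not in tracked_keys       # not referenced by table metadata
        and obj.last_modified < cutoff        # older than the configured age limit
    ]


now = datetime(2024, 6, 10, tzinfo=timezone.utc)
objects = [
    StoredObject("warehouse/db/tbl/data/a.parquet", now - timedelta(days=30)),
    StoredObject("warehouse/db/tbl/data/b.parquet", now - timedelta(days=30)),
    StoredObject("warehouse/db/tbl/data/c.parquet", now - timedelta(days=1)),
]
tracked = {"warehouse/db/tbl/data/a.parquet"}  # files the metadata still references

orphans = find_orphans(objects, tracked, "warehouse/db/tbl/", timedelta(days=3), now)
print(orphans)  # ['warehouse/db/tbl/data/b.parquet']
```

In this example, `a.parquet` is kept because the metadata tracks it, and `c.parquet` is kept because it is newer than the age limit, even though it is untracked. The age limit protects files that an in-progress write has created but not yet committed.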

To start orphan file deletion, create an orphan file deletion table optimizer for the table in the Data Catalog.
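
As a rough sketch, the optimizer could be created programmatically by building a request for the Glue `CreateTableOptimizer` operation. The field names below reflect the API shape as exposed by boto3's `create_table_optimizer` to the best of my understanding, so verify them against the current API reference before use. The helper name `build_orphan_deletion_request`, the account ID, role ARN, database, table, and the 3-day retention value are all placeholder assumptions.

```python
def build_orphan_deletion_request(catalog_id, database, table, role_arn,
                                  retention_days, location=None):
    """Build keyword arguments for Glue's create_table_optimizer call.

    Hypothetical helper: it only assembles a dictionary and does not call AWS.
    """
    iceberg = {"orphanFileRetentionPeriodInDays": retention_days}
    if location is not None:
        # Optional sub-prefix that narrows the scope of evaluation.
        iceberg["location"] = location
    return {
        "CatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Type": "orphan_file_deletion",
        "TableOptimizerConfiguration": {
            "roleArn": role_arn,
            "enabled": True,
            "orphanFileDeletionConfiguration": {"icebergConfiguration": iceberg},
        },
    }


request = build_orphan_deletion_request(
    catalog_id="123456789012",                                   # placeholder account ID
    database="example_db",                                       # placeholder database
    table="example_table",                                       # placeholder table
    role_arn="arn:aws-cn:iam::123456789012:role/ExampleGlueOptimizerRole",  # placeholder
    retention_days=3,                                            # example value only
)

# Submitting the request requires boto3 and valid credentials:
#   import boto3
#   boto3.client("glue").create_table_optimizer(**request)
```

Keeping the request construction separate from the network call makes it possible to review or log the exact configuration before the optimizer is enabled.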

Important

By default, orphan file deletion evaluates files across your Amazon Glue table location. While you can configure a sub-prefix to limit the scope of evaluation, you must ensure your table location doesn't contain files from other data sources or tables. If your table location overlaps with other data sources, the service might identify and delete unrelated files as orphans.
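
One way to guard against overlapping locations is to compare table locations before you enable the optimizer. The following is a minimal sketch that assumes you have already collected the Amazon S3 locations of your tables. The function name `locations_overlap` and the bucket and prefix names are hypothetical.

```python
from urllib.parse import urlparse


def _normalize(location):
    """Split an s3:// URI into (bucket, prefix), with the prefix ending in '/'."""
    parsed = urlparse(location)
    prefix = parsed.path.strip("/")
    return parsed.netloc, (prefix + "/" if prefix else "")


def locations_overlap(a, b):
    """Return True if one location is the same as, or nested inside, the other."""
    bucket_a, prefix_a = _normalize(a)
    bucket_b, prefix_b = _normalize(b)
    if bucket_a != bucket_b:
        return False
    # Comparing slash-terminated prefixes avoids false matches such as
    # 'tables/orders' against 'tables/orders_v2'.
    return prefix_a.startswith(prefix_b) or prefix_b.startswith(prefix_a)


table = "s3://example-bucket/tables/orders"
print(locations_overlap(table, "s3://example-bucket/tables/orders/archive/"))  # True
print(locations_overlap(table, "s3://example-bucket/tables/orders_v2/"))       # False
```

If any pair of locations overlaps, files that belong to the other table or data source are not tracked by this table's metadata and could be treated as orphans.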