Optimizing Iceberg tables - Amazon Glue

Optimizing Iceberg tables

Amazon Glue supports several table optimization options to help manage and optimize Iceberg tables used by the Amazon analytical engines and ETL jobs. These optimizers help use storage efficiently, improve query performance, and support data management. Amazon Glue provides three types of table optimizers:

  • Compaction – Data compaction merges small data files to reduce storage usage and improve read performance. Data files are merged and rewritten to remove obsolete data and consolidate fragmented data into larger, more efficient files. You can configure compaction to run automatically or trigger it manually as needed.

  • Snapshot retention – Snapshots are timestamped versions of an Iceberg table. Snapshot retention configurations let you specify how long to retain snapshots and how many snapshots to retain. Configuring a snapshot retention optimizer can help manage storage overhead by removing older, unnecessary snapshots and their underlying files.

  • Orphan file deletion – Orphan files are files that are no longer referenced by the Iceberg table metadata. These files can accumulate over time, especially after operations like table deletions or failed ETL jobs. Enabling orphan file deletion allows Amazon Glue to periodically identify and remove these unnecessary files, freeing up storage.
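
The definition of an orphan file above amounts to a set difference between the files in storage and the files the table metadata references. The following is a conceptual sketch only, not a description of how Amazon Glue implements the optimizer. The function name `find_orphan_files`, the in-memory inputs, and the minimum-age cutoff are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def find_orphan_files(stored_files, referenced_files, now, min_age):
    """Return paths that exist in storage but are absent from table metadata.

    stored_files: dict mapping file path -> last-modified datetime (hypothetical input)
    referenced_files: set of paths the table metadata still references
    min_age: only files older than this are reported, so that files written
             by in-progress jobs are not flagged (an assumption of this sketch)
    """
    cutoff = now - min_age
    return sorted(
        path
        for path, modified in stored_files.items()
        if path not in referenced_files and modified < cutoff
    )


now = datetime(2024, 6, 10, tzinfo=timezone.utc)
stored = {
    "data/a.parquet": datetime(2024, 6, 1, tzinfo=timezone.utc),  # still referenced
    "data/b.parquet": datetime(2024, 6, 1, tzinfo=timezone.utc),  # old and unreferenced
    "data/c.parquet": datetime(2024, 6, 9, tzinfo=timezone.utc),  # unreferenced but recent
}
referenced = {"data/a.parquet"}

orphans = find_orphan_files(stored, referenced, now, timedelta(days=3))
print(orphans)  # ['data/b.parquet']
```

Only `data/b.parquet` is reported: it is unreferenced and older than the three-day cutoff, whereas `data/c.parquet` is unreferenced but too recent to be treated as an orphan in this sketch.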

You can enable or disable compaction, snapshot retention, and orphan file deletion for individual Iceberg tables in the Amazon Glue Data Catalog using the Amazon Glue console, Amazon CLI, or Amazon Glue API operations.
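
As an illustration of the API route, the following minimal sketch uses the AWS SDK for Python (boto3) and assumes the Glue client's `create_table_optimizer` operation with the optimizer types `compaction`, `retention`, and `orphan_file_deletion`. The helper names, account ID, database, table, and role ARN are placeholders invented for this example. Check the current API reference for the exact request shape before relying on it.

```python
ALLOWED_TYPES = {"compaction", "retention", "orphan_file_deletion"}


def build_optimizer_request(catalog_id, database, table, optimizer_type,
                            role_arn, enabled=True):
    """Build keyword arguments for a create_table_optimizer call.

    Hypothetical helper: it only assembles the request and makes no API call.
    """
    if optimizer_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported optimizer type: {optimizer_type}")
    return {
        "CatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Type": optimizer_type,
        "TableOptimizerConfiguration": {
            "roleArn": role_arn,  # IAM role the optimizer assumes
            "enabled": enabled,   # False creates the optimizer in a disabled state
        },
    }


def create_optimizer(request):
    """Send the request to the Glue API (requires boto3 and credentials)."""
    import boto3  # imported here so the builder above works without the SDK

    glue = boto3.client("glue")
    return glue.create_table_optimizer(**request)


# Placeholder values for illustration only.
request = build_optimizer_request(
    catalog_id="111122223333",
    database="example_db",
    table="example_iceberg_table",
    optimizer_type="compaction",
    role_arn="arn:aws:iam::111122223333:role/ExampleOptimizerRole",
)
# create_optimizer(request)  # uncomment to call the API with real values
```

Keeping request construction separate from the API call lets you validate the optimizer type and inspect the request before anything is sent.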