Optimizing Iceberg tables - Amazon Lake Formation

Optimizing Iceberg tables

Amazon S3 data lakes that use open table formats such as Apache Iceberg store their data as Amazon S3 objects. A table made up of thousands of small Amazon S3 objects carries more Iceberg metadata overhead and is slower to read. To improve read performance for Amazon analytics services such as Amazon Athena and Amazon EMR, and for Amazon Glue ETL jobs, the Amazon Glue Data Catalog provides managed compaction for Iceberg tables in the Data Catalog. Compaction is a process that combines small Amazon S3 objects into larger ones. You can use the Lake Formation console, the Amazon Glue console, the Amazon CLI, or the Amazon API to enable or disable compaction for individual Iceberg tables in the Data Catalog.
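As an illustration of the API route, the following is a minimal sketch that uses the table optimizer operations of the Glue client in the SDK for Python (boto3), `create_table_optimizer` and `update_table_optimizer`, with the optimizer type `compaction`. The helper names (`compaction_request`, `enable_compaction`, `disable_compaction`) and any account ID, database, table, and IAM role values you pass in are assumptions made for this example, not names from the service documentation. The role is assumed to already have the permissions that compaction requires.

```python
from typing import Any, Dict


def compaction_request(catalog_id: str, database: str, table: str,
                       role_arn: str, enabled: bool) -> Dict[str, Any]:
    """Build the keyword arguments shared by the table optimizer calls.

    All four identifiers are placeholders supplied by the caller.
    """
    return {
        "CatalogId": catalog_id,      # account ID that owns the Data Catalog
        "DatabaseName": database,
        "TableName": table,
        "Type": "compaction",         # optimizer type for managed compaction
        "TableOptimizerConfiguration": {
            "roleArn": role_arn,      # role assumed when compaction runs
            "enabled": enabled,
        },
    }


def enable_compaction(glue_client: Any, catalog_id: str, database: str,
                      table: str, role_arn: str) -> None:
    """Create an enabled compaction optimizer for one Iceberg table."""
    glue_client.create_table_optimizer(
        **compaction_request(catalog_id, database, table, role_arn, True)
    )


def disable_compaction(glue_client: Any, catalog_id: str, database: str,
                       table: str, role_arn: str) -> None:
    """Turn off an existing compaction optimizer without deleting it."""
    glue_client.update_table_optimizer(
        **compaction_request(catalog_id, database, table, role_arn, False)
    )
```

To try the sketch, create a client with `boto3.client("glue")` and pass it as `glue_client` together with your own catalog ID, database name, table name, and role ARN. Taking the client as a parameter keeps the helpers independent of credentials and Region configuration, and lets you exercise them with a stub before calling the service.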

For more information, see Compaction management.