Optimizing Iceberg tables
Amazon S3 data lakes that use open table formats such as Apache Iceberg store table data as Amazon S3 objects. When a data lake table accumulates thousands of small Amazon S3 objects, the metadata overhead of the Iceberg table grows and read performance degrades. To improve read performance for Amazon analytics services such as Amazon Athena and Amazon EMR, and for Amazon Glue ETL jobs, the Amazon Glue Data Catalog provides managed compaction for Iceberg tables in the Data Catalog. Compaction is a process that combines small Amazon S3 objects into larger ones. You can enable or disable compaction for individual Iceberg tables in the Data Catalog by using the Lake Formation console, the Amazon Glue console, the Amazon CLI, or the Amazon API.
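The following is a minimal Python sketch of the API route. It assumes the boto3 Glue client operations `create_table_optimizer` (first time) and `update_table_optimizer` (later toggles), with the optimizer type `compaction`. The helper names `build_compaction_request` and `set_compaction` are hypothetical. The account ID, database, table, and IAM role ARN are placeholders you supply. See Compaction management for the permissions that role needs.

```python
"""Sketch: enable or disable Data Catalog managed compaction for one Iceberg table.

Assumes the Glue CreateTableOptimizer / UpdateTableOptimizer operations as
exposed by a boto3 Glue client. Helper names here are hypothetical.
"""
from typing import Any, Dict


def build_compaction_request(
    catalog_id: str,
    database: str,
    table: str,
    role_arn: str,
    enabled: bool,
) -> Dict[str, Any]:
    """Return the keyword arguments shared by create/update_table_optimizer."""
    return {
        "CatalogId": catalog_id,      # account ID that owns the Data Catalog (placeholder)
        "DatabaseName": database,     # Data Catalog database containing the Iceberg table
        "TableName": table,           # the Iceberg table to compact
        "Type": "compaction",         # optimizer type for managed compaction
        "TableOptimizerConfiguration": {
            "roleArn": role_arn,      # IAM role compaction runs as (placeholder)
            "enabled": enabled,       # True enables compaction, False disables it
        },
    }


def set_compaction(
    glue_client: Any,
    catalog_id: str,
    database: str,
    table: str,
    role_arn: str,
    enabled: bool,
    create: bool = False,
) -> Dict[str, Any]:
    """Create the compaction optimizer (create=True) or update an existing one.

    `glue_client` is expected to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    request = build_compaction_request(catalog_id, database, table, role_arn, enabled)
    if create:
        # First-time setup: the optimizer does not exist yet for this table.
        return glue_client.create_table_optimizer(**request)
    # Optimizer already exists: flip `enabled` or change the role.
    return glue_client.update_table_optimizer(**request)
```

For example, with `boto3` installed and credentials configured, `set_compaction(boto3.client("glue"), "111122223333", "sales_db", "orders", "arn:aws:iam::111122223333:role/CompactionRole", enabled=True, create=True)` would turn compaction on for the placeholder table `orders`. A later call with `enabled=False` and the default `create=False` would turn it off. The client is passed in rather than created inside the helper, so the request shape can be checked without AWS access.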
For more information, see Compaction management.