Supported formats and limitations for managed data compaction - Amazon Glue

Supported formats and limitations for managed data compaction

To improve read performance for Amazon analytics services such as Amazon Athena, Amazon EMR, and Amazon Glue ETL jobs, Amazon Glue Data Catalog provides managed compaction for Iceberg tables in the Data Catalog. Compaction combines small Amazon S3 objects into larger objects.

Data compaction supports a variety of data types and compression formats for reading and writing data, including reading data from encrypted tables.

Data compaction supports:

  • File types – Parquet

  • Data types – Boolean, Integer, Long, Float, Double, String, Decimal, Date, Time, Timestamp, UUID, Binary

  • Compression – zstd, gzip, snappy, uncompressed

  • Encryption – Data compaction only supports default Amazon S3 encryption (SSE-S3) and server-side KMS encryption (SSE-KMS).

  • Bin pack compaction

  • Schema evolution

  • Tables with a target file size (a property in the Iceberg configuration) within the inclusive range of 128 MB to 512 MB.

  • Regions

    • Asia Pacific (Tokyo)

    • Asia Pacific (Seoul)

    • Asia Pacific (Mumbai)

    • Asia Pacific (Singapore)

    • Europe (Ireland)

    • Europe (London)

    • Europe (Frankfurt)

    • US East (N. Virginia)

    • US East (Ohio)

    • US West (N. California)

    • South America (São Paulo)

  • You can run compaction from the account where Data Catalog resides when the Amazon S3 bucket that stores the underlying data is in another account. To do this, the compaction role requires access to the Amazon S3 bucket.
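
The format-related items in the list above can be encoded as a quick pre-check before enabling compaction. The following is a minimal sketch and not part of any Amazon Glue API: `TableSummary` and `compaction_blockers` are hypothetical names, the table description is filled in by hand, and "MB" is assumed to mean 2**20 bytes. Type names follow the spelling used in the list above, with any parameters such as `(10,2)` ignored.

```python
from dataclasses import dataclass, field
from typing import List

MIB = 1024 * 1024  # assumption: "MB" in the list above means 2**20 bytes

# Values copied from the "Data compaction supports" list.
SUPPORTED_FILE_TYPES = {"parquet"}
SUPPORTED_COMPRESSION = {"zstd", "gzip", "snappy", "uncompressed"}
SUPPORTED_DATA_TYPES = {
    "boolean", "integer", "long", "float", "double", "string",
    "decimal", "date", "time", "timestamp", "uuid", "binary",
}
MIN_TARGET_FILE_SIZE = 128 * MIB  # inclusive lower bound
MAX_TARGET_FILE_SIZE = 512 * MIB  # inclusive upper bound


@dataclass
class TableSummary:
    """Hypothetical, hand-filled description of an Iceberg table."""
    file_type: str
    compression: str
    target_file_size_bytes: int
    column_types: List[str] = field(default_factory=list)


def compaction_blockers(table: TableSummary) -> List[str]:
    """Return one message per setting that falls outside the supported list."""
    problems = []
    if table.file_type.lower() not in SUPPORTED_FILE_TYPES:
        problems.append(f"unsupported file type: {table.file_type}")
    if table.compression.lower() not in SUPPORTED_COMPRESSION:
        problems.append(f"unsupported compression: {table.compression}")
    size = table.target_file_size_bytes
    if not MIN_TARGET_FILE_SIZE <= size <= MAX_TARGET_FILE_SIZE:
        problems.append(f"target file size out of range: {size} bytes")
    for raw in table.column_types:
        # Drop type parameters, e.g. "decimal(10,2)" becomes "decimal".
        base = raw.lower().split("(")[0].strip()
        if base not in SUPPORTED_DATA_TYPES:
            problems.append(f"unsupported data type: {raw}")
    return problems


if __name__ == "__main__":
    ok = TableSummary("parquet", "zstd", 512 * MIB, ["long", "decimal(10,2)"])
    bad = TableSummary("orc", "lz4", 64 * MIB, ["fixed(16)"])
    print(compaction_blockers(ok))
    print(compaction_blockers(bad))
```

An empty list means none of the four checked settings rules the table out. The sketch does not cover encryption, Region, or cross-account access, which depend on account and bucket configuration rather than on table properties.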

Data compaction currently doesn’t support:

  • File types – Avro, ORC

  • Data types – Fixed

  • Compression – brotli, lz4

  • Compaction of files while the partition spec evolves.

  • Regular sorting or z-order sorting

  • Merge or delete files – The compaction process skips data files that have delete files associated with them.

  • Compaction on cross-account tables – You can't run compaction on cross-account tables.

  • Compaction on cross-Region tables – You can't run compaction on cross-Region tables.

  • Enabling compaction on resource links

  • VPC endpoints for Amazon S3 buckets

  • DynamoDB lock manager – When using data compaction, no other data loading jobs should use lock-impl as