Limitations - Amazon Glue
Limitations

Consider the following limitations before you use data lake frameworks with Amazon Glue.

  • The following Amazon Glue GlueContext methods for DynamicFrame don't support reading and writing data lake framework tables. Use the GlueContext methods for DataFrame or the Spark DataFrame API instead.

    • The following GlueContext methods for DynamicFrame are not supported with Lake Formation permission control:

      • create_dynamic_frame.from_catalog

      • write_dynamic_frame.from_catalog

      • getDynamicFrame

      • writeDynamicFrame

    • The following GlueContext methods for DataFrame are supported with Lake Formation permission control:

      • create_data_frame.from_catalog

      • write_data_frame.from_catalog

      • getDataFrame

      • writeDataFrame

  • Grouping small files is not supported.

  • Job bookmarks are not supported.

  • Apache Hudi 0.10.1 for Amazon Glue 3.0 doesn't support Hudi Merge on Read (MoR) tables.

  • ALTER TABLE … RENAME TO is not available for Apache Iceberg 0.13.1 for Amazon Glue 3.0.
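
As a minimal sketch of the first limitation above, the following helpers call the GlueContext DataFrame methods rather than the DynamicFrame ones. It assumes that glue_context is a GlueContext already created inside the job; the helper names read_table and write_table, and any database or table names passed to them, are hypothetical.

```python
# Sketch only: `glue_context` is assumed to be an awsglue.context.GlueContext
# created by the job. Helper names and arguments are illustrative.

def read_table(glue_context, database, table_name):
    """Read a data lake framework table as a Spark DataFrame."""
    return glue_context.create_data_frame.from_catalog(
        database=database,
        table_name=table_name,
    )


def write_table(glue_context, frame, database, table_name, additional_options=None):
    """Write a Spark DataFrame to a Data Catalog table."""
    return glue_context.write_data_frame.from_catalog(
        frame=frame,
        database=database,
        table_name=table_name,
        # Format-specific settings (for example, Hudi write options) go here.
        additional_options=additional_options or {},
    )
```

Because the helpers take the context as a parameter, they can be exercised with a stub object outside of a Glue job. No transformation_ctx is passed, since job bookmarks are not supported for these tables.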

Limitations for data lake format tables managed by Lake Formation permissions

The data lake formats integrate with Amazon Glue ETL through Lake Formation permissions. Creating a DynamicFrame with create_dynamic_frame is not supported; create a DataFrame with the GlueContext DataFrame methods instead.

Note

The integration with Amazon Glue ETL via Lake Formation permissions for Apache Hudi, Apache Iceberg, and Delta Lake is supported only in Amazon Glue version 4.0.

Apache Iceberg has the best integration with Amazon Glue ETL via Lake Formation permissions. It supports almost all operations and includes SQL support.

Hudi supports most basic operations, with the exception of administrative operations. This is because the supported operations are generally performed by writing DataFrames, with options specified through additional_options. Because Spark SQL is not supported, you need to use Amazon Glue APIs to create DataFrames for your operations.
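
As a rough illustration of specifying a Hudi write through additional_options, the following sketch builds the options dictionary. The option keys are standard Hudi write configurations; the helper name hudi_write_options, the default upsert operation, and the example table and column names are assumptions made for illustration.

```python
def hudi_write_options(table_name, record_key, precombine_field, operation="upsert"):
    """Build additional_options for writing a DataFrame to a Hudi table."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": operation,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
    }


# Hypothetical table and column names, for illustration only.
options = hudi_write_options("orders", record_key="order_id", precombine_field="updated_at")

# In a Glue 4.0 job, the dictionary would then be passed to the DataFrame writer:
# glue_context.write_data_frame.from_catalog(
#     frame=df, database="<database>", table_name="orders",
#     additional_options=options)
```

Keeping the options in a plain dictionary means the write behavior is chosen without Spark SQL, which matches the constraint described above.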

Delta Lake supports only reading, appending, and overwriting table data. Other tasks, such as updates, require the use of Delta Lake's own libraries.

The following features are not available for Iceberg tables managed by Lake Formation permissions:

  • Compaction using Amazon Glue ETL

  • Spark SQL support via Amazon Glue ETL

The following are limitations of Hudi tables managed by Lake Formation permissions:

  • Removal of orphaned files

The following are limitations of Delta Lake tables managed by Lake Formation permissions:

  • All features other than inserting and reading from Delta Lake tables.
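
The per-format limitations listed above can be encoded as a small pre-flight check that a job runs before it attempts an operation. This is a sketch, not part of any Amazon Glue API: the function name is_supported and the operation labels (such as "compaction" or "append") are hypothetical names chosen for this example.

```python
# Features listed above as unavailable under Lake Formation permissions.
# The operation labels are hypothetical names chosen for this sketch.
UNSUPPORTED = {
    "iceberg": {"compaction", "spark_sql"},
    "hudi": {"remove_orphan_files"},
}

# Delta Lake is described as an allow-list: only these operations are available.
DELTA_ALLOWED = {"read", "append", "overwrite"}


def is_supported(table_format, operation):
    """Return True if the operation is not ruled out by the limitations above."""
    fmt = table_format.lower()
    if fmt == "delta":
        return operation in DELTA_ALLOWED
    if fmt not in UNSUPPORTED:
        raise ValueError(f"unknown table format: {table_format}")
    return operation not in UNSUPPORTED[fmt]
```

For example, is_supported("delta", "update") returns False, while is_supported("hudi", "upsert") returns True because upserts are not among the listed Hudi limitations.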