Using Data Lake frameworks with Amazon Glue Studio

Overview

Open-source data lake frameworks simplify incremental data processing for files stored in data lakes built on Amazon S3. Amazon Glue 3.0 and later versions support the following open-source data lake storage frameworks:

  • Apache Hudi

  • Linux Foundation Delta Lake

  • Apache Iceberg

Amazon Glue provides native support for these frameworks so that you can read and write data that you store in Amazon S3 in a transactionally consistent manner. There's no need to install a separate connector or complete extra configuration steps in order to use these frameworks in Amazon Glue jobs.
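As a rough illustration of what native support means for job setup: to our understanding of the linked ETL documentation, a framework is enabled with the `--datalake-formats` job parameter (values `hudi`, `delta`, or `iceberg`) rather than with a connector. The sketch below builds that parameter as a plain dictionary. The helper name `datalake_job_arguments` is hypothetical, and the comma-separated form for several frameworks is an assumption to verify against the ETL documentation for your Glue version.

```python
# Framework identifiers as we understand the --datalake-formats parameter
# to accept them (assumption; confirm in the Glue ETL documentation).
SUPPORTED_FORMATS = ("hudi", "delta", "iceberg")


def datalake_job_arguments(*formats):
    """Build a job-arguments dict that enables the given frameworks.

    Hypothetical helper for illustration; it only assembles a dictionary
    and does not call any Amazon Glue API.
    """
    if not formats:
        raise ValueError("at least one framework is required")
    unknown = [f for f in formats if f not in SUPPORTED_FORMATS]
    if unknown:
        raise ValueError(f"unsupported framework(s): {unknown}")
    # Drop duplicates while keeping the caller's order.
    unique = list(dict.fromkeys(formats))
    return {"--datalake-formats": ",".join(unique)}


if __name__ == "__main__":
    print(datalake_job_arguments("delta"))
```

The resulting dictionary could be supplied wherever job parameters are set, for example in the job details of a Spark Script Editor job. Some frameworks may need additional Spark settings; those are covered in the ETL documentation and are not shown here.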

Data lake frameworks can be used as a source or a target in Amazon Glue Studio through Spark Script Editor jobs. For more information on using Apache Hudi, Delta Lake, and Apache Iceberg, see Using data lake frameworks with Amazon Glue ETL jobs.
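To make the source and target roles concrete, here is a minimal sketch of the kind of function a Spark Script Editor job might contain, using Delta Lake as the example framework. It relies only on the standard Spark DataFrame reader and writer calls. The function name `copy_delta_table` and the S3 paths are hypothetical, and the sketch assumes the job has already been configured so that the Delta Lake libraries are available to Spark.

```python
def copy_delta_table(spark, source_path, target_path):
    """Read a Delta table (source) and append its rows to another (target).

    `spark` is expected to be a SparkSession. In a Glue script it is
    typically obtained like this (shown as comments so the sketch stays
    importable outside Glue):

        from pyspark.context import SparkContext
        from awsglue.context import GlueContext
        spark = GlueContext(SparkContext.getOrCreate()).spark_session
    """
    # Source: load the Delta table stored at source_path.
    df = spark.read.format("delta").load(source_path)
    # Target: append the same rows to the Delta table at target_path.
    df.write.format("delta").mode("append").save(target_path)
    return df


# Example call with placeholder paths (not real buckets):
# copy_delta_table(spark, "s3://example-bucket/in/", "s3://example-bucket/out/")
```

Passing the session in as an argument keeps the function free of Glue-specific imports, so the same logic can be exercised outside a Glue job. Hudi and Iceberg use different reader and writer options, which the ETL documentation describes.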