Using data lake frameworks with Amazon Glue ETL jobs

Open-source data lake frameworks simplify incremental data processing for files that you store in data lakes built on Amazon S3. Amazon Glue 3.0 and later supports the following open-source data lake frameworks:

  • Apache Hudi

  • Linux Foundation Delta Lake

  • Apache Iceberg

We provide native support for these frameworks so that you can read and write data that you store in Amazon S3 in a transactionally consistent manner. There's no need to install a separate connector or complete extra configuration steps in order to use these frameworks in Amazon Glue ETL jobs.
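As a rough illustration of how little setup is involved, the sketch below builds the single job argument that selects a framework. It assumes the `--datalake-formats` job parameter with the values `hudi`, `delta`, and `iceberg`, as described in the framework-specific Amazon Glue documentation. The helper name `datalake_job_arguments` is hypothetical and is not part of any Amazon Glue API.

```python
from typing import Dict

# Parameter values for the three frameworks listed above (assumed to match
# the values accepted by the --datalake-formats job parameter).
SUPPORTED_FRAMEWORKS = {"hudi", "delta", "iceberg"}


def datalake_job_arguments(framework: str) -> Dict[str, str]:
    """Return the job-argument entry that selects one data lake framework.

    Hypothetical helper: the returned dict is meant to be merged into the
    default arguments that you supply when you define the job.
    """
    name = framework.strip().lower()
    if name not in SUPPORTED_FRAMEWORKS:
        raise ValueError(f"unsupported data lake framework: {framework!r}")
    return {"--datalake-formats": name}


if __name__ == "__main__":
    print(datalake_job_arguments("Hudi"))
```

The helper only normalizes and validates the framework name; it does not call any Amazon Web Services API. Some frameworks may also expect Spark settings passed through `--conf`, which this sketch leaves out.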

When you manage datasets through the Amazon Glue Data Catalog, you can use Amazon Glue methods to read and write data lake tables with Spark DataFrames. You can also read and write Amazon S3 data using the Spark DataFrame API.
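The Spark DataFrame API path might look like the following minimal sketch, which uses Delta Lake as the example. It assumes that Delta Lake support is enabled for the job and that `spark` is an existing `SparkSession`. The function names and the S3 path are hypothetical. Hudi and Iceberg use their own format names and typically need additional options or catalog settings that are not shown here.

```python
def read_delta_table(spark, table_path):
    """Load the Delta Lake table stored at table_path into a Spark DataFrame."""
    return spark.read.format("delta").load(table_path)


def append_to_delta_table(df, table_path):
    """Append the rows of a Spark DataFrame to the Delta Lake table at table_path."""
    df.write.format("delta").mode("append").save(table_path)


# Example usage inside a job script (hypothetical bucket and prefix):
#   orders = read_delta_table(spark, "s3://example-bucket/delta/orders/")
#   append_to_delta_table(orders.limit(10), "s3://example-bucket/delta/orders_sample/")
```

In an Amazon Glue job script, `spark` would typically be the session exposed by the job's `GlueContext` as `spark_session`.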

The video that accompanies this page introduces the basics of how Apache Hudi, Apache Iceberg, and Delta Lake work, and shows how to insert, update, and delete data in your data lake with each framework.