Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Using Data Lake frameworks with Amazon Glue Studio

Overview

Open-source data lake frameworks simplify incremental data processing for files stored in data lakes built on Amazon S3. Amazon Glue 3.0 and later supports the following open-source data lake storage frameworks:

  • Apache Hudi

  • Linux Foundation Delta Lake

  • Apache Iceberg

As of Amazon Glue 4.0, Amazon Glue provides native support for these frameworks so that you can read and write data that you store in Amazon S3 in a transactionally consistent manner. There's no need to install a separate connector or complete extra configuration steps in order to use these frameworks in Amazon Glue jobs.

Data lake frameworks can be used as a source or a target within Amazon Glue Studio through Spark Script Editor jobs. For more information on using Apache Hudi, Apache Iceberg, and Delta Lake, see Using data lake frameworks with Amazon Glue ETL jobs.
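As a rough illustration of what a Spark Script Editor job might be started with, the following Python sketch builds job parameters for one framework. It assumes the job opts in through the `--datalake-formats` job parameter and passes Spark settings through a single chained `--conf` value; the helper name `datalake_job_arguments` is hypothetical, and the Spark settings are assumptions based on each framework's usual Spark setup, so check them against the ETL jobs guide referenced above before relying on them.

```python
# Sketch: job parameters that opt a Glue 4.0 Spark job in to one framework.
# Assumption: the Spark settings below follow each framework's usual Spark
# setup. Catalog and warehouse settings (for example for Iceberg) are omitted.
SPARK_CONF = {
    "hudi": [
        ("spark.serializer", "org.apache.spark.serializer.KryoSerializer"),
        ("spark.sql.hive.convertMetastoreParquet", "false"),
    ],
    "delta": [
        ("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension"),
        ("spark.sql.catalog.spark_catalog",
         "org.apache.spark.sql.delta.catalog.DeltaCatalog"),
    ],
    "iceberg": [
        ("spark.sql.extensions",
         "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"),
    ],
}


def datalake_job_arguments(framework):
    """Return a dict of job parameters for the given framework name.

    Hypothetical helper: it only assembles strings and calls no service.
    """
    if framework not in SPARK_CONF:
        raise ValueError(f"unsupported framework: {framework!r}")
    pairs = [f"{key}={value}" for key, value in SPARK_CONF[framework]]
    # Assumption: a job takes one --conf parameter, so additional settings
    # are chained inside its value with a literal " --conf " separator.
    return {
        "--datalake-formats": framework,
        "--conf": " --conf ".join(pairs),
    }


hudi_args = datalake_job_arguments("hudi")
```

The resulting dictionary has the shape of a job's default arguments, so it could be supplied when the job is defined. Building it from a table keeps the per-framework differences in one place.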

Creating open table formats from an Amazon Glue Streaming source

Amazon Glue streaming ETL jobs continuously consume data from streaming sources, clean and transform the data in-flight, and make it available for analysis in seconds.

Amazon offers a broad selection of services to support your needs. A database replication service such as Amazon Database Migration Service can replicate the data from your source systems to Amazon S3, which commonly hosts the storage layer of the data lake. Although it's straightforward to apply updates on a relational database management system (RDBMS) that backs an online source application, it's difficult to apply the same change data capture (CDC) updates to a data lake, where data is stored as files rather than as individually updatable rows. Open-source data management frameworks simplify incremental data processing and data pipeline development, and are a good option for solving this problem.
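To make the CDC problem concrete, here is a minimal Python sketch of the merge that has to happen when change records arrive. It is not how Hudi, Delta Lake, or Iceberg are implemented; those frameworks perform the equivalent upsert and delete against files in Amazon S3 with transactional commits. The record layout, including the `op` field with the values `I`, `U`, and `D`, and the function name `apply_cdc` are assumptions made for illustration.

```python
def apply_cdc(table, changes, key="id"):
    """Apply ordered change records to a table keyed by primary key.

    `table` maps primary key -> row dict. Each change is a row dict with an
    extra "op" field (assumed layout): "I" insert, "U" update, "D" delete.
    Returns a new dict; the input table is left untouched.
    """
    result = dict(table)
    for change in changes:
        row = {k: v for k, v in change.items() if k != "op"}
        pk = row[key]
        if change["op"] in ("I", "U"):
            result[pk] = row  # upsert: the latest version of the row wins
        elif change["op"] == "D":
            result.pop(pk, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown op: {change['op']!r}")
    return result


# Hypothetical sample data.
table = {
    1: {"id": 1, "city": "Beijing"},
    2: {"id": 2, "city": "Ningxia"},
}
changes = [
    {"op": "U", "id": 1, "city": "Shanghai"},
    {"op": "D", "id": 2},
    {"op": "I", "id": 3, "city": "Shenzhen"},
]
merged = apply_cdc(table, changes)
```

On a plain file-based data lake, each of these row-level changes would mean locating and rewriting the files that contain the affected rows, which is the work the open table formats take on.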

For more information, see: