Support for transactional table types Additional resources

Using Amazon Lake Formation with Amazon Glue

Data engineers and DevOps professionals use Amazon Glue with Extract, Transform and Load (ETL) with Apache Spark to perform transformations on their data sets in Amazon S3 and load the transformed data into data lakes and data warehouses for analytics, machine learning, and application development. With different teams accessing the same data set in Amazon S3, it is imperative to grant and restrict permissions based on their roles.

Amazon Lake Formation is built on Amazon Glue, and the services interact in the following ways:

Lake Formation and Amazon Glue share the same Data Catalog.
The following Lake Formation console features invoke the Amazon Glue console:
- Jobs – For more information, see Adding Jobs in the Amazon Glue Developer Guide.
- Crawlers – For more information, see Cataloging Tables with a Crawler in the Amazon Glue Developer Guide.
The workflows generated when you use a Lake Formation blueprint are Amazon Glue workflows. You can view and manage these workflows in both the Lake Formation console and the Amazon Glue console.
Machine learning transforms are provided with Lake Formation and are built on Amazon Glue API operations. You create and manage machine learning transforms on the Amazon Glue console. For more information, see Machine Learning Transforms in the Amazon Glue Developer Guide.

You can use the Lake Formation fine-grained access control to manage your existing Data Catalog resources and Amazon S3 data locations.

Note

Amazon Glue 5.0 or higher supports fine-grained access controls on Iceberg and Hive tables that are backed by S3. This capability lets you configure table, row, column, and cell level access controls for read queries within your Amazon Glue for Apache Spark jobs.

Support for transactional table types

Applying Lake Formation permissions allows you to secure your transactional data in your Amazon S3 based data lakes. The table below lists transactional table formats supported in Amazon Glue and the Lake Formation permissions. Lake Formation enforces these permissions for Amazon Glue operations.

Supported table formats
Table format	Description and allowed operations	Lake Formation permissions supported in Amazon Glue
Apache Hudi	A open table format used to simplify incremental data processing and data pipeline development. For examples, see Using the Hudi framework in Amazon Glue.	Table-level permissions are available for Hudi tables. For more information, see Limitations.
Apache Iceberg	An open table format that manages large collections of files as tables. For examples, see Using the Iceberg framework in Amazon Glue.	Amazon Glue version 5.0 and higher lets you configure table, row, column, and cell level access controls for read queries within your Amazon Glue for Apache Spark jobs for Iceberg tables. For more information, see Limitations.
Linux Foundation Delta Lake	Delta Lake is an open-source project that helps implement modern data lake architectures commonly built on Amazon S3 or Hadoop Distributed File System (HDFS). For examples, see Using the Delta Lake framework in Amazon Glue.	Table-level permissions are available for Delta Lake tables. For more information, see Limitations.

Additional resources

Blog posts and repositories

Use the Amazon Glue connector to read and write Apache Iceberg tables with ACID transactions and perform time travel
Writing to Apache Hudi tables using Amazon Glue custom connector
Amazon repository of Cloudformation template and pyspark code sample to analyze streaming data using Amazon Glue, Apache Hudi, and Amazon S3.

Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Amazon Redshift Spectrum

Amazon EMR