Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Using Amazon Lake Formation with Amazon EMR

Amazon EMR is a flexible, Amazon-managed cluster platform on which you can run custom code on supported big data frameworks such as Hadoop MapReduce, Spark, Hive, and Presto. Organizations also use Amazon EMR to run both batch and stream data processing applications across highly distributed clusters. Using Apache Spark on Amazon EMR, you can run your data transformations and custom code on databases and tables whose permissions are managed by Lake Formation.

There are three options for deploying Amazon EMR:

  • Amazon EMR on EC2

  • Amazon EMR Serverless

  • Amazon EMR on EKS

For more information, see Integrate Amazon EMR with Lake Formation or Using EMR Serverless with Amazon Lake Formation for fine-grained access control.

Support for transactional table formats

Amazon EMR releases 6.15.0 and higher support Lake Formation table-, row-, column-, and cell-level access control permissions on Apache Hudi, Apache Iceberg, and Delta Lake tables when you read and write data with Spark SQL.

For limitations, see Considerations for Amazon EMR with Lake Formation.
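Because the requirement above is a release floor (6.15.0 and higher), comparing release labels as plain strings is an easy mistake: `"emr-6.9.0" > "emr-6.15.0"` is true lexically. The minimal Python sketch below assumes release labels of the form `emr-X.Y.Z` and compares numeric tuples instead. The function names are hypothetical helpers, not part of any AWS SDK, and the check covers only the release floor, not the limitations listed under Considerations.

```python
import re
from typing import Tuple

# Release floor stated above for Lake Formation access control on
# Hudi, Iceberg, and Delta Lake tables with Spark SQL.
MIN_RELEASE = (6, 15, 0)


def parse_release_label(label: str) -> Tuple[int, int, int]:
    """Parse a label such as 'emr-6.15.0' into the tuple (6, 15, 0).

    Assumes the 'emr-X.Y.Z' label format; anything else raises ValueError.
    """
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)", label.strip())
    if match is None:
        raise ValueError(f"unrecognized release label: {label!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def supports_lf_open_table_formats(label: str) -> bool:
    """Return True if the release is 6.15.0 or higher (numeric, not lexical)."""
    return parse_release_label(label) >= MIN_RELEASE


for label in ["emr-6.9.0", "emr-6.14.0", "emr-6.15.0", "emr-7.1.0"]:
    print(label, supports_lf_open_table_formats(label))
```

Tuple comparison orders major, then minor, then patch, so `emr-6.9.0` correctly sorts below `emr-6.15.0`.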

Supported table formats

| Table format | Description and allowed operations | Lake Formation permissions supported in Amazon EMR |
| --- | --- | --- |
| Apache Hudi | An open table format used to simplify incremental data processing and data pipeline development. For a list of supported operations, see Apache Hudi and Lake Formation. | Amazon EMR supports table, row, column, and cell-level access control with Apache Hudi. |
| Apache Iceberg | An open table format that manages large collections of files as tables. For a list of supported operations, see Apache Iceberg and Lake Formation. | Amazon EMR supports table, row, column, and cell-level access control with Apache Iceberg. |
| Linux Foundation Delta Lake | An open-source project that helps implement modern data lake architectures commonly built on Amazon S3 or the Hadoop Distributed File System (HDFS). For a list of supported operations, see Delta Lake and Lake Formation. | Amazon EMR supports table, row, column, and cell-level access control with Delta Lake tables. |
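To make the difference between row-, column-, and cell-level control concrete, the Python sketch below is a toy model of the semantics only, not how Amazon EMR or Lake Formation enforce permissions. The `ToyCellFilter` class is hypothetical: its Python predicate stands in for Lake Formation's SQL-like row filter expression, and it models only a column inclusion list (Lake Formation data filters can also exclude columns).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


@dataclass
class ToyCellFilter:
    """Hypothetical model of a Lake Formation data filter (not a real API)."""

    # Row-level control: only rows for which this returns True are visible.
    row_predicate: Callable[[Row], bool]
    # Column-level control: the visible columns; None means every column.
    included_columns: Optional[List[str]] = None

    def apply(self, rows: List[Row]) -> List[Row]:
        """Return only the cells that pass both the row and the column checks."""
        visible: List[Row] = []
        for row in rows:
            if not self.row_predicate(row):
                continue  # the whole row is hidden
            columns = row.keys() if self.included_columns is None else self.included_columns
            visible.append({col: row[col] for col in columns if col in row})
        return visible


orders = [
    {"order_id": 1, "region": "EU", "amount": 120, "email": "a@example.com"},
    {"order_id": 2, "region": "US", "amount": 75, "email": "b@example.com"},
    {"order_id": 3, "region": "EU", "amount": 40, "email": "c@example.com"},
]

# A principal who may see EU orders only, and never the email column.
eu_analyst = ToyCellFilter(
    row_predicate=lambda row: row["region"] == "EU",
    included_columns=["order_id", "region", "amount"],
)
result = eu_analyst.apply(orders)
print(result)
# [{'order_id': 1, 'region': 'EU', 'amount': 120}, {'order_id': 3, 'region': 'EU', 'amount': 40}]
```

A predicate that always returns True gives column-level control only, and leaving `included_columns` as None gives row-level control only; combining both is what restricts access to individual cells.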

Additional resources