Apache Iceberg and Lake Formation - Amazon EMR
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Apache Iceberg and Lake Formation

Amazon EMR releases 6.15.0 and higher include support for fine-grained access control based on Amazon Lake Formation with Apache Iceberg when you read and write data with Spark SQL. Amazon EMR supports table, row, column, and cell-level access control with Apache Iceberg. With this feature, you can run snapshot queries on copy-on-write tables to query the latest snapshot of the table at a given commit or compaction instant.

If you want to use Iceberg format, set the following configurations. Replace DB_LOCATION with the Amazon S3 path where your Iceberg tables are located, and replace the Region and account ID placeholders with your own values.

spark-sql \ --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,com.amazonaws.emr.recordserver.connector.spark.sql.RecordServerSQLExtension --conf spark.sql.catalog.iceberg_catalog=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.iceberg_catalog.warehouse=s3://DB_LOCATION --conf spark.sql.catalog.iceberg_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog --conf spark.sql.catalog.iceberg_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO --conf spark.sql.catalog.iceberg_catalog.glue.account-id=ACCOUNT_ID --conf spark.sql.catalog.iceberg_catalog.glue.id=ACCOUNT_ID --conf spark.sql.catalog.iceberg_catalog.client.assume-role.region=Amazon_REGION --conf spark.sql.secureCatalog=iceberg_catalog

If you want Lake Formation to use record server to manage your Spark catalog, set spark.sql.catalog.<managed_catalog_name>.lf.managed to true.

You should also be careful NOT to pass the following assume role settings:

--conf spark.sql.catalog.my_catalog.client.assume-role.region --conf spark.sql.catalog.my_catalog.client.assume-role.arn --conf spark.sql.catalog.my_catalog.client.assume-role.tags.LakeFormationAuthorizedCaller

The following support matrix lists some core features of Apache Iceberg with Lake Formation:

Copy on Write Merge on Read

Snapshot queries - Spark SQL

Read-optimized queries - Spark SQL

Incremental queries

Time travel queries

Metadata tables

DML INSERT commands

DDL commands

Spark datasource queries

Spark datasource writes