
Delta Lake and Lake Formation on Amazon EMR

Amazon EMR releases 6.15.0 and higher include support for fine-grained access control based on Amazon Lake Formation with Delta Lake when you read and write data with Spark SQL. Amazon EMR supports table, row, column, and cell-level access control with Delta Lake. With this feature, you can run snapshot queries on copy-on-write tables to query the latest snapshot of the table at a given commit or compaction instant.

To use Delta Lake with Lake Formation, run the following command.


spark-sql \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension,com.amazonaws.emr.recordserver.connector.spark.sql.RecordServerSQLExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
--conf spark.sql.catalog.spark_catalog.lf.managed=true

If you want Lake Formation to use the record server to manage your Spark catalog, set spark.sql.catalog.<managed_catalog_name>.lf.managed to true. The preceding command does this for the default catalog, spark_catalog.
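If you launch spark-sql from a script, it can help to assemble these settings in one place. The following is a minimal Python sketch, not an official tool: the helper name `lf_delta_spark_sql_args` is hypothetical, and it reuses only the property names shown in the command above. Passing a catalog name other than `spark_catalog` changes only the `lf.managed` key, following the `spark.sql.catalog.<managed_catalog_name>.lf.managed` pattern; any other settings a non-default catalog might need are not covered here.

```python
import shlex
from typing import List

# Extensions from the documented command: Delta Lake plus the EMR record server connector.
DELTA_EXTENSION = "io.delta.sql.DeltaSparkSessionExtension"
RECORD_SERVER_EXTENSION = (
    "com.amazonaws.emr.recordserver.connector.spark.sql.RecordServerSQLExtension"
)
DELTA_CATALOG = "org.apache.spark.sql.delta.catalog.DeltaCatalog"


def lf_delta_spark_sql_args(managed_catalog_name: str = "spark_catalog") -> List[str]:
    """Build the spark-sql argument list for Delta Lake with Lake Formation.

    Hypothetical helper: it mirrors the documented --conf settings and only
    varies the catalog name used in the lf.managed property.
    """
    confs = {
        "spark.sql.extensions": f"{DELTA_EXTENSION},{RECORD_SERVER_EXTENSION}",
        "spark.sql.catalog.spark_catalog": DELTA_CATALOG,
        # Marks the named catalog as managed by Lake Formation through the record server.
        f"spark.sql.catalog.{managed_catalog_name}.lf.managed": "true",
    }
    args = ["spark-sql"]
    for key, value in confs.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print a shell-safe command line for the default catalog.
    print(shlex.join(lf_delta_spark_sql_args()))
```

With the default argument, the list matches the command above; passing it to `subprocess.run()` on a node where spark-sql is on the PATH would start the same interactive session.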

The following support matrix lists some core features of Delta Lake with Lake Formation:

Feature	Copy on Write	Merge on Read
Snapshot queries - Spark SQL	✓	✓
Read-optimized queries - Spark SQL	✓	✓
Incremental queries	Not supported	Not supported
Time travel queries	Not supported	Not supported
Metadata tables	✓	✓
DML `INSERT` commands	✓	✓
DDL commands	Not supported	Not supported
Spark datasource queries
Spark datasource writes

Creating a Delta Lake table in Amazon Glue Data Catalog

Amazon EMR with Lake Formation doesn't support DDL commands or Delta table creation. Instead, create your tables in the Amazon Glue Data Catalog by following these steps.

Use the following example to create a Delta table from a Spark SQL session that is started without the Lake Formation settings. Make sure that your S3 location exists before you run it.


spark-sql \
--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"

> CREATE DATABASE IF NOT EXISTS <DATABASE_NAME> LOCATION 's3://<S3_LOCATION>/transactionaldata/native-delta/<DATABASE_NAME>/';
> CREATE TABLE <DATABASE_NAME>.<TABLE_NAME> (x INT, y STRING, z STRING) USING delta;
> INSERT INTO <DATABASE_NAME>.<TABLE_NAME> VALUES (1, 'a1', 'b1');
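The statements above can also be run non-interactively with the `-e` option of spark-sql, which runs the given SQL and exits; this is convenient in setup scripts. The following Python sketch is one way to build that command, under these assumptions: the helper `delta_table_setup_command` is hypothetical, the three-column schema is the example schema from above, and the table name is qualified with the database so that the table is created in `<DATABASE_NAME>` rather than in the session's current database. The session deliberately omits the record server extension and the `lf.managed` setting, because table creation isn't supported through Lake Formation.

```python
import shlex
from typing import List


def delta_table_setup_command(database: str, table: str, s3_location: str) -> List[str]:
    """Build a non-interactive spark-sql command that creates and seeds a Delta table.

    Hypothetical helper. Only the Delta Lake settings are used (no record server
    extension, no lf.managed). `s3_location` is the bucket/prefix placeholder from
    the example, without the s3:// scheme. Names are inserted as-is, so use
    simple identifiers.
    """
    db_location = f"s3://{s3_location}/transactionaldata/native-delta/{database}/"
    qualified = f"{database}.{table}"
    statements = [
        f"CREATE DATABASE IF NOT EXISTS {database} LOCATION '{db_location}'",
        f"CREATE TABLE {qualified} (x INT, y STRING, z STRING) USING delta",
        f"INSERT INTO {qualified} VALUES (1, 'a1', 'b1')",
    ]
    return [
        "spark-sql",
        "--conf", "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension",
        "--conf", "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog",
        # -e runs the semicolon-separated statements and then exits.
        "-e", "; ".join(statements) + ";",
    ]


if __name__ == "__main__":
    # Example values only; replace them with your own database, table, and bucket.
    print(shlex.join(delta_table_setup_command("my_db", "my_delta_table", "amzn-s3-demo-bucket")))
```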

To see the details of your table, open the Amazon Glue console at https://console.amazonaws.cn/glue/.
In the left navigation pane, expand Data Catalog, choose Tables, and then choose the table that you created. Under Schema, you should see that Amazon Glue stores the schema of a Delta table created with Spark as a single column, col, with the data type array<string>, rather than as the table's actual columns.
To define column-level and cell-level filters in Lake Formation, remove the col column from the schema, and then add the columns from your table schema. In this example, add the columns x, y, and z.
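If you prefer to script this schema change instead of editing it in the console, the Amazon Glue GetTable and UpdateTable APIs can make the same change. The following Python sketch assumes boto3 and credentials that are allowed to update the table; the helper names are hypothetical. UpdateTable takes a `TableInput`, which doesn't accept read-only fields such as `CreateTime` or `DatabaseName`, so the sketch copies only a subset of the fields that GetTable returns. It removes the placeholder `col` column and appends the columns you pass in, using Glue (Hive-style) type names such as `int` and `string`.

```python
import copy
from typing import Any, Dict, List, Tuple

# Fields copied from the GetTable result into TableInput; all are valid TableInput keys.
# Read-only fields such as CreateTime, UpdateTime, and DatabaseName are left out.
_TABLE_INPUT_KEYS = (
    "Name", "Description", "Owner", "Retention", "StorageDescriptor",
    "PartitionKeys", "TableType", "Parameters",
)


def replace_placeholder_column(
    table: Dict[str, Any], columns: List[Tuple[str, str]], placeholder: str = "col"
) -> Dict[str, Any]:
    """Return a TableInput whose schema swaps the placeholder column for `columns`.

    `table` is the "Table" element of a GetTable response; it is not modified.
    """
    table_input = {k: copy.deepcopy(table[k]) for k in _TABLE_INPUT_KEYS if k in table}
    sd = table_input.setdefault("StorageDescriptor", {})
    # Drop the placeholder column and keep any other existing columns.
    kept = [c for c in sd.get("Columns", []) if c.get("Name") != placeholder]
    sd["Columns"] = kept + [{"Name": name, "Type": glue_type} for name, glue_type in columns]
    return table_input


def update_delta_table_schema(database: str, table_name: str, columns: List[Tuple[str, str]]) -> None:
    """Fetch the table from the Glue Data Catalog and write back the new schema."""
    import boto3  # imported here so the pure helper above works without boto3 installed

    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    glue.update_table(DatabaseName=database, TableInput=replace_placeholder_column(table, columns))
```

For the example table, a call might look like `update_delta_table_schema("<DATABASE_NAME>", "<TABLE_NAME>", [("x", "int"), ("y", "string"), ("z", "string")])`, with the boto3 client configured for your Region.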

