Considerations and limitations
- Delta Lake is supported with Amazon EMR releases 6.9.0 and higher. You can use Apache Spark 3.x on Amazon EMR clusters with Delta tables.
- We recommend that you use the s3 URI scheme rather than s3a for S3 location paths, for the best performance, security, and reliability. For more information, see Working with storage and file systems.
- With Amazon EMR 7.0, Delta Universal Format (UniForm) and convert-to-Iceberg statements aren't supported.
- With Amazon EMR 6.9 and 6.10, when you store Delta Lake table data in Amazon S3, column data becomes NULL after a column rename operation. This issue is resolved in Amazon EMR 6.11. For more information about the experimental column rename operation, see Column rename operation in the Delta Lake User Guide.
- When you use Delta Lake on Amazon EMR with Amazon Glue in the Beijing (cn-north-1) Region, set hive.s3.endpoint to https://s3.cn-north-1.amazonaws.com.cn.
- If you create a database in the Amazon Glue Data Catalog outside of Apache Spark, the database could have an empty LOCATION field. Because Spark doesn't allow databases to be created with an empty location property, you'll get the following error if you use Spark in Amazon EMR to create a Delta table in a Glue database that has an empty LOCATION property:

  IllegalArgumentException: Can not create a Path from an empty string

  To resolve this issue, create the database in the Data Catalog with a valid, non-empty path for the LOCATION field. For steps to implement this solution, see Illegal argument exception when creating a table in the Amazon Athena User Guide.
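The release-specific caveats in the list above can be captured in a small pre-flight check. The following Python sketch is an illustration only: the function name delta_release_notes is hypothetical, it assumes release labels of the form emr-X.Y.Z, and it encodes only the limitations listed in this section, so it is not a complete compatibility matrix for any release.

```python
import re
from typing import List

# Matches release labels such as "emr-6.10.0" (assumed label format).
_LABEL_PATTERN = re.compile(r"emr-(\d+)\.(\d+)\.(\d+)")


def delta_release_notes(release_label: str) -> List[str]:
    """Return the Delta Lake caveats from this section that apply to a release label.

    Hypothetical helper: it restates only the limitations listed in this section.
    """
    match = _LABEL_PATTERN.fullmatch(release_label.strip())
    if match is None:
        raise ValueError(f"unrecognized release label: {release_label!r}")
    major, minor, patch = (int(part) for part in match.groups())

    # Delta Lake requires Amazon EMR 6.9.0 or higher.
    if (major, minor, patch) < (6, 9, 0):
        return ["Delta Lake is not supported; use Amazon EMR 6.9.0 or higher"]

    notes: List[str] = []
    # 6.9 and 6.10: column data reads as NULL after a rename for tables stored in Amazon S3.
    if (major, minor) in {(6, 9), (6, 10)}:
        notes.append("column data becomes NULL after a column rename (fixed in 6.11)")
    # 7.0: UniForm and convert-to-Iceberg statements are not available.
    if (major, minor) == (7, 0):
        notes.append("UniForm and convert-to-Iceberg statements are not supported")
    return notes


if __name__ == "__main__":
    for label in ["emr-6.8.0", "emr-6.10.0", "emr-6.11.0", "emr-7.0.0"]:
        print(label, delta_release_notes(label))
```

An empty list means none of the caveats in this section apply to that label; it does not mean the release has no other limitations.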
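To follow the s3 URI scheme recommendation when table paths come from other tools or configuration files, you could normalize them before handing them to Spark. This is a minimal sketch; to_s3_scheme is a hypothetical helper name, and it only rewrites the s3a:// prefix, leaving every other URI unchanged.

```python
_S3A_PREFIX = "s3a://"


def to_s3_scheme(uri: str) -> str:
    """Rewrite an s3a:// location to the recommended s3:// scheme.

    Hypothetical helper: URIs with any other scheme are returned unchanged.
    """
    # URI schemes are case-insensitive, so compare the prefix in lowercase.
    if uri[: len(_S3A_PREFIX)].lower() == _S3A_PREFIX:
        return "s3://" + uri[len(_S3A_PREFIX):]
    return uri


if __name__ == "__main__":
    for path in ["s3a://amzn-s3-demo-bucket/tables/events", "s3://amzn-s3-demo-bucket/tables/users"]:
        print(to_s3_scheme(path))
```

In a PySpark job you might then write a Delta table with df.write.format("delta").save(to_s3_scheme(table_path)), where df, table_path, and the bucket name are placeholders.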
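One way to create the Glue database outside of Spark with a non-empty location is the create_database operation of a boto3 Glue client, whose DatabaseInput structure accepts a LocationUri field. The sketch below assumes boto3; the helper names, the database name delta_db, and the bucket amzn-s3-demo-bucket are hypothetical placeholders. The client is passed in as a parameter so that the validation logic can be exercised without AWS credentials.

```python
from typing import Any, Dict


def build_database_input(name: str, location_uri: str) -> Dict[str, Any]:
    """Build a Glue DatabaseInput whose LocationUri is never empty (hypothetical helper)."""
    if not name:
        raise ValueError("database name must not be empty")
    # An empty location is what later triggers
    # "IllegalArgumentException: Can not create a Path from an empty string" in Spark.
    if not location_uri or not location_uri.strip():
        raise ValueError("LocationUri must be a non-empty path, such as s3://bucket/prefix/")
    return {"Name": name, "LocationUri": location_uri.strip()}


def create_database_with_location(glue_client: Any, name: str, location_uri: str) -> Dict[str, Any]:
    """Create the database through a boto3 Glue client supplied by the caller."""
    database_input = build_database_input(name, location_uri)
    glue_client.create_database(DatabaseInput=database_input)
    return database_input


# Example call (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   create_database_with_location(
#       boto3.client("glue"), "delta_db", "s3://amzn-s3-demo-bucket/delta_db/"
#   )
```

Validating the location before calling the API keeps the failure at database creation time, rather than surfacing later as the Spark error when the first Delta table is created.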