Use a Delta Lake cluster with Spark and Amazon Glue

To use the Amazon Glue Catalog as the metastore for Delta Lake tables, create a cluster with the following steps. For information on specifying the Delta Lake classification using the Amazon Command Line Interface, see Supply a configuration using the Amazon Command Line Interface when you create a cluster or Supply a configuration using the Java SDK when you create a cluster.

Create a Delta Lake cluster

Create a file, delta_configurations.json, with the following content. The file name must match the one passed to --configurations in the create-cluster command.



[
  {
    "Classification": "delta-defaults",
    "Properties": {
      "delta.enabled": "true"
    }
  },
  {
    "Classification": "spark-hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    }
  }
]
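A malformed configuration file causes the create-cluster call to fail, so it can help to generate the file programmatically and confirm that it parses. The following is a minimal sketch, not part of the official procedure. It assumes the file is named delta_configurations.json and is written to the current working directory.

```python
import json
from pathlib import Path

# The same two classifications shown in the file content above.
configurations = [
    {
        "Classification": "delta-defaults",
        "Properties": {"delta.enabled": "true"},
    },
    {
        "Classification": "spark-hive-site",
        "Properties": {
            "hive.metastore.client.factory.class":
                "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
        },
    },
]

# Assumption: this file name matches the one passed to --configurations.
config_path = Path("delta_configurations.json")
config_path.write_text(json.dumps(configurations, indent=2))

# Reload the file to confirm it is valid JSON with the expected classifications.
loaded = json.loads(config_path.read_text())
classifications = [entry["Classification"] for entry in loaded]
print(classifications)
```

All property values are written as strings ("true", not true), matching the file content shown above.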

Create a cluster with the following configuration, replacing the example Amazon S3 bucket path and the subnet ID with your own.



aws emr create-cluster \
    --release-label emr-6.9.0 \
    --applications Name=Spark \
    --configurations file://delta_configurations.json \
    --region us-east-1 \
    --name My_Spark_Delta_Cluster \
    --log-uri s3://amzn-s3-demo-bucket/ \
    --instance-type m5.xlarge \
    --instance-count 2 \
    --service-role EMR_DefaultRole_V2 \
    --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,SubnetId=subnet-1234567890abcdef0
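If you create clusters from Python rather than the command line, the same settings can be expressed as a request for the boto3 EMR client's run_job_flow operation. The sketch below only builds and prints the request; it does not call AWS. The mapping of the CLI options to request fields is an assumption based on the command above, and the bucket path and subnet ID are the same placeholders to replace with your own.

```python
import json

# Same classifications as in the configuration file.
configurations = [
    {"Classification": "delta-defaults",
     "Properties": {"delta.enabled": "true"}},
    {"Classification": "spark-hive-site",
     "Properties": {
         "hive.metastore.client.factory.class":
             "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}},
]

# Assumed equivalent of the CLI options; placeholder values must be replaced.
request = {
    "Name": "My_Spark_Delta_Cluster",
    "ReleaseLabel": "emr-6.9.0",
    "LogUri": "s3://amzn-s3-demo-bucket/",
    "Applications": [{"Name": "Spark"}],
    "Configurations": configurations,
    "ServiceRole": "EMR_DefaultRole_V2",
    "JobFlowRole": "EMR_EC2_DefaultRole",  # the EC2 instance profile
    "Instances": {
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 2,
        "Ec2SubnetId": "subnet-1234567890abcdef0",
        # Keep the cluster running after creation, as create-cluster does by default.
        "KeepJobFlowAliveWhenNoSteps": True,
    },
}

print(json.dumps(request, indent=2))

# To submit the request (requires boto3 and valid credentials):
#   import boto3
#   boto3.client("emr", region_name="us-east-1").run_job_flow(**request)
```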

