Use a Delta Lake cluster with Spark and Amazon Glue

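To use the Amazon Glue Catalog as the metastore for Delta Lake tables, create a cluster with the following steps. For information on specifying the Delta Lake classification with the Amazon Command Line Interface, see Supply a configuration using the Amazon Command Line Interface when you create a cluster or Supply a configuration with the Java SDK when you create a cluster.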

Create a Delta Lake cluster

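Create a file named delta_configurations.json (the name that the create-cluster command below passes to --configurations) with the following content: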



[
  {"Classification": "delta-defaults",
   "Properties": {"delta.enabled": "true"}},
  {"Classification": "spark-hive-site",
   "Properties": {"hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}}
]

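If you prefer to generate the file rather than type it by hand, the following Python sketch writes the same two classifications and reads them back to confirm the result is valid JSON. The function name write_configurations is hypothetical and used only for illustration; the classification names and property values are copied from the content above.

```python
import json
from pathlib import Path

# Classifications copied verbatim from the JSON content above.
DELTA_CONFIGURATIONS = [
    {
        "Classification": "delta-defaults",
        "Properties": {"delta.enabled": "true"},
    },
    {
        "Classification": "spark-hive-site",
        "Properties": {
            "hive.metastore.client.factory.class": (
                "com.amazonaws.glue.catalog.metastore."
                "AWSGlueDataCatalogHiveClientFactory"
            )
        },
    },
]


def write_configurations(path):
    """Write the classifications to `path` and return the parsed round trip."""
    target = Path(path)
    target.write_text(json.dumps(DELTA_CONFIGURATIONS, indent=2), encoding="utf-8")
    # Re-read the file so a malformed write fails here, not at cluster creation.
    return json.loads(target.read_text(encoding="utf-8"))
```

Calling write_configurations("delta_configurations.json") in the directory where you run the Amazon CLI produces the file that the --configurations option reads.
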
Create the cluster with the following configuration, replacing the example Amazon S3 bucket path and the subnet ID with your own values.



aws emr create-cluster \
    --release-label emr-6.9.0 \
    --applications Name=Spark \
    --configurations file://delta_configurations.json \
    --region us-east-1 \
    --name My_Spark_Delta_Cluster \
    --log-uri s3://amzn-s3-demo-bucket/ \
    --instance-type m5.xlarge \
    --instance-count 2 \
    --service-role EMR_DefaultRole_V2 \
    --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,SubnetId=subnet-1234567890abcdef0
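
The same request can also be made from an SDK. The following is a minimal Python sketch, not part of the original procedure, that maps the flags above onto the keyword arguments of the boto3 EMR client's run_job_flow call. It assumes that boto3 is installed, that credentials are configured, and that the configuration file exists at the path you pass in. The helper names build_run_job_flow_args and create_cluster are hypothetical, and the KeepJobFlowAliveWhenNoSteps setting is an assumption meant to keep the cluster running after creation.

```python
import json


def build_run_job_flow_args(configurations, log_uri, subnet_id):
    """Map the create-cluster flags above onto run_job_flow keyword arguments."""
    return {
        "Name": "My_Spark_Delta_Cluster",
        "ReleaseLabel": "emr-6.9.0",
        "Applications": [{"Name": "Spark"}],
        "Configurations": configurations,
        "LogUri": log_uri,
        "ServiceRole": "EMR_DefaultRole_V2",
        # Corresponds to InstanceProfile in --ec2-attributes.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "Instances": {
            # --instance-type applies to both the primary and core nodes.
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 2,
            "Ec2SubnetId": subnet_id,
            # Assumption: keep the cluster alive when it has no steps.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
    }


def create_cluster(config_path, log_uri, subnet_id, region="us-east-1"):
    """Read the classification file and start the cluster; returns the cluster ID."""
    import boto3  # assumption: boto3 is installed and credentials are configured

    with open(config_path, encoding="utf-8") as config_file:
        configurations = json.load(config_file)

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(
        **build_run_job_flow_args(configurations, log_uri, subnet_id)
    )
    return response["JobFlowId"]
```

For example, create_cluster("delta_configurations.json", "s3://amzn-s3-demo-bucket/", "subnet-1234567890abcdef0") would submit the request with the example values shown in the command. Replace the bucket path and subnet ID with your own before running it.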
