Using Delta Lake with Amazon EMR on EKS
To use Delta Lake with Amazon EMR on EKS applications
- When you start a job run to submit a Spark job, include the Delta Lake JAR files in the application configuration:
--job-driver '{"sparkSubmitJobDriver" : { "sparkSubmitParameters" : "--jars local:///usr/share/aws/delta/lib/delta-core.jar,local:///usr/share/aws/delta/lib/delta-storage.jar,local:///usr/share/aws/delta/lib/delta-storage-s3-dynamodb.jar"}}'
- Include the additional Delta Lake configuration, and use Amazon Glue Data Catalog as the metastore:
--configuration-overrides '{ "applicationConfiguration": [ { "classification" : "spark-defaults", "properties" : { "spark.sql.extensions" : "io.delta.sql.DeltaSparkSessionExtension", "spark.sql.catalog.spark_catalog":"org.apache.spark.sql.delta.catalog.DeltaCatalog", "spark.hadoop.hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory" } }]}'
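The two flags above are fragments of a single `aws emr-containers start-job-run` call. As a minimal sketch of how they might fit together, the script below assembles both JSON documents and prints the resulting command. The virtual cluster ID, execution role ARN, release label, job name, and `entryPoint` script location are placeholders that do not come from this page and must be replaced with your own values. The `DRY_RUN` switch is likewise a convenience of this sketch, not a feature of the AWS CLI.

```shell
#!/bin/sh
# Sketch only: every value in angle brackets is a placeholder (assumption).
VIRTUAL_CLUSTER_ID="${VIRTUAL_CLUSTER_ID:-<virtual-cluster-id>}"
EXECUTION_ROLE_ARN="${EXECUTION_ROLE_ARN:-<execution-role-arn>}"
RELEASE_LABEL="${RELEASE_LABEL:-<emr-release-label>}"
# Hypothetical location of the Spark application to run.
ENTRY_POINT="${ENTRY_POINT:-s3://<bucket>/<path>/delta_job.py}"

# Delta Lake JARs, using the paths given in the first step.
DELTA_LIB="local:///usr/share/aws/delta/lib"
DELTA_JARS="${DELTA_LIB}/delta-core.jar,${DELTA_LIB}/delta-storage.jar,${DELTA_LIB}/delta-storage-s3-dynamodb.jar"

# Job driver: the entry point plus the --jars parameter from the first step.
JOB_DRIVER=$(cat <<EOF
{
  "sparkSubmitJobDriver": {
    "entryPoint": "${ENTRY_POINT}",
    "sparkSubmitParameters": "--jars ${DELTA_JARS}"
  }
}
EOF
)

# Configuration overrides: the spark-defaults properties from the second step.
CONFIG_OVERRIDES=$(cat <<'EOF'
{
  "applicationConfiguration": [
    {
      "classification": "spark-defaults",
      "properties": {
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        "spark.hadoop.hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
      }
    }
  ]
}
EOF
)

# Build the full command as the positional parameters.
set -- aws emr-containers start-job-run \
  --virtual-cluster-id "$VIRTUAL_CLUSTER_ID" \
  --name "delta-lake-job" \
  --execution-role-arn "$EXECUTION_ROLE_ARN" \
  --release-label "$RELEASE_LABEL" \
  --job-driver "$JOB_DRIVER" \
  --configuration-overrides "$CONFIG_OVERRIDES"

# Print the command by default; set DRY_RUN=0 to actually submit the job.
if [ "${DRY_RUN:-1}" = "1" ]; then
  printf '%s\n' "$@"
else
  "$@"
fi
```

Run the script as-is to inspect the arguments. After replacing the placeholders, run it with `DRY_RUN=0` to submit the job. Keeping the JSON in heredocs rather than on one line makes it easier to see which property belongs to which step.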