使用任务提交者分类 - Amazon EMR
Amazon Web Services 文档中描述的 Amazon Web Services 服务或功能可能因区域而异。要查看适用于中国区域的差异,请参阅 中国的 Amazon Web Services 服务入门 (PDF)

使用任务提交者分类

概览

Amazon EMR on EKS StartJobRun 请求会创建一个 job-submitter Pod(也称为 job-runner Pod)来生成 Spark 驱动程序。您可以使用 emr-job-submitter 分类为作业提交者容器组配置节点选择器,也可以为作业提交者容器组的日志记录容器设置映像、CPU 和内存。

可在 emr-job-submitter 分类下进行以下设置:

jobsubmitter.node.selector.[labelKey]

添加到任务提交者 Pod 的节点选择器中,以键 labelKey 和值作为配置的配置值。例如,您可以将 jobsubmitter.node.selector.identifier 设置为 myIdentifier,任务提交者 Pod 会有一个键标识符值为 myIdentifier 的节点选择器。这可用于指定作业提交者容器组可以放在哪些节点上。要添加多个节点选择器键,可使用此前缀设置多个配置。

jobsubmitter.logging.image

设置用于作业提交者容器组上的日志记录容器的自定义映像。

jobsubmitter.logging.request.cores

为作业提交者容器组上的日志记录容器设置 CPU 数量自定义值(以 CPU 单元为单位)。默认情况下,这设置为 100m

jobsubmitter.logging.request.memory

为作业提交者容器组上的日志记录容器设置内存量自定义值(以字节为单位)。默认情况下,这设置为 200Mi。mebibyte 是一种类似于 megabyte 的度量单位。

我们建议将作业提交者容器组放在按需型实例上。如果作业提交者容器组运行的实例遭遇竞价型实例中断,则在竞价型实例上放置作业提交者容器组可能会导致作业失败。您也可以将任务提交者 Pod 放在单个可用区中,也可以使用应用于节点的任何 Kubernetes 标签

任务提交者分类示例

StartJobRun 为任务提交者 Pod 请求按需型节点放置

cat >spark-python-in-s3-nodeselector-job-submitter.json << EOF { "name": "spark-python-in-s3-nodeselector", "virtualClusterId": "virtual-cluster-id", "executionRoleArn": "execution-role-arn", "releaseLabel": "emr-6.11.0-latest", "jobDriver": { "sparkSubmitJobDriver": { "entryPoint": "s3://S3-prefix/trip-count.py", "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6" } }, "configurationOverrides": { "applicationConfiguration": [ { "classification": "spark-defaults", "properties": { "spark.dynamicAllocation.enabled":"false" } }, { "classification": "emr-job-submitter", "properties": { "jobsubmitter.node.selector.eks.amazonaws.com/capacityType": "ON_DEMAND" } } ], "monitoringConfiguration": { "cloudWatchMonitoringConfiguration": { "logGroupName": "/emr-containers/jobs", "logStreamNamePrefix": "demo" }, "s3MonitoringConfiguration": { "logUri": "s3://joblogs" } } } } EOF aws emr-containers start-job-run --cli-input-json file:///spark-python-in-s3-nodeselector-job-submitter.json

StartJobRun 为任务提交者 Pod 请求单可用区型节点放置

cat >spark-python-in-s3-nodeselector-job-submitter-az.json << EOF { "name": "spark-python-in-s3-nodeselector", "virtualClusterId": "virtual-cluster-id", "executionRoleArn": "execution-role-arn", "releaseLabel": "emr-6.11.0-latest", "jobDriver": { "sparkSubmitJobDriver": { "entryPoint": "s3://S3-prefix/trip-count.py", "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6" } }, "configurationOverrides": { "applicationConfiguration": [ { "classification": "spark-defaults", "properties": { "spark.dynamicAllocation.enabled":"false" } }, { "classification": "emr-job-submitter", "properties": { "jobsubmitter.node.selector.topology.kubernetes.io/zone": "Availability Zone" } } ], "monitoringConfiguration": { "cloudWatchMonitoringConfiguration": { "logGroupName": "/emr-containers/jobs", "logStreamNamePrefix": "demo" }, "s3MonitoringConfiguration": { "logUri": "s3://joblogs" } } } } EOF aws emr-containers start-job-run --cli-input-json file:///spark-python-in-s3-nodeselector-job-submitter-az.json

StartJobRun 为任务提交者 Pod 请求单可用区和 Amazon EC2 实例类型放置

{ "name": "spark-python-in-s3-nodeselector", "virtualClusterId": "virtual-cluster-id", "executionRoleArn": "execution-role-arn", "releaseLabel": "emr-6.11.0-latest", "jobDriver": { "sparkSubmitJobDriver": { "entryPoint": "s3://S3-prefix/trip-count.py", "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.kubernetes.pyspark.pythonVersion=3 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6 --conf spark.sql.shuffle.partitions=1000" } }, "configurationOverrides": { "applicationConfiguration": [ { "classification": "spark-defaults", "properties": { "spark.dynamicAllocation.enabled":"false", } }, { "classification": "emr-job-submitter", "properties": { "jobsubmitter.node.selector.topology.kubernetes.io/zone": "Availability Zone", "jobsubmitter.node.selector.node.kubernetes.io/instance-type":"m5.4xlarge" } } ], "monitoringConfiguration": { "cloudWatchMonitoringConfiguration": { "logGroupName": "/emr-containers/jobs", "logStreamNamePrefix": "demo" }, "s3MonitoringConfiguration": { "logUri": "s3://joblogs" } } } }

具有自定义日志记录容器映像、CPU 和内存的 StartJobRun 请求

{ "name": "spark-python", "virtualClusterId": "virtual-cluster-id", "executionRoleArn": "execution-role-arn", "releaseLabel": "emr-6.11.0-latest", "jobDriver": { "sparkSubmitJobDriver": { "entryPoint": "s3://S3-prefix/trip-count.py" } }, "configurationOverrides": { "applicationConfiguration": [ { "classification": "emr-job-submitter", "properties": { "jobsubmitter.logging.image": "YOUR_ECR_IMAGE_URL", "jobsubmitter.logging.request.memory": "200Mi", "jobsubmitter.logging.request.cores": "0.5" } } ], "monitoringConfiguration": { "cloudWatchMonitoringConfiguration": { "logGroupName": "/emr-containers/jobs", "logStreamNamePrefix": "demo" }, "s3MonitoringConfiguration": { "logUri": "s3://joblogs" } } } }

具有自定义容器映像的 StartJobRun 请求

cat >spark-python-in-s3-nodeselector-custom-container-image-job-submitter.json << EOF { "name": "spark-python-in-s3-nodeselector", "virtualClusterId": "virtual-cluster-id", "executionRoleArn": "execution-role-arn", "releaseLabel": "emr-6.11.0-latest", "jobDriver": { "sparkSubmitJobDriver": { "entryPoint": "s3://S3-prefix/trip-count.py", "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6" } }, "configurationOverrides": { "applicationConfiguration": [ { "classification": "spark-defaults", "properties": { "spark.dynamicAllocation.enabled":"false" } }, { "classification": "emr-job-submitter", "properties": { "jobsubmitter.container.image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/emr6.11_custom_repo", "jobsubmitter.container.image.pullPolicy": "kubernetes pull policy" } } ], "monitoringConfiguration": { "cloudWatchMonitoringConfiguration": { "logGroupName": "/emr-containers/jobs", "logStreamNamePrefix": "demo" }, "s3MonitoringConfiguration": { "logUri": "s3://joblogs" } } } } EOF aws emr-containers start-job-run --cli-input-json file:///spark-python-in-s3-nodeselector-custom-container-image-job-submitter.json