Using job submitter classification
Overview
The Amazon EMR on EKS StartJobRun
request creates a job submitter pod (also known as the job-runner pod) to spawn the Spark driver. You can configure node selectors for your job submitter pod with the
emr-job-submitter
classification.
The following setting is available under the emr-job-submitter
classification:
jobsubmitter.node.selector.[
labelKey
]-
Adds to the node selector of the job submitter pod, with key
and the value as the configuration value for the configuration. For example, you can setlabelKey
jobsubmitter.node.selector.identifier
tomyIdentifier
and the job submitter pod will have a node selector with a key identifier value ofmyIdentifier
. To add multiple node selector keys, set multiple configurations with this prefix.
As a best practice, we recommend that job submitter pods have node placement on On Demand Instances and not on Spot Instances. This is because a job will fail if the job submitter pod is subjected to Spot Instance interruptions. You can also place the job submitter pod in a single Availability Zone, or use any Kubernetes labels that are applied to the nodes.
Job submitter classification examples
In this section
StartJobRun
request with
On-Demand node placement for the job submitter pod
cat >spark-python-in-s3-nodeselector-job-submitter.json << EOF { "name": "spark-python-in-s3-nodeselector", "virtualClusterId": "
virtual-cluster-id
", "executionRoleArn": "execution-role-arn
", "releaseLabel": "emr-6.11.0-latest
", "jobDriver": { "sparkSubmitJobDriver": { "entryPoint": "s3://S3-prefix
/trip-count.py", "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6" } }, "configurationOverrides": { "applicationConfiguration": [ { "classification": "spark-defaults", "properties": { "spark.dynamicAllocation.enabled":"false" } }, { "classification": "emr-job-submitter", "properties": { "jobsubmitter.node.selector.eks.amazonaws.com/capacityType": "ON_DEMAND" } } ], "monitoringConfiguration": { "cloudWatchMonitoringConfiguration": { "logGroupName": "/emr-containers/jobs", "logStreamNamePrefix": "demo" }, "s3MonitoringConfiguration": { "logUri": "s3://joblogs" } } } } EOF aws emr-containers start-job-run --cli-input-json file:///spark-python-in-s3-nodeselector-job-submitter.json
StartJobRun
request with
single-AZ node placement for the job submitter pod
cat >spark-python-in-s3-nodeselector-job-submitter-az.json << EOF { "name": "spark-python-in-s3-nodeselector", "virtualClusterId": "
virtual-cluster-id
", "executionRoleArn": "execution-role-arn
", "releaseLabel": "emr-6.11.0-latest
", "jobDriver": { "sparkSubmitJobDriver": { "entryPoint": "s3://S3-prefix
/trip-count.py", "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6" } }, "configurationOverrides": { "applicationConfiguration": [ { "classification": "spark-defaults", "properties": { "spark.dynamicAllocation.enabled":"false" } }, { "classification": "emr-job-submitter", "properties": { "jobsubmitter.node.selector.topology.kubernetes.io/zone": "Availability Zone
" } } ], "monitoringConfiguration": { "cloudWatchMonitoringConfiguration": { "logGroupName": "/emr-containers/jobs", "logStreamNamePrefix": "demo" }, "s3MonitoringConfiguration": { "logUri": "s3://joblogs" } } } } EOF aws emr-containers start-job-run --cli-input-json file:///spark-python-in-s3-nodeselector-job-submitter-az.json
StartJobRun
request with
single-AZ and Amazon EC2 instance type placement for the job submitter pod
{ "name": "spark-python-in-s3-nodeselector", "virtualClusterId": "
virtual-cluster-id
", "executionRoleArn": "execution-role-arn
", "releaseLabel": "emr-6.11.0-latest
", "jobDriver": { "sparkSubmitJobDriver": { "entryPoint": "s3://S3-prefix
/trip-count.py", "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.kubernetes.pyspark.pythonVersion=3 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6 --conf spark.sql.shuffle.partitions=1000" } }, "configurationOverrides": { "applicationConfiguration": [ { "classification": "spark-defaults", "properties": { "spark.dynamicAllocation.enabled":"false", } }, { "classification": "emr-job-submitter", "properties": { "jobsubmitter.node.selector.topology.kubernetes.io/zone": "Availability Zone
", "jobsubmitter.node.selector.node.kubernetes.io/instance-type":"m5.4xlarge
" } } ], "monitoringConfiguration": { "cloudWatchMonitoringConfiguration": { "logGroupName": "/emr-containers/jobs", "logStreamNamePrefix": "demo" }, "s3MonitoringConfiguration": { "logUri": "s3://joblogs" } } } }