Adding steps to an Amazon EMR cluster with the Amazon CLI
The following procedures demonstrate how to add steps to a newly created cluster and
to a running cluster with the Amazon CLI. Both examples use the --steps
option to add steps to the cluster.
To add steps during cluster creation
Type the following command to create a cluster and add an Apache Spark step. Make sure to replace
myKey with the name of your Amazon EC2 key pair.

aws emr create-cluster --name "Test cluster" \
  --applications Name=Spark \
  --use-default-roles \
  --ec2-attributes KeyName=myKey \
  --instance-groups InstanceGroupType=PRIMARY,InstanceCount=1,InstanceType=m5.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m5.xlarge \
  --steps '[{"Args":["spark-submit","--deploy-mode","cluster","--class","org.apache.spark.examples.SparkPi","/usr/lib/spark/examples/jars/spark-examples.jar","5"],"Type":"CUSTOM_JAR","ActionOnFailure":"CONTINUE","Jar":"command-runner.jar","Properties":"","Name":"Spark application"}]'

Note
The list of arguments changes depending on the type of step.

By default, the step concurrency level is 1. You can set the step concurrency level with the
StepConcurrencyLevel parameter when you create a cluster.

The output is a cluster identifier similar to the following.

{ "ClusterId": "j-2AXXXXXXGAPLF" }
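The inline JSON passed to --steps is hard to read and easy to misquote. As a minimal sketch, the same step definition can be kept in a file and referenced with the CLI's file:// prefix. The file name steps.json is a hypothetical choice, and the optional validation assumes python3 is available on your machine.

```shell
# Write the step definition from the example above to a file.
# steps.json is a hypothetical name; any readable path works.
cat > steps.json <<'EOF'
[
  {
    "Type": "CUSTOM_JAR",
    "Name": "Spark application",
    "ActionOnFailure": "CONTINUE",
    "Jar": "command-runner.jar",
    "Properties": "",
    "Args": [
      "spark-submit",
      "--deploy-mode", "cluster",
      "--class", "org.apache.spark.examples.SparkPi",
      "/usr/lib/spark/examples/jars/spark-examples.jar",
      "5"
    ]
  }
]
EOF

# Optional sanity check: confirm the file parses as JSON before using it.
# Skipped silently if python3 is not installed.
if command -v python3 >/dev/null 2>&1; then
  python3 -m json.tool steps.json >/dev/null && echo "steps.json parses as JSON"
fi

# The file can then replace the inline string, for example:
#   aws emr create-cluster ... --steps file://steps.json
```

Keeping the definition in a file also lets you review changes to it in version control, and the same file can be reused when you add steps to a running cluster.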
To add a step to a running cluster
Type the following command to add a step to a running cluster. Replace
j-2AXXXXXXGAPLF with your own cluster ID.

aws emr add-steps --cluster-id j-2AXXXXXXGAPLF \
  --steps '[{"Args":["spark-submit","--deploy-mode","cluster","--class","org.apache.spark.examples.SparkPi","/usr/lib/spark/examples/jars/spark-examples.jar","5"],"Type":"CUSTOM_JAR","ActionOnFailure":"CONTINUE","Jar":"command-runner.jar","Properties":"","Name":"Spark application"}]'

The output is a step identifier similar to the following.

{ "StepIds": [ "s-Y9XXXXXXAPMD" ] }
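In a script, you usually want the step ID as a plain string so that you can follow the step afterwards. The following is a sketch rather than a definitive implementation: submit_and_wait is a hypothetical helper name, and it assumes that the Amazon CLI is installed and configured and that the step definitions are stored in a JSON file in the format shown above.

```shell
# Hypothetical helper: submit the steps in a JSON file, wait for the first
# one to finish, then print its final state.
# Usage: submit_and_wait <cluster-id> <steps-file>
submit_and_wait() {
  local cluster_id="$1" steps_file="$2" step_id

  # --query and --output text reduce the JSON response to the bare step ID.
  step_id=$(aws emr add-steps --cluster-id "$cluster_id" \
    --steps "file://$steps_file" \
    --query 'StepIds[0]' --output text) || return 1
  echo "Submitted step $step_id"

  # The waiter polls until the step completes; it exits non-zero if the
  # step fails or the waiter gives up.
  aws emr wait step-complete --cluster-id "$cluster_id" --step-id "$step_id" \
    || echo "Step did not reach COMPLETED" >&2

  # Report the final state either way.
  aws emr describe-step --cluster-id "$cluster_id" --step-id "$step_id" \
    --query 'Step.Status.State' --output text
}

# Example call (not run here):
#   submit_and_wait j-2AXXXXXXGAPLF steps.json
```

The helper prints the state instead of returning on a waiter failure, so that a failed step is still reported with its actual status.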
To modify the StepConcurrencyLevel in a running cluster
In a running cluster, you can modify the StepConcurrencyLevel with the ModifyCluster API. For
example, type the following command to increase the StepConcurrencyLevel to 10. Replace
j-2AXXXXXXGAPLF with your cluster ID.

aws emr modify-cluster --cluster-id j-2AXXXXXXGAPLF --step-concurrency-level 10

The output is similar to the following.
{ "StepConcurrencyLevel": 10 }
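To confirm that the change took effect, one option is to read the value back from the cluster description. The sketch below assumes the Amazon CLI is installed and configured; set_step_concurrency is a hypothetical helper name, not part of the CLI.

```shell
# Hypothetical helper: set the step concurrency level, then read it back.
# Usage: set_step_concurrency <cluster-id> <level>
set_step_concurrency() {
  local cluster_id="$1" level="$2" current

  aws emr modify-cluster --cluster-id "$cluster_id" \
    --step-concurrency-level "$level" >/dev/null || return 1

  # describe-cluster reports the current level under Cluster.StepConcurrencyLevel.
  current=$(aws emr describe-cluster --cluster-id "$cluster_id" \
    --query 'Cluster.StepConcurrencyLevel' --output text) || return 1

  if [ "$current" = "$level" ]; then
    echo "StepConcurrencyLevel is now $current"
  else
    echo "Expected $level but cluster reports $current" >&2
    return 1
  fi
}

# Example call (not run here):
#   set_step_concurrency j-2AXXXXXXGAPLF 10
```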
For more information on using Amazon EMR commands in the Amazon CLI, see the Amazon CLI Command Reference.