Manage an Amazon EMR job
This sample project demonstrates Amazon EMR and Amazon Step Functions integration. The project creates an Amazon EMR cluster, adds multiple steps and runs them, and then terminate the cluster.
Important
Amazon EMR does not have a free pricing tier. Running the sample project will incur costs. You can
find pricing information on the Amazon EMR
pricing
Step 1: Create the state machine
Open the Step Functions console
and choose Create state machine. -
Type
Manage an EMR job
in the search box, and then choose Manage an EMR job from the search results that are returned. -
Choose Next to continue.
-
Choose Run a demo to create a read-only and ready-to-deploy workflow, or choose Build on it to create an editable state machine definition that you can build on and later deploy.
This sample project deploys the following resources:
-
An Amazon S3 bucket
-
An Amazon EMR cluster
-
An Amazon Step Functions state machine
-
Related Amazon Identity and Access Management (IAM) roles
The following image shows the workflow graph for the Manage an EMR job sample project:
-
-
Choose Use template to continue with your selection.
Next steps depend on your previous choice:
-
Run a demo – You can review the state machine before you create a read-only project with resources deployed by Amazon CloudFormation to your Amazon Web Services account.
You can view the state machine definition, and when you are ready, choose Deploy and run to deploy the project and create the resources.
Deploying can take up to 10 minutes to create resources and permissions. You can use the Stack ID link to monitor progress in Amazon CloudFormation.
After deploy completes, you should see your new state machine in the console.
-
Build on it – You can review and edit the workflow definition. You might need to set values for placeholders in the sample project before attemping to run your custom workflow.
Note
Standard charges might apply for services deployed to your account.
Step 2: Run the state machine
-
On the State machines page, choose your sample project.
-
On the sample project page, choose Start execution.
-
In the Start execution dialog box, do the following:
-
(Optional) Enter a custom execution name to override the generated default.
Non-ASCII names and logging
Step Functions accepts names for state machines, executions, activities, and labels that contain non-ASCII characters. Because such characters will not work with Amazon CloudWatch, we recommend using only ASCII characters so you can track metrics in CloudWatch.
-
(Optional) In the Input box, enter input values as JSON. You can skip this step if you are running a demo.
-
Choose Start execution.
The Step Functions console will direct you to an Execution Details page where you can choose states in the Graph view to explore related information in the Step details pane.
-
Example State Machine Code
The state machine in this sample project integrates with Amazon EMR by passing parameters directly to those resources. Browse through this example state machine to see how Step Functions uses a state machine to call the Amazon EMR task synchronously, waits for the task to succeed or fail, and terminates the cluster.
For more information about how Amazon Step Functions can control other Amazon services, see Integrating services with Step Functions.
{
"Comment": "An example of the Amazon States Language for running jobs on Amazon EMR",
"StartAt": "Create an EMR cluster",
"States": {
"Create an EMR cluster": {
"Type": "Task",
"Resource": "arn:<PARTITION>:states:::elasticmapreduce:createCluster.sync",
"Parameters": {
"Name": "ExampleCluster",
"VisibleToAllUsers": true,
"ReleaseLabel": "emr-5.26.0",
"Applications": [
{ "Name": "Hive" }
],
"ServiceRole": "<EMR_SERVICE_ROLE>",
"JobFlowRole": "<EMR_EC2_INSTANCE_PROFILE>",
"LogUri": "s3://<amzn-s3-demo-EMR_LOG>/logs/",
"Instances": {
"KeepJobFlowAliveWhenNoSteps": true,
"InstanceFleets": [
{
"Name": "MyMasterFleet",
"InstanceFleetType": "MASTER",
"TargetOnDemandCapacity": 1,
"InstanceTypeConfigs": [
{
"InstanceType": "m5.xlarge"
}
]
},
{
"Name": "MyCoreFleet",
"InstanceFleetType": "CORE",
"TargetOnDemandCapacity": 1,
"InstanceTypeConfigs": [
{
"InstanceType": "m5.xlarge"
}
]
}
]
}
},
"ResultPath": "$.cluster",
"Next": "Run first step"
},
"Run first step": {
"Type": "Task",
"Resource": "arn:<PARTITION>:states:::elasticmapreduce:addStep.sync",
"Parameters": {
"ClusterId.$": "$.cluster.ClusterId",
"Step": {
"Name": "My first EMR step",
"ActionOnFailure": "CONTINUE",
"HadoopJarStep": {
"Jar": "command-runner.jar",
"Args": ["<COMMAND_ARGUMENTS>"]
}
}
},
"Retry" : [
{
"ErrorEquals": [ "States.ALL" ],
"IntervalSeconds": 1,
"MaxAttempts": 3,
"BackoffRate": 2.0
}
],
"ResultPath": "$.firstStep",
"Next": "Run second step"
},
"Run second step": {
"Type": "Task",
"Resource": "arn:<PARTITION>:states:::elasticmapreduce:addStep.sync",
"Parameters": {
"ClusterId.$": "$.cluster.ClusterId",
"Step": {
"Name": "My second EMR step",
"ActionOnFailure": "CONTINUE",
"HadoopJarStep": {
"Jar": "command-runner.jar",
"Args": ["<COMMAND_ARGUMENTS>"]
}
}
},
"Retry" : [
{
"ErrorEquals": [ "States.ALL" ],
"IntervalSeconds": 1,
"MaxAttempts": 3,
"BackoffRate": 2.0
}
],
"ResultPath": "$.secondStep",
"Next": "Terminate Cluster"
},
"Terminate Cluster": {
"Type": "Task",
"Resource": "arn:<PARTITION>:states:::elasticmapreduce:terminateCluster",
"Parameters": {
"ClusterId.$": "$.cluster.ClusterId"
},
"End": true
}
}
}
IAM Example
This example Amazon Identity and Access Management (IAM) policy generated by the sample project includes the least privilege necessary to execute the state machine and related resources. It's a best practice to include only those permissions that are necessary in your IAM policies.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"elasticmapreduce:RunJobFlow",
"elasticmapreduce:DescribeCluster",
"elasticmapreduce:TerminateJobFlows"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": [
"arn:aws-cn:iam::123456789012:role/StepFunctionsSample-EMRJobManagement-EMRServiceRole-ANPAJ2UCCR6DPCEXAMPLE",
"arn:aws-cn:iam::123456789012:role/StepFunctionsSample-EMRJobManagementWJALRXUTNFEMI-ANPAJ2UCCR6DPCEXAMPLE-EMREc2InstanceProfile-1ANPAJ2UCCR6DPCEXAMPLE"
]
},
{
"Effect": "Allow",
"Action": [
"events:PutTargets",
"events:PutRule",
"events:DescribeRule"
],
"Resource": [
"arn:aws-cn:events:sa-east-1:123456789012:rule/StepFunctionsGetEventForEMRRunJobFlowRule"
]
}
]
}
The following policy ensures that addStep
has sufficient permissions.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"elasticmapreduce:AddJobFlowSteps",
"elasticmapreduce:DescribeStep",
"elasticmapreduce:CancelSteps"
],
"Resource": "arn:aws:elasticmapreduce:*:*:cluster/*"
},
{
"Effect": "Allow",
"Action": [
"events:PutTargets",
"events:PutRule",
"events:DescribeRule"
],
"Resource": [
"arn:aws-cn:events:sa-east-1:123456789012:rule/StepFunctionsGetEventForEMRAddJobFlowStepsRule"
]
}
]
}
}
For information about how to configure IAM when using Step Functions with other Amazon services, see How Step Functions generates IAM policies for integrated services.