Call Amazon EMR Serverless with Step Functions
Step Functions can control certain Amazon services directly from Amazon States Language (ASL). To learn more, see Working with other services and Pass parameters to a service API.
How the Optimized EMR Serverless integration is different than the EMR Serverless Amazon SDK integration
- The Optimized EMR Serverless service integration has a customized set of APIs that wrap the underlying EMR Serverless APIs. Because of this customization, the optimized EMR Serverless integration differs significantly from the EMR Serverless Amazon SDK service integration. In addition, the optimized EMR Serverless integration supports the Run a Job (.sync) integration pattern.
- The Wait for a Callback with the Task Token integration pattern is not supported.
EMR Serverless service integration APIs
To integrate Amazon Step Functions with EMR Serverless, you can use the following six EMR Serverless service integration APIs. These service integration APIs are similar to the corresponding EMR Serverless APIs, with some differences in the fields that are passed and in the responses that are returned.
The following table describes the differences between each service integration API and its corresponding EMR Serverless API.
EMR Serverless service integration APIs and corresponding EMR Serverless APIs

EMR Serverless service integration API | Corresponding EMR Serverless API | Differences
---|---|---
createApplication: Creates an application. EMR Serverless is linked to a unique type of IAM role known as a service-linked role. For | CreateApplication | None
createApplication.sync: Creates an application. | CreateApplication | No differences between the requests and responses of the EMR Serverless API and EMR Serverless service integration API. However, createApplication.sync waits for the application to reach the
startApplication: Starts a specified application and initializes the application's initial capacity if configured. | StartApplication | The EMR Serverless API response doesn't contain any data, but the EMR Serverless service integration API response includes the following data.
startApplication.sync: Starts a specified application and initializes the initial capacity if configured. | StartApplication | The EMR Serverless API response doesn't contain any data, but the EMR Serverless service integration API response includes the following data. Also, startApplication.sync waits for the application to reach the
stopApplication: Stops a specified application and releases initial capacity if configured. All scheduled and running jobs must be completed or cancelled before stopping an application. | StopApplication | The EMR Serverless API response doesn't contain any data, but the EMR Serverless service integration API response includes the following data.
stopApplication.sync: Stops a specified application and releases initial capacity if configured. All scheduled and running jobs must be completed or cancelled before stopping an application. | StopApplication | The EMR Serverless API response doesn't contain any data, but the EMR Serverless service integration API response includes the following data. Also, stopApplication.sync waits for the application to reach the
deleteApplication: Deletes an application. An application must be in the | DeleteApplication | The EMR Serverless API response doesn't contain any data, but the EMR Serverless service integration API response includes the following data.
deleteApplication.sync: Deletes an application. An application must be in the | DeleteApplication | The EMR Serverless API response doesn't contain any data, but the EMR Serverless service integration API response includes the following data. Also, deleteApplication.sync waits for the application to reach the
startJobRun: Starts a job run. | StartJobRun | None
startJobRun.sync: Starts a job run. | StartJobRun | No differences between the requests and responses of the EMR Serverless API and EMR Serverless service integration API. However, startJobRun.sync waits for the application to reach the
cancelJobRun: Cancels a job run. | CancelJobRun | None
cancelJobRun.sync: Cancels a job run. | CancelJobRun | No differences between the requests and responses of the EMR Serverless API and EMR Serverless service integration API. However, cancelJobRun.sync waits for the application to reach the
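Each API in the table appears twice: once in its plain form and once with a .sync suffix. As a minimal sketch of the plain (Request Response) form, the following Task state submits a job without waiting for it to finish, so the state completes as soon as the underlying StartJobRun call returns. The state name Submit_Job is hypothetical, and the Resource ARN is an assumption: it follows the same pattern as the .sync ARNs used in the examples on this page, with the suffix removed. Like those examples, this is a fragment of a States object rather than a complete definition.

```json
"Submit_Job": {
  "Comment": "Hypothetical state: no .sync suffix, so Step Functions does not wait for the job run to finish",
  "Type": "Task",
  "Resource": "arn:aws-cn:states:::emr-serverless:startJobRun",
  "Parameters": {
    "ApplicationId": "yourApplicationId",
    "ExecutionRoleArn": "arn:aws-cn:iam::123456789012:role/myEMRServerless-execution-role",
    "JobDriver": {
      "SparkSubmit": {
        "EntryPoint": "s3://<mybucket>/sample.py"
      }
    }
  },
  "End": true
}
```

Choosing the plain form suits workflows that only need to hand the job off. Use the .sync form when a later state depends on the job having finished.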
EMR Serverless integration use cases
For the Optimized EMR Serverless service integration, we recommend that you create a single application, and then use that application to run multiple jobs. For example, in a single state machine, you can include multiple startJobRun requests, all of which use the same application. The following Task state examples show use cases to integrate EMR Serverless APIs with Step Functions. For information about other use cases of EMR Serverless, see What is Amazon EMR Serverless.
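As a minimal sketch of the single-application recommendation, a complete state machine definition might start one application, run two jobs on it in sequence, and then stop it. Several things here are assumptions rather than part of the original examples: the application ID and execution role ARN are read from the execution input as ApplicationId and ExecutionRoleArn, the state names are invented, and first.py and second.py are hypothetical script names. Setting ResultPath to null discards each task result so that the original input, including ApplicationId, is passed unchanged to the next state.

```json
{
  "Comment": "Sketch: start one application, run two jobs on it, then stop it",
  "StartAt": "Start_Application",
  "States": {
    "Start_Application": {
      "Type": "Task",
      "Resource": "arn:aws-cn:states:::emr-serverless:startApplication.sync",
      "Parameters": {
        "ApplicationId.$": "$.ApplicationId"
      },
      "ResultPath": null,
      "Next": "Run_First_Job"
    },
    "Run_First_Job": {
      "Comment": "first.py is a hypothetical script name",
      "Type": "Task",
      "Resource": "arn:aws-cn:states:::emr-serverless:startJobRun.sync",
      "Parameters": {
        "ApplicationId.$": "$.ApplicationId",
        "ExecutionRoleArn.$": "$.ExecutionRoleArn",
        "JobDriver": {
          "SparkSubmit": {
            "EntryPoint": "s3://<mybucket>/first.py"
          }
        }
      },
      "ResultPath": null,
      "Next": "Run_Second_Job"
    },
    "Run_Second_Job": {
      "Comment": "second.py is a hypothetical script name; same application as the first job",
      "Type": "Task",
      "Resource": "arn:aws-cn:states:::emr-serverless:startJobRun.sync",
      "Parameters": {
        "ApplicationId.$": "$.ApplicationId",
        "ExecutionRoleArn.$": "$.ExecutionRoleArn",
        "JobDriver": {
          "SparkSubmit": {
            "EntryPoint": "s3://<mybucket>/second.py"
          }
        }
      },
      "ResultPath": null,
      "Next": "Stop_Application"
    },
    "Stop_Application": {
      "Comment": "Reached only after both .sync job states have finished",
      "Type": "Task",
      "Resource": "arn:aws-cn:states:::emr-serverless:stopApplication.sync",
      "Parameters": {
        "ApplicationId.$": "$.ApplicationId"
      },
      "End": true
    }
  }
}
```

Because both job states use the .sync form, the workflow reaches Stop_Application only after the jobs have finished, which matches the requirement that all scheduled and running jobs be completed or cancelled before an application is stopped.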
Tip
To deploy to your Amazon Web Services account an example state machine that integrates with EMR Serverless to run multiple jobs, see Run an EMR Serverless job.
For information about how to configure IAM permissions when using Step Functions with other Amazon services, see IAM Policies for integrated services.
In the examples shown in the following use cases, replace the italicized text with your resource-specific information. For example, replace yourApplicationId with the ID of your EMR Serverless application, such as 00yv7iv71inak893.
Create an application
The following Task state example creates an application using the createApplication.sync service integration API.

    "Create_Application": {
      "Type": "Task",
      "Resource": "arn:aws-cn:states:::emr-serverless:createApplication.sync",
      "Parameters": {
        "Name": "MyApplication",
        "ReleaseLabel": "emr-6.9.0",
        "Type": "SPARK"
      },
      "End": true
    }

Start an application
The following Task state example starts an application using the startApplication.sync service integration API.

    "Start_Application": {
      "Type": "Task",
      "Resource": "arn:aws-cn:states:::emr-serverless:startApplication.sync",
      "Parameters": {
        "ApplicationId": "yourApplicationId"
      },
      "End": true
    }

Stop an application
The following Task state example stops an application using the stopApplication.sync service integration API.

    "Stop_Application": {
      "Type": "Task",
      "Resource": "arn:aws-cn:states:::emr-serverless:stopApplication.sync",
      "Parameters": {
        "ApplicationId": "yourApplicationId"
      },
      "End": true
    }

Delete an application
The following Task state example deletes an application using the deleteApplication.sync service integration API.

    "Delete_Application": {
      "Type": "Task",
      "Resource": "arn:aws-cn:states:::emr-serverless:deleteApplication.sync",
      "Parameters": {
        "ApplicationId": "yourApplicationId"
      },
      "End": true
    }

Start a job in an application
The following Task state example starts a job in an application using the startJobRun.sync service integration API.

    "Start_Job": {
      "Type": "Task",
      "Resource": "arn:aws-cn:states:::emr-serverless:startJobRun.sync",
      "Parameters": {
        "ApplicationId": "yourApplicationId",
        "ExecutionRoleArn": "arn:aws-cn:iam::123456789012:role/myEMRServerless-execution-role",
        "JobDriver": {
          "SparkSubmit": {
            "EntryPoint": "s3://<mybucket>/sample.py",
            "EntryPointArguments": ["1"],
            "SparkSubmitParameters": "--conf spark.executor.cores=4 --conf spark.executor.memory=4g --conf spark.driver.cores=2 --conf spark.driver.memory=4g --conf spark.executor.instances=1"
          }
        }
      },
      "End": true
    }

Cancel a job in an application
The following Task state example cancels a job in an application using the cancelJobRun.sync service integration API.

    "Cancel_Job": {
      "Type": "Task",
      "Resource": "arn:aws-cn:states:::emr-serverless:cancelJobRun.sync",
      "Parameters": {
        "ApplicationId.$": "$.ApplicationId",
        "JobRunId.$": "$.JobRunId"
      },
      "End": true
    }
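
Unlike the earlier examples, the Cancel_Job state reads both IDs from its state input through the paths $.ApplicationId and $.JobRunId instead of hard-coding them. As an illustration, a state input of the following shape would satisfy both paths. The two values are placeholders, and yourJobRunId is a hypothetical placeholder name that does not appear elsewhere on this page.

```json
{
  "ApplicationId": "yourApplicationId",
  "JobRunId": "yourJobRunId"
}
```

Because the key names end in .$, Step Functions treats their values as paths into the state input and resolves them at run time.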