Call Amazon EMR Serverless with Step Functions - Amazon Step Functions

Call Amazon EMR Serverless with Step Functions

Step Functions can control certain Amazon services directly from Amazon States Language (ASL). To learn more, see Working with other services and Pass parameters to a service API.

How the Optimized EMR Serverless integration is different than the EMR Serverless Amazon SDK integration
  • The optimized EMR Serverless service integration has a customized set of APIs that wrap the underlying EMR Serverless APIs. Because of this customization, it differs significantly from the EMR Serverless Amazon SDK service integration. In addition, the optimized EMR Serverless integration supports the Run a Job (.sync) integration pattern.

  • The Wait for a Callback with the Task Token integration pattern is not supported.
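
The two points above can be sketched as a small helper that builds the Resource ARN for a Task state and rejects the callback pattern. This is only an illustration: the function name `emr_serverless_resource` and its parameters are hypothetical, and the default `aws-cn` partition is an assumption taken from the ARNs used in the examples on this page.

```python
# Hypothetical helper (not part of any AWS SDK): builds the Resource ARN
# for an optimized EMR Serverless Task state.
SUPPORTED_APIS = {
    "createApplication", "startApplication", "stopApplication",
    "deleteApplication", "startJobRun", "cancelJobRun",
}


def emr_serverless_resource(api: str, pattern: str = "", partition: str = "aws-cn") -> str:
    """Return the Task state Resource ARN.

    pattern: "" for a plain request-response call, "sync" for Run a Job (.sync).
    partition: assumed to be "aws-cn", as in the examples on this page.
    """
    if api not in SUPPORTED_APIS:
        raise ValueError(f"unsupported EMR Serverless integration API: {api}")
    if pattern == "waitForTaskToken":
        # The callback pattern is not supported by this integration.
        raise ValueError("Wait for a Callback with the Task Token is not supported")
    if pattern not in ("", "sync"):
        raise ValueError(f"unknown integration pattern: {pattern}")
    suffix = f".{pattern}" if pattern else ""
    return f"arn:{partition}:states:::emr-serverless:{api}{suffix}"


print(emr_serverless_resource("startJobRun", "sync"))
```

Validating the pattern before deployment is a design choice made here for illustration; Step Functions itself rejects unsupported resource ARNs when the state machine is created.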

EMR Serverless service integration APIs

To integrate Amazon Step Functions with EMR Serverless, you can use the following six EMR Serverless service integration APIs. These service integration APIs are similar to the corresponding EMR Serverless APIs, with some differences in the fields that are passed and in the responses that are returned.

The following table describes the differences between each service integration API and its corresponding EMR Serverless API.

EMR Serverless service integration APIs and corresponding EMR Serverless APIs
EMR Serverless service integration API | Corresponding EMR Serverless API | Differences

createApplication

Creates an application.

EMR Serverless is linked to a unique type of IAM role known as a service-linked role. For createApplication and createApplication.sync to work, you must have configured the necessary permissions to create the service-linked role AWSServiceRoleForAmazonEMRServerless. For more information, including a statement you can add to your IAM permissions policy, see Using service-linked roles for EMR Serverless.

CreateApplication

None

createApplication.sync

Creates an application.

CreateApplication

No differences between the requests and responses of the EMR Serverless API and EMR Serverless service integration API. However, createApplication.sync waits for the application to reach the CREATED state.

startApplication

Starts a specified application and initializes the application's initial capacity if configured.

StartApplication

The EMR Serverless API response doesn't contain any data, but the EMR Serverless service integration API response includes the following data.

{ "ApplicationId": "string" }

startApplication.sync

Starts a specified application and initializes the initial capacity if configured.

StartApplication

The EMR Serverless API response doesn't contain any data, but the EMR Serverless service integration API response includes the following data.

{ "ApplicationId": "string" }

Also, startApplication.sync waits for the application to reach the STARTED state.

stopApplication

Stops a specified application and releases initial capacity if configured. All scheduled and running jobs must be completed or cancelled before stopping an application.

StopApplication

The EMR Serverless API response doesn't contain any data, but the EMR Serverless service integration API response includes the following data.

{ "ApplicationId": "string" }

stopApplication.sync

Stops a specified application and releases initial capacity if configured. All scheduled and running jobs must be completed or cancelled before stopping an application.

StopApplication

The EMR Serverless API response doesn't contain any data, but the EMR Serverless service integration API response includes the following data.

{ "ApplicationId": "string" }

Also, stopApplication.sync waits for the application to reach the STOPPED state.

deleteApplication

Deletes an application. An application must be in the STOPPED or CREATED state in order to be deleted.

DeleteApplication

The EMR Serverless API response doesn't contain any data, but the EMR Serverless service integration API response includes the following data.

{ "ApplicationId": "string" }

deleteApplication.sync

Deletes an application. An application must be in the STOPPED or CREATED state in order to be deleted.

DeleteApplication

The EMR Serverless API response doesn't contain any data, but the EMR Serverless service integration API response includes the following data.

{ "ApplicationId": "string" }

Also, deleteApplication.sync waits for the application to reach the TERMINATED state.

startJobRun

Starts a job run.

StartJobRun

None

startJobRun.sync

Starts a job run.

StartJobRun

No differences between the requests and responses of the EMR Serverless API and EMR Serverless service integration API. However, startJobRun.sync waits for the job run to reach the SUCCESS state.

cancelJobRun

Cancels a job run.

CancelJobRun

None

cancelJobRun.sync

Cancels a job run.

CancelJobRun

No differences between the requests and responses of the EMR Serverless API and EMR Serverless service integration API. However, cancelJobRun.sync waits for the job run to reach the CANCELLED state.
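
As a compact restatement of the table, the following sketch records the state that each .sync API waits for and which APIs add an ApplicationId to the response. The names `SYNC_TARGET_STATE`, `ADDS_APPLICATION_ID`, and `summarize` are hypothetical and exist only for this illustration; the values are transcribed from the table entries.

```python
# State that each .sync service integration API waits for (from the table).
SYNC_TARGET_STATE = {
    "createApplication.sync": "CREATED",
    "startApplication.sync": "STARTED",
    "stopApplication.sync": "STOPPED",
    "deleteApplication.sync": "TERMINATED",
    "startJobRun.sync": "SUCCESS",
    "cancelJobRun.sync": "CANCELLED",
}

# APIs whose integration response adds {"ApplicationId": "string"},
# although the underlying EMR Serverless API returns no data.
ADDS_APPLICATION_ID = {"startApplication", "stopApplication", "deleteApplication"}


def summarize(api: str) -> dict:
    """Summarize one table entry, for example "deleteApplication.sync"."""
    base, _, _pattern = api.partition(".")
    if base not in {name.partition(".")[0] for name in SYNC_TARGET_STATE}:
        raise ValueError(f"unknown service integration API: {api}")
    return {
        "emr_serverless_api": base[0].upper() + base[1:],
        "adds_application_id": base in ADDS_APPLICATION_ID,
        "waits_for": SYNC_TARGET_STATE.get(api),  # None for non-.sync calls
    }


print(summarize("deleteApplication.sync"))
```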

EMR Serverless integration use cases

For the optimized EMR Serverless service integration, we recommend that you create a single application, and then use that application to run multiple jobs. For example, in a single state machine, you can include multiple startJobRun requests, all of which use the same application. The following Task state examples show use cases that integrate EMR Serverless APIs with Step Functions. For information about other use cases of EMR Serverless, see What is Amazon EMR Serverless.
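
To make the single-application recommendation concrete, here is a minimal sketch that generates a state machine definition in which several startJobRun.sync states run one after another against the same application. The function `build_definition`, the state names `Run_Job_N`, and the second entry point `second.py` are hypothetical; the application ID and role ARN are the placeholder values used elsewhere on this page.

```python
import json

APPLICATION_ID = "yourApplicationId"  # placeholder, replace with your application ID
EXECUTION_ROLE_ARN = "arn:aws-cn:iam::123456789012:role/myEMRServerless-execution-role"


def build_definition(entry_points):
    """Chain one startJobRun.sync Task state per entry point, all on one application."""
    if not entry_points:
        raise ValueError("at least one entry point is required")
    names = [f"Run_Job_{i}" for i in range(1, len(entry_points) + 1)]
    states = {}
    for index, (name, entry_point) in enumerate(zip(names, entry_points)):
        state = {
            "Type": "Task",
            "Resource": "arn:aws-cn:states:::emr-serverless:startJobRun.sync",
            "Parameters": {
                "ApplicationId": APPLICATION_ID,  # same application for every job
                "ExecutionRoleArn": EXECUTION_ROLE_ARN,
                "JobDriver": {"SparkSubmit": {"EntryPoint": entry_point}},
            },
        }
        if index + 1 < len(names):
            state["Next"] = names[index + 1]
        else:
            state["End"] = True
        states[name] = state
    return {"StartAt": names[0], "States": states}


# "second.py" is a hypothetical script name used only for illustration.
definition = build_definition(["s3://<mybucket>/sample.py", "s3://<mybucket>/second.py"])
print(json.dumps(definition, indent=2))
```

Because each state uses the .sync pattern, the second job starts only after the first job run succeeds.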

Tip

To deploy an example of a state machine that integrates with EMR Serverless for running multiple jobs to your Amazon Web Services account, see Run an EMR Serverless job.

For information about how to configure IAM permissions when using Step Functions with other Amazon services, see IAM Policies for integrated services.

In the examples shown in the following use cases, replace the placeholder values with your resource-specific information. For example, replace yourApplicationId with the ID of your EMR Serverless application, such as 00yv7iv71inak893.

Create an application

The following Task state example creates an application using the createApplication.sync service integration API.

"Create_Application": {
    "Type": "Task",
    "Resource": "arn:aws-cn:states:::emr-serverless:createApplication.sync",
    "Parameters": {
        "Name": "MyApplication",
        "ReleaseLabel": "emr-6.9.0",
        "Type": "SPARK"
    },
    "End": true
}

Start an application

The following Task state example starts an application using the startApplication.sync service integration API.

"Start_Application": {
    "Type": "Task",
    "Resource": "arn:aws-cn:states:::emr-serverless:startApplication.sync",
    "Parameters": {
        "ApplicationId": "yourApplicationId"
    },
    "End": true
}

Stop an application

The following Task state example stops an application using the stopApplication.sync service integration API.

"Stop_Application": {
    "Type": "Task",
    "Resource": "arn:aws-cn:states:::emr-serverless:stopApplication.sync",
    "Parameters": {
        "ApplicationId": "yourApplicationId"
    },
    "End": true
}

Delete an application

The following Task state example deletes an application using the deleteApplication.sync service integration API.

"Delete_Application": {
    "Type": "Task",
    "Resource": "arn:aws-cn:states:::emr-serverless:deleteApplication.sync",
    "Parameters": {
        "ApplicationId": "yourApplicationId"
    },
    "End": true
}

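Because an application must be in the STOPPED or CREATED state before it can be deleted, the stop and delete states are often chained. The following sketch generates such a tear-down state machine. It assumes the execution input has the form {"ApplicationId": "..."}; the function name `teardown_definition` is hypothetical, and `"ResultPath": null` is used so that the original input is passed on to the delete state unchanged.

```python
import json


def teardown_definition(partition: str = "aws-cn") -> dict:
    """Stop an application, wait for STOPPED, then delete it.

    Assumes the execution input looks like {"ApplicationId": "yourApplicationId"}.
    """
    prefix = f"arn:{partition}:states:::emr-serverless"
    return {
        "StartAt": "Stop_Application",
        "States": {
            "Stop_Application": {
                "Type": "Task",
                "Resource": f"{prefix}:stopApplication.sync",
                "Parameters": {"ApplicationId.$": "$.ApplicationId"},
                # Serialized as null: discard the task result and pass the
                # state input through to the next state.
                "ResultPath": None,
                "Next": "Delete_Application",
            },
            "Delete_Application": {
                "Type": "Task",
                "Resource": f"{prefix}:deleteApplication.sync",
                "Parameters": {"ApplicationId.$": "$.ApplicationId"},
                "End": True,
            },
        },
    }


print(json.dumps(teardown_definition(), indent=2))
```

Using the .sync variant for the stop state matters here: the delete state runs only after the application has actually reached the STOPPED state.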
Start a job in an application

The following Task state example starts a job in an application using the startJobRun.sync service integration API.

"Start_Job": {
    "Type": "Task",
    "Resource": "arn:aws-cn:states:::emr-serverless:startJobRun.sync",
    "Parameters": {
        "ApplicationId": "yourApplicationId",
        "ExecutionRoleArn": "arn:aws-cn:iam::123456789012:role/myEMRServerless-execution-role",
        "JobDriver": {
            "SparkSubmit": {
                "EntryPoint": "s3://<mybucket>/sample.py",
                "EntryPointArguments": ["1"],
                "SparkSubmitParameters": "--conf spark.executor.cores=4 --conf spark.executor.memory=4g --conf spark.driver.cores=2 --conf spark.driver.memory=4g --conf spark.executor.instances=1"
            }
        }
    },
    "End": true
}

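The SparkSubmitParameters value in the Start_Job example is a single space-separated string of --conf options. As a small convenience sketch, the hypothetical helper `spark_submit_parameters` below assembles that string from a dictionary of Spark settings; the settings shown are the ones from the example.

```python
def spark_submit_parameters(conf: dict) -> str:
    """Join Spark settings into the "--conf key=value ..." string format."""
    return " ".join(f"--conf {key}={value}" for key, value in conf.items())


# Settings copied from the Start_Job example; dict order is preserved.
params = spark_submit_parameters({
    "spark.executor.cores": 4,
    "spark.executor.memory": "4g",
    "spark.driver.cores": 2,
    "spark.driver.memory": "4g",
    "spark.executor.instances": 1,
})
print(params)
```

The resulting string can be placed in the SparkSubmitParameters field of the startJobRun or startJobRun.sync Parameters.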
Cancel a job in an application

The following Task state example cancels a job in an application using the cancelJobRun.sync service integration API.

"Cancel_Job": {
    "Type": "Task",
    "Resource": "arn:aws-cn:states:::emr-serverless:cancelJobRun.sync",
    "Parameters": {
        "ApplicationId.$": "$.ApplicationId",
        "JobRunId.$": "$.JobRunId"
    },
    "End": true
}