Create a long-lived Amazon EMR cluster and run several steps using an Amazon SDK - Amazon Identity and Access Management

The following code example shows how to create a long-lived Amazon EMR cluster and run several steps.

Python
SDK for Python (Boto3)

Create a long-lived Amazon EMR cluster that uses Apache Spark to query historical Amazon review data from the Amazon Customer Reviews Dataset. Run a job that gets data for top-rated products in specific categories that contain keywords in their product titles. Job results are written to an Amazon Simple Storage Service (Amazon S3) bucket.
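As a rough sketch of how such job steps might be described to Amazon EMR, the snippet below builds one `spark-submit` step per category and keyword pair and submits them to a running cluster with the Boto3 `add_job_flow_steps` call. The helper names, the script arguments (`--category`, `--title_keyword`, `--output_uri`), and the output layout are illustrative assumptions, not the layout used by the full example.

```python
def build_spark_step(name, script_uri, script_args):
    """Return one EMR step definition that runs a PySpark script with spark-submit."""
    return {
        "Name": name,
        # Keep the long-lived cluster running even if this step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets a step run a command such as spark-submit.
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     script_uri, *script_args],
        },
    }


def build_review_steps(script_uri, output_prefix, queries):
    """Build one step per (category, keyword) pair.

    The argument names passed to the script are hypothetical; they must
    match whatever the uploaded job script actually parses.
    """
    steps = []
    for category, keyword in queries:
        step_name = f"{category}-{keyword}"
        args = [
            "--category", category,
            "--title_keyword", keyword,
            "--output_uri", f"{output_prefix}/{step_name}",
        ]
        steps.append(build_spark_step(step_name, script_uri, args))
    return steps


def submit_steps(emr_client, cluster_id, steps):
    """Add the steps to a running cluster and return the new step IDs."""
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=steps)
    return response["StepIds"]
```

With real credentials, `emr_client` would be `boto3.client("emr")` and `cluster_id` the ID of the running cluster. Because the cluster is long-lived, `submit_steps` can be called repeatedly with new queries.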

  • Create an Amazon S3 bucket and upload a job script.

  • Create Amazon Identity and Access Management (IAM) roles.

  • Create Amazon Elastic Compute Cloud (Amazon EC2) security groups.

  • Create a long-lived cluster and run several job steps.
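The last of these steps might look roughly like the following sketch, which assembles a request for the Boto3 `run_job_flow` call. Setting `KeepJobFlowAliveWhenNoSteps` to `True` is what keeps the cluster running after its steps finish. The helper names, the release label, the instance types, and the instance count are assumptions chosen for illustration; the roles and security groups are expected to come from the earlier setup steps.

```python
def build_cluster_request(name, log_uri, job_flow_role, service_role,
                          primary_sg_id, core_sg_id):
    """Return keyword arguments for run_job_flow that describe a long-lived Spark cluster."""
    return {
        "Name": name,
        "LogUri": log_uri,
        # Assumed release label; pick one available in your Region.
        "ReleaseLabel": "emr-6.15.0",
        "Instances": {
            # Assumed instance types and count, for illustration only.
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            # True keeps the cluster alive when no steps are running.
            "KeepJobFlowAliveWhenNoSteps": True,
            "EmrManagedMasterSecurityGroup": primary_sg_id,
            "EmrManagedSlaveSecurityGroup": core_sg_id,
        },
        "Applications": [{"Name": "Spark"}],
        # JobFlowRole is the EC2 instance profile; ServiceRole is the EMR service role.
        "JobFlowRole": job_flow_role,
        "ServiceRole": service_role,
        "VisibleToAllUsers": True,
    }


def create_cluster(emr_client, request):
    """Start the cluster and return its ID."""
    response = emr_client.run_job_flow(**request)
    return response["JobFlowId"]


def terminate_cluster(emr_client, cluster_id):
    """Shut the cluster down; a long-lived cluster runs until this is called."""
    emr_client.terminate_job_flows(JobFlowIds=[cluster_id])
```

Here `emr_client` would be `boto3.client("emr")`. Because the cluster does not shut down by itself, calling something like `terminate_cluster` once the work is done avoids ongoing charges.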

This example is best viewed on GitHub. For the complete source code and instructions on how to set up and run the example, see the full example on GitHub.

Services used in this example
  • Amazon EC2

  • Amazon EMR

  • IAM

  • Amazon S3

For a complete list of Amazon SDK developer guides and code examples, see Using IAM with an Amazon SDK. This topic also includes information about getting started and details about previous SDK versions.