Connect to an Amazon EMR cluster from Studio Classic using runtime IAM roles - Amazon SageMaker

Connect to an Amazon EMR cluster from Studio Classic using runtime IAM roles

When you connect to an Amazon EMR cluster from your Amazon SageMaker Studio Classic notebook, you can visually browse a list of IAM roles, known as runtime roles, and select one on the fly. Subsequently, all your Apache Spark, Apache Hive, or Presto jobs created from your Studio Classic notebook access only the data and resources permitted by policies attached to the runtime role. Also, when data is accessed from data lakes managed with Amazon Lake Formation, you can enforce table-level and column-level access using policies attached to the runtime role.

With this capability, you and your teammates can connect to the same cluster, each using a runtime role scoped with permissions matching your individual level of access to data. Your sessions are also isolated from one another on the shared cluster. With this ability to control fine-grained access to data on the same shared cluster, you can simplify provisioning of Amazon EMR clusters, reducing operational overhead and saving costs.

To try out this feature, see Apply fine-grained data access controls with Amazon Lake Formation and Amazon EMR from Amazon SageMaker Studio Classic. This blog post helps you set up a demo environment where you can try using preconfigured runtime roles to connect to Amazon EMR clusters.

Prerequisites

Before you get started, make sure you meet the following prerequisites:

Cross-account connection scenarios

Runtime role authentication supports a variety of cross-account connection scenarios when your data resides outside of your Studio Classic account. The following image shows three different ways you can assign your Amazon EMR cluster, data, and even Amazon EMR execution role between your Studio Classic and data accounts:

Cross-account scenarios supported by runtime IAM role authentication.

In option 1, your Amazon EMR cluster and Amazon EMR execution role are in a separate data account from your Studio Classic account. You define a separate Amazon EMR access role permission policy which grants permission to your Studio Classic execution role to assume the Amazon EMR access role. The Amazon EMR access role then calls the Amazon EMR API GetClusterSessionCredentials on behalf of your Studio Classic execution role, giving you access to the cluster.

In option 2, your Amazon EMR cluster and Amazon EMR execution role are in your Studio Classic account. Your Studio Classic execution role has permission to use the Amazon EMR API GetClusterSessionCredentials to gain access to your cluster. To access the Amazon S3 bucket in the data account, give the Amazon EMR execution role cross-account Amazon S3 bucket access permissions. You grant these permissions within your Amazon S3 bucket policy.

In option 3, your Amazon EMR clusters are in your Studio Classic account, and the Amazon EMR execution role is in the data account. Your Studio Classic execution role has permission to use the Amazon EMR API GetClusterSessionCredentials to gain access to your cluster. Add the Amazon EMR execution role into the execution role configuration JSON. Then you can select the role in the UI when you choose your cluster. For details about how to set up your execution role configuration JSON file, see Preload your execution roles into Studio Classic.
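The three options above differ only in which account owns the cluster and which account owns the Amazon EMR execution role. The following is a minimal sketch of that decision logic, not an AWS tool. The function names arn_account_id and classify_scenario are hypothetical, and the account IDs in the usage line are placeholders.

```bash
#!/bin/bash
# Hypothetical helper: extract the account ID (the fifth colon-separated
# field) from an IAM role ARN such as arn:aws:iam::123456789012:role/name.
arn_account_id() {
  printf '%s\n' "$1" | cut -d: -f5
}

# Map a setup to option 1, 2, or 3 as described in the text.
# Arguments: Studio Classic account ID, cluster account ID, execution role ARN.
classify_scenario() {
  local studio_account="$1" cluster_account="$2" role_account
  role_account="$(arn_account_id "$3")"
  if [ "$cluster_account" = "$studio_account" ]; then
    if [ "$role_account" = "$studio_account" ]; then
      echo "option 2"   # cluster and execution role both in the Studio Classic account
    else
      echo "option 3"   # cluster in the Studio Classic account, role in the data account
    fi
  elif [ "$role_account" = "$cluster_account" ]; then
    echo "option 1"     # cluster and execution role both in the data account
  else
    echo "not covered by the three options" >&2
    return 1
  fi
}
```

For example, `classify_scenario 111111111111 222222222222 arn:aws:iam::222222222222:role/emr-execution-role` reports option 1, which is the case that requires an Amazon EMR access role in the data account.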

Set up Studio Classic to use runtime IAM roles

To establish runtime role authentication for your Amazon EMR clusters, configure the required IAM policies, network, and usability enhancements. Your setup depends on whether your Amazon EMR clusters, Amazon EMR execution role, or both, reside outside of your Amazon SageMaker Studio Classic account. The following discussion guides you through the policies to install, how to configure the network to allow traffic between accounts, and the local configuration file to set up to automate your Amazon EMR connection.

Configure runtime role authentication when your Amazon EMR cluster and Studio Classic are in the same account

If your Amazon EMR cluster resides in your Studio Classic account, add the basic policy to connect to your Amazon EMR cluster and set permissions to call the Amazon EMR API GetClusterSessionCredentials, which gives you access to the cluster. Complete the following steps to add necessary permissions to your Studio Classic execution policy:

  1. Add the required IAM policy to connect to Amazon EMR clusters. For details, see Discover Amazon EMR clusters from SageMaker Studio Classic.

  2. Grant permission to call the Amazon EMR API GetClusterSessionCredentials when you pass one or more permitted Amazon EMR execution roles specified in the policy.

  3. (Optional) Grant permission to pass IAM roles that follow any user-defined naming conventions.

  4. (Optional) Grant permission to access Amazon EMR clusters that are tagged with specific user-defined strings.

  5. If you don't want to manually call the Amazon EMR connection command, install a SageMaker configuration file in your local Amazon EFS and select the role to use when you select your Amazon EMR cluster. For details about how to preload your IAM roles, see Preload your execution roles into Studio Classic.

The following example policy permits calls to GetClusterSessionCredentials that pass an Amazon EMR execution role whose name matches the modeling or training patterns. In addition, the policyholder can access only Amazon EMR clusters whose group tag contains the string modeling or training.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "elasticmapreduce:GetClusterSessionCredentials",
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "elasticmapreduce:ExecutionRoleArn": [
                        "arn:aws:iam::123456780910:role/emr-execution-role-ml-modeling*",
                        "arn:aws:iam::123456780910:role/emr-execution-role-ml-training*"
                    ],
                    "elasticmapreduce:ResourceTag/group": [
                        "*modeling*",
                        "*training*"
                    ]
                }
            }
        }
    ]
}
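To reason about which role and tag combinations the example policy admits, it can help to approximate the StringLike condition locally. The sketch below uses shell glob matching as a stand-in. It assumes patterns contain only the * and ? wildcards, because shell globs also give special meaning to brackets and backslashes, which IAM does not. The function names matches_any and policy_allows are hypothetical, and the result is an approximation, not an IAM evaluation.

```bash
#!/bin/bash
# Hypothetical helper: succeed when the value matches at least one pattern.
matches_any() {
  local value="$1" pattern
  shift
  for pattern in "$@"; do
    # $pattern is deliberately unquoted so that case treats it as a glob.
    case "$value" in
      $pattern) return 0 ;;
    esac
  done
  return 1
}

# Approximate the Condition block of the example policy.
# Values listed under one key are ORed; the two keys are ANDed.
# Arguments: execution role ARN, value of the cluster's "group" tag.
policy_allows() {
  local role_arn="$1" group_tag="$2"
  matches_any "$role_arn" \
    "arn:aws:iam::123456780910:role/emr-execution-role-ml-modeling*" \
    "arn:aws:iam::123456780910:role/emr-execution-role-ml-training*" \
  && matches_any "$group_tag" "*modeling*" "*training*"
}
```

For example, `policy_allows arn:aws:iam::123456780910:role/emr-execution-role-ml-modeling-team-a ml-modeling-prod` succeeds, while the same role against a cluster tagged `finance` does not, because both condition keys must match.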

Configure runtime role authentication when your cluster and Studio Classic are in different accounts

If your Amazon EMR cluster is not in your Studio Classic account, allow your Studio Classic execution role to assume the cross-account Amazon EMR access role so you can connect to the cluster. Complete the following steps to set up your cross-account configuration:

  1. Create your Studio Classic execution role permission policy so that the execution role can assume the Amazon EMR access role. The following policy is an example:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowAssumeCrossAccountEMRAccessRole",
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": "arn:aws:iam::emr_account_id:role/emr-access-role-name"
            }
        ]
    }

  2. Create the trust policy to specify which Studio Classic account IDs are trusted to assume the Amazon EMR access role. The following policy is an example:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountSageMakerExecutionRoleToAssumeThisRole",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::studio_account_id:role/studio_execution_role"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }

  3. Create the Amazon EMR access role permission policy, which grants the Amazon EMR execution role the needed permissions to carry out the intended tasks on the cluster. Configure the Amazon EMR access role to call the API GetClusterSessionCredentials with the Amazon EMR execution roles specified in the access role permission policy. The following policy is an example:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCallingEmrGetClusterSessionCredentialsAPI",
                "Effect": "Allow",
                "Action": "elasticmapreduce:GetClusterSessionCredentials",
                "Resource": "*",
                "Condition": {
                    "StringLike": {
                        "elasticmapreduce:ExecutionRoleArn": [
                            "arn:aws:iam::emr_account_id:role/emr-execution-role-name"
                        ]
                    }
                }
            }
        ]
    }

  4. Set up the cross-account network so that traffic can move back and forth between your accounts. For guided instruction, see Set up the network in the blog post Create and manage Amazon EMR Clusters from SageMaker Studio Classic to run interactive Spark and ML workloads – Part 2. The steps in the blog post help you complete the following tasks:

    1. VPC-peer your Studio Classic account and your Amazon EMR account to establish a connection.

    2. Manually add routes to the private subnet route tables in both accounts. This permits creation and connection of Amazon EMR clusters from the Studio Classic account to the remote account’s private subnet.

    3. Set up the security group attached to your Studio Classic domain to allow outbound traffic and the security group of the Amazon EMR primary node to allow inbound TCP traffic from the Studio Classic instance security group.

  5. If you don't want to manually call the Amazon EMR connection command, install a SageMaker configuration file in your local Amazon EFS so you can select the role to use when you choose your Amazon EMR cluster. For details about how to preload your IAM roles, see Preload your execution roles into Studio Classic.
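The policies in steps 1 through 3 enable a two-call chain: the Studio Classic execution role assumes the Amazon EMR access role, and the access role then calls GetClusterSessionCredentials. The following sketch shows that chain with the AWS CLI so you can check the policies by hand. It assumes a recent AWS CLI version that includes the emr get-cluster-session-credentials command. The function name, the session name studio-emr-access, and all argument values are placeholders, and Studio Classic itself may perform these calls differently.

```bash
#!/bin/bash
# Sketch: assume the cross-account Amazon EMR access role, then request
# cluster session credentials for an Amazon EMR execution role.
# Arguments: access role ARN, cluster ID, execution role ARN.
get_cross_account_cluster_credentials() {
  local access_role_arn="$1" cluster_id="$2" execution_role_arn="$3"
  local creds

  # Call 1: assume the access role. With --output text, the three selected
  # credential fields are returned on one tab-separated line.
  creds="$(aws sts assume-role \
    --role-arn "$access_role_arn" \
    --role-session-name "studio-emr-access" \
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
    --output text)" || return 1

  # Call 2: call GetClusterSessionCredentials as the access role.
  # The subshell keeps the temporary credentials out of the caller's environment.
  (
    set -f          # no filename expansion while splitting the credential line
    set -- $creds   # split on whitespace into $1 $2 $3
    export AWS_ACCESS_KEY_ID="$1" AWS_SECRET_ACCESS_KEY="$2" AWS_SESSION_TOKEN="$3"
    aws emr get-cluster-session-credentials \
      --cluster-id "$cluster_id" \
      --execution-role-arn "$execution_role_arn"
  )
}
```

If the first call fails, check the permission policy from step 1 and the trust policy from step 2. If the second call fails, check the access role permission policy from step 3.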

Configure Lake Formation access

When you access data from data lakes managed by Amazon Lake Formation, you can enforce table-level and column-level access using policies attached to your runtime role. To configure permission for Lake Formation access, see Integrate Amazon EMR with Amazon Lake Formation.

Preload your execution roles into Studio Classic

If you don't want to manually call the Amazon EMR connection command, you can install a SageMaker configuration file in your local Amazon EFS so you can select the execution role to use when you choose your Amazon EMR cluster.

To write a configuration file for the Amazon EMR execution roles, associate a lifecycle configuration (LCC) with the Jupyter server application. For details, see Use lifecycle configurations with Amazon SageMaker Studio Classic. Alternatively, you can write or update the configuration file and restart the Jupyter server with the command: restart-jupyter-server.

The following snippet is an example LCC bash script you can apply if your Studio Classic application and cluster are in the same account:

#!/bin/bash

set -eux

FILE_DIRECTORY="/home/sagemaker-user/.sagemaker-analytics-configuration-DO_NOT_DELETE"
FILE_NAME="emr-configurations-DO_NOT_DELETE.json"
FILE="$FILE_DIRECTORY/$FILE_NAME"

mkdir -p "$FILE_DIRECTORY"

cat << 'EOF' > "$FILE"
{
    "emr-execution-role-arns":
    {
      "123456789012": [
          "arn:aws:iam::123456789012:role/emr-execution-role-1",
          "arn:aws:iam::123456789012:role/emr-execution-role-2"
      ]
    }
}
EOF
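If the list of roles changes often, the same file can be generated from arguments instead of a fixed heredoc. The following is a minimal sketch under that assumption. The function name write_emr_role_config and the CONFIG_DIR override variable are hypothetical and exist only for illustration. The default directory and file name are the ones used in the LCC script above.

```bash
#!/bin/bash
# Hypothetical helper: write the execution-role configuration file for one
# cluster account. Arguments: cluster account ID, then one or more role ARNs.
write_emr_role_config() {
  local account_id="$1" config_dir file sep="" arn
  shift
  # CONFIG_DIR is an illustrative override, useful for trying the function
  # outside of Studio Classic.
  config_dir="${CONFIG_DIR:-/home/sagemaker-user/.sagemaker-analytics-configuration-DO_NOT_DELETE}"
  file="$config_dir/emr-configurations-DO_NOT_DELETE.json"
  mkdir -p "$config_dir"
  {
    printf '{\n  "emr-execution-role-arns": {\n    "%s": [' "$account_id"
    for arn in "$@"; do
      # Emit a comma before every entry except the first.
      printf '%s\n      "%s"' "$sep" "$arn"
      sep=","
    done
    printf '\n    ]\n  }\n}\n'
  } > "$file"
}
```

A call such as `write_emr_role_config 123456789012 arn:aws:iam::123456789012:role/emr-execution-role-1 arn:aws:iam::123456789012:role/emr-execution-role-2` writes a file with the same structure as the heredoc version. The function does not escape special characters, which is adequate for IAM role ARNs but not for arbitrary strings.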

If your Studio Classic application and clusters are in different accounts, also specify the Amazon EMR access role to use for the cluster account. In the following example script, 123456789012 is the account ID of the Amazon EMR cluster account, and 212121212121 and 434343434343 are the IDs of the accounts that own the permitted Amazon EMR execution roles. The second file maps the cluster account ID to the ARN of the cross-account Amazon EMR access role.

#!/bin/bash

set -eux

FILE_DIRECTORY="/home/sagemaker-user/.sagemaker-analytics-configuration-DO_NOT_DELETE"
FILE_NAME="emr-configurations-DO_NOT_DELETE.json"
FILE="$FILE_DIRECTORY/$FILE_NAME"

mkdir -p "$FILE_DIRECTORY"

cat << 'EOF' > "$FILE"
{
    "emr-execution-role-arns":
    {
      "123456789012": [
          "arn:aws:iam::212121212121:role/emr-execution-role-1",
          "arn:aws:iam::434343434343:role/emr-execution-role-2"
      ]
    }
}
EOF

# add your cross-account EMR access role
FILE_DIRECTORY="/home/sagemaker-user/.cross-account-configuration-DO_NOT_DELETE"
FILE_NAME="emr-discovery-iam-role-arns-DO_NOT_DELETE.json"
FILE="$FILE_DIRECTORY/$FILE_NAME"

mkdir -p "$FILE_DIRECTORY"

cat << 'EOF' > "$FILE"
{
    "123456789012": "arn:aws:iam::123456789012:role/cross-account-emr-access-role"
}
EOF