Configure Amazon EMR CloudFormation templates in the Service Catalog - Amazon SageMaker
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).


This topic assumes administrators are familiar with Amazon CloudFormation, portfolios and products in Amazon Service Catalog, as well as Amazon EMR.

To simplify the creation of Amazon EMR clusters from Studio, administrators can register an Amazon EMR CloudFormation template as a product in an Amazon Service Catalog portfolio. To make the template available to data scientists, administrators must associate the portfolio with the SageMaker execution role used in Studio or Studio Classic. Finally, to allow users to discover templates, provision clusters, and connect to Amazon EMR clusters from Studio or Studio Classic, administrators need to set the appropriate access permissions.

Amazon EMR CloudFormation templates can let end users customize various aspects of the cluster. For example, administrators can define an approved list of instance types that users can choose from when creating a cluster.

The following instructions use end-to-end CloudFormation stacks to set up a Studio or Studio Classic domain, a user profile, and a Service Catalog portfolio, and to populate the portfolio with an Amazon EMR launch template. The following steps highlight the specific settings that administrators must apply in their end-to-end stack to enable Studio or Studio Classic to access Service Catalog products and provision Amazon EMR clusters.

Note

The GitHub repository aws-samples/sagemaker-studio-emr contains example end-to-end CloudFormation stacks. These stacks deploy the necessary IAM roles, networking, SageMaker domain, user profile, and Service Catalog portfolio, and they add an Amazon EMR launch CloudFormation template. The templates provide different authentication options between Studio or Studio Classic and the Amazon EMR cluster. In these example templates, the parent CloudFormation stack passes the SageMaker VPC, security group, and subnet parameters to the Amazon EMR cluster template.

The sagemaker-studio-emr/cloudformation/emr_servicecatalog_templates directory of that repository contains various sample Amazon EMR CloudFormation launch templates, including options for single-account and cross-account deployments.

Refer to Connect to an Amazon EMR cluster from SageMaker Studio or Studio Classic for details on the authentication methods you can use to connect to an Amazon EMR cluster.

To let data scientists discover Amazon EMR CloudFormation templates and provision clusters from Studio or Studio Classic, follow these steps.

Step 0: Check your networking and prepare your CloudFormation stack

Before you start:

  • Ensure that you have reviewed the networking and security requirements in Configure networking.

  • You must have an existing end-to-end CloudFormation stack that supports the authentication method of your choice. You can find examples of such CloudFormation templates in the aws-samples/sagemaker-studio-emr GitHub repository. The following steps highlight the specific configurations in your end-to-end stack to enable the use of Amazon EMR templates within Studio or Studio Classic.

Step 1: Associate your Service Catalog portfolio with SageMaker

In your Service Catalog portfolio, associate your portfolio ID with the SageMaker execution role accessing your cluster.

To do so, add the following section (here in YAML format) to your stack. This grants the SageMaker execution role access to the specified Service Catalog portfolio containing products like Amazon EMR templates. It allows roles assumed by SageMaker to launch those products.

Replace SageMakerExecutionRole.Arn and SageMakerStudioEMRProductPortfolio.ID with their actual values.

SageMakerStudioEMRProductPortfolioPrincipalAssociation:
  Type: AWS::ServiceCatalog::PortfolioPrincipalAssociation
  Properties:
    PrincipalARN: SageMakerExecutionRole.Arn
    PortfolioId: SageMakerStudioEMRProductPortfolio.ID
    PrincipalType: IAM

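
If you generate this resource programmatically rather than writing the YAML by hand, the following minimal Python sketch builds the same resource as a JSON-serializable dict, since CloudFormation accepts JSON as well as YAML templates. The helper name portfolio_principal_association and the example ARN and ID are hypothetical. The check that portfolio IDs start with "port-" is also an assumption; drop it if your IDs look different.

```python
import json
import re

# Hypothetical helper: the PortfolioPrincipalAssociation resource shown above,
# with the placeholders replaced by real values.
ROLE_ARN_PATTERN = re.compile(r"arn:aws[a-z-]*:iam::\d{12}:role/.+")


def portfolio_principal_association(role_arn: str, portfolio_id: str) -> dict:
    """Return a CloudFormation resource associating role_arn with portfolio_id."""
    if not ROLE_ARN_PATTERN.fullmatch(role_arn):
        raise ValueError(f"not an IAM role ARN: {role_arn!r}")
    # Assumption: Service Catalog portfolio IDs use the "port-" prefix.
    if not portfolio_id.startswith("port-"):
        raise ValueError(f"unexpected portfolio ID format: {portfolio_id!r}")
    return {
        "SageMakerStudioEMRProductPortfolioPrincipalAssociation": {
            "Type": "AWS::ServiceCatalog::PortfolioPrincipalAssociation",
            "Properties": {
                "PrincipalARN": role_arn,
                "PortfolioId": portfolio_id,
                "PrincipalType": "IAM",
            },
        }
    }


if __name__ == "__main__":
    resource = portfolio_principal_association(
        "arn:aws:iam::111122223333:role/MySageMakerExecutionRole",  # placeholder
        "port-abcdefgh12345",  # placeholder
    )
    print(json.dumps(resource, indent=2))
```
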
Note

What execution role should you consider?

The Studio UI determines its permissions from the execution role associated with the user profile that launched it. The UI sets these permissions at the time of launch. However, the spaces that launch JupyterLab or Studio Classic applications can have separate permissions.

For consistent access to Amazon EMR templates and clusters across applications (such as the Studio UI, JupyterLab, and Studio Classic), grant the same subset of permissions to all roles at the domain, user profile, or space level. The permissions should allow discovering and provisioning Amazon EMR clusters.

For details on the required set of IAM permissions, see Step 4 in this topic.

Step 2: Reference an Amazon EMR template in a Service Catalog product

In a Service Catalog product of your portfolio, reference an Amazon EMR template resource and ensure its visibility in Studio or Studio Classic.

To do so, reference the Amazon EMR template resource in the Service Catalog product definition. Then add the tag key "sagemaker:studio-visibility:emr" with the value "true", as in the following YAML example.

In the Service Catalog product definition, the CloudFormation template of the cluster is referenced by URL. Setting the tag to "true" makes the Amazon EMR template visible in Studio or Studio Classic.

Note

The Amazon EMR template referenced by the provided URL in the example does not enforce any authentication requirements when launched. This option is meant for demonstration and learning purposes. It is not recommended in a production environment.

SMStudioEMRNoAuthProduct:
  Type: AWS::ServiceCatalog::CloudFormationProduct
  Properties:
    Owner: AWS
    Name: SageMaker Studio Domain No Auth EMR
    ProvisioningArtifactParameters:
      - Name: SageMaker Studio Domain No Auth EMR
        Description: Provisions a SageMaker domain and No Auth EMR Cluster
        Info:
          # Link to your CloudFormation template, for example:
          LoadTemplateFromURL: https://aws-blogs-artifacts-public.s3.amazonaws.com/artifacts/astra-m4-sagemaker/end-to-end/CFN-EMR-NoStudioNoAuthTemplate-v3.yaml
    Tags:
      - Key: "sagemaker:studio-visibility:emr"
        Value: "true"
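
This topic does not describe how Studio discovers tagged products internally. The following Python sketch only restates the tagging rule above so you can check product definitions before you deploy them. The function name is hypothetical, and the sketch assumes an exact match on both the key and the value "true".

```python
# Hypothetical pre-deployment check of the visibility rule described above.
VISIBILITY_TAG_KEY = "sagemaker:studio-visibility:emr"


def is_visible_in_studio(product_properties: dict) -> bool:
    """True if the product's Tags include the Studio visibility tag set to "true"."""
    for tag in product_properties.get("Tags", []):
        # Exact string comparison is an assumption of this sketch.
        if tag.get("Key") == VISIBILITY_TAG_KEY and tag.get("Value") == "true":
            return True
    return False


if __name__ == "__main__":
    product = {
        "Name": "SageMaker Studio Domain No Auth EMR",
        "Tags": [{"Key": "sagemaker:studio-visibility:emr", "Value": "true"}],
    }
    print(is_visible_in_studio(product))  # True
```
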

Step 3: Parameterize the Amazon EMR CloudFormation template

The CloudFormation template that defines the Amazon EMR cluster within the Service Catalog product lets administrators specify configurable parameters. In the template's Parameters section, administrators can define a Default value and a list of AllowedValues for each parameter. When launching a cluster, data scientists can provide custom inputs or select from those predefined options to customize certain aspects of their Amazon EMR cluster.

The following example illustrates additional input parameters that administrators can set when creating an Amazon EMR template.

"Parameters": {
  "EmrClusterName": {
    "Type": "String",
    "Description": "EMR cluster Name."
  },
  "MasterInstanceType": {
    "Type": "String",
    "Description": "Instance type of the EMR master node.",
    "Default": "m5.xlarge",
    "AllowedValues": ["m5.xlarge", "m5.2xlarge", "m5.4xlarge"]
  },
  "CoreInstanceType": {
    "Type": "String",
    "Description": "Instance type of the EMR core nodes.",
    "Default": "m5.xlarge",
    "AllowedValues": ["m5.xlarge", "m5.2xlarge", "m5.4xlarge", "m3.medium", "m3.large", "m3.xlarge", "m3.2xlarge"]
  },
  "CoreInstanceCount": {
    "Type": "String",
    "Description": "Number of core instances in the EMR cluster.",
    "Default": "2",
    "AllowedValues": ["2", "5", "10"]
  },
  "EmrReleaseVersion": {
    "Type": "String",
    "Description": "The release version of EMR to launch.",
    "Default": "emr-5.33.1",
    "AllowedValues": ["emr-5.33.1", "emr-6.4.0"]
  }
}

After administrators make the Amazon EMR CloudFormation templates available within Studio, data scientists can use them to self-provision Amazon EMR clusters. The Parameters section of the template translates into input fields on the cluster creation form in Studio or Studio Classic. For each parameter, data scientists either enter a value in an input box or select one of the predefined options from a dropdown menu. Those options correspond to the AllowedValues specified in the template.
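
When a stack is launched, CloudFormation applies the following rules. An omitted parameter falls back to its Default. A parameter without a Default must be supplied. A supplied value must appear in AllowedValues if that list is set. Unknown parameter names are rejected. The following Python sketch mirrors those rules so you can check a set of inputs against your template offline. The function resolve_parameters is hypothetical, and PARAMETERS copies only a subset of the example above.

```python
# Subset of the Parameters example above (CoreInstanceType omitted for brevity).
PARAMETERS = {
    "EmrClusterName": {"Type": "String", "Description": "EMR cluster Name."},
    "MasterInstanceType": {"Type": "String", "Default": "m5.xlarge",
                           "AllowedValues": ["m5.xlarge", "m5.2xlarge", "m5.4xlarge"]},
    "CoreInstanceCount": {"Type": "String", "Default": "2",
                          "AllowedValues": ["2", "5", "10"]},
    "EmrReleaseVersion": {"Type": "String", "Default": "emr-5.33.1",
                          "AllowedValues": ["emr-5.33.1", "emr-6.4.0"]},
}


def resolve_parameters(declared: dict, supplied: dict) -> dict:
    """Apply Default/AllowedValues rules the way CloudFormation does at launch."""
    unknown = set(supplied) - set(declared)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    resolved = {}
    for name, spec in declared.items():
        if name in supplied:
            value = supplied[name]
        elif "Default" in spec:
            value = spec["Default"]  # omitted input falls back to Default
        else:
            raise ValueError(f"missing value for {name}")
        allowed = spec.get("AllowedValues")
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {allowed}")
        resolved[name] = value
    return resolved


if __name__ == "__main__":
    print(resolve_parameters(PARAMETERS, {"EmrClusterName": "demo",
                                          "EmrReleaseVersion": "emr-6.4.0"}))
```
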

The following illustration shows the dynamic form assembled from a CloudFormation Amazon EMR template to create an Amazon EMR cluster in Studio or Studio Classic.

Figure: Dynamic form assembled from a CloudFormation Amazon EMR template to create an Amazon EMR cluster from Studio or Studio Classic.
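
To make the mapping from template parameters to form fields concrete, the following Python sketch derives one field description per parameter. It assumes that parameters with AllowedValues render as dropdowns and the others as free-text boxes. It illustrates the description above and is not Studio's implementation. The function name and field keys are hypothetical.

```python
# Two parameters from the example template.
SAMPLE_PARAMETERS = {
    "EmrClusterName": {"Type": "String", "Description": "EMR cluster Name."},
    "CoreInstanceCount": {"Type": "String",
                          "Description": "Number of core instances in the EMR cluster.",
                          "Default": "2", "AllowedValues": ["2", "5", "10"]},
}


def form_fields(parameters: dict) -> list:
    """Describe one form field per template parameter (hypothetical shape)."""
    fields = []
    for name, spec in parameters.items():
        options = list(spec.get("AllowedValues", []))
        fields.append({
            "name": name,
            "label": spec.get("Description", name),
            # Assumption: AllowedValues -> dropdown, otherwise a free-text box.
            "widget": "dropdown" if options else "text",
            "options": options,
            "default": spec.get("Default"),
        })
    return fields


if __name__ == "__main__":
    for field in form_fields(SAMPLE_PARAMETERS):
        print(field["name"], field["widget"], field["options"])
    # EmrClusterName text []
    # CoreInstanceCount dropdown ['2', '5', '10']
```
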

See Launch an Amazon EMR cluster from Studio or Studio Classic to learn how to launch a cluster from Studio or Studio Classic using these Amazon EMR templates.

Step 4: Set up the permissions to enable listing and launching Amazon EMR clusters from Studio

Finally, attach the IAM permissions required to list running Amazon EMR clusters and to self-provision new clusters from Studio or Studio Classic.

The roles to which you must add these permissions depend on whether Studio or Studio Classic and Amazon EMR are deployed in the same account (choose Single account) or in different accounts (choose Cross account).

Note

Studio does not currently support accessing Amazon EMR clusters created in a different Amazon account from the one in which Studio is deployed. Cross-account access is available only in Studio Classic.

For more information on cross-account access using roles, see Cross account resource access in IAM.
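
The following Python sketch encodes the choice between the Single account and Cross account setups described above, including the restriction that cross-account access works only with Studio Classic. The function name and return strings are hypothetical.

```python
def permissions_setup(studio_account: str, emr_account: str, app: str) -> str:
    """Return "single-account" or "cross-account" for the given deployment."""
    if app not in ("Studio", "Studio Classic"):
        raise ValueError(f"unknown application: {app!r}")
    if studio_account == emr_account:
        return "single-account"
    # Cross-account access to Amazon EMR clusters is available only in Studio Classic.
    if app == "Studio":
        raise ValueError("Studio cannot access Amazon EMR clusters in another account")
    return "cross-account"


if __name__ == "__main__":
    print(permissions_setup("111122223333", "444455556666", "Studio Classic"))  # cross-account
```
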

If your Amazon EMR clusters and Studio or Studio Classic are deployed in the same Amazon account, attach the following permissions to the SageMaker execution role accessing your cluster.

  1. Find the execution role of your domain, user profile, or space. For information on how to retrieve the execution role, see Get your execution role.

  2. Open the IAM console at https://console.amazonaws.cn/iam/.

  3. Choose Roles, and then search for the execution role from step 1 by typing its name in the Search field.

  4. Follow the link to your role.

  5. Choose Add permissions and then Create inline policy.

  6. In the JSON tab, add the following JSON policy with the permissions:

    • AllowPresignedUrl allows generating pre-signed URLs for accessing the Spark UI from within Studio or Studio Classic.

    • AllowClusterDiscovery and AllowClusterDetailsDiscovery allow listing and describing Amazon EMR clusters in the account/region from Studio or Studio Classic.

    • AllowEMRTemplateDiscovery allows searching for Amazon EMR templates in the Service Catalog. Studio and Studio Classic use this to show available templates.

    • AllowSagemakerProjectManagement allows creating and deleting SageMaker projects. In SageMaker, access to Amazon Service Catalog is managed through SageMaker Projects (see Automate MLOps with SageMaker Projects).

    The IAM policy defined in the provided JSON grants those permissions. Replace studio-region and studio-account with your actual region and Amazon account ID values before copying the list of statements to the inline policy of your role.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "AllowPresignedUrl",
          "Effect": "Allow",
          "Action": [
            "elasticmapreduce:CreatePersistentAppUI",
            "elasticmapreduce:DescribePersistentAppUI",
            "elasticmapreduce:GetPersistentAppUIPresignedURL",
            "elasticmapreduce:GetOnClusterAppUIPresignedURL"
          ],
          "Resource": [
            "arn:aws:elasticmapreduce:studio-region:studio-account:cluster/*"
          ]
        },
        {
          "Sid": "AllowClusterDetailsDiscovery",
          "Effect": "Allow",
          "Action": [
            "elasticmapreduce:DescribeCluster",
            "elasticmapreduce:ListInstances",
            "elasticmapreduce:ListInstanceGroups",
            "elasticmapreduce:DescribeSecurityConfiguration"
          ],
          "Resource": [
            "arn:aws:elasticmapreduce:studio-region:studio-account:cluster/*"
          ]
        },
        {
          "Sid": "AllowClusterDiscovery",
          "Effect": "Allow",
          "Action": [
            "elasticmapreduce:ListClusters"
          ],
          "Resource": "*"
        },
        {
          "Sid": "AllowEMRTemplateDiscovery",
          "Effect": "Allow",
          "Action": [
            "servicecatalog:SearchProducts"
          ],
          "Resource": "*"
        },
        {
          "Sid": "AllowSagemakerProjectManagement",
          "Effect": "Allow",
          "Action": [
            "sagemaker:CreateProject",
            "sagemaker:DeleteProject"
          ],
          "Resource": "arn:aws:sagemaker:studio-region:studio-account:project/*"
        }
      ]
    }

  7. Choose Next and then provide a Policy name.

  8. Choose Create policy.
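
To see which statement of the single-account policy grants which action, and why it does not cover clusters in another account, the following Python sketch runs a deliberately simplified, Allow-only check over the policy with the placeholders filled in. It ignores Deny statements, NotAction/NotResource, conditions, and the rest of real IAM policy evaluation. For an authoritative answer, use the IAM policy simulator. The function names, region, and account IDs are hypothetical.

```python
import re


def single_account_policy(region: str, account: str) -> dict:
    """The single-account inline policy above, with placeholders filled in."""
    cluster_arn = f"arn:aws:elasticmapreduce:{region}:{account}:cluster/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "AllowPresignedUrl", "Effect": "Allow",
             "Action": ["elasticmapreduce:CreatePersistentAppUI",
                        "elasticmapreduce:DescribePersistentAppUI",
                        "elasticmapreduce:GetPersistentAppUIPresignedURL",
                        "elasticmapreduce:GetOnClusterAppUIPresignedURL"],
             "Resource": [cluster_arn]},
            {"Sid": "AllowClusterDetailsDiscovery", "Effect": "Allow",
             "Action": ["elasticmapreduce:DescribeCluster",
                        "elasticmapreduce:ListInstances",
                        "elasticmapreduce:ListInstanceGroups",
                        "elasticmapreduce:DescribeSecurityConfiguration"],
             "Resource": [cluster_arn]},
            {"Sid": "AllowClusterDiscovery", "Effect": "Allow",
             "Action": ["elasticmapreduce:ListClusters"], "Resource": "*"},
            {"Sid": "AllowEMRTemplateDiscovery", "Effect": "Allow",
             "Action": ["servicecatalog:SearchProducts"], "Resource": "*"},
            {"Sid": "AllowSagemakerProjectManagement", "Effect": "Allow",
             "Action": ["sagemaker:CreateProject", "sagemaker:DeleteProject"],
             "Resource": f"arn:aws:sagemaker:{region}:{account}:project/*"},
        ],
    }


def _as_list(value):
    return [value] if isinstance(value, str) else list(value)


def _wildcard_match(pattern: str, value: str) -> bool:
    # IAM-style wildcards: "*" matches any sequence, "?" matches one character.
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern)
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def allowing_sid(policy: dict, action: str, resource: str):
    """Return the Sid of the first Allow statement covering action on resource, else None."""
    for stmt in policy["Statement"]:
        if stmt.get("Effect") != "Allow":
            continue
        # Action names are compared case-insensitively; resource ARNs are not.
        if not any(_wildcard_match(a.lower(), action.lower()) for a in _as_list(stmt["Action"])):
            continue
        if any(_wildcard_match(r, resource) for r in _as_list(stmt["Resource"])):
            return stmt.get("Sid")
    return None


if __name__ == "__main__":
    policy = single_account_policy("us-east-1", "111122223333")  # hypothetical values
    cluster = "arn:aws:elasticmapreduce:us-east-1:111122223333:cluster/j-EXAMPLE"
    print(allowing_sid(policy, "elasticmapreduce:DescribeCluster", cluster))
    # AllowClusterDetailsDiscovery
```

A cluster ARN from a different account matches none of the statements. That is why the cross-account setup relies on role assumption rather than on this policy alone.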

If your Amazon EMR clusters and Studio or Studio Classic are deployed in separate Amazon accounts, you must configure permissions in both accounts.

On the Amazon EMR account

On the account where Amazon EMR is deployed, also referred to as the trusting account, create a custom IAM role named ASSUMABLE-ROLE with the following configuration:

  • Permissions: Grant the necessary permissions to ASSUMABLE-ROLE to allow accessing Amazon EMR resources.

  • Trust relationship: Configure the trust policy for ASSUMABLE-ROLE to allow assuming the role from the Studio account that requires access.

By assuming the role, Studio or Studio Classic can gain temporary access to the permissions it needs in Amazon EMR.

  • Create a new policy for the role.

    1. Open the IAM console at https://console.amazonaws.cn/iam/.

    2. In the left menu, choose Policies and then Create policy.

    3. In the JSON tab, add the following JSON policy with the permissions:

      • AllowPresignedUrl allows generating pre-signed URLs for accessing the Spark UI from within Studio.

      • AllowClusterDiscovery and AllowClusterDetailsDiscovery allow listing and describing Amazon EMR clusters in the account/region from Studio.

      Replace emr-region and emr-account with your actual region and Amazon account ID values before copying the JSON to your policy.

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "AllowPresignedUrl",
            "Effect": "Allow",
            "Action": [
              "elasticmapreduce:CreatePersistentAppUI",
              "elasticmapreduce:DescribePersistentAppUI",
              "elasticmapreduce:GetPersistentAppUIPresignedURL",
              "elasticmapreduce:GetOnClusterAppUIPresignedURL"
            ],
            "Resource": [
              "arn:aws:elasticmapreduce:emr-region:emr-account:cluster/*"
            ]
          },
          {
            "Sid": "AllowClusterDetailsDiscovery",
            "Effect": "Allow",
            "Action": [
              "elasticmapreduce:DescribeCluster",
              "elasticmapreduce:ListInstances",
              "elasticmapreduce:ListInstanceGroups",
              "elasticmapreduce:DescribeSecurityConfiguration"
            ],
            "Resource": [
              "arn:aws:elasticmapreduce:emr-region:emr-account:cluster/*"
            ]
          },
          {
            "Sid": "AllowClusterDiscovery",
            "Effect": "Allow",
            "Action": [
              "elasticmapreduce:ListClusters"
            ],
            "Resource": "*"
          }
        ]
      }

    4. Name your policy and choose Create policy.

  • Create a custom IAM role named ASSUMABLE-ROLE, and then attach your new policy to the role.

    1. In the IAM console, choose Roles in the left menu, and then Create role.

    2. For Trusted entity type, choose Amazon account and then Next.

    3. Select the policy you just created and then choose Next.

    4. Name your role ASSUMABLE-ROLE and then choose the Edit button on the right of Step 1: Select trusted entities.

    5. For Trusted entity type, choose Custom trust policy and then paste the following trust relationship. This grants the account where Studio is deployed (the trusted account) the permission to assume this role.

      Replace studio-account with its actual Amazon account ID. Choose Next.

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "AWS": "arn:aws:iam::studio-account:root"
            },
            "Action": "sts:AssumeRole"
          }
        ]
      }

    6. Find and select the policy you just created again and then choose Next.

    7. Your trust policy should be updated with the latest JSON you pasted. Choose Create role.
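
The trust policy above names the Studio account's root principal. That trusts the whole Studio account, but IAM still requires the individual role in that account to have its own identity policy allowing sts:AssumeRole on ASSUMABLE-ROLE, which the Studio account section configures. The following Python sketch builds the trust policy for a given Studio account ID and reads back which accounts it trusts. Both function names and the account ID are hypothetical.

```python
import re


def assumable_role_trust_policy(studio_account: str) -> dict:
    """Trust policy for ASSUMABLE-ROLE, with studio-account filled in."""
    if not re.fullmatch(r"\d{12}", studio_account):
        raise ValueError(f"expected a 12-digit account ID, got {studio_account!r}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{studio_account}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def trusted_accounts(trust_policy: dict) -> set:
    """Account IDs whose :root principal may call sts:AssumeRole under this policy."""
    accounts = set()
    for stmt in trust_policy["Statement"]:
        actions = stmt.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        principal_block = stmt.get("Principal", {})
        if (stmt.get("Effect") != "Allow" or "sts:AssumeRole" not in actions
                or not isinstance(principal_block, dict)):
            continue
        principals = principal_block.get("AWS", [])
        principals = [principals] if isinstance(principals, str) else principals
        for principal in principals:
            match = re.fullmatch(r"arn:aws[a-z-]*:iam::(\d{12}):root", principal)
            if match:
                accounts.add(match.group(1))
    return accounts


if __name__ == "__main__":
    policy = assumable_role_trust_policy("111122223333")  # hypothetical Studio account
    print(trusted_accounts(policy))  # {'111122223333'}
```
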

For more information about creating a role in an Amazon account, see Creating an IAM role (console).

On the Studio account

On the account where Studio or Studio Classic is deployed, also referred to as the trusted account, update the SageMaker execution role accessing your cluster with the required permissions to access resources in the trusting account.

  1. Find the execution role of your domain, user profile, or space. For information on how to retrieve the execution role, see Get your execution role.

  2. Open the IAM console at https://console.amazonaws.cn/iam/.

  3. Choose Roles, and then search for the execution role from step 1 by typing its name in the Search field.

  4. Follow the link to your role.

  5. Choose Add permissions and then Create inline policy.

  6. In the JSON tab, add the following JSON policy with the permissions:

    • AllowEMRTemplateDiscovery allows searching for Amazon EMR templates in the Service Catalog. Studio Classic uses this to show available templates.

    • AllowSagemakerProjectManagement allows creating and deleting SageMaker projects. In SageMaker, access to Amazon Service Catalog is managed through SageMaker Projects (see Automate MLOps with SageMaker Projects).

    The IAM policy defined in the provided JSON grants those permissions. Replace studio-region and studio-account with your actual region and Amazon account ID values before copying the list of statements to your policy.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "AllowEMRTemplateDiscovery",
          "Effect": "Allow",
          "Action": [
            "servicecatalog:SearchProducts"
          ],
          "Resource": "*"
        },
        {
          "Sid": "AllowSagemakerProjectManagement",
          "Effect": "Allow",
          "Action": [
            "sagemaker:CreateProject",
            "sagemaker:DeleteProject"
          ],
          "Resource": "arn:aws:sagemaker:studio-region:studio-account:project/*"
        }
      ]
    }

  7. Choose Next and then provide a Policy name.

  8. Choose Create policy.

  9. Repeat these steps to add a second inline policy to the Studio execution role. This policy allows cross-account role assumption for discovering resources in another account.

    On your execution role detail page, choose Add permissions and then Create inline policy.

  10. In the JSON tab, add the following JSON policy. Replace emr-account with the account ID of the Amazon EMR account.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "AllowRoleAssumptionForCrossAccountDiscovery",
          "Effect": "Allow",
          "Action": "sts:AssumeRole",
          "Resource": [
            "arn:aws:iam::emr-account:role/ASSUMABLE-ROLE"
          ]
        }
      ]
    }

  11. Choose Next, provide a Policy name, and then choose Create policy.

  12. To allow listing Amazon EMR clusters deployed in the same account as Studio, add an additional inline policy to your Studio execution role as defined in the Single account tab of Configure listing Amazon EMR clusters.
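
Cross-account discovery works only when both halves agree. The Studio execution role must allow sts:AssumeRole on the ASSUMABLE-ROLE ARN in the Amazon EMR account, and that role's trust policy must trust the Studio account. The following Python sketch checks both halves by exact string comparison; it does not expand wildcards or evaluate conditions. The function name and account IDs are hypothetical.

```python
def _as_list(value):
    return [value] if isinstance(value, str) else list(value)


def _principals(stmt: dict) -> list:
    block = stmt.get("Principal", {})
    return _as_list(block.get("AWS", [])) if isinstance(block, dict) else []


def cross_account_setup_ok(studio_account: str, emr_account: str, studio_policy: dict,
                           trust_policy: dict, role_name: str = "ASSUMABLE-ROLE") -> bool:
    """True if the Studio-side inline policy and the EMR-side trust policy line up."""
    role_arn = f"arn:aws:iam::{emr_account}:role/{role_name}"
    studio_root = f"arn:aws:iam::{studio_account}:root"
    studio_side = any(
        s.get("Effect") == "Allow"
        and "sts:AssumeRole" in _as_list(s.get("Action", []))
        and role_arn in _as_list(s.get("Resource", []))
        for s in studio_policy["Statement"]
    )
    emr_side = any(
        s.get("Effect") == "Allow"
        and "sts:AssumeRole" in _as_list(s.get("Action", []))
        and studio_root in _principals(s)
        for s in trust_policy["Statement"]
    )
    return studio_side and emr_side


if __name__ == "__main__":
    studio, emr = "111122223333", "444455556666"  # hypothetical account IDs
    studio_policy = {"Statement": [{"Effect": "Allow", "Action": "sts:AssumeRole",
                                    "Resource": [f"arn:aws:iam::{emr}:role/ASSUMABLE-ROLE"]}]}
    trust_policy = {"Statement": [{"Effect": "Allow", "Action": "sts:AssumeRole",
                                   "Principal": {"AWS": f"arn:aws:iam::{studio}:root"}}]}
    print(cross_account_setup_ok(studio, emr, studio_policy, trust_policy))  # True
```
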

Pass the role's ARN at the Jupyter server launch

Finally, see Additional configuration for cross-account access to learn how to provide the ARN of the ASSUMABLE-ROLE to your Studio execution role. The Jupyter server loads the ARN at launch. The execution role used by Studio assumes that cross-account role to discover and connect to Amazon EMR clusters in the trusting account.
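
This topic does not describe how the Jupyter server uses the ARN internally. The following Python sketch shows the general pattern behind it: assume the cross-account role with AWS STS, then call Amazon EMR ListClusters with the temporary credentials. The function name, the default session name, and the choice of active cluster states are assumptions. Clients are passed in so that the sketch runs without AWS access. With boto3, you could pass boto3.client("sts") as sts_client and a function that calls boto3.client("emr", aws_access_key_id=..., aws_secret_access_key=..., aws_session_token=...) with the returned credentials as make_emr_client.

```python
from typing import Callable, Dict, List


def list_clusters_in_trusting_account(
    sts_client,
    make_emr_client: Callable[[Dict[str, str]], object],
    assumable_role_arn: str,
    session_name: str = "studio-emr-discovery",  # hypothetical session name
) -> List[Dict[str, str]]:
    """Assume the cross-account role, then list active EMR clusters with its credentials."""
    response = sts_client.assume_role(RoleArn=assumable_role_arn, RoleSessionName=session_name)
    emr = make_emr_client(response["Credentials"])  # temporary credentials
    clusters, marker = [], None
    while True:
        # Assumption: only clusters in active states are of interest here.
        kwargs = {"ClusterStates": ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"]}
        if marker:
            kwargs["Marker"] = marker
        page = emr.list_clusters(**kwargs)
        clusters.extend({"Id": c["Id"], "Name": c["Name"]} for c in page.get("Clusters", []))
        marker = page.get("Marker")  # ListClusters paginates with Marker
        if not marker:
            return clusters
```
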