Configure Amazon EMR CloudFormation templates in the Service Catalog
This topic assumes administrators are familiar with Amazon CloudFormation, portfolios and products in Amazon Service Catalog, as well as Amazon EMR.
To simplify the creation of Amazon EMR clusters from Studio, administrators can register an Amazon EMR CloudFormation template as a product in an Amazon Service Catalog portfolio. To make the template available to data scientists, administrators must associate the portfolio with the SageMaker execution role used in Studio or Studio Classic. Finally, to allow users to discover templates, provision clusters, and connect to Amazon EMR clusters from Studio or Studio Classic, administrators need to set appropriate access permissions.
Amazon EMR CloudFormation templates can let end users customize various aspects of a cluster. For example, administrators can define an approved list of instance types that users can choose from when creating a cluster.
The following instructions use end-to-end CloudFormation stacks
Note
The GitHub repository aws-samples/sagemaker-studio-emr
The sagemaker-studio-emr/cloudformation/emr_servicecatalog_templates
Refer to Connect to an Amazon EMR cluster from SageMaker Studio or Studio Classic for details on the authentication methods you can use to connect to an Amazon EMR cluster.
To let data scientists discover Amazon EMR CloudFormation templates and provision clusters from Studio or Studio Classic, follow these steps.
Step 0: Check your networking and prepare your CloudFormation stack
Before you start:
- Ensure that you have reviewed the networking and security requirements in Configure networking.
- You must have an existing end-to-end CloudFormation stack that supports the authentication method of your choice. You can find examples of such CloudFormation templates in the aws-samples/sagemaker-studio-emr GitHub repository. The following steps highlight the specific configurations in your end-to-end stack to enable the use of Amazon EMR templates within Studio or Studio Classic.
Step 1: Associate your Service Catalog portfolio with SageMaker
In your Service Catalog portfolio, associate your portfolio ID with the SageMaker execution role accessing your cluster.
To do so, add the following section (here in YAML format) to your stack. This grants the SageMaker execution role access to the specified Service Catalog portfolio containing products like Amazon EMR templates. It allows roles assumed by SageMaker to launch those products.
Replace SageMakerExecutionRole.Arn and SageMakerStudioEMRProductPortfolio.ID with their actual values.
SageMakerStudioEMRProductPortfolioPrincipalAssociation:
  Type: AWS::ServiceCatalog::PortfolioPrincipalAssociation
  Properties:
    PrincipalARN: SageMakerExecutionRole.Arn
    PortfolioId: SageMakerStudioEMRProductPortfolio.ID
    PrincipalType: IAM
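As a rough illustration of the substitution described above, the following Python sketch assembles the same resource as a dictionary with both placeholders filled in. The function name, the role ARN, and the portfolio ID are hypothetical values chosen for this example; they do not come from any real stack.

```python
import json


def principal_association(role_arn: str, portfolio_id: str) -> dict:
    """Return the association resource with both placeholders filled in."""
    return {
        "SageMakerStudioEMRProductPortfolioPrincipalAssociation": {
            "Type": "AWS::ServiceCatalog::PortfolioPrincipalAssociation",
            "Properties": {
                "PrincipalARN": role_arn,
                "PortfolioId": portfolio_id,
                "PrincipalType": "IAM",
            },
        }
    }


# Hypothetical values, for illustration only.
resource = principal_association(
    "arn:aws:iam::111122223333:role/ExampleSageMakerExecutionRole",
    "port-exampleid12345",
)
print(json.dumps(resource, indent=2))
```

If the role and the portfolio are defined in the same stack, referencing them through CloudFormation intrinsic functions is likely preferable to hard-coding values as done here.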
Note
What execution role should you consider?
The Studio UI determines its permissions from the execution role associated with the user profile that launched it. The UI sets these permissions at the time of launch. However, the spaces that launch JupyterLab or Studio Classic applications can have separate permissions.
For consistent access to Amazon EMR templates and clusters across applications (such as the Studio UI, JupyterLab, and Studio Classic), grant the same subset of permissions to all roles at the domain, user profile, or space level. The permissions should allow discovering and provisioning Amazon EMR clusters.
For details on the required set of IAM permissions, see the permissions section.
Step 2: Reference an Amazon EMR template in a Service Catalog product
In a Service Catalog product of your portfolio, reference an Amazon EMR template resource and ensure its visibility in Studio or Studio Classic.
To do so, reference the Amazon EMR template resource in the Service Catalog product definition, and then add the tag key "sagemaker:studio-visibility:emr" set to the value "true" (see the example in YAML format).
In the Service Catalog product definition, the Amazon CloudFormation template of the cluster is referenced via URL. The additional tag set to true ensures the visibility of the Amazon EMR templates in Studio or Studio Classic.
Note
The Amazon EMR template referenced by the provided URL in the example does not enforce any authentication requirements when launched. This option is meant for demonstration and learning purposes. It is not recommended in a production environment.
SMStudioEMRNoAuthProduct:
  Type: AWS::ServiceCatalog::CloudFormationProduct
  Properties:
    Owner: AWS
    Name: SageMaker Studio Domain No Auth EMR
    ProvisioningArtifactParameters:
      - Name: SageMaker Studio Domain No Auth EMR
        Description: Provisions a SageMaker domain and No Auth EMR Cluster
        Info:
          LoadTemplateFromURL: Link to your CloudFormation template. For example, https://aws-blogs-artifacts-public.s3.amazonaws.com/artifacts/astra-m4-sagemaker/end-to-end/CFN-EMR-NoStudioNoAuthTemplate-v3.yaml
    Tags:
      - Key: "sagemaker:studio-visibility:emr"
        Value: "true"
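Before deploying a stack, it can be handy to check that a product definition carries the visibility tag. The following Python sketch does this on a dictionary that mirrors the Properties section of the product. The helper name is hypothetical, and the exact, case-sensitive string comparison is an assumption made for this example rather than documented Studio behavior.

```python
VISIBILITY_TAG_KEY = "sagemaker:studio-visibility:emr"


def is_visible_in_studio(product_properties: dict) -> bool:
    """Return True if the product carries the visibility tag set to "true"."""
    for tag in product_properties.get("Tags", []):
        if tag.get("Key") == VISIBILITY_TAG_KEY and tag.get("Value") == "true":
            return True
    return False


# Dictionaries mirroring the Properties of a Service Catalog product.
tagged = {
    "Name": "SageMaker Studio Domain No Auth EMR",
    "Tags": [{"Key": VISIBILITY_TAG_KEY, "Value": "true"}],
}
untagged = {"Name": "Some other product", "Tags": []}

print(is_visible_in_studio(tagged))
print(is_visible_in_studio(untagged))
```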
Step 3: Parameterize the Amazon EMR CloudFormation template
The CloudFormation template used to define the Amazon EMR cluster within the Service Catalog product allows administrators to specify configurable parameters. Administrators can define Default values and AllowedValues ranges for these parameters within the template's Parameters section. During the cluster launch process, data scientists can provide custom inputs or make selections from those predefined options to customize certain aspects of their Amazon EMR cluster.
The following example illustrates additional input parameters that administrators can set when creating an Amazon EMR template.
"Parameters": {
  "EmrClusterName": {
    "Type": "String",
    "Description": "EMR cluster Name."
  },
  "MasterInstanceType": {
    "Type": "String",
    "Description": "Instance type of the EMR master node.",
    "Default": "m5.xlarge",
    "AllowedValues": [
      "m5.xlarge",
      "m5.2xlarge",
      "m5.4xlarge"
    ]
  },
  "CoreInstanceType": {
    "Type": "String",
    "Description": "Instance type of the EMR core nodes.",
    "Default": "m5.xlarge",
    "AllowedValues": [
      "m5.xlarge",
      "m5.2xlarge",
      "m5.4xlarge",
      "m3.medium",
      "m3.large",
      "m3.xlarge",
      "m3.2xlarge"
    ]
  },
  "CoreInstanceCount": {
    "Type": "String",
    "Description": "Number of core instances in the EMR cluster.",
    "Default": "2",
    "AllowedValues": [
      "2",
      "5",
      "10"
    ]
  },
  "EmrReleaseVersion": {
    "Type": "String",
    "Description": "The release version of EMR to launch.",
    "Default": "emr-5.33.1",
    "AllowedValues": [
      "emr-5.33.1",
      "emr-6.4.0"
    ]
  }
}
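To make the role of Default and AllowedValues concrete, the following Python sketch resolves a set of user inputs against a Parameters section: a missing input falls back to the default, and a value outside the allowed list is rejected. It only approximates the validation that CloudFormation performs at launch; the function name is hypothetical, and the parameters are a subset of the example above.

```python
PARAMETERS = {
    "EmrClusterName": {"Type": "String", "Description": "EMR cluster Name."},
    "CoreInstanceCount": {
        "Type": "String",
        "Default": "2",
        "AllowedValues": ["2", "5", "10"],
    },
    "EmrReleaseVersion": {
        "Type": "String",
        "Default": "emr-5.33.1",
        "AllowedValues": ["emr-5.33.1", "emr-6.4.0"],
    },
}


def resolve_parameters(parameters: dict, user_inputs: dict) -> dict:
    """Apply defaults, then reject values that are not in AllowedValues."""
    resolved = {}
    for name, spec in parameters.items():
        if name in user_inputs:
            value = user_inputs[name]
        elif "Default" in spec:
            value = spec["Default"]
        else:
            raise ValueError(f"Parameter {name} has no default and needs a value")
        allowed = spec.get("AllowedValues")
        if allowed is not None and value not in allowed:
            raise ValueError(f"{value!r} is not an allowed value for {name}")
        resolved[name] = value
    return resolved


print(resolve_parameters(PARAMETERS, {"EmrClusterName": "demo", "CoreInstanceCount": "5"}))
```

With these inputs, EmrReleaseVersion falls back to its default, while a CoreInstanceCount of "3" would raise an error because it is not in the allowed list.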
After administrators have made the Amazon EMR CloudFormation templates available within Studio, data scientists can use them to self-provision Amazon EMR clusters. The Parameters section defined in the template translates into input fields on the cluster creation form within Studio or Studio Classic. For each parameter, data scientists can either enter a custom value into the input box or select from the predefined options listed in a dropdown menu, which corresponds to the AllowedValues specified in the template.
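The mapping from parameters to form fields can be pictured with a short Python sketch. This is not how Studio is implemented; it is a hypothetical model of the behavior described above, in which a parameter with AllowedValues becomes a dropdown and any other parameter becomes a free-text input. The field dictionary layout and the function name are invented for this example.

```python
def form_fields(parameters: dict) -> list:
    """Describe one form field per template parameter."""
    fields = []
    for name, spec in parameters.items():
        field = {
            "name": name,
            "label": spec.get("Description", name),
            "default": spec.get("Default"),
        }
        if "AllowedValues" in spec:
            # A closed list of values maps to a dropdown menu.
            field["kind"] = "dropdown"
            field["options"] = list(spec["AllowedValues"])
        else:
            # No restriction: the user types a custom value.
            field["kind"] = "text"
        fields.append(field)
    return fields


example = {
    "EmrClusterName": {"Type": "String", "Description": "EMR cluster Name."},
    "MasterInstanceType": {
        "Type": "String",
        "Description": "Instance type of the EMR master node.",
        "Default": "m5.xlarge",
        "AllowedValues": ["m5.xlarge", "m5.2xlarge", "m5.4xlarge"],
    },
}

for field in form_fields(example):
    print(field["name"], field["kind"])
```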
The following illustration shows the dynamic form assembled from a CloudFormation Amazon EMR template to create an Amazon EMR cluster in Studio or Studio Classic.
Visit Launch an Amazon EMR cluster from Studio or Studio Classic to learn about how to launch a cluster from Studio or Studio Classic using those Amazon EMR templates.
Step 4: Set up the permissions to enable listing and launching Amazon EMR clusters from Studio
Last, attach the required IAM permissions to enable listing existing running Amazon EMR clusters and self-provisioning new clusters from Studio or Studio Classic.
The roles to which you must add those permissions depend on whether Studio or Studio Classic and Amazon EMR are deployed in the same account (single account) or in different accounts (cross account).
Note
Studio does not currently support accessing Amazon EMR clusters created in a different Amazon account than the account in which Studio is deployed. Cross account access is available in Studio Classic only.
For more information on cross-account access using roles, see Cross account resource access in IAM.
If your Amazon EMR clusters and Studio or Studio Classic are deployed in the same Amazon account, attach the following permissions to the SageMaker execution role accessing your cluster.
Note
What execution role should you consider?
The Studio UI determines its permissions from the execution role associated with the user profile that launched it. The UI sets these permissions at the time of launch. However, the spaces that launch JupyterLab or Studio Classic applications can have separate permissions.
For consistent access to Amazon EMR templates and clusters across applications (such as the Studio UI, JupyterLab, and Studio Classic), grant the same subset of permissions to all roles at the domain, user profile, or space level. The permissions should allow discovering and provisioning Amazon EMR clusters.
- Find the execution role of your domain, user profile, or space. For information on how to retrieve the execution role, see Get your execution role.
- Open the IAM console at https://console.amazonaws.cn/sagemaker/.
- Choose Roles and then search for the role you created by typing in your role name in the Search field.
- Follow the link to your role.
- Choose Add permissions and then Create inline policy.
- In the JSON tab, add the following JSON policy with the permissions:
  - AllowPresignedUrl allows generating pre-signed URLs for accessing the Spark UI from within Studio or Studio Classic.
  - AllowClusterDiscovery and AllowClusterDetailsDiscovery allow listing and describing Amazon EMR clusters in the account/region from Studio or Studio Classic.
  - AllowEMRTemplateDiscovery allows searching for Amazon EMR templates in the Service Catalog. Studio and Studio Classic use this to show available templates.
  - AllowSagemakerProjectManagement allows creating and deleting projects. In SageMaker, access to the Amazon Service Catalog is managed through Automate MLOps with SageMaker Projects.
The IAM policy defined in the provided JSON grants those permissions. Replace studio-region and studio-account with your actual region and Amazon account ID values before copying the list of statements to the inline policy of your role.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPresignedUrl",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:CreatePersistentAppUI",
        "elasticmapreduce:DescribePersistentAppUI",
        "elasticmapreduce:GetPersistentAppUIPresignedURL",
        "elasticmapreduce:GetOnClusterAppUIPresignedURL"
      ],
      "Resource": [
        "arn:aws:elasticmapreduce:studio-region:studio-account:cluster/*"
      ]
    },
    {
      "Sid": "AllowClusterDetailsDiscovery",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:DescribeCluster",
        "elasticmapreduce:ListInstances",
        "elasticmapreduce:ListInstanceGroups",
        "elasticmapreduce:DescribeSecurityConfiguration"
      ],
      "Resource": [
        "arn:aws:elasticmapreduce:studio-region:studio-account:cluster/*"
      ]
    },
    {
      "Sid": "AllowClusterDiscovery",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:ListClusters"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowEMRTemplateDiscovery",
      "Effect": "Allow",
      "Action": [
        "servicecatalog:SearchProducts"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowSagemakerProjectManagement",
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateProject",
        "sagemaker:DeleteProject"
      ],
      "Resource": "arn:aws:sagemaker:studio-region:studio-account:project/*"
    }
  ]
}
- Choose Next and then provide a Policy name.
- Choose Create policy.
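Filling in the region and account placeholders by hand is error-prone, so the single-account policy can also be generated. The following Python sketch builds the same five statements from a region and an account ID. The function name and the sample values are hypothetical. The partition argument is an addition for this sketch: the policy above uses the aws partition, whereas ARNs for Regions in China use aws-cn.

```python
import json


def single_account_policy(region: str, account_id: str, partition: str = "aws") -> dict:
    """Build the inline policy with the region and account placeholders filled in."""
    cluster_arn = f"arn:{partition}:elasticmapreduce:{region}:{account_id}:cluster/*"
    project_arn = f"arn:{partition}:sagemaker:{region}:{account_id}:project/*"

    def statement(sid, actions, resource):
        return {"Sid": sid, "Effect": "Allow", "Action": actions, "Resource": resource}

    return {
        "Version": "2012-10-17",
        "Statement": [
            statement("AllowPresignedUrl", [
                "elasticmapreduce:CreatePersistentAppUI",
                "elasticmapreduce:DescribePersistentAppUI",
                "elasticmapreduce:GetPersistentAppUIPresignedURL",
                "elasticmapreduce:GetOnClusterAppUIPresignedURL",
            ], [cluster_arn]),
            statement("AllowClusterDetailsDiscovery", [
                "elasticmapreduce:DescribeCluster",
                "elasticmapreduce:ListInstances",
                "elasticmapreduce:ListInstanceGroups",
                "elasticmapreduce:DescribeSecurityConfiguration",
            ], [cluster_arn]),
            statement("AllowClusterDiscovery", ["elasticmapreduce:ListClusters"], "*"),
            statement("AllowEMRTemplateDiscovery", ["servicecatalog:SearchProducts"], "*"),
            statement("AllowSagemakerProjectManagement", [
                "sagemaker:CreateProject",
                "sagemaker:DeleteProject",
            ], project_arn),
        ],
    }


# Hypothetical region and account ID, for illustration only.
policy = single_account_policy("us-east-1", "111122223333")
print(json.dumps(policy, indent=2))
```

The printed JSON can then be pasted into the JSON tab of the inline policy editor.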
If your Amazon EMR clusters and Studio or Studio Classic are deployed in separate Amazon accounts, you configure the permissions on both accounts.
On the Amazon EMR account
On the account where Amazon EMR is deployed, also referred to as the trusting account, create a custom IAM role named ASSUMABLE-ROLE with the following configuration:
- Permissions: Grant the necessary permissions to ASSUMABLE-ROLE to allow accessing Amazon EMR resources.
- Trust relationship: Configure the trust policy for ASSUMABLE-ROLE to allow assuming the role from the Studio account that requires access.
By assuming the role, Studio or Studio Classic can gain temporary access to the permissions it needs in Amazon EMR.
- Create a new policy for the role.
  - Open the IAM console at https://console.amazonaws.cn/sagemaker/.
  - In the left menu, choose Policies and then Create policy.
  - In the JSON tab, add the following JSON policy with the permissions:
    - AllowPresignedUrl allows generating pre-signed URLs for accessing the Spark UI from within Studio.
    - AllowClusterDiscovery and AllowClusterDetailsDiscovery allow listing and describing Amazon EMR clusters in the account/region from Studio.
    Replace emr-region and emr-account with your actual region and Amazon account ID values before copying the JSON to your policy.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPresignedUrl",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:CreatePersistentAppUI",
        "elasticmapreduce:DescribePersistentAppUI",
        "elasticmapreduce:GetPersistentAppUIPresignedURL",
        "elasticmapreduce:GetOnClusterAppUIPresignedURL"
      ],
      "Resource": [
        "arn:aws:elasticmapreduce:emr-region:emr-account:cluster/*"
      ]
    },
    {
      "Sid": "AllowClusterDetailsDiscovery",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:DescribeCluster",
        "elasticmapreduce:ListInstances",
        "elasticmapreduce:ListInstanceGroups",
        "elasticmapreduce:DescribeSecurityConfiguration"
      ],
      "Resource": [
        "arn:aws:elasticmapreduce:emr-region:emr-account:cluster/*"
      ]
    },
    {
      "Sid": "AllowClusterDiscovery",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:ListClusters"
      ],
      "Resource": "*"
    }
  ]
}
  - Name your policy and choose Create policy.
- Create a custom IAM role named ASSUMABLE-ROLE, and then attach your new policy to the role.
  - In the IAM console, choose Roles in the left menu, and then Create role.
  - For Trusted entity type, choose Amazon account and then Next.
  - Select the permissions policy you just created and then choose Next.
  - Name your role ASSUMABLE-ROLE and then choose the Edit button on the right of Step 1: Select trusted entities.
  - For Trusted entity type, choose Custom trust policy and then paste the following trust relationship. This grants the account where Studio is deployed (the trusted account) the permission to assume this role.
    Replace studio-account with its actual Amazon account ID, and then choose Next.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::studio-account:root"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
  - Find and select the permissions policy you just created again and then choose Next.
  - Your trust policy should be updated with the latest JSON you pasted. Choose Create role.
For more information about creating a role on an Amazon account, see Creating an IAM role (console).
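The trust relationship for ASSUMABLE-ROLE can be generated in the same spirit. The following Python sketch fills in the studio-account placeholder and rejects anything that is not a 12-digit account ID, a common source of malformed principals. The function name and the sample account ID are hypothetical.

```python
import json


def trust_policy(studio_account_id: str) -> dict:
    """Build the trust policy that lets the Studio account assume the role."""
    if not (studio_account_id.isdigit() and len(studio_account_id) == 12):
        raise ValueError("An account ID is expected to be a 12-digit string")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{studio_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Hypothetical Studio account ID, for illustration only.
print(json.dumps(trust_policy("111122223333"), indent=2))
```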
On the Studio account
On the account where Studio or Studio Classic is deployed, also referred to as the trusted account, update the SageMaker execution role accessing your cluster with the required permissions to access resources in the trusting account.
Note
What execution role should you consider?
The Studio UI determines its permissions from the execution role associated with the user profile that launched it. The UI sets these permissions at the time of launch. However, the spaces that launch JupyterLab or Studio Classic applications can have separate permissions.
For consistent access to Amazon EMR templates and clusters across applications (such as the Studio UI, JupyterLab, and Studio Classic), grant the same subset of permissions to all roles at the domain, user profile, or space level. The permissions should allow discovering and provisioning Amazon EMR clusters.
- Find the execution role of your domain, user profile, or space. For information on how to retrieve the execution role, see Get your execution role.
- Open the IAM console at https://console.amazonaws.cn/sagemaker/.
- Choose Roles and then search for the role you created by typing in your role name in the Search field.
- Follow the link to your role.
- Choose Add permissions and then Create inline policy.
- In the JSON tab, add the following JSON policy with the permissions:
  - AllowEMRTemplateDiscovery allows searching for Amazon EMR templates in the Service Catalog. Studio Classic uses this to show available templates.
  - AllowSagemakerProjectManagement allows creating and deleting projects. In SageMaker, access to the Amazon Service Catalog is managed through Automate MLOps with SageMaker Projects.
The IAM policy defined in the provided JSON grants those permissions. Replace studio-region and studio-account with your actual region and Amazon account ID values before copying the list of statements to your policy.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowEMRTemplateDiscovery",
      "Effect": "Allow",
      "Action": [
        "servicecatalog:SearchProducts"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowSagemakerProjectManagement",
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateProject",
        "sagemaker:DeleteProject"
      ],
      "Resource": "arn:aws:sagemaker:studio-region:studio-account:project/*"
    }
  ]
}
- Choose Next and then provide a Policy name.
- Choose Create policy.
- Repeat the step to add another inline policy to the Studio execution role. The policy should allow cross-account role assumption for discovering resources in another account.
  On your execution role detail page, choose Add permissions and then Create inline policy.
  - In the JSON tab, add the following JSON policy. Update the emr-account with the account ID of the Amazon EMR account.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowRoleAssumptionForCrossAccountDiscovery",
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": [
        "arn:aws:iam::emr-account:role/ASSUMABLE-ROLE"
      ]
    }
  ]
}
  - Choose Next, provide a Policy name, and then choose Create policy.
- To allow listing Amazon EMR clusters deployed in the same account as Studio, add an additional inline policy to your Studio execution role as defined in the Single account tab of Configure listing Amazon EMR clusters.
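The role-assumption policy must point at exactly the role created in the Amazon EMR account, so deriving both the ARN and the policy from the same inputs helps keep them in sync. The following Python sketch does that. The function names and the sample account ID are hypothetical; the role name defaults to the ASSUMABLE-ROLE name used in this topic.

```python
import json


def assumable_role_arn(emr_account_id: str, role_name: str = "ASSUMABLE-ROLE") -> str:
    """Return the ARN of the cross-account role in the Amazon EMR account."""
    return f"arn:aws:iam::{emr_account_id}:role/{role_name}"


def assume_role_policy(emr_account_id: str, role_name: str = "ASSUMABLE-ROLE") -> dict:
    """Build the inline policy that lets the execution role assume that role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowRoleAssumptionForCrossAccountDiscovery",
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": [assumable_role_arn(emr_account_id, role_name)],
            }
        ],
    }


# Hypothetical Amazon EMR account ID, for illustration only.
print(json.dumps(assume_role_policy("444455556666"), indent=2))
```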
Pass the role's ARN at the Jupyter server launch
Last, see Additional configuration for cross-account access to learn about how to provide the ARN of the ASSUMABLE-ROLE to your Studio execution role. The ARN is loaded by the Jupyter server at launch. The execution role used by Studio assumes that cross-account role to discover and connect to Amazon EMR clusters in the trusting account.
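To check the cross-account setup end to end, one option is to reproduce the role assumption yourself. The following Python sketch is not Studio's implementation. It shows what assuming the role and listing clusters in the trusting account could look like with the STS assume_role and EMR list_clusters calls. The function name and session name are hypothetical, and the clients are passed in as arguments so that the sketch can be exercised without credentials.

```python
def list_running_clusters(sts_client, make_emr_client, role_arn,
                          session_name="studio-emr-discovery"):
    """Assume the cross-account role, then list active cluster IDs."""
    # Exchange the caller's identity for temporary credentials on the role.
    credentials = sts_client.assume_role(
        RoleArn=role_arn, RoleSessionName=session_name
    )["Credentials"]

    # Build an EMR client that acts as the assumed role.
    emr = make_emr_client(
        aws_access_key_id=credentials["AccessKeyId"],
        aws_secret_access_key=credentials["SecretAccessKey"],
        aws_session_token=credentials["SessionToken"],
    )

    # Page through the results using the Marker token.
    cluster_ids = []
    kwargs = {"ClusterStates": ["RUNNING", "WAITING"]}
    while True:
        page = emr.list_clusters(**kwargs)
        cluster_ids.extend(cluster["Id"] for cluster in page["Clusters"])
        marker = page.get("Marker")
        if not marker:
            return cluster_ids
        kwargs["Marker"] = marker


# Possible usage with boto3 (assumed to be installed and configured, and run
# with the credentials of the Studio execution role):
#
#   import boto3
#   ids = list_running_clusters(
#       boto3.client("sts"),
#       lambda **creds: boto3.client("emr", region_name="emr-region", **creds),
#       "arn:aws:iam::emr-account:role/ASSUMABLE-ROLE",
#   )
```

If the call to assume_role fails with an access error, the trust policy on ASSUMABLE-ROLE or the role-assumption policy on the execution role is the first place to look.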