Connect to an Amazon EMR cluster from your notebook - Amazon SageMaker
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Connect to an Amazon EMR cluster from your notebook

If you connect to an Amazon EMR cluster from your Jupyter notebook in Studio, you might need to perform additional setup. In particular, the following discussion addresses two issues:

  • Passing parameters into your Amazon EMR connection command. In SparkMagic kernels, parameters you pass to your Amazon EMR connection command may not work as expected due to differences in how Papermill passes parameters and how SparkMagic receives parameters. The workaround to address this limitation is to pass parameters as environment variables. For more details about the issue and workaround, see Pass parameters to your EMR connection command.

  • Passing user credentials to Kerberos, LDAP, or HTTP Basic Auth-authenticated Amazon EMR clusters. In interactive mode, Studio asks for credentials in a popup form where you can enter your sign-in credentials. In your noninteractive scheduled notebook, you have to pass them through the Amazon Secrets Manager. For more details about how to use the Amazon Secrets Manager in your scheduled notebook jobs, see Pass user credentials to your Kerberos, LDAP, or HTTP Basic Auth-authenticated Amazon EMR cluster.

Pass parameters to your EMR connection command

If you are using images with the SparkMagic PySpark and Spark kernels and want to parameterize your EMR connection command, provide your parameters in the Environment variables field instead of the Parameters field in the Create Job form (in the Additional Options dropdown menu). Make sure your EMR connection command in the Jupyter notebook passes these parameters as environment variables. For example, suppose you pass cluster-id as an environment variable when you create your job. Your EMR connection command should look like the following:

%%local import os
%sm_analytics emr connect —cluster-id {os.getenv('cluster_id')} --auth-type None

You need this workaround to meet requirements by SparkMagic and Papermill. For background context, the SparkMagic kernel expects that the %%local magic command accompany any local variables you define. However, Papermill does not pass the %%local magic command with your overrides. In order to work around this Papermill limitation, you must supply your parameters as environment variables in the Environment variables field.

Pass user credentials to your Kerberos, LDAP, or HTTP Basic Auth-authenticated Amazon EMR cluster

To establish a secure connection to an Amazon EMR cluster that uses Kerberos, LDAP, or HTTP Basic Auth authentication, you use the Amazon Secrets Manager to pass user credentials to your connection command. For information about how to create a Secrets Manager secret, see Create an Amazon Secrets Manager secret. Your secret must contain your username and password. You pass the secret with the --secrets argument, as shown in the following example:

%sm_analytics emr connect --cluster-id j_abcde12345 --auth Kerberos --secret aws_secret_id_123

Your administrator can set up a flexible access policy using an attribute-based-access-control (ABAC) method, which assigns access based on special tags. You can set up flexible access to create a single secret for all users in the account or a secret for each user. The following code samples demonstrate these scenarios:

Create a single secret for all users in the account

{ "Version" : "2012-10-17", "Statement" : [ { "Effect": "Allow", "Principal" : {"AWS" : "arn:aws:iam::AWS_ACCOUNT_ID:role/service-role/AmazonSageMaker-ExecutionRole-20190101T012345"}, "Action" : "secretsmanager:GetSecretValue", "Resource" : [ "arn:aws:secretsmanager:us-west-2:AWS_ACCOUNT_ID:secret:aes123-1a2b3c", "arn:aws:secretsmanager:us-west-2:AWS_ACCOUNT_ID:secret:aes456-4d5e6f", "arn:aws:secretsmanager:us-west-2:AWS_ACCOUNT_ID:secret:aes789-7g8h9i" ] } ] }

Create a different secret for each user

You can create a different secret for each user using the PrincipleTag tag, as shown in the following example:

{ "Version" : "2012-10-17", "Statement" : [ { "Effect": "Allow", "Principal" : {"AWS" : "arn:aws:iam::AWS_ACCOUNT_ID:role/service-role/AmazonSageMaker-ExecutionRole-20190101T012345"}, "Condition" : { "StringEquals" : { "aws:ResourceTag/user-identity": "${aws:PrincipalTag/user-identity}" } }, "Action" : "secretsmanager:GetSecretValue", "Resource" : [ "arn:aws:secretsmanager:us-west-2:AWS_ACCOUNT_ID:secret:aes123-1a2b3c", "arn:aws:secretsmanager:us-west-2:AWS_ACCOUNT_ID:secret:aes456-4d5e6f", "arn:aws:secretsmanager:us-west-2:AWS_ACCOUNT_ID:secret:aes789-7g8h9i" ] } ] }