Configure Zeppelin for Apache Ranger-enabled Amazon EMR clusters
The topic covers how to configure Apache
Zeppelin
By default, Zeppelin is configured with a default login and password which is not secure in a multi-tenant environment.
To configure Zeppelin, complete the following steps.
-
Modify the authentication mechanism.
Modify the
shiro.ini
file to implement your preferred authentication mechanism. Zeppelin supports Active Directory, LDAP, PAM, and Knox SSO. See Apache Shiro authentication for Apache Zeppelinfor more information. -
Configure Zeppelin to impersonate the end user
When you allow Zeppelin to impersonate the end user, jobs submitted by Zeppelin can be run as that end user. Add the following configuration to
core-site.xml
:[ { "Classification": "core-site", "Properties": { "hadoop.proxyuser.zeppelin.hosts": "*", "hadoop.proxyuser.zeppelin.groups": "*" }, "Configurations": [ ] } ]
Next, add the following configuration to
hadoop-kms-site.xml
located in/etc/hadoop/conf
:[ { "Classification": "hadoop-kms-site", "Properties": { "hadoop.kms.proxyuser.zeppelin.hosts": "*", "hadoop.kms.proxyuser.zeppelin.groups": "*" }, "Configurations": [ ] } ]
You can also add these configurations to your Amazon EMR cluster using the console by following the steps in Reconfigure an instance group in the console.
-
Allow Zeppelin to sudo as the end user
Create a file
/etc/sudoers.d/90-zeppelin-user
that contains the following:zeppelin ALL=(ALL) NOPASSWD:ALL
-
Modify interpreters settings to run user jobs in their own processes.
For all interpreters, configure them to instantiate the interpreters "Per User" in "isolated" processes.
-
Modify
zeppelin-env.sh
Add the following to
zeppelin-env.sh
so that Zeppelin starts launch interpreters as the end user:ZEPPELIN_IMPERSONATE_USER=`echo ${ZEPPELIN_IMPERSONATE_USER} | cut -d @ -f1` export ZEPPELIN_IMPERSONATE_CMD='sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c'
Add the following to
zeppelin-env.sh
to change the default notebook permissions to read-only to the creator only:export ZEPPELIN_NOTEBOOK_PUBLIC="false"
Finally, add the following to
zeppelin-env.sh
to include the EMR RecordServer class path after the firstCLASSPATH
statement:export CLASSPATH="$CLASSPATH:/usr/share/aws/emr/record-server/lib/aws-emr-record-server-connector-common.jar:/usr/share/aws/emr/record-server/lib/aws-emr-record-server-spark-connector.jar:/usr/share/aws/emr/record-server/lib/aws-emr-record-server-client.jar:/usr/share/aws/emr/record-server/lib/aws-emr-record-server-common.jar:/usr/share/aws/emr/record-server/lib/jars/secret-agent-interface.jar"
-
Restart Zeppelin.
Run the following command to restart Zeppelin:
sudo systemctl restart zeppelin