Configure Zeppelin for Apache Ranger-enabled Amazon EMR clusters - Amazon EMR
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Configure Zeppelin for Apache Ranger-enabled Amazon EMR clusters

The topic covers how to configure Apache Zeppelin for an Apache Ranger-enabled Amazon EMR cluster so that you can use Zeppelin as a notebook for interactive data exploration. Zeppelin is included in Amazon EMR release versions 5.0.0 and later. Earlier release versions include Zeppelin as a sandbox application. For more information, see Amazon EMR 4.x release versions in the Amazon EMR Release Guide.

By default, Zeppelin is configured with a default login and password which is not secure in a multi-tenant environment.

To configure Zeppelin, complete the following steps.

  1. Modify the authentication mechanism.

    Modify the shiro.ini file to implement your preferred authentication mechanism. Zeppelin supports Active Directory, LDAP, PAM, and Knox SSO. See Apache Shiro authentication for Apache Zeppelin for more information.

  2. Configure Zeppelin to impersonate the end user

    When you allow Zeppelin to impersonate the end user, jobs submitted by Zeppelin can be run as that end user. Add the following configuration to core-site.xml:

    [ { "Classification": "core-site", "Properties": { "hadoop.proxyuser.zeppelin.hosts": "*", "hadoop.proxyuser.zeppelin.groups": "*" }, "Configurations": [ ] } ]

    Next, add the following configuration to hadoop-kms-site.xml located in /etc/hadoop/conf:

    [ { "Classification": "hadoop-kms-site", "Properties": { "hadoop.kms.proxyuser.zeppelin.hosts": "*", "hadoop.kms.proxyuser.zeppelin.groups": "*" }, "Configurations": [ ] } ]

    You can also add these configurations to your Amazon EMR cluster using the console by following the steps in Reconfigure an instance group in the console.

  3. Allow Zeppelin to sudo as the end user

    Create a file /etc/sudoers.d/90-zeppelin-user that contains the following:

    zeppelin ALL=(ALL) NOPASSWD:ALL
  4. Modify interpreters settings to run user jobs in their own processes.

    For all interpreters, configure them to instantiate the interpreters "Per User" in "isolated" processes.

    Amazon EMR and Apache Ranger architecture diagram.
  5. Modify zeppelin-env.sh

    Add the following to zeppelin-env.sh so that Zeppelin starts launch interpreters as the end user:

    ZEPPELIN_IMPERSONATE_USER=`echo ${ZEPPELIN_IMPERSONATE_USER} | cut -d @ -f1` export ZEPPELIN_IMPERSONATE_CMD='sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c'

    Add the following to zeppelin-env.sh to change the default notebook permissions to read-only to the creator only:

    export ZEPPELIN_NOTEBOOK_PUBLIC="false"

    Finally, add the following to zeppelin-env.sh to include the EMR RecordServer class path after the first CLASSPATH statement:

    export CLASSPATH="$CLASSPATH:/usr/share/aws/emr/record-server/lib/aws-emr-record-server-connector-common.jar:/usr/share/aws/emr/record-server/lib/aws-emr-record-server-spark-connector.jar:/usr/share/aws/emr/record-server/lib/aws-emr-record-server-client.jar:/usr/share/aws/emr/record-server/lib/aws-emr-record-server-common.jar:/usr/share/aws/emr/record-server/lib/jars/secret-agent-interface.jar"
  6. Restart Zeppelin.

    Run the following command to restart Zeppelin:

    sudo systemctl restart zeppelin