Apache Hive plugin for Ranger integration with Amazon EMR
Apache Hive is a popular execution engine within the Hadoop ecosystem. Amazon EMR provides an Apache Ranger plugin that enables fine-grained access controls for Hive. The plugin is compatible with open source Apache Ranger Admin server version 2.0 and later.
Supported features
The Apache Ranger plugin for Hive on EMR supports all the functionality of the
open source plugin, including database, table, and column level access controls,
row filtering, and data masking. For a table of Hive commands and associated
Ranger permissions, see Hive commands to Ranger permission mapping.
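
As a rough illustration of column level access control, the sketch below builds a policy document in the general shape used by the open source Ranger Admin public REST API (policies are typically posted to /service/public/v2/api/policy). The service name amazonemrhive, the database, table, column, and user names are all hypothetical, and the snippet only constructs the JSON; it does not contact a Ranger Admin server. Verify field names against your Ranger Admin version before relying on it.

```python
import json


def build_select_policy(service, database, table, columns, users):
    """Build a Ranger-style access policy granting SELECT on specific columns.

    Assumption: field names follow the open source RangerPolicy JSON model.
    """
    return {
        "service": service,
        "name": f"{database}.{table}-select",
        "isEnabled": True,
        # Each resource level lists the values the policy applies to.
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": list(columns)},
        },
        # One policy item: the listed users may run SELECT on the columns.
        "policyItems": [
            {
                "users": list(users),
                "accesses": [{"type": "select", "isAllowed": True}],
            }
        ],
    }


# Hypothetical names, for illustration only.
policy = build_select_policy(
    service="amazonemrhive",
    database="sales",
    table="orders",
    columns=["order_id", "order_date"],
    users=["analyst1"],
)
print(json.dumps(policy, indent=2))
```

Row filtering and data masking use separate policy types in Ranger and are not covered by this sketch.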
Installation of service configuration
The Apache Hive plugin is compatible with the existing Hive service definition within Apache Hive Hadoop SQL.

If you do not have an instance of the service under Hadoop SQL, you can create one. Click the + next to Hadoop SQL.
- Service Name (If displayed): Enter the service name. The suggested value is amazonemrhive. Make a note of this service name; it's needed when creating an EMR security configuration.
- Display Name: Enter the name to be displayed for the service. The suggested value is amazonemrhive.

The Apache Hive Config Properties are used to establish a connection from your Apache Ranger Admin server to a HiveServer2 instance, which enables auto complete when creating policies. If you do not have a persistent HiveServer2 process, the properties below do not need to be accurate and can be filled with any information.
- Username: Enter a user name for the JDBC connection to a HiveServer2 instance.
- Password: Enter the password for the user name above.
- jdbc.driver.ClassName: Enter the class name of the JDBC driver class for Apache Hive connectivity. The default value can be used.
- jdbc.url: Enter the JDBC connection string to use when connecting to HiveServer2.
- Common Name for Certificate: The CN field within the certificate used to connect to the admin server from a client plugin. This value must match the CN field in your TLS certificate that was created for the plugin.
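
The form fields above can also be thought of as a service document. The sketch below assembles one in the general shape used by the open source Ranger Admin public REST API (services are typically created at /service/public/v2/api/service). This is an assumption-laden illustration: the config key spellings (for example jdbc.driverClassName, which differs slightly from the label shown in the form), the displayName field, and every host, user, and certificate name are assumptions to verify against your Ranger Admin version. Nothing is sent over the network.

```python
import json


def build_hive_service(name, jdbc_url, username, password, common_name):
    """Assemble a Ranger-style service document for a Hive (Hadoop SQL) service.

    Assumption: keys mirror the open source Hive service definition.
    """
    return {
        "name": name,
        "displayName": name,  # assumed to be honored only by newer Ranger versions
        "type": "hive",
        "isEnabled": True,
        "configs": {
            "username": username,
            "password": password,
            # Default JDBC driver class shipped with Apache Hive.
            "jdbc.driverClassName": "org.apache.hive.jdbc.HiveDriver",
            "jdbc.url": jdbc_url,
            # Must match the CN of the TLS certificate created for the plugin.
            "commonNameForCertificate": common_name,
        },
    }


# Hypothetical values; placeholders are acceptable when no persistent
# HiveServer2 process exists.
service = build_hive_service(
    name="amazonemrhive",
    jdbc_url="jdbc:hive2://hs2.example.internal:10000",
    username="hive",
    password="placeholder",
    common_name="emr-hive-plugin.example.internal",
)
print(json.dumps(service, indent=2))
```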

The Test Connection button tests whether the values above can be used to successfully connect to the HiveServer2 instance. Once the service is successfully created, it is listed under Hadoop SQL in the Service Manager.

Considerations
Hive metadata server
The Hive metadata server can only be accessed by trusted engines, specifically
Hive and emr_record_server, to protect against unauthorized access.
The Hive metadata server is also accessed by all nodes on the cluster. The
required port 9083 provides all nodes access to the main node.
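
When troubleshooting, it can help to confirm that a node can open a TCP connection to the main node on port 9083. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, and a successful TCP connection only shows that the port is reachable, not that the caller is a trusted engine.

```python
import socket


def can_reach_metastore(host, port=9083, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, calling can_reach_metastore with the private DNS name of the main node from a core node should return True when the security groups allow the traffic.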
Authentication
By default, Apache Hive authenticates using Kerberos, as specified in the EMR
security configuration. HiveServer2 can also be configured to authenticate
users using LDAP. See Implementing LDAP authentication for Hive on a
multi-tenant Amazon EMR cluster.
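
With Kerberos authentication, JDBC clients such as Beeline usually pass the HiveServer2 service principal in the connection string. The sketch below shows how such a URL is commonly assembled; the helper name, host name, and realm are hypothetical, and port 10000 is assumed as the usual HiveServer2 default.

```python
def build_hs2_jdbc_url(host, port=10000, database="default", principal=None):
    """Build a HiveServer2 JDBC URL, optionally with a Kerberos principal."""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if principal:
        # Session variables are appended after the database, separated by ';'.
        url += f";principal={principal}"
    return url


# Hypothetical host and realm, for illustration only.
kerberos_url = build_hs2_jdbc_url(
    "main-node.example.internal",
    principal="hive/_HOST@EXAMPLE.COM",
)
print(kerberos_url)
```

The resulting string can be passed to Beeline with its -u option after obtaining a Kerberos ticket.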
Limitations
The following are current limitations for the Apache Hive plugin on Amazon EMR 5.x:
- Hive roles are not currently supported. GRANT and REVOKE statements are not supported.
- Hive CLI is not supported. JDBC/Beeline is the only authorized way to connect to Hive.
- The hive.server2.builtin.udf.blacklist configuration should be populated with UDFs that you deem unsafe.
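
One way to set hive.server2.builtin.udf.blacklist is through an Amazon EMR configuration classification supplied at cluster creation. The sketch below generates such a configuration for the hive-site classification. The UDF names (reflect, reflect2, java_method) are illustrative choices, not a recommendation from this guide; decide for yourself which functions are unsafe in your environment.

```python
import json


def udf_blacklist_configuration(udfs):
    """Build an EMR configuration list that blacklists built-in Hive UDFs."""
    return [
        {
            "Classification": "hive-site",
            "Properties": {
                # Hive expects a comma-separated list of function names.
                "hive.server2.builtin.udf.blacklist": ",".join(udfs),
            },
        }
    ]


# Illustrative UDF names only.
configuration = udf_blacklist_configuration(["reflect", "reflect2", "java_method"])
print(json.dumps(configuration, indent=2))
```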