Tutorial: Configure a cross-realm trust with an Active Directory domain - Amazon EMR
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Tutorial: Configure a cross-realm trust with an Active Directory domain

When you set up a cross-realm trust, you allow principals (usually users) from a different Kerberos realm to authenticate to application components on the EMR cluster. The cluster-dedicated key distribution center (KDC) establishes a trust relationship with another KDC using a cross-realm principal that exists in both KDCs. The principal name and the password match precisely.

A cross-realm trust requires that the KDCs can reach one another over the network and resolve each other's domain names. Steps for establishing a cross-realm trust relationship with a Microsoft AD domain controller running as an EC2 instance are provided below, along with an example network setup that provides the required connectivity and domain-name resolution. Any network setup that allows the required network traffic between KDCs is acceptable.

Optionally, after you establish a cross-realm trust with Active Directory using a KDC on one cluster, you can create another cluster using a different security configuration to reference the KDC on the first cluster as an external KDC. For an example security configuration and cluster set up, see External cluster KDC with Active Directory cross-realm trust.

For more information on Amazon EMR support for Kerberos and KDC, as well as links to MIT Kerberos Documentation, see Use Kerberos for authentication with Amazon EMR.

Important

Amazon EMR does not support cross-realm trusts with Amazon Directory Service for Microsoft Active Directory.

Step 1: Set up the VPC and subnet

Step 2: Launch and install the Active Directory domain controller

Step 3: Add accounts to the domain for the EMR Cluster

Step 4: Configure an incoming trust on the Active Directory domain controller

Step 5: Use a DHCP option set to specify the Active Directory domain controller as a VPC DNS server

Step 6: Launch a Kerberized EMR Cluster

Step 7: Create HDFS users and set permissions on the cluster for Active Directory accounts

Step 1: Set up the VPC and subnet

The following steps demonstrate creating a VPC and subnet so that the cluster-dedicated KDC can reach the Active Directory domain controller and resolve its domain name. In these steps, domain-name resolution is provided by referencing the Active Directory domain controller as the domain name server in the DHCP option set. For more information, see Step 5: Use a DHCP option set to specify the Active Directory domain controller as a VPC DNS server.

The KDC and the Active Directory domain controller must be able to resolve one other's domain names. This allows Amazon EMR to join computers to the domain and automatically configure corresponding Linux accounts and SSH parameters on cluster instances.

If Amazon EMR can't resolve the domain name, you can reference the trust using the Active Directory domain controller's IP address. However, you must manually add Linux accounts, add corresponding principals to the cluster-dedicated KDC, and configure SSH.

To set up the VPC and subnet
  1. Create an Amazon VPC with a single public subnet. For more information, see Step 1: Create the VPC in the Amazon VPC Getting Started Guide.

    Important

    When you use a Microsoft Active Directory domain controller, choose a CIDR block for the EMR cluster so that all IPv4 addresses are fewer than nine characters in length (for example, 10.0.0.0/16). This is because the DNS names of cluster computers are used when the computers join the Active Directory directory. Amazon assigns DNS hostnames based on IPv4 address in a way that longer IP addresses may result in DNS names longer than 15 characters. Active Directory has a 15-character limit for registering joined computer names, and truncates longer names, which can cause unpredictable errors.

  2. Remove the default DHCP option set assigned to the VPC. For more information, see Changing a VPC to use No DHCP options. Later on, you add a new one that specifies the Active Directory domain controller as the DNS server.

  3. Confirm that DNS support is enabled for the VPC, that is, that DNS Hostnames and DNS Resolution are both enabled. They are enabled by default. For more information, see Updating DNS support for your VPC.

  4. Confirm that your VPC has an internet gateway attached, which is the default. For more information, see Creating and attaching an internet gateway.

    Note

    An internet gateway is used in this example because you are establishing a new domain controller for the VPC. An internet gateway may not be required for your application. The only requirement is that the cluster-dedicated KDC can access the Active Directory domain controller.

  5. Create a custom route table, add a route that targets the Internet Gateway, and then attach it to your subnet. For more information, see Create a custom route table.

  6. When you launch the EC2 instance for the domain controller, it must have a static public IPv4 address for you to connect to it using RDP. The easiest way to do this is to configure your subnet to auto-assign public IPv4 addresses. This is not the default setting when a subnet is created. For more information, see Modifying the public IPv4 addressing attribute of your subnet. Optionally, you can assign the address when you launch the instance. For more information, see Assigning a public IPv4 address during instance launch.

  7. When you finish, make a note of your VPC and subnet IDs. You use them later when you launch the Active Directory domain controller and the cluster.

Step 2: Launch and install the Active Directory domain controller

  1. Launch an EC2 instance based on the Microsoft Windows Server 2016 Base AMI. We recommend an m4.xlarge or better instance type. For more information, see Launching an Amazon Web Services Marketplace instance in the Amazon EC2 User Guide.

  2. Make a note of the Group ID of the security group associated with the EC2 instance. You need it for Step 6: Launch a Kerberized EMR Cluster. We use sg-012xrlmdomain345. Alternatively, you can specify different security groups for the EMR cluster and this instance that allows traffic between them. For more information, see Amazon EC2 security groups for Linux instances in the Amazon EC2 User Guide.

  3. Connect to the EC2 instance using RDP. For more information, see Connecting to your Windows instance in the Amazon EC2 User Guide.

  4. Start Server Manager to install and configure the Active Directory domain Services role on the server. Promote the server to a domain controller and assign a domain name (the example we use here is ad.domain.com). Make a note of the domain name because you need it later when you create the EMR security configuration and cluster. If you are new to setting up Active Directory, you can follow the instructions in How to set up Active Directory (AD) in Windows Server 2016.

    The instance restarts when you finish.

Step 3: Add accounts to the domain for the EMR Cluster

RDP to the Active Directory domain controller to create accounts in Active Directory Users and Computers for each cluster user. For more information, see Create a User Account in Active Directory Users and Computers on the Microsoft Learn site. Make a note of each user's User logon name. You need these later when you configure the cluster.

In addition, create a account with sufficient privileges to join computers to the domain. You specify this account when you create a cluster. Amazon EMR uses it to join cluster instances to the domain. You specify this account and its password in Step 6: Launch a Kerberized EMR Cluster. To delegate computer join privileges to the account, we recommend that you create a group with join privileges and then assign the user to the group. For instructions, see Delegating directory join privileges in the Amazon Directory Service Administration Guide.

Step 4: Configure an incoming trust on the Active Directory domain controller

The example commands below create a trust in Active Directory, which is a one-way, incoming, non-transitive, realm trust with the cluster-dedicated KDC. The example we use for the cluster's realm is EC2.INTERNAL. Replace the KDC-FQDN with the Public DNS name listed for the Amazon EMR primary node hosting the KDC. The passwordt parameter specifies the cross-realm principal password, which you specify along with the cluster realm when you create a cluster. The realm name is derived from the default domain name in us-east-1 for the cluster. The Domain is the Active Directory domain in which you are creating the trust, which is lower case by convention. The example uses ad.domain.com

Open the Windows command prompt with administrator privileges and type the following commands to create the trust relationship on the Active Directory domain controller:

C:\Users\Administrator> ksetup /addkdc EC2.INTERNAL KDC-FQDN C:\Users\Administrator> netdom trust EC2.INTERNAL /Domain:ad.domain.com /add /realm /passwordt:MyVeryStrongPassword C:\Users\Administrator> ksetup /SetEncTypeAttr EC2.INTERNAL AES256-CTS-HMAC-SHA1-96

Step 5: Use a DHCP option set to specify the Active Directory domain controller as a VPC DNS server

Now that the Active Directory domain controller is configured, you must configure the VPC to use it as a domain name server for name resolution within your VPC. To do this, attach a DHCP options set. Specify the Domain name as the domain name of your cluster - for example, ec2.internal if your cluster is in us-east-1 or region.compute.internal for other regions. For Domain name servers, you must specify the IP address of the Active Directory domain controller (which must be reachable from the cluster) as the first entry, followed by AmazonProvidedDNS (for example, xx.xx.xx.xx,AmazonProvidedDNS). For more information, see Changing DHCP option sets.

Step 6: Launch a Kerberized EMR Cluster

  1. In Amazon EMR, create a security configuration that specifies the Active Directory domain controller you created in the previous steps. An example command is shown below. Replace the domain, ad.domain.com, with the name of the domain you specified in Step 2: Launch and install the Active Directory domain controller.

    aws emr create-security-configuration --name MyKerberosConfig \ --security-configuration '{ "AuthenticationConfiguration": { "KerberosConfiguration": { "Provider": "ClusterDedicatedKdc", "ClusterDedicatedKdcConfiguration": { "TicketLifetimeInHours": 24, "CrossRealmTrustConfiguration": { "Realm": "AD.DOMAIN.COM", "Domain": "ad.domain.com", "AdminServer": "ad.domain.com", "KdcServer": "ad.domain.com" } } } } }'
  2. Create the cluster with the following attributes:

    • Use the --security-configuration option to specify the security configuration that you created. We use MyKerberosConfig in the example.

    • Use the SubnetId property of the --ec2-attributes option to specify the subnet that you created in Step 1: Set up the VPC and subnet. We use step1-subnet in the example.

    • Use the AdditionalMasterSecurityGroups and AdditionalSlaveSecurityGroups of the --ec2-attributes option to specify that the security group associated with the AD domain controller from Step 2: Launch and install the Active Directory domain controller is associated with the cluster primary node as well as core and task nodes. We use sg-012xrlmdomain345 in the example.

    Use --kerberos-attributes to specify the following cluster-specific Kerberos attributes:

    The following example launches a Kerberized cluster.

    aws emr create-cluster --name "MyKerberosCluster" \ --release-label emr-5.10.0 \ --instance-type m5.xlarge \ --instance-count 3 \ --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2KeyPair,\ SubnetId=step1-subnet, AdditionalMasterSecurityGroups=sg-012xrlmdomain345, AdditionalSlaveSecurityGroups=sg-012xrlmdomain345\ --service-role EMR_DefaultRole \ --security-configuration MyKerberosConfig \ --applications Name=Hadoop Name=Hive Name=Oozie Name=Hue Name=HCatalog Name=Spark \ --kerberos-attributes Realm=EC2.INTERNAL,\ KdcAdminPassword=MyClusterKDCAdminPwd,\ ADDomainJoinUser=ADUserLogonName,ADDomainJoinPassword=ADUserPassword,\ CrossRealmTrustPrincipalPassword=MatchADTrustPwd

Step 7: Create HDFS users and set permissions on the cluster for Active Directory accounts

When setting up a trust relationship with Active Directory, Amazon EMR creates Linux users on the cluster for each Active Directory account. For example, the user logon name LiJuan in Active Directory has a Linux account of lijuan. Active Directory user names can contain upper-case letters, but Linux does not honor Active Directory casing.

To allow your users to log in to the cluster to run Hadoop jobs, you must add HDFS user directories for their Linux accounts, and grant each user ownership of their directory. To do this, we recommend that you run a script saved to Amazon S3 as a cluster step. Alternatively, you can run the commands in the script below from the command line on the primary node. Use the EC2 key pair that you specified when you created the cluster to connect to the primary node over SSH as the Hadoop user. For more information, see Use an EC2 key pair for SSH credentials for Amazon EMR.

Run the following command to add a step to the cluster that runs a script, AddHDFSUsers.sh.

aws emr add-steps --cluster-id <j-2AL4XXXXXX5T9> \ --steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\ Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://amzn-s3-demo-bucket/AddHDFSUsers.sh"]

The contents of the file AddHDFSUsers.sh is as follows.

#!/bin/bash # AddHDFSUsers.sh script # Initialize an array of user names from AD or Linux users and KDC principals created manually on the cluster ADUSERS=("lijuan" "marymajor" "richardroe" "myusername") # For each user listed, create an HDFS user directory # and change ownership to the user for username in ${ADUSERS[@]}; do hdfs dfs -mkdir /user/$username hdfs dfs -chown $username:$username /user/$username done

Active Directory groups mapped to Hadoop groups

Amazon EMR uses System Security Services Daemon (SSD) to map Active Directory groups to Hadoop groups. To confirm group mappings, after you log in to the primary node as described in Using SSH to connect to Kerberized clusters with Amazon EMR, you can use the hdfs groups command to confirm that Active Directory groups to which your Active Directory account belongs have been mapped to Hadoop groups for the corresponding Hadoop user on the cluster. You can also check other users' group mappings by specifying one or more user names with the command, for example hdfs groups lijuan. For more information, see groups in the Apache HDFS Commands Guide.