Connect to the Amazon EMR cluster primary node using SSH
Secure Shell (SSH) is a network protocol you can use to create a secure connection to a remote computer. After you make a connection, the terminal on your local computer behaves as if it is running on the remote computer. Commands you issue locally run on the remote computer, and the command output from the remote computer appears in your terminal window.
When you use SSH with Amazon, you are connecting to an EC2 instance, which is a virtual server running in the cloud. When working with Amazon EMR, the most common use of SSH is to connect to the EC2 instance that is acting as the primary node of the cluster.
Using SSH to connect to the primary node gives you the ability to monitor and interact with the cluster. You can issue Linux commands on the primary node, run applications such as Hive and Pig interactively, browse directories, read log files, and so on. You can also create a tunnel in your SSH connection to view the web interfaces hosted on the primary node. For more information, see View web interfaces hosted on Amazon EMR clusters.
To connect to the primary node using SSH, you need the public DNS name of the primary node. In addition, the security group associated with the primary node must have an inbound rule that allows SSH (TCP port 22) traffic from a source that includes the client where the SSH connection originates. You may need to add a rule to allow an SSH connection from your client. For more information about modifying security group rules, see Control network traffic with security groups for your Amazon EMR cluster and Adding rules to a security group in the Amazon EC2 User Guide.
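As a minimal sketch of adding such an inbound rule from the command line, the following wraps the `aws ec2 authorize-security-group-ingress` command. It assumes the CLI is already installed and configured. The helper name `allow_ssh_from`, the security group ID, and the client address range are hypothetical placeholders, not values from this guide.

```shell
# Sketch: allow inbound SSH (TCP port 22) from one client address range.
# allow_ssh_from is a hypothetical helper name; the group ID and CIDR
# passed to it are placeholders that you replace with your own values.
allow_ssh_from() {
    group_id="$1"     # security group of the primary node
    client_cidr="$2"  # address range of the SSH client, e.g. 203.0.113.25/32
    aws ec2 authorize-security-group-ingress \
        --group-id "$group_id" \
        --protocol tcp \
        --port 22 \
        --cidr "$client_cidr"
}

# Example call (commented out so that nothing changes by accident):
# allow_ssh_from sg-0123456789abcdef0 203.0.113.25/32
```

Restricting the source to a single address (a /32 range) is usually safer than opening port 22 to all addresses.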
Retrieve the public DNS name of the primary node
You can retrieve the primary public DNS name using either the Amazon EMR console or the Amazon CLI.
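For the CLI route, one possible approach is to query the cluster description for its `MasterPublicDnsName` field. This is a sketch that assumes a configured CLI and a known cluster ID. The helper name `primary_dns` is hypothetical.

```shell
# Sketch: print the public DNS name of the primary node of a cluster.
# primary_dns is a hypothetical helper name.
primary_dns() {
    cluster_id="$1"  # for example j-2AL4XXXXXX5T9
    aws emr describe-cluster \
        --cluster-id "$cluster_id" \
        --query 'Cluster.MasterPublicDnsName' \
        --output text
}

# Example call (requires valid credentials and a real cluster ID):
# primary_dns j-2AL4XXXXXX5T9
```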
Connect to the primary node using SSH and an Amazon EC2 private key on Linux, Unix, and Mac OS X
To create an SSH connection authenticated with a private key file, you must specify an Amazon EC2 key pair when you launch the cluster, and you need the private key file of that key pair to connect. For more information about accessing your key pair, see Amazon EC2 key pairs in the Amazon EC2 User Guide.
Your Linux computer most likely includes an SSH client by default. For example, OpenSSH is installed on most Linux, Unix, and macOS operating systems. You can check for an SSH client by typing ssh at the command line. If your computer does not recognize the command, install an SSH client to connect to the primary node. The OpenSSH project provides a free implementation of the full suite of SSH tools. For more information, see the OpenSSH website.
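The check for an SSH client can also be scripted. The following is a small sketch using the POSIX `command -v` builtin. The helper name `have_command` is hypothetical.

```shell
# Sketch: report whether an SSH client is available on the PATH.
# have_command is a hypothetical helper name.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command ssh; then
    echo "SSH client found: $(command -v ssh)"
else
    echo "No SSH client found; install one such as OpenSSH"
fi
```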
The following instructions demonstrate opening an SSH connection to the Amazon EMR primary node on Linux, Unix, and Mac OS X.
To configure the key pair private key file permissions
Before you can use your Amazon EC2 key pair private key to create an SSH connection, you must set permissions on the .pem file so that only the key owner has permission to access the file. This is required for creating an SSH connection using terminal or the Amazon CLI.
-
Ensure you've allowed inbound SSH traffic. For instructions, see Before you connect to Amazon EMR: Authorize inbound traffic.
-
Locate your .pem file. These instructions assume that the file is named mykeypair.pem and that it is stored in the current user's home directory.
-
Type the following command to set the permissions. Replace ~/mykeypair.pem with the full path and file name of your key pair private key file, for example C:/Users/<username>/.ssh/mykeypair.pem.
chmod 400 ~/mykeypair.pem
If you do not set permissions on the .pem file, you will receive an error indicating that your key file is unprotected and the key will be rejected. To connect, you only need to set permissions on the key pair private key file the first time you use it.
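To confirm that the permissions took effect, you can inspect the mode string reported by `ls -l`. The following is a sketch. The helper names `key_mode` and `lock_key` are hypothetical.

```shell
# Sketch: restrict a private key file to its owner and show the result.
# key_mode and lock_key are hypothetical helper names.

# Print the 10-character permission string of a file.
key_mode() {
    ls -l "$1" | cut -c1-10
}

# Apply chmod 400, then print the resulting permission string.
lock_key() {
    chmod 400 "$1" && key_mode "$1"
}

# Example call, assuming the key is stored in your home directory:
# lock_key ~/mykeypair.pem
# prints: -r--------
```

A mode string of -r-------- means that only the file owner can read the key, which is what the SSH client expects.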
To connect to the primary node using the terminal
-
Open a terminal window. On Mac OS X, choose Applications > Utilities > Terminal. On Linux distributions, the terminal is typically found at Applications > Accessories > Terminal.
-
To establish a connection to the primary node, type the following command. Replace ec2-###-##-##-###.compute-1.amazonaws.com.cn with the primary public DNS name of your cluster and replace ~/mykeypair.pem with the full path and file name of your .pem file, for example C:/Users/<username>/.ssh/mykeypair.pem.
ssh hadoop@ec2-###-##-##-###.compute-1.amazonaws.com.cn -i ~/mykeypair.pem
Important
You must use the login name hadoop when you connect to the Amazon EMR primary node; otherwise, you may see an error similar to Server refused our key.
-
A warning states that the authenticity of the host you are connecting to cannot be verified. Type yes to continue.
-
When you are done working on the primary node, type the following command to close the SSH connection.
exit
If you're experiencing difficulty with using SSH to connect to your primary node, see Troubleshoot connecting to your instance.
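If you connect to the same primary node often, one option is to store the login name, host name, and key file in an OpenSSH client configuration entry so that a short alias is enough. The following sketch appends such an entry to a configuration file. The helper name `write_emr_host_entry` and the alias `emr-primary` are hypothetical, and the host name shown in the example call is the placeholder used in this guide.

```shell
# Sketch: append an OpenSSH client configuration entry for the primary node.
# write_emr_host_entry and the alias emr-primary are hypothetical names.
write_emr_host_entry() {
    alias_name="$1"   # short name to use with ssh
    dns_name="$2"     # primary public DNS name of the cluster
    key_file="$3"     # full path of the .pem private key file
    config_file="$4"  # usually ~/.ssh/config
    cat >> "$config_file" <<EOF
Host $alias_name
    HostName $dns_name
    User hadoop
    IdentityFile $key_file
EOF
}

# Example call, followed by a connection that uses the alias:
# write_emr_host_entry emr-primary ec2-###-##-##-###.compute-1.amazonaws.com.cn ~/mykeypair.pem ~/.ssh/config
# ssh emr-primary
```

The entry sets User to hadoop, so the required login name is applied without typing it each time.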
Connect to the primary node using SSH on Windows
Windows users can use an SSH client such as PuTTY to connect to the primary node. Before connecting to the Amazon EMR primary node, you should download and install PuTTY and PuTTYgen. You can download these tools from the PuTTY download page.
PuTTY does not natively support the key pair private key file format (.pem) generated by Amazon EC2. You use PuTTYgen to convert your key file to the required PuTTY format (.ppk). You must complete this conversion before attempting to connect to the primary node using PuTTY.
For more information about converting your key, see Converting your private key using PuTTYgen in the Amazon EC2 User Guide.
To connect to the primary node using PuTTY
-
Ensure you've allowed inbound SSH traffic. For instructions, see Before you connect to Amazon EMR: Authorize inbound traffic.
-
Open putty.exe. You can also launch PuTTY from the Windows programs list.
-
If necessary, in the Category list, choose Session.
-
For Host Name (or IP address), type hadoop@MasterPublicDNS. For example: hadoop@ec2-###-##-##-###.compute-1.amazonaws.com.cn.
-
In the Category list, choose Connection > SSH > Auth.
-
For Private key file for authentication, choose Browse and select the .ppk file that you generated.
-
Choose Open and then Yes to dismiss the PuTTY security alert.
Important
When logging in to the primary node, type hadoop if you are prompted for a user name.
-
When you are done working on the primary node, you can close the SSH connection by closing PuTTY.
Note
To prevent the SSH connection from timing out, you can choose Connection in the Category list and select the option Enable TCP_keepalives. If you have an active SSH session in PuTTY, you can change your settings by opening the context (right-click) menu for the PuTTY title bar and choosing Change Settings.
If you're experiencing difficulty with using SSH to connect to your primary node, see Troubleshoot connecting to your instance.
Connect to the primary node using the Amazon CLI
You can create an SSH connection with the primary node using the Amazon CLI on Windows and on Linux, Unix, and Mac OS X. Regardless of the platform, you need the public DNS name of the primary node and your Amazon EC2 key pair private key. If you are using the Amazon CLI on Linux, Unix, or Mac OS X, you must also set permissions on the private key (.pem or .ppk) file as shown in To configure the key pair private key file permissions.
To connect to the primary node using the Amazon CLI
-
Ensure you've allowed inbound SSH traffic. For instructions, see Before you connect to Amazon EMR: Authorize inbound traffic.
-
To retrieve the cluster identifier, type:
aws emr list-clusters
The output lists your clusters, including the cluster IDs. Note the cluster ID for the cluster to which you are connecting.
"Status": {
    "Timeline": {
        "ReadyDateTime": 1408040782.374,
        "CreationDateTime": 1408040501.213
    },
    "State": "WAITING",
    "StateChangeReason": {
        "Message": "Waiting after step completed"
    }
},
"NormalizedInstanceHours": 4,
"Id": "j-2AL4XXXXXX5T9",
"Name": "AWS CLI cluster"
-
Type the following command to open an SSH connection to the primary node. In the following example, replace j-2AL4XXXXXX5T9 with the cluster ID and replace ~/mykeypair.key with the full path and file name of your .pem file (for Linux, Unix, and Mac OS X) or .ppk file (for Windows), for example C:\Users\<username>\.ssh\mykeypair.pem.
aws emr ssh --cluster-id j-2AL4XXXXXX5T9 --key-pair-file ~/mykeypair.key
-
When you are done working on the primary node, close the Amazon CLI window.
For more information, see Amazon EMR commands in the Amazon CLI. If you're experiencing difficulty with using SSH to connect to your primary node, see Troubleshoot connecting to your instance.
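The CLI steps in this section can be combined into two small helpers, sketched below under the assumption that the CLI is configured for the account that owns the cluster. The names `list_active_clusters` and `connect_primary` are hypothetical. The `--active` filter and the `--query` expression narrow the `list-clusters` output to the ID and name of clusters that are still running.

```shell
# Sketch: find a cluster ID and open an SSH session to its primary node.
# list_active_clusters and connect_primary are hypothetical helper names.

# Print one line per active cluster: cluster ID, then cluster name.
list_active_clusters() {
    aws emr list-clusters --active \
        --query 'Clusters[].[Id,Name]' \
        --output text
}

# Open an SSH connection to the primary node of the given cluster.
connect_primary() {
    cluster_id="$1"  # for example j-2AL4XXXXXX5T9
    key_file="$2"    # full path of the key pair private key file
    aws emr ssh --cluster-id "$cluster_id" --key-pair-file "$key_file"
}

# Example calls (require valid credentials and a running cluster):
# list_active_clusters
# connect_primary j-2AL4XXXXXX5T9 ~/mykeypair.key
```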