Connect to the master node using SSH
Secure Shell (SSH) is a network protocol you can use to create a secure connection to a remote computer. After you make a connection, the terminal on your local computer behaves as if it is running on the remote computer. Commands you issue locally run on the remote computer, and the command output from the remote computer appears in your terminal window.
When you use SSH with Amazon, you are connecting to an EC2 instance, which is a virtual server running in the cloud. When working with Amazon EMR, the most common use of SSH is to connect to the EC2 instance that is acting as the master node of the cluster.
Using SSH to connect to the master node gives you the ability to monitor and interact with the cluster. You can issue Linux commands on the master node, run applications such as Hive and Pig interactively, browse directories, read log files, and so on. You can also create a tunnel in your SSH connection to view the web interfaces hosted on the master node. For more information, see View web interfaces hosted on Amazon EMR clusters.
To connect to the master node using SSH, you need the public DNS name of the master node. In addition, the security group associated with the master node must have an inbound rule that allows SSH (TCP port 22) traffic from a source that includes the client where the SSH connection originates. You may need to add a rule to allow an SSH connection from your client. For more information about modifying security group rules, see Control network traffic with security groups and Adding rules to a security group in the Amazon EC2 User Guide for Linux Instances.
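If you manage security groups from the command line, you can add the rule described above with the Amazon CLI. The following minimal sketch wraps aws ec2 authorize-security-group-ingress in a helper. The function name allow_ssh_ingress is hypothetical. The security group ID and CIDR range you pass are assumptions: replace them with the ID of the group attached to your master node and the public IP address of your client.

```shell
# Hypothetical helper: add an inbound rule that allows SSH (TCP port 22)
# to a security group from a single CIDR range.
# Arguments: the ID of the security group attached to the master node, and
# the CIDR range of your client (for example, your public IP address with /32).
allow_ssh_ingress() {
  local group_id="$1" cidr="$2"
  if [ -z "$group_id" ] || [ -z "$cidr" ]; then
    echo "usage: allow_ssh_ingress SECURITY_GROUP_ID CIDR" >&2
    return 2
  fi
  aws ec2 authorize-security-group-ingress \
    --group-id "$group_id" --protocol tcp --port 22 --cidr "$cidr"
}
```

For example, allow_ssh_ingress sg-0123456789abcdef0 203.0.113.25/32 permits SSH from that single address only. Both values are placeholders. A /32 source, rather than 0.0.0.0/0, keeps port 22 closed to every other client.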
Retrieve the public DNS name of the master node
You can retrieve the master public DNS name using either the Amazon EMR console or the Amazon CLI.
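On the command line, one way to read this value is to run aws emr describe-cluster with a --query expression that selects the Cluster.MasterPublicDnsName field. The sketch below assumes the Amazon CLI is installed and configured with credentials for the cluster's account and Region. The helper name master_dns is hypothetical. You can get the cluster ID (for example, j-2AL4XXXXXX5T9) from aws emr list-clusters, as described in Connect to the master node using the Amazon CLI.

```shell
# Hypothetical helper: print the master public DNS name of an EMR cluster.
# Argument: the cluster ID (for example, j-2AL4XXXXXX5T9).
# Assumes the Amazon CLI is installed and configured for the cluster's Region.
master_dns() {
  if [ -z "$1" ]; then
    echo "usage: master_dns CLUSTER_ID" >&2
    return 2
  fi
  aws emr describe-cluster --cluster-id "$1" \
    --query 'Cluster.MasterPublicDnsName' --output text
}
```

You could then pass the result directly to ssh, for example ssh -i ~/mykeypair.pem hadoop@"$(master_dns j-2AL4XXXXXX5T9)".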
Connect to the master node using SSH and an Amazon EC2 private key on Linux, Unix, and Mac OS X
To create an SSH connection authenticated with a private key file, you must specify an Amazon EC2 key pair when you launch the cluster. If you launch the cluster from the console, you specify the Amazon EC2 key pair in the Security and Access section of the Create Cluster page. For more information about accessing your key pair, see Amazon EC2 key pairs in the Amazon EC2 User Guide for Linux Instances.
Your computer most likely includes an SSH client by default. For example, OpenSSH is installed on most Linux, Unix, and macOS operating systems. You can check for an SSH client by typing ssh at the command line. If your computer does not recognize the command, install an SSH client to connect to the master node. The OpenSSH project provides a free implementation of the full suite of SSH tools. For more information, see the OpenSSH website.
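You can also script the check described above. This minimal sketch uses the POSIX command -v builtin to test whether an ssh command is available on your PATH. The helper name has_ssh_client is hypothetical.

```shell
# Hypothetical helper: succeed (exit status 0) if an ssh command is available.
has_ssh_client() {
  command -v ssh >/dev/null 2>&1
}

if has_ssh_client; then
  echo "SSH client found: $(command -v ssh)"
else
  echo "No SSH client found; install one (for example, OpenSSH) first."
fi
```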
The following instructions demonstrate opening an SSH connection to the Amazon EMR master node on Linux, Unix, and Mac OS X.
To configure the key pair private key file permissions
Before you can use your Amazon EC2 key pair private key to create an SSH connection, you must set permissions on the .pem file so that only the key owner has permission to access the file. This is required for creating an SSH connection using the terminal or the Amazon CLI.
- Ensure you've allowed inbound SSH traffic. For instructions, see Before you connect: Authorize inbound traffic.
- Locate your .pem file. These instructions assume that the file is named mykeypair.pem and that it is stored in the current user's home directory.
- Type the following command to set the permissions. Replace ~/mykeypair.pem with the full path and file name of your key pair private key file, for example ~/.ssh/mykeypair.pem.

  chmod 400 ~/mykeypair.pem

  If you do not set permissions on the .pem file, you will receive an error indicating that your key file is unprotected, and the key will be rejected. You need to set permissions on the key pair private key file only once, the first time you use it.
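If you script this step, it can help to fail early when the key file is missing and to confirm the resulting mode. The following sketch assumes a POSIX shell. The helper name secure_key is hypothetical, and ~/mykeypair.pem is only the default example path from these instructions.

```shell
# Hypothetical helper: restrict a private key file so that only its owner
# can read it, then show the resulting permissions.
# Argument: path to the key file (defaults to ~/mykeypair.pem).
secure_key() {
  local key="${1:-$HOME/mykeypair.pem}"
  if [ ! -f "$key" ]; then
    echo "Key file not found: $key" >&2
    return 1
  fi
  chmod 400 "$key"
  ls -l "$key"
}
```

After you run secure_key ~/mykeypair.pem, the listing should begin with -r--------, which means that only the owner can read the file.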
To connect to the master node using the terminal
- Open a terminal window. On Mac OS X, choose Applications > Utilities > Terminal. On Linux distributions, the terminal is typically found at Applications > Accessories > Terminal.
- To establish a connection to the master node, type the following command. Replace ec2-###-##-##-###.compute-1.amazonaws.com.cn with the master public DNS name of your cluster, and replace ~/mykeypair.pem with the full path and file name of your .pem file.

  ssh hadoop@ec2-###-##-##-###.compute-1.amazonaws.com.cn -i ~/mykeypair.pem

  Important: You must use the login name hadoop when you connect to the Amazon EMR master node. Otherwise, you may see an error similar to Server refused our key.
- A warning states that the authenticity of the host you are connecting to cannot be verified. Type yes to continue.
- When you are done working on the master node, type the following command to close the SSH connection.

  exit
If you're experiencing difficulty with using SSH to connect to your master node, see Troubleshoot connecting to your instance.
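If you connect often, you might wrap the ssh command in a small shell function so that you always use the required hadoop login name. This is only a sketch. The name emr_ssh is hypothetical, and the function assumes OpenSSH and a key file whose permissions you have already set.

```shell
# Hypothetical helper: open an SSH session to the EMR master node as the
# required hadoop user.
# Arguments: master public DNS name, then the key file path
# (defaults to ~/mykeypair.pem, the example name used in this guide).
emr_ssh() {
  local dns="$1"
  local key="${2:-$HOME/mykeypair.pem}"
  if [ -z "$dns" ]; then
    echo "usage: emr_ssh MASTER_PUBLIC_DNS [KEY_FILE]" >&2
    return 2
  fi
  ssh -i "$key" "hadoop@$dns"
}
```

For example, run emr_ssh ec2-###-##-##-###.compute-1.amazonaws.com.cn ~/mykeypair.pem, replacing the placeholder with your master public DNS name.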
Connect to the master node using SSH on Windows
Windows users can use an SSH client such as PuTTY to connect to the master node.
Before connecting to the Amazon EMR master node, download and install PuTTY and PuTTYgen. You can download these tools from the PuTTY download page.
PuTTY does not natively support the key pair private key file format (.pem) generated by Amazon EC2. Use PuTTYgen to convert your key file to the PuTTY format (.ppk). You must convert your key to .ppk before attempting to connect to the master node using PuTTY.
For more information about converting your key, see Converting your private key using PuTTYgen in the Amazon EC2 User Guide for Linux Instances.
To connect to the master node using PuTTY
- Ensure you've allowed inbound SSH traffic. For instructions, see Before you connect: Authorize inbound traffic.
- Open putty.exe. You can also launch PuTTY from the Windows programs list.
- If necessary, in the Category list, choose Session.
- For Host Name (or IP address), type hadoop@MasterPublicDNS. For example: hadoop@ec2-###-##-##-###.compute-1.amazonaws.com.cn.
- In the Category list, choose Connection > SSH > Auth.
- For Private key file for authentication, choose Browse and select the .ppk file that you generated.
- Choose Open and then Yes to dismiss the PuTTY security alert.

  Important: When logging in to the master node, type hadoop if you are prompted for a user name.
- When you are done working on the master node, you can close the SSH connection by closing PuTTY.
Note: To prevent the SSH connection from timing out, you can choose Connection in the Category list and select the option Enable TCP keepalives. If you have an active SSH session in PuTTY, you can change your settings by opening the context (right-click) menu for the PuTTY title bar and choosing Change Settings.
If you're experiencing difficulty with using SSH to connect to your master node, see Troubleshoot connecting to your instance.
Connect to the master node using the Amazon CLI
You can create an SSH connection with the master node using the Amazon CLI on Windows and on Linux, Unix, and Mac OS X. Regardless of the platform, you need the public DNS name of the master node and your Amazon EC2 key pair private key. If you are using the Amazon CLI on Linux, Unix, or Mac OS X, you must also set permissions on the private key (.pem or .ppk) file as shown in To configure the key pair private key file permissions.
To connect to the master node using the Amazon CLI
- Ensure you've allowed inbound SSH traffic. For instructions, see Before you connect: Authorize inbound traffic.
- To retrieve the cluster identifier, type:

  aws emr list-clusters

  The output lists your clusters, including the cluster IDs. Note the cluster ID for the cluster to which you are connecting.

  "Status": {
      "Timeline": {
          "ReadyDateTime": 1408040782.374,
          "CreationDateTime": 1408040501.213
      },
      "State": "WAITING",
      "StateChangeReason": {
          "Message": "Waiting after step completed"
      }
  },
  "NormalizedInstanceHours": 4,
  "Id": "j-2AL4XXXXXX5T9",
  "Name": "AWS CLI cluster"

- Type the following command to open an SSH connection to the master node. In the following example, replace j-2AL4XXXXXX5T9 with the cluster ID, and replace ~/mykeypair.key with the full path and file name of your .pem file (for Linux, Unix, and Mac OS X) or .ppk file (for Windows), for example C:\Users\<username>\.ssh\mykeypair.ppk.

  aws emr ssh --cluster-id j-2AL4XXXXXX5T9 --key-pair-file ~/mykeypair.key

- When you are done working on the master node, close the Amazon CLI window.
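When several clusters are running, picking the ID out of the full list-clusters output by hand is error-prone. The following sketch instead filters active clusters by name with a JMESPath --query expression, then passes the resulting ID to aws emr ssh. The helper names are hypothetical, and "AWS CLI cluster" is just the example name from the output above. The sketch assumes the name is unique among your active clusters. If several active clusters share it, the query returns more than one ID.

```shell
# Hypothetical helper: print the ID of the active cluster with the given name.
cluster_id_by_name() {
  local name="$1"
  aws emr list-clusters --active \
    --query "Clusters[?Name=='${name}'].Id" --output text
}

# Hypothetical helper: look up a cluster by name and open an SSH session to
# its master node with aws emr ssh.
# Arguments: cluster name, then key file (defaults to ~/mykeypair.pem).
emr_cli_ssh() {
  local name="$1"
  local key="${2:-$HOME/mykeypair.pem}"
  local id
  id=$(cluster_id_by_name "$name") || return 1
  if [ -z "$id" ]; then
    echo "No active cluster named '$name'" >&2
    return 1
  fi
  aws emr ssh --cluster-id "$id" --key-pair-file "$key"
}
```

For example, emr_cli_ssh 'AWS CLI cluster' ~/mykeypair.pem connects to the cluster shown in the sample output, if it is still active.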
For more information, see Amazon EMR commands in the Amazon CLI. If you're experiencing difficulty with using SSH to connect to your master node, see Troubleshoot connecting to your instance.
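Instead of opening an interactive session, aws emr ssh can also run a single command on the master node through its --command option. The sketch below wraps that option. The name emr_run is hypothetical, and the cluster ID and key path are the placeholders used in the steps above.

```shell
# Hypothetical helper: run one command on the master node through
# aws emr ssh --command, then return.
# Arguments: cluster ID, key file, then the command and its arguments.
emr_run() {
  if [ "$#" -lt 3 ]; then
    echo "usage: emr_run CLUSTER_ID KEY_FILE COMMAND..." >&2
    return 2
  fi
  local cluster_id="$1" key="$2"
  shift 2
  # "$*" joins the remaining arguments into a single command string.
  aws emr ssh --cluster-id "$cluster_id" --key-pair-file "$key" --command "$*"
}
```

For example, emr_run j-2AL4XXXXXX5T9 ~/mykeypair.key hdfs dfs -ls / would list the root directory of HDFS on the cluster.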