Configure cluster hardware and networking - Amazon EMR

Configure cluster hardware and networking

An important consideration when you create an Amazon EMR cluster is how you configure Amazon EC2 instances and network options. This chapter covers the following options and then ties them together with best practices and guidelines.

  • Node types – Amazon EC2 instances in an EMR cluster are organized into node types. There are three: primary nodes, core nodes, and task nodes. Each node type performs a set of roles defined by the distributed applications that you install on the cluster. During a Hadoop MapReduce or Spark job, for example, components on core and task nodes process data, transfer output to Amazon S3 or HDFS, and provide status metadata back to the primary node. With a single-node cluster, all components run on the primary node. For more information, see Understand node types: primary, core, and task nodes.

  • EC2 instances – When you create a cluster, you choose the Amazon EC2 instance type that each node type runs on. The instance type determines the processing and storage profile of each node, and therefore the performance profile of each node type in your cluster. For more information, see Configure Amazon EC2 instances.

  • Networking – You can launch your Amazon EMR cluster into a VPC using a public subnet, a private subnet, or a shared subnet. Your networking configuration determines how customers and services can connect to clusters to perform work, how clusters connect to data stores and other Amazon resources, and the options you have for controlling traffic on those connections. For more information, see Configure networking.

  • Instance grouping – The collection of EC2 instances that host each node type is called either an instance fleet or a uniform instance group. You choose the instance grouping configuration when you create a cluster. The choice applies to all node types, can't be changed later, and determines how you can add nodes to your cluster while it is running. For more information, see Create a cluster with instance fleets or uniform instance groups.

    Note

    The instance fleets configuration is available only in Amazon EMR releases 4.8.0 and later, excluding 5.0.0 and 5.0.3.
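
As a rough illustration of how these four choices come together, the following Python sketch builds a request dictionary shaped like the parameters of the boto3 EMR `run_job_flow` call. It only constructs the dictionary and does not contact any service. The helper names (`supports_instance_fleets`, `build_cluster_request`), the cluster name, the `m5.xlarge` instance type, the node counts, and the subnet ID are assumptions chosen for the example. The role names are the commonly used EMR defaults and may differ in your account. The version check assumes plain release labels of the form `emr-X.Y.Z`.

```python
from typing import Any, Dict

# Releases that the note above excludes from instance fleet support.
FLEET_EXCLUDED = {(5, 0, 0), (5, 0, 3)}


def supports_instance_fleets(release_label: str) -> bool:
    """Return True if a label such as 'emr-5.36.0' can use instance fleets."""
    # Assumes a plain 'emr-X.Y.Z' label; other label shapes are not handled.
    version = tuple(int(part) for part in release_label.split("-", 1)[-1].split("."))
    return version >= (4, 8, 0) and version not in FLEET_EXCLUDED


def build_cluster_request(release_label: str, subnet_id: str,
                          use_fleets: bool) -> Dict[str, Any]:
    """Build run_job_flow-style parameters (hypothetical helper)."""
    instances: Dict[str, Any] = {"KeepJobFlowAliveWhenNoSteps": True}
    if use_fleets:
        if not supports_instance_fleets(release_label):
            raise ValueError(f"{release_label} does not support instance fleets")
        # Instance grouping: one fleet per node type; the API names the
        # primary node type MASTER.
        instances["Ec2SubnetIds"] = [subnet_id]  # networking choice
        instances["InstanceFleets"] = [
            {"Name": "primary", "InstanceFleetType": "MASTER",
             "TargetOnDemandCapacity": 1,
             "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}]},
            {"Name": "core", "InstanceFleetType": "CORE",
             "TargetOnDemandCapacity": 2,
             "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}]},
            {"Name": "task", "InstanceFleetType": "TASK",
             "TargetSpotCapacity": 2,
             "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}]},
        ]
    else:
        # Instance grouping: one uniform instance group per node type.
        instances["Ec2SubnetId"] = subnet_id  # networking choice
        instances["InstanceGroups"] = [
            {"Name": "primary", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
            {"Name": "task", "InstanceRole": "TASK", "Market": "SPOT",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ]
    return {
        "Name": "example-cluster",
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": instances,
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default role name
        "ServiceRole": "EMR_DefaultRole",      # assumed default role name
    }
```

With credentials and a real subnet in place, a dictionary like this could be passed as `boto3.client("emr").run_job_flow(**request)`. The two branches are kept separate because a cluster uses either instance fleets or uniform instance groups for all node types, not a mix of both.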