Configuring VPC access - Amazon EMR

Configuring VPC access

You can configure EMR Serverless applications to connect to your data stores within your VPC, such as Amazon Redshift clusters, Amazon RDS databases or Amazon S3 buckets with VPC endpoints.

Note

You must configure VPC access if you want to use an external Hive metastore database for your application. For information about how to configure an external Hive metastore, see Metastore configuration.

Create application

On the Create application page, you can choose custom settings and specify the VPC, subnets and security groups that EMR Serverless applications can use.

VPCs

Choose the name of the virtual private cloud (VPC) that contains your data stores. The Create application page lists all VPCs for your chosen Amazon Web Services Region.

Subnets

Choose the subnets within the VPC that contains your data store. The Create application page lists all subnets for the data stores in your VPC.

The subnets that you select must be private subnets. This means that the route tables associated with the subnets must not have routes to an internet gateway.

For outbound connectivity to the internet, the subnets must have outbound routes that use a NAT gateway. To configure a NAT gateway, see Work with NAT gateways.

For Amazon S3 connectivity, the subnets must have a NAT gateway or a VPC endpoint configured. To configure an S3 VPC endpoint, see Create a gateway endpoint.

For connectivity to other Amazon Web Services outside the VPC, such as Amazon DynamoDB, you must configure either VPC endpoints or a NAT gateway. To configure VPC endpoints for Amazon Web Services, see Work with VPC endpoints.
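The subnet requirements above can be checked against a route table before you create the application. The following is a minimal sketch, assuming route entries shaped like the `Routes` list that the Amazon EC2 `DescribeRouteTables` API returns. The function name `summarize_routes` and the sample IDs are hypothetical.

```python
from typing import Any


def summarize_routes(routes: list[dict[str, Any]]) -> dict[str, bool]:
    """Summarize the connectivity that a route table gives a subnet."""
    # A route whose target is an internet gateway makes the subnet public.
    has_igw = any((r.get("GatewayId") or "").startswith("igw-") for r in routes)
    # A route through a NAT gateway gives outbound internet access.
    has_nat = any((r.get("NatGatewayId") or "").startswith("nat-") for r in routes)
    # Gateway endpoints (Amazon S3, DynamoDB) appear as prefix-list routes
    # that target a vpce- ID. This check does not tell the two services apart.
    has_gateway_endpoint = any(
        (r.get("GatewayId") or "").startswith("vpce-")
        and "DestinationPrefixListId" in r
        for r in routes
    )
    return {
        "private": not has_igw,
        "internet_via_nat": has_nat,
        "gateway_endpoint": has_gateway_endpoint,
    }


# Hypothetical route table for a private subnet.
routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0abc"},
    {"DestinationPrefixListId": "pl-0abc", "GatewayId": "vpce-0abc"},
]
summary = summarize_routes(routes)
```

Interface endpoints do not add routes, so a check like this one cannot detect them.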

Note

We recommend that you select multiple subnets across multiple Availability Zones, because the subnets that you choose determine the Availability Zones in which an EMR Serverless application can launch. Each worker consumes an IP address on the subnet where it is launched. Make sure that the specified subnets have sufficient IP addresses for the number of workers that you plan to launch.
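As a rough way to size subnets, you can estimate the address capacity of each subnet from its CIDR block. The sketch below assumes the worst case where all planned workers land in a single subnet, and it subtracts the five addresses that Amazon VPC reserves in every subnet. The function names and subnet IDs are hypothetical, and the result is total capacity, not the number of addresses that are currently free.

```python
import ipaddress

# Amazon VPC reserves the first four addresses and the last address of each subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr: str) -> int:
    """Return how many addresses a subnet CIDR block can hand out."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


def subnets_too_small(subnet_cidrs: dict[str, str], planned_workers: int) -> list[str]:
    """List the subnets that cannot hold all planned workers on their own."""
    return [
        subnet_id
        for subnet_id, cidr in subnet_cidrs.items()
        if usable_addresses(cidr) < planned_workers
    ]


# Hypothetical subnets and worker count.
candidates = {"subnet-a": "10.0.1.0/24", "subnet-b": "10.0.2.0/28"}
undersized = subnets_too_small(candidates, planned_workers=100)
```

For the number of addresses that are free right now, the `AvailableIpAddressCount` field in the Amazon EC2 `DescribeSubnets` response is the more direct source.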

Security groups

Choose one or more security groups that can communicate with your data stores. The Create application page lists all security groups in your VPC. EMR Serverless associates these security groups with elastic network interfaces that are attached to your VPC subnets.
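Outside the console, the subnet and security group choices map to the `networkConfiguration` parameter of the EMR Serverless `CreateApplication` API. The following sketch only builds that structure. The helper name, the IDs, and the application settings in the commented call are placeholders, and requiring at least one security group is a choice made for this example.

```python
def build_network_configuration(
    subnet_ids: list[str], security_group_ids: list[str]
) -> dict[str, list[str]]:
    """Build the networkConfiguration structure for an EMR Serverless application."""
    if not subnet_ids:
        raise ValueError("Specify at least one subnet.")
    if not security_group_ids:
        raise ValueError("Specify at least one security group.")
    # dict.fromkeys removes duplicates and keeps the original order.
    return {
        "subnetIds": list(dict.fromkeys(subnet_ids)),
        "securityGroupIds": list(dict.fromkeys(security_group_ids)),
    }


# Hypothetical IDs: two subnets in different Availability Zones.
network_configuration = build_network_configuration(
    ["subnet-0aaa", "subnet-0bbb", "subnet-0aaa"],
    ["sg-0emr"],
)

# With AWS credentials configured, the structure could be passed to boto3:
# import boto3
# boto3.client("emr-serverless").create_application(
#     name="my-application",        # placeholder
#     releaseLabel="emr-6.6.0",     # placeholder
#     type="SPARK",
#     networkConfiguration=network_configuration,
# )
```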

Note

We recommend that you create a separate security group for EMR Serverless applications. This makes isolating and managing network rules more efficient. For example, to communicate with Amazon Redshift clusters, you can define the traffic rules between the Redshift and EMR Serverless security groups, as demonstrated in the example below.

Example — Communication with Amazon Redshift clusters

  1. Add a rule for inbound traffic to the Amazon Redshift security group from one of the EMR Serverless security groups.

    Type       Protocol   Port range   Source
    All TCP    TCP        5439         emr-serverless-security-group

  2. Add a rule for outbound traffic from one of the EMR Serverless security groups. You can do this in one of two ways. First, you can open outbound traffic to all ports.

    Type          Protocol   Port range   Destination
    All traffic   TCP        ALL          0.0.0.0/0

    Alternatively, you can restrict outbound traffic to Amazon Redshift clusters. Use this option only when the application must communicate with Amazon Redshift clusters and nothing else, because the rule permits no other outbound traffic.

    Type       Protocol   Port range   Destination
    All TCP    TCP        5439         redshift-security-group
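The inbound rule and the restricted outbound rule from this example can also be expressed as request parameters for the Amazon EC2 `AuthorizeSecurityGroupIngress` and `AuthorizeSecurityGroupEgress` APIs. This is a sketch under assumptions: the security group IDs are placeholders, the helper name is hypothetical, and the calls themselves are left commented out.

```python
from typing import Any

REDSHIFT_PORT = 5439  # port used in the example rules


def redshift_permission(peer_group_id: str) -> dict[str, Any]:
    """Build one TCP rule on the Redshift port that references a peer security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": REDSHIFT_PORT,
        "ToPort": REDSHIFT_PORT,
        "UserIdGroupPairs": [{"GroupId": peer_group_id}],
    }


# Placeholder security group IDs.
emr_sg = "sg-0emrserverless"
redshift_sg = "sg-0redshift"

# Step 1: inbound rule on the Redshift group, sourced from the EMR Serverless group.
ingress_request = {"GroupId": redshift_sg, "IpPermissions": [redshift_permission(emr_sg)]}

# Step 2 (restricted option): outbound rule on the EMR Serverless group.
egress_request = {"GroupId": emr_sg, "IpPermissions": [redshift_permission(redshift_sg)]}

# With AWS credentials configured:
# import boto3
# ec2 = boto3.client("ec2")
# ec2.authorize_security_group_ingress(**ingress_request)
# ec2.authorize_security_group_egress(**egress_request)
```

A newly created security group starts with an outbound rule that allows all traffic, so the restricted option takes effect only after that default rule is removed.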

Configure application

You can change the network configuration for an existing EMR Serverless application from the Configure application page.
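The same change can be made through the EMR Serverless `UpdateApplication` API, which accepts a `networkConfiguration` parameter and requires the application to be in the created or stopped state. The sketch below builds the request and checks the state first. The helper name and all IDs are hypothetical.

```python
from typing import Any

# UpdateApplication only applies to applications in these states.
UPDATABLE_STATES = {"CREATED", "STOPPED"}


def build_update_request(
    application_id: str,
    state: str,
    subnet_ids: list[str],
    security_group_ids: list[str],
) -> dict[str, Any]:
    """Build update_application arguments that replace the network configuration."""
    if state not in UPDATABLE_STATES:
        raise ValueError(f"Stop the application first (current state: {state}).")
    return {
        "applicationId": application_id,
        "networkConfiguration": {
            "subnetIds": list(subnet_ids),
            "securityGroupIds": list(security_group_ids),
        },
    }


# Hypothetical application and network IDs.
update_request = build_update_request(
    "00example", "STOPPED", ["subnet-0aaa", "subnet-0bbb"], ["sg-0emr"]
)

# With AWS credentials configured:
# import boto3
# boto3.client("emr-serverless").update_application(**update_request)
```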

View job run details

On the Job run detail page, you can view the subnet that your job used for a specific run. A job runs in only one subnet, which is selected from the subnets that you specified.