
EMR clusters on Amazon Outposts

Beginning with Amazon EMR release 5.28.0, you can create and run EMR clusters on Amazon Outposts. Amazon Outposts brings native Amazon services, infrastructure, and operating models to on-premises facilities, so you can use the same Amazon APIs, tools, and infrastructure there that you use in the Amazon Cloud. Amazon EMR on Amazon Outposts is well suited to low-latency workloads that must run close to on-premises data and applications. For more information about Amazon Outposts, see the Amazon Outposts User Guide.

Prerequisites

The following are the prerequisites for using Amazon EMR on Amazon Outposts:

  • You must have an Outpost installed and configured in your on-premises data center.

  • You must have a reliable network connection between your Outpost environment and an Amazon Region.

  • You must have sufficient capacity of Amazon EMR-supported instance types available on your Outpost.

Limitations

The following are the limitations of using Amazon EMR on Amazon Outposts:

  • On-Demand Instances are the only supported purchasing option for Amazon EC2 instances. Spot Instances are not available for Amazon EMR on Amazon Outposts.

  • If you need additional Amazon EBS storage volumes, only General Purpose SSD (gp2) volumes are supported.

  • Amazon S3 buckets that store objects in an Amazon Web Services Region that you specify are the only supported S3 option for Amazon EMR on Amazon Outposts. S3 on Outposts is not supported.

  • Only the following instance types are supported by Amazon EMR on Amazon Outposts:

    General purpose: m5.xlarge | m5.2xlarge | m5.4xlarge | m5.12xlarge | m5.24xlarge | m5d.xlarge | m5d.2xlarge | m5d.4xlarge | m5d.12xlarge | m5d.24xlarge

    Compute-optimized: c5.xlarge | c5.2xlarge | c5.4xlarge | c5.9xlarge | c5.18xlarge | c5d.xlarge | c5d.2xlarge | c5d.4xlarge | c5d.9xlarge | c5d.18xlarge

    Memory-optimized: r5.xlarge | r5.2xlarge | r5.4xlarge | r5.12xlarge | r5d.xlarge | r5d.2xlarge | r5d.4xlarge | r5d.12xlarge | r5d.24xlarge

    Storage-optimized: i3en.xlarge | i3en.2xlarge | i3en.3xlarge | i3en.6xlarge | i3en.12xlarge | i3en.24xlarge
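The limitations above can be checked before you submit a cluster request. The following POSIX shell sketch encodes the supported instance type list together with the On-Demand-only and gp2-only rules. The function names (emr_outposts_type_supported, emr_outposts_check_group, emr_outposts_filter_types) are hypothetical helpers, not AWS CLI commands, and the type list has to be kept in sync with this page by hand.

```shell
#!/bin/sh
# Pre-flight checks for Amazon EMR on Amazon Outposts limitations.
# Hypothetical helpers; the type list is copied from the documentation.

# Exit status 0 if the instance type is in the supported list.
emr_outposts_type_supported() {
  case "$1" in
    m5.xlarge|m5.2xlarge|m5.4xlarge|m5.12xlarge|m5.24xlarge) return 0 ;;
    m5d.xlarge|m5d.2xlarge|m5d.4xlarge|m5d.12xlarge|m5d.24xlarge) return 0 ;;
    c5.xlarge|c5.2xlarge|c5.4xlarge|c5.9xlarge|c5.18xlarge) return 0 ;;
    c5d.xlarge|c5d.2xlarge|c5d.4xlarge|c5d.9xlarge|c5d.18xlarge) return 0 ;;
    r5.xlarge|r5.2xlarge|r5.4xlarge|r5.12xlarge) return 0 ;;
    r5d.xlarge|r5d.2xlarge|r5d.4xlarge|r5d.12xlarge|r5d.24xlarge) return 0 ;;
    i3en.xlarge|i3en.2xlarge|i3en.3xlarge|i3en.6xlarge) return 0 ;;
    i3en.12xlarge|i3en.24xlarge) return 0 ;;
    *) return 1 ;;
  esac
}

# Checks one instance group: INSTANCE_TYPE MARKET [EBS_VOLUME_TYPE].
# MARKET must be ON_DEMAND; EBS_VOLUME_TYPE must be gp2, or "none"
# (the default) when no additional EBS volumes are attached.
# Prints the first violation and returns 1; returns 0 if all checks pass.
emr_outposts_check_group() {
  local itype="$1" market="$2" volume_type="${3:-none}"
  if ! emr_outposts_type_supported "$itype"; then
    echo "unsupported instance type: $itype"
    return 1
  fi
  if [ "$market" != "ON_DEMAND" ]; then
    echo "unsupported market: $market (only ON_DEMAND)"
    return 1
  fi
  case "$volume_type" in
    gp2|none) ;;
    *) echo "unsupported EBS volume type: $volume_type (only gp2)"; return 1 ;;
  esac
  return 0
}

# Reads instance type names (one per line) on stdin and prints only
# the ones Amazon EMR on Amazon Outposts supports.
emr_outposts_filter_types() {
  local t
  while IFS= read -r t || [ -n "$t" ]; do
    if [ -n "$t" ] && emr_outposts_type_supported "$t"; then
      echo "$t"
    fi
  done
  return 0
}

# Possible usage (assumes your AWS CLI provides
# "aws outposts get-outpost-instance-types" and that its response has an
# InstanceTypes list with an InstanceType field; op-123456789 is a placeholder).
# This lists types configured on the Outpost, not free capacity:
#   aws outposts get-outpost-instance-types --outpost-id op-123456789 \
#     --query 'InstanceTypes[].InstanceType' --output text \
#     | tr '\t' '\n' | emr_outposts_filter_types
```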

Network connectivity considerations

  • If network connectivity between your Outpost and its Amazon Region is lost, your clusters continue to run. However, you cannot create new clusters or take new actions on existing clusters until connectivity is restored. If an instance fails during the outage, it is not automatically replaced. In addition, actions such as adding steps to a running cluster, checking step execution status, and sending CloudWatch metrics and events are delayed.

  • We recommend that you provide reliable, highly available network connectivity between your Outpost and the Amazon Region. If connectivity between your Outpost and its Amazon Region is lost for more than a few hours, clusters with termination protection enabled continue to run, while clusters with termination protection disabled may be terminated.

  • If network connectivity will be affected by routine maintenance, we recommend enabling termination protection beforehand. More generally, a connectivity interruption means that any external dependency that is not local to the Outpost or customer network is inaccessible. This includes Amazon S3, DynamoDB when used with EMRFS consistent view, and Amazon RDS if an in-Region instance is used for an EMR cluster with multiple master nodes.
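Before planned maintenance, termination protection can be turned on from the AWS CLI with aws emr modify-cluster-attributes and its --termination-protected flag. The sketch below wraps that command for a list of cluster IDs. The names protect_clusters and run are hypothetical helpers, and setting DRY_RUN=1 only prints the commands so you can review them before calling AWS.

```shell
#!/bin/sh
# Prints the command when DRY_RUN=1, otherwise executes it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Enables termination protection on each cluster ID given as an argument.
# Returns 2 on missing arguments, 1 if any AWS CLI call fails.
protect_clusters() {
  local id
  if [ "$#" -eq 0 ]; then
    echo "usage: protect_clusters CLUSTER_ID..." >&2
    return 2
  fi
  for id in "$@"; do
    run aws emr modify-cluster-attributes --cluster-id "$id" \
      --termination-protected || return 1
  done
  return 0
}

# Possible usage (j-XXXXXXXXXXXXX is a placeholder cluster ID):
#   DRY_RUN=1 protect_clusters j-XXXXXXXXXXXXX   # review
#   protect_clusters j-XXXXXXXXXXXXX             # apply
```

After connectivity is restored, the same command with --no-termination-protected turns protection off again if you do not want to keep it enabled.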

Creating an Amazon EMR cluster on Amazon Outposts

Creating an Amazon EMR cluster on Amazon Outposts is similar to creating an Amazon EMR cluster in the Amazon Cloud. When you create an Amazon EMR cluster on Amazon Outposts, you must specify an Amazon EC2 subnet associated with your Outpost.

An Amazon VPC can span all of the Availability Zones in an Amazon Region. Amazon Outposts are extensions of Availability Zones, and you can extend an Amazon VPC in an account to span multiple Availability Zones and associated Outpost locations. When you configure your Outpost, you associate a subnet with it to extend your Regional VPC environment to your on-premises facility. Outpost instances and related services appear as part of your Regional VPC, similar to an Availability Zone with associated subnets. For more information, see the Amazon Outposts User Guide.
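To find a subnet that is associated with an Outpost, you can list your subnets together with their OutpostArn attribute; subnets in a Regional Availability Zone have no value for it. The following sketch filters the tab-separated text output of aws ec2 describe-subnets. The helper name outpost_subnet_ids is hypothetical, and the optional Outpost ID filter assumes that Outpost ARNs end in outpost/ followed by the Outpost ID.

```shell
#!/bin/sh
# Reads "SubnetId<TAB>OutpostArn" lines on stdin and prints the subnet IDs
# whose second column is an ARN, that is, subnets associated with an Outpost.
# With an optional OUTPOST_ID argument, keeps only ARNs ending in /OUTPOST_ID.
outpost_subnet_ids() {
  awk -v want="${1:-}" 'BEGIN { FS = "\t" }
    $2 ~ /^arn:/ && (want == "" || $2 ~ ("/" want "$")) { print $1 }'
}

# Possible usage (op-123456789 is a placeholder Outpost ID):
#   aws ec2 describe-subnets \
#     --query 'Subnets[].[SubnetId,OutpostArn]' --output text \
#     | outpost_subnet_ids op-123456789
```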

Console

To create a new Amazon EMR cluster on Amazon Outposts with the Amazon Web Services Management Console, specify an Amazon EC2 subnet that is associated with your Outpost.

  1. Open the Amazon EMR console.

  2. Choose Create cluster.

  3. Choose Go to advanced options.

  4. Under Software Configuration, for Release, choose 5.28.0 or later.

  5. Under Hardware Configuration, for EC2 Subnet, select an EC2 subnet with an Outpost ID in this format: op-123456789.

  6. Choose instance types and, optionally, add Amazon EBS storage volumes for uniform instance groups or instance fleets. Only the instance types and EBS volume type listed under Limitations are supported for Amazon EMR on Amazon Outposts.

Amazon CLI

To create a new Amazon EMR cluster on Amazon Outposts with the Amazon CLI, specify an EC2 subnet that is associated with your Outpost.

The following example creates an Amazon EMR cluster on an Outpost. Replace subnet-22XXXX01 with an EC2 subnet that is associated with your Outpost, and myKey with the name of your EC2 key pair.

aws emr create-cluster \
  --name "Outpost cluster" \
  --release-label emr-5.36.0 \
  --applications Name=Spark \
  --ec2-attributes KeyName=myKey,SubnetId=subnet-22XXXX01 \
  --instance-type m5.xlarge --instance-count 3 --use-default-roles
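The create-cluster command above can also be wrapped in a script that checks the release label before calling AWS. The following sketch uses the hypothetical helpers emr_release_at_least_5_28, create_outpost_cluster, and run. It rejects release labels older than emr-5.28.0 (or not in emr-X.Y.Z form) and adds --termination-protected so the cluster starts with termination protection on, in line with the network connectivity recommendations. That flag is a design choice for this sketch, not a requirement: a protected cluster must have protection turned off before it can be terminated.

```shell
#!/bin/sh
# Prints the command when DRY_RUN=1 (display only; quoting is not preserved),
# otherwise executes it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Exit status 0 if an "emr-X.Y.Z" release label is 5.28.0 or later.
emr_release_at_least_5_28() {
  local v major rest minor
  v=${1#emr-}
  case "$v" in *.*) ;; *) return 1 ;; esac
  major=${v%%.*}
  rest=${v#*.}
  minor=${rest%%.*}
  case "$major" in ''|*[!0-9]*) return 1 ;; esac
  case "$minor" in ''|*[!0-9]*) return 1 ;; esac
  [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 28 ]; }
}

# create_outpost_cluster NAME RELEASE SUBNET_ID KEY_NAME INSTANCE_TYPE COUNT
create_outpost_cluster() {
  if [ "$#" -ne 6 ]; then
    echo "usage: create_outpost_cluster NAME RELEASE SUBNET_ID KEY_NAME INSTANCE_TYPE COUNT" >&2
    return 2
  fi
  if ! emr_release_at_least_5_28 "$2"; then
    echo "release $2 is older than emr-5.28.0 or malformed" >&2
    return 1
  fi
  run aws emr create-cluster \
    --name "$1" \
    --release-label "$2" \
    --applications Name=Spark \
    --ec2-attributes "KeyName=$4,SubnetId=$3" \
    --instance-type "$5" --instance-count "$6" \
    --use-default-roles \
    --termination-protected
}

# Possible usage (subnet and key pair are the placeholders from the example):
#   DRY_RUN=1 create_outpost_cluster "Outpost cluster" emr-5.36.0 \
#     subnet-22XXXX01 myKey m5.xlarge 3
```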