
How the components work together

The following steps and diagram illustrate the Amazon EMR on EKS workflow:

  1. Use an existing Amazon EKS cluster or create one by using the eksctl command line utility or the Amazon EKS console.

  2. Create a virtual cluster by registering Amazon EMR with a namespace on the EKS cluster (a sketch of this step follows the figure below).

  3. Submit your job to the virtual cluster using the Amazon CLI or SDK.


    Figure: Amazon EMR on EKS jobs
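
The registration in step 2 can be done with the Amazon CLI or an SDK. The following is a minimal sketch using the SDK for Python (Boto3); the Region, cluster name, and namespace are placeholder values, and it assumes the EKS cluster from step 1 already exists and allows Amazon EMR on EKS access to the namespace.

    import boto3

    # Placeholder Region and names; replace with your own values.
    emr_containers = boto3.client("emr-containers", region_name="cn-northwest-1")

    # Step 2: register Amazon EMR with a namespace on the EKS cluster,
    # which creates a virtual cluster.
    response = emr_containers.create_virtual_cluster(
        name="my-virtual-cluster",                 # placeholder virtual cluster name
        containerProvider={
            "type": "EKS",
            "id": "my-eks-cluster",                # placeholder EKS cluster name
            "info": {"eksInfo": {"namespace": "emr-ns"}},  # placeholder namespace
        },
    )
    virtual_cluster_id = response["id"]
    print("Virtual cluster ID:", virtual_cluster_id)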

Registering Amazon EMR with a Kubernetes namespace on Amazon EKS creates a virtual cluster. Amazon EMR can then run analytics workloads in that namespace. When you use Amazon EMR on EKS to submit Spark jobs to the virtual cluster, Amazon EMR on EKS requests the Kubernetes scheduler on Amazon EKS to schedule pods.
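
For example, a job submission through the SDK for Python (Boto3) might look like the following sketch. The virtual cluster ID, execution role ARN, release label, and entry point are placeholder values; the execution role must already be set up for use with Amazon EMR on EKS.

    import boto3

    emr_containers = boto3.client("emr-containers", region_name="cn-northwest-1")  # placeholder Region

    # Step 3: submit a Spark job to the virtual cluster. Amazon EMR on EKS then
    # asks the Kubernetes scheduler on Amazon EKS to schedule the driver and
    # executor pods in the registered namespace.
    response = emr_containers.start_job_run(
        name="my-spark-job",                       # placeholder job name
        virtualClusterId="abcdef1234567890",       # placeholder virtual cluster ID
        executionRoleArn="arn:aws-cn:iam::111122223333:role/EMRContainersJobExecutionRole",  # placeholder
        releaseLabel="emr-6.15.0-latest",          # placeholder Amazon EMR release label
        jobDriver={
            "sparkSubmitJobDriver": {
                "entryPoint": "s3://amzn-s3-demo-bucket/scripts/my_spark_app.py",  # placeholder script
                "sparkSubmitParameters": "--conf spark.executor.instances=2",
            }
        },
    )
    print("Job run ID:", response["id"])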

For each job that you run, Amazon EMR on EKS creates a container with an Amazon Linux 2 base image, Apache Spark, and associated dependencies. Each job runs in a pod that downloads the container image and runs it. The pod terminates after the job terminates. If the container image has already been deployed to the node, a cached image is used and the download is bypassed. Sidecar containers, such as log or metric forwarders, can be deployed to the pod. After the job terminates, you can still debug it using the Spark application UI in the Amazon EMR console.
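
Because the job run record outlives the pod, you can also inspect a finished job programmatically, for example with the SDK for Python (Boto3); the IDs below are placeholders.

    import boto3

    emr_containers = boto3.client("emr-containers", region_name="cn-northwest-1")  # placeholder Region

    # The pod is gone after the job terminates, but the job run record remains,
    # and the Spark application UI stays available in the Amazon EMR console.
    response = emr_containers.describe_job_run(
        virtualClusterId="abcdef1234567890",   # placeholder virtual cluster ID
        id="jr-0123456789abcdef",              # placeholder job run ID from start_job_run
    )
    print(response["jobRun"]["state"])         # for example, COMPLETED or FAILED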