Monitoring EMR Serverless applications and jobs - Amazon EMR

Monitoring EMR Serverless applications and jobs

With Amazon CloudWatch metrics for EMR Serverless, you can receive 1-minute CloudWatch metrics and access CloudWatch dashboards to view near-real-time operations and performance of your EMR Serverless applications.

EMR Serverless sends metrics to CloudWatch every minute. EMR Serverless emits these metrics at the application level as well as the job, worker-type, and capacity-allocation-type levels.

To get started, deploy the EMR Serverless CloudWatch dashboard template provided in the EMR Serverless GitHub repository.

Note

EMR Serverless interactive workloads have only application-level monitoring enabled, and have a new worker type dimension, Spark_Kernel. To monitor and debug your interactive workloads, you can view the logs and Apache Spark UI from within your EMR Studio Workspace.

The following table describes the EMR Serverless dimensions available within the AWS/EMRServerless namespace.

Dimensions for EMR Serverless metrics

| Dimension | Description |
| --- | --- |
| ApplicationId | Filters for all metrics of an EMR Serverless application. |
| JobId | Filters for all metrics of an EMR Serverless job run. |
| WorkerType | Filters for all metrics of a given worker type. For example, you can filter for SPARK_DRIVER and SPARK_EXECUTORS for Spark jobs. |
| CapacityAllocationType | Filters for all metrics of a given capacity allocation type. For example, you can filter for PreInitCapacity for pre-initialized capacity and OnDemandCapacity for everything else. |

Application-level monitoring

You can monitor capacity usage at the EMR Serverless application level with Amazon CloudWatch metrics. You can also set up a single view to monitor application capacity usage in a CloudWatch dashboard.

EMR Serverless application metrics

| Metric | Description | Primary dimension | Secondary dimension |
| --- | --- | --- | --- |
| CPUAllocated | The total number of vCPUs allocated. | ApplicationId | ApplicationId, WorkerType, CapacityAllocationType |
| IdleWorkerCount | The total number of idle workers. | ApplicationId | ApplicationId, WorkerType, CapacityAllocationType |
| MaxCPUAllowed | The maximum CPU allowed for the application. | ApplicationId | N/A |
| MaxMemoryAllowed | The maximum memory in GB allowed for the application. | ApplicationId | N/A |
| MaxStorageAllowed | The maximum storage in GB allowed for the application. | ApplicationId | N/A |
| MemoryAllocated | The total memory in GB allocated. | ApplicationId | ApplicationId, WorkerType, CapacityAllocationType |
| PendingCreationWorkerCount | The total number of workers pending creation. | ApplicationId | ApplicationId, WorkerType, CapacityAllocationType |
| RunningWorkerCount | The total number of workers in use by the application. | ApplicationId | ApplicationId, WorkerType, CapacityAllocationType |
| StorageAllocated | The total disk storage in GB allocated. | ApplicationId | ApplicationId, WorkerType, CapacityAllocationType |
| TotalWorkerCount | The total number of workers available. | ApplicationId | ApplicationId, WorkerType, CapacityAllocationType |
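
As an illustration of how the application-level metrics above could be read programmatically, the following is a minimal sketch that builds a request for the CloudWatch GetMetricStatistics API and, optionally, sends it with the AWS SDK for Python (boto3). The function names, the default 60-minute window, and the choice of the Average and Maximum statistics are assumptions made for this example, not part of the EMR Serverless documentation. It filters by the ApplicationId dimension only.

```python
from datetime import datetime, timedelta, timezone

NAMESPACE = "AWS/EMRServerless"  # namespace named on this page


def build_app_metric_request(application_id, metric_name, minutes=60, now=None):
    """Build keyword arguments for CloudWatch GetMetricStatistics.

    Hypothetical helper: filters one metric by the ApplicationId dimension
    over the last `minutes` minutes at 1-minute resolution.
    """
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": NAMESPACE,
        "MetricName": metric_name,
        "Dimensions": [{"Name": "ApplicationId", "Value": application_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # EMR Serverless emits metrics every minute
        "Statistics": ["Average", "Maximum"],  # assumed choice for this sketch
    }


def fetch_app_metric(application_id, metric_name, minutes=60):
    """Send the request with boto3 and return datapoints sorted by time."""
    import boto3  # imported lazily so the builder works without the SDK

    cloudwatch = boto3.client("cloudwatch")
    request = build_app_metric_request(application_id, metric_name, minutes)
    response = cloudwatch.get_metric_statistics(**request)
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

For example, `fetch_app_metric("<application-id>", "RunningWorkerCount")` would return the last hour of datapoints, provided that credentials and a Region are configured for boto3.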

Job-level monitoring

Amazon EMR Serverless sends the following job-level metrics to Amazon CloudWatch every minute. You can view aggregate metric values for job runs by job run state. The unit for each metric is count.

EMR Serverless job-level metrics

| Metric | Description | Primary dimension |
| --- | --- | --- |
| SubmittedJobs | The number of jobs in a Submitted state. | ApplicationId |
| PendingJobs | The number of jobs in a Pending state. | ApplicationId |
| ScheduledJobs | The number of jobs in a Scheduled state. | ApplicationId |
| RunningJobs | The number of jobs in a Running state. | ApplicationId |
| SuccessJobs | The number of jobs in a Success state. | ApplicationId |
| FailedJobs | The number of jobs in a Failed state. | ApplicationId |
| CancellingJobs | The number of jobs in a Cancelling state. | ApplicationId |
| CancelledJobs | The number of jobs in a Cancelled state. | ApplicationId |
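
One possible use of the job-level metrics above is a CloudWatch alarm on failed jobs. The following sketch builds the parameters for the CloudWatch PutMetricAlarm API for the FailedJobs metric. The alarm name, the threshold of zero, the Maximum statistic, and the treatment of missing data are assumptions chosen for illustration. Adjust them to how you want the alarm to behave.

```python
def build_failed_jobs_alarm(application_id, threshold=0, evaluation_periods=1):
    """Build keyword arguments for CloudWatch PutMetricAlarm.

    Hypothetical helper: the alarm fires when the FailedJobs count for
    the application exceeds `threshold` in a 1-minute period.
    """
    return {
        "AlarmName": f"emr-serverless-failed-jobs-{application_id}",  # assumed naming
        "Namespace": "AWS/EMRServerless",
        "MetricName": "FailedJobs",
        "Dimensions": [{"Name": "ApplicationId", "Value": application_id}],
        "Statistic": "Maximum",
        "Period": 60,  # job-level metrics arrive every minute
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # assumed: no data means no failures
    }


def create_failed_jobs_alarm(application_id):
    """Create or update the alarm with boto3."""
    import boto3  # imported lazily so the builder works without the SDK

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**build_failed_jobs_alarm(application_id))
```

Because the alarm definition has no alarm actions, it only changes state. To send notifications, you could add an `AlarmActions` entry with the ARN of an Amazon SNS topic.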

You can monitor engine-specific metrics for both running and completed EMR Serverless jobs with engine-specific application UIs. When you view the UI for a running job, you see the live application UI with real-time updates. When you view the UI for a completed job, you see the persistent app UI.

Running jobs

For your running EMR Serverless jobs, you can view a real-time interface that provides engine-specific metrics. You can use either the Apache Spark UI or the Hive Tez UI to monitor and debug your jobs. To access these UIs, use the EMR Studio console or request a secure URL endpoint with the Amazon Command Line Interface.

Completed jobs

For your completed EMR Serverless jobs, you can use the Spark History Server or the Persistent Hive Tez UI to view job details, stages, tasks, and metrics for Spark or Hive job runs. To access these UIs, use the EMR Studio console, or request a secure URL endpoint with the Amazon Command Line Interface.

Job worker-level monitoring

Amazon EMR Serverless sends the following job worker-level metrics to Amazon CloudWatch, where they are available in the AWS/EMRServerless namespace under the Job Worker Metrics metric group. EMR Serverless collects data points from individual workers during job runs at the job, worker-type, and capacity-allocation-type levels. You can use ApplicationId as a dimension to monitor multiple jobs that belong to the same application.

EMR Serverless job worker-level metrics

| Metric | Description | Unit | Primary dimension | Secondary dimension |
| --- | --- | --- | --- | --- |
| WorkerCpuAllocated | The total number of vCPU cores allocated for workers in a job run. | None | JobId | ApplicationId, WorkerType, and CapacityAllocationType |
| WorkerCpuUsed | The total number of vCPU cores utilized by workers in a job run. | None | JobId | ApplicationId, WorkerType, and CapacityAllocationType |
| WorkerMemoryAllocated | The total memory in GB allocated for workers in a job run. | Gigabytes (GB) | JobId | ApplicationId, WorkerType, and CapacityAllocationType |
| WorkerMemoryUsed | The total memory in GB utilized by workers in a job run. | Gigabytes (GB) | JobId | ApplicationId, WorkerType, and CapacityAllocationType |
| WorkerEphemeralStorageAllocated | The amount of ephemeral storage allocated for workers in a job run. | Gigabytes (GB) | JobId | ApplicationId, WorkerType, and CapacityAllocationType |
| WorkerEphemeralStorageUsed | The amount of ephemeral storage used by workers in a job run. | Gigabytes (GB) | JobId | ApplicationId, WorkerType, and CapacityAllocationType |
| WorkerStorageReadBytes | The number of bytes read from storage by workers in a job run. | Bytes | JobId | ApplicationId, WorkerType, and CapacityAllocationType |
| WorkerStorageWriteBytes | The number of bytes written to storage by workers in a job run. | Bytes | JobId | ApplicationId, WorkerType, and CapacityAllocationType |
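
The allocated and used metrics in the table above can be combined with CloudWatch metric math. The following sketch builds a query list for the CloudWatch GetMetricData API that derives a CPU utilization percentage from WorkerCpuUsed and WorkerCpuAllocated. The helper names and the expression are assumptions for this example. CloudWatch matches dimension sets exactly, so the dimensions that you pass must correspond to a combination that EMR Serverless actually publishes. This page does not list those combinations, so check the Job Worker Metrics group in the CloudWatch console first.

```python
def _metric_stat(metric_name, dimensions, stat="Average", period=60):
    """Describe one AWS/EMRServerless metric for a GetMetricData query."""
    return {
        "Metric": {
            "Namespace": "AWS/EMRServerless",
            "MetricName": metric_name,
            "Dimensions": [
                {"Name": name, "Value": value} for name, value in dimensions.items()
            ],
        },
        "Period": period,
        "Stat": stat,
    }


def build_cpu_utilization_queries(dimensions):
    """Build MetricDataQueries that compute 100 * used / allocated.

    `dimensions` is a dict such as {"JobId": "...", "ApplicationId": "..."}.
    Which combination is valid is an assumption to verify in CloudWatch.
    """
    return [
        {
            "Id": "used",
            "MetricStat": _metric_stat("WorkerCpuUsed", dimensions),
            "ReturnData": False,  # input to the expression only
        },
        {
            "Id": "allocated",
            "MetricStat": _metric_stat("WorkerCpuAllocated", dimensions),
            "ReturnData": False,
        },
        {
            "Id": "utilization",
            "Expression": "100 * used / allocated",
            "Label": "CPU utilization (%)",
            "ReturnData": True,
        },
    ]


def fetch_cpu_utilization(dimensions, start_time, end_time):
    """Run the queries with boto3 and return the utilization series."""
    import boto3  # imported lazily so the builders work without the SDK

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_data(
        MetricDataQueries=build_cpu_utilization_queries(dimensions),
        StartTime=start_time,
        EndTime=end_time,
    )
    return response["MetricDataResults"][0]
```

The same pattern would apply to memory or ephemeral storage by swapping in the corresponding allocated and used metric names.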

The steps below describe how to view the various types of metrics.

Console
To access your application UI with the console
  1. Navigate to your EMR Serverless application in EMR Studio with the instructions in Getting started from the console.

  2. To view engine-specific application UIs and logs for a running job:

    1. Choose a job with a RUNNING status.

    2. Select the job on the Application details page, or navigate to the Job details page for your job.

    3. Under the Display UI dropdown menu, choose either Spark UI or Hive Tez UI to navigate to the application UI for your job type.

    4. To view Spark engine logs, navigate to the Executors tab in the Spark UI, and choose the Logs link for the driver. To view Hive engine logs, choose the Logs link for the appropriate DAG in the Hive Tez UI.

  3. To view engine-specific application UIs and logs for a completed job:

    1. Choose a job with a SUCCESS status.

    2. Select the job on your application's Application details page or navigate to the job's Job details page.

    3. Under the Display UI dropdown menu, choose either Spark History Server or Persistent Hive Tez UI to navigate to the application UI for your job type.

    4. To view Spark engine logs, navigate to the Executors tab in the Spark UI, and choose the Logs link for the driver. To view Hive engine logs, choose the Logs link for the appropriate DAG in the Hive Tez UI.

Amazon CLI
To access your application UI with the Amazon CLI
  • To generate a URL that you can use to access your application UI for both running and completed jobs, call the GetDashboardForJobRun API.

    aws emr-serverless get-dashboard-for-job-run \
        --application-id <application-id> \
        --job-run-id <job-id>

    The URL that you generate is valid for one hour.
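
The same GetDashboardForJobRun call can be made from code. The following is a minimal sketch that assumes the AWS SDK for Python (boto3), whose `emr-serverless` client exposes `get_dashboard_for_job_run`. The wrapper function name is hypothetical, and the client is passed in as an argument so that the function can be exercised without AWS credentials.

```python
def get_dashboard_url(client, application_id, job_run_id):
    """Return the application UI URL for a running or completed job run.

    `client` is expected to behave like boto3.client("emr-serverless").
    """
    response = client.get_dashboard_for_job_run(
        applicationId=application_id,
        jobRunId=job_run_id,
    )
    return response["url"]
```

With boto3 installed and credentials configured, you could call it as `get_dashboard_url(boto3.client("emr-serverless"), "<application-id>", "<job-id>")`. Because the URL expires after one hour, request a new one instead of storing it.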