Use CloudWatch metrics to monitor Elastic Graphics - Amazon Elastic Compute Cloud

Important

Amazon Elastic Graphics reached end of life on January 8, 2024. For workloads that require graphics acceleration, we recommend that you use Amazon EC2 G4ad, G4dn, or G5 instances.

You can monitor your Elastic Graphics accelerator using Amazon CloudWatch, which collects metrics about the accelerator's performance. These statistics are retained for two weeks, so that you can review historical data and see how your accelerator performs over time.

By default, Elastic Graphics accelerators send metric data to CloudWatch in 5-minute periods.

For more information about Amazon CloudWatch, see the Amazon CloudWatch User Guide.

Elastic Graphics metrics

The AWS/ElasticGPUs namespace includes the following metrics for Elastic Graphics.

Metric: GPUConnectivityCheckFailed
Description: Reports whether connectivity to the Elastic Graphics accelerator is active or has failed. A value of zero (0) indicates that the connection is active. A value of one (1) indicates a connectivity failure.
Units: Count

Metric: GPUHealthCheckFailed
Description: Reports whether the Elastic Graphics accelerator has passed a status health check in the last minute. A value of zero (0) indicates that the status check passed. A value of one (1) indicates a status check failure.
Units: Count

Metric: GPUMemoryUtilization
Description: The GPU memory used.
Units: MiB
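
As a minimal sketch of how one of these metrics could be retrieved, the following shell function wraps the get-metric-statistics command for GPUMemoryUtilization, using a 300-second period to match the default 5-minute reporting interval. The function name, the AWS_CMD override (set it to echo to preview the command without calling CloudWatch), and the accelerator ID and timestamps in the usage comment are illustrative assumptions, not values from this guide.

```shell
# Fetch average and peak GPU memory use for one accelerator.
# AWS_CMD is a hypothetical override; it defaults to the real CLI.
get_egpu_memory_usage() {
    # $1: accelerator ID, $2: start time, $3: end time (ISO 8601, UTC)
    "${AWS_CMD:-aws}" cloudwatch get-metric-statistics \
        --namespace "AWS/ElasticGPUs" \
        --metric-name GPUMemoryUtilization \
        --dimensions "Name=EGPUId,Value=$1" \
        --start-time "$2" \
        --end-time "$3" \
        --period 300 \
        --statistics Average Maximum
}

# Usage (placeholder ID and times):
#   get_egpu_memory_usage egpu-0123456789abcdef0 \
#       2023-06-01T00:00:00Z 2023-06-01T01:00:00Z
```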

Elastic Graphics dimensions

You can filter the metrics data for your Elastic Graphics accelerators using the following dimensions.

Dimension    Description
EGPUId       Filters the data by the Elastic Graphics accelerator.
InstanceId   Filters the data by the instance to which the Elastic Graphics accelerator is attached.
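
The following sketch shows one way these dimensions might be used to narrow the output of the list-metrics command. The function name and the AWS_CMD override (set it to echo to print the command instead of running it) are assumptions made for illustration, and the IDs in the usage comments are placeholders.

```shell
# List Elastic Graphics metrics that match a single dimension.
# AWS_CMD is a hypothetical override; it defaults to the real CLI.
list_egpu_metrics() {
    # $1: dimension name (EGPUId or InstanceId), $2: dimension value
    "${AWS_CMD:-aws}" cloudwatch list-metrics \
        --namespace "AWS/ElasticGPUs" \
        --dimensions "Name=$1,Value=$2"
}

# Usage (placeholder IDs):
#   list_egpu_metrics EGPUId egpu-0123456789abcdef0
#   list_egpu_metrics InstanceId i-0123456789abcdef0
```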

View CloudWatch metrics for Elastic Graphics

Metrics are grouped first by the service namespace, and then by the supported dimensions. You can use the following procedures to view the metrics for your Elastic Graphics accelerators.

To view Elastic Graphics metrics using the CloudWatch console
  1. Open the CloudWatch console at https://console.amazonaws.cn/cloudwatch/.

  2. If necessary, change the Region. From the navigation bar, select the Region where your Elastic Graphics accelerator resides. For more information, see Regions and Endpoints.

  3. In the navigation pane, choose Metrics.

  4. For All metrics, select Elastic Graphics, Elastic Graphics Metrics.

To view Elastic Graphics metrics using the Amazon CLI

Use the following list-metrics command:

aws cloudwatch list-metrics --namespace "AWS/ElasticGPUs"

Create CloudWatch alarms to monitor Elastic Graphics

You can create a CloudWatch alarm that sends an Amazon SNS message when the alarm changes state. An alarm watches a single metric over a time period you specify, and sends a notification to an Amazon SNS topic based on the value of the metric relative to a given threshold over a number of time periods.

For example, you can create an alarm that monitors the health of an Elastic Graphics accelerator and sends a notification when the graphics accelerator fails a status health check for three consecutive 5-minute periods.
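
To illustrate what "three consecutive periods" means, here is a simplified sketch of the threshold check in plain shell. It is not how CloudWatch is implemented: it ignores missing data and other evaluation details, and the function name should_alarm is hypothetical. It only shows that the alarm condition holds when the most recent datapoints all meet the threshold.

```shell
# Succeed (exit status 0) when the most recent datapoints breach the
# threshold for the required number of consecutive periods.
# Arguments: threshold, required periods, then datapoints oldest to newest.
# The body runs in a subshell so its variables do not leak to the caller.
should_alarm() (
    threshold=$1
    needed=$2
    shift 2
    streak=0
    for value in "$@"; do
        if [ "$value" -ge "$threshold" ]; then
            streak=$((streak + 1))   # another breaching period in a row
        else
            streak=0                 # a passing period resets the run
        fi
    done
    [ "$streak" -ge "$needed" ]
)

# Usage with GPUHealthCheckFailed values (0 = passed, 1 = failed):
#   should_alarm 1 3 0 1 1 1 && echo "ALARM"
```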

To create an alarm for an Elastic Graphics accelerator health status
  1. Open the CloudWatch console at https://console.amazonaws.cn/cloudwatch/.

  2. In the navigation pane, choose Alarms, Create Alarm.

  3. Choose Select metric, Elastic Graphics, Elastic Graphics Metrics.

  4. Select the GPUHealthCheckFailed metric and choose Select metric.

  5. Configure the alarm as follows:

    1. For Alarm details, type a name and description for your alarm. For Whenever, choose >= and type 1.

    2. For Actions, select an existing notification list or choose New list.

    3. Choose Create Alarm.
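
A comparable alarm could also be created from the command line. The sketch below wraps the put-metric-alarm command so that the alarm fires when GPUHealthCheckFailed is at least 1 for three consecutive 5-minute periods. The function name, the alarm name format, the choice of the Maximum statistic, and the AWS_CMD override (set it to echo to preview the command) are assumptions for illustration. The accelerator ID and SNS topic ARN in the usage comment are placeholders.

```shell
# Create a health alarm for one Elastic Graphics accelerator.
# AWS_CMD is a hypothetical override; it defaults to the real CLI.
create_egpu_health_alarm() {
    # $1: accelerator ID, $2: ARN of an existing Amazon SNS topic
    "${AWS_CMD:-aws}" cloudwatch put-metric-alarm \
        --alarm-name "egpu-health-$1" \
        --alarm-description "Elastic Graphics accelerator $1 failed its health check" \
        --namespace "AWS/ElasticGPUs" \
        --metric-name GPUHealthCheckFailed \
        --dimensions "Name=EGPUId,Value=$1" \
        --statistic Maximum \
        --period 300 \
        --evaluation-periods 3 \
        --threshold 1 \
        --comparison-operator GreaterThanOrEqualToThreshold \
        --alarm-actions "$2"
}

# Usage (placeholder ID and topic ARN):
#   create_egpu_health_alarm egpu-0123456789abcdef0 \
#       arn:aws-cn:sns:cn-north-1:123456789012:my-topic
```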