Monitoring Amazon Glue using Amazon CloudWatch metrics
You can profile and monitor Amazon Glue operations using the Amazon Glue job profiler. It collects and processes raw data from Amazon Glue jobs into readable, near real-time metrics stored in Amazon CloudWatch. These statistics are retained and aggregated in CloudWatch so that you can access historical information for a better perspective on how your application is performing.
Note
You may incur additional charges when you enable job metrics and CloudWatch custom metrics are created.
For more information, see Amazon CloudWatch pricing.
Amazon Glue metrics overview
When you interact with Amazon Glue, it sends metrics to CloudWatch. You can view these metrics using the Amazon Glue console (the preferred method), the CloudWatch console dashboard, or the Amazon Command Line Interface (Amazon CLI).
To view metrics using the Amazon Glue console dashboard
You can view summary or detailed graphs of metrics for a job, or detailed graphs for a job run.
- Sign in to the Amazon Web Services Management Console and open the Amazon Glue console at https://console.amazonaws.cn/glue/.
- In the navigation pane, choose Job run monitoring.
- In Job runs, choose Actions to stop a job that is currently running, view a job, or rewind job bookmark.
- Select a job, then choose View run details to view additional information about the job run.
To view metrics using the CloudWatch console dashboard
Metrics are grouped first by the service namespace, and then by the various dimension combinations within each namespace.
- Open the CloudWatch console at https://console.amazonaws.cn/cloudwatch/.
- In the navigation pane, choose Metrics.
- Choose the Glue namespace.
To view metrics using the Amazon CLI
- At a command prompt, use the following command.
aws cloudwatch list-metrics --namespace Glue
Amazon Glue reports metrics to CloudWatch every 30 seconds, and the CloudWatch metrics dashboards are configured to display them every minute. The Amazon Glue metrics represent delta values from the previously reported values. Where appropriate, metrics dashboards aggregate (sum) the 30-second values to obtain a value for the entire last minute.
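The aggregation described above can be sketched in a few lines. This is a minimal illustration, not the dashboard's actual implementation: the function name `sum_deltas_per_minute` and the sample values are hypothetical, and the only behavior taken from the text is that 30-second delta values are summed to obtain a value for each minute.

```python
from collections import defaultdict
from datetime import datetime, timezone


def sum_deltas_per_minute(points):
    """Sum delta data points into one total per calendar minute.

    `points` is an iterable of (timestamp, delta) pairs, where each delta
    is the change since the previously reported value.
    """
    totals = defaultdict(float)
    for timestamp, delta in points:
        # Truncate the timestamp to the start of its minute.
        minute = timestamp.replace(second=0, microsecond=0)
        totals[minute] += delta
    # Return a plain dict ordered by minute.
    return dict(sorted(totals.items()))


# Hypothetical 30-second samples of a delta metric (for example, bytes read).
samples = [
    (datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc), 100.0),
    (datetime(2024, 1, 1, 12, 0, 30, tzinfo=timezone.utc), 250.0),
    (datetime(2024, 1, 1, 12, 1, 0, tzinfo=timezone.utc), 50.0),
]
per_minute = sum_deltas_per_minute(samples)
```

Because each data point is a delta, summing is the appropriate aggregation here; averaging the two samples in a minute would understate the total for that minute by half.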
Amazon Glue metrics behavior for Spark jobs
Amazon Glue metrics are enabled at initialization of a GlueContext in a script and are generally updated only at the end of an Apache Spark task. They represent the aggregate values across all completed Spark tasks so far.
However, the Spark metrics that Amazon Glue passes on to CloudWatch are generally absolute values representing the current state at the time they are reported. Amazon Glue reports them to CloudWatch every 30 seconds, and the metrics dashboards generally show the average across the data points received in the last 1 minute.
Amazon Glue metric names are all preceded by one of the following types of prefix:
- glue.driver. – Metrics whose names begin with this prefix either represent Amazon Glue metrics that are aggregated from all executors at the Spark driver, or Spark metrics corresponding to the Spark driver.
- glue.executorId. – The executorId is the number of a specific Spark executor. It corresponds with the executors listed in the logs.
- glue.ALL. – Metrics whose names begin with this prefix aggregate values from all Spark executors.
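As a rough sketch of how the three prefix types could be told apart in a script, the helper below splits a metric name into its scope and the remainder. The function name `parse_metric_prefix` and the sample suffix `jvm.heap.usage` are illustrative assumptions; only the `glue.driver.`, `glue.executorId.`, and `glue.ALL.` prefixes come from the list above.

```python
def parse_metric_prefix(name):
    """Split a Glue metric name into (scope, executor_id, remainder).

    scope is "driver", "all_executors", or "executor"; executor_id is an
    int for a specific executor and None otherwise.
    """
    parts = name.split(".", 2)
    if len(parts) < 3 or parts[0] != "glue":
        raise ValueError(f"not a Glue metric name: {name!r}")
    _, scope, remainder = parts
    if scope == "driver":
        return ("driver", None, remainder)
    if scope == "ALL":
        return ("all_executors", None, remainder)
    if scope.isdigit():
        # The executorId is the number of a specific Spark executor.
        return ("executor", int(scope), remainder)
    raise ValueError(f"unrecognized prefix in metric name: {name!r}")


# Illustrative names; the suffix is an assumption for the example.
driver = parse_metric_prefix("glue.driver.jvm.heap.usage")
executor = parse_metric_prefix("glue.3.jvm.heap.usage")
everyone = parse_metric_prefix("glue.ALL.jvm.heap.usage")
```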
Amazon Glue metrics
Amazon Glue profiles and sends the following metrics to CloudWatch every 30 seconds, and the Amazon Glue Metrics Dashboard reports them once a minute:
Metric | Description |
---|---|
|
The number of bytes read from all data sources by all completed Spark tasks running in all executors. Valid dimensions: Valid Statistics: SUM. This metric is a delta value from the last reported value, so on the Amazon Glue Metrics Dashboard, a SUM statistic is used for aggregation. Unit: Bytes Can be used to monitor:
This metric can be used the same way as the |
|
The ETL elapsed time in milliseconds (does not include the job bootstrap times). Valid dimensions: Valid Statistics: SUM. This metric is a delta value from the last reported value, so on the Amazon Glue Metrics Dashboard, a SUM statistic is used for aggregation. Unit: Milliseconds Can be used to determine how long it takes a job run to run on average. Some ways to use the data:
|
|
The number of completed stages in the job. Valid dimensions: Valid Statistics: SUM. This metric is a delta value from the last reported value, so on the Amazon Glue Metrics Dashboard, a SUM statistic is used for aggregation. Unit: Count Can be used to monitor:
Some ways to use the data:
|
|
The number of completed tasks in the job. Valid dimensions: Valid Statistics: SUM. This metric is a delta value from the last reported value, so on the Amazon Glue Metrics Dashboard, a SUM statistic is used for aggregation. Unit: Count Can be used to monitor:
|
|
The number of failed tasks. Valid dimensions: Valid Statistics: SUM. This metric is a delta value from the last reported value, so on the Amazon Glue Metrics Dashboard, a SUM statistic is used for aggregation. Unit: Count Can be used to monitor:
The data can be used to set alarms for increased failures that might suggest abnormalities in data, cluster or scripts. |
|
The number of tasks killed. Valid dimensions: Valid Statistics: SUM. This metric is a delta value from the last reported value, so on the Amazon Glue Metrics Dashboard, a SUM statistic is used for aggregation. Unit: Count Can be used to monitor:
Some ways to use the data:
|
|
The number of records read from all data sources by all completed Spark tasks running in all executors. Valid dimensions: Valid Statistics: SUM. This metric is a delta value from the last reported value, so on the Amazon Glue Metrics Dashboard, a SUM statistic is used for aggregation. Unit: Count Can be used to monitor:
This metric can be used in a similar way to the |
|
The number of bytes written by all executors to shuffle data between them since the previous report (aggregated by the Amazon Glue Metrics Dashboard as the number of bytes written for this purpose during the previous minute). Valid dimensions: Valid Statistics: SUM. This metric is a delta value from the last reported value, so on the Amazon Glue Metrics Dashboard, a SUM statistic is used for aggregation. Unit: Bytes Can be used to monitor: Data shuffle in jobs (large joins, groupBy, repartition, coalesce). Some ways to use the data:
|
|
The number of bytes read by all executors to shuffle data between them since the previous report (aggregated by the Amazon Glue Metrics Dashboard as the number of bytes read for this purpose during the previous minute). Valid dimensions: Valid Statistics: SUM. This metric is a delta value from the last reported value, so on the Amazon Glue Metrics Dashboard, a SUM statistic is used for aggregation. Unit: Bytes Can be used to monitor: Data shuffle in jobs (large joins, groupBy, repartition, coalesce). Some ways to use the data:
|
|
The number of megabytes of disk space used across all executors. Valid dimensions: Valid Statistics: Average. This is a Spark metric, reported as an absolute value. Unit: Megabytes Can be used to monitor:
Some ways to use the data:
|
|
The number of actively running job executors. Valid dimensions: Valid Statistics: Average. This is a Spark metric, reported as an absolute value. Unit: Count Can be used to monitor:
Some ways to use the data:
|
|
The maximum number of (actively running and pending) job executors needed to satisfy the current load. Valid dimensions: Valid Statistics: Maximum. This is a Spark metric, reported as an absolute value. Unit: Count Can be used to monitor:
Some ways to use the data:
|
|
The fraction of memory used by the JVM heap (scale: 0-1) for the driver, an executor identified by executorId, or ALL executors. Valid dimensions: Valid Statistics: Average. This is a Spark metric, reported as an absolute value. Unit: Percentage Can be used to monitor:
Some ways to use the data:
|
|
The number of memory bytes used by the JVM heap for the driver, the executor identified by executorId, or ALL executors. Valid dimensions: Valid Statistics: Average. This is a Spark metric, reported as an absolute value. Unit: Bytes Can be used to monitor:
Some ways to use the data:
|
|
The number of bytes read from Amazon S3 by the driver, an executor identified by executorId, or ALL executors since the previous report (aggregated by the Amazon Glue Metrics Dashboard as the number of bytes read during the previous minute). Valid dimensions: Valid Statistics: SUM. This metric is a delta value from the last reported value, so on the Amazon Glue Metrics Dashboard a SUM statistic is used for aggregation. The area under the curve on the Amazon Glue Metrics Dashboard can be used to visually compare bytes read by two different job runs. Unit: Bytes. Can be used to monitor:
Resulting data can be used for:
|
|
The number of bytes written to Amazon S3 by the driver, an executor identified by executorId, or ALL executors since the previous report (aggregated by the Amazon Glue Metrics Dashboard as the number of bytes written during the previous minute). Valid dimensions: Valid Statistics: SUM. This metric is a delta value from the last reported value, so on the Amazon Glue Metrics Dashboard a SUM statistic is used for aggregation. The area under the curve on the Amazon Glue Metrics Dashboard can be used to visually compare bytes written by two different job runs. Unit: Bytes Can be used to monitor:
Some ways to use the data:
|
|
The number of records that are received in a micro-batch. This metric is only available for Amazon Glue streaming jobs with Amazon Glue version 2.0 and above. Valid dimensions: Valid Statistics: Sum, Maximum, Minimum, Average, Percentile Unit: Count Can be used to monitor:
|
|
The time it takes to process the batches in milliseconds. This metric is only available for Amazon Glue streaming jobs with Amazon Glue version 2.0 and above. Valid dimensions: Valid Statistics: Sum, Maximum, Minimum, Average, Percentile Unit: Count Can be used to monitor:
|
|
The fraction of CPU system load used (scale: 0-1) by the driver, an executor identified by executorId, or ALL executors. Valid dimensions: Valid Statistics: Average. This metric is reported as an absolute value. Unit: Percentage Can be used to monitor:
Some ways to use the data:
|
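The table above notes that the failed-task count can be used to set alarms for increased failures. One way to sketch this is to build the parameters for the CloudWatch `PutMetricAlarm` operation. Several values here are assumptions, because the metric and dimension names are not shown in the table as it appears above: the metric name `glue.driver.aggregate.numFailedTasks`, the dimension names `JobName`, `JobRunId`, and `Type`, and the job name, alarm name, and threshold are all illustrative. Verify them against `aws cloudwatch list-metrics --namespace Glue` before use.

```python
def failed_task_alarm_params(job_name, threshold=5):
    """Build keyword arguments for a CloudWatch PutMetricAlarm call.

    The metric and dimension names below are assumptions; confirm them
    with `aws cloudwatch list-metrics --namespace Glue`.
    """
    return {
        "AlarmName": f"glue-{job_name}-failed-tasks",  # hypothetical naming scheme
        "Namespace": "Glue",
        "MetricName": "glue.driver.aggregate.numFailedTasks",  # assumed name
        "Dimensions": [
            {"Name": "JobName", "Value": job_name},
            {"Name": "JobRunId", "Value": "ALL"},
            {"Name": "Type", "Value": "count"},
        ],
        # The metric is a delta value, so SUM is the matching statistic.
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
    }


params = failed_task_alarm_params("my-etl-job")
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.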
Dimensions for Amazon Glue Metrics
Amazon Glue metrics use the Glue namespace and provide metrics for the following dimensions:
Dimension | Description |
---|---|
|
This dimension filters for metrics of all job runs of a specific Amazon Glue job. |
|
This dimension filters for metrics of a specific Amazon Glue job run by a JobRun ID, or |
|
This dimension filters for metrics by either |
For more information, see the Amazon CloudWatch User Guide.
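To show how the namespace and dimensions combine in a query, the following sketch builds the parameters for the CloudWatch `GetMetricStatistics` operation. The dimension names (`JobName`, `JobRunId`, `Type`), the values `ALL` and `gauge`, and the sample metric name are assumptions, since they are not shown in the dimension table as it appears above. The helper name `glue_metric_query` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def glue_metric_query(job_name, metric_name, statistic,
                      job_run_id="ALL", metric_type="gauge",
                      minutes=60, now=None):
    """Build keyword arguments for a CloudWatch GetMetricStatistics call.

    Dimension names and default values are assumptions; check them with
    `aws cloudwatch list-metrics --namespace Glue`.
    """
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "Glue",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "JobName", "Value": job_name},
            {"Name": "JobRunId", "Value": job_run_id},
            {"Name": "Type", "Value": metric_type},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # one data point per minute
        "Statistics": [statistic],
    }


# Fixed timestamp so the example is reproducible.
fixed_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
query = glue_metric_query(
    "my-etl-job",                 # hypothetical job name
    "glue.driver.jvm.heap.usage",  # assumed metric name
    "Average",
    now=fixed_now,
)
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("cloudwatch").get_metric_statistics(**query)`.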