Monitoring Ray jobs with metrics
You can monitor Ray jobs using Amazon Glue Studio and Amazon CloudWatch. CloudWatch collects raw metrics from Amazon Glue with Ray and processes them so that they are available for analysis. These metrics are visualized in the Amazon Glue Studio console, so you can monitor your job as it runs.
For a general overview of how to monitor Amazon Glue, see Monitoring Amazon Glue using Amazon CloudWatch metrics. For a general overview of how to use CloudWatch metrics that are published by Amazon Glue, see Monitoring with Amazon CloudWatch.
Monitoring Ray jobs in the Amazon Glue console
On the details page for a job run, below the Run details section, you can view pre-built aggregated graphs that visualize your available job metrics. Amazon Glue Studio sends job metrics to CloudWatch for every job run. With these, you can build a profile of your cluster and tasks, as well as access detailed information about each node.
For more information about available metrics graphs, see Viewing Amazon CloudWatch metrics for a Ray job run.
Overview of Ray jobs metrics in CloudWatch
We publish Ray metrics when detailed monitoring is enabled in CloudWatch. Metrics are published to the Glue/Ray CloudWatch namespace.
- Instance metrics
We publish metrics about the CPU, memory, and disk utilization of instances assigned to a job. These metrics are identified by features such as ExecutorId, ExecutorType, and host. These metrics are a subset of the standard Linux CloudWatch agent metrics. You can find information about metric names and features in the CloudWatch documentation. For more information, see Metrics collected by the CloudWatch agent.
- Ray cluster metrics
We forward metrics from the Ray processes that run your script to this namespace, then provide those most critical for you. The metrics that are available might differ by Ray version. For more information about which Ray version your job is running, see Amazon Glue versions.
Ray collects metrics at the instance level. It also provides metrics for tasks and the cluster. For more information about Ray's underlying metric strategy, see Metrics in the Ray documentation.
Note
We don't publish Ray metrics to the Glue/Job Metrics/ namespace, which is only used for Amazon Glue ETL jobs.
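To see which metrics a Ray job has published, you can list the contents of the Glue/Ray namespace. The following is a minimal Python sketch, not an official sample. It assumes a boto3 CloudWatch client, whose list_metrics paginator is part of boto3. The helper names group_metrics_by_name and metrics_for_feature are hypothetical and exist only for this illustration. The metric names returned depend on your job and Ray version.

```python
from collections import defaultdict

# Namespace named in this topic for Ray job metrics.
RAY_NAMESPACE = "Glue/Ray"


def group_metrics_by_name(cloudwatch_client, namespace=RAY_NAMESPACE):
    """Return {metric name: [feature dicts]} for every metric in the namespace.

    cloudwatch_client is assumed to behave like boto3.client("cloudwatch").
    """
    grouped = defaultdict(list)
    paginator = cloudwatch_client.get_paginator("list_metrics")
    for page in paginator.paginate(Namespace=namespace):
        for metric in page.get("Metrics", []):
            # Flatten [{"Name": ..., "Value": ...}, ...] into a plain dict.
            features = {d["Name"]: d["Value"] for d in metric.get("Dimensions", [])}
            grouped[metric["MetricName"]].append(features)
    return dict(grouped)


def metrics_for_feature(grouped, feature, value):
    """Return sorted metric names that have the given feature value.

    Hypothetical helper: for example, feature="ExecutorId" narrows the
    listing to metrics reported for a single executor.
    """
    return sorted(
        name
        for name, feature_sets in grouped.items()
        if any(features.get(feature) == value for features in feature_sets)
    )


# Usage (requires boto3 and AWS credentials; not executed here):
#   import boto3
#   grouped = group_metrics_by_name(boto3.client("cloudwatch"))
#   print(metrics_for_feature(grouped, "ExecutorType", "..."))
```

Passing the client in as an argument keeps the sketch independent of credentials and Region configuration, and lets you try the grouping logic against a stub before you run it against a real account.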