Container, queue, and database metrics for Amazon MWAA
In addition to Apache Airflow metrics, you can monitor the underlying components of your Amazon Managed Workflows for Apache Airflow environments using CloudWatch, which collects raw data and processes data into readable, near real-time metrics. With these environment metrics, you will have greater visibility into key performance indicators to help you appropriately size your environments and debug issues with your workflows. These metrics apply to all supported Apache Airflow versions on Amazon MWAA.
Amazon MWAA provides CPU and memory utilization for each Amazon Elastic Container Service (Amazon ECS) container and Amazon Aurora PostgreSQL instance; Amazon Simple Queue Service (Amazon SQS) metrics for the number of messages and the age of the oldest message; Amazon Relational Database Service (Amazon RDS) metrics for database connections, disk queue depth, write operations, latency, and throughput; and Amazon RDS Proxy metrics. These metrics also include the number of base workers, additional workers, schedulers, and web servers.
These statistics are kept for 15 months, so that you can access historical information and gain a better perspective on why a schedule is failing, and troubleshoot underlying issues. You can also set alarms that watch for certain thresholds, and send notifications or take actions when those thresholds are met. For more information, see the Amazon CloudWatch User Guide.
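As a minimal sketch of the alarm setup mentioned above, the following Python builds the parameters for a CloudWatch alarm on scheduler CPU utilization. The environment name, the threshold, the alarm name, and the dimension names and values (`Environment`, `Cluster`, `Scheduler`) are assumptions for illustration; confirm the dimensions your environment actually publishes in the CloudWatch console before using them.

```python
def build_cpu_alarm(environment_name, threshold_percent=90.0):
    """Return keyword arguments for the CloudWatch PutMetricAlarm API."""
    return {
        "AlarmName": f"mwaa-{environment_name}-scheduler-cpu-high",  # hypothetical name
        "Namespace": "AWS/MWAA",
        "MetricName": "CPUUtilization",
        # Assumed dimension names and values; verify them in the console.
        "Dimensions": [
            {"Name": "Environment", "Value": environment_name},
            {"Name": "Cluster", "Value": "Scheduler"},
        ],
        "Statistic": "Average",
        "Period": 300,            # evaluate 5-minute averages
        "EvaluationPeriods": 3,   # alarm after 3 consecutive breaches
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
    }


def create_alarm(params):
    """Create the alarm. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Calling `create_alarm(build_cpu_alarm("MyEnvironment"))` would create the alarm; adding an `AlarmActions` list with an Amazon SNS topic ARN is one way to send notifications when the threshold is crossed.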
Terms
- Namespace: A namespace is a container for the CloudWatch metrics of an Amazon service. For Amazon MWAA, the namespace is AWS/MWAA.
- CloudWatch metrics: A CloudWatch metric represents a time-ordered set of data points that are specific to CloudWatch.
- Dimension: A dimension is a name/value pair that is part of the identity of a metric.
- Unit: A statistic has a unit of measure. For Amazon MWAA, units include Count.
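To illustrate how these terms fit together, the sketch below models a metric's identity as its namespace, name, and set of dimensions. It is a conceptual model only, not a CloudWatch API; the `Cluster` dimension values `Scheduler` and `BaseWorker` are assumed for the example.

```python
from typing import FrozenSet, NamedTuple, Tuple


class MetricIdentity(NamedTuple):
    """A metric is identified by namespace, name, and its dimensions."""
    namespace: str
    name: str
    dimensions: FrozenSet[Tuple[str, str]]


def metric_identity(namespace, name, **dimensions):
    # frozenset makes the identity independent of dimension order
    return MetricIdentity(namespace, name, frozenset(dimensions.items()))


# Same namespace and metric name, different dimension value (assumed values):
scheduler_cpu = metric_identity("AWS/MWAA", "CPUUtilization", Cluster="Scheduler")
worker_cpu = metric_identity("AWS/MWAA", "CPUUtilization", Cluster="BaseWorker")

# Dimensions are part of the identity, so these are two distinct metrics.
distinct = scheduler_cpu != worker_cpu
```

Because dimensions are part of the identity, a query or alarm has to supply the same dimension set that the metric was published with.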
Dimensions
This section describes the CloudWatch dimensions used to group Amazon MWAA metrics.
Dimension | Description |
---|---|
Cluster | Metrics for the minimum of three Amazon ECS containers that an Amazon MWAA environment uses to run Apache Airflow components: scheduler, worker, and web server. |
Queue | Metrics for the Amazon SQS queues that decouple the scheduler from workers. When workers read the messages, they are considered in-flight and not available for other workers. Messages become available for other workers to read if they are not deleted before the 12-hour visibility timeout. |
Database | Metrics for the Aurora clusters used by Amazon MWAA. This includes metrics for the primary database instance and a read replica to support the read operations. Amazon MWAA publishes database metrics for both READER and WRITER instances. |
Accessing metrics in the CloudWatch console
This section describes how to access your Amazon MWAA metrics in CloudWatch.
To view performance metrics for a dimension
- Open the Metrics page on the CloudWatch console.
- Use the Amazon Region selector to select your region.
- Choose the AWS/MWAA namespace.
- In the All metrics tab, choose a dimension. For example, Cluster.
- Choose a CloudWatch metric for a dimension. For example, NumSchedulers or CPUUtilization. Then, choose Graph all search results.
- Choose the Graphed metrics tab to view performance metrics.
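The console steps above can also be approximated programmatically. The following sketch builds a `ListMetrics` request for the AWS/MWAA namespace and groups the returned metrics by their dimension names. The helper names are hypothetical, and the live call assumes boto3 is installed and credentials are configured.

```python
from collections import defaultdict


def list_metrics_params(metric_name=None, dimension_name=None):
    """Build keyword arguments for the CloudWatch ListMetrics API."""
    params = {"Namespace": "AWS/MWAA"}
    if metric_name:
        params["MetricName"] = metric_name
    if dimension_name:
        # A filter with only a Name matches any value of that dimension.
        params["Dimensions"] = [{"Name": dimension_name}]
    return params


def group_by_dimension_names(metrics):
    """Map each tuple of dimension names to the sorted metric names using it."""
    groups = defaultdict(set)
    for metric in metrics:
        key = tuple(sorted(d["Name"] for d in metric.get("Dimensions", [])))
        groups[key].add(metric["MetricName"])
    return {key: sorted(names) for key, names in groups.items()}


def fetch_metrics(params):
    """Fetch all matching metrics. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without boto3

    paginator = boto3.client("cloudwatch").get_paginator("list_metrics")
    return [m for page in paginator.paginate(**params) for m in page["Metrics"]]
```

For example, `group_by_dimension_names(fetch_metrics(list_metrics_params(dimension_name="Cluster")))` would show which metrics are published under the Cluster dimension in the Region your client is configured for.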
List of metrics
The following tables list the cluster, queue, and database service metrics for Amazon MWAA. To view descriptions for metrics directly emitted from Amazon ECS, Amazon SQS, or Amazon RDS, choose the respective documentation link.
Cluster metrics
The following metrics apply to each scheduler, base worker, additional worker, and web server. For more information and descriptions of each cluster metric, see Available metrics and dimensions in the Amazon ECS Developer Guide.
Namespace | Metric | Unit |
---|---|---|
AWS/MWAA | CPUUtilization | Percent |
AWS/MWAA | MemoryUtilization | Percent |
Evaluating the number of additional worker instances
You can use the component metrics provided under the Cluster dimension, as described in the following procedure, to evaluate the additional workers that an environment is utilizing at a given point in time.
You do this by graphing either the CPUUtilization or the MemoryUtilization metric and setting the statistic type to Sample Count. The resulting value is the total number of RUNNING tasks for the AdditionalWorker component. Understanding the number of additional worker instances utilized by your environment can help you gauge how your environment auto scales and allow you to optimize the number of additional workers.
- Choose the AWS/MWAA namespace.
- In the All metrics tab, choose the Cluster dimension.
- Under the Cluster dimension, for the AdditionalWorker, choose either the CPUUtilization or the MemoryUtilization metric.
- On the Graphed metrics tab, set Period to 1 Minute and Statistic to Sample Count.
For more information, see Service RUNNING task count in the Amazon Elastic Container Service Developer Guide.
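The same evaluation can be sketched with the CloudWatch `GetMetricStatistics` API by requesting the SampleCount statistic at a 60-second period. This is an illustrative sketch, not the documented procedure: the `Environment` and `Cluster` dimension names, the helper names, and the three-hour default window are assumptions, so check the dimensions shown in the console for your environment.

```python
from datetime import datetime, timedelta, timezone


def additional_worker_query(environment_name, hours=3, now=None):
    """Build keyword arguments for the CloudWatch GetMetricStatistics API."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/MWAA",
        "MetricName": "CPUUtilization",
        # Assumed dimension names; verify them in the CloudWatch console.
        "Dimensions": [
            {"Name": "Environment", "Value": environment_name},
            {"Name": "Cluster", "Value": "AdditionalWorker"},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 60,                   # 1 minute, as in the console steps
        "Statistics": ["SampleCount"],  # one sample per RUNNING task
    }


def worker_counts(datapoints):
    """Return (timestamp, count) pairs in chronological order."""
    ordered = sorted(datapoints, key=lambda point: point["Timestamp"])
    return [(point["Timestamp"], int(point["SampleCount"])) for point in ordered]


def fetch_worker_counts(environment_name, hours=3):
    """Query CloudWatch. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without boto3

    response = boto3.client("cloudwatch").get_metric_statistics(
        **additional_worker_query(environment_name, hours)
    )
    return worker_counts(response["Datapoints"])
```

Taking the maximum count over the returned pairs gives a rough view of peak additional worker usage in the window, which can inform the maximum worker setting.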
Database metrics
The following metrics apply to each database instance until it is replaced by an Amazon RDS proxy. For more information and descriptions of the following database metrics, see CloudWatch metrics for Amazon RDS in the Amazon Relational Database Service User Guide.
Namespace | Metric | Unit |
---|---|---|
|
|
Percent |
|
|
Count |
|
|
Count |
|
|
Bytes |
|
|
Count per five minutes |
|
|
Count per second |
|
|
Seconds |
|
|
Bytes per second |
Database metrics for Amazon RDS Proxy (when available)
For more information and descriptions of the following database proxy metrics, see Monitoring Amazon RDS Proxy metrics with CloudWatch in the Amazon Relational Database Service User Guide.
Namespace | Metric | Unit |
---|---|---|
|
|
Count |
|
|
Count |
|
|
Count |
|
|
Percentage |
|
|
Count |
|
|
Count |
|
|
Count |
|
|
Count |
|
|
Count |
|
|
Microseconds |
|
|
Count |
|
|
Microseconds |
Queue metrics
For more information on units and descriptions for the following queue metrics, see Available CloudWatch metrics for Amazon SQS in the Amazon Simple Queue Service Developer Guide.
Namespace | Metric | Unit |
---|---|---|
|
|
Seconds |
|
|
Count |
|
|
Count |