Apache Airflow v2 environment metrics in CloudWatch
Apache Airflow v2 is already set up to collect StatsD metrics and send them to CloudWatch for an Amazon MWAA environment.
Terms
- Namespace: A namespace is a container for the CloudWatch metrics of an Amazon service. For Amazon MWAA, the namespace is AmazonMWAA.
- CloudWatch metrics: A CloudWatch metric represents a time-ordered set of data points that are specific to CloudWatch.
- Apache Airflow metrics: The metrics specific to Apache Airflow.
- Dimension: A dimension is a name/value pair that is part of the identity of a metric.
- Unit: A statistic has a unit of measure. For Amazon MWAA, units include Count, Seconds, and Milliseconds, and are set based on the units in the original Airflow metrics.
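To make the namespace term concrete, the following minimal sketch lists the metric descriptors that CloudWatch holds under AmazonMWAA. It assumes boto3 is installed and that AWS credentials and a Region are already configured; the helper names are illustrative, not part of any Amazon MWAA API.

```python
from typing import Any, Dict, List, Optional

NAMESPACE = "AmazonMWAA"  # namespace named in the Terms list


def list_metrics_params(metric_name: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for the CloudWatch ListMetrics operation."""
    params: Dict[str, Any] = {"Namespace": NAMESPACE}
    if metric_name is not None:
        params["MetricName"] = metric_name
    return params


def list_mwaa_metrics(metric_name: Optional[str] = None) -> List[Dict[str, Any]]:
    """Return every metric descriptor CloudWatch reports for the namespace.

    Assumption: boto3 is installed and credentials/Region are configured.
    """
    import boto3  # imported lazily so the pure helper above works without boto3

    client = boto3.client("cloudwatch")
    paginator = client.get_paginator("list_metrics")
    metrics: List[Dict[str, Any]] = []
    for page in paginator.paginate(**list_metrics_params(metric_name)):
        metrics.extend(page["Metrics"])
    return metrics
```

Each returned descriptor carries the metric name and its dimensions, which is a practical way to check which name/value pairs your environment actually publishes.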
Dimensions
This section describes the CloudWatch dimensions used to group Apache Airflow metrics.
Dimension | Description |
---|---|
DAG | Indicates a specific Apache Airflow DAG name. |
DAG Filename | Indicates a specific Apache Airflow DAG file name. |
Function | This dimension is used to improve the grouping of metrics in CloudWatch. |
Job | Indicates an Apache Airflow Job run by the Scheduler. Always has a value of Job. |
Operator | Indicates a specific Apache Airflow operator. |
Pool | Indicates a specific Apache Airflow worker pool. |
Task | Indicates a specific Apache Airflow task. |
HostName | Indicates the hostname for a specific running Apache Airflow process. |
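When querying CloudWatch programmatically, dimensions are passed as a list of Name/Value pairs. The sketch below converts a plain mapping into that shape. The helper name and the DAG and task values are hypothetical; only the dimension names come from the table above.

```python
from typing import Dict, List


def to_cloudwatch_dimensions(pairs: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert {"DAG": "my_dag"} into CloudWatch's [{"Name": ..., "Value": ...}] form."""
    return [{"Name": name, "Value": value} for name, value in pairs.items()]


# Hypothetical DAG and task names, used only for illustration.
dimensions = to_cloudwatch_dimensions({"DAG": "example_dag", "Task": "extract"})
print(dimensions)
# [{'Name': 'DAG', 'Value': 'example_dag'}, {'Name': 'Task', 'Value': 'extract'}]
```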
Accessing metrics in the CloudWatch console
This section describes how to access performance metrics in CloudWatch for a specific DAG.
To view performance metrics for a dimension
- Open the Metrics page on the CloudWatch console.
- Use the Amazon Region selector to select your Region.
- Choose the AmazonMWAA namespace.
- In the All metrics tab, select a dimension. For example, DAG, Environment.
- Choose a CloudWatch metric for a dimension. For example, TaskInstanceSuccesses or TaskInstanceDuration. Choose Graph all search results.
- Choose the Graphed metrics tab to view performance statistics for Apache Airflow metrics, such as DAG, Environment, Task.
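The console steps above have a rough programmatic counterpart in the CloudWatch GetMetricStatistics operation. The following sketch builds a request for TaskInstanceSuccesses. The exact dimension set is an assumption (the Environment dimension and the "All" values are inferred from this page); CloudWatch only returns data when the dimensions match a published metric exactly, so confirm them against your own environment first.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def task_success_query(environment: str, end: datetime, hours: int = 3) -> Dict[str, Any]:
    """Build GetMetricStatistics arguments for TaskInstanceSuccesses."""
    return {
        "Namespace": "AmazonMWAA",
        "MetricName": "TaskInstanceSuccesses",
        # Assumed dimension set; verify it against the metrics your environment publishes.
        "Dimensions": [
            {"Name": "DAG", "Value": "All"},
            {"Name": "Task", "Value": "All"},
            {"Name": "Environment", "Value": environment},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # seconds per data point
        "Statistics": ["Sum"],
    }


def fetch_datapoints(params: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run the query. Assumes boto3 is installed and credentials are configured."""
    import boto3

    response = boto3.client("cloudwatch").get_metric_statistics(**params)
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

A call such as `fetch_datapoints(task_success_query("MyEnvironment", datetime.now(timezone.utc)))` would return the data points in time order, where `MyEnvironment` is a placeholder environment name.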
Apache Airflow metrics available in CloudWatch
This section describes the Apache Airflow metrics and dimensions sent to CloudWatch.
Apache Airflow Counters
The Apache Airflow metrics in this section contain data about Apache Airflow Counters.
CloudWatch metric | Apache Airflow metric | Unit | Dimension |
---|---|---|---|
SLAMissed (Note: Available for Apache Airflow v2.4.3 and above.) | sla_missed | Count | Function, Scheduler |
FailedSLACallback (Note: Available for Apache Airflow v2.4.3 and above.) | sla_callback_notification_failure | Count | Function, Scheduler |
Updates (Note: Available for Apache Airflow v2.6.3 and above.) | dataset.updates | Count | Function, Scheduler |
Orphaned (Note: Available for Apache Airflow v2.6.3 and above.) | dataset.orphaned | Count | Function, Scheduler |
FailedCeleryTaskExecution (Note: Available for Apache Airflow v2.4.3 and above.) | celery.execute_command.failure | Count | Function, Celery |
FilePathQueueUpdateCount (Note: Available for Apache Airflow v2.6.3 and above.) | dag_processing.file_path_queue_update_count | Count | Function, Scheduler |
CriticalSectionBusy | scheduler.critical_section_busy | Count | Function, Scheduler |
DagBagSize | dagbag_size | Count | Function, DAG Processing |
DagCallbackExceptions | dag.callback_exceptions | Count | DAG, All |
FailedSLAEmailAttempts | sla_email_notification_failure | Count | Function, Scheduler |
TaskInstanceFinished | ti.finish.{dag_id}.{task_id}.{state} | Count | DAG, {dag_id}; Task, {task_id}; State, {state} |
JobEnd | {job_name}_end | Count | Job, {job_name} |
JobHeartbeatFailure | {job_name}_heartbeat_failure | Count | Job, {job_name} |
JobStart | {job_name}_start | Count | Job, {job_name} |
ManagerStalls | dag_processing.manager_stalls | Count | Function, DAG Processing |
OperatorFailures | operator_failures_{operator_name} | Count | Operator, {operator_name} |
OperatorSuccesses | operator_successes_{operator_name} | Count | Operator, {operator_name} |
OtherCallbackCount (Note: Available in Apache Airflow v2.6.3 and above.) | dag_processing.other_callback_count | Count | Function, Scheduler |
Processes | dag_processing.processes | Count | Function, DAG Processing |
SchedulerHeartbeat | scheduler_heartbeat | Count | Function, Scheduler |
StartedTaskInstances | ti.start.{dag_id}.{task_id} | Count | DAG, All; Task, All |
SlaCallbackCount | dag_processing.sla_callback_count (Note: Available for Apache Airflow v2.6.3 and above.) | Count | Function, Scheduler |
TasksKilledExternally | scheduler.tasks.killed_externally | Count | Function, Scheduler |
TaskTimeoutError | celery.task_timeout_error | Count | Function, Celery |
TaskInstanceCreatedUsingOperator | task_instance_created-{operator_name} | Count | Operator, {operator_name} |
TaskInstancePreviouslySucceeded | previously_succeeded | Count | DAG, All; Task, All |
TaskInstanceFailures | ti_failures | Count | DAG, All; Task, All |
TaskInstanceSuccesses | ti_successes | Count | DAG, All; Task, All |
TaskRemovedFromDAG | task_removed_from_dag.{dag_id} | Count | DAG, {dag_id} |
TaskRestoredToDAG | task_restored_to_dag.{dag_id} | Count | DAG, {dag_id} |
TriggersSucceeded (Note: Available for Apache Airflow v2.7.2 and above.) | triggers.succeeded | Count | Function, Trigger |
TriggersFailed (Note: Available for Apache Airflow v2.7.2 and above.) | triggers.failed | Count | Function, Trigger |
TriggersBlockedMainThread (Note: Available for Apache Airflow v2.7.2 and above.) | triggers.blocked_main_thread | Count | Function, Trigger |
TriggerHeartbeat (Note: Available for Apache Airflow v2.8.1 and above.) | triggerer_heartbeat | Count | Function, Triggerer |
TaskInstanceCreatedUsingOperator | airflow.task_instance_created_ (Note: Available for Apache Airflow v2.7.2 and above.) | Count | Operator, |
ZombiesKilled | zombies_killed | Count | DAG, All; Task, All |
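The table pairs each templated Apache Airflow metric name with a CloudWatch metric name and a set of dimensions. The sketch below illustrates that correspondence for the TaskInstanceFinished row only. It is not how Amazon MWAA performs the translation internally; it assumes that DAG IDs, task IDs, and states contain no dots, and the function name is hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Pattern for the Airflow metric ti.finish.{dag_id}.{task_id}.{state}.
# Assumption: none of the three placeholders contains a dot.
TI_FINISH = re.compile(
    r"^ti\.finish\.(?P<dag_id>[^.]+)\.(?P<task_id>[^.]+)\.(?P<state>[^.]+)$"
)


def map_ti_finish(statsd_name: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (CloudWatch metric name, dimensions) or None if the name does not match."""
    match = TI_FINISH.match(statsd_name)
    if match is None:
        return None
    dimensions = {
        "DAG": match["dag_id"],
        "Task": match["task_id"],
        "State": match["state"],
    }
    return "TaskInstanceFinished", dimensions


print(map_ti_finish("ti.finish.example_dag.extract.success"))
# ('TaskInstanceFinished', {'DAG': 'example_dag', 'Task': 'extract', 'State': 'success'})
```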
Apache Airflow Gauges
The Apache Airflow metrics in this section contain data about Apache Airflow Gauges.
CloudWatch metric | Apache Airflow metric | Unit | Dimension |
---|---|---|---|
DAGFileRefreshError | dag_file_refresh_error | Count | Function, DAG Processing |
ImportErrors | dag_processing.import_errors | Count | Function, DAG Processing |
ExceptionFailures | smart_sensor_operator.exception_failures | Count | Function, Smart Sensor Operator |
ExecutedTasks | smart_sensor_operator.executed_tasks | Count | Function, Smart Sensor Operator |
InfraFailures | smart_sensor_operator.infra_failures | Count | Function, Smart Sensor Operator |
LoadedTasks | smart_sensor_operator.loaded_tasks | Count | Function, Smart Sensor Operator |
TotalParseTime | dag_processing.total_parse_time | Seconds | Function, DAG Processing |
TriggeredDagRuns (Note: Available in Apache Airflow v2.6.3 and above.) | dataset.triggered_dagruns | Count | Function, Scheduler |
TriggersRunning (Note: Available in Apache Airflow v2.7.2 and above.) | triggers.running. | Count | Function, Trigger; HostName, |
PoolDeferredSlots (Note: Available in Apache Airflow v2.7.2 and above.) | pool.deferred_slots. | Count | Pool, {pool_name} |
DAGFileProcessingLastRunSecondsAgo | dag_processing.last_run.seconds_ago.{dag_filename} | Seconds | DAG Filename, {dag_filename} |
OpenSlots | executor.open_slots | Count | Function, Executor |
OrphanedTasksAdopted | scheduler.orphaned_tasks.adopted | Count | Function, Scheduler |
OrphanedTasksCleared | scheduler.orphaned_tasks.cleared | Count | Function, Scheduler |
PokedExceptions | smart_sensor_operator.poked_exception | Count | Function, Smart Sensor Operator |
PokedSuccess | smart_sensor_operator.poked_success | Count | Function, Smart Sensor Operator |
PokedTasks | smart_sensor_operator.poked_tasks | Count | Function, Smart Sensor Operator |
PoolFailures | pool.open_slots.{pool_name} | Count | Pool, {pool_name} |
PoolStarvingTasks | pool.starving_tasks.{pool_name} | Count | Pool, {pool_name} |
PoolOpenSlots | pool.open_slots.{pool_name} | Count | Pool, {pool_name} |
PoolQueuedSlots | pool.queued_slots.{pool_name} | Count | Pool, {pool_name} |
PoolRunningSlots | pool.running_slots.{pool_name} | Count | Pool, {pool_name} |
ProcessorTimeouts | dag_processing.processor_timeouts | Count | Function, DAG Processing |
QueuedTasks | executor.queued_tasks | Count | Function, Executor |
RunningTasks | executor.running_tasks | Count | Function, Executor |
TasksExecutable | scheduler.tasks.executable | Count | Function, Scheduler |
TasksPending (Note: Does not apply to Apache Airflow v2.2 and above.) | scheduler.tasks.pending | Count | Function, Scheduler |
TasksRunning | scheduler.tasks.running | Count | Function, Scheduler |
TasksStarving | scheduler.tasks.starving | Count | Function, Scheduler |
TasksWithoutDagRun | scheduler.tasks.without_dagrun | Count | Function, Scheduler |
DAGFileProcessingLastNumOfDbQueries (Note: Available in Apache Airflow v2.10.1 and above.) | dag_processing.last_num_of_db_queries.{dag_filename} | Count | DAG Filename, {dag_filename} |
PoolScheduledSlots (Note: Available in Apache Airflow v2.10.1 and above.) | pool.scheduled_slots.{pool_name} | Count | Pool, {pool_name} |
TaskCpuUsage (Note: Available in Apache Airflow v2.10.1 and above.) | cpu.usage.{dag_id}.{task_id} | Percent | DAG, {dag_id}; Task, {task_id} |
TaskMemoryUsage (Note: Available in Apache Airflow v2.10.1 and above.) | mem.usage.{dag_id}.{task_id} | Percent | DAG, {dag_id}; Task, {task_id} |
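Gauges report a point-in-time value, so they are a natural fit for threshold alarms. As one possible use, the sketch below builds arguments for the CloudWatch PutMetricAlarm operation on the QueuedTasks gauge. The alarm name, threshold, period, and the Environment dimension are assumptions chosen for illustration; only the metric name and the Function, Executor dimension come from the table.

```python
from typing import Any, Dict


def queued_tasks_alarm(environment: str, threshold: float) -> Dict[str, Any]:
    """Build PutMetricAlarm arguments for the QueuedTasks gauge."""
    return {
        "AlarmName": f"{environment}-queued-tasks-high",  # hypothetical naming scheme
        "Namespace": "AmazonMWAA",
        "MetricName": "QueuedTasks",
        "Dimensions": [
            {"Name": "Function", "Value": "Executor"},
            # Assumed dimension; verify it against your environment's metrics.
            {"Name": "Environment", "Value": environment},
        ],
        "Statistic": "Maximum",
        "Period": 300,            # evaluate in 5-minute windows
        "EvaluationPeriods": 3,   # alarm after three consecutive breaches
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }


def create_alarm(params: Dict[str, Any]) -> None:
    """Create the alarm. Assumes boto3 is installed and credentials are configured."""
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)
```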
Apache Airflow Timers
The Apache Airflow metrics in this section contain data about Apache Airflow Timers.
CloudWatch metric | Apache Airflow metric | Unit | Dimension |
---|---|---|---|
CollectDBDags | collect_db_dags | Milliseconds | Function, DAG Processing |
CriticalSectionDuration | scheduler.critical_section_duration | Milliseconds | Function, Scheduler |
CriticalSectionQueryDuration (Note: Available for Apache Airflow v2.5.1 and above.) | scheduler.critical_section_query_duration | Milliseconds | Function, Scheduler |
DAGDependencyCheck | dagrun.dependency-check.{dag_id} | Milliseconds | DAG, {dag_id} |
DAGDurationFailed | dagrun.duration.failed.{dag_id} | Milliseconds | DAG, {dag_id} |
DAGDurationSuccess | dagrun.duration.success.{dag_id} | Milliseconds | DAG, {dag_id} |
DAGFileProcessingLastDuration | dag_processing.last_duration.{dag_filename} | Seconds | DAG Filename, {dag_filename} |
DAGScheduleDelay | dagrun.schedule_delay.{dag_id} | Milliseconds | DAG, {dag_id} |
FirstTaskSchedulingDelay | dagrun.{dag_id}.first_task_scheduling_delay | Milliseconds | DAG, {dag_id} |
SchedulerLoopDuration (Note: Available for Apache Airflow v2.5.1 and above.) | scheduler.scheduler_loop_duration | Milliseconds | Function, Scheduler |
TaskInstanceDuration | dag.{dag_id}.{task_id}.duration | Milliseconds | DAG, {dag_id}; Task, {task_id} |
TaskInstanceQueuedDuration | dag. (Note: Available for Apache Airflow v2.7.2 and above.) | Milliseconds | DAG, {dag_id}; Task, {task_id} |
TaskInstanceScheduledDuration (Note: Available for Apache Airflow v2.7.2 and above.) | dag. | Milliseconds | DAG, {dag_id}; Task, {task_id} |
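Most timers in the table are reported in Milliseconds, while DAGFileProcessingLastDuration is in Seconds, so comparing them requires a common unit. The sketch below normalizes a CloudWatch data point to seconds using its Unit field. The helper name is hypothetical, and the sample data point is made up for illustration.

```python
from typing import Any, Dict

# Divisors that convert each supported unit to seconds.
_PER_SECOND = {"Seconds": 1.0, "Milliseconds": 1000.0}


def datapoint_seconds(datapoint: Dict[str, Any], statistic: str = "Average") -> float:
    """Return the chosen statistic of a CloudWatch data point, in seconds."""
    unit = datapoint["Unit"]
    if unit not in _PER_SECOND:
        raise ValueError(f"Not a duration unit: {unit}")
    return float(datapoint[statistic]) / _PER_SECOND[unit]


# Made-up data point shaped like a GetMetricStatistics result entry.
sample = {"Average": 1500.0, "Unit": "Milliseconds"}
print(datapoint_seconds(sample))  # 1.5
```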
Choosing which metrics are reported
You can choose which Apache Airflow metrics are emitted to CloudWatch, or blocked by Apache Airflow, using the following Amazon MWAA configuration options:
- metrics.metrics_allow_list: A comma-separated list of prefixes that selects which metrics your environment emits to CloudWatch. Use this option if you want Apache Airflow to send only a subset of the available metrics. For example, scheduler,executor,dagrun.
- metrics.metrics_block_list: A comma-separated list of prefixes. Metrics whose names start with any of these prefixes are filtered out. For example, scheduler,executor,dagrun.

If you configure both metrics.metrics_allow_list and metrics.metrics_block_list, Apache Airflow ignores metrics.metrics_block_list. If you configure metrics.metrics_block_list but not metrics.metrics_allow_list, Apache Airflow filters out the metrics that match the prefixes you specify in metrics.metrics_block_list.
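The precedence rule described above can be sketched as a small predicate. This models the behavior as this section states it (prefix matching, with the allow list taking precedence over the block list); it is not Apache Airflow's own implementation, and the function names are hypothetical.

```python
from typing import Optional, Tuple


def _prefixes(option: Optional[str]) -> Tuple[str, ...]:
    """Split a comma-separated option value into a tuple of non-empty prefixes."""
    if not option:
        return ()
    return tuple(part.strip() for part in option.split(",") if part.strip())


def is_emitted(
    metric_name: str,
    allow_list: Optional[str] = None,
    block_list: Optional[str] = None,
) -> bool:
    """Decide whether a metric would be emitted under the documented rules."""
    allowed = _prefixes(allow_list)
    blocked = _prefixes(block_list)
    if allowed:
        # When an allow list is set, the block list is ignored.
        return metric_name.startswith(allowed)
    if blocked:
        return not metric_name.startswith(blocked)
    return True  # no filtering configured


print(is_emitted("scheduler.tasks.running", allow_list="scheduler,executor,dagrun"))  # True
print(is_emitted("ti_failures", allow_list="scheduler,executor,dagrun"))              # False
print(is_emitted("ti_failures", allow_list="ti", block_list="ti"))                    # True
```

The last call shows the block list being ignored because an allow list is also present.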
Note
The metrics.metrics_allow_list and metrics.metrics_block_list configuration options only apply to Apache Airflow v2.6.3 and above. For previous versions of Apache Airflow, use metrics.statsd_allow_list and metrics.statsd_block_list instead.
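Because the option names depend on the Apache Airflow version, a small helper can pick the right keys. The sketch below is one way to do that, assuming version strings of the form 2.6.3 or v2.6.3; the function names are hypothetical.

```python
from typing import Dict, Tuple


def _version_tuple(version: str) -> Tuple[int, ...]:
    """Turn "v2.6.3" or "2.6.3" into (2, 6, 3) for comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def metric_filter_options(airflow_version: str, allow: str = "", block: str = "") -> Dict[str, str]:
    """Return configuration options using the names valid for the given version."""
    if _version_tuple(airflow_version) >= (2, 6, 3):
        prefix = "metrics.metrics"
    else:
        prefix = "metrics.statsd"  # names used before v2.6.3
    options: Dict[str, str] = {}
    if allow:
        options[f"{prefix}_allow_list"] = allow
    if block:
        options[f"{prefix}_block_list"] = block
    return options


print(metric_filter_options("2.10.1", allow="scheduler,executor,dagrun"))
# {'metrics.metrics_allow_list': 'scheduler,executor,dagrun'}
print(metric_filter_options("v2.5.1", block="dagrun"))
# {'metrics.statsd_block_list': 'dagrun'}
```

The resulting mapping has the shape of the Apache Airflow configuration options that an Amazon MWAA environment accepts, so it could be supplied when creating or updating an environment.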
What's next?
- Explore the Amazon MWAA API operation used to publish environment health metrics at PublishMetrics.