Apache Airflow v2 environment metrics in CloudWatch - Amazon Managed Workflows for Apache Airflow

Apache Airflow v2 environment metrics in CloudWatch

Apache Airflow v2 is already set up to collect and send StatsD metrics for an Amazon Managed Workflows for Apache Airflow environment to Amazon CloudWatch. The complete list of metrics Apache Airflow sends is available on the Metrics page in the Apache Airflow reference guide. This page describes the Apache Airflow metrics available in CloudWatch and how to access them in the CloudWatch console.

Terms

Namespace

A namespace is a container for the CloudWatch metrics of an Amazon service. For Amazon MWAA, the namespace is AmazonMWAA.

CloudWatch metrics

A CloudWatch metric represents a time-ordered set of data points that are specific to CloudWatch.

Apache Airflow metrics

The metrics specific to Apache Airflow.

Dimension

A dimension is a name/value pair that is part of the identity of a metric.

Unit

A statistic has a unit of measure. For Amazon MWAA, units include Count, Seconds, and Milliseconds, and are set based on the units in the original Airflow metrics.

Dimensions

This section describes the CloudWatch Dimensions grouping for Apache Airflow metrics in CloudWatch.

| Dimension | Description |
| --- | --- |
| DAG | Indicates a specific Apache Airflow DAG name. |
| DAG Filename | Indicates a specific Apache Airflow DAG file name. |
| Function | This dimension is used to improve the grouping of metrics in CloudWatch. |
| Job | Indicates an Apache Airflow Job run by the Scheduler. Always has a value of Job. |
| Operator | Indicates a specific Apache Airflow operator. |
| Pool | Indicates a specific Apache Airflow worker pool. |
| Task | Indicates a specific Apache Airflow task. |
| HostName | Indicates the hostname for a specific running Apache Airflow process. |

Accessing metrics in the CloudWatch console

This section describes how to access performance metrics in CloudWatch for a specific DAG.

To view performance metrics for a dimension
  1. Open the Metrics page on the CloudWatch console.

  2. Use the Amazon Region selector to select your Region.

  3. Choose the AmazonMWAA namespace.

  4. In the All metrics tab, select a dimension. For example, DAG, Environment.

  5. Choose a CloudWatch metric for a dimension. For example, TaskInstanceSuccesses or TaskInstanceDuration. Choose Graph all search results.

  6. Choose the Graphed metrics tab to view performance statistics for Apache Airflow metrics, such as DAG, Environment, Task.
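The console steps above have a programmatic counterpart. The following is a minimal sketch, assuming the boto3 CloudWatch client and its get_metric_statistics operation. It also assumes that each metric carries an Environment dimension in addition to the dimensions listed on this page (step 4 suggests this, but verify it against your own metrics). The helper names, environment name, DAG name, and task name are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_task_duration_query(environment, dag_id, task_id, hours=3, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AmazonMWAA",
        "MetricName": "TaskInstanceDuration",
        "Dimensions": [
            # Assumption: the environment name is published as a dimension.
            {"Name": "Environment", "Value": environment},
            {"Name": "DAG", "Value": dag_id},
            {"Name": "Task", "Value": task_id},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # one data point per five minutes
        "Statistics": ["Average", "Maximum"],
    }


def fetch_task_duration(cloudwatch_client, environment, dag_id, task_id, hours=3):
    """Return the data points for one task, oldest first.

    cloudwatch_client is expected to be a boto3 CloudWatch client,
    for example boto3.client("cloudwatch").
    """
    query = build_task_duration_query(environment, dag_id, task_id, hours=hours)
    response = cloudwatch_client.get_metric_statistics(**query)
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

With AWS credentials configured, calling fetch_task_duration(boto3.client("cloudwatch"), "MyEnvironment", "example_dag", "extract") should return the same data points that the Graphed metrics tab plots, provided the dimension set matches what your environment publishes.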

Apache Airflow metrics available in CloudWatch

This section describes the Apache Airflow metrics and dimensions sent to CloudWatch.

Apache Airflow Counters

The Apache Airflow metrics in this section contain data about Apache Airflow Counters.

| CloudWatch metric | Apache Airflow metric | Unit | Dimension | Note |
| --- | --- | --- | --- | --- |
| SLAMissed | sla_missed | Count | Function, Scheduler | Available for Apache Airflow v2.4.3 and above. |
| FailedSLACallback | sla_callback_notification_failure | Count | Function, Scheduler | Available for Apache Airflow v2.4.3 and above. |
| Updates | dataset.updates | Count | Function, Scheduler | Available for Apache Airflow v2.6.3 and above. |
| Orphaned | dataset.orphaned | Count | Function, Scheduler | Available for Apache Airflow v2.6.3 and above. |
| FailedCeleryTaskExecution | celery.execute_command.failure | Count | Function, Celery | Available for Apache Airflow v2.4.3 and above. |
| FilePathQueueUpdateCount | dag_processing.file_path_queue_update_count | Count | Function, Scheduler | Available for Apache Airflow v2.6.3 and above. |
| CriticalSectionBusy | scheduler.critical_section_busy | Count | Function, Scheduler | |
| DagBagSize | dagbag_size | Count | Function, DAG Processing | |
| DagCallbackExceptions | dag.callback_exceptions | Count | DAG, All | |
| FailedSLAEmailAttempts | sla_email_notification_failure | Count | Function, Scheduler | |
| TaskInstanceFinished | ti.finish.{dag_id}.{task_id}.{state} | Count | DAG, {dag_id}; Task, {task_id}; State, {state} | |
| JobEnd | {job_name}_end | Count | Job, {job_name} | |
| JobHeartbeatFailure | {job_name}_heartbeat_failure | Count | Job, {job_name} | |
| JobStart | {job_name}_start | Count | Job, {job_name} | |
| ManagerStalls | dag_processing.manager_stalls | Count | Function, DAG Processing | |
| OperatorFailures | operator_failures_{operator_name} | Count | Operator, {operator_name} | |
| OperatorSuccesses | operator_successes_{operator_name} | Count | Operator, {operator_name} | |
| OtherCallbackCount | dag_processing.other_callback_count | Count | Function, Scheduler | Available for Apache Airflow v2.6.3 and above. |
| Processes | dag_processing.processes | Count | Function, DAG Processing | |
| SchedulerHeartbeat | scheduler_heartbeat | Count | Function, Scheduler | |
| StartedTaskInstances | ti.start.{dag_id}.{task_id} | Count | DAG, All; Task, All | |
| SlaCallbackCount | dag_processing.sla_callback_count | Count | Function, Scheduler | Available for Apache Airflow v2.6.3 and above. |
| TasksKilledExternally | scheduler.tasks.killed_externally | Count | Function, Scheduler | |
| TaskTimeoutError | celery.task_timeout_error | Count | Function, Celery | |
| TaskInstanceCreatedUsingOperator | task_instance_created-{operator_name} | Count | Operator, {operator_name} | |
| TaskInstancePreviouslySucceeded | previously_succeeded | Count | DAG, All; Task, All | |
| TaskInstanceFailures | ti_failures | Count | DAG, All; Task, All | |
| TaskInstanceSuccesses | ti_successes | Count | DAG, All; Task, All | |
| TaskRemovedFromDAG | task_removed_from_dag.{dag_id} | Count | DAG, {dag_id} | |
| TaskRestoredToDAG | task_restored_to_dag.{dag_id} | Count | DAG, {dag_id} | |
| TriggersSucceeded | triggers.succeeded | Count | Function, Trigger | Available for Apache Airflow v2.7.2 and above. |
| TriggersFailed | triggers.failed | Count | Function, Trigger | Available for Apache Airflow v2.7.2 and above. |
| TriggersBlockedMainThread | triggers.blocked_main_thread | Count | Function, Trigger | Available for Apache Airflow v2.7.2 and above. |
| TriggerHeartbeat | triggerer_heartbeat | Count | Function, Triggerer | Available for Apache Airflow v2.8.1 and above. |
| TaskInstanceCreatedUsingOperator | airflow.task_instance_created_{operator_name} | Count | Operator, {operator_name} | Available for Apache Airflow v2.7.2 and above. |
| ZombiesKilled | zombies_killed | Count | DAG, All; Task, All | |
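Several counters above pair a templated Apache Airflow metric name, such as ti.finish.{dag_id}.{task_id}.{state}, with a single CloudWatch metric name and a set of dimensions. The following sketch illustrates that relationship for three of those rows. It is not how Amazon MWAA performs the conversion: the TEMPLATES mapping and the to_cloudwatch helper are hypothetical, and the sketch assumes that identifiers such as DAG and task names contain no dots.

```python
import re

# (Airflow metric template, CloudWatch metric name, {placeholder: dimension})
TEMPLATES = [
    ("ti.finish.{dag_id}.{task_id}.{state}", "TaskInstanceFinished",
     {"dag_id": "DAG", "task_id": "Task", "state": "State"}),
    ("operator_failures_{operator_name}", "OperatorFailures",
     {"operator_name": "Operator"}),
    ("task_removed_from_dag.{dag_id}", "TaskRemovedFromDAG",
     {"dag_id": "DAG"}),
]


def _template_to_regex(template):
    # re.split with one capture group alternates literal text and
    # placeholder names: [literal, name, literal, name, ..., literal].
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        re.escape(part) if index % 2 == 0 else f"(?P<{part}>[^.]+)"
        for index, part in enumerate(parts)
    )
    return re.compile(pattern)


def to_cloudwatch(airflow_metric):
    """Return (CloudWatch metric name, dimensions) or None if no row matches."""
    for template, cloudwatch_name, dimension_names in TEMPLATES:
        match = _template_to_regex(template).fullmatch(airflow_metric)
        if match:
            dimensions = {
                dimension_names[key]: value
                for key, value in match.groupdict().items()
            }
            return cloudwatch_name, dimensions
    return None
```

Under these assumptions, to_cloudwatch("ti.finish.etl.load.success") yields the metric name TaskInstanceFinished with the dimensions DAG=etl, Task=load, and State=success, which is the shape you would then look for in the CloudWatch console.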

Apache Airflow Gauges

The Apache Airflow metrics in this section contain data about Apache Airflow Gauges.

| CloudWatch metric | Apache Airflow metric | Unit | Dimension | Note |
| --- | --- | --- | --- | --- |
| DAGFileRefreshError | dag_file_refresh_error | Count | Function, DAG Processing | |
| ImportErrors | dag_processing.import_errors | Count | Function, DAG Processing | |
| ExceptionFailures | smart_sensor_operator.exception_failures | Count | Function, Smart Sensor Operator | |
| ExecutedTasks | smart_sensor_operator.executed_tasks | Count | Function, Smart Sensor Operator | |
| InfraFailures | smart_sensor_operator.infra_failures | Count | Function, Smart Sensor Operator | |
| LoadedTasks | smart_sensor_operator.loaded_tasks | Count | Function, Smart Sensor Operator | |
| TotalParseTime | dag_processing.total_parse_time | Seconds | Function, DAG Processing | |
| TriggeredDagRuns | dataset.triggered_dagruns | Count | Function, Scheduler | Available for Apache Airflow v2.6.3 and above. |
| TriggersRunning | triggers.running.{hostname} | Count | Function, Trigger; HostName, {hostname} | Available for Apache Airflow v2.7.2 and above. |
| PoolDeferredSlots | pool.deferred_slots.{pool_name} | Count | Pool, {pool_name} | Available for Apache Airflow v2.7.2 and above. |
| DAGFileProcessingLastRunSecondsAgo | dag_processing.last_run.seconds_ago.{dag_filename} | Seconds | DAG Filename, {dag_filename} | |
| OpenSlots | executor.open_slots | Count | Function, Executor | |
| OrphanedTasksAdopted | scheduler.orphaned_tasks.adopted | Count | Function, Scheduler | |
| OrphanedTasksCleared | scheduler.orphaned_tasks.cleared | Count | Function, Scheduler | |
| PokedExceptions | smart_sensor_operator.poked_exception | Count | Function, Smart Sensor Operator | |
| PokedSuccess | smart_sensor_operator.poked_success | Count | Function, Smart Sensor Operator | |
| PokedTasks | smart_sensor_operator.poked_tasks | Count | Function, Smart Sensor Operator | |
| PoolFailures | pool.open_slots.{pool_name} | Count | Pool, {pool_name} | |
| PoolStarvingTasks | pool.starving_tasks.{pool_name} | Count | Pool, {pool_name} | |
| PoolOpenSlots | pool.open_slots.{pool_name} | Count | Pool, {pool_name} | |
| PoolQueuedSlots | pool.queued_slots.{pool_name} | Count | Pool, {pool_name} | |
| PoolRunningSlots | pool.running_slots.{pool_name} | Count | Pool, {pool_name} | |
| ProcessorTimeouts | dag_processing.processor_timeouts | Count | Function, DAG Processing | |
| QueuedTasks | executor.queued_tasks | Count | Function, Executor | |
| RunningTasks | executor.running_tasks | Count | Function, Executor | |
| TasksExecutable | scheduler.tasks.executable | Count | Function, Scheduler | |
| TasksPending | scheduler.tasks.pending | Count | Function, Scheduler | Does not apply to Apache Airflow v2.2 and above. |
| TasksRunning | scheduler.tasks.running | Count | Function, Scheduler | |
| TasksStarving | scheduler.tasks.starving | Count | Function, Scheduler | |
| TasksWithoutDagRun | scheduler.tasks.without_dagrun | Count | Function, Scheduler | |

Apache Airflow Timers

The Apache Airflow metrics in this section contain data about Apache Airflow Timers.

| CloudWatch metric | Apache Airflow metric | Unit | Dimension | Note |
| --- | --- | --- | --- | --- |
| CollectDBDags | collect_db_dags | Milliseconds | Function, DAG Processing | |
| CriticalSectionDuration | scheduler.critical_section_duration | Milliseconds | Function, Scheduler | |
| CriticalSectionQueryDuration | scheduler.critical_section_query_duration | Milliseconds | Function, Scheduler | Available for Apache Airflow v2.5.1 and above. |
| DAGDependencyCheck | dagrun.dependency-check.{dag_id} | Milliseconds | DAG, {dag_id} | |
| DAGDurationFailed | dagrun.duration.failed.{dag_id} | Milliseconds | DAG, {dag_id} | |
| DAGDurationSuccess | dagrun.duration.success.{dag_id} | Milliseconds | DAG, {dag_id} | |
| DAGFileProcessingLastDuration | dag_processing.last_duration.{dag_filename} | Seconds | DAG Filename, {dag_filename} | |
| DAGScheduleDelay | dagrun.schedule_delay.{dag_id} | Milliseconds | DAG, {dag_id} | |
| FirstTaskSchedulingDelay | dagrun.{dag_id}.first_task_scheduling_delay | Milliseconds | DAG, {dag_id} | |
| SchedulerLoopDuration | scheduler.scheduler_loop_duration | Milliseconds | Function, Scheduler | Available for Apache Airflow v2.5.1 and above. |
| TaskInstanceDuration | dag.{dag_id}.{task_id}.duration | Milliseconds | DAG, {dag_id}; Task, {task_id} | |
| TaskInstanceQueuedDuration | dag.{dag_id}.{task_id}.queued_duration | Milliseconds | DAG, {dag_id}; Task, {task_id} | Available for Apache Airflow v2.7.2 and above. |
| TaskInstanceScheduledDuration | dag.{dag_id}.{task_id}.scheduled_duration | Milliseconds | DAG, {dag_id}; Task, {task_id} | Available for Apache Airflow v2.7.2 and above. |
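Most timers above are reported in Milliseconds, while DAGFileProcessingLastDuration is reported in Seconds, so values from different timers are not directly comparable. The following sketch normalizes a CloudWatch data point to seconds. It assumes the data point is a dictionary with a Unit key and a statistic key such as Average, which is the shape boto3 returns from get_metric_statistics. The helper name to_seconds is hypothetical.

```python
# Divisors that convert each supported CloudWatch unit to seconds.
_DIVISOR_TO_SECONDS = {"Seconds": 1, "Milliseconds": 1000}


def to_seconds(datapoint, statistic="Average"):
    """Convert one statistic of a CloudWatch data point to seconds."""
    unit = datapoint["Unit"]
    if unit not in _DIVISOR_TO_SECONDS:
        # Count and other units are not durations, so refuse to convert them.
        raise ValueError(f"Not a duration unit: {unit}")
    return datapoint[statistic] / _DIVISOR_TO_SECONDS[unit]
```

Rejecting non-duration units explicitly helps avoid silently mixing counters or gauges into a duration comparison.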

Choosing which metrics are reported

You can choose which Apache Airflow metrics are emitted to CloudWatch, or blocked by Apache Airflow, using the following Amazon MWAA configuration options:

  • metrics.metrics_allow_list: A list of comma-separated prefixes that selects which metrics your environment emits to CloudWatch. Use this option if you want Apache Airflow to send only a subset of the available metrics. For example, scheduler,executor,dagrun.

  • metrics.metrics_block_list: A list of comma-separated prefixes that filters out metrics whose names start with any element of the list. For example, scheduler,executor,dagrun.

If you configure both metrics.metrics_allow_list and metrics.metrics_block_list, Apache Airflow ignores metrics.metrics_block_list. If you configure metrics.metrics_block_list but not metrics.metrics_allow_list, Apache Airflow filters out the elements you specify in metrics.metrics_block_list.
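The precedence between the two options can be illustrated with a short sketch. It models only the behavior described above (prefix matching, with the block list ignored whenever an allow list is set) and is not Apache Airflow's implementation. The function name is_emitted is hypothetical.

```python
def _parse_prefixes(value):
    """Split a comma-separated option value into a tuple of prefixes."""
    return tuple(item.strip() for item in value.split(",") if item.strip())


def is_emitted(metric_name, allow_list="", block_list=""):
    """Return True if a metric would pass the configured prefix filters."""
    allowed = _parse_prefixes(allow_list)
    blocked = _parse_prefixes(block_list)
    if allowed:
        # An allow list takes precedence; the block list is ignored.
        return metric_name.startswith(allowed)
    if blocked:
        return not metric_name.startswith(blocked)
    # With neither option set, every metric is emitted.
    return True
```

For example, with allow_list="scheduler,executor" the metric scheduler.tasks.running passes and pool.open_slots.default_pool does not, regardless of what block_list contains.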

Note

The metrics.metrics_allow_list and metrics.metrics_block_list configuration options apply only to Apache Airflow v2.6.3 and above. For earlier versions of Apache Airflow, use metrics.statsd_allow_list and metrics.statsd_block_list instead.
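Because the option names depend on the Apache Airflow version, a small helper can pick the right keys before you apply them. The following is a sketch under stated assumptions: the helper names are hypothetical, and version strings are assumed to be purely numeric, such as 2.6.3 or v2.7.2.

```python
def metric_filter_option_names(airflow_version):
    """Return the (allow list, block list) option names for a version."""
    parts = tuple(int(part) for part in airflow_version.lstrip("v").split("."))
    if parts >= (2, 6, 3):
        return ("metrics.metrics_allow_list", "metrics.metrics_block_list")
    return ("metrics.statsd_allow_list", "metrics.statsd_block_list")


def build_allow_list_option(airflow_version, prefixes):
    """Build a configuration options mapping that sets only the allow list."""
    allow_key, _block_key = metric_filter_option_names(airflow_version)
    return {allow_key: ",".join(prefixes)}
```

The resulting dictionary has the shape of the Apache Airflow configuration options on an Amazon MWAA environment. For example, it could be passed as the AirflowConfigurationOptions argument of the boto3 MWAA client's update_environment call, assuming you manage the environment through boto3. Updating configuration options replaces the existing set, so merge any options you already have before applying.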

What's next?

  • Explore the Amazon MWAA API operation used to publish environment health metrics at PublishMetrics.