Configuring Amazon MWAA automatic scaling
The autoscaling mechanism automatically increases the number of Apache Airflow workers in response to running and queued tasks on your Amazon Managed Workflows for Apache Airflow environment and disposes of extra workers when there are no more tasks queued or executing. This page describes how you can configure autoscaling by specifying the maximum number of Apache Airflow workers that run on your environment using the Amazon MWAA console.
Note
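Amazon MWAA uses Apache Airflow metrics to determine when additional Celery Executor workers are needed, and increases the number of Fargate workers as required, up to the value specified by max-workers. When the number of running and queued tasks is zero, Amazon MWAA removes the additional workers, downscaling back to the min-workers value. For more information, see the following How it works section.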
When downscaling occurs, new tasks can still be scheduled, and workers that are set for deletion can pick up those tasks before the worker containers are removed. This period can last between two and five minutes, due to a combination of factors: the time it takes for the Apache Airflow metrics to be sent, the time to detect a steady state of zero tasks, and the time it takes for the Fargate workers to be removed.
If you use Amazon MWAA with periods of sustained workload, followed by periods of no workload, you will be unaffected by this limitation. However, if you have very intermittent workloads with repeated high usage, followed by zero tasks for approximately five minutes, you might be affected by this issue when tasks running on the downscaled workers are deleted and marked as failed. If you are affected by this limitation, we recommend doing either of the following:
- Set min-workers equal to max-workers at a sufficient capacity to meet your average workload. This is preferable if the pattern persists through the majority of a 24-hour period, because autoscaling would have limited value in such a case.
- Ensure that at least one task in one DAG, such as a DateTimeSensor, is running for this period of intermittent activity to prevent unwanted downscaling.
Maximum worker count
The following image shows where you can customize the Maximum worker count to configure autoscaling on the Amazon MWAA console.

How it works
Amazon MWAA uses the RunningTasks and QueuedTasks metrics, where (tasks running + tasks queued) / (tasks per worker) = (required workers). If the required number of workers is greater than the current number of workers, Amazon MWAA adds Fargate worker containers to reach that value, up to the maximum value specified by max-workers.
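The scale-up rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Amazon MWAA's implementation: the function names are hypothetical, and rounding the division up to a whole worker is an assumption, because the formula on this page does not say how fractions are handled.

```python
import math


def required_workers(running_tasks: int, queued_tasks: int, tasks_per_worker: int) -> int:
    """(tasks running + tasks queued) / (tasks per worker), rounded up (assumption)."""
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    return math.ceil((running_tasks + queued_tasks) / tasks_per_worker)


def scale_up_target(current_workers: int, running_tasks: int, queued_tasks: int,
                    tasks_per_worker: int, max_workers: int) -> int:
    """Return the worker count after a scale-up decision, capped at max_workers."""
    needed = required_workers(running_tasks, queued_tasks, tasks_per_worker)
    if needed > current_workers:
        return min(needed, max_workers)
    # Scale-down is handled separately, so the count never shrinks here.
    return current_workers


# 42 tasks at 5 tasks per worker need 9 workers, which is under the cap of 10.
print(scale_up_target(1, 12, 30, 5, 10))
# 100 tasks would need 20 workers, so the result is capped at max_workers.
print(scale_up_target(1, 40, 60, 5, 10))
```

Keeping the cap in a separate step makes it easy to see which part of the result comes from demand and which from the max-workers setting.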
When the RunningTasks and QueuedTasks metrics sum to zero for a period of two minutes, Amazon MWAA requests Fargate to set the number of workers to the environment's min-workers value.
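As a rough model of this scale-down condition, the following sketch checks whether a series of metric samples has stayed at zero for the full two-minute window. The sample format, the `should_scale_down` name, and the requirement that the history span the whole window are assumptions made for illustration; this page does not describe how Amazon MWAA samples its metrics.

```python
def should_scale_down(samples, window_seconds=120):
    """Decide whether the scale-down condition holds.

    samples: list of (timestamp_seconds, running_tasks, queued_tasks),
    ordered oldest first. This layout is hypothetical.
    """
    if not samples:
        return False
    latest = samples[-1][0]
    # Require history covering the whole window before deciding (assumption).
    if latest - samples[0][0] < window_seconds:
        return False
    window = [s for s in samples if s[0] >= latest - window_seconds]
    # Every sample in the window must show zero running plus queued tasks.
    return all(running + queued == 0 for _, running, queued in window)


idle = [(0, 0, 0), (60, 0, 0), (120, 0, 0)]
recently_busy = [(0, 3, 1), (60, 0, 0), (120, 0, 0)]
print(should_scale_down(idle))           # zero for the full window
print(should_scale_down(recently_busy))  # work was present inside the window
```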
Amazon MWAA provides Fargate a stopTimeout value of 120 seconds, currently the maximum available time, to allow any work to complete on the workers. After that time, the container is removed and any remaining work in progress is deleted. In most cases, this occurs while no tasks are in the queue. However, under certain conditions mentioned in the preceding section of this page, tasks might be queued while downscaling is taking place.
When you create an environment, Amazon MWAA creates an AWS-managed Amazon Aurora PostgreSQL metadata database and a Fargate container in each of your two private subnets in different availability zones. For example, a metadata database and container in the us-east-1a availability zone and a metadata database and container in the us-east-1b availability zone for the us-east-1 region.
- The Apache Airflow workers on an Amazon MWAA environment use the Celery Executor to queue and distribute tasks to multiple Celery workers from an Apache Airflow platform. The Celery Executor runs in an Amazon Fargate container. If a Fargate container in one availability zone fails, Amazon MWAA switches to the other container in a different availability zone to run the Celery Executor, and the Apache Airflow scheduler creates a new task instance in the Amazon Aurora PostgreSQL metadata database.
- By default, Amazon MWAA configures an environment to run hundreds of tasks in parallel (in core.parallelism) and workers concurrently (in core.dag_concurrency). As tasks are queued, Amazon MWAA adds workers to meet demand until it reaches the number you define in Maximum worker count.
- For example, if you specified a value of 10, Amazon MWAA adds up to 9 additional workers to meet demand. The autoscaling mechanism continues running the additional workers until there are no more tasks to run. When there are no more tasks running or queued, Amazon MWAA disposes of the additional workers and scales back down to a single worker.
Using the Amazon MWAA console
You can choose the maximum number of workers that can run on your environment concurrently on the Amazon MWAA console. By default, you can specify a maximum value up to 25.
To configure the number of workers
- Open the Environments page on the Amazon MWAA console.
- Choose an environment.
- Choose Edit.
- Choose Next.
- On the Environment class pane, enter a value in Maximum worker count.
- Choose Save.
Note
It can take a few minutes before changes take effect on your environment.
Example high performance use case
The following section describes the type of configurations you can use to enable high performance and parallelism on an environment.
On-premises Apache Airflow
Typically, in an on-premises Apache Airflow platform, you would configure task parallelism, autoscaling, and concurrency settings in your airflow.cfg file:
- core.parallelism – The maximum number of task instances that can run simultaneously per scheduler.
- core.dag_concurrency – The maximum concurrency for DAGs (not workers).
- celery.worker_autoscale – The maximum and minimum number of tasks that can run concurrently on any worker.
For example, if core.parallelism was set to 100 and core.dag_concurrency was set to 7, you would still only be able to run a total of 14 tasks concurrently if you had 2 DAGs. This is because each DAG is set to run only seven tasks concurrently (in core.dag_concurrency), even though overall parallelism is set to 100 (in core.parallelism).
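The arithmetic in this example can be checked with a short sketch. The helper name `max_concurrent_tasks` is hypothetical, and the model considers only the two settings discussed here, ignoring other limits such as pools or per-worker capacity.

```python
def max_concurrent_tasks(parallelism: int, dag_concurrency: int, num_dags: int) -> int:
    """Upper bound on concurrent tasks from core.parallelism and core.dag_concurrency."""
    # Each DAG is limited to dag_concurrency tasks, and the total is
    # further limited by the overall parallelism setting.
    return min(parallelism, dag_concurrency * num_dags)


# Two DAGs at 7 tasks each give 14, well below the parallelism limit of 100.
print(max_concurrent_tasks(parallelism=100, dag_concurrency=7, num_dags=2))
```

With enough DAGs, the per-DAG limit stops being the bottleneck and core.parallelism becomes the effective cap.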
On an Amazon MWAA environment
On an Amazon MWAA environment, you can configure these settings directly on the Amazon MWAA console, as described in Using Apache Airflow configuration options on Amazon MWAA and Configuring the Amazon MWAA environment class, together with the Maximum worker count autoscaling mechanism. While core.dag_concurrency is not available in the dropdown list as an Apache Airflow configuration option on the Amazon MWAA console, you can add it as a custom Apache Airflow configuration option.
Let's say, when you created your environment, you chose the following settings:
- The mw1.small environment class, which controls the maximum number of concurrent tasks each worker can run by default and the vCPU of containers.
- The default setting of 10 workers in Maximum worker count.
- An Apache Airflow configuration option for celery.worker_autoscale of 5,5 tasks per worker.
This means you can run 50 concurrent tasks in your environment. Any tasks beyond 50 will be queued, and wait for the running tasks to complete.
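The figure of 50 follows from multiplying the maximum worker count by the tasks each worker can run. A minimal sketch of that calculation, with hypothetical helper names, also shows how many tasks would wait in the queue for a given number of submitted tasks:

```python
def environment_capacity(max_workers: int, tasks_per_worker: int) -> int:
    """Concurrent tasks the environment can run with all workers scaled out."""
    return max_workers * tasks_per_worker


def split_running_and_queued(submitted_tasks: int, capacity: int) -> tuple:
    """Return (running, queued) for a number of submitted tasks."""
    running = min(submitted_tasks, capacity)
    return running, submitted_tasks - running


capacity = environment_capacity(max_workers=10, tasks_per_worker=5)
print(capacity)                                # 10 workers x 5 tasks each
print(split_running_and_queued(65, capacity))  # tasks beyond capacity are queued
```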
Run more concurrent tasks. You can modify your environment to run more tasks concurrently using the following configurations:
- Increase the maximum number of concurrent tasks each worker can run by default and the vCPU of containers by choosing the mw1.medium (10 concurrent tasks by default) environment class.
- Add celery.worker_autoscale as an Apache Airflow configuration option.
- Increase the Maximum worker count. In this example, increasing maximum workers from 10 to 20 would double the number of concurrent tasks the environment can run.
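To compare these options, it can help to work backward from a target level of concurrency to the Maximum worker count it requires. The sketch below is illustrative only: the function names are hypothetical, the per-worker values of 5 and 10 come from the examples on this page, and the limit of 25 is the default maximum mentioned earlier.

```python
import math

DEFAULT_MAX_WORKER_LIMIT = 25  # default upper bound stated on this page


def workers_needed(target_concurrent_tasks: int, tasks_per_worker: int) -> int:
    """Smallest worker count that can run the target number of tasks at once."""
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    return math.ceil(target_concurrent_tasks / tasks_per_worker)


def fits_default_limit(workers: int) -> bool:
    """Check a worker count against the default maximum of 25."""
    return workers <= DEFAULT_MAX_WORKER_LIMIT


# 100 concurrent tasks at 5 tasks per worker (the 5,5 setting above).
print(workers_needed(100, 5))
# The same target at 10 tasks per worker (the mw1.medium default).
print(workers_needed(100, 10))
```

Raising the tasks each worker can run reduces the number of workers required, which matters when the target would otherwise exceed the default limit.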
Specify Minimum workers. You can also specify the minimum and maximum number of Apache Airflow workers that run in your environment using the Amazon Command Line Interface (Amazon CLI). For example:
aws mwaa update-environment --max-workers 10 --min-workers 10 --name YOUR_ENVIRONMENT_NAME
To learn more, see the update-environment command in the Amazon CLI.
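The same update can be made from Python with the AWS SDK (boto3), whose `mwaa` client exposes an `update_environment` operation. The following is a sketch rather than a recommended implementation: the helper names are hypothetical, the validation rules are assumptions about sensible input, and running `update_worker_range` requires boto3 and valid credentials.

```python
def build_update_request(name: str, min_workers: int, max_workers: int) -> dict:
    """Build the keyword arguments for the MWAA update_environment call."""
    if min_workers < 1:
        raise ValueError("min_workers must be at least 1")
    if min_workers > max_workers:
        raise ValueError("min_workers cannot exceed max_workers")
    return {"Name": name, "MinWorkers": min_workers, "MaxWorkers": max_workers}


def update_worker_range(name: str, min_workers: int, max_workers: int) -> dict:
    """Apply the worker range to an environment (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("mwaa")
    return client.update_environment(
        **build_update_request(name, min_workers, max_workers)
    )


# Equivalent to the CLI example: pin both values to 10 workers.
print(build_update_request("YOUR_ENVIRONMENT_NAME", 10, 10))
```

Separating the request builder from the API call lets you validate the values without contacting the service.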
Troubleshooting tasks stuck in the running state
In rare cases, Apache Airflow may think there are tasks still running. To resolve this issue, you need to clear the stranded task in your Apache Airflow UI. For more information, see the I see my tasks stuck or not completing troubleshooting topic.
What's next?
- Learn more about the best practices we recommend to tune the performance of your environment in Performance tuning for Apache Airflow on Amazon MWAA.