Explore Amazon MWAA network architecture - Amazon Managed Workflows for Apache Airflow

Explore Amazon MWAA network architecture

The following section describes the main components that make up an Amazon MWAA environment, and the set of Amazon services that each environment integrates with to manage its resources, keep your data secure, and provide monitoring and visibility for your workflows.

Amazon MWAA components

Amazon MWAA environments consist of the following four main components:

  1. Scheduler — Parses and monitors all of your DAGs, and queues tasks for execution when a DAG's dependencies are met. Amazon MWAA deploys the scheduler as an Amazon Fargate cluster with a minimum of two schedulers. You can increase the scheduler count up to five, depending on your workload. For more information about Amazon MWAA environment classes, see Amazon MWAA environment class.

  2. Workers — One or more Fargate tasks that run your scheduled tasks. The number of workers in your environment stays within a range bounded by the minimum and maximum that you specify. Amazon MWAA starts auto-scaling workers when the number of queued and running tasks is more than your existing workers can handle. When running and queued tasks sum to zero for more than two minutes, Amazon MWAA scales the number of workers back to its minimum. For more information about how Amazon MWAA handles auto-scaling workers, see Amazon MWAA automatic scaling.

  3. Web server — Runs the Apache Airflow web UI. You can configure the web server with private or public network access. In both cases, access for your Apache Airflow users is controlled by the access control policy you define in Amazon Identity and Access Management (IAM). For more information about configuring IAM access policies for your environment, see Accessing an Amazon MWAA environment.

  4. Database — Stores metadata about the Apache Airflow environment and your workflows, including DAG run history. The database is a single-tenant Aurora PostgreSQL database managed by Amazon, and is accessible to the scheduler and worker Fargate containers through a privately secured Amazon VPC endpoint.
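
The sizing rules in the list above can be sketched as a small model. This is only an illustration, not the logic Amazon MWAA actually runs: the EnvironmentSizing class, its field names and defaults, and the tasks_per_worker capacity figure are assumptions introduced here. Only the scheduler range of two to five, the minimum and maximum worker bounds, and the scale-down after more than two idle minutes come from the text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentSizing:
    """Hypothetical sizing settings; names and defaults are illustrative."""

    schedulers: int = 2        # the text allows two to five
    min_workers: int = 1
    max_workers: int = 10
    tasks_per_worker: int = 5  # assumed per-worker capacity, not from the text

    def __post_init__(self):
        if not 2 <= self.schedulers <= 5:
            raise ValueError("schedulers must be between 2 and 5")
        if not 1 <= self.min_workers <= self.max_workers:
            raise ValueError("need 1 <= min_workers <= max_workers")
        if self.tasks_per_worker < 1:
            raise ValueError("tasks_per_worker must be at least 1")

    def desired_workers(self, current_workers, queued, running, idle_seconds=0):
        """Return a target worker count for the given load (simplified model)."""
        active = queued + running
        if active == 0:
            # Scale back to the minimum only after more than two idle minutes.
            return self.min_workers if idle_seconds > 120 else current_workers
        needed = math.ceil(active / self.tasks_per_worker)
        # Scale out only when the load exceeds what the current workers handle.
        target = max(current_workers, needed)
        return max(self.min_workers, min(target, self.max_workers))
```

For example, with a maximum of four workers and an assumed capacity of five tasks per worker, twelve active tasks call for three workers, and any load above twenty tasks is capped at four.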

Every Amazon MWAA environment also interacts with a set of Amazon services to handle a variety of tasks, including storing and accessing DAGs and task dependencies, securing your data at rest, and logging and monitoring your environment. The following diagram shows the different components of an Amazon MWAA environment.

This image shows the architecture of an Amazon MWAA environment.
Note

The service Amazon VPC is not a shared VPC. Amazon MWAA creates an Amazon-owned VPC for every environment you create.

  • Amazon S3 — Amazon MWAA stores all of your workflow resources, such as DAGs, requirements, and plugin files, in an Amazon S3 bucket. For more information about creating the bucket as part of environment creation, and uploading your Amazon MWAA resources, see Create an Amazon S3 bucket for Amazon MWAA in the Amazon MWAA User Guide.

  • Amazon SQS — Amazon MWAA uses Amazon SQS for queueing your workflow tasks with a Celery executor.

  • Amazon ECR — Amazon ECR hosts all Apache Airflow images. Amazon MWAA only supports Amazon managed Apache Airflow images.

  • Amazon KMS — Amazon MWAA uses Amazon KMS to ensure your data is secure at rest. By default, Amazon MWAA uses Amazon managed Amazon KMS keys, but you can configure your environment to use your own customer-managed Amazon KMS key. For more information about using your own customer-managed Amazon KMS key, see Customer managed keys for Data Encryption in the Amazon MWAA User Guide.

  • CloudWatch — Amazon MWAA integrates with CloudWatch and delivers Apache Airflow logs and environment metrics to CloudWatch, allowing you to monitor your Amazon MWAA resources and troubleshoot issues.

Connectivity

Your Amazon MWAA environment needs access to every Amazon service it integrates with. The Amazon MWAA execution role determines which Amazon services Amazon MWAA can access on your behalf. For network connectivity, you can either give your Amazon VPC public internet access or create Amazon VPC endpoints. For more information about configuring Amazon VPC endpoints (Amazon PrivateLink) for your environment, see Managing access to VPC endpoints on Amazon MWAA in the Amazon MWAA User Guide.
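
To make the role of the execution role more concrete, the following sketch builds an abbreviated permissions policy covering the Amazon S3, CloudWatch, and Amazon SQS integrations described earlier. It is an illustration under stated assumptions, not the policy Amazon MWAA requires: the function name and parameters are hypothetical, the resource-name patterns (log groups starting with airflow-<environment>- and queues starting with airflow-celery-) should be checked against the Amazon MWAA User Guide, and the Amazon KMS statement is omitted. The partition parameter is included because ARNs in the China Regions use aws-cn rather than aws.

```python
import json


def execution_role_policy(bucket, env_name, region, account_id, partition="aws"):
    """Build an abbreviated, illustrative permissions policy document."""
    bucket_arn = f"arn:{partition}:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Read DAGs, requirements, and plugins from the environment bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject*", "s3:GetBucket*", "s3:List*"],
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            },
            {   # Deliver Apache Airflow logs to CloudWatch Logs (assumed naming).
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": [
                    f"arn:{partition}:logs:{region}:{account_id}"
                    f":log-group:airflow-{env_name}-*"
                ],
            },
            {   # Publish environment metrics to CloudWatch.
                "Effect": "Allow",
                "Action": "cloudwatch:PutMetricData",
                "Resource": "*",
            },
            {   # Exchange Celery task messages over Amazon SQS (assumed naming).
                "Effect": "Allow",
                "Action": [
                    "sqs:ChangeMessageVisibility",
                    "sqs:DeleteMessage",
                    "sqs:GetQueueAttributes",
                    "sqs:GetQueueUrl",
                    "sqs:ReceiveMessage",
                    "sqs:SendMessage",
                ],
                "Resource": f"arn:{partition}:sqs:{region}:*:airflow-celery-*",
            },
        ],
    }


def to_json(policy):
    """Serialize the policy for review or for attaching to a role."""
    return json.dumps(policy, indent=2)
```

Keeping each integration in its own statement makes it easier to see which service a permission serves and to narrow the resources later.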

Amazon MWAA installs requirements on the schedulers and workers. If your requirements come from a public PyPI repository, your environment needs internet connectivity to download the required libraries. For private environments, you can either use a private PyPI repository or bundle the libraries in .whl files as custom plugins for your environment.
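
The .whl option can be sketched as follows: zip locally downloaded wheels into a plugins archive and write a requirements file that tells pip to install only from those files. This is a minimal sketch under stated assumptions. The build_offline_bundle function and its arguments are hypothetical, and the /usr/local/airflow/plugins path is an assumption about where plugin files land on the containers that should be confirmed in the Amazon MWAA User Guide. The --find-links and --no-index lines are standard pip options.

```python
import zipfile
from pathlib import Path


def build_offline_bundle(wheel_dir, out_dir, packages):
    """Zip local .whl files into plugins.zip and write a matching requirements.txt."""
    wheel_dir, out_dir = Path(wheel_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    plugins_zip = out_dir / "plugins.zip"
    with zipfile.ZipFile(plugins_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        for wheel in sorted(wheel_dir.glob("*.whl")):
            # Store each wheel at the archive root, without its local directory.
            archive.write(wheel, arcname=wheel.name)

    requirements = out_dir / "requirements.txt"
    lines = [
        "--find-links /usr/local/airflow/plugins",  # assumed plugins location
        "--no-index",  # install only from the bundled wheels
        *packages,
    ]
    requirements.write_text("\n".join(lines) + "\n")
    return plugins_zip, requirements
```

Both output files would then be uploaded to the environment's Amazon S3 bucket alongside the DAGs.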

When you configure the Apache Airflow web server in private mode, the Apache Airflow UI is accessible only from within your Amazon VPC, through Amazon VPC endpoints.
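
As an illustration of where the private or public choice is made, the sketch below assembles the arguments for an environment-creation request. The environment_request helper and its defaults are hypothetical. The key names and the PRIVATE_ONLY and PUBLIC_ONLY values follow the create_environment call of the boto3 MWAA client as I understand it, so verify them against the current SDK reference before relying on them.

```python
VALID_ACCESS_MODES = {"PRIVATE_ONLY", "PUBLIC_ONLY"}


def environment_request(name, execution_role_arn, source_bucket_arn,
                        subnet_ids, security_group_ids,
                        access_mode="PRIVATE_ONLY", dag_s3_path="dags"):
    """Assemble keyword arguments for an environment-creation call (sketch)."""
    if access_mode not in VALID_ACCESS_MODES:
        raise ValueError(f"access_mode must be one of {sorted(VALID_ACCESS_MODES)}")
    return {
        "Name": name,
        "ExecutionRoleArn": execution_role_arn,
        "SourceBucketArn": source_bucket_arn,
        "DagS3Path": dag_s3_path,
        "NetworkConfiguration": {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
        # PRIVATE_ONLY keeps the Apache Airflow UI reachable only inside the VPC.
        "WebserverAccessMode": access_mode,
    }


# With AWS credentials configured, the dictionary could be passed on as:
#   boto3.client("mwaa").create_environment(**request)
```

Building the request as plain data first lets you review or test the settings before any call is made to the service.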

For more information about networking, see Networking in the Amazon MWAA User Guide.