Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

What Is Amazon Managed Workflows for Apache Airflow?

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that you can use to set up and operate data pipelines in the cloud at scale. Apache Airflow is an open-source tool used to programmatically author, schedule, and monitor sequences of processes and tasks referred to as workflows. With Amazon MWAA, you can use Apache Airflow and Python to create workflows without having to manage the underlying infrastructure for scalability, availability, and security. Amazon MWAA automatically scales its workflow execution capacity to meet your needs, and it integrates with Amazon security services to help provide you with fast and secure access to your data.
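
To make the idea of a workflow concrete, the following is a minimal sketch in plain Python of a sequence of tasks with dependencies between them. It is not Apache Airflow code: the task names and the `execution_order` helper are hypothetical, and the standard-library `graphlib` module stands in for the ordering that a scheduler works out. In Apache Airflow itself, tasks are declared as operators inside a DAG definition written in Python.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a task, and each value is the set of
# tasks that must finish before that task can start.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def execution_order(deps):
    """Return one run order that respects every declared dependency."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because the example is a simple chain, there is only one valid order. With branching dependencies, several orders can be valid, and independent tasks can run in parallel.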

Features

  • Automatic Airflow setup – Quickly set up Apache Airflow by choosing an Apache Airflow version when you create an Amazon MWAA environment. Amazon MWAA sets up Apache Airflow for you using the same Apache Airflow user interface and open-source code that you can download on the Internet.

  • Automatic scaling – Automatically scale Apache Airflow Workers by setting the minimum and maximum number of Workers that run in your environment. Amazon MWAA monitors the Workers in your environment and uses its autoscaling component to add Workers to meet demand until it reaches the maximum number of Workers you defined.

  • Built-in authentication – Enable role-based authentication and authorization for your Apache Airflow Web server by defining the access control policies in Amazon Identity and Access Management (IAM). The Apache Airflow Workers assume these policies for secure access to Amazon services.

  • Built-in security – The Apache Airflow Workers and Schedulers run in Amazon MWAA's Amazon VPC. Data is also automatically encrypted using Amazon Key Management Service, so your environment is secure by default.

  • Public or private access modes – Access your Apache Airflow Web server using a private or public access mode. The Public network access mode uses a VPC endpoint for your Apache Airflow Web server that is accessible over the Internet. The Private network access mode uses a VPC endpoint for your Apache Airflow Web server that is accessible in your VPC. In both cases, access for your Apache Airflow users is controlled by the access control policy you define in Amazon Identity and Access Management (IAM) and Amazon SSO.

  • Streamlined upgrades and patches – Amazon MWAA provides new versions of Apache Airflow periodically. The Amazon MWAA team will update and patch the images for these versions.

  • Workflow monitoring – View Apache Airflow logs and Apache Airflow metrics in Amazon CloudWatch to identify Apache Airflow task delays or workflow errors without the need for additional third-party tools. Amazon MWAA automatically sends environment metrics and, if enabled, Apache Airflow logs to CloudWatch.

  • Amazon integration – Amazon MWAA supports open-source integrations with Amazon Athena, Amazon Batch, Amazon CloudWatch, Amazon DynamoDB, Amazon DataSync, Amazon EMR, Amazon Fargate, Amazon EKS, Amazon Data Firehose, Amazon Glue, Amazon Lambda, Amazon Redshift, Amazon SQS, Amazon SNS, Amazon SageMaker, and Amazon S3, as well as hundreds of built-in and community-created operators and sensors.

  • Worker fleets – Amazon MWAA offers support for using containers to scale the worker fleet on demand and reduce scheduler outages using Amazon ECS on Amazon Fargate. Amazon MWAA supports both operators that invoke tasks on Amazon ECS containers and Kubernetes operators that create and run pods on a Kubernetes cluster.
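
Several of the features above (Airflow version selection, minimum and maximum Workers, web server access mode, and log delivery) correspond to settings you supply when you create an environment. The sketch below shows one way to assemble those settings as the keyword arguments for the `create_environment` operation of the boto3 `mwaa` client. The `build_environment_request` helper, all ARNs, subnet and security group IDs, the bucket name, and the version string are hypothetical placeholders, not values from this guide.

```python
def build_environment_request(name, airflow_version, min_workers, max_workers,
                              public_web_server=False):
    """Assemble keyword arguments for the MWAA create_environment call."""
    if not 1 <= min_workers <= max_workers:
        raise ValueError("expected 1 <= min_workers <= max_workers")
    return {
        "Name": name,
        "AirflowVersion": airflow_version,
        # Placeholder resources: replace with values from your own account.
        "ExecutionRoleArn": "arn:aws-cn:iam::111122223333:role/example-mwaa-role",
        "SourceBucketArn": "arn:aws-cn:s3:::example-mwaa-bucket",
        "DagS3Path": "dags",
        "NetworkConfiguration": {
            "SubnetIds": ["subnet-EXAMPLE1", "subnet-EXAMPLE2"],
            "SecurityGroupIds": ["sg-EXAMPLE"],
        },
        # Bounds for automatic scaling of the Worker fleet.
        "MinWorkers": min_workers,
        "MaxWorkers": max_workers,
        # Public or private access to the Apache Airflow Web server.
        "WebserverAccessMode": "PUBLIC_ONLY" if public_web_server else "PRIVATE_ONLY",
        # Opt in to sending Apache Airflow task logs to CloudWatch.
        "LoggingConfiguration": {
            "TaskLogs": {"Enabled": True, "LogLevel": "INFO"},
        },
    }


request = build_environment_request("example-environment", "2.8.1", 1, 5)
print(request["MinWorkers"], request["MaxWorkers"], request["WebserverAccessMode"])

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("mwaa").create_environment(**request)
```

Building the request separately from sending it keeps the settings easy to review and validate before any resources are created.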

Architecture

All of the components contained in the outer box (in the image below) appear as a single Amazon MWAA environment in your account. The Apache Airflow Scheduler and Workers are Amazon Fargate (Fargate) containers that connect to the private subnets in the Amazon VPC for your environment. Each environment has its own Apache Airflow metadatabase managed by Amazon that is accessible to the Scheduler and Worker Fargate containers via a privately secured VPC endpoint.

Amazon CloudWatch, Amazon S3, Amazon SQS, and Amazon KMS are separate from Amazon MWAA and need to be accessible from the Apache Airflow Scheduler(s) and Workers in the Fargate containers.

The Apache Airflow Web server can be accessed either over the Internet by selecting the Public network Apache Airflow access mode, or within your VPC by selecting the Private network Apache Airflow access mode. In both cases, access for your Apache Airflow users is controlled by the access control policy you define in Amazon Identity and Access Management (IAM).
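
As an illustration of an access control policy for Apache Airflow users, the following sketch builds an IAM policy document that allows a user to sign in to the Apache Airflow Web server under a given Apache Airflow role. It assumes the `airflow:CreateWebLoginToken` action and a role resource ARN of the form shown in the code. The `web_login_policy` helper, account ID, Region, and environment name are hypothetical placeholders.

```python
import json


def web_login_policy(partition, region, account_id, environment, airflow_role):
    """Build an IAM policy that grants web login under one Apache Airflow role."""
    resource = (
        f"arn:{partition}:airflow:{region}:{account_id}"
        f":role/{environment}/{airflow_role}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "airflow:CreateWebLoginToken",
                "Resource": [resource],
            }
        ],
    }


# Placeholder values; "aws-cn" is the ARN partition used in the China Regions.
policy = web_login_policy(
    "aws-cn", "cn-north-1", "111122223333", "example-environment", "Viewer"
)
print(json.dumps(policy, indent=2))
```

The Apache Airflow role in the resource ARN (for example `Viewer` or `Admin`) determines what the user can do after signing in, so granting the narrowest role that fits is usually the safer choice.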

Note

Multiple Apache Airflow Schedulers are only available with Apache Airflow v2 and above. Learn more about the Apache Airflow task lifecycle at Concepts in the Apache Airflow reference guide.

This image shows the architecture of an Amazon MWAA environment.

Integration

The active and growing Apache Airflow open-source community provides operators (plugins that simplify connections to services) for Apache Airflow to integrate with Amazon services. This includes services such as Amazon S3, Amazon Redshift, Amazon EMR, Amazon Batch, and Amazon SageMaker, as well as services on other cloud platforms.

Amazon MWAA fully supports integrating Apache Airflow with Amazon services and popular third-party tools such as Apache Hadoop, Presto, Hive, and Spark to perform data processing tasks. Amazon MWAA is committed to maintaining compatibility with the Amazon MWAA API. Amazon MWAA also intends to provide reliable integrations to Amazon services, make them available to the community, and be involved in community feature development.

For sample code, see Code examples for Amazon Managed Workflows for Apache Airflow.

Supported versions

Amazon MWAA supports multiple versions of Apache Airflow. For more information about the Apache Airflow versions we support and the Apache Airflow components included with each version, see Apache Airflow versions on Amazon Managed Workflows for Apache Airflow.

What's next?