
Amazon MWAA frequently asked questions

This page answers common questions you may encounter when using Amazon Managed Workflows for Apache Airflow.


Supported versions

What does Amazon MWAA support for Apache Airflow v2?

To learn what Amazon MWAA supports, see Apache Airflow versions on Amazon Managed Workflows for Apache Airflow.

Why are older versions of Apache Airflow not supported?

We support only the latest Apache Airflow version available at launch, Apache Airflow v1.10.12, due to security concerns with older versions.

What Python version should I use?

For the Python version used by each Apache Airflow version that Amazon MWAA supports, see Apache Airflow versions on Amazon Managed Workflows for Apache Airflow.

Note
  • Beginning with Apache Airflow v2.2.2, Amazon MWAA supports installing Python requirements, provider packages, and custom plugins directly on the Apache Airflow web server.

  • Beginning with Apache Airflow v2.7.2, your requirements file must include a --constraint statement. If you do not provide a constraint, Amazon MWAA will specify one for you to ensure the packages listed in your requirements are compatible with the version of Apache Airflow you are using.

    For more information on setting up constraints in your requirements file, see Installing Python dependencies.
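For example, for an Apache Airflow v2.7.2 environment running Python 3.11, a requirements.txt might look like the following. The provider package and version shown are illustrative:

--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.2/constraints-3.11.txt"
apache-airflow-providers-snowflake==5.1.1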

For more information about migrating your self-managed Apache Airflow deployments, or migrating an existing Amazon MWAA environment, including instructions for backing up your metadata database, see the Amazon MWAA Migration Guide.

What version of pip does Amazon MWAA use?

For environments running Apache Airflow v1.10.12, Amazon MWAA installs pip version 21.1.2.

Note

Amazon MWAA will not upgrade pip for Apache Airflow v1.10.12 environments.

For environments running Apache Airflow v2 and above, Amazon MWAA installs pip version 21.3.1.

Use cases

When should I use Amazon Step Functions vs. Amazon MWAA?

  1. You can use Step Functions to process individual customer orders, since Step Functions can scale to meet demand for one order or one million orders.

  2. If you're running an overnight workflow that processes the previous day's orders, you can use Step Functions or Amazon MWAA. Amazon MWAA gives you an open-source option to abstract the workflow from the Amazon resources you're using.

Environment specifications

How much task storage is available to each environment?

Task storage is limited to 20 GB, as determined by Amazon ECS Fargate platform version 1.4. The amount of RAM is determined by the environment class you specify. For more information about environment classes, see Configuring the Amazon MWAA environment class.

What is the default operating system used for Amazon MWAA environments?

Amazon MWAA environments are created on instances running Amazon Linux 2 for versions 2.6 and older, and on instances running Amazon Linux 2023 for versions 2.7 and newer.

Can I use a custom image for my Amazon MWAA environment?

Custom images are not supported. Amazon MWAA uses images that are built on the Amazon Linux AMI. Amazon MWAA installs additional requirements by running pip3 install -r on the requirements.txt file you add to the Amazon S3 bucket for the environment.

Is Amazon MWAA HIPAA compliant?

Amazon MWAA is Health Insurance Portability and Accountability Act (HIPAA) eligible. If you have a HIPAA Business Associate Addendum (BAA) in place with Amazon, you can use Amazon MWAA for workflows handling Protected Health Information (PHI) on environments created on, or after, November 14th, 2022.

Does Amazon MWAA support Spot Instances?

Amazon MWAA does not currently support Amazon EC2 Spot Instance types for Apache Airflow. However, an Amazon MWAA environment can trigger Spot Instances on, for example, Amazon EMR and Amazon EC2.

Does Amazon MWAA support a custom domain?

To use a custom domain for your Amazon MWAA hostname, see Setting up a custom domain for the Apache Airflow web server.

Can I SSH into my environment?

While SSH is not supported on an Amazon MWAA environment, it's possible to use a DAG to run bash commands using the BashOperator. For example:

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago

with DAG(
    dag_id="any_bash_command_dag",
    schedule_interval=None,
    catchup=False,
    start_date=days_ago(1),
) as dag:
    # Run whatever command is passed in the DAG run configuration.
    cli_command = BashOperator(
        task_id="bash_command",
        bash_command="{{ dag_run.conf['command'] }}",
    )

To trigger the DAG in the Apache Airflow UI, use:

{ "command" : "your bash command"}

Why is a self-referencing rule required on the VPC security group?

By creating a self-referencing rule, you're restricting the source to the same security group in the VPC, and it's not open to all networks. To learn more, see Security in your VPC on Amazon MWAA.
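As a sketch, the following shows how such a rule can be created with the Amazon EC2 API using boto3; the security group ID is hypothetical:

import boto3

ec2 = boto3.client("ec2")
security_group_id = "sg-0123456789abcdef0"  # hypothetical security group ID

# Allow all traffic whose source is the security group itself.
ec2.authorize_security_group_ingress(
    GroupId=security_group_id,
    IpPermissions=[
        {
            "IpProtocol": "-1",  # all protocols and ports
            "UserIdGroupPairs": [{"GroupId": security_group_id}],
        }
    ],
)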

Can I hide environments from different groups in IAM?

You can limit access by specifying an environment name in Amazon Identity and Access Management (IAM). However, visibility filtering isn't available in the Amazon console: if a user can see one environment, they can see all environments.

Can I store temporary data on the Apache Airflow Worker?

Your Apache Airflow Operators can store temporary data on the Workers. Apache Airflow Workers can access temporary files in /tmp on the Fargate containers for your environment.

Note

Total task storage is limited to 20 GB, as determined by Amazon ECS Fargate platform version 1.4. There's no guarantee that subsequent tasks will run on the same Fargate container instance, so they might see a different /tmp directory.
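For example, a minimal sketch of a task on an Apache Airflow v2 environment that writes scratch data to /tmp; the DAG ID and file contents are illustrative:

import tempfile
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def write_scratch_file():
    # /tmp on the worker's Fargate container is usable as scratch space,
    # but it isn't shared across tasks and isn't guaranteed to persist.
    with tempfile.NamedTemporaryFile(mode="w", dir="/tmp", suffix=".csv", delete=False) as f:
        f.write("order_id,total\n1001,42.50\n")
    print(f"Wrote scratch data to {f.name}")

with DAG(
    dag_id="tmp_scratch_example",
    schedule_interval=None,
    catchup=False,
    start_date=datetime(2022, 1, 1),
) as dag:
    PythonOperator(task_id="write_scratch_file", python_callable=write_scratch_file)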

Can I specify more than 25 Apache Airflow Workers?

Yes. Although you can specify up to 25 Apache Airflow workers on the Amazon MWAA console, you can configure up to 50 on an environment by requesting a quota increase. For more information, see Requesting a quota increase.
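Once the quota increase is granted, you can raise the value programmatically, for example with the UpdateEnvironment API through boto3; the environment name is hypothetical:

import boto3

mwaa = boto3.client("mwaa")

# Raise the maximum number of Apache Airflow workers for the environment.
mwaa.update_environment(
    Name="MyAirflowEnvironment",  # hypothetical environment name
    MaxWorkers=50,
)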

Does Amazon MWAA support shared Amazon VPCs or shared subnets?

Amazon MWAA does not support shared Amazon VPCs or shared subnets. The Amazon VPC you select when you create an environment should be owned by the account that is attempting to create the environment. However, you can route traffic from an Amazon VPC in the Amazon MWAA account to a shared VPC. For more information, and to see an example of routing traffic to a shared Amazon VPC, see Centralized outbound routing to the internet in the Amazon VPC Transit Gateways Guide.

Metrics

What metrics are used to determine whether to scale Workers?

Amazon MWAA monitors the QueuedTasks and RunningTasks metrics in CloudWatch to determine whether to scale Apache Airflow Workers on your environment. To learn more, see Monitoring and metrics for Amazon Managed Workflows for Apache Airflow.
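As a sketch, you can retrieve these metrics with the CloudWatch API. This assumes the AmazonMWAA namespace and the Environment and Function dimensions used for executor metrics; the environment name is hypothetical:

from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

# Fetch the average number of queued tasks over the past hour.
stats = cloudwatch.get_metric_statistics(
    Namespace="AmazonMWAA",
    MetricName="QueuedTasks",
    Dimensions=[
        {"Name": "Environment", "Value": "MyAirflowEnvironment"},  # hypothetical name
        {"Name": "Function", "Value": "Executor"},
    ],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)
print(stats["Datapoints"])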

Can I create custom metrics in CloudWatch?

Not on the CloudWatch console. However, you can create a DAG that writes custom metrics in CloudWatch. For more information, see Using a DAG to write custom metrics in CloudWatch.
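For example, a minimal sketch of such a DAG for an Apache Airflow v2 environment; the namespace and metric name are illustrative:

from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator

def publish_metric():
    # Publish one data point to a custom CloudWatch namespace.
    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_data(
        Namespace="MWAA/CustomMetrics",  # hypothetical namespace
        MetricData=[{"MetricName": "ProcessedOrders", "Value": 42, "Unit": "Count"}],
    )

with DAG(
    dag_id="custom_metric_dag",
    schedule_interval=None,
    catchup=False,
    start_date=datetime(2022, 1, 1),
) as dag:
    PythonOperator(task_id="publish_metric", python_callable=publish_metric)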

DAGs, Operators, Connections, and other questions

Can I use the PythonVirtualenvOperator?

The PythonVirtualenvOperator is not explicitly supported on Amazon MWAA, but you can create a custom plugin that uses the PythonVirtualenvOperator. For sample code, see Creating a custom plugin for Apache Airflow PythonVirtualenvOperator.

How long does it take Amazon MWAA to recognize a new DAG file?

DAGs are periodically synchronized from the Amazon S3 bucket to your environment. If you add a new DAG file, it takes about 300 seconds for Amazon MWAA to start using the new file. If you update an existing DAG, it takes Amazon MWAA about 30 seconds to recognize your updates.

These values, 300 seconds for new DAGs and 30 seconds for updates to existing DAGs, correspond to the Apache Airflow configuration options dag_dir_list_interval and min_file_process_interval, respectively.
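If these scheduler options are available for override on your environment's Apache Airflow version, the following is a sketch of tuning them with the UpdateEnvironment API; the values and environment name are illustrative:

import boto3

mwaa = boto3.client("mwaa")

# Tune how often the scheduler scans for new and updated DAG files.
mwaa.update_environment(
    Name="MyAirflowEnvironment",  # hypothetical environment name
    AirflowConfigurationOptions={
        "scheduler.dag_dir_list_interval": "120",
        "scheduler.min_file_process_interval": "30",
    },
)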

Why is my DAG file not picked up by Apache Airflow?

The following are possible solutions for this issue:

  1. Check that your execution role has sufficient permissions to your Amazon S3 bucket. To learn more, see Amazon MWAA execution role.

  2. Check that the Amazon S3 bucket has Block Public Access configured, and Versioning enabled. To learn more, see Create an Amazon S3 bucket for Amazon MWAA.

  3. Verify the DAG file itself. For example, be sure that each DAG has a unique DAG ID.

Can I remove a plugins.zip or requirements.txt from an environment?

Currently, there is no way to remove a plugins.zip or requirements.txt from an environment once they’ve been added, but we're working on the issue. In the interim, a workaround is to point to an empty text or zip file, respectively. To learn more, see Deleting files on Amazon S3.
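For example, a sketch of this workaround for a requirements.txt using boto3; the bucket, key, and environment names are hypothetical:

import boto3

s3 = boto3.client("s3")
mwaa = boto3.client("mwaa")

# Upload a zero-byte requirements file, then point the environment at it.
s3.put_object(Bucket="your-mwaa-bucket", Key="empty-requirements.txt", Body=b"")
mwaa.update_environment(
    Name="MyAirflowEnvironment",
    RequirementsS3Path="empty-requirements.txt",
)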

Why don't I see my plugins in the Apache Airflow v2.0.2 Admin Plugins menu?

For security reasons, the Apache Airflow web server on Amazon MWAA has limited network egress and does not install plugins or Python dependencies directly on the web server for version 2.0.2 environments. The plugin that's shown allows Amazon MWAA to authenticate your Apache Airflow users in Amazon Identity and Access Management (IAM).

To be able to install plugins and Python dependencies directly on the web server, we recommend creating a new environment with Apache Airflow v2.2 and above. Amazon MWAA installs Python dependencies and custom plugins directly on the web server for Apache Airflow v2.2 and above.

Can I use Amazon Database Migration Service (DMS) Operators?

Amazon MWAA supports DMS Operators. However, this operator cannot be used to perform actions on the Amazon Aurora PostgreSQL metadata database associated with an Amazon MWAA environment.
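For example, a minimal sketch that starts an existing DMS replication task with the DmsStartTaskOperator from the Amazon provider package; the replication task ARN is hypothetical:

from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.dms import DmsStartTaskOperator

with DAG(
    dag_id="dms_start_task_example",
    schedule_interval=None,
    catchup=False,
    start_date=datetime(2022, 1, 1),
) as dag:
    # Start an existing DMS replication task.
    start_task = DmsStartTaskOperator(
        task_id="start_replication_task",
        replication_task_arn="arn:aws:dms:us-west-2:123456789012:task:EXAMPLE",
    )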

When I access the Airflow REST API using Amazon credentials, can I increase the throttling limit to more than 10 transactions per second (TPS)?

Yes, you can. To increase the throttling limit, please contact Amazon Customer Support.
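As a sketch, assuming your environment supports the Amazon MWAA InvokeRestApi API (available on newer Apache Airflow versions), you can call the Airflow REST API with your Amazon credentials through boto3; the environment name is hypothetical:

import boto3

mwaa = boto3.client("mwaa")

# Call the Apache Airflow REST API with SigV4-signed Amazon credentials.
response = mwaa.invoke_rest_api(
    Name="MyAirflowEnvironment",  # hypothetical environment name
    Method="GET",
    Path="/dags",
)
print(response["RestApiResponse"])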