Amazon MWAA frequently asked questions
This page describes common questions you may encounter when using Amazon Managed Workflows for Apache Airflow.
Contents
- Supported versions
- Use cases
- Environment specifications
- How much task storage is available to each environment?
- What is the default operating system used for Amazon MWAA environments?
- Can I use a custom image for my Amazon MWAA environment?
- Is Amazon MWAA HIPAA compliant?
- Does Amazon MWAA support Spot Instances?
- Does Amazon MWAA support a custom domain?
- Can I SSH into my environment?
- Why is a self-referencing rule required on the VPC security group?
- Can I hide environments from different groups in IAM?
- Can I store temporary data on the Apache Airflow Worker?
- Can I specify more than 25 Apache Airflow Workers?
- Does Amazon MWAA support shared Amazon VPCs or shared subnets?
- Metrics
- DAGs, Operators, Connections, and other questions
- Can I use the PythonVirtualenvOperator?
- How long does it take Amazon MWAA to recognize a new DAG file?
- Why is my DAG file not picked up by Apache Airflow?
- Can I remove a plugins.zip or requirements.txt from an environment?
- Why don't I see my plugins in the Apache Airflow v2.0.2 Admin Plugins menu?
- Can I use Amazon Database Migration Service (DMS) Operators?
- When I access the Airflow REST API using the Amazon credentials, can I increase the throttling limit to more than 10 transactions per second (TPS)?
Supported versions
What does Amazon MWAA support for Apache Airflow v2?
To learn what Amazon MWAA supports, see Apache Airflow versions on Amazon Managed Workflows for Apache Airflow.
Why are older versions of Apache Airflow not supported?
Due to security concerns with older versions, we do not support Apache Airflow versions earlier than v1.10.12, which was the latest version when Amazon MWAA launched.
What Python version should I use?
The following Apache Airflow versions are supported on Amazon Managed Workflows for Apache Airflow.
Note
- Beginning with Apache Airflow v2.2.2, Amazon MWAA supports installing Python requirements, provider packages, and custom plugins directly on the Apache Airflow web server.
- Beginning with Apache Airflow v2.7.2, your requirements file must include a --constraint statement. If you do not provide a constraint, Amazon MWAA will specify one for you to ensure the packages listed in your requirements are compatible with the version of Apache Airflow you are using. For more information on setting up constraints in your requirements file, see Installing Python dependencies.
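As a rough illustration of the --constraint requirement, the following sketch builds the text of a requirements file whose first line is a constraint statement. It assumes the constraints files published in the Apache Airflow GitHub repository, and it assumes an Apache Airflow v2.7.2 environment running Python 3.11. The helper names and the example provider package are hypothetical and are not part of Amazon MWAA.

```python
# Sketch only: composes requirements.txt content that starts with --constraint.
# Assumption: constraints are taken from the public Apache Airflow repository.
CONSTRAINTS_URL = (
    "https://raw.githubusercontent.com/apache/airflow/"
    "constraints-{airflow}/constraints-{python}.txt"
)


def constraint_line(airflow_version: str, python_version: str) -> str:
    """Return the --constraint statement for the given versions."""
    url = CONSTRAINTS_URL.format(airflow=airflow_version, python=python_version)
    return f'--constraint "{url}"'


def build_requirements(airflow_version: str, python_version: str, packages) -> str:
    """Return requirements.txt text: the constraint first, then one package per line."""
    lines = [constraint_line(airflow_version, python_version)]
    lines.extend(packages)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical package list; replace it with the packages your DAGs need.
    print(build_requirements("2.7.2", "3.11", ["apache-airflow-providers-snowflake"]))
```

Listing packages without pinned versions lets the constraints file choose versions that are compatible with the Apache Airflow release.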
Apache Airflow version | Apache Airflow guide | Apache Airflow constraints | Python version |
---|---|---|---|
For more information about migrating your self-managed Apache Airflow deployments, or migrating an existing Amazon MWAA environment, including instructions for backing up your metadata database, see the Amazon MWAA Migration Guide.
What version of pip does Amazon MWAA use?
For environments running Apache Airflow v1.10.12, Amazon MWAA installs pip version 21.1.2.
Note
Amazon MWAA will not upgrade pip for Apache Airflow v1.10.12 environments.
For environments running Apache Airflow v2 and above, Amazon MWAA installs pip version 21.3.1.
Use cases
When should I use Amazon Step Functions vs. Amazon MWAA?
- You can use Step Functions to process individual customer orders, since Step Functions can scale to meet demand for one order or one million orders.
- If you’re running an overnight workflow that processes the previous day’s orders, you can use Step Functions or Amazon MWAA. Amazon MWAA gives you an open source option to abstract the workflow from the Amazon resources you're using.
Environment specifications
How much task storage is available to each environment?
The task storage is limited to 20 GB, and is specified by Amazon ECS Fargate 1.4. The amount of RAM is determined by the environment class you specify. For more information about environment classes, see Configuring the Amazon MWAA environment class.
What is the default operating system used for Amazon MWAA environments?
Amazon MWAA environments are created on instances running Amazon Linux 2 for versions 2.6 and older, and on instances running Amazon Linux 2023 for versions 2.7 and newer.
Can I use a custom image for my Amazon MWAA environment?
Custom images are not supported. Amazon MWAA uses images that are built on Amazon Linux AMI. Amazon MWAA installs the additional requirements by running pip3 install -r for the requirements specified in the requirements.txt file you add to the Amazon S3 bucket for the environment.
Is Amazon MWAA HIPAA compliant?
Amazon MWAA is Health Insurance Portability and Accountability Act (HIPAA) eligible.
Does Amazon MWAA support Spot Instances?
Amazon MWAA does not currently support on-demand Amazon EC2 Spot Instance types for Apache Airflow. However, an Amazon MWAA environment can trigger Spot Instances on, for example, Amazon EMR and Amazon EC2.
Does Amazon MWAA support a custom domain?
To be able to use a custom domain for your Amazon MWAA hostname, do one of the following:
- For Amazon MWAA deployments with public web server access, you can use Amazon CloudFront with Lambda@Edge to direct traffic to your environment, and map a custom domain name to CloudFront. For more information and an example of setting up a custom domain for a public environment, see the Amazon MWAA custom domain for public web server sample in the Amazon MWAA examples GitHub repository.
- For Amazon MWAA deployments with private web server access, see Setting up a custom domain for the Apache Airflow web server.
Can I SSH into my environment?
While SSH is not supported on an Amazon MWAA environment, it's possible to use a DAG to run bash commands using the BashOperator. For example:
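    from airflow import DAG
    # Airflow v1.10 import path; in Apache Airflow v2 the module is airflow.operators.bash.
    from airflow.operators.bash_operator import BashOperator
    from airflow.utils.dates import days_ago

    # No schedule: the DAG runs only when triggered manually with a configuration.
    with DAG(
        dag_id="any_bash_command_dag",
        schedule_interval=None,
        catchup=False,
        start_date=days_ago(1),
    ) as dag:
        # The command is read from the "command" key of the trigger configuration.
        cli_command = BashOperator(
            task_id="bash_command",
            bash_command="{{ dag_run.conf['command'] }}",
        )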
To trigger the DAG in the Apache Airflow UI, use:
{ "command" : "your bash command"}
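Commands that contain quotation marks are easy to get wrong when you type the trigger configuration by hand. As a minimal sketch, the following hypothetical helper serializes a command into the JSON shape shown above, so the quoting is handled for you. The function name and the sample command are illustrative only.

```python
import json


def build_trigger_conf(command: str) -> str:
    """Return JSON text for the trigger configuration, with quotes escaped."""
    return json.dumps({"command": command})


if __name__ == "__main__":
    # Illustrative command; paste the printed JSON into the trigger form.
    print(build_trigger_conf('echo "disk usage" && df -h /tmp'))
```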
Why is a self-referencing rule required on the VPC security group?
By creating a self-referencing rule, you're restricting the source to the same security group in the VPC, and it's not open to all networks. To learn more, see Security in your VPC on Amazon MWAA.
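To make the idea of a self-referencing rule concrete, the sketch below builds an inbound rule whose source is the security group itself and, optionally, applies it with the Amazon EC2 API through boto3. The function names and the example group ID are hypothetical. Allowing all protocols is an assumption made for brevity, so check the VPC security guidance for the rule your environment needs.

```python
def self_referencing_permission(group_id: str) -> list:
    """Build an inbound rule that allows traffic only from the same security group."""
    return [
        {
            "IpProtocol": "-1",  # all protocols (assumption for this sketch)
            "UserIdGroupPairs": [{"GroupId": group_id}],  # source is the group itself
        }
    ]


def add_self_referencing_rule(group_id: str) -> None:
    """Apply the rule. Requires boto3 and credentials with EC2 permissions."""
    import boto3  # imported here so the builder above has no AWS dependency

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=self_referencing_permission(group_id),
    )


if __name__ == "__main__":
    # Hypothetical group ID; this only prints the rule and makes no API call.
    print(self_referencing_permission("sg-0123456789abcdef0"))
```

Because the source is a security group ID rather than a CIDR range, only resources attached to that same group can reach each other through this rule.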
Can I hide environments from different groups in IAM?
You can limit access by specifying an environment name in Amazon Identity and Access Management. However, visibility filtering isn't available in the Amazon console: if a user can see one environment, they can see all environments.
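As a sketch of limiting access by environment name, the following builds an IAM policy document that allows a single action on a single environment. The account ID, Region, partition, and environment name are placeholders, the helper name is hypothetical, and a real policy usually needs more actions than the one shown.

```python
import json


def environment_policy(partition: str, region: str, account_id: str, environment_name: str) -> dict:
    """Return a policy document scoped to one Amazon MWAA environment."""
    arn = f"arn:{partition}:airflow:{region}:{account_id}:environment/{environment_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "airflow:GetEnvironment",
                "Resource": arn,  # only this environment, not a wildcard
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    policy = environment_policy("aws", "us-east-1", "111122223333", "MyAirflowEnvironment")
    print(json.dumps(policy, indent=2))
```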
Can I store temporary data on the Apache Airflow Worker?
Your Apache Airflow Operators can store temporary data on the Workers. Apache Airflow Workers can access temporary files in the /tmp folder on the Fargate containers for your environment.
Note
Total task storage is limited to 20 GB, according to Amazon ECS Fargate 1.4. There's no guarantee that subsequent tasks will run on the same Fargate container instance, which might use a different /tmp folder.
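Because a later task might run on a different container, temporary files are safest when a single task creates, uses, and removes them. The sketch below shows that pattern with the Python standard library. The function name is hypothetical, and it assumes the system temporary directory resolves to /tmp on the Worker.

```python
import os
import tempfile


def process_in_scratch_space(payload: bytes) -> int:
    """Write payload to a temporary file, measure it, and clean up in the same task."""
    # The file is deleted automatically when the with block exits.
    with tempfile.NamedTemporaryFile(dir=tempfile.gettempdir(), suffix=".bin") as handle:
        handle.write(payload)
        handle.flush()  # make sure the bytes are on disk before measuring
        size = os.path.getsize(handle.name)
    return size


if __name__ == "__main__":
    print(process_in_scratch_space(b"example data"))
```

A function like this could be passed as the python_callable of a PythonOperator. Data that a downstream task needs should go to durable storage such as Amazon S3 rather than /tmp.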
Can I specify more than 25 Apache Airflow Workers?
Yes. Although you can specify up to 25 Apache Airflow workers on the Amazon MWAA console, you can configure up to 50 on an environment by requesting a quota increase. For more information, see Requesting a quota increase.
Does Amazon MWAA support shared Amazon VPCs or shared subnets?
Amazon MWAA does not support shared Amazon VPCs or shared subnets. The Amazon VPC you select when you create an environment should be owned by the account that is attempting to create the environment. However, you can route traffic from an Amazon VPC in the Amazon MWAA account to a shared VPC. For more information, and to see an example of routing traffic to a shared Amazon VPC, see Centralized outbound routing to the internet in the Amazon VPC Transit Gateways Guide.
Metrics
What metrics are used to determine whether to scale Workers?
Amazon MWAA monitors the QueuedTasks and RunningTasks in CloudWatch to determine whether to scale Apache Airflow Workers on your environment. To learn more, see Monitoring and metrics for Amazon Managed Workflows for Apache Airflow.
Can I create custom metrics in CloudWatch?
Not on the CloudWatch console. However, you can create a DAG that writes custom metrics in CloudWatch. For more information, see Using a DAG to write custom metrics in CloudWatch.
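As a minimal sketch of what a task that writes a custom metric might do, the following separates building the metric payload from publishing it with the CloudWatch PutMetricData API through boto3. The namespace, metric name, and dimension are hypothetical, and the sketch assumes the environment's execution role is allowed to call cloudwatch:PutMetricData.

```python
from datetime import datetime, timezone


def build_metric_data(name: str, value: float, environment_name: str) -> list:
    """Return a MetricData list with one data point tagged with the environment name."""
    return [
        {
            "MetricName": name,
            "Dimensions": [{"Name": "Environment", "Value": environment_name}],
            "Timestamp": datetime.now(timezone.utc),
            "Value": float(value),
            "Unit": "Count",
        }
    ]


def publish_metric(namespace: str, metric_data: list) -> None:
    """Send the data points to CloudWatch. Requires boto3 and credentials."""
    import boto3  # imported here so the builder can be used without AWS access

    boto3.client("cloudwatch").put_metric_data(Namespace=namespace, MetricData=metric_data)


if __name__ == "__main__":
    # Hypothetical names; this only prints the payload and makes no API call.
    print(build_metric_data("RowsProcessed", 42, "MyAirflowEnvironment"))
```

Inside a DAG, a PythonOperator task could call both functions after the work it measures has finished.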
DAGs, Operators, Connections, and other questions
Can I use the PythonVirtualenvOperator?
The PythonVirtualenvOperator is not explicitly supported on Amazon MWAA, but you can create a custom plugin that uses the PythonVirtualenvOperator. For sample code, see Creating a custom plugin for Apache Airflow PythonVirtualenvOperator.
How long does it take Amazon MWAA to recognize a new DAG file?
DAGs are periodically synchronized from the Amazon S3 bucket to your environment. If you add a new DAG file, it takes about 300 seconds for Amazon MWAA to start using the new file. If you update an existing DAG, it takes Amazon MWAA about 30 seconds to recognize your updates.
These values, 300 seconds for new DAGs, and 30 seconds for updates to existing DAGs, correspond to the Apache Airflow configuration options dag_dir_list_interval and min_file_process_interval, respectively.
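The two intervals can be written as Apache Airflow configuration options in the section.key form. The sketch below builds that mapping and, optionally, passes it to the Amazon MWAA UpdateEnvironment API through boto3. Whether your environment accepts overrides for these two options is an assumption here, and the function names are hypothetical.

```python
def scheduler_interval_options(new_dag_seconds: int = 300, update_seconds: int = 30) -> dict:
    """Map the two intervals to Airflow configuration option names (values are strings)."""
    return {
        "scheduler.dag_dir_list_interval": str(new_dag_seconds),
        "scheduler.min_file_process_interval": str(update_seconds),
    }


def apply_options(environment_name: str, options: dict) -> None:
    """Send the options to Amazon MWAA. Requires boto3 and credentials."""
    import boto3  # imported here so the mapping can be built without AWS access

    boto3.client("mwaa").update_environment(
        Name=environment_name,
        AirflowConfigurationOptions=options,
    )


if __name__ == "__main__":
    # Prints the defaults described above; no API call is made.
    print(scheduler_interval_options())
```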
Why is my DAG file not picked up by Apache Airflow?
The following are possible solutions for this issue:
- Check that your execution role has sufficient permissions to your Amazon S3 bucket. To learn more, see Amazon MWAA execution role.
- Check that the Amazon S3 bucket has Block Public Access configured, and Versioning enabled. To learn more, see Create an Amazon S3 bucket for Amazon MWAA.
- Verify the DAG file itself. For example, be sure that each DAG has a unique DAG ID.
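One way to check the last item before uploading is to scan your local DAG folder for repeated DAG IDs. The following sketch does this with a regular expression. It is a heuristic, not a parser: it assumes DAG IDs are passed as a literal dag_id="..." keyword argument, and it will miss IDs that are built dynamically or passed positionally. The function name is hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches dag_id="name" or dag_id='name' with optional spaces around the equals sign.
DAG_ID_PATTERN = re.compile(r"""dag_id\s*=\s*["']([^"']+)["']""")


def find_duplicate_dag_ids(dags_folder: str) -> dict:
    """Return {dag_id: [file names]} for every DAG ID that appears more than once."""
    seen = defaultdict(list)
    for path in sorted(Path(dags_folder).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        for dag_id in DAG_ID_PATTERN.findall(source):
            seen[dag_id].append(path.name)
    return {dag_id: files for dag_id, files in seen.items() if len(files) > 1}


if __name__ == "__main__":
    # Assumes a local folder named "dags" that mirrors the folder in your S3 bucket.
    for dag_id, files in find_duplicate_dag_ids("dags").items():
        print(f"Duplicate DAG ID {dag_id!r} in: {', '.join(files)}")
```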
Can I remove a plugins.zip or requirements.txt from an environment?
Currently, there is no way to remove a plugins.zip or requirements.txt from an environment once they’ve been added, but we're working on the issue. In the interim, a workaround is to point to an empty text or zip file, respectively. To learn more, see Deleting files on Amazon S3.
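For the workaround, you need an empty text file and a valid but empty zip archive. As a small sketch, the following creates both with the Python standard library. The function name and the output folder are hypothetical, and you would still upload the files to your Amazon S3 bucket and point the environment at them yourself.

```python
import zipfile
from pathlib import Path


def write_empty_placeholders(target_dir: str):
    """Create an empty requirements.txt and an empty plugins.zip in target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    requirements = target / "requirements.txt"
    requirements.write_text("")  # zero-byte text file

    plugins = target / "plugins.zip"
    with zipfile.ZipFile(plugins, "w"):
        pass  # closing the archive writes a valid zip with no entries

    return requirements, plugins


if __name__ == "__main__":
    # Hypothetical output folder name.
    print(write_empty_placeholders("empty-placeholders"))
```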
Why don't I see my plugins in the Apache Airflow v2.0.2 Admin Plugins menu?
For security reasons, the Apache Airflow web server on Amazon MWAA has limited network egress, and does not install plugins or Python dependencies directly on the Apache Airflow web server for version 2.0.2 environments. The plugin that's shown allows Amazon MWAA to authenticate your Apache Airflow users in Amazon Identity and Access Management (IAM).
To be able to install plugins and Python dependencies directly on the web server, we recommend creating a new environment with Apache Airflow v2.2 and above. Amazon MWAA installs Python dependencies and custom plugins directly on the web server for Apache Airflow v2.2 and above.
Can I use Amazon Database Migration Service (DMS) Operators?
Amazon MWAA supports DMS Operators.
When I access the Airflow REST API using the Amazon credentials, can I increase the throttling limit to more than 10 transactions per second (TPS)?
Yes, you can. To increase the throttling limit, please contact Amazon Customer Support.
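Until a higher limit is in place, a client can pace its own requests to stay at or below 10 TPS. The following is a minimal sketch of one way to do that by spacing calls evenly. The class name is hypothetical and it is not part of any Amazon SDK. It is single-threaded, so a client that sends requests from several threads or processes would need shared coordination.

```python
import time


class TpsLimiter:
    """Space calls so that no more than max_tps of them start per second."""

    def __init__(self, max_tps: float = 10, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / max_tps  # minimum gap between two calls
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next call is allowed and return the seconds waited."""
        now = self._clock()
        delay = self._next_allowed - now
        if delay > 0:
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self._interval
        return max(delay, 0.0)


if __name__ == "__main__":
    limiter = TpsLimiter(max_tps=10)
    for request_number in range(5):
        limiter.wait()  # call this immediately before each REST API request
        print(f"request {request_number} may be sent now")
```

Calling wait() before every request keeps at least 0.1 seconds between consecutive requests when max_tps is 10.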