
Troubleshooting: DAGs, Operators, Connections, and other issues in Apache Airflow v1

The topics on this page describe resolutions to Apache Airflow v1.10.12 Python dependency, custom plugin, DAG, operator, connection, task, and web server issues you may encounter on an Amazon Managed Workflows for Apache Airflow environment.

Updating requirements.txt

The following topic describes the errors you may receive when updating your requirements.txt.

Adding apache-airflow-providers-amazon causes my environment to fail

The apache-airflow-providers-xyz packages are only compatible with Apache Airflow v2. On Apache Airflow v1.10.12, use the equivalent apache-airflow-backport-providers-xyz packages instead.

Broken DAG

The following topic describes the errors you may receive when running DAGs.

I received a 'Broken DAG' error when using Amazon DynamoDB operators

We recommend the following steps:

  1. Test your DAGs, custom plugins, and Python dependencies locally using the aws-mwaa-local-runner on GitHub.

  2. Add the following package to your requirements.txt.

    boto
  3. For more ways to specify Python dependencies in a requirements.txt file, see Managing Python dependencies in requirements.txt.
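
After boto is installed, a DAG along the following lines can exercise DynamoDB from a task. This is a minimal sketch rather than an Amazon MWAA-provided example; the table name, key, and item values are hypothetical placeholders, and it assumes the Apache Airflow v1.10.12 contrib import path for AwsDynamoDBHook.

    # Minimal sketch; the table name and item values below are hypothetical placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.hooks.aws_dynamodb_hook import AwsDynamoDBHook
    from airflow.operators.python_operator import PythonOperator

    def write_item():
        # Writes a single item to a DynamoDB table using the environment's execution role
        hook = AwsDynamoDBHook(table_name="my-example-table", table_keys=["id"])
        hook.write_batch_data([{"id": "1", "value": "hello"}])

    with DAG(
        dag_id="dynamodb_example",
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,
    ) as dag:
        PythonOperator(task_id="write_item", python_callable=write_item)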

I received a 'Broken DAG: No module named psycopg2' error

We recommend the following steps:

  1. Test your DAGs, custom plugins, and Python dependencies locally using the aws-mwaa-local-runner on GitHub.

  2. Add the following to your requirements.txt with your Apache Airflow version. For example:

    apache-airflow[postgres]==1.10.12
  3. For more ways to specify Python dependencies in a requirements.txt file, see Managing Python dependencies in requirements.txt.
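
With apache-airflow[postgres] installed, a DAG such as the following can run SQL against a PostgreSQL database. This is a minimal sketch; the connection ID postgres_default and the SQL statement are hypothetical placeholders.

    # Minimal sketch; the connection ID and SQL below are hypothetical placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.postgres_operator import PostgresOperator

    with DAG(
        dag_id="postgres_example",
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,
    ) as dag:
        PostgresOperator(
            task_id="create_table",
            postgres_conn_id="postgres_default",
            sql="CREATE TABLE IF NOT EXISTS example (id INT);",
        )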

I received a 'Broken DAG' error when using the Slack operators

We recommend the following steps:

  1. Test your DAGs, custom plugins, and Python dependencies locally using the aws-mwaa-local-runner on GitHub.

  2. Add the following package to your requirements.txt and specify your Apache Airflow version. For example:

    apache-airflow[slack]==1.10.12
  3. For more ways to specify Python dependencies in a requirements.txt file, see Managing Python dependencies in requirements.txt.
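
With apache-airflow[slack] installed, a DAG such as the following can post a message. This is a minimal sketch; the token, channel, and message values are hypothetical placeholders (in practice, store the token in an Apache Airflow connection or a secrets backend rather than in the DAG).

    # Minimal sketch; the token, channel, and message below are hypothetical placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.slack_operator import SlackAPIPostOperator

    with DAG(
        dag_id="slack_example",
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,
    ) as dag:
        SlackAPIPostOperator(
            task_id="post_message",
            token="YOUR_SLACK_API_TOKEN",
            channel="#airflow-alerts",
            text="Hello from Amazon MWAA",
        )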

I received various errors installing Google/GCP/BigQuery

Amazon MWAA uses Amazon Linux, which requires specific versions of the Cython and cryptography libraries. We recommend the following steps:

  1. Test your DAGs, custom plugins, and Python dependencies locally using the aws-mwaa-local-runner on GitHub.

  2. Add the following package to your requirements.txt.

    grpcio==1.27.2
    cython==0.29.21
    pandas-gbq==0.13.3
    cryptography==3.3.2
    apache-airflow-backport-providers-amazon[google]
  3. If you’re not using backport providers, you can use:

    grpcio==1.27.2
    cython==0.29.21
    pandas-gbq==0.13.3
    cryptography==3.3.2
    apache-airflow[gcp]==1.10.12
  4. For more ways to specify Python dependencies in a requirements.txt file, see Managing Python dependencies in requirements.txt.

I received a 'Broken DAG: No module named Cython' error

Amazon MWAA uses Amazon Linux, which requires a specific version of Cython. We recommend the following steps:

  1. Test your DAGs, custom plugins, and Python dependencies locally using the aws-mwaa-local-runner on GitHub.

  2. Add the following package to your requirements.txt.

    cython==0.29.21
  3. Cython libraries have various required pip dependency versions. For example, using awswrangler==2.4.0 requires pyarrow<3.1.0,>=2.0.0, so pip3 tries to install pyarrow==3.0.0, which causes a Broken DAG error. We recommend specifying the oldest acceptable version explicitly. For example, if you specify the minimum value pyarrow==2.0.0 before awswrangler==2.4.0, the error goes away and the requirements.txt installs correctly. The final requirements should look like this:

    cython==0.29.21
    pyarrow==2.0.0
    awswrangler==2.4.0
  4. For more ways to specify Python dependencies in a requirements.txt file, see Managing Python dependencies in requirements.txt.
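
Once the pinned versions above install correctly, a DAG along these lines can use awswrangler from a task. This is a minimal sketch; the S3 bucket and key are hypothetical placeholders, and it assumes the environment's execution role can read that object.

    # Minimal sketch; the S3 path below is a hypothetical placeholder.
    from datetime import datetime

    import awswrangler as wr
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    def count_rows():
        # Reads a small CSV from S3 with awswrangler (pyarrow 2.x under the hood)
        df = wr.s3.read_csv("s3://my-example-bucket/data/sample.csv")
        print(len(df))

    with DAG(
        dag_id="awswrangler_example",
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,
    ) as dag:
        PythonOperator(task_id="count_rows", python_callable=count_rows)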

Operators

The following topic describes the errors you may receive when using Operators.

I received an error using the BigQuery operator

Amazon MWAA does not support operators with UI extensions. We recommend the following steps:

  1. Test your DAGs, custom plugins, and Python dependencies locally using the aws-mwaa-local-runner on GitHub.

  2. A workaround is to override the extension by adding a line in the DAG to set <operator name>.operator_extra_links = None after importing the problem operators. For example:

    from airflow.contrib.operators.bigquery_operator import BigQueryOperator
    BigQueryOperator.operator_extra_links = None
  3. You can use this approach for all DAGs by adding the above to a plugin. For an example, see Creating a custom plugin for Apache Airflow PythonVirtualenvOperator.
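
For reference, a plugin that applies the patch for every DAG could look like the following. This is a minimal sketch (the file and plugin names are hypothetical), and it assumes the Apache Airflow v1.10.12 contrib import path shown above.

    # Hypothetical file name: plugins/patch_extra_links.py
    from airflow.contrib.operators.bigquery_operator import BigQueryOperator
    from airflow.plugins_manager import AirflowPlugin

    # Patch once at plugin-load time so every DAG that imports the operator picks it up
    BigQueryOperator.operator_extra_links = None

    class PatchExtraLinksPlugin(AirflowPlugin):
        name = "patch_extra_links_plugin"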

Connections

The following topic describes the errors you may receive when using an Apache Airflow connection, or using another Amazon database.

I can't connect to Snowflake

We recommend the following steps:

  1. Test your DAGs, custom plugins, and Python dependencies locally using the aws-mwaa-local-runner on GitHub.

  2. Add the following entries to the requirements.txt for your environment.

    asn1crypto == 0.24.0
    snowflake-connector-python == 1.7.2
  3. Add the following imports to your DAG:

    from airflow.contrib.hooks.snowflake_hook import SnowflakeHook
    from airflow.contrib.operators.snowflake_operator import SnowflakeOperator

Ensure the Apache Airflow connection object includes the following key-value pairs:

  1. Conn Id: snowflake_conn

  2. Conn Type: Snowflake

  3. Host: <my account>.<my region if not us-west-2>.snowflakecomputing.com

  4. Schema: <my schema>

  5. Login: <my user name>

  6. Password: ********

  7. Port: <port, if any>

  8. Extra:

    { "account": "<my account>", "warehouse": "<my warehouse>", "database": "<my database>", "region": "<my region if not using us-west-2 otherwise omit this line>" }

For example:

>>> import json
>>> from airflow.models.connection import Connection
>>> myconn = Connection(
...    conn_id='snowflake_conn',
...    conn_type='Snowflake',
...    host='YOUR_ACCOUNT.YOUR_REGION.snowflakecomputing.com',
...    schema='YOUR_SCHEMA',
...    login='YOUR_USERNAME',
...    password='YOUR_PASSWORD',
...    port='YOUR_PORT',
...    extra=json.dumps(dict(account='YOUR_ACCOUNT', warehouse='YOUR_WAREHOUSE', database='YOUR_DB_OPTION', region='YOUR_REGION')),
... )
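
Once the snowflake_conn connection exists, a DAG such as the following can run a query. This is a minimal sketch; the SQL statement is a hypothetical placeholder.

    # Minimal sketch; the SQL statement below is a hypothetical placeholder.
    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators.snowflake_operator import SnowflakeOperator

    with DAG(
        dag_id="snowflake_example",
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,
    ) as dag:
        SnowflakeOperator(
            task_id="run_query",
            snowflake_conn_id="snowflake_conn",
            sql="SELECT CURRENT_VERSION();",
        )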

I can't connect to Secrets Manager

We recommend the following steps:

  1. Learn how to create secret keys for your Apache Airflow connection and variables in Configuring an Apache Airflow connection using an Amazon Secrets Manager secret.

  2. Learn how to use the secret key for an Apache Airflow variable (test-variable) in Using a secret key in Amazon Secrets Manager for an Apache Airflow variable.

  3. Learn how to use the secret key for an Apache Airflow connection (myconn) in Using a secret key in Amazon Secrets Manager for an Apache Airflow connection.
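
After the Secrets Manager backend is configured, retrieving the values from a DAG looks the same as reading any other Apache Airflow variable or connection. The following is a minimal sketch; it assumes the backend is already enabled for your environment and that secrets for test-variable and myconn exist under your configured prefixes.

    # Minimal sketch; assumes the Secrets Manager backend is already configured.
    from airflow.hooks.base_hook import BaseHook
    from airflow.models import Variable

    value = Variable.get("test-variable")     # resolved from Secrets Manager
    conn = BaseHook.get_connection("myconn")  # resolved from Secrets Manager
    print(value, conn.host)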

I can't connect to my MySQL server on '<DB-identifier-name>.cluster-id.<region>.rds.amazonaws.com'

Amazon MWAA's security group and the RDS security group need an ingress rule to allow traffic to and from one another. We recommend the following steps:

  1. Modify the RDS security group to allow all traffic from Amazon MWAA's VPC security group.

  2. Modify Amazon MWAA's VPC security group to allow all traffic from the RDS security group.

  3. Rerun your tasks and verify whether the SQL query succeeded by checking the Apache Airflow logs in CloudWatch Logs.
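
For steps 1 and 2 above, an ingress rule referencing the other security group can be added with boto3 (or in the Amazon VPC console). The following is a minimal sketch; both security group IDs are hypothetical placeholders, and it assumes credentials with permission to modify security groups. Swap the two group IDs to add the rule in the opposite direction.

    # Minimal sketch; both security group IDs below are hypothetical placeholders.
    import boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId="sg-0123456789rdsexample",  # RDS security group (step 1)
        IpPermissions=[{
            "IpProtocol": "-1",  # all traffic
            "UserIdGroupPairs": [{"GroupId": "sg-0123456789mwaaexample"}],  # Amazon MWAA VPC security group
        }],
    )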

Web server

The following topic describes the errors you may receive for your Apache Airflow Web server on Amazon MWAA.

I'm using the BigQueryOperator and it's causing my web server to crash

We recommend the following steps:

  1. Apache Airflow operators such as the BigQueryOperator and QuboleOperator that contain operator_extra_links could cause your Apache Airflow web server to crash. These operators attempt to load code to your web server, which is not permitted for security reasons. We recommend patching the operators in your DAG by adding the following code after your import statements:

    BigQueryOperator.operator_extra_links = None
  2. Test your DAGs, custom plugins, and Python dependencies locally using the aws-mwaa-local-runner on GitHub.

I see a 5xx error accessing the web server

We recommend the following steps:

  1. Check Apache Airflow configuration options. Verify that the key-value pairs you specified as an Apache Airflow configuration option, such as Amazon Secrets Manager, were configured correctly. To learn more, see I can't connect to Secrets Manager.

  2. Check the requirements.txt. Verify that the Apache Airflow "extras" packages and other libraries listed in your requirements.txt are compatible with your Apache Airflow version.

  3. For more ways to specify Python dependencies in a requirements.txt file, see Managing Python dependencies in requirements.txt.

I see a 'The scheduler does not appear to be running' error

If the scheduler doesn't appear to be running, or the last "heartbeat" was received several hours ago, your DAGs may not appear in Apache Airflow, and new tasks will not be scheduled.

We recommend the following steps:

  1. Confirm that your VPC security group allows inbound access to port 5432. This port is needed to connect to the Amazon Aurora PostgreSQL metadata database for your environment. After this rule is added, give Amazon MWAA a few minutes, and the error should disappear. To learn more, see Security in your VPC on Amazon MWAA.

    Note
    • The Aurora PostgreSQL metadatabase is part of the Amazon MWAA service architecture and is not visible in your Amazon Web Services account.

    • Database-related errors are usually a symptom of scheduler failure and not the root cause.

  2. If the scheduler is not running, it might be due to a number of factors such as dependency installation failures, or an overloaded scheduler. Confirm that your DAGs, plugins, and requirements are working correctly by viewing the corresponding log groups in CloudWatch Logs. To learn more, see Monitoring and metrics for Amazon Managed Workflows for Apache Airflow.
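
For the security group rule in step 1, a self-referencing inbound rule on port 5432 can be added with boto3 (or in the Amazon VPC console). The following is a minimal sketch; the security group ID is a hypothetical placeholder for your Amazon MWAA VPC security group.

    # Minimal sketch; the security group ID below is a hypothetical placeholder.
    import boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId="sg-0123456789mwaaexample",
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 5432,
            "ToPort": 5432,
            # Self-referencing rule: the source is the same security group
            "UserIdGroupPairs": [{"GroupId": "sg-0123456789mwaaexample"}],
        }],
    )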

Tasks

The following topic describes the errors you may receive for Apache Airflow tasks in an environment.

I see my tasks stuck or not completing

If your Apache Airflow tasks are "stuck" or not completing, we recommend the following steps:

  1. There may be a large number of DAGs defined. Reduce the number of DAGs and perform an update of the environment (such as changing a log level) to force a reset.

    1. Apache Airflow parses DAGs whether they are enabled or not. If you're using more than 50% of your environment's capacity, you may start overwhelming the Apache Airflow scheduler. This leads to a large Total Parse Time in CloudWatch metrics or long DAG processing times in CloudWatch Logs. There are other ways to optimize Apache Airflow configurations that are outside the scope of this guide.

    2. To learn more about the best practices we recommend to tune the performance of your environment, see Performance tuning for Apache Airflow on Amazon MWAA.

  2. There may be a large number of tasks in the queue. This often appears as a large—and growing—number of tasks in the "None" state, or as a large number in Queued Tasks and/or Tasks Pending in CloudWatch. This can occur for the following reasons:

    1. There may be more tasks to run than the environment has the capacity to run, and/or a large number of tasks that were queued before autoscaling had time to detect them and deploy additional workers.

    2. If there are more tasks to run than an environment has the capacity to run, we recommend reducing the number of tasks that your DAGs run concurrently, and/or increasing the minimum number of Apache Airflow workers.

    3. If there are a large number of tasks that were queued before autoscaling has had time to detect and deploy additional workers, we recommend staggering task deployment and/or increasing the minimum number of Apache Airflow workers.

    4. You can use the update-environment command in the Amazon Command Line Interface (Amazon CLI) to change the minimum or maximum number of Workers that run on your environment.

      aws mwaa update-environment --name MyEnvironmentName --min-workers 2 --max-workers 10
    5. To learn more about the best practices we recommend to tune the performance of your environment, see Performance tuning for Apache Airflow on Amazon MWAA.

  3. There may be tasks being deleted mid-execution that appear as task logs which stop with no further indication in Apache Airflow. This can occur for the following reasons:

    1. This can happen when there is a brief period where 1) the current tasks exceed the current environment capacity, followed by 2) a few minutes with no tasks executing or being queued, and then 3) new tasks being queued.

    2. Amazon MWAA autoscaling reacts to the first scenario by adding additional workers. In the second scenario, it removes those workers. Some of the newly queued tasks may be assigned to workers that are in the process of being removed, and those tasks end when the worker's container is deleted.

    3. We recommend increasing the minimum number of workers on your environment. Another option is to adjust the timing of your DAGs and tasks to ensure that these scenarios don't occur.

    4. You can also set the minimum workers equal to the maximum workers on your environment, effectively disabling autoscaling. Use the update-environment command in the Amazon Command Line Interface (Amazon CLI) to disable autoscaling by setting the minimum and maximum number of workers to be the same.

      aws mwaa update-environment --name MyEnvironmentName --min-workers 5 --max-workers 5
    5. To learn more about the best practices we recommend to tune the performance of your environment, see Performance tuning for Apache Airflow on Amazon MWAA.

  4. If your tasks are stuck in the "running" state, you can also clear the tasks or mark them as succeeded or failed. This allows the autoscaling component for your environment to scale down the number of workers running on your environment. The following image shows an example of a stranded task.

    This is an image with a stranded task.
    1. Choose the circle for the stranded task, and then select Clear (as shown). This allows Amazon MWAA to scale down workers; otherwise, Amazon MWAA can't determine which DAGs are enabled or disabled, and can't scale down, if there are still queued tasks.

      Apache Airflow Actions
  5. Learn more about the Apache Airflow task lifecycle at Concepts in the Apache Airflow reference guide.

CLI

The following topic describes the errors you may receive when running Airflow CLI commands in the Amazon Command Line Interface.

I see a '503' error when triggering a DAG in the CLI

The Airflow CLI runs on the Apache Airflow Web server, which has limited concurrency. Typically a maximum of 4 CLI commands can run simultaneously.
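
For context, an Airflow CLI command reaches the web server through a short-lived CLI token. The following is a minimal sketch of that flow, not an official example; it assumes boto3 and requests are available, an environment named MyEnvironmentName, and a hypothetical DAG ID of my_dag. Because of the concurrency limit described above, avoid issuing many of these calls in parallel.

    # Minimal sketch; the environment name and DAG ID below are hypothetical placeholders.
    import boto3
    import requests

    mwaa = boto3.client("mwaa")
    token = mwaa.create_cli_token(Name="MyEnvironmentName")

    # Apache Airflow v1.10.12 CLI syntax: "trigger_dag <dag_id>"
    response = requests.post(
        f"https://{token['WebServerHostname']}/aws_mwaa/cli",
        headers={
            "Authorization": f"Bearer {token['CliToken']}",
            "Content-Type": "text/plain",
        },
        data="trigger_dag my_dag",
    )
    print(response.status_code)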