Adding or updating DAGs
Directed Acyclic Graphs (DAGs) are defined in a Python file that expresses the DAG's structure as code. You can use the Amazon CLI or the Amazon S3 console to upload DAGs to your environment. This topic describes the steps to add or update Apache Airflow DAGs on your Amazon Managed Workflows for Apache Airflow environment using the dags folder in your Amazon S3 bucket.
Prerequisites
You'll need the following before you can complete the steps on this page.
- Permissions — Your Amazon account must have been granted access by your administrator to the AmazonMWAAFullConsoleAccess access control policy for your environment. In addition, your Amazon MWAA environment must be permitted by your execution role to access the Amazon resources used by your environment.
- Access — If you require access to public repositories to install dependencies directly on the web server, your environment must be configured with public network web server access. For more information, see Apache Airflow access modes.
- Amazon S3 configuration — The Amazon S3 bucket used to store your DAGs, custom plugins in plugins.zip, and Python dependencies in requirements.txt must be configured with Public Access Blocked and Versioning Enabled.
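If you want to verify or apply these bucket settings programmatically rather than in the console, the following is a minimal sketch using the boto3 Amazon S3 client; the bucket name your-mwaa-bucket is a placeholder for your environment's bucket.

import boto3

s3 = boto3.client("s3")
bucket = "your-mwaa-bucket"  # placeholder: replace with your environment's bucket name

# Enable versioning, as required by Amazon MWAA.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Block all public access to the bucket.
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)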
How it works
A Directed Acyclic Graph (DAG) is defined within a single Python file that defines the DAG's structure as code. It consists of the following:
- A DAG definition.
- Operators that describe how to run the DAG and the tasks to run.
- Operator relationships that describe the order in which to run the tasks.
To run an Apache Airflow platform on an Amazon MWAA environment, you need to copy your DAG definition to the dags folder in your storage bucket. For example, the DAG folder in your storage bucket may look like this:
Example DAG folder
dags/
└ dag_def.py
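As a sketch of what a DAG definition such as dag_def.py might contain, the following minimal example uses two BashOperator tasks; the DAG ID, schedule, and commands are illustrative placeholders rather than values required by Amazon MWAA.

Example dag_def.py

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# DAG definition
with DAG(
    dag_id="dag_def",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    # Operators that describe the tasks to run
    first_task = BashOperator(task_id="first_task", bash_command="echo 'hello'")
    second_task = BashOperator(task_id="second_task", bash_command="echo 'world'")

    # Operator relationships: run first_task before second_task
    first_task >> second_task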
Amazon MWAA automatically syncs new and changed objects from your Amazon S3 bucket to the Amazon MWAA scheduler and worker containers’ /usr/local/airflow/dags folder every 30 seconds, preserving the Amazon S3 source’s file hierarchy, regardless of file type. The time that new DAGs take to appear in your Apache Airflow UI is controlled by scheduler.dag_dir_list_interval. Changes to existing DAGs will be picked up on the next DAG processing loop.
Note
You do not need to include the airflow.cfg configuration file in your DAG folder. You can override the default Apache Airflow configurations from the Amazon MWAA console. For more information, see Using Apache Airflow configuration options on Amazon MWAA.
What's changed in v2
- New: Operators, Hooks, and Executors. The import statements in your DAGs, and the custom plugins you specify in a plugins.zip on Amazon MWAA, have changed between Apache Airflow v1 and Apache Airflow v2. For example, from airflow.contrib.hooks.aws_hook import AwsHook in Apache Airflow v1 has changed to from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook in Apache Airflow v2. To learn more, see Python API Reference in the Apache Airflow reference guide.
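For example, a DAG that used AwsHook in Apache Airflow v1 could be updated for Apache Airflow v2 as sketched below; only the import and class name change, and the connection ID and client type shown are illustrative.

# Apache Airflow v1 (no longer valid on Apache Airflow v2 environments):
# from airflow.contrib.hooks.aws_hook import AwsHook
# hook = AwsHook(aws_conn_id="aws_default")

# Apache Airflow v2:
from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook

hook = AwsBaseHook(aws_conn_id="aws_default", client_type="s3")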
Testing DAGs using the Amazon MWAA CLI utility
- The command line interface (CLI) utility replicates an Amazon Managed Workflows for Apache Airflow environment locally.
- The CLI builds a Docker container image locally that’s similar to an Amazon MWAA production image. This allows you to run a local Apache Airflow environment to develop and test DAGs, custom plugins, and dependencies before deploying to Amazon MWAA.
- To run the CLI, see the aws-mwaa-local-runner on GitHub.
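One common way to test DAGs locally, whether in the local runner or any Apache Airflow installation, is a DAG import check. The following is a minimal sketch; the dags/ path is a placeholder for your local DAGs folder, and this check is a general Apache Airflow pattern rather than an Amazon MWAA feature.

from airflow.models import DagBag

# Load all DAG files from a local folder, skipping Airflow's bundled examples.
dag_bag = DagBag(dag_folder="dags/", include_examples=False)

# Fail loudly if any DAG file could not be imported.
assert not dag_bag.import_errors, f"DAG import errors: {dag_bag.import_errors}"
print(f"Loaded {len(dag_bag.dags)} DAG(s) successfully.")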
Uploading DAG code to Amazon S3
You can use the Amazon S3 console or the Amazon Command Line Interface (Amazon CLI) to upload DAG code to your Amazon S3 bucket. The following steps assume you are uploading code (.py) to a folder named dags in your Amazon S3 bucket.
Using the Amazon CLI
The Amazon Command Line Interface (Amazon CLI) is an open source tool that enables you to interact with Amazon services using commands in your command-line shell. To complete the steps on this page, you need the Amazon CLI installed and configured.
To upload using the Amazon CLI
- Use the following command to list all of your Amazon S3 buckets.
aws s3 ls
- Use the following command to list the files and folders in the Amazon S3 bucket for your environment.
aws s3 ls s3://YOUR_S3_BUCKET_NAME
- The following command uploads a dag_def.py file to a dags folder.
aws s3 cp dag_def.py s3://YOUR_S3_BUCKET_NAME/dags/
If a folder named dags does not already exist in your Amazon S3 bucket, this command creates the dags folder and uploads the file named dag_def.py to the new folder.
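If you prefer to script the upload in Python instead of using the Amazon CLI, a minimal boto3 sketch might look like the following; YOUR_S3_BUCKET_NAME is a placeholder, and the result is equivalent to the aws s3 cp command above.

import boto3

s3 = boto3.client("s3")

# Upload dag_def.py into the dags folder of your environment's bucket.
s3.upload_file(
    Filename="dag_def.py",
    Bucket="YOUR_S3_BUCKET_NAME",  # placeholder: your environment's bucket
    Key="dags/dag_def.py",
)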
Using the Amazon S3 console
The Amazon S3 console is a web-based user interface that allows you to create and manage the resources in your Amazon S3 bucket. The following steps assume you have a DAGs folder named dags.
To upload using the Amazon S3 console
- Open the Environments page on the Amazon MWAA console.
- Choose an environment.
- Select the S3 bucket link in the DAG code in S3 pane to open your storage bucket on the Amazon S3 console.
- Choose the dags folder.
- Choose Upload.
- Choose Add file.
- Select the local copy of your dag_def.py, then choose Upload.
Specifying the path to your DAGs folder on the Amazon MWAA console (the first time)
The following steps assume you are specifying the path to a folder named dags in your Amazon S3 bucket.
- Open the Environments page on the Amazon MWAA console.
- Choose the environment where you want to run DAGs.
- Choose Edit.
- On the DAG code in Amazon S3 pane, choose Browse S3 next to the DAG folder field.
- Select your dags folder.
- Choose Choose.
- Choose Next, Update environment.
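You can also update the DAG folder path programmatically. The following is a minimal sketch using the boto3 Amazon MWAA client; MyAirflowEnvironment is a placeholder environment name.

import boto3

mwaa = boto3.client("mwaa")

# Point the environment at the dags folder in its Amazon S3 bucket.
mwaa.update_environment(
    Name="MyAirflowEnvironment",  # placeholder: your environment name
    DagS3Path="dags",
)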
Viewing changes on your Apache Airflow UI
Logging into Apache Airflow
To view your Apache Airflow UI, your Amazon account needs the Apache Airflow UI access policy (AmazonMWAAWebServerAccess) in Amazon Identity and Access Management (IAM).
To access your Apache Airflow UI
- Open the Environments page on the Amazon MWAA console.
- Choose an environment.
- Choose Open Airflow UI.
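If you need to reach the Apache Airflow UI outside the console, one option is to request a web login token programmatically. The following is a minimal sketch using the boto3 Amazon MWAA client; MyAirflowEnvironment is a placeholder, and you assemble the login URL from the returned values as described in the Amazon MWAA documentation.

import boto3

mwaa = boto3.client("mwaa")

# Request a short-lived web login token for the environment's Apache Airflow UI.
response = mwaa.create_web_login_token(Name="MyAirflowEnvironment")  # placeholder name

print(response["WebServerHostname"])
print(response["WebToken"])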
What's next?
- Test your DAGs, custom plugins, and Python dependencies locally using the aws-mwaa-local-runner on GitHub.