Troubleshooting: CloudWatch Logs and CloudTrail errors
The topics on this page contain resolutions to Amazon CloudWatch Logs and Amazon CloudTrail errors you may encounter on an Amazon Managed Workflows for Apache Airflow environment.
Contents
- Logs
- I can't see my task logs, or I received a 'Reading remote log from Cloudwatch log_group' error
- Tasks are failing without any logs
- I see a 'ResourceAlreadyExistsException' error in CloudTrail
- I see an 'Invalid request' error in CloudTrail
- I see a 'Cannot locate a 64-bit Oracle Client library: "libclntsh.so: cannot open shared object file: No such file or directory"' in Apache Airflow logs
- I see psycopg2 'server closed the connection unexpectedly' in my Scheduler logs
- I see 'Executor reports task instance %s finished (%s) although the task says its %s' in my DAG processing logs
- I see 'Could not read remote logs from log_group: airflow-environmentName-Task log_stream: DAG_ID/TASK_ID/time/n.log.' in my task logs
Logs
The following topics describe the errors you may receive when viewing Apache Airflow logs.
I can't see my task logs, or I received a 'Reading remote log from Cloudwatch log_group' error
Amazon MWAA has configured Apache Airflow to read and write logs directly from and to Amazon CloudWatch Logs. If a worker fails to start a task, or fails to write any logs, you will see the error:
*** Reading remote log from Cloudwatch log_group: airflow-environmentName-Task log_stream: DAG_ID/TASK_ID/timestamp/n.log.
Could not read remote logs from log_group: airflow-environmentName-Task log_stream: DAG_ID/TASK_ID/time/n.log.
We recommend the following steps:

- Verify that you have enabled task logs at the INFO level for your environment. For more information, see Viewing Airflow logs in Amazon CloudWatch.
- Verify that the environment execution role has the correct permission policies. A quick way to confirm that task log streams are being written is sketched after this list.
- Verify that your operator or task is working correctly, has sufficient resources to parse the DAG, and has the appropriate Python libraries to load. To verify whether you have the correct dependencies, try eliminating imports until you find the one that is causing the issue. We recommend testing your Python dependencies using the Amazon MWAA local-runner tool.
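One lightweight way to confirm that the environment's task log group exists and that task log streams are being written is a boto3 check like the following sketch. The environment name and Region are placeholders; the log group name follows the airflow-environmentName-Task pattern shown in the error above.

import boto3

# Placeholder environment name and Region -- replace with your own values.
environment_name = "MyAirflowEnvironment"
logs = boto3.client("logs", region_name="us-east-1")

# Task logs are written to the airflow-<environmentName>-Task log group.
log_group = "airflow-{}-Task".format(environment_name)

# Confirm that the log group exists.
groups = logs.describe_log_groups(logGroupNamePrefix=log_group)
print([group["logGroupName"] for group in groups["logGroups"]])

# List the most recent task log streams (DAG_ID/TASK_ID/time/n.log).
streams = logs.describe_log_streams(
    logGroupName=log_group,
    orderBy="LastEventTime",
    descending=True,
    limit=5,
)
print([stream["logStreamName"] for stream in streams["logStreams"]])

If the log group is missing or no streams appear for recent task runs, the execution role permissions or the logging level for the environment are the most likely causes.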
Tasks are failing without any logs
If tasks are failing in a workflow and you can't locate any logs for the failed tasks, check if you are setting the queue parameter in your default arguments, as shown in the following example.
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago

# Setting queue argument to default.
default_args = {
    "start_date": days_ago(1),
    "queue": "default"
}

with DAG(dag_id="any_command_dag", schedule_interval=None, catchup=False, default_args=default_args) as dag:
    cli_command = BashOperator(
        task_id="bash_command",
        bash_command="{{ dag_run.conf['command'] }}"
    )
To resolve the issue, remove queue from your code, and invoke the DAG again.
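For reference, this is what the same DAG looks like after the fix, with the queue key removed from default_args.

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago

# No "queue" key -- tasks are sent to the queue that Amazon MWAA manages for you.
default_args = {
    "start_date": days_ago(1)
}

with DAG(dag_id="any_command_dag", schedule_interval=None, catchup=False, default_args=default_args) as dag:
    cli_command = BashOperator(
        task_id="bash_command",
        bash_command="{{ dag_run.conf['command'] }}"
    )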
I see a 'ResourceAlreadyExistsException' error in CloudTrail
"errorCode": "ResourceAlreadyExistsException", "errorMessage": "The specified log stream already exists", "requestParameters": { "logGroupName": "airflow-MyAirflowEnvironment-DAGProcessing", "logStreamName": "scheduler_cross-account-eks.py.log" }
Certain Python requirements such as apache-airflow-backport-providers-amazon roll back the watchtower library that Amazon MWAA uses to communicate with CloudWatch to an older version. We recommend the following steps:

- Add the following library to your requirements.txt file.

  watchtower==1.0.6
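For example, if your requirements.txt also pulls in the backport provider package, keeping the watchtower pin alongside it ensures the newer version stays installed. This is a minimal sketch; your actual file will contain your other dependencies.

# requirements.txt (minimal sketch)
apache-airflow-backport-providers-amazon
watchtower==1.0.6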
I see an 'Invalid request' error in CloudTrail
Invalid request provided: Provided role does not have sufficient permissions for s3 location airflow-xxx-xxx/dags
If you're creating an Amazon MWAA environment and an Amazon S3 bucket using the same AWS CloudFormation template, you need to add a DependsOn section within your AWS CloudFormation template. The two resources (the MWAA environment and the MWAA execution policy) have a dependency in AWS CloudFormation. We recommend the following steps:
- Add the following DependsOn statement to your AWS CloudFormation template.

  ...
  MaxWorkers: 5
  NetworkConfiguration:
    SecurityGroupIds:
      - !GetAtt SecurityGroup.GroupId
    SubnetIds: !Ref subnetIds
  WebserverAccessMode: PUBLIC_ONLY
  DependsOn: MwaaExecutionPolicy

  MwaaExecutionPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Roles:
        - !Ref MwaaExecutionRole
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Action: airflow:PublishMetrics
            Resource:
  ...

  For an example, see Quick start tutorial for Amazon Managed Workflows for Apache Airflow.
I see a 'Cannot locate a 64-bit Oracle Client library: "libclntsh.so: cannot open shared object file: No such file or directory"' in Apache Airflow logs
We recommend the following steps:

- If you're using Apache Airflow v2, add core.lazy_load_plugins : False as an Apache Airflow configuration option. To learn more, see Using configuration options to load plugins in Apache Airflow v2. A sketch showing one way to apply this option follows this list.
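If you manage the environment programmatically, one way to apply the option is through the Amazon MWAA UpdateEnvironment API, as in the following boto3 sketch. The environment name and Region are placeholders, and you can set the same key/value pair in the Amazon MWAA console instead.

import boto3

# Placeholder environment name and Region -- replace with your own values.
mwaa = boto3.client("mwaa", region_name="us-east-1")

# Apply core.lazy_load_plugins = False as an Apache Airflow configuration option.
mwaa.update_environment(
    Name="MyAirflowEnvironment",
    AirflowConfigurationOptions={
        "core.lazy_load_plugins": "False"
    },
)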
I see psycopg2 'server closed the connection unexpectedly' in my Scheduler logs
If you see an error similar to the following, your Apache Airflow Scheduler may have run out of resources.
2021-06-14T10:20:24.581-05:00 sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
2021-06-14T10:20:24.633-05:00 This probably means the server terminated abnormally
2021-06-14T10:20:24.686-05:00 before or while processing the request.
We recommend the following steps:

- Consider upgrading to Apache Airflow v2.0.2, which allows you to specify up to 5 Schedulers. A sketch of how to raise the scheduler count follows this list.
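Once your environment is running Apache Airflow v2.0.2, the scheduler count can be raised through the Amazon MWAA UpdateEnvironment API (or in the console). The following boto3 sketch assumes an environment named MyAirflowEnvironment, which is a placeholder.

import boto3

# Placeholder environment name and Region -- replace with your own values.
mwaa = boto3.client("mwaa", region_name="us-east-1")

# Run up to 5 schedulers (supported on Apache Airflow v2.0.2 environments).
mwaa.update_environment(
    Name="MyAirflowEnvironment",
    Schedulers=5,
)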
I see 'Executor reports task instance %s finished (%s) although the task says its %s' in my DAG processing logs
If you see an error similar to the following, your long-running tasks may have reached the task time limit on Amazon MWAA. Amazon MWAA has a limit of 12 hours for any one Airflow task, to prevent tasks from getting stuck in the queue and blocking activities like autoscaling.
Executor reports task instance %s finished (%s) although the task says its %s. (Info: %s) Was the task killed externally
We recommend the following steps:

- Consider breaking up the task into multiple, shorter-running tasks. Airflow typically has a model whereby operators are asynchronous: an operator invokes activity on an external system, and an Apache Airflow Sensor polls to see when it's complete. If a Sensor fails, it can be safely retried without impacting the Operator's functionality. A sketch of this pattern follows this list.
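The following is a minimal sketch of that pattern for Apache Airflow v2. It submits a hypothetical external job with a short-running BashOperator and then uses a PythonSensor to poll for completion; check_job_complete is a placeholder you would replace with a real status check against your external system.

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.sensors.python import PythonSensor
from airflow.utils.dates import days_ago


def check_job_complete():
    # Hypothetical status check -- replace with a call to your external system.
    # Return True when the job has finished so the sensor stops poking.
    return False


with DAG(dag_id="split_long_task_dag", schedule_interval=None, catchup=False,
         start_date=days_ago(1)) as dag:
    # Short-running task that only kicks off the external work.
    submit_job = BashOperator(
        task_id="submit_job",
        bash_command="echo 'submit the long-running job here'",
    )

    # A separate sensor polls for completion; it can fail and retry safely
    # without re-submitting the job, and keeps each task well under the
    # 12-hour Amazon MWAA task limit.
    wait_for_job = PythonSensor(
        task_id="wait_for_job",
        python_callable=check_job_complete,
        poke_interval=60,
        timeout=60 * 60 * 2,
        retries=3,
        mode="reschedule",
    )

    submit_job >> wait_for_job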
I see 'Could not read remote logs from log_group: airflow-environmentName-Task log_stream: DAG_ID/TASK_ID/time/n.log.' in my task logs
If you see an error similar to the following, the execution role for your environment may not contain a permissions policy to create log streams for task logs.
Could not read remote logs from log_group: airflow-environmentName-Task log_stream: DAG_ID/TASK_ID/time/n.log.
We recommend the following steps:

- Modify the execution role for your environment using one of the sample policies at Amazon MWAA execution role. A sketch of the CloudWatch Logs permissions involved follows this list.
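The following boto3 sketch shows one way to attach the CloudWatch Logs permissions (including log stream creation) as an inline policy on the execution role. The role name, account ID, Region, and environment name are placeholders; see the sample policies at Amazon MWAA execution role for the complete set of permissions an environment needs.

import json
import boto3

iam = boto3.client("iam")

# Placeholder values -- replace with your execution role name, account ID,
# Region, and environment name.
role_name = "MyMwaaExecutionRole"
log_group_arn = "arn:aws:logs:us-east-1:111122223333:log-group:airflow-MyAirflowEnvironment-*"

# Grant the CloudWatch Logs actions that task logging needs,
# including permission to create log streams.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:CreateLogGroup",
                "logs:PutLogEvents",
                "logs:GetLogEvents",
                "logs:GetLogRecord",
                "logs:GetLogGroupFields",
                "logs:GetQueryResults",
            ],
            "Resource": [log_group_arn],
        }
    ],
}

iam.put_role_policy(
    RoleName=role_name,
    PolicyName="mwaa-task-logging",
    PolicyDocument=json.dumps(policy),
)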
You may have also specified a provider package in your requirements.txt file that is incompatible with your Apache Airflow version. For example, if you're using Apache Airflow v2.0.2, you may have specified a package, such as the apache-airflow-providers-databricks package, which is not compatible with that version. We recommend the following steps:

- If you're using Apache Airflow v2.0.2, modify the requirements.txt file and add apache-airflow[databricks], as shown in the sketch after this list. This installs the correct version of the Databricks package that is compatible with Apache Airflow v2.0.2.
- Test your DAGs, custom plugins, and Python dependencies locally using the aws-mwaa-local-runner on GitHub.
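As a minimal requirements.txt sketch of that change (the rest of your file's contents will differ):

# requirements.txt (minimal sketch)
# Installs the Databricks provider at a version compatible with Apache Airflow v2.0.2,
# instead of a standalone apache-airflow-providers-databricks entry.
apache-airflow[databricks]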