Troubleshooting blueprint errors in Amazon Glue
If you encounter errors when using Amazon Glue blueprints, use the following solutions to help you find the source of the problems and fix them.
Topics
- Error: missing PySpark module
- Error: missing blueprint config file
- Error: missing imported file
- Error: not authorized to perform iamPassRole on resource
- Error: invalid cron schedule
- Error: a trigger with the same name already exists
- Error: workflow with name: foo already exists.
- Error: module not found in specified layoutGenerator path
- Error: validation error in Connections field
Error: missing PySpark module
Amazon Glue returns the error "Unknown error executing layout generator function ModuleNotFoundError: No module named 'pyspark'".
When you unzip the blueprint archive, its contents can look like either of the following:
$ unzip compaction.zip
Archive:  compaction.zip
   creating: compaction/
  inflating: compaction/blueprint.cfg
  inflating: compaction/layout.py
  inflating: compaction/README.md
  inflating: compaction/compaction.py

$ unzip compaction.zip
Archive:  compaction.zip
  inflating: blueprint.cfg
  inflating: compaction.py
  inflating: layout.py
  inflating: README.md
In the first case, all the files related to the blueprint were placed in a folder named compaction, and that folder was then compressed into a zip file named compaction.zip.
In the second case, the blueprint files were not placed in a folder. They were added directly at the root of the zip file compaction.zip.
Creating the archive in either of these formats is allowed. However, make sure that blueprint.cfg has the correct path to the function in the script that generates the layout.
Examples
In case 1, blueprint.cfg should have layoutGenerator as follows:
"layoutGenerator": "compaction.layout.generate_layout"
In case 2, blueprint.cfg should have layoutGenerator as follows:
"layoutGenerator": "layout.generate_layout"
If this path is not specified correctly, you see the error shown above. For example, if the archive has the folder structure described in case 2 but layoutGenerator is set as in case 1, the error occurs.
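As a minimal sketch of the rule above, the following helper inspects an archive and returns the layoutGenerator value that matches its layout. The helper name expected_layout_generator is hypothetical and not part of Amazon Glue, and it assumes the layout script is named layout.py with a function named generate_layout, as in the examples.

```python
import io
import zipfile


def expected_layout_generator(zip_file, folder_name, function="generate_layout"):
    """Return the layoutGenerator value implied by where layout.py sits.

    zip_file: path or file-like object for the blueprint archive.
    folder_name: archive name without ".zip" (for example "compaction").
    """
    with zipfile.ZipFile(zip_file) as archive:
        names = set(archive.namelist())
    if f"{folder_name}/layout.py" in names:
        # Case 1: files sit inside a folder named after the archive.
        return f"{folder_name}.layout.{function}"
    if "layout.py" in names:
        # Case 2: files sit at the root of the archive.
        return f"layout.{function}"
    raise FileNotFoundError("layout.py not found at either expected location")


# Build a case 2 archive in memory purely for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("blueprint.cfg", "{}")
    archive.writestr("layout.py", "")
buffer.seek(0)
print(expected_layout_generator(buffer, "compaction"))  # prints: layout.generate_layout
```

Comparing the returned value with the layoutGenerator entry in blueprint.cfg before uploading the archive can catch a mismatch early.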
Error: missing blueprint config file
Amazon Glue returns the error "Unknown error executing layout generator function FileNotFoundError: [Errno 2] No such file or directory: '/tmp/compaction/blueprint.cfg'".
The blueprint.cfg should be placed at the root level of the ZIP archive or within a folder which has the same name as the ZIP archive.
When the blueprint ZIP archive is extracted, blueprint.cfg is expected to be found in one of the following paths. If it is not found in either path, you see the error shown above.
$ unzip compaction.zip
Archive:  compaction.zip
   creating: compaction/
  inflating: compaction/blueprint.cfg

$ unzip compaction.zip
Archive:  compaction.zip
  inflating: blueprint.cfg
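The two accepted locations can be checked locally before upload. This is a sketch only: find_blueprint_cfg is a hypothetical helper, and it works on a list of entry names such as the one returned by zipfile.ZipFile(...).namelist().

```python
def find_blueprint_cfg(entry_names, archive_name):
    """Return the path of blueprint.cfg if it is in an accepted location, else None."""
    # Folder name is the archive name without the .zip suffix.
    folder = archive_name[:-4] if archive_name.endswith(".zip") else archive_name
    for candidate in ("blueprint.cfg", f"{folder}/blueprint.cfg"):
        if candidate in entry_names:
            return candidate
    return None


# A config file nested under a differently named folder is not found.
print(find_blueprint_cfg(["project/blueprint.cfg"], "compaction.zip"))  # prints: None
print(find_blueprint_cfg(["compaction/blueprint.cfg"], "compaction.zip"))
```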
Error: missing imported file
Amazon Glue returns the error "Unknown error executing layout generator function FileNotFoundError: [Errno 2] No such file or directory: 'demo-project/foo.py'".
If your layout generation script has functionality to read other files, make sure you give a full path for the file to be imported. For example, the Conversion.py script may be referenced in Layout.py. For more information, see Sample blueprint Project.
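One way to build a full path, offered here as an assumption rather than a documented requirement, is to resolve the file relative to the layout script's own location instead of the current working directory. The helper name path_beside is hypothetical.

```python
import os
from pathlib import Path


def path_beside(script_path, file_name):
    """Return an absolute path to file_name in the same folder as script_path."""
    return Path(os.path.abspath(script_path)).parent / file_name


# Inside a layout script, this would typically be called as:
#   foo_path = path_beside(__file__, "foo.py")
example = path_beside("/tmp/demo-project/layout.py", "foo.py")
print(example.is_absolute(), example.name)
```

Because the path is anchored to the script itself, it does not change when the script is run from a different directory.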
Error: not authorized to perform iamPassRole on resource
Amazon Glue returns the error "User: arn:aws:sts::123456789012:assumed-role/AWSGlueServiceRole/GlueSession is not authorized to perform: iam:PassRole on resource: arn:aws:iam::123456789012:role/AWSGlueServiceRole"
If the jobs and crawlers in the workflow assume the same role as the role passed to create the workflow from the blueprint, then the blueprint role needs to include the iam:PassRole permission on itself.
If the jobs and crawlers in the workflow assume a role other than the role passed to create the entities of the workflow from the blueprint, then the blueprint role needs to include the iam:PassRole permission on that other role instead of on the blueprint role.
For more information, see Permissions for blueprint Roles.
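As an illustration, the following sketch builds an IAM policy document that grants iam:PassRole on a single role. The helper name pass_role_policy is hypothetical, and the role ARN is the one from the error message above; substitute the ARN of the role that the jobs and crawlers actually assume.

```python
import json


def pass_role_policy(role_arn):
    """Return a policy document allowing iam:PassRole on role_arn only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": role_arn,
            }
        ],
    }


# ARN taken from the example error message; replace with your own role.
policy = pass_role_policy("arn:aws:iam::123456789012:role/AWSGlueServiceRole")
print(json.dumps(policy, indent=2))
```

The printed JSON can be attached to the blueprint role as an inline or managed policy.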
Error: invalid cron schedule
Amazon Glue returns the error "The schedule cron(0 0 * * * *) is invalid."
Provide a valid cron expression.
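Amazon Glue cron expressions have six fields (minutes, hours, day-of-month, month, day-of-week, year), and the day-of-month and day-of-week fields cannot both be specified: one of them must be ?. The schedule in the error above uses * for both. The following is a rough pre-check for these two rules only, not a full validator; check_glue_cron is a hypothetical helper name.

```python
import re


def check_glue_cron(schedule):
    """Return a list of problems found; an empty list means these checks passed."""
    match = re.fullmatch(r"cron\((.*)\)", schedule.strip())
    if not match:
        return ["schedule is not wrapped in cron(...)"]
    fields = match.group(1).split()
    if len(fields) != 6:
        return [f"expected 6 fields, found {len(fields)}"]
    day_of_month, day_of_week = fields[2], fields[4]
    if "?" not in (day_of_month, day_of_week):
        return ["one of day-of-month or day-of-week must be ?"]
    return []


print(check_glue_cron("cron(0 0 * * * *)"))  # reports the day-field problem
print(check_glue_cron("cron(0 0 * * ? *)"))  # prints: []
```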
Error: a trigger with the same name already exists
Amazon Glue returns the error "Trigger with name 'foo_starting_trigger' already submitted with different configuration".
A blueprint does not require you to define triggers in the layout script for workflow creation. Trigger creation is managed by the blueprint library based on the dependencies defined between two actions.
The naming for the triggers is as follows:
- For the starting trigger in the workflow, the name is <workflow_name>_starting_trigger.
- For a node (job or crawler) in the workflow that depends on the completion of one or more upstream nodes, Amazon Glue defines a trigger with the name <workflow_name>_<node_name>_trigger.
This error means that a trigger with the same name already exists. You can delete the existing trigger and rerun the workflow creation.
Note
Deleting a workflow doesn't delete the nodes within the workflow. Even though the workflow is deleted, its triggers can be left behind. As a result, if you create a workflow, delete it, and then try to re-create it with the same name from the same blueprint, you may receive a 'trigger already exists' error rather than a 'workflow already exists' error.
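The naming rules above make it possible to work out which trigger names might be left behind. The sketch below derives them and deletes each one through a client object passed in by the caller. Both helper names are hypothetical; the client is assumed to be a boto3 Glue client, whose delete_trigger operation takes a Name argument.

```python
def expected_trigger_names(workflow_name, dependent_nodes):
    """Return trigger names that follow the blueprint naming rules."""
    names = [f"{workflow_name}_starting_trigger"]
    # One trigger per node that depends on upstream nodes.
    names += [f"{workflow_name}_{node}_trigger" for node in dependent_nodes]
    return names


def delete_triggers(glue_client, names):
    """Delete each named trigger using the supplied Glue client."""
    for name in names:
        glue_client.delete_trigger(Name=name)


# Example usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   delete_triggers(boto3.client("glue"), expected_trigger_names("foo", ["job1"]))
print(expected_trigger_names("foo", ["job1", "crawler1"]))
```

The node names passed in are whatever names the layout script gave its jobs and crawlers; review the list before deleting anything.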
Error: workflow with name: foo already exists.
The workflow name should be unique. Please try with a different name.
Error: module not found in specified layoutGenerator path
Amazon Glue returns the error "Unknown error executing layout generator function ModuleNotFoundError: No module named 'crawl_s3_locations'".
"layoutGenerator": "crawl_s3_locations.layout.generate_layout"
For example, if you have the above layoutGenerator path, then when you unzip the blueprint archive, it needs to look like the following:
$ unzip crawl_s3_locations.zip
Archive:  crawl_s3_locations.zip
   creating: crawl_s3_locations/
  inflating: crawl_s3_locations/blueprint.cfg
  inflating: crawl_s3_locations/layout.py
  inflating: crawl_s3_locations/README.md
If the unzipped archive instead looks like the following, you get the error shown above.
$ unzip crawl_s3_locations.zip
Archive:  crawl_s3_locations.zip
  inflating: blueprint.cfg
  inflating: layout.py
  inflating: README.md
Here there is no folder named crawl_s3_locations, so when the layoutGenerator path refers to the layout file through the module crawl_s3_locations, the module cannot be found and the error occurs.
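The relationship between the layoutGenerator path and the archive contents can be checked mechanically. This sketch converts the dotted module path into the file it should correspond to and looks for that file among the archive entries. The helper names are hypothetical.

```python
def layout_module_file(layout_generator):
    """Map 'pkg.layout.generate_layout' to the file 'pkg/layout.py'."""
    # Drop the trailing function name, keep the dotted module path.
    module_path, _, _function = layout_generator.rpartition(".")
    return module_path.replace(".", "/") + ".py"


def layout_generator_resolves(layout_generator, entry_names):
    """Return True if the module file exists among the archive entry names."""
    return layout_module_file(layout_generator) in entry_names


generator = "crawl_s3_locations.layout.generate_layout"
flat_archive = ["blueprint.cfg", "layout.py", "README.md"]
print(layout_module_file(generator))                       # prints: crawl_s3_locations/layout.py
print(layout_generator_resolves(generator, flat_archive))  # prints: False
```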
Error: validation error in Connections field
Amazon Glue returns the error "Unknown error executing layout generator function TypeError: Value ['foo'] for key Connections should be of type <class 'dict'>!".
This is a validation error. The Connections field in the Job class expects a dictionary, but a list of values was provided instead, which causes the error.
The user input was a list of values:
Connections=['string']
It should be a dict like the following:
Connections={'Connections': ['string']}
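A small guard in the layout script can normalize the value before it is passed to the Job class. This is a sketch under the assumption that the only accepted shape is the dict form shown above; as_connections_dict is a hypothetical helper, not part of the blueprint library.

```python
def as_connections_dict(value):
    """Return value in the {'Connections': [...]} shape."""
    if isinstance(value, dict):
        return value
    if isinstance(value, (list, tuple)):
        # Wrap a bare list of connection names in the expected dict.
        return {"Connections": list(value)}
    raise TypeError(f"unsupported Connections value: {value!r}")


print(as_connections_dict(["foo"]))  # prints: {'Connections': ['foo']}
```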
To avoid these runtime errors while creating a workflow from a blueprint, you can validate the workflow, job, and crawler definitions as outlined in Testing a blueprint.
Refer to the syntax in Amazon Glue blueprint Classes Reference for defining the Amazon Glue job, crawler and workflow in the layout script.