Amazon Glue blueprint classes reference
The libraries for Amazon Glue blueprints define three classes that you use in your workflow layout script: Job, Crawler, and Workflow.
Job class
The Job class represents an Amazon Glue ETL job.
Mandatory constructor arguments
The following are mandatory constructor arguments for the Job class.
Argument name | Type | Description |
---|---|---|
Name | str | Name to assign to the job. Amazon Glue adds a randomly generated suffix to the name to distinguish the job from those created by other blueprint runs. |
Role | str | Amazon Resource Name (ARN) of the role that the job should assume while executing. |
Command | dict | Job command, as specified in the JobCommand structure in the API documentation. |
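As a minimal sketch, the three mandatory arguments can be collected as keyword arguments before they are passed to the constructor. The job name, role ARN, and script location are placeholder values, and the Command keys (Name, ScriptLocation, PythonVersion) are assumed from the JobCommand structure. The import shown in the comment is also an assumption about how a layout script loads the class.

```python
# Mandatory Job arguments gathered as keyword arguments.
# All values below are placeholders for illustration only.
job_args = {
    "Name": "sample_etl_job",
    "Role": "arn:aws:iam::123456789012:role/GlueBlueprintRole",
    "Command": {
        # Keys assumed from the JobCommand structure.
        "Name": "glueetl",
        "ScriptLocation": "s3://amzn-s3-demo-bucket/scripts/etl.py",
        "PythonVersion": "3",
    },
}

# Inside a layout script, where the blueprint libraries are available
# (assumed import path):
#   from awsglue.blueprint.job import Job
#   job = Job(**job_args)
```

Keeping the arguments in a dictionary makes it easy to inspect or log them before the object is constructed.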
Optional constructor arguments
The following are optional constructor arguments for the Job class.
Argument name | Type | Description |
---|---|---|
DependsOn | dict | The workflow entities that the job depends on. For more information, see Using the DependsOn argument. |
WaitForDependencies | str | Indicates whether the job waits for all of the entities it depends on to complete before it runs, or for any one of them. For more information, see Using the WaitForDependencies argument. Omit if the job depends on only one entity. |
(Job properties) | - | Any of the job properties listed in Job structure in the Amazon Glue API documentation (except CreatedOn and LastModifiedOn). |
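The optional arguments might be combined as in the sketch below. StandInEntity is a hypothetical placeholder, not the blueprint library's Job or Crawler class, so that the example runs anywhere. The DependsOn form (entity object mapped to a required state such as "SUCCEEDED") and the "AND" value for WaitForDependencies are assumptions; the sections on the DependsOn and WaitForDependencies arguments are the authority on the accepted values.

```python
class StandInEntity:
    """Hypothetical placeholder for a blueprint Job or Crawler object."""

    def __init__(self, **kwargs):
        self.args = kwargs  # keep the constructor arguments for inspection


# Two upstream entities that the final job waits on.
# (Role, Command, and Targets are omitted here for brevity.)
crawler = StandInEntity(Name="raw_data_crawler")
cleanup_job = StandInEntity(Name="cleanup_job")

report_job = StandInEntity(
    Name="report_job",
    # Assumed form: each key is an entity object, each value is the
    # state that entity must reach.
    DependsOn={crawler: "SUCCEEDED", cleanup_job: "SUCCEEDED"},
    # Assumed values: "AND" waits for all dependencies, "ANY" for any one.
    WaitForDependencies="AND",
    # Additional job properties taken from the Job structure.
    Timeout=60,
    MaxRetries=1,
)
```

Because report_job depends on more than one entity, WaitForDependencies is supplied; with a single dependency it would be omitted.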
Crawler class
The Crawler class represents an Amazon Glue crawler.
Mandatory constructor arguments
The following are mandatory constructor arguments for the Crawler class.
Argument name | Type | Description |
---|---|---|
Name | str | Name to assign to the crawler. Amazon Glue adds a randomly generated suffix to the name to distinguish the crawler from those created by other blueprint runs. |
Role | str | ARN of the role that the crawler should assume while running. |
Targets | dict | Collection of targets to crawl. Targets class constructor arguments are defined in the CrawlerTargets structure in the API documentation. All Targets constructor arguments are optional, but you must pass at least one. |
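A Targets value might look like the sketch below. The S3Targets key and its Path field are assumed from the CrawlerTargets structure, and the names and paths are placeholders. The has_any_target helper is hypothetical and only illustrates the rule that at least one target must be passed; it is not part of the blueprint libraries.

```python
# Mandatory Crawler arguments gathered as keyword arguments.
# All values below are placeholders for illustration only.
crawler_args = {
    "Name": "sample_crawler",
    "Role": "arn:aws:iam::123456789012:role/GlueBlueprintRole",
    "Targets": {
        # Key and field names assumed from the CrawlerTargets structure.
        "S3Targets": [{"Path": "s3://amzn-s3-demo-bucket/raw/"}],
    },
}


def has_any_target(targets):
    """Return True if at least one target type has a non-empty value."""
    return any(bool(value) for value in targets.values())


# Inside a layout script (assumed import path):
#   from awsglue.blueprint.crawler import Crawler
#   crawler = Crawler(**crawler_args)
```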
Optional constructor arguments
The following are optional constructor arguments for the Crawler class.
Argument name | Type | Description |
---|---|---|
DependsOn | dict | The workflow entities that the crawler depends on. For more information, see Using the DependsOn argument. |
WaitForDependencies | str | Indicates whether the crawler waits for all of the entities it depends on to complete before it runs, or for any one of them. For more information, see Using the WaitForDependencies argument. Omit if the crawler depends on only one entity. |
(Crawler properties) | - | Any of the crawler properties listed in Crawler structure in the Amazon Glue API documentation, with the following exceptions: |
Workflow class
The Workflow class represents an Amazon Glue workflow. The workflow layout script returns a Workflow object. Amazon Glue creates a workflow based on this object.
Mandatory constructor arguments
The following are mandatory constructor arguments for the Workflow class.
Argument name | Type | Description |
---|---|---|
Name | str | Name to assign to the workflow. |
Entities | Entities | A collection of entities (jobs and crawlers) to include in the workflow. The Entities class constructor accepts a Jobs argument, which is a list of Job objects, and a Crawlers argument, which is a list of Crawler objects. |
Optional constructor arguments
The following are optional constructor arguments for the Workflow class.
Argument name | Type | Description |
---|---|---|
Description | str | See Workflow structure. |
DefaultRunProperties | dict | See Workflow structure. |
OnSchedule | str | A cron expression. |
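The following sketch shows how a layout script might assemble and return a Workflow object from the arguments above. The four classes are hypothetical stand-ins that only store their constructor arguments, so that the example runs without the blueprint libraries. The generate_layout function name, its two parameters, the user_params keys, the DependsOn form, and the cron(...) schedule format are all assumptions made for illustration.

```python
class _StandIn:
    """Hypothetical placeholder that stores constructor arguments as attributes."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


# Stand-ins for the blueprint classes; in a real layout script these
# would come from the blueprint libraries instead.
class Job(_StandIn):
    pass


class Crawler(_StandIn):
    pass


class Entities(_StandIn):
    pass


class Workflow(_StandIn):
    pass


def generate_layout(user_params, system_params):
    """Build one crawler and one dependent job, then wrap them in a workflow."""
    role = user_params["IAMRole"]
    crawler = Crawler(
        Name="raw_crawler",
        Role=role,
        Targets={"S3Targets": [{"Path": user_params["InputPath"]}]},
    )
    job = Job(
        Name="etl_job",
        Role=role,
        Command={"Name": "glueetl", "ScriptLocation": user_params["ScriptPath"]},
        DependsOn={crawler: "SUCCEEDED"},  # assumed form; single dependency
    )
    return Workflow(
        Name=user_params["WorkflowName"],
        Entities=Entities(Jobs=[job], Crawlers=[crawler]),
        Description="Crawl raw data, then run the ETL job.",
        DefaultRunProperties={"stage": "demo"},
        OnSchedule="cron(0 12 * * ? *)",  # assumed schedule format
    )


workflow = generate_layout(
    {
        "WorkflowName": "demo_workflow",
        "IAMRole": "arn:aws:iam::123456789012:role/GlueBlueprintRole",
        "InputPath": "s3://amzn-s3-demo-bucket/raw/",
        "ScriptPath": "s3://amzn-s3-demo-bucket/scripts/etl.py",
    },
    {},
)
```

WaitForDependencies is left out because the job depends on only one entity.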
Class methods
All three classes include the following methods.
- validate(): Validates the properties of the object. If errors are found, outputs a message and exits. Generates no output if there are no errors. For the Workflow class, calls itself on every entity in the workflow.
- to_json(): Serializes the object to JSON. Also calls validate(). For the Workflow class, the JSON object includes job and crawler lists, and a list of triggers generated by the job and crawler dependency specifications.
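The contract described for these two methods can be modeled with a small stand-in class. This is not the library's implementation: StandInJob, its MANDATORY tuple, the message text, and the exit code are hypothetical, and the real JSON layout is not reproduced here. The sketch only shows that to_json() validates first and that a validation error prints a message and exits.

```python
import json
import sys


class StandInJob:
    """Hypothetical model of the validate()/to_json() contract."""

    MANDATORY = ("Name", "Role", "Command")

    def __init__(self, **kwargs):
        self.args = kwargs

    def validate(self):
        """Print a message and exit if a mandatory argument is missing."""
        missing = [key for key in self.MANDATORY if key not in self.args]
        if missing:
            print(f"Missing mandatory arguments: {', '.join(missing)}")
            sys.exit(1)
        # No output when there are no errors.

    def to_json(self):
        """Validate, then serialize the stored arguments to JSON."""
        self.validate()
        return json.dumps(self.args)


job = StandInJob(
    Name="etl_job",
    Role="arn:aws:iam::123456789012:role/GlueBlueprintRole",
    Command={"Name": "glueetl"},
)
job_json = job.to_json()
```

Because an error ends the process, calling validate() on the object returned by a layout script is a quick local check before the blueprint is published.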