
Authoring Jobs in Amazon Glue

A job is the business logic that performs the extract, transform, and load (ETL) work in Amazon Glue. When you start a job, Amazon Glue runs a script that extracts data from sources, transforms the data, and loads it into targets. You can create jobs in the ETL section of the Amazon Glue console. For more information, see Working with Jobs on the Amazon Glue Console.

The following diagram summarizes the basic workflow and steps involved in authoring a job in Amazon Glue:


[Figure: Workflow showing how to author a job with Amazon Glue in 6 basic steps.]

Workflow Overview

When you author a job, you supply details about data sources, targets, and other information. The result is a generated script that uses the Apache Spark API (PySpark or Scala). You can then store your job definition in the Amazon Glue Data Catalog.
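
To make the idea of a generated script more concrete, the following is a minimal sketch of the extract, transform, and load shape that a PySpark job script tends to have. It is an illustration under stated assumptions, not the exact output of the console: the database, table, column names, and Amazon S3 path are hypothetical, and the `run_job` wrapper is an arrangement chosen here so that the file can be loaded outside the Amazon Glue runtime. The `awsglue` and `pyspark` modules are assumed to be available only inside a running job, so they are imported inside the function.

```python
# Hypothetical Data Catalog and target names, for illustration only.
SOURCE_DATABASE = "sales_db"
SOURCE_TABLE = "raw_orders"
TARGET_PATH = "s3://example-bucket/curated/orders/"

# Each mapping is (source column, source type, target column, target type).
# The columns are hypothetical.
MAPPINGS = [
    ("order_id", "string", "order_id", "string"),
    ("amount", "string", "amount", "double"),
]


def run_job(argv):
    """Read a catalog table, remap its columns, and write Parquet to S3."""
    # These modules are supplied by the Glue job runtime, so they are
    # imported here rather than at module level.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # JOB_NAME is passed to the script by Glue when the job starts.
    args = getResolvedOptions(argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Extract: the source table must already exist in the Data Catalog.
    source = glue_context.create_dynamic_frame.from_catalog(
        database=SOURCE_DATABASE, table_name=SOURCE_TABLE
    )

    # Transform: rename and cast columns with a built-in transform.
    mapped = ApplyMapping.apply(frame=source, mappings=MAPPINGS)

    # Load: write the result to the target location.
    glue_context.write_dynamic_frame.from_options(
        frame=mapped,
        connection_type="s3",
        connection_options={"path": TARGET_PATH},
        format="parquet",
    )
    job.commit()
```

Inside a job, the script would end by calling `run_job(sys.argv)` after importing `sys`. Adding a second source or target by editing the script amounts to repeating the extract or load call with different names.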

The following steps describe the overall process of authoring jobs in the Amazon Glue console:

  1. You choose a data source for your job. The tables that represent your data source must already be defined in your Data Catalog. If the source requires a connection, the connection is also referenced in your job. If your job requires multiple data sources, you can add them later by editing the script.

  2. You choose a data target for your job. The tables that represent the data target can be defined in your Data Catalog, or your job can create the target tables when it runs. You choose a target location when you author the job. If the target requires a connection, the connection is also referenced in your job. If your job requires multiple data targets, you can add them later by editing the script.

  3. You customize the job-processing environment by providing arguments for your job and generated script. For more information, see Adding Jobs in Amazon Glue.

  4. Initially, Amazon Glue generates a script, but you can also edit this script to add sources, targets, and transforms. For more information about transforms, see Built-In Transforms.

  5. You specify how your job is invoked: on demand, on a time-based schedule, or in response to an event. For more information, see Starting Jobs and Crawlers Using Triggers.

  6. Based on your input, Amazon Glue generates a PySpark or Scala script. You can tailor the script based on your business needs. For more information, see Editing Scripts in Amazon Glue.
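
The job arguments in step 3 and the invocation choice in step 5 can also be expressed outside the console. The following sketch assembles plain Python dictionaries shaped like the requests that the boto3 Glue client accepts for `create_job` and `create_trigger`. It makes no call to Amazon Web Services. The helper names are hypothetical, and mapping "by an event" to a `CONDITIONAL` trigger that fires after another job succeeds is an assumption made here for illustration.

```python
def build_job_definition(name, role_arn, script_location, default_arguments=None):
    """Return a dict shaped like the keyword arguments of create_job."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        # Arguments passed to the script each time the job runs (step 3).
        "DefaultArguments": dict(default_arguments or {}),
    }


def build_trigger_definition(name, job_name, schedule=None, after_job=None):
    """Return a dict shaped like the keyword arguments of create_trigger.

    With neither option the trigger is on demand, with `schedule` it is
    time-based, and with `after_job` it fires when that job succeeds.
    """
    if schedule and after_job:
        raise ValueError("Choose either a schedule or an upstream job, not both")

    trigger = {"Name": name, "Actions": [{"JobName": job_name}]}
    if schedule:
        trigger["Type"] = "SCHEDULED"
        trigger["Schedule"] = schedule  # for example "cron(0 2 * * ? *)"
    elif after_job:
        trigger["Type"] = "CONDITIONAL"
        trigger["Predicate"] = {
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "JobName": after_job,
                    "State": "SUCCEEDED",
                }
            ]
        }
    else:
        trigger["Type"] = "ON_DEMAND"
    return trigger
```

With boto3 installed and credentials configured, these dictionaries could be passed as `glue.create_job(**job)` and `glue.create_trigger(**trigger)`, where `glue` is `boto3.client("glue")`. Keeping the builders free of network calls makes the definitions easy to review before anything is created.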