Working with jobs on the Amazon Glue console

A job in Amazon Glue consists of the business logic that performs extract, transform, and load (ETL) work. You can create jobs in the ETL section of the Amazon Glue console.

To view existing jobs, sign in to the Amazon Web Services Management Console and open the Amazon Glue console at https://console.amazonaws.cn/glue/. Then choose the Jobs tab in Amazon Glue. The Jobs list displays the location of the script that is associated with each job, when the job was last modified, and the current job bookmark option.

From the Jobs list, you can do the following:

  • To start an existing job, choose Action, and then choose Run job.

  • To stop a Running or Starting job, choose Action, and then choose Stop job run.

  • To add triggers that start a job, choose Action, Choose job triggers.

  • To modify or delete an existing job, choose Action, and then choose Edit job or Delete.

  • To change a script that is associated with a job, choose Action, Edit script.

  • To reset the state information that Amazon Glue stores about your job, choose Action, Reset job bookmark.

  • To create a development endpoint with the properties of this job, choose Action, Create development endpoint.
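Several of the console actions above have API counterparts. The following is a minimal sketch, assuming a boto3 Glue client (for example, one created with boto3.client("glue")) is passed in as `glue`. The helper names are hypothetical and are not part of any Amazon Glue library. Only the first page of job runs is inspected when looking for runs to stop.

```python
# Sketch only: helper names are hypothetical.
# `glue` is expected to be a boto3 Glue client, e.g. boto3.client("glue").

# Run states that the console allows you to stop.
STOPPABLE_STATES = {"STARTING", "RUNNING"}


def run_job(glue, job_name):
    """Counterpart of Action > Run job; returns the ID of the new run."""
    response = glue.start_job_run(JobName=job_name)
    return response["JobRunId"]


def stop_active_runs(glue, job_name):
    """Counterpart of Action > Stop job run for Starting or Running runs.

    Looks only at the first page of results returned by get_job_runs.
    """
    runs = glue.get_job_runs(JobName=job_name)["JobRuns"]
    active = [run["Id"] for run in runs if run["JobRunState"] in STOPPABLE_STATES]
    if active:
        glue.batch_stop_job_run(JobName=job_name, JobRunIds=active)
    return active


def reset_bookmark(glue, job_name):
    """Counterpart of Action > Reset job bookmark."""
    glue.reset_job_bookmark(JobName=job_name)
```

Because each helper takes the client as a parameter, you can try the sketch against a stub object before pointing it at a real account.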

To add a new job using the console
  1. Open the Amazon Glue console, and choose the Jobs tab.

  2. Choose Add job, and follow the instructions in the Add job wizard.

    If you decide to have Amazon Glue generate a script for your job, you must specify the job properties, data sources, and data targets, and verify the schema mapping of source columns to target columns. The generated script is a starting point for you to add code to perform your ETL work. Verify the code in the script and modify it to meet your business needs.

    Note

    To get step-by-step guidance for adding a job with a generated script, see the Add job tutorial in the console.

    Optionally, you can add a security configuration to a job to specify at-rest encryption options.

    If you provide or author the script, your job defines the sources, targets, and transforms. But you must specify any connections that are required by the script in the job. For information about creating your own script, see Providing your own custom scripts.
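If you provide your own script, the same choices that the Add job wizard collects can be expressed as a CreateJob request. The following is a minimal sketch that assembles the keyword arguments for the boto3 `create_job` call. The function name is hypothetical, every value you pass in (job name, role, script location, connection names, security configuration name) is a placeholder you supply, and the command name "glueetl" assumes a Spark ETL job.

```python
def build_create_job_request(name, role, script_location,
                             connections=None, security_configuration=None):
    """Assemble keyword arguments for the Glue CreateJob API (sketch).

    All argument values are supplied by the caller; nothing here is looked up.
    """
    request = {
        "Name": name,
        "Role": role,  # IAM role that the job assumes
        # "glueetl" is the command name for a Spark ETL job (assumed job type).
        "Command": {"Name": "glueetl", "ScriptLocation": script_location},
    }
    if connections:
        # Connections that the script requires must be listed on the job.
        request["Connections"] = {"Connections": list(connections)}
    if security_configuration:
        # Optional: name of a security configuration for at-rest encryption.
        request["SecurityConfiguration"] = security_configuration
    return request
```

With a boto3 Glue client named `glue`, the result could be passed as `glue.create_job(**request)`.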

Note

The job assumes the permissions of the IAM role that you specify when you create it. This IAM role must have permission to extract data from your data source and write to your target. The Amazon Glue console lists only IAM roles that have a trust policy attached for the Amazon Glue service principal. For more information about providing roles for Amazon Glue, see Identity-based policies for Amazon Glue.

If the job reads Amazon Simple Storage Service (Amazon S3) data that is encrypted with Amazon KMS, the IAM role must have decrypt permission on the KMS key. For more information, see Step 2: Create an IAM role for Amazon Glue.
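The two role requirements above can be illustrated as IAM policy documents. The following is a sketch, not a complete role definition. The function names are hypothetical, the service principal is assumed to be glue.amazonaws.com (confirm the value for your partition), and the key ARN is a placeholder that you supply.

```python
import json


def glue_trust_policy(service_principal="glue.amazonaws.com"):
    """Trust policy that lets the Glue service assume the role (sketch).

    The default principal is an assumption; confirm it for your partition.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": service_principal},
            "Action": "sts:AssumeRole",
        }],
    }


def kms_decrypt_statement(key_arn):
    """Policy statement granting decrypt permission on one KMS key."""
    return {"Effect": "Allow", "Action": ["kms:Decrypt"], "Resource": key_arn}


def as_json(document):
    """Serialize a policy document for use with IAM tooling."""
    return json.dumps(document, indent=2)
```

The statement returned by `kms_decrypt_statement` would sit in a permissions policy attached to the role, alongside whatever access the job needs to its data source and target.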

Important

Check Troubleshooting errors in Amazon Glue for Spark for known problems when a job runs.

To learn about the properties that are required for each job, see Defining job properties for Spark jobs.

Viewing job details

To see details of a job, select the job in the Jobs list and review the information on the following tabs:

  • History

  • Details

  • Script

  • Metrics

History

The History tab shows your job run history and how successful a job has been in the past. For each job run, the run metrics include the following:

  • Run ID is an identifier created by Amazon Glue for each run of this job.

  • Retry attempt shows the number of attempts for jobs that required Amazon Glue to automatically retry.

  • Run status shows the outcome of each run, with the most recent run listed at the top. If a job is Running or Starting, you can choose the action icon in this column to stop it.

  • Error shows the details of an error message if the run was not successful.

  • Logs links to the logs written to stdout for this job run.

    The Logs link takes you to Amazon CloudWatch Logs, where you can see all the details about the tables that were created in the Amazon Glue Data Catalog and any errors that were encountered. You can manage your log retention period in the CloudWatch console. The default log retention is Never Expire. For more information about how to change the retention period, see Change log data retention in CloudWatch Logs in the Amazon CloudWatch Logs User Guide.

  • Error logs links to the logs written to stderr for this job run.

    This link takes you to CloudWatch Logs, where you can see details about any errors that were encountered. You can manage your log retention period in the CloudWatch console. The default log retention is Never Expire. For more information about how to change the retention period, see Change log data retention in CloudWatch Logs in the Amazon CloudWatch Logs User Guide.

  • Execution time shows the length of time during which the job run consumed resources. The amount is calculated from when the job run starts consuming resources until it finishes.

  • Timeout shows the maximum execution time during which this job run can consume resources before it stops and goes into timeout status.

  • Delay shows the threshold before sending a job delay notification. When a job run execution time reaches this threshold, Amazon Glue sends a notification ("Glue Job Run Status") to CloudWatch Events.

  • Triggered by shows the trigger that fired to start this job run.

  • Start time shows the date and time (local time) that the job started.

  • End time shows the date and time (local time) that the job ended.
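Most of the History columns above correspond to fields of the JobRun structure that the GetJobRuns API returns. The following is a minimal sketch, assuming a boto3 Glue client is passed in as `glue`. The function names and the column labels in the returned dictionaries are hypothetical. In the API, ExecutionTime is reported in seconds, and Timeout and NotifyDelayAfter in minutes. The Logs and Error logs links are not reproduced here, and only the first page of runs is read.

```python
def summarize_run(run):
    """Map one JobRun dict to the History tab columns (sketch)."""
    delay = run.get("NotificationProperty", {}).get("NotifyDelayAfter")
    return {
        "Run ID": run["Id"],
        "Retry attempt": run.get("Attempt", 0),
        "Run status": run["JobRunState"],
        "Error": run.get("ErrorMessage"),            # absent on successful runs
        "Execution time (s)": run.get("ExecutionTime"),
        "Timeout (min)": run.get("Timeout"),
        "Delay (min)": delay,                        # delay notification threshold
        "Triggered by": run.get("TriggerName"),
        "Start time": run.get("StartedOn"),
        "End time": run.get("CompletedOn"),
    }


def run_history(glue, job_name):
    """Return summarized runs for a job, most recent first.

    Reads only the first page of get_job_runs results.
    """
    runs = glue.get_job_runs(JobName=job_name)["JobRuns"]
    newest_first = sorted(runs, key=lambda run: run["StartedOn"], reverse=True)
    return [summarize_run(run) for run in newest_first]
```

The timestamps are returned as they come from the API, so the local-time display that the console uses is not reproduced here.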

For a specific job run, you can choose View run metrics, which displays graphs of metrics for the selected job run. For more information about how to turn on metrics and interpret the graphs, see Job monitoring and debugging.

Details

The Details tab includes attributes of your job. It shows you the details about the job definition and also lists the triggers that can start this job. Each time one of the triggers in the list fires, the job is started. For the list of triggers, the details include the following:

  • Trigger name shows the names of triggers that start this job when fired.

  • Trigger type lists the type of trigger that starts this job.

  • Trigger status displays whether the trigger is created, activated, or deactivated.

  • Trigger parameters shows parameters that define when the trigger fires.

  • Jobs to trigger shows the list of jobs that start when this trigger fires.
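The trigger list above can be approximated with the GetTriggers API, which accepts the name of a dependent job. The following is a minimal sketch, assuming a boto3 Glue client is passed in as `glue`. The function name and the column labels are hypothetical, and treating the schedule or predicate as the "trigger parameters" is an assumption about how the console derives that column. Only the first page of results is read.

```python
def triggers_for_job(glue, job_name):
    """List the triggers that can start a job, shaped like the Details tab."""
    triggers = glue.get_triggers(DependentJobName=job_name)["Triggers"]
    rows = []
    for trigger in triggers:
        rows.append({
            "Trigger name": trigger["Name"],
            "Trigger type": trigger["Type"],      # e.g. SCHEDULED, CONDITIONAL
            "Trigger status": trigger["State"],   # e.g. CREATED, ACTIVATED
            # Assumption: the schedule or predicate defines when it fires.
            "Trigger parameters": trigger.get("Schedule") or trigger.get("Predicate"),
            # Actions can also start crawlers; keep only job actions.
            "Jobs to trigger": [
                action["JobName"]
                for action in trigger.get("Actions", [])
                if "JobName" in action
            ],
        })
    return rows
```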

Note

The Details tab does not include source and target information. Review the script to see the source and target details.

Script

The Script tab shows the script that runs when your job is started. You can invoke an Edit script view from this tab. For more information about the script editor in the Amazon Glue console, see Jobs (legacy). For information about the functions that are called in your script, see Program Amazon Glue ETL scripts in PySpark.

Metrics

The Metrics tab shows metrics collected when a job runs and profiling is turned on. The available content differs by job type.
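One way to turn on profiling for a single run is the `--enable-metrics` special job parameter. The following is a minimal sketch. The helper name is hypothetical, and the value "true" is an assumption: the parameter is understood to take effect when the key is present, whatever its value.

```python
def with_metrics(arguments=None):
    """Return a copy of the job arguments with metrics collection enabled.

    Assumption: the presence of the --enable-metrics key is what matters;
    the value "true" is used here only because arguments must be strings.
    """
    merged = dict(arguments or {})
    merged["--enable-metrics"] = "true"
    return merged
```

With a boto3 Glue client named `glue`, the result could be passed as `glue.start_job_run(JobName=job_name, Arguments=with_metrics())`, where `job_name` is the name of your job.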