Building visual ETL jobs with Amazon Glue Studio
An Amazon Glue job encapsulates a script that connects to your source data, processes it, and then writes it out to your data target. Typically, a job runs extract, transform, and load (ETL) scripts. Jobs can run scripts designed for the Apache Spark and Ray runtime environments. Jobs can also run general-purpose Python scripts (Python shell jobs). Amazon Glue triggers can start jobs on a schedule, in response to an event, or on demand. You can monitor job runs to understand runtime metrics such as completion status, duration, and start time.
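For example, you can start a job run on demand and inspect its runtime metrics programmatically. The following is a minimal sketch using the AWS SDK for Python (boto3); the job name and Region are placeholders that you would replace with your own values.

```python
import boto3

# Placeholder Region and job name; replace with your own.
glue = boto3.client("glue", region_name="cn-north-1")

# Start the job on demand and capture the run ID.
run_id = glue.start_job_run(JobName="my-etl-job")["JobRunId"]

# Inspect runtime metrics for the run: completion status, start time, and duration.
run = glue.get_job_run(JobName="my-etl-job", RunId=run_id)["JobRun"]
print(run["JobRunState"], run.get("StartedOn"), run.get("ExecutionTime"))
```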
You can use scripts that Amazon Glue generates or you can provide your own. With a source schema and target location or schema, the Amazon Glue Studio code generator can automatically create an Apache Spark API (PySpark) script. You can use this script as a starting point and edit it to meet your goals.
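A generated script typically reads a DynamicFrame from the source, applies one or more transforms, and writes the result to the target. The following is a minimal sketch of that shape, assuming a Data Catalog database and table and an S3 output path; all of these names are placeholders.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext.getOrCreate())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read the source data through the Data Catalog (placeholder database/table).
source = glueContext.create_dynamic_frame.from_catalog(
    database="example_db", table_name="example_table"
)

# Apply a simple transform: rename a column while keeping its type (placeholder mappings).
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("id", "long", "id", "long"),
        ("name", "string", "full_name", "string"),
    ],
)

# Write the result to the target location as Parquet (placeholder S3 path).
glueContext.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/output/"},
    format="parquet",
)

job.commit()
```

You can use a script like this as a starting point and edit it to meet your goals.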
Amazon Glue can write output files in several data formats, and the supported formats vary by job type. For some data formats, you can also write the output using common compression formats.
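For example, when writing to Amazon S3 you can request compressed output through the sink options. This sketch reuses the `mapped` DynamicFrame from the previous example and a placeholder bucket; whether a given compression codec is available depends on the data format and job type.

```python
# Write gzip-compressed JSON output to S3 (placeholder path).
glueContext.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={
        "path": "s3://example-bucket/json-output/",
        "compression": "gzip",
    },
    format="json",
)
```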
Signing in to the Amazon Glue console
A job in Amazon Glue consists of the business logic that performs extract, transform, and load (ETL) work. You can create jobs in the ETL section of the Amazon Glue console.
To view existing jobs, sign in to the Amazon Web Services Management Console and open the Amazon Glue console at https://console.amazonaws.cn/glue/.
While creating a new job, or after you have saved a job, you can use Amazon Glue Studio to modify your ETL job. You can do this by editing the nodes in the visual editor or by editing the job script in developer mode. You can also add and remove nodes in the visual editor to create more complicated ETL jobs.
Next steps for creating a job in Amazon Glue Studio
You use the visual job editor to configure nodes for your job. Each node represents an action, such as reading data from the source location or applying a transform to the data. Each node you add to your job has properties that provide information about either the data location or the transform.
The next steps for creating and managing your jobs are: