Creating ETL jobs with Amazon Glue Studio
You can use the simple visual interface in Amazon Glue Studio to create your ETL jobs. You use the Jobs page to create new jobs. You can also use a script editor or a notebook to work directly with the code in your ETL job script.
On the Jobs page, you can see all the jobs that you have created either with Amazon Glue Studio or Amazon Glue. You can view, manage, and run your jobs on this page.
Topics
- Start the job creation process
- Create jobs that use a connector
- Next steps for creating a job in Amazon Glue Studio
Start the job creation process
You use the visual editor to create and customize your jobs. When you create a new job, you have the option of starting with an empty canvas, a job with a data source, transform, and data target node, or writing an ETL script.
To create a job in Amazon Glue Studio
- Sign in to the Amazon Web Services Management Console and open the Amazon Glue Studio console at https://console.amazonaws.cn/gluestudio/.
- You can either choose Create and manage jobs from the Amazon Glue Studio landing page, or you can choose Jobs from the navigation pane.
  The Jobs page appears.
- In the Create job section, choose a configuration option for your job:
  - Visual with a blank canvas – To create a job starting with an empty canvas.
  - Visual with a source and target – To create a job starting with a source node, or with a source, transform, and target node.
    You then choose the data source type. You can also choose the data target type, or you can choose the Choose later option from the Target drop-down list to start with only a data source node in the graph.
  - Spark script editor – For those familiar with programming and writing ETL scripts, choose this option to create a new Spark ETL job. You can then write Python or Scala code in a script editor window, or upload an existing script from a local file. If you choose to use the script editor, you can't use the visual job editor to design or edit your job.
    A Spark job runs in an Apache Spark environment managed by Amazon Glue. By default, new scripts are coded in Python; a minimal example script appears after this procedure. To write a new Scala script, see Creating and editing Scala scripts in Amazon Glue Studio.
  - Python Shell script editor – For those familiar with programming and writing ETL scripts, choose this option to create a new Python shell job. You write code in a script editor window starting with a boilerplate template, or you can upload an existing script from a local file. If you choose to use the Python shell editor, you can't use the visual job editor to design or edit your job.
    A Python shell job runs Python scripts as a shell and supports a Python version that depends on the Amazon Glue version you choose for the job. Use these jobs to schedule and run tasks that don't require an Apache Spark environment; a short example appears after this procedure.
  - Jupyter Notebook – For those familiar with programming and writing ETL scripts, choose this option to create a new Python or Scala job script using a notebook interface based on Jupyter Notebook. You write code in a notebook. If you choose the notebook interface, you can't use the visual job editor to design or edit your job.
    You can also use a command line interface to configure a notebook for authoring jobs. A sketch of a typical notebook configuration cell appears after this procedure.
- Choose Create to create a job in the editing interface that you selected.
If you chose the Jupyter notebook option, the Create job in Jupyter notebook page appears instead of the job editor interface. You must provide additional information before creating a notebook authoring session. For more information about how to specify this information, see Getting started with notebooks in Amazon Glue Studio.
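For reference, the following is a minimal sketch of the kind of Python script you might write in the Spark script editor. The database, table, S3 path, and column names (example_db, example_table, s3://example-bucket/output/, id, name) are placeholders, not values from your account, and the column mappings are illustrative only.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Parse the arguments that Amazon Glue passes to the job (JOB_NAME is always provided).
args = getResolvedOptions(sys.argv, ["JOB_NAME"])

sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a table from the Data Catalog (database and table names are placeholders).
source = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="example_table"
)

# Rename and retype columns; each mapping is (source name, source type, target name, target type).
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("id", "long", "id", "long"),
        ("name", "string", "full_name", "string"),
    ],
)

# Write the result to S3 as Parquet (the bucket and prefix are placeholders).
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/output/"},
    format="parquet",
)

job.commit()
```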
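A Python shell job, by contrast, is plain Python with no Spark environment. The following sketch shows the kind of lightweight task such a job might run; the bucket name and prefix are placeholders, and it assumes the job's IAM role is allowed to list the bucket.

```python
import boto3

# Python shell jobs run ordinary Python; boto3 is available without any Spark setup.
s3 = boto3.client("s3")

# Count the objects under a prefix (bucket and prefix are placeholders).
paginator = s3.get_paginator("list_objects_v2")
total = 0
for page in paginator.paginate(Bucket="example-bucket", Prefix="incoming/"):
    total += len(page.get("Contents", []))

print(f"Found {total} objects to process")
```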
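If you create the job with the Jupyter Notebook option, the first notebook cell typically configures the interactive session with magics before the job code runs. The following is a sketch of such a configuration cell; the values shown (idle timeout, Glue version, worker type, number of workers) are example settings you would adjust for your own job.

```python
# Session-configuration magics for an Amazon Glue Studio notebook (values are examples only).
%idle_timeout 60
%glue_version 4.0
%worker_type G.1X
%number_of_workers 2

# Typical session setup code that follows the magics.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

sc = SparkContext.getOrCreate()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
```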
Create jobs that use a connector
After you have added a connector to Amazon Glue Studio and created a connection for that connector, you can create a job that uses the connection for the data source.
For detailed instructions, see Authoring jobs with custom connectors.
Next steps for creating a job in Amazon Glue Studio
You use the visual job editor to configure nodes for your job. Each node represents an action, such as reading data from the source location or applying a transform to the data. Each node you add to your job has properties that provide information about either the data location or the transform.
The next steps for creating and managing your jobs are described in the topics that follow.