Using a data preparation recipe in Amazon Glue Studio
The Data Preparation Recipe transform lets you author a data preparation recipe from scratch in an interactive, grid-style authoring interface. You can also import an existing Amazon Glue DataBrew recipe and then edit it in Amazon Glue Studio.
The Data Preparation Recipe node is available from the Resource panel. You can connect the Data Preparation Recipe node to another node in the visual workflow, whether it is a Data source node or another transformation node. After you choose an Amazon Glue DataBrew recipe and version, the applied steps in the recipe appear in the node properties tab.
Prerequisites
-
If you are importing an Amazon Glue DataBrew recipe, you must have the required IAM permissions, as described in Import an Amazon Glue DataBrew recipe in Amazon Glue Studio.
-
A data preview session must be created.
Limitations
-
Amazon Glue DataBrew recipes are supported only in commercial DataBrew Regions.
-
Not all Amazon Glue DataBrew recipes are supported by Amazon Glue. Some recipes cannot be run in Amazon Glue Studio.
-
Recipes with UNION and JOIN transforms are not supported. However, Amazon Glue Studio already has "Join" and "Union" transform nodes, which you can use before or after a Data Preparation Recipe node.
-
Data Preparation Recipe nodes are supported for jobs starting with Amazon Glue version 4.0. This version is selected automatically when a Data Preparation Recipe node is added to the job.
-
Data Preparation Recipe nodes require Python. This is automatically set when the Data Preparation Recipe node is added to the job.
-
Adding a new Data Preparation Recipe node to the visual graph will automatically restart your Data Preview session with the correct libraries to use the Data Preparation Recipe node.
-
The following transforms are not supported for import or editing in a Data Preparation Recipe node: GROUP_BY, PIVOT, UNPIVOT, and TRANSPOSE.
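The transform limitations above can be checked before you import a recipe. The following is a minimal sketch, not an official tool: it assumes the recipe is a JSON array of steps in which each step carries its operation name under Action.Operation, as in DataBrew recipe files. The function name find_unsupported_steps and the sample recipe content are hypothetical and exist only for illustration.

```python
import json

# Operations that this section lists as unsupported in a Data Preparation Recipe node.
UNSUPPORTED_OPERATIONS = {"UNION", "JOIN", "GROUP_BY", "PIVOT", "UNPIVOT", "TRANSPOSE"}


def find_unsupported_steps(steps):
    """Return (step_number, operation) pairs for steps that use an unsupported operation.

    Assumption: each step looks like
    {"Action": {"Operation": "...", "Parameters": {...}}}.
    """
    flagged = []
    for number, step in enumerate(steps, start=1):
        operation = step.get("Action", {}).get("Operation", "")
        if operation.upper() in UNSUPPORTED_OPERATIONS:
            flagged.append((number, operation))
    return flagged


# Hypothetical recipe content, inlined so the sketch runs without a file.
sample_recipe = json.loads("""
[
  {"Action": {"Operation": "UPPER_CASE", "Parameters": {"sourceColumn": "name"}}},
  {"Action": {"Operation": "PIVOT", "Parameters": {}}},
  {"Action": {"Operation": "JOIN", "Parameters": {}}}
]
""")

for number, operation in find_unsupported_steps(sample_recipe):
    print(f"Step {number}: {operation} is not supported in a Data Preparation Recipe node")
```

A check like this covers only the transforms named in this section. A recipe that passes it can still be unsupported for other reasons.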
Additional features
After you select the Data Preparation Recipe transform and choose Author recipe, you can take the following additional actions.
-
Add step – Add steps to a recipe as needed by choosing the add step icon, or by choosing an action from the toolbar in the Preview pane.
-
Import recipe – Choose More, then Import recipe, to use an existing recipe in your Amazon Glue Studio job.
-
Download as YAML – Choose More, then Download as YAML, to save a copy of your recipe outside of Amazon Glue Studio.
-
Download as JSON – Choose More, then Download as JSON, to save a copy of your recipe outside of Amazon Glue Studio.
-
Undo and redo recipe steps – You can undo and redo recipe steps in the Preview pane when working with data in the grid.
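A recipe saved with Download as JSON can be inspected outside of Amazon Glue Studio. The sketch below assumes the downloaded file is a JSON array of steps with the operation name under Action.Operation, as in DataBrew recipe files. The helper names load_recipe_steps and count_operations, the file name recipe.json, and the sample steps are all hypothetical.

```python
import json
import os
import tempfile
from collections import Counter


def load_recipe_steps(path):
    """Read a recipe file and return its list of steps."""
    with open(path, encoding="utf-8") as handle:
        steps = json.load(handle)
    if not isinstance(steps, list):
        raise ValueError("expected the recipe file to contain a JSON array of steps")
    return steps


def count_operations(steps):
    """Count how many steps use each operation (assumed to be under Action.Operation)."""
    return Counter(step.get("Action", {}).get("Operation", "UNKNOWN") for step in steps)


# Hypothetical sample steps, written to a temporary file to stand in for a download.
demo_steps = [
    {"Action": {"Operation": "UPPER_CASE", "Parameters": {"sourceColumn": "name"}}},
    {"Action": {"Operation": "DELETE", "Parameters": {"sourceColumns": "[\"notes\"]"}}},
    {"Action": {"Operation": "UPPER_CASE", "Parameters": {"sourceColumn": "city"}}},
]

with tempfile.TemporaryDirectory() as folder:
    recipe_path = os.path.join(folder, "recipe.json")
    with open(recipe_path, "w", encoding="utf-8") as handle:
        json.dump(demo_steps, handle)
    loaded_steps = load_recipe_steps(recipe_path)

for operation, count in count_operations(loaded_steps).most_common():
    print(f"{operation}: {count} step(s)")
```

To use it on a real download, pass the path of the saved file to load_recipe_steps instead of the temporary file.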