Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Import an Amazon Glue DataBrew recipe in Amazon Glue Studio

In Amazon Glue DataBrew, a recipe is a set of data transformation steps. Amazon Glue DataBrew recipes prescribe how to transform data that has already been read; they don't describe where or how to read data, or where or how to write it. You configure reading and writing in Source and Target nodes in Amazon Glue Studio. For more information on recipes, see Creating and using Amazon Glue DataBrew recipes.

To use Amazon Glue DataBrew recipes in Amazon Glue Studio, begin by creating recipes in Amazon Glue DataBrew. If you already have recipes you want to use, you can skip this step.

IAM permissions for Amazon Glue DataBrew

This topic provides information to help you understand the actions and resources that you, as an IAM administrator, can use in an Amazon Identity and Access Management (IAM) policy for the Data Preparation Recipe transform.

For additional information about security in Amazon Glue, see Access Management.

Note

The following table lists the permissions that a user needs to import an existing Amazon Glue DataBrew recipe.

Data Preparation Recipe transform actions

Action                        Description
databrew:ListRecipes          Grants permission to retrieve Amazon Glue DataBrew recipes.
databrew:ListRecipeVersions   Grants permission to retrieve Amazon Glue DataBrew recipe versions.
databrew:DescribeRecipe       Grants permission to retrieve the description of an Amazon Glue DataBrew recipe.
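Each action in the table has a DataBrew API operation of the same name. The following Python sketch shows which call needs which permission when you list recipes, list their published versions, and read the steps of one version. It assumes that `client` is a boto3 DataBrew client (for example, `boto3.client("databrew")`); the helper names `sort_versions`, `recipe_menu`, and `recipe_steps` are hypothetical, and the sketch ignores pagination (`NextToken`).

```python
from typing import Any, Dict, List


def sort_versions(versions: List[str]) -> List[str]:
    """Sort published version strings such as "1.0", "2.0", "10.0" numerically."""
    return sorted(versions, key=lambda v: tuple(int(part) for part in v.split(".")))


def recipe_menu(client: Any) -> Dict[str, List[str]]:
    """Map each recipe name to its published versions (pagination ignored)."""
    menu: Dict[str, List[str]] = {}
    # Requires databrew:ListRecipes.
    for recipe in client.list_recipes()["Recipes"]:
        name = recipe["Name"]
        # Requires databrew:ListRecipeVersions.
        versions = client.list_recipe_versions(Name=name)["Recipes"]
        menu[name] = sort_versions([v["RecipeVersion"] for v in versions])
    return menu


def recipe_steps(client: Any, name: str, version: str) -> List[dict]:
    """Return the transformation steps of one published recipe version."""
    # Requires databrew:DescribeRecipe.
    return client.describe_recipe(Name=name, RecipeVersion=version)["Steps"]
```

Sorting by integer parts rather than as plain strings keeps "10.0" after "2.0".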

The role you’re using to access this functionality should have a policy that allows several Amazon Glue DataBrew actions. You can achieve this either by using the AWSGlueConsoleFullAccess policy, which includes the necessary actions, or by adding the following inline policy to your role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "databrew:ListRecipes",
        "databrew:ListRecipeVersions",
        "databrew:DescribeRecipe"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
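If you maintain policies as code, a quick check can confirm that a policy document grants the three DataBrew actions. The following Python sketch is a deliberately simplified check: it only looks at Allow statements and treats an action as granted if it appears by exact name, as `*`, or as `databrew:*`. It ignores Deny statements, `NotAction`, `Resource`, `Condition`, and partial wildcards such as `databrew:List*`, so it is not a substitute for the IAM policy simulator. The function name `missing_actions` is hypothetical.

```python
import json
from typing import Iterable, Set

# The inline policy from this section, as a Python dict.
INLINE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "databrew:ListRecipes",
                "databrew:ListRecipeVersions",
                "databrew:DescribeRecipe",
            ],
            "Resource": ["*"],
        }
    ],
}

REQUIRED_ACTIONS = (
    "databrew:ListRecipes",
    "databrew:ListRecipeVersions",
    "databrew:DescribeRecipe",
)


def _as_list(value):
    # IAM accepts either a single value or a list for Statement and Action.
    return value if isinstance(value, list) else [value]


def missing_actions(policy: dict, required: Iterable[str] = REQUIRED_ACTIONS) -> Set[str]:
    """Return the required actions that no Allow statement grants (simplified)."""
    allowed = set()
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") == "Allow":
            # IAM action names are case-insensitive, so compare in lower case.
            allowed.update(a.lower() for a in _as_list(stmt.get("Action", [])))
    missing = set()
    for action in required:
        service = action.split(":", 1)[0].lower()
        if not {action.lower(), "*", f"{service}:*"} & allowed:
            missing.add(action)
    return missing


if __name__ == "__main__":
    print(json.dumps(INLINE_POLICY, indent=2))
    print(missing_actions(INLINE_POLICY))  # prints set() because all three actions are allowed
```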

To use the Data Preparation Recipe transform, you must also add the iam:PassRole action to the permissions policy.

Additional required permissions

Action          Description
iam:PassRole    Grants the user permission to pass approved IAM roles.
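The following Python sketch builds a policy statement that allows iam:PassRole on a single role rather than on all roles. The account ID 111122223333 and the role name `service-role/AWSGlueServiceRole` are placeholders, and the helper name `pass_role_statement` is hypothetical. Roles in the China Regions use the `aws-cn` partition in their ARNs, so the sketch takes the partition as a parameter.

```python
import json


def pass_role_statement(account_id: str, role_name: str, partition: str = "aws") -> dict:
    """Return an IAM policy statement that allows iam:PassRole on one role.

    role_name may include a path, for example "service-role/AWSGlueServiceRole".
    Use partition="aws-cn" for roles in the China Regions.
    """
    return {
        "Effect": "Allow",
        "Action": "iam:PassRole",
        "Resource": f"arn:{partition}:iam::{account_id}:role/{role_name}",
    }


if __name__ == "__main__":
    # 111122223333 is a placeholder account ID; replace it with your own.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            pass_role_statement(
                "111122223333", "service-role/AWSGlueServiceRole", partition="aws-cn"
            )
        ],
    }
    print(json.dumps(policy, indent=2))
```

Scoping the Resource to one role ARN limits which roles the identity can pass; attach the statement to the identity that is named in the error message.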

Without the iam:PassRole permission, the following error occurs:

"errorCode": "AccessDenied",
"errorMessage": "User: arn:aws:sts::account_id:assumed-role/AWSGlueServiceRole is not authorized to perform: iam:PassRole on resource: arn:aws:iam::account_id:role/service-role/AWSGlueServiceRole because no identity-based policy allows the iam:PassRole action"

To create an Amazon Glue DataBrew recipe in Amazon Glue DataBrew:
  1. Author a recipe in Amazon Glue DataBrew. For more information, see Getting started with Amazon Glue DataBrew.

  2. Save your recipe.

  3. Publish your recipe. This publishes your recipe as version 1.0.

To import an Amazon Glue DataBrew recipe and use it in Amazon Glue Studio:

If you have an existing Data Preparation Recipe node and you want to edit the recipe steps directly in Amazon Glue Studio, you must import the recipe steps into your Amazon Glue Studio job.

  1. Start an Amazon Glue job in Amazon Glue Studio with a data source.

  2. Add the Data Preparation Recipe node to your data source.

  3. In the Transform panel, enter a name for your recipe.

  4. Select your recipe from the recipe drop-down and select your published recipe version. Then choose Import steps.

  5. After you import your Amazon Glue DataBrew recipe, you can edit it directly in Amazon Glue Studio, and the job no longer references the original DataBrew recipe. Choose Ok to close the message.

  6. The recipe steps are now part of your Amazon Glue job. On the Job details tab, make any necessary configuration changes, such as naming your job and adjusting the allocated capacity. Choose Save to save your job and recipe.

    Note

    The JOIN, UNION, GROUP_BY, PIVOT, UNPIVOT, and TRANSPOSE steps are not supported for recipe import and are not available in recipe authoring mode.

  7. Optionally, finish authoring the job by adding other transformation nodes as needed and one or more data target nodes.

    If you reorder steps after you import a recipe, Amazon Glue validates those steps. For example, if you renamed a column and then deleted it, and you then move the delete step above the rename step, the rename step becomes invalid because the column no longer exists when it runs. You can then edit the steps to fix the validation error.
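To make the note about unsupported steps and the reordering example concrete, the following Python sketch walks a list of steps in order, tracks which columns exist, and reports unsupported operations and steps that reference a missing column. It is an illustration only: steps are represented as hypothetical `(operation, parameters)` tuples with made-up parameter names `source`, `target`, and `columns`, not DataBrew's recipe JSON format, only RENAME and DELETE are tracked, and the validation that Amazon Glue performs may differ.

```python
from typing import Dict, List, Set, Tuple

# Operations that the note above lists as unsupported for recipe import.
UNSUPPORTED = {"JOIN", "UNION", "GROUP_BY", "PIVOT", "UNPIVOT", "TRANSPOSE"}

# Hypothetical simplified step: (operation, parameters).
Step = Tuple[str, Dict]


def validate(steps: List[Step], columns: Set[str]) -> List[str]:
    """Return one message per invalid step, numbering steps from 1."""
    errors: List[str] = []
    cols = set(columns)  # copy so the caller's set is not modified
    for i, (op, params) in enumerate(steps, start=1):
        if op in UNSUPPORTED:
            errors.append(f"step {i}: {op} is not supported")
        elif op == "RENAME":
            src, dst = params["source"], params["target"]
            if src not in cols:
                errors.append(f"step {i}: RENAME references missing column {src!r}")
            else:
                cols.remove(src)
                cols.add(dst)
        elif op == "DELETE":
            for col in params["columns"]:
                if col not in cols:
                    errors.append(f"step {i}: DELETE references missing column {col!r}")
                else:
                    cols.remove(col)
    return errors


if __name__ == "__main__":
    # Moving the delete above the rename leaves the rename without its column.
    reordered = [("DELETE", {"columns": ["a"]}), ("RENAME", {"source": "a", "target": "b"})]
    print(validate(reordered, {"a"}))
```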

To change the schema if the data source is Amazon S3 and the data format is CSV:

If Amazon Glue Studio initially loads all the columns in a CSV file as the string data type, make sure that each column's data type is compatible with the steps in the Amazon Glue DataBrew recipe.

Amazon Glue DataBrew recipes only prescribe how to transform data that has already been read. They don't describe where or how to read data.

  1. Add a Change Schema node before the Data Preparation Recipe node.

  2. Choose the Change Schema node and, in the Transform panel, select a new data type for each column as needed so that the schema matches the column data types in Amazon Glue DataBrew.

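Conceptually, the Change Schema node casts columns that were read as strings to the types the recipe expects. The following plain-Python sketch illustrates that effect on a single row; it does not use any Amazon Glue API, and the type names in `CONVERTERS` and the helper `apply_schema` are hypothetical.

```python
from typing import Callable, Dict

# Illustrative converters for a few type names (hypothetical, not a Glue API).
CONVERTERS: Dict[str, Callable[[str], object]] = {
    "string": str,
    "int": int,
    "double": float,
}


def apply_schema(row: Dict[str, str], schema: Dict[str, str]) -> Dict[str, object]:
    """Cast the string values of one CSV row to the types in `schema`.

    Columns that `schema` does not mention stay strings, mirroring a
    Change Schema node that leaves those columns unchanged.
    """
    return {col: CONVERTERS[schema.get(col, "string")](value) for col, value in row.items()}


if __name__ == "__main__":
    row = {"id": "7", "price": "2.5", "name": "widget"}
    typed = apply_schema(row, {"id": "int", "price": "double"})
    print(typed)
    # Arithmetic works on the cast value; on the raw string, "2.5" * 2 would repeat the text.
    print(typed["price"] * 2)
```

A numeric recipe step applied to a column that is still a string can fail or behave differently, which is why the types should match before the recipe node runs.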

To change the schema if the data source is headerless:

Amazon Glue DataBrew recipes only prescribe how to transform data that has already been read. They don't describe where or how to read data.

When you load headerless datasets in Amazon Glue Studio, the default column names differ from the ones that Amazon Glue DataBrew assigns.

  1. In the ETL job, add a Change Schema node before the Data Preparation Recipe node.

  2. Choose the Change Schema node and change the column names to the same names used in the Amazon Glue DataBrew recipe.
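When a headerless dataset has many columns, it can help to generate the renaming list rather than type it by hand. The following Python sketch builds a mapping from Amazon Glue Studio default names to Amazon Glue DataBrew default names. It assumes that Amazon Glue Studio names headerless columns `col0`, `col1`, … and that DataBrew names them `Column_1`, `Column_2`, …; these defaults are assumptions, so check the column names each service shows for your dataset and adjust the prefix and start parameters. The helper name `default_name_mapping` is hypothetical.

```python
from typing import Dict


def default_name_mapping(
    num_columns: int,
    glue_prefix: str = "col",
    glue_start: int = 0,
    databrew_prefix: str = "Column_",
    databrew_start: int = 1,
) -> Dict[str, str]:
    """Map assumed Glue Studio default column names to assumed DataBrew ones.

    The default prefixes and start indexes are assumptions; verify them
    against the column names that each service shows for your dataset.
    """
    return {
        f"{glue_prefix}{glue_start + i}": f"{databrew_prefix}{databrew_start + i}"
        for i in range(num_columns)
    }


if __name__ == "__main__":
    for old, new in default_name_mapping(3).items():
        print(f"{old} -> {new}")
```

You can then enter each pair in the Change Schema node.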