ONE_HOT_ENCODING - Amazon Glue DataBrew

ONE_HOT_ENCODING

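Creates n numerical columns, where n is the number of unique values in a selected categorical variable. In each row, the new column that matches the row's value contains 1, and the other new columns contain 0.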

For example, consider a column named shirt_size. Shirts are available in small, medium, large, or extra large. The column data might look like the following.

shirt_size
-----------
L
XL
M
S
M
M
S
XL
M
L
XL
M

In this scenario, there are four distinct values for shirt_size. Therefore, ONE_HOT_ENCODING generates four new columns. Each new column is named shirt_size_x, where x represents a distinct shirt_size value.

The original shirt_size column and the four generated columns look like this.

shirt_size    shirt_size_S  shirt_size_M  shirt_size_L  shirt_size_XL
------------  ------------  ------------  ------------  -------------
L             0             0             1             0
XL            0             0             0             1
M             0             1             0             0
S             1             0             0             0
M             0             1             0             0
M             0             1             0             0
S             1             0             0             0
XL            0             0             0             1
M             0             1             0             0
L             0             0             1             0
XL            0             0             0             1
M             0             1             0             0
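
The same mapping can be reproduced locally to check expectations before running a recipe. The following Python sketch is an illustration only, not DataBrew's implementation. The helper name one_hot_encode and its categories argument are assumptions made for this example; when categories is omitted, the sketch orders the new columns by first appearance, which may differ from the order DataBrew produces.

```python
def one_hot_encode(values, column_name, categories=None):
    """Return {new_column_name: [0 or 1, ...]} with one entry per distinct value.

    Hypothetical helper for illustration; not part of DataBrew.
    """
    if categories is None:
        # dict.fromkeys drops duplicates while keeping first-appearance order.
        categories = list(dict.fromkeys(values))
    return {
        f"{column_name}_{category}": [1 if v == category else 0 for v in values]
        for category in categories
    }


shirt_size = ["L", "XL", "M", "S", "M", "M", "S", "XL", "M", "L", "XL", "M"]

# Pass the categories explicitly to get the column order used in the table.
encoded = one_hot_encode(shirt_size, "shirt_size", categories=["S", "M", "L", "XL"])

for name, column in encoded.items():
    print(name, column)
```

Each row has exactly one 1 across the generated columns, which is the defining property of one-hot encoding.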

The column that you specify for ONE_HOT_ENCODING can have a maximum of ten (10) distinct values.

Parameters
  • sourceColumn – The name of an existing column. The column can have a maximum of 10 distinct values.

Example

{
    "RecipeAction": {
        "Operation": "ONE_HOT_ENCODING",
        "Parameters": {
            "sourceColumn": "shirt_size"
        }
    }
}
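
Because the source column can have at most 10 distinct values, it may help to check a sample of the data before adding this step to a recipe. The following Python sketch counts distinct values and then builds the same RecipeAction structure shown above. The helper name build_one_hot_action and the constant MAX_DISTINCT_VALUES are hypothetical, and the check covers only the values passed in, not the full dataset.

```python
import json

MAX_DISTINCT_VALUES = 10  # limit stated for ONE_HOT_ENCODING in this reference


def build_one_hot_action(source_column, sample_values):
    """Build the recipe action as a dict after a local distinct-value check.

    Hypothetical helper for illustration; not part of any AWS SDK.
    """
    distinct = set(sample_values)
    if len(distinct) > MAX_DISTINCT_VALUES:
        raise ValueError(
            f"{source_column} has {len(distinct)} distinct values; "
            f"the maximum is {MAX_DISTINCT_VALUES}"
        )
    return {
        "RecipeAction": {
            "Operation": "ONE_HOT_ENCODING",
            "Parameters": {"sourceColumn": source_column},
        }
    }


action = build_one_hot_action("shirt_size", ["L", "XL", "M", "S", "M"])
print(json.dumps(action, indent=4))
```

Failing early on the local check avoids submitting a step that the column cannot satisfy.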