BUCKETIZATION - Amazon Glue DataBrew

BUCKETIZATION

Bucketization (called Binning in the console) takes a column of numeric values, groups the values into bins defined by numeric ranges, and outputs a new column that shows the bin for each row. Bins can be defined either by splits or by a percentage. The first example below uses splits and the second example uses a percentage.

Parameters
  • sourceColumn – The name of an existing column.

  • targetColumn – The name of the new column to be created.

  • bucketNames – List of bucket names.

  • splits – List of bucket levels. Buckets are consecutive: the upper bound of one bucket is the lower bound of the next bucket.

  • percentage – Each bucket will be described as a percentage.
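In the examples on this page, bucketNames and splits are passed as strings that contain a JSON-encoded array, not as native JSON arrays. The following is a minimal Python sketch of one way to assemble such a step. The helper names (encode_list, format_bound, bucketization_step) are hypothetical and not part of any AWS SDK, and the check that there is one more split than there are bucket names is an inference from the "consecutive buckets" rule, not a documented requirement.

```python
import json
import math


def encode_list(values):
    """Serialize a list as a compact JSON array string, e.g. '["a","b"]'."""
    return json.dumps(values, separators=(",", ":"))


def format_bound(bound):
    """Render a numeric split as a string; infinities use the spelling from the examples."""
    if math.isinf(bound):
        return "Infinity" if bound > 0 else "-Infinity"
    # Write whole numbers without a trailing ".0" (2.0 becomes "2").
    return str(int(bound)) if float(bound).is_integer() else str(bound)


def bucketization_step(source, target, names, splits):
    """Build a BUCKETIZATION step as a dict (hypothetical helper)."""
    # Assumption: n consecutive buckets are described by n + 1 boundaries.
    if len(splits) != len(names) + 1:
        raise ValueError("expected len(splits) == len(names) + 1")
    return {
        "Action": {
            "Operation": "BUCKETIZATION",
            "Parameters": {
                "sourceColumn": source,
                "targetColumn": target,
                "bucketNames": encode_list(list(names)),
                "splits": encode_list([format_bound(b) for b in splits]),
            },
        }
    }


step = bucketization_step(
    "level", "bin", ["Bin1", "Bin2", "Bin3"], [-math.inf, 2, 20, math.inf]
)
print(json.dumps(step, indent=2))
```

Encoding the lists with json.dumps avoids hand-escaping the inner quotation marks when the step is later written out as JSON.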

Example using splits

{
    "Action": {
        "Operation": "BUCKETIZATION",
        "Parameters": {
            "sourceColumn": "level",
            "targetColumn": "bin",
            "bucketNames": "[\"Bin1\",\"Bin2\",\"Bin3\"]",
            "splits": "[\"-Infinity\",\"2\",\"20\",\"Infinity\"]"
        }
    }
}
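To illustrate how consecutive splits partition a column, here is a rough Python sketch that maps values to the bucket names from the splits example. It is a local illustration only, not DataBrew's implementation. The function name assign_bucket is hypothetical, and the treatment of values that sit exactly on a boundary (placed in the bucket whose lower bound they equal) is an assumption, because this page does not say which side of a boundary is inclusive.

```python
import bisect
import math


def assign_bucket(value, splits, names):
    """Return the bucket name for value, or None if it lies outside all splits."""
    if value < splits[0] or value > splits[-1]:
        return None
    # bisect_right counts the boundaries that are <= value, so subtracting 1
    # gives the index of the bucket whose lower bound is at or below value.
    index = bisect.bisect_right(splits, value) - 1
    # A value equal to the final boundary is kept in the last bucket.
    return names[min(index, len(names) - 1)]


splits = [-math.inf, 2, 20, math.inf]
names = ["Bin1", "Bin2", "Bin3"]

levels = [1, 2, 19.9, 20]
bins = [assign_bucket(v, splits, names) for v in levels]
print(bins)  # ['Bin1', 'Bin2', 'Bin2', 'Bin3']
```

Under this assumption, values below 2 land in Bin1, values from 2 up to but not including 20 land in Bin2, and values of 20 or more land in Bin3.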

Example using a percentage

{
    "Action": {
        "Operation": "BUCKETIZATION",
        "Parameters": {
            "sourceColumn": "level",
            "targetColumn": "bin",
            "bucketNames": "[\"Bin1\",\"Bin2\"]",
            "percentage": "50"
        }
    }
}