StandardDeviation - Amazon Glue

StandardDeviation

Checks the standard deviation of all of the values in a column against a given expression.

Syntax

StandardDeviation <COL_NAME> <EXPRESSION>
  • COL_NAME – The name of the column that you want to evaluate the data quality rule against.

    Supported column types: Byte, Decimal, Double, Float, Integer, Long, Short

  • EXPRESSION – An expression to run against the rule type response in order to produce a Boolean value. For more information, see Expressions.

Example: Standard deviation

The following example rules check whether the standard deviation of the values in a column is less than a specified value. The second rule uses a where clause to restrict the check to rows in which Customer_ID is less than 10.

StandardDeviation "Star_Rating" < 1.5
StandardDeviation "Salary" < 3500 where "Customer_ID < 10"
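
As a rough illustration of the rule that carries a where clause, the Python sketch below filters the rows first and then checks the standard deviation of what remains. It assumes that the where clause restricts the rows before the statistic is computed and that a population standard deviation is used; the sample rows and the function name are made up for illustration and are not part of Amazon Glue.

```python
from statistics import pstdev

# Hypothetical sample data, invented for this sketch.
rows = [
    {"Customer_ID": 1, "Salary": 52000},
    {"Customer_ID": 5, "Salary": 55000},
    {"Customer_ID": 9, "Salary": 58000},
    {"Customer_ID": 42, "Salary": 120000},  # excluded by the filter
]

def salary_rule_passes(rows, threshold=3500):
    # where "Customer_ID < 10": keep only the matching rows
    filtered = [r["Salary"] for r in rows if r["Customer_ID"] < 10]
    # StandardDeviation "Salary" < 3500 (population form is an assumption)
    return pstdev(filtered) < threshold

print(salary_rule_passes(rows))  # True
```

Without the filter, the outlying salary of customer 42 would push the standard deviation well above the threshold, which is why the order of filtering and aggregation matters.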

Sample dynamic rules

  • StandardDeviation "colA" > avg(last(10)) + 0.1

  • StandardDeviation "colA" between min(last(10)) - 1 and max(last(10)) + 1
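
To make the dynamic expressions more concrete, the sketch below evaluates both sample rules in plain Python. It assumes that last(10) refers to the StandardDeviation values recorded by up to ten previous runs and that between excludes its endpoints; the history list and the function names are hypothetical and not part of Amazon Glue.

```python
from statistics import mean

def last(history, k):
    """Return up to the k most recent values, oldest first."""
    return history[-k:]

def rule_above_average(current, history):
    # StandardDeviation "colA" > avg(last(10)) + 0.1
    return current > mean(last(history, 10)) + 0.1

def rule_within_range(current, history):
    # StandardDeviation "colA" between min(last(10)) - 1 and max(last(10)) + 1
    # Exclusive bounds are an assumption of this sketch.
    window = last(history, 10)
    return min(window) - 1 < current < max(window) + 1

history = [4.0, 4.2, 3.9, 4.1, 4.0]  # made-up values from earlier runs

print(rule_above_average(4.5, history))  # True
print(rule_within_range(4.5, history))   # True
print(rule_within_range(5.5, history))   # False
```

The thresholds move with the recorded history, so the same rule text can pass or fail as earlier results accumulate.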

Null behavior

The StandardDeviation rule ignores rows with NULL values when it calculates the standard deviation. For example:

+---+-----------+-----------+
|id |units1     |units2     |
+---+-----------+-----------+
|100|0          |0          |
|101|null       |0          |
|102|20         |20         |
|103|null       |0          |
|104|40         |40         |
+---+-----------+-----------+

The standard deviation of column units1 excludes rows 101 and 103 and evaluates to 16.33. The standard deviation of column units2 evaluates to 16.
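
These figures can be reproduced with a short Python sketch. It assumes that the rule computes a population standard deviation (dividing by n rather than n - 1), which is what the quoted values of 16.33 and 16 are consistent with; the function name is hypothetical and not part of Amazon Glue.

```python
import statistics

def std_ignoring_nulls(values):
    """Population standard deviation over the non-null entries only."""
    present = [v for v in values if v is not None]
    return statistics.pstdev(present)

# Column values from the table, with None standing in for null.
units1 = [0, None, 20, None, 40]
units2 = [0, 0, 20, 0, 40]

print(round(std_ignoring_nulls(units1), 2))  # 16.33
print(round(std_ignoring_nulls(units2), 2))  # 16.0
```

A sample standard deviation over the same three non-null units1 values would give 20 instead of 16.33, so the distinction matters when you choose thresholds.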