# ColumnCorrelation

Checks the *correlation* between two columns against a
given expression. Amazon Glue Data Quality uses the Pearson correlation coefficient to measure the linear
correlation between two columns. The result is a number between -1 and 1 that measures the
strength and direction of the relationship.
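
As a rough illustration of what the coefficient measures, the following is a minimal Python sketch of the Pearson formula. It is not the service's implementation; the `pearson` helper and the sample `heights` and `weights` values are hypothetical and chosen only for demonstration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance numerator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each column
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample data: weight rises steadily with height
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 77, 90]
r = pearson(heights, weights)
print(round(r, 3))
```

A value close to 1 indicates a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 little or no linear relationship.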

**Syntax**

`ColumnCorrelation <COL_1_NAME> <COL_2_NAME> <EXPRESSION>`

- **COL_1_NAME** – The name of the first column that you want to evaluate the data quality rule against.

  **Supported column types**: Byte, Decimal, Double, Float, Integer, Long, Short

- **COL_2_NAME** – The name of the second column that you want to evaluate the data quality rule against.

  **Supported column types**: Byte, Decimal, Double, Float, Integer, Long, Short

- **EXPRESSION** – An expression to run against the rule type response in order to produce a Boolean value. For more information, see Expressions.

**Example: Column correlation**

The following example rule checks whether the columns `height` and `weight` have a strong positive correlation (a coefficient value greater than 0.8).

`ColumnCorrelation "height" "weight" > 0.8`

The following example rule applies the same threshold, but evaluates only the rows that satisfy the `where` condition (`weightinkgs > 40`).

`ColumnCorrelation "weightinkgs" "Salary" > 0.8 where "weightinkgs > 40"`

**Sample dynamic rules**

`ColumnCorrelation "colA" "colB" between min(last(10)) and max(last(10))`

`ColumnCorrelation "colA" "colB" < avg(last(5)) + std(last(5))`

**Null behavior**

The `ColumnCorrelation` rule ignores rows with `NULL` values when it calculates the correlation. For example:

| id | units |
|-----|-------|
| 100 | 0 |
| 101 | null |
| 102 | 20 |
| 103 | null |
| 104 | 40 |

Rows 101 and 103 are ignored, and the `ColumnCorrelation` between `id` and `units` is 1.0.
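
The null handling described above can be reproduced with a short Python sketch. This is an illustration under stated assumptions, not the service's code: the `correlation_ignoring_nulls` helper is hypothetical, and `None` stands in for `NULL`.

```python
import math

# The sample data from this section as (id, units) pairs; None represents NULL
rows = [(100, 0), (101, None), (102, 20), (103, None), (104, 40)]


def correlation_ignoring_nulls(rows):
    """Pearson correlation over the rows where both values are present."""
    # Drop any row in which either column is NULL
    pairs = [(x, y) for x, y in rows if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least 2 complete rows")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


result = correlation_ignoring_nulls(rows)
print(result)
```

Only rows 100, 102, and 104 remain after filtering. In those rows `units` increases by 20 each time `id` increases by 2, so the relationship is perfectly linear.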