ColumnDataType - Amazon Glue

ColumnDataType

Checks the inherent data type of the values in a given column against the expected type that you provide. Accepts an optional "with threshold" expression, so that the rule can pass when only a subset of the values in the column match the expected type.

Syntax

ColumnDataType <COL_NAME> = <EXPECTED_TYPE>
ColumnDataType <COL_NAME> = <EXPECTED_TYPE> with threshold <EXPRESSION>

  • COL_NAME – The name of the column that you want to evaluate the data quality rule against.

    Supported column types: String type

  • EXPECTED_TYPE – The expected type of the values in the column.

    Supported values: Boolean, Date, Timestamp, Integer, Double, Float, Long

  • EXPRESSION – An optional expression to specify the percentage of values that should be of the expected type.

Example: Checking for integers stored as strings

The following example rule checks whether the values in the given column, which is of type string, are actually integers.

ColumnDataType "colA" = "INTEGER"
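
To make the idea of an "inherent" type more concrete, the following Python sketch shows one way such a check could be reasoned about outside of Glue. It is an illustration only, not how Glue implements the rule: the helper names (is_integer_string, column_is_integer) are hypothetical, and the definition of an integer string (an optional sign followed by ASCII digits) is an assumption.

```python
import re

# Assumption: an "integer" string is an optional sign followed by ASCII digits.
# Glue's own parsing rules may differ.
_INTEGER_PATTERN = re.compile(r"[+-]?[0-9]+")


def is_integer_string(value):
    """Return True if value is a string that reads as an integer."""
    return isinstance(value, str) and _INTEGER_PATTERN.fullmatch(value) is not None


def column_is_integer(values):
    """Return True only if every value in the column reads as an integer."""
    return all(is_integer_string(v) for v in values)


col_a = ["1", "42", "-7"]          # every value reads as an integer
col_b = ["1", "forty-two", "3.5"]  # two values do not

print(column_is_integer(col_a))
print(column_is_integer(col_b))
```

In this sketch a single non-integer value is enough to fail the check, which mirrors the rule above when no threshold is given.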

Example: Checking for integers stored as strings in a subset of the values

The following example rule checks whether more than 90% of the values in the given column, which is of type string, are actually integers.

ColumnDataType "colA" = "INTEGER" with threshold > 0.9
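
The threshold comparison can be pictured as computing the fraction of values that match the expected type and comparing it with the expression. The Python sketch below illustrates this under stated assumptions: the function names are hypothetical, an integer string is assumed to be an optional sign followed by ASCII digits, and non-string or missing values are assumed to count as non-matching. Glue's actual behavior may differ on each of these points.

```python
import re

# Assumption: an "integer" string is an optional sign followed by ASCII digits.
INTEGER_PATTERN = re.compile(r"[+-]?[0-9]+")


def integer_fraction(values):
    """Return the fraction of values that are strings reading as integers."""
    if not values:
        return 0.0  # assumption: an empty column has no matching values
    matches = sum(
        1 for v in values
        if isinstance(v, str) and INTEGER_PATTERN.fullmatch(v) is not None
    )
    return matches / len(values)


def passes_threshold(values, minimum=0.9):
    """Mimic 'with threshold > 0.9': the fraction must be strictly greater."""
    return integer_fraction(values) > minimum


# 19 of 20 values are integers, so the fraction is 0.95.
mostly_integers = [str(n) for n in range(19)] + ["n/a"]
# 9 of 10 values are integers, so the fraction is exactly 0.9.
borderline = [str(n) for n in range(9)] + ["n/a"]

print(passes_threshold(mostly_integers))
print(passes_threshold(borderline))
```

Because the comparison is strict, a column in which exactly 90% of the values are integers does not pass in this sketch; only a fraction above 0.9 does.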