DQDL rule type reference
This section provides a reference for each rule type that Amazon Glue Data Quality supports.
Note
DQDL doesn't currently support nested or list-type column data.
Bracketed values in the table below are replaced with the information provided in the rule arguments.
Rules typically require an additional argument for the expression.
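To make the bracketed-value convention concrete, the following Python sketch expands a metric name template from the table using rule arguments. It is only an illustration of the naming pattern, not how the service implements it: the helper `resolve_metric_name` and the column names `price`, `height`, and `weight` are hypothetical, and the comma-joined form for multi-column placeholders is an assumption based on the template shown in the table.

```python
import re


def resolve_metric_name(template, arguments):
    """Replace each [Placeholder] in a metric template with its argument value.

    Hypothetical helper for illustration; the service performs this
    substitution itself when it reports metrics.
    """
    def substitute(match):
        # A placeholder may list several names, e.g. [Column1,Column2].
        names = [name.strip() for name in match.group(1).split(",")]
        return ",".join(arguments[name] for name in names)

    return re.sub(r"\[([^\]]+)\]", substitute, template)


# Single-column template from the table, with an assumed column name.
print(resolve_metric_name("Column.[Column].Mean", {"Column": "price"}))

# Two-column template from the table, with assumed column names.
print(resolve_metric_name(
    "Multicolumn.[Column1,Column2].ColumnCorrelation",
    {"Column1": "height", "Column2": "weight"},
))

# Templates without brackets, such as dataset-level metrics, are unchanged.
print(resolve_metric_name("Dataset.*.RowCount", {}))
```

Templates that contain no bracketed values pass through unchanged, which matches dataset-level entries such as `Dataset.*.RowCount`.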
| Rule type | Description | Arguments | Reported metrics | Supported as rule? | Supported as analyzer? | Returns row-level results? | Dynamic rule support? | Generates observations? | Supports where clause syntax? |
|---|---|---|---|---|---|---|---|---|---|
| AggregateMatch | Checks if two datasets match by comparing summary metrics like total sales amount. Useful for financial institutions to compare if all data is ingested from source systems. | One or more aggregations | When first and second aggregation column names match:<br>When first and second aggregation column names are different: | Yes | No | No | No | No | No |
| AllStatistics | Standalone analyzer to gather multiple metrics for the provided column in a dataset. | A single column name | For columns of all types:<br>Additional metrics for string-valued columns:<br>Additional metrics for numeric-valued columns: | No | Yes | No | No | No | No |
| ColumnCorrelation | Checks how well two columns are correlated. | Exactly two column names | Multicolumn.[Column1,Column2].ColumnCorrelation | Yes | Yes | No | Yes | No | Yes |
| ColumnCount | Checks if any columns are dropped. | None | Dataset.*.ColumnCount | Yes | Yes | No | Yes | Yes | No |
| ColumnDataType | Checks if a column is compliant with a datatype. | Exactly one column name | Column.[Column].ColumnDataType.Compliance | Yes | No | No | Yes, in row-level threshold expression | No | Yes |
| ColumnExists | Checks if columns exist in a dataset. This allows customers building self service data platforms to ensure certain columns are made available. | Exactly one column name | N/A | Yes | No | No | No | No | No |
| ColumnLength | Checks if length of data is consistent. | Exactly one column name | Additional metric when row-level threshold provided: | Yes | Yes | Yes, when row-level threshold provided | No | Yes. Only generates observations by analyzing Minimum and Maximum length | Yes |
| ColumnNamesMatchPattern | Checks if column names match defined patterns. Useful for governance teams to enforce column name consistency. | A regex for column names | Dataset.*.ColumnNamesPatternMatchRatio | Yes | No | No | No | No | No |
| ColumnValues | Checks if data is consistent per defined values. This rule supports regular expressions. | Exactly one column name | Additional metric when row-level threshold provided: | Yes | Yes | Yes, when row-level threshold provided | No | Yes. Only generates observations by analyzing Minimum and Maximum values | Yes |
| Completeness | Checks for any blank or NULL values in data. | Exactly one column name | | Yes | Yes | Yes | Yes | Yes | Yes |
| CustomSql | Customers can implement almost any type of data quality check in SQL. | A SQL statement<br>(Optional) A row-level threshold | Additional metric when row-level threshold provided: | Yes | No | Yes, when row-level threshold provided | Yes | No | No |
| DataFreshness | Checks if data is fresh. | Exactly one column name | Column.[Column].DataFreshness.Compliance | Yes | No | Yes | No | No | Yes |
| DatasetMatch | Compares two datasets and identifies if they are in sync. | Name of a reference dataset<br>A column mapping<br>(Optional) Columns to check for matches | Dataset.[ReferenceDatasetAlias].DatasetMatch | Yes | No | Yes | Yes | No | No |
| DistinctValuesCount | Checks for duplicate values. | Exactly one column name | Column.[Column].DistinctValuesCount | Yes | Yes | Yes | Yes | Yes | Yes |
| DetectAnomalies | Checks for anomalies in another rule type's reported metrics. | A rule type | Metric(s) reported by the rule type argument | Yes | No | No | No | No | No |
| Entropy | Checks for entropy of the data. | Exactly one column name | Column.[Column].Entropy | Yes | Yes | No | Yes | No | Yes |
| IsComplete | Checks if 100% of the data is complete. | Exactly one column name | Column.[Column].Completeness | Yes | No | Yes | No | No | Yes |
| IsPrimaryKey | Checks if a column is a primary key (not NULL and unique). | Exactly one column name | For single column:<br>For multiple columns: | Yes | No | Yes | No | No | Yes |
| IsUnique | Checks if 100% of the data is unique. | Exactly one column name | Column.[Column].Uniqueness | Yes | No | Yes | No | No | Yes |
| Mean | Checks if the mean matches the set threshold. | Exactly one column name | Column.[Column].Mean | Yes | Yes | Yes | Yes | No | Yes |
| ReferentialIntegrity | Checks if two datasets have referential integrity. | One or more column names from dataset<br>One or more column names from reference dataset | Column.[ReferenceDatasetAlias].ReferentialIntegrity | Yes | No | Yes | Yes | No | No |
| RowCount | Checks if record counts match a threshold. | None | Dataset.*.RowCount | Yes | Yes | No | Yes | Yes | Yes |
| RowCountMatch | Checks if record counts between two datasets match. | Reference dataset alias | Dataset.[ReferenceDatasetAlias].RowCountMatch | Yes | No | No | Yes | No | No |
| StandardDeviation | Checks if standard deviation matches the threshold. | Exactly one column name | Column.[Column].StandardDeviation | Yes | Yes | Yes | Yes | No | Yes |
| SchemaMatch | Checks if the schemas of two datasets match. | Reference dataset alias | Dataset.[ReferenceDatasetAlias].SchemaMatch | Yes | No | No | Yes | No | No |
| Sum | Checks if sum matches a set threshold. | Exactly one column name | Column.[Column].Sum | Yes | Yes | No | Yes | No | Yes |
| Uniqueness | Checks if uniqueness of dataset matches threshold. | Exactly one column name | Column.[Column].Uniqueness | Yes | Yes | Yes | Yes | No | Yes |
| UniqueValueRatio | Checks if the unique value ratio matches threshold. | Exactly one column name | Column.[Column].UniqueValueRatio | Yes | Yes | Yes | Yes | No | Yes |
| FileFreshness | Checks if files in Amazon S3 are fresh. | File or folder path and a threshold. | | Yes | No | No | No | No | No |
| FileMatch | Checks if the contents of a file match a checksum or another file. This rule uses checksums to validate that two files are the same. | Source file or folder path and target file or folder path. | No statistics are generated. | Yes | No | No | No | No | No |
| FileSize | Checks if the size of a file matches a specified condition. | File or folder path and threshold. | | Yes | No | No | No | No | No |
| FileUniqueness | Checks if files are unique using checksums. | File or folder path and threshold. | | Yes | No | No | No | No | No |