Data Quality rule builder - Amazon Glue Studio

Data Quality rule builder

The DQDL rule builder lets you create data quality rules that evaluate your data. Start by selecting a rule type, then specify its parameters in the rule editor. The rule editor also shows any errors and warnings as you create rules.

The DQDL guide provides comprehensive documentation on how to construct rules using the DQDL syntax, built-in rule types and examples.


The screenshot shows the DQDL rule builder with a rule type selected. The ColumnCorrelationRules rule type is visible in the rule editor.

Evaluate Data Quality node

When working with the Evaluate Data Quality transform node and the DQDL rule builder, you can expand the working space.

  • To expand the Transform tab to fill the entire screen, choose the expand icon in the upper-right corner of the node details panel.

  • To expand the DQDL rule editor, choose the << icon. This also collapses the Rule types and Schema tabs.


The screenshot shows the DQDL rule builder with a rule type selected. The ColumnCorrelationRules rule type is visible in the rule editor.

Components

Amazon Glue Studio has 18 built-in rule types. Each rule type has a description and examples of how it can be used.

Data quality rule types

Amazon Glue Studio provides built-in rule types for ease of creating a rule. For more information on rule types, see DQDL rule type reference.

Schema

The Schema tab displays the column names and data types from the parent node. You can view the input schema, search by column name, and insert a column into the rule editor.


The screenshot shows the rule editor with a complete rule using the Completeness rule type.

Rule editor

The rule editor is a text editor where you can write and edit rules. If you select a rule type from the DQDL rule builder, the rule type is added to the rule editor. You can then specify parameters, add rules and edit rules as needed by modifying the text. Amazon Glue Studio validates the rules in the rule editor and displays errors and warnings if present.
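As an illustration of the kind of text the rule editor holds, the following Python sketch assembles a small DQDL ruleset string from individual rules. The helper `build_ruleset` and the column names (`order_id`, `email`) are hypothetical and used only for illustration. The DQDL guide remains the authority on the exact syntax of each rule type.

```python
def build_ruleset(rules):
    """Join individual DQDL rule strings into a single ruleset definition."""
    body = ",\n".join(f"    {rule}" for rule in rules)
    return f"Rules = [\n{body}\n]"


# Hypothetical column names, used only for illustration.
rules = [
    'IsComplete "order_id"',
    'Completeness "email" > 0.95',
]

ruleset = build_ruleset(rules)
print(ruleset)
```

The printed text has the same shape as what you would type or paste into the rule editor: one rule per line, separated by commas, inside `Rules = [ ... ]`.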

Errors and warnings

If a rule does not follow the DQDL rule syntax, the rule editor will show several visual indicators that there is an error:

  • The rule editor displays an error icon and red color on the line with the error.

  • The rule editor displays the number of errors next to the red error icon.

  • When you click on the line with the error, a description of the error and location (line and column) are displayed at the bottom of the rule editor.


The screenshot shows the DQDL rule editor with error indicators on line 1 and at the bottom of the rule editor with the number of errors. Beneath this is the description of the error.

Data quality actions

Actions let you publish results to CloudWatch or stop jobs based on specific criteria. Actions are only available after you create a rule. Choose from the following actions:

  • Publish results to CloudWatch – when you run a job, the data quality results are added to CloudWatch.

  • Fail job when data quality fails – if data quality rules fail, the job also fails. By default, this action is not selected, and the job completes its run even if data quality rules fail.
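The difference between the default behavior and the "Fail job when data quality fails" action can be pictured with a short Python sketch. This is a conceptual model only, not how Amazon Glue Studio implements the action. The names `DataQualityError`, `finish_job`, and `fail_on_failure`, along with the sample rules, are assumptions made for illustration.

```python
class DataQualityError(Exception):
    """Raised when the job is configured to fail on data quality failures."""


def finish_job(rule_outcomes, fail_on_failure=False):
    """Return the failed rules, or raise if fail_on_failure is set.

    rule_outcomes maps each rule (as text) to True (passed) or False (failed).
    """
    failed = [rule for rule, passed in rule_outcomes.items() if not passed]
    if failed and fail_on_failure:
        raise DataQualityError(f"{len(failed)} data quality rule(s) failed: {failed}")
    return failed


# Hypothetical outcomes for two rules.
outcomes = {
    'IsComplete "order_id"': True,
    'Completeness "email" > 0.95': False,
}

# Action not selected (the default): the job completes and failures are only reported.
print(finish_job(outcomes))

# Action selected: the same outcomes stop the job.
try:
    finish_job(outcomes, fail_on_failure=True)
except DataQualityError as err:
    print(f"Job failed: {err}")
```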

Data quality transform output

  • Original data – Outputs the original input data. This option is ideal if you want to stop the job when quality issues are detected.

  • Data quality metrics – Outputs the configured rules and their pass or fail status. This option is useful if you want to take a custom action.
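A custom action on the data quality metrics output might look like the following Python sketch, which picks out the rules that did not pass so that a downstream step can alert on them. The row layout is an assumption for illustration: the field names `Rule`, `Outcome`, and `FailureReason`, the value `Passed`, and the sample rows are hypothetical here, so check them against the actual output of your job.

```python
def failed_rules(metrics_rows):
    """Return (rule, reason) pairs for every rule whose outcome is not 'Passed'."""
    return [
        (row["Rule"], row.get("FailureReason", ""))
        for row in metrics_rows
        if row["Outcome"] != "Passed"
    ]


# Hypothetical metrics rows; field names and values are assumptions.
metrics_rows = [
    {"Rule": 'IsComplete "order_id"', "Outcome": "Passed"},
    {
        "Rule": 'Completeness "email" > 0.95',
        "Outcome": "Failed",
        "FailureReason": "Value: 0.91 does not meet the constraint requirement",
    },
]

# Custom action: emit one alert line per failed rule.
for rule, reason in failed_rules(metrics_rows):
    print(f"ALERT: {rule} failed ({reason})")
```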

Data quality output settings

Set the data quality result location by specifying an Amazon S3 location as the data quality output target.