Data Quality API - Amazon Glue

Data Quality API

The Data Quality API describes the data quality data types and includes the operations for creating, updating, and deleting data quality rulesets, and for managing evaluation runs, recommendation runs, and results.

Data types

DataSource structure

A data source (an Amazon Glue table) for which you want data quality results.

Fields
  • GlueTable – Required: A GlueTable object.

    An Amazon Glue table.

DataQualityRulesetListDetails structure

Describes a data quality ruleset returned by ListDataQualityRulesets.

Fields
  • Name – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the data quality ruleset.

  • Description – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

    A description of the data quality ruleset.

  • CreatedOn – Timestamp.

    The date and time the data quality ruleset was created.

  • LastModifiedOn – Timestamp.

    The date and time the data quality ruleset was last modified.

  • TargetTable – A DataQualityTargetTable object.

    An object representing an Amazon Glue table.

  • RecommendationRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    When a ruleset was created from a recommendation run, this run ID is generated to link the two together.

  • RuleCount – Number (integer).

    The number of rules in the ruleset.

DataQualityTargetTable structure

An object representing an Amazon Glue table.

Fields
  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the Amazon Glue table.

  • DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the database where the Amazon Glue table exists.

  • CatalogId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The catalog ID where the Amazon Glue table exists.
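As a minimal sketch, a DataQualityTargetTable can be expressed in Python as a plain dictionary. The helper name `make_target_table` is hypothetical, and the check only mirrors the 1 to 255 byte length constraint listed above; it does not validate the Single-line string pattern.

```python
def make_target_table(table_name, database_name, catalog_id=None):
    """Build a DataQualityTargetTable-shaped dict (hypothetical helper)."""
    target = {"TableName": table_name, "DatabaseName": database_name}
    if catalog_id is not None:
        target["CatalogId"] = catalog_id  # optional field
    # Enforce the documented length limit, measured in UTF-8 bytes.
    for key, value in target.items():
        size = len(value.encode("utf-8"))
        if not 1 <= size <= 255:
            raise ValueError(f"{key} must be 1 to 255 bytes long, got {size}")
    return target
```

The resulting dictionary could be passed wherever a TargetTable field is accepted, for example in a CreateDataQualityRuleset request.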

DataQualityRulesetEvaluationRunDescription structure

Describes the result of a data quality ruleset evaluation run.

Fields
  • RunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The unique run identifier associated with this run.

  • Status – UTF-8 string (valid values: STARTING | RUNNING | STOPPING | STOPPED | SUCCEEDED | FAILED | TIMEOUT).

    The status for this run.

  • StartedOn – Timestamp.

    The date and time when the run started.

  • DataSource – A DataSource object.

    The data source (an Amazon Glue table) associated with the run.

DataQualityRulesetEvaluationRunFilter structure

The filter criteria.

Fields
  • DataSource – Required: A DataSource object.

    Filter based on a data source (an Amazon Glue table) associated with the run.

  • StartedBefore – Timestamp.

    Filter results by runs that started before this time.

  • StartedAfter – Timestamp.

    Filter results by runs that started after this time.

DataQualityEvaluationRunAdditionalRunOptions structure

Additional run options you can specify for an evaluation run.

Fields
  • CloudWatchMetricsEnabled – Boolean.

    Whether or not to enable CloudWatch metrics.

  • ResultsS3Prefix – UTF-8 string.

    Prefix for Amazon S3 to store results.

  • CompositeRuleEvaluationMethod – UTF-8 string (valid values: COLUMN | ROW).

    Sets the evaluation method for composite rules in the ruleset to either ROW or COLUMN.

DataQualityRuleRecommendationRunDescription structure

Describes the result of a data quality rule recommendation run.

Fields
  • RunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The unique run identifier associated with this run.

  • Status – UTF-8 string (valid values: STARTING | RUNNING | STOPPING | STOPPED | SUCCEEDED | FAILED | TIMEOUT).

    The status for this run.

  • StartedOn – Timestamp.

    The date and time when this run started.

  • DataSource – A DataSource object.

    The data source (Amazon Glue table) associated with the recommendation run.

DataQualityRuleRecommendationRunFilter structure

A filter for listing data quality recommendation runs.

Fields
  • DataSource – Required: A DataSource object.

    Filter based on a specified data source (Amazon Glue table).

  • StartedBefore – Timestamp.

    Filter results by runs that started before the provided time.

  • StartedAfter – Timestamp.

    Filter results by runs that started after the provided time.

DataQualityResult structure

Describes a data quality result.

Fields
  • ResultId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    A unique result ID for the data quality result.

  • Score – Number (double), not more than 1.0.

    An aggregate data quality score. Represents the ratio of rules that passed to the total number of rules.

  • DataSource – A DataSource object.

    The table associated with the data quality result, if any.

  • RulesetName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the ruleset associated with the data quality result.

  • EvaluationContext – UTF-8 string.

    In an Amazon Glue Studio job, each node on the canvas is typically assigned a name, including data quality nodes. When a job contains multiple data quality nodes, the evaluationContext distinguishes between them.

  • StartedOn – Timestamp.

    The date and time when this data quality run started.

  • CompletedOn – Timestamp.

    The date and time when this data quality run completed.

  • JobName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The job name associated with the data quality result, if any.

  • JobRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The job run ID associated with the data quality result, if any.

  • RulesetEvaluationRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The unique run ID for the ruleset evaluation for this data quality result.

  • RuleResults – An array of DataQualityRuleResult objects, not more than 2000 structures.

    A list of DataQualityRuleResult objects representing the results for each rule.

  • AnalyzerResults – An array of DataQualityAnalyzerResult objects, not more than 2000 structures.

    A list of DataQualityAnalyzerResult objects representing the results for each analyzer.

  • Observations – An array of DataQualityObservation objects, not more than 50 structures.

    A list of DataQualityObservation objects representing the observations generated after evaluating the rules and analyzers.

DataQualityAnalyzerResult structure

Describes the result of the evaluation of a data quality analyzer.

Fields
  • Name – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the data quality analyzer.

  • Description – UTF-8 string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

    A description of the data quality analyzer.

  • EvaluationMessage – UTF-8 string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

    An evaluation message.

  • EvaluatedMetrics – A map array of key-value pairs.

    Each key is a UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Each value is a Number (double).

    A map of metrics associated with the evaluation of the analyzer.

DataQualityObservation structure

Describes the observation generated after evaluating the rules and analyzers.

Fields
  • Description – UTF-8 string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

    A description of the data quality observation.

  • MetricBasedObservation – A MetricBasedObservation object.

    An object of type MetricBasedObservation representing the observation that is based on evaluated data quality metrics.

MetricBasedObservation structure

Describes the metric based observation generated based on evaluated data quality metrics.

Fields
  • MetricName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the data quality metric used for generating the observation.

  • MetricValues – A DataQualityMetricValues object.

    An object of type DataQualityMetricValues representing the analysis of the data quality metric value.

  • NewRules – An array of UTF-8 strings.

    A list of new data quality rules generated as part of the observation based on the data quality metric value.

DataQualityMetricValues structure

Describes the data quality metric value according to the analysis of historical data.

Fields
  • ActualValue – Number (double).

    The actual value of the data quality metric.

  • ExpectedValue – Number (double).

    The expected value of the data quality metric according to the analysis of historical data.

  • LowerLimit – Number (double).

    The lower limit of the data quality metric value according to the analysis of historical data.

  • UpperLimit – Number (double).

    The upper limit of the data quality metric value according to the analysis of historical data.
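As an illustration of how these four fields relate, the sketch below checks whether the actual value falls inside the range derived from historical data. The function name `is_within_limits` is hypothetical, and treating both limits as inclusive is an assumption; this reference does not state how the service treats boundary values.

```python
def is_within_limits(metric_values):
    """Return True when ActualValue lies between LowerLimit and UpperLimit.

    Assumes a DataQualityMetricValues-shaped dict with all limits present;
    inclusive bounds are an assumption, not documented behavior.
    """
    actual = metric_values["ActualValue"]
    return metric_values["LowerLimit"] <= actual <= metric_values["UpperLimit"]
```

A value outside the range is the kind of deviation a MetricBasedObservation would be expected to describe.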

DataQualityRuleResult structure

Describes the result of the evaluation of a data quality rule.

Fields
  • Name – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the data quality rule.

  • Description – UTF-8 string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

    A description of the data quality rule.

  • EvaluationMessage – UTF-8 string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

    An evaluation message.

  • Result – UTF-8 string (valid values: PASS | FAIL | ERROR).

    The status of the rule evaluation: PASS, FAIL, or ERROR.

  • EvaluatedMetrics – A map array of key-value pairs.

    Each key is a UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Each value is a Number (double).

    A map of metrics associated with the evaluation of the rule.
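A list of DataQualityRuleResult objects can be summarized client-side, for example to see which rules did not pass. The sketch below is illustrative only: `summarize_rule_results` is a hypothetical helper, and its pass ratio counts only PASS results. Whether the service-side Score treats ERROR the same way is not stated in this reference.

```python
from collections import Counter


def summarize_rule_results(rule_results):
    """Summarize DataQualityRuleResult-shaped dicts (hypothetical helper)."""
    counts = Counter(result["Result"] for result in rule_results)
    total = len(rule_results)
    # Ratio of passed rules to all rules; None when there are no rules.
    pass_ratio = counts["PASS"] / total if total else None
    not_passed = [r["Name"] for r in rule_results if r["Result"] != "PASS"]
    return {"counts": dict(counts), "pass_ratio": pass_ratio, "not_passed": not_passed}
```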

DataQualityResultDescription structure

Describes a data quality result.

Fields
  • ResultId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The unique result ID for this data quality result.

  • DataSource – A DataSource object.

    The table associated with the data quality result.

  • JobName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The job name associated with the data quality result.

  • JobRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The job run ID associated with the data quality result.

  • StartedOn – Timestamp.

    The time that the run started for this data quality result.

DataQualityResultFilterCriteria structure

Criteria used to return data quality results.

Fields
  • DataSource – A DataSource object.

    Filter results by the specified data source. For example, retrieving all results for an Amazon Glue table.

  • JobName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Filter results by the specified job name.

  • JobRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Filter results by the specified job run ID.

  • StartedAfter – Timestamp.

    Filter results by runs that started after this time.

  • StartedBefore – Timestamp.

    Filter results by runs that started before this time.

DataQualityRulesetFilterCriteria structure

The criteria used to filter data quality rulesets.

Fields
  • Name – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the ruleset filter criteria.

  • Description – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

    The description of the ruleset filter criteria.

  • CreatedBefore – Timestamp.

    Filter on rulesets created before this date.

  • CreatedAfter – Timestamp.

    Filter on rulesets created after this date.

  • LastModifiedBefore – Timestamp.

    Filter on rulesets last modified before this date.

  • LastModifiedAfter – Timestamp.

    Filter on rulesets last modified after this date.

  • TargetTable – A DataQualityTargetTable object.

    The name and database name of the target table.

Operations

StartDataQualityRulesetEvaluationRun action (Python: start_data_quality_ruleset_evaluation_run)

Once you have a ruleset definition (either recommended or your own), you call this operation to evaluate the ruleset against a data source (Amazon Glue table). The evaluation computes results which you can retrieve with the GetDataQualityResult API.

Request
  • DataSource – Required: A DataSource object.

    The data source (Amazon Glue table) associated with this run.

  • Role – Required: UTF-8 string.

    An IAM role supplied to encrypt the results of the run.

  • NumberOfWorkers – Number (integer).

    The number of G.1X workers to be used in the run. The default is 5.

  • Timeout – Number (integer), at least 1.

    The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).

  • ClientToken – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Used for idempotency and is recommended to be set to a random ID (such as a UUID) to avoid creating or starting multiple instances of the same resource.

  • AdditionalRunOptions – A DataQualityEvaluationRunAdditionalRunOptions object.

    Additional run options you can specify for an evaluation run.

  • RulesetNames – Required: An array of UTF-8 strings, not less than 1 or more than 10 strings.

    A list of ruleset names.

  • AdditionalDataSources – A map array of key-value pairs.

    Each key is a UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Each value is a DataSource object.

    A map of reference strings to additional data sources you can specify for an evaluation run.

Response
  • RunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The unique run identifier associated with this run.

Errors
  • InvalidInputException

  • EntityNotFoundException

  • OperationTimeoutException

  • InternalServiceException

  • ConflictException
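The request above can be sketched in Python as follows. This is a minimal example under stated assumptions: `glue` is expected to be a boto3 Glue client (for instance `boto3.client("glue")`), the helper name `start_evaluation_run` is hypothetical, and the `DatabaseName` and `TableName` keys inside `GlueTable` come from the GlueTable structure, which is not described in this section.

```python
import uuid


def start_evaluation_run(glue, database, table, role_arn, ruleset_names):
    """Start a ruleset evaluation run and return its RunId (hypothetical helper)."""
    # RulesetNames accepts between 1 and 10 names.
    if not 1 <= len(ruleset_names) <= 10:
        raise ValueError("ruleset_names must contain 1 to 10 names")
    response = glue.start_data_quality_ruleset_evaluation_run(
        # GlueTable keys are assumed from the GlueTable structure.
        DataSource={"GlueTable": {"DatabaseName": database, "TableName": table}},
        Role=role_arn,
        RulesetNames=list(ruleset_names),
        # A random UUID as the idempotency token, as recommended above.
        ClientToken=str(uuid.uuid4()),
    )
    return response["RunId"]
```

Optional fields such as NumberOfWorkers, Timeout, and AdditionalRunOptions are left out here so that the service defaults apply.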

CancelDataQualityRulesetEvaluationRun action (Python: cancel_data_quality_ruleset_evaluation_run)

Cancels a run where a ruleset is being evaluated against a data source.

Request
  • RunId – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The unique run identifier associated with this run.

Response
  • No Response parameters.

Errors
  • EntityNotFoundException

  • InvalidInputException

  • OperationTimeoutException

  • InternalServiceException

GetDataQualityRulesetEvaluationRun action (Python: get_data_quality_ruleset_evaluation_run)

Retrieves a specific run where a ruleset is evaluated against a data source.

Request
  • RunId – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The unique run identifier associated with this run.

Response
  • RunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The unique run identifier associated with this run.

  • DataSource – A DataSource object.

    The data source (an Amazon Glue table) associated with this evaluation run.

  • Role – UTF-8 string.

    An IAM role supplied to encrypt the results of the run.

  • NumberOfWorkers – Number (integer).

    The number of G.1X workers to be used in the run. The default is 5.

  • Timeout – Number (integer), at least 1.

    The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).

  • AdditionalRunOptions – A DataQualityEvaluationRunAdditionalRunOptions object.

    Additional run options you can specify for an evaluation run.

  • Status – UTF-8 string (valid values: STARTING | RUNNING | STOPPING | STOPPED | SUCCEEDED | FAILED | TIMEOUT).

    The status for this run.

  • ErrorString – UTF-8 string.

    The error strings that are associated with the run.

  • StartedOn – Timestamp.

    The date and time when this run started.

  • LastModifiedOn – Timestamp.

    A timestamp. The last point in time when this data quality ruleset evaluation run was modified.

  • CompletedOn – Timestamp.

    The date and time when this run was completed.

  • ExecutionTime – Number (integer).

    The amount of time (in seconds) that the run consumed resources.

  • RulesetNames – An array of UTF-8 strings, not less than 1 or more than 10 strings.

    A list of ruleset names for the run. Currently, this parameter takes only one ruleset name.

  • ResultIds – An array of UTF-8 strings, not less than 1 or more than 10 strings.

    A list of result IDs for the data quality results for the run.

  • AdditionalDataSources – A map array of key-value pairs.

    Each key is a UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Each value is a DataSource object.

    A map of reference strings to additional data sources you can specify for an evaluation run.

Errors
  • EntityNotFoundException

  • InvalidInputException

  • OperationTimeoutException

  • InternalServiceException
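Because a run moves through several statuses, a caller typically polls this operation until the run finishes. The sketch below assumes `glue` is a boto3 Glue client and that STOPPED, SUCCEEDED, FAILED, and TIMEOUT are terminal statuses; that grouping, the helper name `wait_for_evaluation_run`, and the polling defaults are assumptions rather than documented behavior.

```python
import time

# Statuses assumed to be final for a run (an assumption, see the lead-in).
TERMINAL_STATUSES = {"STOPPED", "SUCCEEDED", "FAILED", "TIMEOUT"}


def wait_for_evaluation_run(glue, run_id, poll_seconds=30, max_polls=120):
    """Poll a ruleset evaluation run until it reaches a terminal status."""
    for _ in range(max_polls):
        run = glue.get_data_quality_ruleset_evaluation_run(RunId=run_id)
        if run["Status"] in TERMINAL_STATUSES:
            return run
        time.sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

When the returned status is SUCCEEDED, the ResultIds field of the response can be passed to GetDataQualityResult or BatchGetDataQualityResult.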

ListDataQualityRulesetEvaluationRuns action (Python: list_data_quality_ruleset_evaluation_runs)

Lists all the runs meeting the filter criteria, where a ruleset is evaluated against a data source.

Request
  • Filter – A DataQualityRulesetEvaluationRunFilter object.

    The filter criteria.

  • NextToken – UTF-8 string.

    A pagination token to offset the results.

  • MaxResults – Number (integer), not less than 1 or more than 1000.

    The maximum number of results to return.

Response
  • Runs – An array of DataQualityRulesetEvaluationRunDescription objects.

    A list of DataQualityRulesetEvaluationRunDescription objects representing data quality ruleset runs.

  • NextToken – UTF-8 string.

    A pagination token, if more results are available.

Errors
  • InvalidInputException

  • OperationTimeoutException

  • InternalServiceException
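The NextToken field in the request and response supports paging through long result lists. A minimal sketch of that loop follows, assuming `glue` is a boto3 Glue client; the generator name `iter_evaluation_runs` and the default page size are hypothetical. If a filter is supplied, note that its DataSource field is required.

```python
def iter_evaluation_runs(glue, run_filter=None, page_size=100):
    """Yield every run description, following NextToken across pages."""
    kwargs = {"MaxResults": page_size}  # must be between 1 and 1000
    if run_filter is not None:
        kwargs["Filter"] = run_filter
    while True:
        page = glue.list_data_quality_ruleset_evaluation_runs(**kwargs)
        yield from page.get("Runs", [])
        token = page.get("NextToken")
        if not token:
            return  # no more pages
        kwargs["NextToken"] = token
```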

StartDataQualityRuleRecommendationRun action (Python: start_data_quality_rule_recommendation_run)

Starts a recommendation run that is used to generate rules when you don't know what rules to write. Amazon Glue Data Quality analyzes the data and comes up with recommendations for a potential ruleset. You can then triage the ruleset and modify the generated ruleset to your liking.

Recommendation runs are automatically deleted after 90 days.

Request
  • DataSource – Required: A DataSource object.

    The data source (Amazon Glue table) associated with this run.

  • Role – Required: UTF-8 string.

    An IAM role supplied to encrypt the results of the run.

  • NumberOfWorkers – Number (integer).

    The number of G.1X workers to be used in the run. The default is 5.

  • Timeout – Number (integer), at least 1.

    The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).

  • CreatedRulesetName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    A name for the ruleset.

  • ClientToken – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Used for idempotency and is recommended to be set to a random ID (such as a UUID) to avoid creating or starting multiple instances of the same resource.

Response
  • RunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The unique run identifier associated with this run.

Errors
  • InvalidInputException

  • OperationTimeoutException

  • InternalServiceException

  • ConflictException

CancelDataQualityRuleRecommendationRun action (Python: cancel_data_quality_rule_recommendation_run)

Cancels the specified recommendation run that was being used to generate rules.

Request
  • RunId – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The unique run identifier associated with this run.

Response
  • No Response parameters.

Errors
  • EntityNotFoundException

  • InvalidInputException

  • OperationTimeoutException

  • InternalServiceException

GetDataQualityRuleRecommendationRun action (Python: get_data_quality_rule_recommendation_run)

Gets the specified recommendation run that was used to generate rules.

Request
  • RunId – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The unique run identifier associated with this run.

Response
  • RunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The unique run identifier associated with this run.

  • DataSource – A DataSource object.

    The data source (an Amazon Glue table) associated with this run.

  • Role – UTF-8 string.

    An IAM role supplied to encrypt the results of the run.

  • NumberOfWorkers – Number (integer).

    The number of G.1X workers to be used in the run. The default is 5.

  • Timeout – Number (integer), at least 1.

    The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).

  • Status – UTF-8 string (valid values: STARTING | RUNNING | STOPPING | STOPPED | SUCCEEDED | FAILED | TIMEOUT).

    The status for this run.

  • ErrorString – UTF-8 string.

    The error strings that are associated with the run.

  • StartedOn – Timestamp.

    The date and time when this run started.

  • LastModifiedOn – Timestamp.

    A timestamp. The last point in time when this data quality rule recommendation run was modified.

  • CompletedOn – Timestamp.

    The date and time when this run was completed.

  • ExecutionTime – Number (integer).

    The amount of time (in seconds) that the run consumed resources.

  • RecommendedRuleset – UTF-8 string, not less than 1 or more than 65536 bytes long.

    When a rule recommendation run completes, it creates a recommended ruleset (a set of rules). This member contains those rules in Data Quality Definition Language (DQDL) format.

  • CreatedRulesetName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the ruleset that was created by the run.

Errors
  • EntityNotFoundException

  • InvalidInputException

  • OperationTimeoutException

  • InternalServiceException
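One common use of this operation is to read the recommended DQDL once a run has finished. The sketch below assumes `glue` is a boto3 Glue client; the helper name `get_recommended_ruleset` is hypothetical, and returning None for any status other than SUCCEEDED is a design choice of this example.

```python
def get_recommended_ruleset(glue, run_id):
    """Return the recommended DQDL text, or None if the run has not succeeded."""
    run = glue.get_data_quality_rule_recommendation_run(RunId=run_id)
    if run["Status"] != "SUCCEEDED":
        return None
    # RecommendedRuleset holds the rules as a DQDL string.
    return run.get("RecommendedRuleset")
```

The returned string can be reviewed, edited, and then supplied as the Ruleset field of CreateDataQualityRuleset.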

ListDataQualityRuleRecommendationRuns action (Python: list_data_quality_rule_recommendation_runs)

Lists the recommendation runs meeting the filter criteria.

Request
  • Filter – A DataQualityRuleRecommendationRunFilter object.

    The filter criteria.

  • NextToken – UTF-8 string.

    A pagination token to offset the results.

  • MaxResults – Number (integer), not less than 1 or more than 1000.

    The maximum number of results to return.

Response
  • Runs – An array of DataQualityRuleRecommendationRunDescription objects.

    A list of DataQualityRuleRecommendationRunDescription objects.

  • NextToken – UTF-8 string.

    A pagination token, if more results are available.

Errors
  • InvalidInputException

  • OperationTimeoutException

  • InternalServiceException

GetDataQualityResult action (Python: get_data_quality_result)

Retrieves the result of a data quality rule evaluation.

Request
  • ResultId – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    A unique result ID for the data quality result.

Response
  • ResultId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    A unique result ID for the data quality result.

  • Score – Number (double), not more than 1.0.

    An aggregate data quality score. Represents the ratio of rules that passed to the total number of rules.

  • DataSource – A DataSource object.

    The table associated with the data quality result, if any.

  • RulesetName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the ruleset associated with the data quality result.

  • EvaluationContext – UTF-8 string.

    In an Amazon Glue Studio job, each node on the canvas is typically assigned a name, including data quality nodes. When a job contains multiple data quality nodes, the evaluationContext distinguishes between them.

  • StartedOn – Timestamp.

    The date and time when the run for this data quality result started.

  • CompletedOn – Timestamp.

    The date and time when the run for this data quality result was completed.

  • JobName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The job name associated with the data quality result, if any.

  • JobRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The job run ID associated with the data quality result, if any.

  • RulesetEvaluationRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The unique run ID associated with the ruleset evaluation.

  • RuleResults – An array of DataQualityRuleResult objects, not more than 2000 structures.

    A list of DataQualityRuleResult objects representing the results for each rule.

  • AnalyzerResults – An array of DataQualityAnalyzerResult objects, not more than 2000 structures.

    A list of DataQualityAnalyzerResult objects representing the results for each analyzer.

  • Observations – An array of DataQualityObservation objects, not more than 50 structures.

    A list of DataQualityObservation objects representing the observations generated after evaluating the rules and analyzers.

Errors
  • InvalidInputException

  • OperationTimeoutException

  • InternalServiceException

  • EntityNotFoundException

BatchGetDataQualityResult action (Python: batch_get_data_quality_result)

Retrieves a list of data quality results for the specified result IDs.

Request
  • ResultIds – Required: An array of UTF-8 strings, not less than 1 or more than 100 strings.

    A list of unique result IDs for the data quality results.

Response
  • Results – Required: An array of DataQualityResult objects.

    A list of DataQualityResult objects representing the data quality results.

  • ResultsNotFound – An array of UTF-8 strings, not less than 1 or more than 100 strings.

    A list of result IDs for which results were not found.

Errors
  • InvalidInputException

  • OperationTimeoutException

  • InternalServiceException
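Since a single request accepts at most 100 result IDs, a longer list has to be split into batches. A minimal sketch of that pattern follows, assuming `glue` is a boto3 Glue client; the helper name `batch_get_results` is hypothetical.

```python
def batch_get_results(glue, result_ids, batch_size=100):
    """Fetch results in batches of at most 100 IDs; return (results, not_found)."""
    results, not_found = [], []
    for start in range(0, len(result_ids), batch_size):
        chunk = result_ids[start:start + batch_size]
        response = glue.batch_get_data_quality_result(ResultIds=chunk)
        results.extend(response["Results"])
        # ResultsNotFound is optional, so default to an empty list.
        not_found.extend(response.get("ResultsNotFound", []))
    return results, not_found
```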

ListDataQualityResults action (Python: list_data_quality_results)

Returns all data quality execution results for your account.

Request
  • Filter – A DataQualityResultFilterCriteria object.

    The filter criteria.

  • NextToken – UTF-8 string.

    A pagination token to offset the results.

  • MaxResults – Number (integer), not less than 1 or more than 1000.

    The maximum number of results to return.

Response
  • Results – Required: An array of DataQualityResultDescription objects.

    A list of DataQualityResultDescription objects.

  • NextToken – UTF-8 string.

    A pagination token, if more results are available.

Errors
  • InvalidInputException

  • OperationTimeoutException

  • InternalServiceException
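The Filter parameter takes a DataQualityResultFilterCriteria object, which can narrow results to a job and a time window. The sketch below builds such a filter as a Python dictionary; the helper name `recent_results_filter`, the seven-day default, and the job name in the usage note are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def recent_results_filter(job_name, days=7, now=None):
    """Build a DataQualityResultFilterCriteria-shaped dict for the last `days` days."""
    now = now or datetime.now(timezone.utc)
    return {
        "JobName": job_name,
        "StartedAfter": now - timedelta(days=days),
        "StartedBefore": now,
    }
```

With a boto3 Glue client named `glue`, the filter could be used as `glue.list_data_quality_results(Filter=recent_results_filter("nightly-etl"))`, following NextToken if more results are available.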

CreateDataQualityRuleset action (Python: create_data_quality_ruleset)

Creates a data quality ruleset with DQDL rules applied to a specified Amazon Glue table.

You create the ruleset using the Data Quality Definition Language (DQDL). For more information, see the Amazon Glue developer guide.

Request
  • Name – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    A unique name for the data quality ruleset.

  • Description – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

    A description of the data quality ruleset.

  • Ruleset – Required: UTF-8 string, not less than 1 or more than 65536 bytes long.

    A Data Quality Definition Language (DQDL) ruleset. For more information, see the Amazon Glue developer guide.

  • Tags – A map array of key-value pairs, not more than 50 pairs.

    Each key is a UTF-8 string, not less than 1 or more than 128 bytes long.

    Each value is a UTF-8 string, not more than 256 bytes long.

    A list of tags applied to the data quality ruleset.

  • TargetTable – A DataQualityTargetTable object.

    A target table associated with the data quality ruleset.

  • RecommendationRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    A unique run ID for the recommendation run.

  • ClientToken – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Used for idempotency and is recommended to be set to a random ID (such as a UUID) to avoid creating or starting multiple instances of the same resource.

Response
  • Name – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    A unique name for the data quality ruleset.

Errors
  • InvalidInputException

  • AlreadyExistsException

  • OperationTimeoutException

  • InternalServiceException

  • ResourceNumberLimitExceededException
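
As a rough illustration of the request shape described above, the following sketch assembles the keyword arguments for create_data_quality_ruleset and checks the documented size limits before any call is made. The helper name build_create_ruleset_request, the ruleset name orders_checks, and the database and table names are hypothetical; the DQDL string is only a small sample.

```python
import uuid


def build_create_ruleset_request(name, ruleset, database=None, table=None, tags=None):
    """Assemble keyword arguments for create_data_quality_ruleset.

    Hypothetical helper: it only mirrors the limits listed in the Request
    section (Name 1-255 bytes, Ruleset 1-65536 bytes, at most 50 tags).
    """
    if not 1 <= len(name.encode("utf-8")) <= 255:
        raise ValueError("Name must be 1 to 255 bytes long")
    if not 1 <= len(ruleset.encode("utf-8")) <= 65536:
        raise ValueError("Ruleset must be 1 to 65536 bytes long")
    if tags is not None and len(tags) > 50:
        raise ValueError("No more than 50 tags are allowed")

    request = {
        "Name": name,
        "Ruleset": ruleset,
        # A random UUID as the idempotency token, as the ClientToken text suggests.
        "ClientToken": str(uuid.uuid4()),
    }
    if database and table:
        request["TargetTable"] = {"DatabaseName": database, "TableName": table}
    if tags:
        request["Tags"] = dict(tags)
    return request


# Illustrative values only; none of these names come from the reference.
example_request = build_create_ruleset_request(
    "orders_checks",
    'Rules = [ IsComplete "order_id", RowCount > 0 ]',
    database="sales_db",
    table="orders",
)
```

With boto3 installed and credentials configured, the dictionary could be passed on as boto3.client("glue").create_data_quality_ruleset(**example_request).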

DeleteDataQualityRuleset action (Python: delete_data_quality_ruleset)

Deletes a data quality ruleset.

Request
  • Name – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the data quality ruleset to delete.

Response
  • No Response parameters.

Errors
  • EntityNotFoundException

  • InvalidInputException

  • OperationTimeoutException

  • InternalServiceException
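
One way to handle the EntityNotFoundException listed above is to treat a missing ruleset as "nothing to delete". This is a minimal sketch, assuming a boto3 Glue client is passed in (boto3 exposes modeled errors on client.exceptions); the function name delete_ruleset_if_exists is hypothetical.

```python
def delete_ruleset_if_exists(glue_client, name):
    """Delete a ruleset and report whether it existed.

    Hypothetical helper. glue_client is assumed to be a boto3 Glue client,
    for example boto3.client("glue").
    """
    try:
        glue_client.delete_data_quality_ruleset(Name=name)
    except glue_client.exceptions.EntityNotFoundException:
        # The ruleset was already gone; treat this as a no-op.
        return False
    return True
```

The other listed errors are deliberately left to propagate so the caller can decide whether to retry.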

GetDataQualityRuleset action (Python: get_data_quality_ruleset)

Returns an existing ruleset by name.

Request
  • Name – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the ruleset.

Response
  • Name – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the ruleset.

  • Description – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

    A description of the ruleset.

  • Ruleset – UTF-8 string, not less than 1 or more than 65536 bytes long.

    A Data Quality Definition Language (DQDL) ruleset. For more information, see the Amazon Glue developer guide.

  • TargetTable – A DataQualityTargetTable object.

    The name and database name of the target table.

  • CreatedOn – Timestamp.

    A timestamp. The time and date that this data quality ruleset was created.

  • LastModifiedOn – Timestamp.

    A timestamp. The last point in time when this data quality ruleset was modified.

  • RecommendationRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    When a ruleset was created from a recommendation run, this run ID is generated to link the two together.

Errors
  • EntityNotFoundException

  • InvalidInputException

  • OperationTimeoutException

  • InternalServiceException
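
Most fields in the response are optional, so code that reads it should tolerate missing keys. The sketch below condenses a response dictionary into a small summary. The helper name summarize_ruleset and the sample values are hypothetical, and the sample response is hand-built for illustration rather than captured from the service.

```python
from datetime import datetime, timezone


def summarize_ruleset(response):
    """Condense a get_data_quality_ruleset response into a small dict.

    Hypothetical helper; the keys read here are the ones listed in the
    Response section, and any of them may be absent.
    """
    target = response.get("TargetTable") or {}
    created = response.get("CreatedOn")
    return {
        "name": response.get("Name"),
        "rule_text": response.get("Ruleset", ""),
        # DatabaseName and TableName are assumed present when TargetTable is set.
        "table": f'{target["DatabaseName"]}.{target["TableName"]}' if target else None,
        "created": created.isoformat() if created else None,
        "from_recommendation": "RecommendationRunId" in response,
    }


# Hand-built sample response, for illustration only.
sample_response = {
    "Name": "orders_checks",
    "Ruleset": "Rules = [ RowCount > 0 ]",
    "TargetTable": {"DatabaseName": "sales_db", "TableName": "orders"},
    "CreatedOn": datetime(2024, 1, 15, tzinfo=timezone.utc),
}
summary = summarize_ruleset(sample_response)
```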

ListDataQualityRulesets action (Python: list_data_quality_rulesets)

Returns a paginated list of rulesets for the specified list of Amazon Glue tables.

Request
  • NextToken – UTF-8 string.

    A pagination token to offset the results.

  • MaxResults – Number (integer), not less than 1 or more than 1000.

    The maximum number of results to return.

  • Filter – A DataQualityRulesetFilterCriteria object.

    The filter criteria.

  • Tags – A map array of key-value pairs, not more than 50 pairs.

    Each key is a UTF-8 string, not less than 1 or more than 128 bytes long.

    Each value is a UTF-8 string, not more than 256 bytes long.

    A list of key-value pair tags.

Response
  • Rulesets – An array of DataQualityRulesetListDetails objects.

    A paginated list of rulesets for the specified list of Amazon Glue tables.

  • NextToken – UTF-8 string.

    A pagination token, if more results are available.

Errors
  • EntityNotFoundException

  • InvalidInputException

  • OperationTimeoutException

  • InternalServiceException
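
The NextToken field in the request and response implies a loop: call the operation, consume Rulesets, and repeat with the returned token until none comes back. A minimal sketch of that loop follows, assuming a boto3 Glue client is passed in; the generator name iter_rulesets and its parameters are hypothetical.

```python
def iter_rulesets(glue_client, filter_criteria=None, page_size=100):
    """Yield every ruleset entry across all pages.

    Hypothetical helper. glue_client is assumed to be a boto3 Glue client;
    filter_criteria, if given, is passed through as the Filter parameter.
    """
    if not 1 <= page_size <= 1000:
        raise ValueError("MaxResults must be between 1 and 1000")

    kwargs = {"MaxResults": page_size}
    if filter_criteria:
        kwargs["Filter"] = filter_criteria

    while True:
        page = glue_client.list_data_quality_rulesets(**kwargs)
        yield from page.get("Rulesets", [])
        token = page.get("NextToken")
        if not token:
            break  # no token means this was the last page
        kwargs["NextToken"] = token
```

A caller could then write something like [r["Name"] for r in iter_rulesets(boto3.client("glue"))] to collect every ruleset name.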

UpdateDataQualityRuleset action (Python: update_data_quality_ruleset)

Updates the specified data quality ruleset.

Request
  • Name – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the data quality ruleset.

  • Description – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

    A description of the ruleset.

  • Ruleset – UTF-8 string, not less than 1 or more than 65536 bytes long.

    A Data Quality Definition Language (DQDL) ruleset. For more information, see the Amazon Glue developer guide.

Response
  • Name – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the data quality ruleset.

  • Description – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

    A description of the ruleset.

  • Ruleset – UTF-8 string, not less than 1 or more than 65536 bytes long.

    A Data Quality Definition Language (DQDL) ruleset. For more information, see the Amazon Glue developer guide.

Errors
  • EntityNotFoundException

  • AlreadyExistsException

  • IdempotentParameterMismatchException

  • InvalidInputException

  • OperationTimeoutException

  • InternalServiceException

  • ResourceNumberLimitExceededException
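
Because only Name is required in the update request, a caller can send just the fields it wants to change. The following sketch builds the request that way, assuming a boto3 Glue client is passed in; the function name update_ruleset is hypothetical.

```python
def update_ruleset(glue_client, name, description=None, ruleset=None):
    """Call update_data_quality_ruleset with only the fields that were given.

    Hypothetical helper. glue_client is assumed to be a boto3 Glue client.
    """
    request = {"Name": name}
    if description is not None:
        request["Description"] = description
    if ruleset is not None:
        # Mirror the documented 1-65536 byte limit before calling the service.
        if not 1 <= len(ruleset.encode("utf-8")) <= 65536:
            raise ValueError("Ruleset must be 1 to 65536 bytes long")
        request["Ruleset"] = ruleset
    return glue_client.update_data_quality_ruleset(**request)
```

Omitting unset fields, rather than sending empty strings, avoids tripping the minimum-length constraint on Ruleset.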