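# Estimating the quality of matches using match confidence scores

Match confidence scores provide an estimate of the quality of the matches found by FindMatches, so that you can distinguish between matched records in which the machine learning model is highly confident, uncertain, or unlikely. A match confidence score is between 0 and 1, where a higher score means higher similarity. By examining match confidence scores, you can distinguish between clusters of matches in which the system is highly confident (which you may decide to merge), clusters about which the system is uncertain (which you may decide to have reviewed by a human), and clusters that the system deems unlikely (which you may decide to reject).

You may want to adjust your training data if you see a high match confidence score but determine that the records do not match, or if you see a low score but determine that the records do, in fact, match.

Confidence scores are particularly useful for large industrial datasets, where it is infeasible to review every FindMatches decision.

Match confidence scores are available in Amazon Glue version 2.0 or later.

## Generating match confidence scores

You can generate match confidence scores by setting the Boolean value of `computeMatchConfidenceScores` to True when calling the `FindMatches` or `FindIncrementalMatches` API.

Amazon Glue adds a new column, `match_confidence_score`, to the output.

## Match scoring examples

For example, consider the following matched records:

**Score >= 0.9**

Summary of matched records:

```
primary_id    | match_id    | match_confidence_score
3281355037663 | 85899345947 | 0.9823658302132061
1546188247619 | 85899345947 | 0.9823658302132061
```

Details:

![Details of the matched records with a score of 0.9 or higher.](http://docs.amazonaws.cn/glue/latest/dg/images/match_score1.png)

From this example, we can see that the two records are very similar and share `display_position`, `primary_name`, and `street name`.

**Score >= 0.8 and score < 0.9**

Summary of matched records:

```
primary_id    | match_id    | match_confidence_score
309237680432  | 85899345928 | 0.8309852373674638
3590592666790 | 85899345928 | 0.8309852373674638
343597390617  | 85899345928 | 0.8309852373674638
249108124906  | 85899345928 | 0.8309852373674638
463856477937  | 85899345928 | 0.8309852373674638
```

Details:

![Details of the matched records with a score of at least 0.8 and below 0.9.](http://docs.amazonaws.cn/glue/latest/dg/images/match_score2.png)

From this example, we can see that these records share the same `primary_name` and `country`.

**Score >= 0.6 and score < 0.7**

Summary of matched records:

```
primary_id    | match_id    | match_confidence_score
2164663519676 | 85899345930 | 0.6971099896480333
317827595278  | 85899345930 | 0.6971099896480333
472446424341  | 85899345930 | 0.6971099896480333
3118146262932 | 85899345930 | 0.6971099896480333
214748380804  | 85899345930 | 0.6971099896480333
```

Details:

![Details of the matched records with a score of at least 0.6 and below 0.7.](http://docs.amazonaws.cn/glue/latest/dg/images/match_score3.png)

From this example, we can see that these records share only the same `primary_name`.

For more information, see:
+ [Step 5: Add and run a job with your machine learning transform](machine-learning-transform-tutorial.md#ml-transform-tutorial-add-job)
+ PySpark: [FindMatches class](aws-glue-api-crawler-pyspark-transforms-findmatches.md)
+ PySpark: [FindIncrementalMatches class](aws-glue-api-crawler-pyspark-transforms-findincrementalmatches.md)
+ Scala: [FindMatches class](glue-etl-scala-apis-glue-ml-findmatches.md)
+ Scala: [FindIncrementalMatches class](glue-etl-scala-apis-glue-ml-findincrementalmatches.md)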
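
The merge, review, and reject triage described at the start of this topic can be sketched in plain Python once the output rows have been collected. This is only an illustrative sketch, not part of the Amazon Glue API: the thresholds `MERGE_THRESHOLD` and `REJECT_THRESHOLD`, the helper names `triage` and `triage_clusters`, and the choice to judge a cluster by its lowest score are all assumptions that you would tune for your own data. The sample rows are taken from the summaries above.

```python
from collections import defaultdict

# Hypothetical cut-offs (assumptions, not values defined by Amazon Glue).
MERGE_THRESHOLD = 0.9   # at or above: treat the cluster as a confident match
REJECT_THRESHOLD = 0.7  # below: treat the cluster as an unlikely match


def triage(score, merge_at=MERGE_THRESHOLD, reject_below=REJECT_THRESHOLD):
    """Map a single match_confidence_score to a decision label."""
    if score >= merge_at:
        return "merge"
    if score < reject_below:
        return "reject"
    return "review"


def triage_clusters(rows):
    """Group (primary_id, match_id, score) rows by match_id and label each cluster.

    Uses the lowest score in a cluster as a conservative choice (an assumption).
    """
    scores_by_cluster = defaultdict(list)
    for _primary_id, match_id, score in rows:
        scores_by_cluster[match_id].append(score)
    return {
        match_id: triage(min(scores))
        for match_id, scores in scores_by_cluster.items()
    }


# A few rows copied from the example summaries in this topic.
rows = [
    (3281355037663, 85899345947, 0.9823658302132061),
    (1546188247619, 85899345947, 0.9823658302132061),
    (309237680432, 85899345928, 0.8309852373674638),
    (3590592666790, 85899345928, 0.8309852373674638),
    (2164663519676, 85899345930, 0.6971099896480333),
    (317827595278, 85899345930, 0.6971099896480333),
]

decisions = triage_clusters(rows)
for match_id, decision in decisions.items():
    print(match_id, decision)
```

If reviewers repeatedly overturn the "merge" or "reject" labels, that is a signal to adjust the training data rather than only the thresholds.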