Using the Record Matching transform to invoke an existing data classification transform
This transform invokes an existing Record Matching machine learning data classification transform.
The transform evaluates the current data against the model, which was trained on labeled examples. It adds a "match_id" column that assigns each row to a group of rows that the trained algorithm considers equivalent. For more information, see Record matching with Lake Formation FindMatches.
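As an illustration of how the "match_id" column can be used downstream, the following minimal sketch groups rows that share a match_id. It uses plain Python dictionaries rather than a Glue DynamicFrame so that it runs anywhere. The sample rows, the field names other than "match_id", and the helper names group_by_match_id and duplicate_groups are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows, shaped as they might look after the transform
# has added the "match_id" column. The values are made up.
rows = [
    {"name": "Jane Doe", "email": "jane@example.com", "match_id": 1},
    {"name": "J. Doe", "email": "jane@example.com", "match_id": 1},
    {"name": "Raj Patel", "email": "raj@example.com", "match_id": 2},
]


def group_by_match_id(rows):
    """Collect rows that share a match_id into one group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["match_id"]].append(row)
    return dict(groups)


def duplicate_groups(rows):
    """Keep only the groups that contain more than one row."""
    return {
        match_id: group
        for match_id, group in group_by_match_id(rows).items()
        if len(group) > 1
    }


groups = group_by_match_id(rows)
```

Groups with more than one row are the candidates for deduplication or merging. How to merge them is a design decision that depends on your data.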
Note
The version of Amazon Glue used by the visual job must match the version that Amazon Glue used to create the Record Matching transform.
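One way to check this requirement before running a job is to compare the GlueVersion reported by the GetMLTransform API with the version configured for the job. The following is a minimal sketch, assuming a boto3 Glue client is passed in. The helper name check_glue_version and the placeholder transform ID in the usage comment are hypothetical.

```python
def check_glue_version(glue_client, transform_id, job_glue_version):
    """Return the transform's Glue version, or raise if it differs from the job's.

    glue_client is expected to behave like boto3.client("glue").
    """
    response = glue_client.get_ml_transform(TransformId=transform_id)
    transform_version = response.get("GlueVersion")
    if transform_version != job_glue_version:
        raise ValueError(
            f"Transform {transform_id} was created with Glue version "
            f"{transform_version!r}, but the job uses {job_glue_version!r}."
        )
    return transform_version


# Example usage (requires AWS credentials; the ID below is a placeholder):
#   import boto3
#   check_glue_version(boto3.client("glue"), "tfm-EXAMPLE", "2.0")
```

Passing the client as an argument keeps the function easy to test with a stub object and avoids hard-coding a Region or credentials.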
To add a Record Matching transform node to your job diagram
Open the Resource panel, and then choose Record Matching to add a new transform to your job diagram. The node that is selected when you add the transform becomes its parent.
In the node properties panel, you can enter a name for the node in the job diagram. If a node parent isn't already selected, choose a node from the Node parents list to use as the input source for the transform.
On the Transform tab, enter the ID of the transform, which you can copy from the Machine learning transforms page.
(Optional) On the Transform tab, you can select the option to add confidence scores. At the cost of extra computation, the model estimates a confidence score for each match and adds it as an additional column.
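If you prefer to look up the transform ID programmatically instead of copying it from the Machine learning transforms page, the GetMLTransforms API lists the transforms in your account. The following is a minimal sketch, assuming a boto3 Glue client is passed in and that you know the transform's name. The helper name find_transform_ids and the transform name in the usage comment are hypothetical.

```python
def find_transform_ids(glue_client, name):
    """Return the IDs of all ML transforms whose Name equals `name`.

    glue_client is expected to behave like boto3.client("glue").
    Results are paginated, so follow NextToken until it is absent.
    """
    matches = []
    kwargs = {}
    while True:
        page = glue_client.get_ml_transforms(**kwargs)
        for transform in page.get("Transforms", []):
            if transform.get("Name") == name:
                matches.append(transform["TransformId"])
        token = page.get("NextToken")
        if not token:
            return matches
        kwargs["NextToken"] = token


# Example usage (requires AWS credentials; the name is a placeholder):
#   import boto3
#   ids = find_transform_ids(boto3.client("glue"), "my-record-matching")
```

The function returns a list because transform names are not guaranteed to be unique in this sketch. If it returns more than one ID, confirm the right one on the Machine learning transforms page before entering it on the Transform tab.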