

# Duplicate message handling


For data objects sent in real time, Ground Truth guarantees idempotency by ensuring each unique object is only sent for labeling once, even if the input message referring to that object is received multiple times (duplicate messages). To do this, each data object sent to a streaming labeling job is assigned a *deduplication ID*, which is identified with a *deduplication key*. If you send your requests to label data objects directly through your Amazon SNS input topic using Amazon SNS messages, you can optionally choose a custom deduplication key and deduplication IDs for your objects. For more information, see [Specify a deduplication key and ID in an Amazon SNS message](sms-streaming-impotency-create.md).

If you do not provide your own deduplication key, or if you use the Amazon S3 configuration to send data objects to your labeling job, Ground Truth uses one of the following for the deduplication ID:
+ For messages sent directly to your Amazon SNS input topic, Ground Truth uses the SNS message ID. 
+ For messages that come from an Amazon S3 configuration, Ground Truth creates a deduplication ID by combining the Amazon S3 URI of the object with the [sequencer token](https://docs.aws.amazon.com/AmazonS3/latest/dev/notification-content-structure.html) in the message.

# Specify a deduplication key and ID in an Amazon SNS message


When you send a data object to your streaming labeling job using an Amazon SNS message, you have the option to specify your deduplication key and deduplication ID in one of the following ways. In all of these scenarios, identify your deduplication key with `dataset-objectid-attribute-name`.

**Bring Your Own Deduplication Key and ID**

Create your own deduplication key and deduplication ID by configuring your Amazon SNS message as follows. Replace `byo-key` with your key and `UniqueId` with the deduplication ID for that data object.

```
{
    "source-ref":"s3://amzn-s3-demo-bucket/prefix/object1", 
    "dataset-objectid-attribute-name":"byo-key",
    "byo-key":"UniqueId" 
}
```

Your deduplication key can be up to 140 characters. Supported patterns include: `"^[$a-zA-Z0-9](-*[a-zA-Z0-9])*"`.

Your deduplication ID can be up to 1,024 characters. Supported patterns include: `^(https|s3)://([^/]+)/?(.*)$`.

**Use an Existing Key for your Deduplication Key**

You can use an existing key in your message as the deduplication key. When you do this, the value associated with that key is used for the deduplication ID. 

For example, you can specify use the `source-ref` key as your deduplication key by formatting your message as follows: 

```
{
    "source-ref":"s3://amzn-s3-demo-bucket/prefix/object1",
    "dataset-objectid-attribute-name":"source-ref" 
}
```

In this example, Ground Truth uses `"s3://amzn-s3-demo-bucket/prefix/object1"` for the deduplication id.

# Find deduplication key and ID in your output data


You can see the deduplication key and ID in your output data. The deduplication key is identified by `dataset-objectid-attribute-name`. When you use your own custom deduplication key, your output contains something similar to the following:

```
"dataset-objectid-attribute-name": "byo-key",
"byo-key": "UniqueId",
```

When you do not specify a key, you can find the deduplication ID that Ground Truth assigned to your data object as follows. The `$label-attribute-name-object-id` parameter identifies your deduplication ID. 

```
{
    "source-ref":"s3://bucket/prefix/object1", 
    "dataset-objectid-attribute-name":"$label-attribute-name-object-id"
    "label-attribute-name" :0,
    "label-attribute-name-metadata": {...},
    "$label-attribute-name-object-id":"<service-generated-key>"
}
```

For `<service-generated-key>`, if the data object came through an Amazon S3 configuration, Ground Truth adds a unique value used by the service and emits a new field keyed by `$sequencer` which shows the Amazon S3 sequencer used. If object was fed to SNS directly, Ground Truth use the SNS message ID.

**Note**  
Do not use the `$` character in your label attribute name. 