

# Create a streaming labeling job
<a name="sms-streaming-create-job"></a>

A streaming labeling job runs perpetually and lets you send individual data objects for labeling in real time. To create a streaming labeling job, specify the Amazon SNS *input topic* ARN, `SnsTopicArn`, in the `InputConfig` parameter when making a [https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateLabelingJob.html](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateLabelingJob.html) request. Optionally, you can also create an Amazon SNS *output topic* and specify it in `OutputConfig` if you want to receive label data in real time.

**Important**  
If you are a new user of Ground Truth streaming labeling jobs, it is recommended that you review [Ground Truth streaming labeling jobs](sms-streaming-labeling-job.md) before creating a streaming labeling job. Ground Truth streaming labeling jobs are only supported through the SageMaker API.

Use the following sections to create the resources that you need to create a streaming labeling job:
+ Learn how to create SNS topics with the permissions required for Ground Truth streaming labeling jobs by following the steps in [Use Amazon SNS Topics for Data Labeling](sms-create-sns-input-topic.md). Your SNS topics must be created in the same Amazon Region as your labeling job. 
+ See [Subscribe an Endpoint to Your Amazon SNS Output Topic](sms-create-sns-input-topic.md#sms-streaming-subscribe-output-topic) to learn how to set up an endpoint that receives labeling task output data each time a labeling task is completed.
+ To learn how to configure your Amazon S3 bucket to send notifications to your Amazon SNS input topic, see [Creating Amazon S3 based bucket event notifications based on the Amazon SNS defined in your labeling job](sms-streaming-s3-setup.md).
+ Optionally, add data objects that you want to have labeled as soon as the labeling job starts to your input manifest. For more information, see [Create a Manifest File (Optional)](sms-streaming-manifest.md).
+ There are other resources required to create a labeling job, such as an IAM role, Amazon S3 bucket, a worker task template and label categories. These are described in the Ground Truth documentation on creating a labeling job. For more information, see [Create a Labeling Job](sms-create-labeling-job.md). 
**Important**  
When you create a labeling job you must provide an IAM execution role. Attach the Amazon managed policy **AmazonSageMakerGroundTruthExecution** to this role to ensure it has required permissions to execute your labeling job. 

When you submit a request to create a streaming labeling job, the state of your labeling job is `Initializing`. Once the labeling job is active, the state changes to `InProgress`. Do not send new data objects to your labeling job or attempt to stop your labeling job while it is in the `Initializing` state. Once the state changes to `InProgress`, you can start sending new data objects using Amazon SNS and the Amazon S3 configuration. 
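The wait-then-send sequence described above can be sketched with the SDK for Python (Boto3). This is a minimal sketch, not a definitive implementation: the helper names `wait_until_in_progress` and `send_data_object` are hypothetical, the clients are assumed to be created with `boto3.client('sagemaker')` and `boto3.client('sns')`, and the `source-ref` message body is an assumption based on the Ground Truth input data format rather than something stated on this page.

```python
import json
import time


def wait_until_in_progress(sagemaker_client, job_name, poll_seconds=30,
                           max_polls=60, sleep=time.sleep):
    """Poll DescribeLabelingJob until the job leaves the Initializing state.

    Returns "InProgress" on success. Raises RuntimeError if the job reaches
    any other state, or TimeoutError if it is still Initializing after
    max_polls attempts.
    """
    for _ in range(max_polls):
        status = sagemaker_client.describe_labeling_job(
            LabelingJobName=job_name
        )["LabelingJobStatus"]
        if status == "InProgress":
            return status
        if status != "Initializing":
            raise RuntimeError(f"Labeling job entered unexpected state: {status}")
        sleep(poll_seconds)
    raise TimeoutError(f"{job_name} was still Initializing after {max_polls} polls")


def send_data_object(sns_client, input_topic_arn, s3_uri):
    """Publish one data object reference to the SNS input topic.

    Assumption: the message is a JSON object with a "source-ref" key that
    points at the object in Amazon S3.
    """
    message = json.dumps({"source-ref": s3_uri})
    response = sns_client.publish(TopicArn=input_topic_arn, Message=message)
    return response["MessageId"]


# Example usage (requires boto3 and valid credentials):
#   import boto3
#   wait_until_in_progress(boto3.client("sagemaker"), "example-labeling-job")
#   send_data_object(boto3.client("sns"), input_topic_arn, "s3://bucket/path/image.jpg")
```

Passing the clients in as arguments keeps the helpers independent of how credentials and Regions are configured.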

**Topics**
+ [Use Amazon SNS Topics for Data Labeling](sms-create-sns-input-topic.md)
+ [Creating Amazon S3 based bucket event notifications based on the Amazon SNS defined in your labeling job](sms-streaming-s3-setup.md)
+ [Create a Manifest File (Optional)](sms-streaming-manifest.md)
+ [Create a Streaming Labeling Job with the SageMaker API](sms-streaming-create-labeling-job-api.md)
+ [Stop a Streaming Labeling Job](sms-streaming-stop-labeling-job.md)

# Use Amazon SNS Topics for Data Labeling
<a name="sms-create-sns-input-topic"></a>

You need to create an Amazon SNS input topic to create a streaming labeling job. Optionally, you may provide an Amazon SNS output topic.

When you create an Amazon SNS topic to use in your streaming labeling job, note down the topic Amazon Resource Name (ARN). You use each ARN as the value of the `SnsTopicArn` parameter in `InputConfig` (input topic) or `OutputConfig` (output topic) when you create a labeling job.

## Create an Input Topic
<a name="sms-streaming-input-topic"></a>

Your input topic is used to send new data objects to Ground Truth. To create an input topic, follow the instructions in [Creating an Amazon SNS topic](https://docs.amazonaws.cn/sns/latest/dg/sns-create-topic.html) in the Amazon Simple Notification Service Developer Guide.

Note down your input topic ARN and use it as input for the `CreateLabelingJob` parameter `SnsTopicArn` in `InputConfig`. 

## Create an Output Topic
<a name="sms-streaming-output-topic"></a>

If you provide an output topic, it is used to send notifications when a data object is labeled. When you create a topic, you have the option to add an encryption key. Use this option to add an Amazon Key Management Service customer managed key to your topic to encrypt the output data of your labeling job before it is published to your output topic.

To create an output topic, follow the instructions in [Creating an Amazon SNS topic](https://docs.amazonaws.cn/sns/latest/dg/sns-create-topic.html) in the Amazon Simple Notification Service Developer Guide.

If you add encryption, you must attach additional permissions to the topic. For more information, see [Add Encryption to Your Output Topic (Optional)](#sms-streaming-encryption).

**Important**  
To add a customer managed key to your output topic while creating a topic in the console, do not use the **(Default) alias/aws/sns** option. Select a customer managed key that you created. 

Note down your output topic ARN and use it in your `CreateLabelingJob` request in the parameter `SnsTopicArn` in `OutputConfig`. 
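If you prefer to create the topics programmatically, the following is a minimal sketch using the SDK for Python (Boto3). The helper name `create_topic` and the topic names in the usage comment are hypothetical, and the client is assumed to come from `boto3.client('sns')` in the same Region as your labeling job. The sketch only creates the topics and records their ARNs; it does not add the access permissions that Ground Truth requires.

```python
def create_topic(sns_client, name, kms_key_id=None):
    """Create an Amazon SNS topic and return its ARN.

    If kms_key_id is given, it is set as the topic's KmsMasterKeyId
    attribute so that the topic uses that customer managed key.
    """
    kwargs = {"Name": name}
    if kms_key_id:
        kwargs["Attributes"] = {"KmsMasterKeyId": kms_key_id}
    return sns_client.create_topic(**kwargs)["TopicArn"]


# Example usage (requires boto3 and valid credentials):
#   import boto3
#   sns = boto3.client("sns")
#   input_topic_arn = create_topic(sns, "your-sns-input-topic")
#   output_topic_arn = create_topic(sns, "your-sns-output-topic",
#                                   kms_key_id="your_key_id")
```

The returned ARNs are the values you pass as `SnsTopicArn` in `InputConfig` and `OutputConfig`.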

### Add Encryption to Your Output Topic (Optional)
<a name="sms-streaming-encryption"></a>

To encrypt messages published to your output topic, you need to provide an Amazon KMS customer managed key to your topic. Modify the following policy and add it to your customer managed key to give Ground Truth permission to encrypt output data before publishing it to your output topic.

Replace the example account ID `111122223333` with the ID of the account that you are using to create your topic. To learn how to find your Amazon account ID, see [Finding Your Amazon Account ID](https://docs.amazonaws.cn/IAM/latest/UserGuide/console_account-alias.html#FindingYourAWSId). 

```
{
    "Id": "key-console-policy",
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Enable IAM User Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws-cn:iam::111122223333:root"
            },
            "Action": "kms:*",
            "Resource": "*"
        },
        {
            "Sid": "Allow access for Key Administrators",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws-cn:iam::111122223333:role/Admin"
            },
            "Action": [
                "kms:Create*",
                "kms:Describe*",
                "kms:Enable*",
                "kms:List*",
                "kms:Put*",
                "kms:Update*",
                "kms:Revoke*",
                "kms:Disable*",
                "kms:Get*",
                "kms:Delete*",
                "kms:TagResource",
                "kms:UntagResource",
                "kms:ScheduleKeyDeletion",
                "kms:CancelKeyDeletion"
            ],
            "Resource": "*"
        }
    ]
}
```


Additionally, you must modify and add the following policy to the execution role that you use to create your labeling job (the input value for `RoleArn`). 

Replace the example account ID `111122223333` with the ID of the account that you are using to create your topic. Replace `us-east-1` with the Amazon Region you are using to create your labeling job. Replace `your_key_id` with your customer managed key ID.

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "sid1",
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt",
                "kms:GenerateDataKey"
            ],
            "Resource": "arn:aws-cn:kms:us-east-1:111122223333:key/your_key_id"
        }
    ]
}
```


For more information on creating and securing keys, see [Creating Keys](https://docs.amazonaws.cn/kms/latest/developerguide/create-keys.html) and [Using Key Policies](https://docs.amazonaws.cn/kms/latest/developerguide/key-policies.html) in the Amazon Key Management Service Developer Guide.
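As an illustration, the execution role policy shown earlier in this section can be generated and attached with the SDK for Python (Boto3). This is a sketch under stated assumptions: the helper names and the inline policy name `GroundTruthOutputTopicKms` are hypothetical, the `aws-cn` partition default mirrors the ARN used in the policy example, and the IAM client is assumed to come from `boto3.client('iam')`.

```python
import json


def build_kms_role_policy(account_id, region, key_id, partition="aws-cn"):
    """Return the execution role policy document for one customer managed key."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "sid1",
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
                "Resource": f"arn:{partition}:kms:{region}:{account_id}:key/{key_id}",
            }
        ],
    }


def attach_kms_role_policy(iam_client, role_name, policy,
                           policy_name="GroundTruthOutputTopicKms"):
    """Attach the policy to the execution role as an inline policy."""
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,  # hypothetical name, choose your own
        PolicyDocument=json.dumps(policy),
    )


# Example usage (requires boto3 and valid credentials):
#   import boto3
#   policy = build_kms_role_policy("111122223333", "us-east-1", "your_key_id")
#   attach_kms_role_policy(boto3.client("iam"), "your-execution-role", policy)
```

Building the document in code avoids hand-editing the account ID, Region, and key ID in the JSON.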

## Subscribe an Endpoint to Your Amazon SNS Output Topic
<a name="sms-streaming-subscribe-output-topic"></a>

When a worker completes a labeling job task from a Ground Truth streaming labeling job, Ground Truth uses your output topic to publish output data to one or more endpoints that you specify. To receive notifications when a worker finishes a labeling task, you must subscribe an endpoint to your Amazon SNS output topic.

To learn how to add endpoints to your output topic, see [Subscribing to an Amazon SNS topic](https://docs.amazonaws.cn/sns/latest/dg/sns-create-subscribe-endpoint-to-topic.html) in the *Amazon Simple Notification Service Developer Guide*.

To learn more about the output data format that is published to these endpoints, see [Labeling job output data](sms-data-output.md). 

**Important**  
If you do not subscribe an endpoint to your Amazon SNS output topic, you will not receive notifications when new data objects are labeled. 
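Subscribing an endpoint can also be done with the SDK for Python (Boto3). The following is a minimal sketch; the helper name `subscribe_endpoint` is hypothetical, and the client is assumed to come from `boto3.client('sns')`. The protocol and endpoint values in the usage comment are placeholders.

```python
def subscribe_endpoint(sns_client, output_topic_arn, protocol, endpoint):
    """Subscribe one endpoint to the output topic and return the subscription ARN.

    protocol is an Amazon SNS protocol name such as "sqs", "email", or
    "https"; endpoint is the matching address (queue ARN, email address, URL).
    """
    response = sns_client.subscribe(
        TopicArn=output_topic_arn,
        Protocol=protocol,
        Endpoint=endpoint,
        ReturnSubscriptionArn=True,  # return an ARN even while confirmation is pending
    )
    return response["SubscriptionArn"]


# Example usage (requires boto3 and valid credentials):
#   import boto3
#   subscribe_endpoint(boto3.client("sns"), output_topic_arn,
#                      "email", "labels@example.com")
```

Some protocols, such as email, require the recipient to confirm the subscription before notifications are delivered.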

# Creating Amazon S3 based bucket event notifications based on the Amazon SNS defined in your labeling job
<a name="sms-streaming-s3-setup"></a>

Event notifications report changes to your Amazon S3 bucket. You can enable them using the Amazon S3 console, the API, language-specific Amazon SDKs, or the Amazon Command Line Interface. Events must use the same Amazon SNS input topic ARN, `SnsTopicArn`, specified in the `InputConfig` parameter as part of your `CreateLabelingJob` request.

**Do not enable Amazon S3 event notifications on your labeling job output location**  
When you create event notifications, do not use the same Amazon S3 location that you specified as your `S3OutputPath` in the `OutputConfig` parameters. If notifications are enabled on the output location, Ground Truth may process unwanted data objects for labeling.

You control the types of events that you want to send to your Amazon SNS topic. Ground Truth creates a labeling task when you send [object creation events](https://docs.amazonaws.cn/AmazonS3/latest/user-guide/enable-event-notifications.html#enable-event-notifications-types).

The event structure sent to your Amazon SNS input topic must be a JSON message formatted using the same structure found in [Event message structure](https://docs.amazonaws.cn/AmazonS3/latest/dev/notification-content-structure.html).
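To make the message shape concrete, the following sketch parses an Amazon S3 event message and lists the objects it announces. The helper name `objects_from_s3_event` is hypothetical, and the sample message in the usage comment is trimmed to the few fields the sketch reads; the linked reference describes the full structure. Which fields Ground Truth itself reads is not stated here, so treat this as an illustration of the format only.

```python
import json
from urllib.parse import unquote_plus


def objects_from_s3_event(message_body):
    """Return the S3 URIs of objects announced by object creation records.

    message_body is the JSON text of an Amazon S3 event notification.
    Object keys are URL-encoded in event messages, so they are decoded here.
    """
    event = json.loads(message_body)
    uris = []
    for record in event.get("Records", []):
        # In the message, event names appear without the "s3:" prefix.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        uris.append(f"s3://{bucket}/{key}")
    return uris


# Example usage with a trimmed sample message:
#   sample = json.dumps({"Records": [{
#       "eventSource": "aws:s3",
#       "eventName": "ObjectCreated:Put",
#       "s3": {"bucket": {"name": "bucket"},
#              "object": {"key": "path/image.jpg"}}}]})
#   objects_from_s3_event(sample)
```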

To see examples of how you can set up an event notification for your Amazon S3 bucket using the Amazon S3 console, Amazon SDK for .NET, and Amazon SDK for Java, follow [Walkthrough: Configure a bucket for notifications (SNS topic or SQS queue)](https://docs.amazonaws.cn/AmazonS3/latest/dev/ways-to-add-notification-config-to-bucket.html) in the *Amazon Simple Storage Service User Guide*.
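For the SDK for Python (Boto3), a comparable configuration might look like the following sketch. The helper names are hypothetical, the optional `prefix` filter is an illustrative addition, and the client is assumed to come from `boto3.client('s3')`. Note that `put_bucket_notification_configuration` replaces the bucket's existing notification configuration, and the topic's access policy must already allow Amazon S3 to publish to it.

```python
def build_notification_configuration(input_topic_arn, prefix=None):
    """Build a configuration that sends object creation events to the topic."""
    topic_config = {
        "TopicArn": input_topic_arn,
        "Events": ["s3:ObjectCreated:*"],
    }
    if prefix:
        # Optional: only notify for keys under this prefix.
        topic_config["Filter"] = {
            "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
        }
    return {"TopicConfigurations": [topic_config]}


def enable_notifications(s3_client, bucket, input_topic_arn, prefix=None):
    """Apply the configuration to the bucket, replacing any existing one."""
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_notification_configuration(
            input_topic_arn, prefix
        ),
    )


# Example usage (requires boto3 and valid credentials):
#   import boto3
#   enable_notifications(boto3.client("s3"), "your-input-bucket",
#                        input_topic_arn, prefix="to-label/")
```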

Amazon EventBridge notifications are not natively supported. To use EventBridge-based notifications, you must update the output format to match the JSON format used in the [Event message structure](https://docs.amazonaws.cn/AmazonS3/latest/dev/notification-content-structure.html).

# Create a Manifest File (Optional)
<a name="sms-streaming-manifest"></a>

When you create a streaming labeling job, you have a one-time option to add objects (such as images or text) to an input manifest file that you specify in `ManifestS3Uri` of `CreateLabelingJob`. When the streaming labeling job starts, these objects are sent to workers, or added to the Amazon SQS queue if the total number of objects exceeds `MaxConcurrentTaskCount`. As workers complete labeling tasks, the results are periodically added to the Amazon S3 path that you specify when creating the labeling job. Output data is also sent to any endpoint that you subscribe to your output topic. 

If you want to provide initial objects to be labeled, create a manifest file that identifies these objects and place it in Amazon S3. Specify the S3 URI of this manifest file in `ManifestS3Uri` within `InputConfig`.

To learn how to format your manifest file, see [Input data](sms-data-input.md). To use the SageMaker AI console to automatically generate a manifest file (not supported for 3D point cloud task types), see [Automate data setup for labeling jobs](sms-console-create-manifest-file.md).
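As a rough illustration, a manifest for objects stored in Amazon S3 can be written as JSON Lines, one object per line, each with a `source-ref` key. The following sketch assumes that format; the helper names are hypothetical, and the S3 client is assumed to come from `boto3.client('s3')`.

```python
import json


def build_manifest(s3_uris):
    """Return manifest text with one JSON object per line."""
    return "".join(json.dumps({"source-ref": uri}) + "\n" for uri in s3_uris)


def upload_manifest(s3_client, bucket, key, s3_uris):
    """Upload the manifest and return the S3 URI to use as ManifestS3Uri."""
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=build_manifest(s3_uris).encode("utf-8"),
    )
    return f"s3://{bucket}/{key}"


# Example usage (requires boto3 and valid credentials):
#   import boto3
#   manifest_uri = upload_manifest(
#       boto3.client("s3"), "bucket", "path/manifest-with-input-data.json",
#       ["s3://bucket/path/image-1.jpg", "s3://bucket/path/image-2.jpg"])
```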

# Create a Streaming Labeling Job with the SageMaker API
<a name="sms-streaming-create-labeling-job-api"></a>

The following is an example of an [Amazon Python SDK (Boto3) request](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_labeling_job) that you can use to start a streaming labeling job for a built-in task type in the US East (N. Virginia) Region. For more details about each parameter, see [https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateLabelingJob.html](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateLabelingJob.html). To learn how you can create a labeling job using this API and associated language-specific SDKs, see [Create a Labeling Job (API)](https://docs.amazonaws.cn/sagemaker/latest/dg/sms-create-labeling-job-api.html).

In this example, note the following parameters:
+ `SnsTopicArn` – This parameter appears in `SnsDataSource` within `InputConfig` and directly in `OutputConfig`, and is used to identify your input and output Amazon SNS topics respectively. To create a streaming labeling job, you are required to provide an Amazon SNS input topic. Optionally, you can also provide an Amazon SNS output topic.
+ `S3DataSource` – This parameter is optional. Use this parameter if you want to include an input manifest file of data objects that you want labeled as soon as the labeling job starts.
+ [https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateLabelingJob.html#sagemaker-CreateLabelingJob-request-StoppingConditions](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateLabelingJob.html#sagemaker-CreateLabelingJob-request-StoppingConditions) – This parameter is ignored when you create a streaming labeling job. To learn more about stopping a streaming labeling job, see [Stop a Streaming Labeling Job](sms-streaming-stop-labeling-job.md).
+ Streaming labeling jobs do not support automated data labeling. Do not include the `LabelingJobAlgorithmsConfig` parameter.

```
import boto3

# Assumes credentials are configured. The Region matches the ARNs below.
client = boto3.client('sagemaker', region_name='us-east-1')

# Replace every placeholder value (ARNs, S3 URIs, 'string', 123) before running.
response = client.create_labeling_job(
    LabelingJobName='example-labeling-job',
    LabelAttributeName='label',
    InputConfig={
        'DataSource': {
            # Optional: objects to label as soon as the job starts.
            'S3DataSource': {
                'ManifestS3Uri': 's3://bucket/path/manifest-with-input-data.json'
            },
            # Required for a streaming labeling job: the input topic.
            'SnsDataSource': {
                'SnsTopicArn': 'arn:aws:sns:us-east-1:123456789012:your-sns-input-topic'
            }
        },
        'DataAttributes': {
            # Include only the classifiers that apply to your data.
            'ContentClassifiers': [
                'FreeOfPersonallyIdentifiableInformation',
                'FreeOfAdultContent',
            ]
        }
    },
    OutputConfig={
        'S3OutputPath': 's3://bucket/path/file-to-store-output-data',
        'KmsKeyId': 'string',
        # Optional: the output topic.
        'SnsTopicArn': 'arn:aws:sns:us-east-1:123456789012:your-sns-output-topic'
    },
    RoleArn='arn:aws:iam::*:role/*',
    LabelCategoryConfigS3Uri='s3://bucket/path/label-categories.json',
    HumanTaskConfig={
        'WorkteamArn': 'arn:aws:sagemaker:us-east-1:*:workteam/private-crowd/*',
        'UiConfig': {
            'UiTemplateS3Uri': 's3://bucket/path/custom-worker-task-template.html'
        },
        'PreHumanTaskLambdaArn': 'arn:aws:lambda:us-east-1:432418664414:function:PRE-tasktype',
        'TaskKeywords': [
            'Example key word',
        ],
        'TaskTitle': 'Multi-label image classification task',
        'TaskDescription': 'Select all labels that apply to the images shown',
        'NumberOfHumanWorkersPerDataObject': 123,
        'TaskTimeLimitInSeconds': 123,
        'TaskAvailabilityLifetimeInSeconds': 123,
        'MaxConcurrentTaskCount': 123,
        'AnnotationConsolidationConfig': {
            'AnnotationConsolidationLambdaArn': 'arn:aws:lambda:us-east-1:432418664414:function:ACS-tasktype'
        }
    },
    Tags=[
        {
            'Key': 'string',
            'Value': 'string'
        },
    ]
)
```

# Stop a Streaming Labeling Job
<a name="sms-streaming-stop-labeling-job"></a>

You can manually stop your streaming labeling job using the operation [StopLabelingJob](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_StopLabelingJob.html). 

If your labeling job remains idle for over 10 days, it is automatically stopped by Ground Truth. In this context, a labeling job is considered *idle* if no objects are sent to the Amazon SNS input topic and no objects remain in your Amazon SQS queue, waiting to be labeled. For example, if no data objects are fed to the Amazon SNS input topic and all the objects fed to the labeling job are already labeled, Ground Truth starts a timer. After the timer starts, if no items are received within a 10-day period, the labeling job is stopped. 

When a labeling job is stopped, its status is `Stopping` while Ground Truth cleans up labeling job resources and unsubscribes your Amazon SNS topic from your Amazon SQS queue. The Amazon SQS queue is *not* deleted by Ground Truth because this queue may contain unprocessed data objects. You should manually delete the queue if you want to avoid incurring additional charges from Amazon SQS. To learn more, see [Amazon SQS pricing](https://aws.amazon.com/sqs/pricing/).
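The stop-and-clean-up steps above could be scripted along these lines with the SDK for Python (Boto3). This is a sketch, not a definitive procedure: the helper names are hypothetical, the clients are assumed to come from `boto3.client('sagemaker')` and `boto3.client('sqs')`, and you must supply the queue URL yourself. Deleting the queue discards any unprocessed data objects it still holds.

```python
import time


def stop_job(sagemaker_client, job_name, poll_seconds=30, max_polls=60,
             sleep=time.sleep):
    """Request a stop, then poll until the job reaches a final state.

    Returns the final status ("Stopped", "Failed", or "Completed").
    Raises TimeoutError if the job has not finished after max_polls attempts.
    """
    sagemaker_client.stop_labeling_job(LabelingJobName=job_name)
    for _ in range(max_polls):
        status = sagemaker_client.describe_labeling_job(
            LabelingJobName=job_name
        )["LabelingJobStatus"]
        if status in ("Stopped", "Failed", "Completed"):
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"{job_name} did not stop after {max_polls} polls")


def delete_queue(sqs_client, queue_url):
    """Delete the job's Amazon SQS queue to avoid further charges."""
    sqs_client.delete_queue(QueueUrl=queue_url)


# Example usage (requires boto3 and valid credentials):
#   import boto3
#   if stop_job(boto3.client("sagemaker"), "example-labeling-job") == "Stopped":
#       delete_queue(boto3.client("sqs"), queue_url)
```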