

# Use Amazon SageMaker Ground Truth Plus to Label Data

Amazon SageMaker Ground Truth Plus is a turnkey data labeling service that uses an expert workforce to deliver high-quality annotations quickly and reduces costs by up to 40%. Using SageMaker Ground Truth Plus, data scientists and business managers, such as data operations managers and program managers, can create high-quality training datasets without having to build labeling applications and manage labeling workforces on their own. You can get started with Amazon SageMaker Ground Truth Plus by uploading data along with the labeling requirements in Amazon S3. 
<a name="why-use-gtp"></a>
**Why use SageMaker Ground Truth Plus?**  
To train a machine learning (ML) model, data scientists need large, high-quality, labeled datasets. As ML adoption grows, labeling needs increase. This forces data scientists to spend weeks building data labeling workflows and managing a data labeling workforce, which slows down innovation and increases cost. To ensure data scientists can spend their time building, training, and deploying ML models, data scientists typically task other in-house teams, consisting of data operations managers and program managers, with producing high-quality training datasets. However, these teams typically don't have access to the skills required to deliver high-quality training datasets, which affects ML results. As a result, you look for a data labeling partner that can help you create high-quality training datasets at scale without consuming your in-house resources.

When you upload the data, SageMaker Ground Truth Plus sets up the data labeling workflows and operates them on your behalf. From there, an expert workforce trained on a variety of machine learning (ML) tasks performs data labeling. SageMaker Ground Truth Plus currently offers two types of expert workforce: an Amazon employed workforce and a curated list of third-party vendors. SageMaker Ground Truth Plus provides you with the flexibility to choose the labeling workforce. Amazon experts select the best labeling workforce based on your project requirements. For example, if you need people proficient in labeling audio files, specify that in the guidelines provided to SageMaker Ground Truth Plus, and the service automatically selects labelers with those skills.

**Important**  
SageMaker Ground Truth Plus does not support PHI, PCI or FedRAMP certified data, and you should not provide this data to SageMaker Ground Truth Plus. 
<a name="how-it-works-gtp"></a>
**How does SageMaker Ground Truth Plus work?**  
There are five main components to a workflow.
+ Requesting a project
+ Creating a project team
+ Accessing the project portal to monitor progress of training datasets and review labeled data
+ Creating a batch
+ Receiving the labeled data
<a name="how-do-i-use-gtp"></a>
**How do I use SageMaker Ground Truth Plus?**  
If you are a first-time user of SageMaker Ground Truth Plus, see [Getting Started with Amazon SageMaker Ground Truth Plus](gtp-getting-started.md). To access SageMaker Ground Truth Plus using the SageMaker AI console, you must be in US East (N. Virginia) (`us-east-1`).

# Getting Started with Amazon SageMaker Ground Truth Plus


This guide demonstrates how to satisfy the SageMaker Ground Truth Plus prerequisites, start an Amazon SageMaker Ground Truth Plus project, and review labels.

To get started using SageMaker Ground Truth Plus, review [Set up Amazon SageMaker Ground Truth Plus Prerequisites](gtp-getting-started-prerequisites.md) and [Core Components of Amazon SageMaker Ground Truth Plus](gtp-getting-started-core-components.md).

# Set up Amazon SageMaker Ground Truth Plus Prerequisites


The following page describes how to sign up for an Amazon account and configure an administrative user in your account. If you already have an Amazon account and an administrative user set up, you can skip this page.

## Sign up for an Amazon Web Services account


If you do not have an Amazon Web Services account, use the following procedure to create one.

**To sign up for Amazon Web Services**

1. Open [http://www.amazonaws.cn/](http://www.amazonaws.cn/) and choose **Sign Up**.

1. Follow the on-screen instructions.

Amazon sends you a confirmation email after the sign-up process is complete. At any time, you can view your current account activity and manage your account by going to [http://www.amazonaws.cn/](http://www.amazonaws.cn/) and choosing **My Account**.

## Secure IAM users


After you sign up for an Amazon Web Services account, safeguard your administrative user by turning on multi-factor authentication (MFA). For instructions, see [Enable a virtual MFA device for an IAM user (console)](https://docs.amazonaws.cn/IAM/latest/UserGuide/id_credentials_mfa_enable_virtual.html#enable-virt-mfa-for-iam-user) in the *IAM User Guide*.

To give other users access to your Amazon Web Services account resources, create IAM users. To secure your IAM users, turn on MFA and only give the IAM users the permissions needed to perform their tasks.

For more information about creating and securing IAM users, see the following topics in the *IAM User Guide*: 
+ [Creating an IAM user in your Amazon Web Services account](https://docs.amazonaws.cn//IAM/latest/UserGuide/id_users_create.html)
+ [Access management for Amazon resources](https://docs.amazonaws.cn/IAM/latest/UserGuide/access.html)
+ [Example IAM identity-based policies](https://docs.amazonaws.cn/IAM/latest/UserGuide/access_policies_examples.html)

# Core Components of Amazon SageMaker Ground Truth Plus


The following terms are key to understanding the capabilities of SageMaker Ground Truth Plus:
+ **Project**: Each qualified engagement with an Amazon expert results in a SageMaker Ground Truth Plus project. A project can be in the pilot or production stage.
+ **Batch**: A batch is a collection of similar recurring data objects such as images, video frames and text to be labeled. A project can have multiple batches.
+ **Metrics**: Metrics are data about your SageMaker Ground Truth Plus project for a specific date or over a date range.
+ **Task type**: SageMaker Ground Truth Plus supports five task types for data labeling: text, image, video, audio, and 3D point cloud. You can also have a custom task type.
+ **Data objects**: Individual items that are to be labeled.

# Request a Project


Requesting a new Amazon SageMaker Ground Truth Plus project initiates the engagement with the SageMaker Ground Truth Plus team, which works to understand your requirements and deliver a high-quality, labeled dataset that is tailored to your use case. In the project request, you can provide details about your labeling task, such as the task type, the dataset size, and whether the data is sensitive. You also need to specify an Amazon IAM role with permissions for SageMaker Ground Truth Plus to access your data and perform the labeling job. The following page shows you how to create a new project request using the SageMaker AI console.

To request a project, do the following:

1. Under the Ground Truth tab of Amazon SageMaker AI, choose **Plus**.

1. On the **SageMaker Ground Truth Plus** page, choose **Request project**.

1. A page titled **Request a project** opens. The page includes fields for **General information** and **Project overview**. Enter the following information:

   1. Under **General information**, enter your **First name**, **Last name** and **Business email address**. An Amazon expert uses this information for contacting you to discuss the project after you submit the request.

   1. Under **Project overview**, enter your **Project name** and **Project description**. Choose the **Task type** based on your data and use case. You can also indicate if your data contains personally identifiable information (PII). 

   1. Create or select an IAM role that grants SageMaker Ground Truth Plus permissions to perform a labeling job by choosing one of the options below. 

      1. You can **Create an IAM role** that provides access to any S3 bucket you specify.

      1. You can **Enter a custom IAM role ARN**.

      1. You can choose an existing role.

      1. If you use an existing role or a custom IAM role ARN, make sure you have the following IAM role and trust policy.

         IAM role

         ```
         {
             "Version": "2012-10-17",
             "Statement": [
                 {
                     "Effect": "Allow",
                     "Action": [
                         "s3:GetObject",
                         "s3:GetBucketLocation",
                         "s3:ListBucket",
                         "s3:PutObject"
                     ],
                     "Resource": [
                         "arn:aws-cn:s3:::your-bucket-name",
                         "arn:aws-cn:s3:::your-bucket-name/*"
                     ]
                 }
             ]
         }
         ```

         Trust policy

         ```
         {
             "Version": "2012-10-17",
             "Statement": [
                 {
                     "Effect": "Allow",
                     "Principal": {
                         "Service": "sagemaker-ground-truth-plus.amazonaws.com"
                     },
                     "Action": "sts:AssumeRole"
                 }
             ]
         }
         ```

1. Choose **Request a project**.

Once you create a project, you can find it on the **SageMaker Ground Truth Plus** page, under the Projects section. The project status should be **Review in progress**.

**Note**  
You cannot have more than 5 projects with the **Review in progress** status.
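
If you prefer to generate the two policy documents from this page programmatically instead of editing JSON by hand, a minimal Python sketch might look like the following. The function names `build_permissions_policy` and `build_trust_policy` are hypothetical, and `your-bucket-name` is a placeholder. The default `aws-cn` partition matches the ARNs shown on this page and is an assumption you may need to change.

```python
import json


def build_permissions_policy(bucket_name, partition="aws-cn"):
    """Return the S3 permissions policy from this page for one bucket."""
    bucket_arn = f"arn:{partition}:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:GetBucketLocation",
                    "s3:ListBucket",
                    "s3:PutObject",
                ],
                # The bucket itself and every object inside it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


def build_trust_policy():
    """Return the trust policy that lets the service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "sagemaker-ground-truth-plus.amazonaws.com"
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Print both documents so they can be pasted into the IAM console.
    print(json.dumps(build_permissions_policy("your-bucket-name"), indent=4))
    print(json.dumps(build_trust_policy(), indent=4))
```

With the AWS SDK for Python (Boto3), the trust policy JSON would typically be passed as `AssumeRolePolicyDocument` to `create_role`, and the permissions policy as `PolicyDocument` to `put_role_policy`. That step is left out here because it requires credentials and an account.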

# Create a Project Team


A project team provides access to the members from your organization or team to track projects, view metrics, and review annotations. You can create a SageMaker Ground Truth Plus project team once you have shared your data in an Amazon S3 bucket.

To add team members using Amazon Cognito, you have two options:

1. Create a new Amazon Cognito user group

   1. Enter an **Amazon Cognito user group name**. This name cannot be changed.

   1. Enter the email addresses of up to 50 team members in the **Email addresses** field. The addresses must be separated by a comma.

   1. Choose **Create project team**.  
![\[Example Create project team section in the console.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/gtb-project-team.png)

   1. Your team members receive an email inviting them to join the SageMaker Ground Truth Plus project team as shown in the following image.   
![\[Example Preview invitation email.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/gtb-email-preview.png)

1. Import team members from existing Amazon Cognito user groups.

   1. Choose a user pool that you have created. User pools require a domain and an existing user group. If you get an error that the domain is missing, set it in the **Domain name** options on the **App integration** page of the Amazon Cognito console for your group.

   1. Choose an app client. We recommend using a client generated by Amazon SageMaker AI.

   1. Choose a user group from your pool to import its members.

   1. Choose **Create project team**.

You can view and manage the list of team members through the Amazon console.

**To add team members after creating the project team:**

1. Choose **Invite new members** in the **Members** section.

1. Enter the email addresses of up to 50 team members in the **Email addresses** field. The addresses must be separated by a comma.

1. Choose **Invite new members**.

**To delete existing team members:**

1. Choose the team member to be deleted in the **Members** section.

1. Choose **Delete**.

Once you have added members to your project team, you can open the project portal to access your projects.
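
Before pasting a long list into the **Email addresses** field, it can help to check it against the limits described above (comma-separated, at most 50 addresses). The following is a minimal sketch; `parse_invite_list` is a hypothetical helper, and its `@` check is a loose sanity check rather than full email validation.

```python
MAX_INVITES = 50  # limit stated on this page


def parse_invite_list(raw):
    """Split a comma-separated string into a list of email addresses.

    Raises ValueError if the list is empty, has more than MAX_INVITES
    entries, or contains an entry without a local part, "@", and domain.
    """
    addresses = [part.strip() for part in raw.split(",") if part.strip()]
    if not addresses:
        raise ValueError("no email addresses provided")
    if len(addresses) > MAX_INVITES:
        raise ValueError(
            f"{len(addresses)} addresses given; the limit is {MAX_INVITES}"
        )
    for address in addresses:
        local, sep, domain = address.partition("@")
        if not (local and sep and domain):
            raise ValueError(f"not a valid email address: {address!r}")
    return addresses
```

For example, `parse_invite_list("ana@example.com, li@example.com")` returns a two-item list, while a string with 51 addresses raises `ValueError`.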

# Project Portal


Once you have successfully submitted the intake form and created a project team, you can access the SageMaker Ground Truth Plus project by choosing **Open project portal** on the Amazon console.

Each project consists of one or more batches. A *batch* is a collection of recurring similar data objects (text, image, video frame, and point cloud) to be labeled. The project portal provides you with transparency into the data labeling process. You can stay updated about a project, create batches within a project, review the progress of the datasets across multiple projects, and analyze project metrics. The project portal also allows you to review a subset of the labeled data and provide feedback. You can configure the columns displayed in your project and batch table.

![\[The project portal for Amazon SageMaker Ground Truth Plus.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/gtp-project-how-it-works.png)


You can use the SageMaker Ground Truth Plus project portal to track the following details about your project.

**Project name**: Each project is identified using a unique name.

**Status**: A SageMaker Ground Truth Plus project has one of the following status types:

1. **Review in progress**: You have successfully submitted the project request form. An Amazon expert is currently reviewing your request.

1. **Request approved**: Your project request is approved. You can now share your data by creating a new batch from the project portal.

1. **Workflow design and setup progress**: An Amazon expert is setting up your project.

1. **Pilot in-progress**: Object labeling for the project in the pilot stage is currently in progress.

1. **Pilot complete**: Object labeling is complete and the labeled data is stored in your Amazon S3 bucket.

1. **Pricing complete**: An Amazon expert shares the pricing for the production project with you.

1. **Contract executed**: The contract is complete.

1. **Production in-progress**: Labeling for the project in the production stage is in progress.

1. **Production complete**: Object labeling is complete and the labeled data is stored in your Amazon S3 bucket.

1. **Paused**: Project is currently paused at your request.

**Task type**: SageMaker Ground Truth Plus lets you label five types of tasks that include text, image, video, audio, and point cloud.

**Batches**: Total number of batches within a project.

**Project creation date**: Starting date of a project.

**Total objects**: Total number of objects to be labeled across all batches.

**Objects completed**: Number of labeled objects.

**Remaining objects**: Number of objects left to be labeled.

**Failed objects**: Number of objects that cannot be labeled due to an issue with the input data.

# Create a Batch


You can use the project portal to create batches for a project after the project status is changed to **Request approved**.

![\[The intake form to create a batch using Amazon SageMaker Ground Truth Plus.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/gtp-create-batch.png)


To create a batch, do the following.

1. Select a project by choosing the project name.

1. A page titled with the project name opens. Under the **Batches** section, choose **Create batch**.

1. Enter the **Batch name**, **Batch description**, **S3 location for input datasets**, and **S3 location for output datasets**.

1. Choose **Submit**.

**To create a batch successfully, make sure you meet the following criteria:**
+ Your data is in the US East (N. Virginia) Region.
+ The maximum size for each file is no more than 2 gigabytes.
+ The maximum number of files in a batch is 10,000.
+ The total size of a batch is less than 100 gigabytes.
+ You have no more than 5 batches with the **Data transfer in-progress** status.

**Note**  
You cannot create a batch before the project status changes to **Request approved**.
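
The size and count limits listed above can be checked locally before you submit a batch. The following Python sketch is one way to do that, under stated assumptions: `check_batch` is a hypothetical helper, the input is a mapping of object key to size in bytes that you collect yourself, and a gigabyte is taken as 10^9 bytes because this page does not say whether GB or GiB is meant. It does not check the Region or the number of batches in **Data transfer in-progress** status.

```python
GB = 10**9  # assumption: decimal gigabytes

MAX_FILE_BYTES = 2 * GB     # each file: no more than 2 GB
MAX_FILES = 10_000          # at most 10,000 files per batch
MAX_BATCH_BYTES = 100 * GB  # total size: less than 100 GB


def check_batch(file_sizes):
    """Return a list of problems for a batch; an empty list means it fits.

    file_sizes maps an object key to its size in bytes.
    """
    problems = []
    if len(file_sizes) > MAX_FILES:
        problems.append(
            f"{len(file_sizes)} files exceeds the limit of {MAX_FILES}"
        )
    for key, size in file_sizes.items():
        if size > MAX_FILE_BYTES:
            problems.append(f"{key} is larger than 2 GB")
    if sum(file_sizes.values()) >= MAX_BATCH_BYTES:
        problems.append("total batch size is not under 100 GB")
    return problems
```

Returning a list of problems instead of raising on the first one lets you fix everything in a single pass before creating the batch.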

# Batch Metrics


Metrics are data about your SageMaker Ground Truth Plus project for a specific date or over a date range.

You can review metrics for all batches or choose a batch of your choice as shown in the following image.

![\[Example histograms of metrics for your batches in the console.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/gtb-review-metrics.png)


You can review the following metrics about the batch:

**Total objects**: Total number of objects in a batch or across all batches.

**Objects completed by day**: Total number of objects labeled on a specific date or over a date range.

**Labels completed by day**: Total number of labels completed on a specific date or over a date range. An object can have more than one label.
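
To make the "specific date or over a date range" wording concrete, the following Python sketch sums per-day counts over a range. It is only an illustration: `objects_completed` and the sample figures are hypothetical, and treating both ends of the range as inclusive is an assumption, since this page does not describe how the portal handles range boundaries.

```python
from datetime import date


def objects_completed(daily_counts, start, end):
    """Sum per-day completion counts for start <= day <= end (inclusive)."""
    if start > end:
        raise ValueError("start must not be after end")
    return sum(
        count for day, count in daily_counts.items() if start <= day <= end
    )


# Hypothetical sample data: objects completed on each day.
sample = {
    date(2024, 1, 1): 120,
    date(2024, 1, 2): 95,
    date(2024, 1, 3): 140,
}
```

Passing the same date as `start` and `end` gives the single-day figure, and the same function works for label counts if you supply labels per day instead of objects per day.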

# Batch Details


Every Amazon SageMaker Ground Truth Plus project consists of one or more batches. Each batch is made up of data objects to be labeled. You can view all the batches for your project using the project portal as shown in the following image. 

![\[Example batches for your project in the project portal.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/gtb-review-batch.png)


You can use the SageMaker Ground Truth Plus project portal to track the following details about every batch: 

**Batch name**: Each batch is identified with a unique batch name.

**Status**: A SageMaker Ground Truth Plus batch has one of the following status types:

1. **Request submitted**: You have successfully submitted a new batch.

1. **Data transfer failed**: Data transfer failed with errors. Check the error reason and create a new batch after fixing the error.

1. **Data received**: We have received your unlabeled input data.

1. **In-progress**: Data labeling is in progress.

1. **Ready for review**: Data labeling is completed. A subset of labeled objects from the batch is ready for you to review. This is an optional step.

1. **Review submission in-progress**: Review feedback is currently being processed.

1. **Review complete**: You have successfully reviewed the batch. Next, you have to accept or reject it. This action cannot be undone.

1. **Accepted**: You have accepted the labeled data and will receive it in your Amazon S3 bucket shortly.

1. **Rejected**: Labeled data needs to be reworked.

1. **Sent for rework**: Labeled data is sent for rework. You can review the batch after its status changes to **Ready for review**.

1. **Ready for delivery**: Labeled data is ready to be transferred to your Amazon S3 bucket.

1. **Data delivered**: Object labeling is complete and the labeled data is stored in your Amazon S3 bucket.

1. **Paused**: Batch is paused at your request.

**Task type**: SageMaker Ground Truth Plus lets you label five types of tasks that include text, image, video, audio, and point cloud.

**Batch creation date**: Date when the batch was created.

**Total objects**: Total number of objects to be labeled across a batch.

**Completed objects**: Number of labeled objects.

**Remaining objects**: Number of objects left to be labeled.

**Failed objects**: Number of objects that cannot be labeled due to an issue with the input data.

**Objects to review**: Number of objects that are ready for your review.

**Objects with feedback**: Number of objects that have received feedback from team members.
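
Only a few of the batch statuses listed above call for action on your side. The following Python sketch captures that as a lookup table. The status labels are copied from this page, but everything else (the `BatchStatus` enum, `CUSTOMER_ACTIONS`, and `next_customer_action`) is hypothetical and is not part of any SageMaker Ground Truth Plus API.

```python
from enum import Enum


class BatchStatus(Enum):
    REQUEST_SUBMITTED = "Request submitted"
    DATA_TRANSFER_FAILED = "Data transfer failed"
    DATA_RECEIVED = "Data received"
    IN_PROGRESS = "In-progress"
    READY_FOR_REVIEW = "Ready for review"
    REVIEW_SUBMISSION_IN_PROGRESS = "Review submission in-progress"
    REVIEW_COMPLETE = "Review complete"
    ACCEPTED = "Accepted"
    REJECTED = "Rejected"
    SENT_FOR_REWORK = "Sent for rework"
    READY_FOR_DELIVERY = "Ready for delivery"
    DATA_DELIVERED = "Data delivered"
    PAUSED = "Paused"


# Statuses whose descriptions on this page ask you to do something.
CUSTOMER_ACTIONS = {
    BatchStatus.DATA_TRANSFER_FAILED:
        "Check the error reason, fix it, and create a new batch.",
    BatchStatus.READY_FOR_REVIEW:
        "Optionally review the sample of labeled objects.",
    BatchStatus.REVIEW_COMPLETE:
        "Accept or reject the batch (cannot be undone).",
}


def next_customer_action(status_label):
    """Return the action for a status label, or None if none is needed."""
    status = BatchStatus(status_label)  # raises ValueError if unknown
    return CUSTOMER_ACTIONS.get(status)
```

For instance, `next_customer_action("In-progress")` returns `None`, which signals that the batch is waiting on the labeling workforce rather than on you.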

SageMaker Ground Truth Plus lets you review a sample set of your labeled data (determined during the initial consultation call) through the review UI shown in the following image.

![\[A screenshot of the project portal used to review batches.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/gtb-review-ui.png)


The portal allows you and your project team members to review a small sample of the labeled objects for each batch. In the review UI, you can navigate through that sample and provide feedback for each labeled object.

You can perform the following actions using the review UI.
+ Use the arrow controls on the bottom left to navigate through the data objects.
+ You can provide feedback for each object. The **Feedback section** is in the right panel. Choose **Submit** to submit feedback for all images.
+ Use the image controls in the bottom tray to zoom, pan, and control contrast.
+ If you plan on returning to finish up your review, choose **Stop and resume later** on the top right.
+ Choose **Save** to save your progress. Your progress is also autosaved every 15 minutes.
+ To exit the review UI, choose **Close** on the upper right corner of the review UI.
+ You can verify the **Label attributes** and **Frame attributes** on each frame using the panel on the right. You cannot create new objects or modify existing objects in this task.

# Accept or Reject Batches


After you have reviewed a batch, you must choose to accept or reject it.

If you accept a batch, the output from that labeling job is placed in the Amazon S3 bucket that you specify. Once the data is delivered to your S3 bucket, the status of your batch changes from **Accepted** to **Data delivered**.

If you reject a batch, you can provide feedback and explain your reasons for rejecting the batch.

SageMaker Ground Truth Plus allows you to provide feedback at the data object level as well as the batch level. You can provide feedback for data objects through the review UI. You can use the project portal to provide feedback for each batch. When you reject a batch, an Amazon expert contacts you to determine the rework process and the next steps for the batch. 

**Note**  
Accepting or rejecting a batch is a one-time action and cannot be undone. You must either accept or reject every batch in the project.