Input Data - Amazon SageMaker
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Input Data

The input data are the data objects that you send to your workforce to be labeled. There are two ways to send data objects to Ground Truth for labeling:

  • Send a list of data objects that require labeling using an input manifest file.

  • Send individual data objects in real time to a perpetually running, streaming labeling job.

If you have a dataset that needs to be labeled one time, and you do not require an ongoing labeling job, create a standard labeling job using an input manifest file.

If you want to regularly send new data objects to your labeling job after it has started, create a streaming labeling job. When you create a streaming labeling job, you can optionally use an input manifest file to specify a group of data that you want labeled immediately when the job starts. You can continuously send new data objects to a streaming labeling job as long as it is active.

Note

Streaming labeling jobs are only supported through the SageMaker API. You cannot create a streaming labeling job using the SageMaker console.

The following task types have special input data requirements and options: