Make batch predictions

Make batch predictions when you have an entire dataset for which you’d like to generate predictions.

There are two types of batch predictions you can make:

  • Manual batch predictions are one-time predictions that you make on a dataset.

  • Automatic batch predictions run from a configuration that you set up, which starts a batch prediction whenever a specific dataset is updated. For example, if you’ve configured weekly updates to a SageMaker Canvas dataset of inventory data, you can set up automatic batch predictions that run whenever you update the dataset. After setting up an automatic batch predictions workflow, see Manage automations for more information about viewing and editing the details of your configuration. For more information about setting up automatic dataset updates, see Configure automatic updates for a dataset.

Note

You can only set up automatic batch predictions for datasets imported through local upload or Amazon S3. Additionally, automatic batch predictions can only run while you’re logged in to the Canvas application. If you log out of Canvas, automatic batch prediction jobs resume when you log back in.

To get started, review the following section for batch prediction dataset requirements, and then choose one of the following manual or automatic batch prediction workflows.

Batch prediction dataset requirements

For batch predictions, make sure that your datasets meet the requirements outlined in Create a dataset.

You might not be able to make predictions on some datasets because they have incompatible schemas. A schema is an organizational structure. For a tabular dataset, the schema is the names of the columns and the data type of the data in the columns. An incompatible schema might happen for one of the following reasons:

  • The dataset that you're using to make predictions has fewer columns than the dataset that you're using to build the model.

  • The data types in the columns of the dataset that you used to build the model are different from the data types in the dataset that you're using to make predictions.

  • The dataset that you're using to make predictions and the dataset that you've used to build the model have column names that don't match. The column names are case sensitive. Column1 is not the same as column1.

To ensure that you can successfully generate batch predictions, match the schema of your batch predictions dataset to the dataset you used to train the model.
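The three incompatibility reasons above can be checked before you import a dataset. The following is a minimal sketch in Python, not part of Canvas: the function name `find_schema_mismatches`, the dictionary representation of a schema, and the type labels such as `numeric` are all assumptions made for illustration. It assumes you have already written down, for each dataset, a mapping of column name to data type.

```python
def find_schema_mismatches(training_schema, prediction_schema):
    """Compare two schemas given as {column_name: data_type} dictionaries.

    Hypothetical helper for illustration only; Canvas does not expose this.
    Returns a list of human-readable problems. An empty list means that
    every training column is present with the same name and data type.
    """
    problems = []
    for name, dtype in training_schema.items():
        if name not in prediction_schema:
            # Column names are case sensitive, so check whether the only
            # difference is capitalization (Column1 vs. column1).
            lookalikes = [c for c in prediction_schema if c.lower() == name.lower()]
            if lookalikes:
                problems.append(
                    f"column '{name}' differs only by case from '{lookalikes[0]}'"
                )
            else:
                problems.append(f"column '{name}' is missing")
        elif prediction_schema[name] != dtype:
            problems.append(
                f"column '{name}' has type '{prediction_schema[name]}', "
                f"expected '{dtype}'"
            )
    return problems


# Example schemas; the column names and type labels are made up.
training = {"Column1": "numeric", "Region": "categorical", "Date": "datetime"}
prediction = {"column1": "numeric", "Region": "text"}

for problem in find_schema_mismatches(training, prediction):
    print(problem)
```

In this example, the check reports one case-only name mismatch, one data type mismatch, and one missing column. Fix these in the prediction dataset before you start the batch prediction.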

Note

For batch predictions, if you dropped any columns when building your model, Canvas adds the dropped columns back to the prediction results. However, Canvas does not add the dropped columns to your batch predictions for time series models.

Make manual batch predictions

Choose one of the following procedures to make manual batch predictions based on your model type.

Make manual batch predictions with numeric, categorical, and time series forecasting models

To make manual batch predictions for numeric, categorical, and time series forecasting model types, do the following:

  1. In the left navigation pane of the Canvas application, choose My models.

  2. On the My models page, choose your model.

  3. After opening your model, choose the Predict tab.

  4. On the Run predictions page, choose Batch prediction.

  5. Choose Select dataset to pick a dataset for generating predictions.

  6. From the list of available datasets, select your dataset, and then choose Start Predictions to get your predictions.

After the prediction job finishes running, an output dataset is listed on the same page in the Predictions section. This dataset contains your results. If you select the More options icon, you can choose Preview to preview the output data. You can see the input data matched to the prediction and the probability that the prediction is correct. Then, you can choose Download prediction to download the results as a file.
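After you download the results, you might want to review the predictions that the model is least sure about. The following is a minimal sketch in Python that assumes the downloaded file is a CSV with a column holding the probability for each prediction. The column names `customer_id`, `prediction`, and `probability`, the function name, and the threshold are hypothetical; check the header of your own file and adjust them.

```python
import csv
import io


def low_confidence_rows(csv_text, probability_column="probability", threshold=0.6):
    """Return the rows whose probability is below the threshold.

    Hypothetical helper for illustration only. The column name is an
    assumption; use the name that appears in your downloaded file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if float(row[probability_column]) < threshold]


# Sample content standing in for a downloaded results file (made-up data).
sample = (
    "customer_id,prediction,probability\n"
    "1,yes,0.91\n"
    "2,no,0.55\n"
    "3,yes,0.48\n"
)

for row in low_confidence_rows(sample):
    print(row["customer_id"], row["prediction"], row["probability"])
```

To use the sketch with a real file, read the file contents into a string and pass it as `csv_text`.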

Make manual batch predictions with image prediction models

To make manual batch predictions for a single-label image prediction model, do the following:

  1. In the left navigation pane of the Canvas application, choose My models.

  2. On the My models page, choose your model.

  3. After opening your model, choose the Predict tab.

  4. On the Run predictions page, choose Batch prediction.

  5. Choose Select dataset if you’ve already imported your dataset. If not, choose Import new dataset, and then you’ll be directed through the import data workflow.

  6. From the list of available datasets, select your dataset and choose Generate predictions to get your predictions.

After the prediction job finishes running, on the Run predictions page, you see an output dataset listed under Predictions. This dataset contains your results. If you select the More options icon, you can choose View prediction results to see the output data. You can see the images along with their predicted labels and confidence scores. Then, you can choose Download prediction to download the results as a CSV or a ZIP file.

Make manual batch predictions with text prediction models

To make manual batch predictions for a multi-category text prediction model, do the following:

  1. In the left navigation pane of the Canvas application, choose My models.

  2. On the My models page, choose your model.

  3. After opening your model, choose the Predict tab.

  4. On the Run predictions page, choose Batch prediction.

  5. Choose Select dataset if you’ve already imported your dataset. If not, choose Import new dataset, and then you’ll be directed through the import data workflow. The dataset you choose must have the same source column as the dataset with which you built the model.

  6. From the list of available datasets, select your dataset and choose Generate predictions to get your predictions.

After the prediction job finishes running, on the Run predictions page, you see an output dataset listed under Predictions. This dataset contains your results. If you select the More options icon, you can choose Preview to see the output data. You can see the text along with the predicted labels and confidence scores. Then, you can choose Download prediction to download the results.

Make automatic batch predictions

To set up a schedule for automatic batch predictions, do the following:

  1. In the left navigation pane of Canvas, choose My models.

  2. Choose your model.

  3. Choose the Predict tab.

  4. Choose Batch prediction.

  5. For Generate predictions, choose Automatic.

  6. The Automate batch predictions dialog box opens. Choose Select dataset and choose the dataset for which you want to automate predictions. Note that you can only select a dataset that was imported through local upload or Amazon S3.

  7. After selecting a dataset, choose Set up.

Canvas runs a batch predictions job for the dataset after you set up the configuration. Then, every time you update the dataset, either manually or automatically, another batch predictions job runs.

After the prediction job finishes running, on the Run predictions page, you see an output dataset listed under Predictions. This dataset contains your results. If you select the More options icon, you can choose Preview to preview the output data. You can see the input data matched to the prediction and the probability that the prediction is correct. Then, you can choose Download to download the results.

The following sections describe how to view, update, and delete your automatic batch prediction configuration through the Predict tab of your model in the Canvas application. You can set up a maximum of 20 automatic configurations in Canvas. For more information about viewing your automatic batch predictions job history or making changes to your automatic configuration through the Automations page, see Manage automations.

View your automatic batch prediction jobs

To view your job history for your automatic batch predictions, go to the Predict tab of your model.

Each automatic batch prediction job shows up in the Predict tab of your model. Under Predictions, you can see the All jobs and Configuration tabs:

  • All jobs – In this tab, you can see all of the batch prediction jobs for this model. You can filter the jobs by configuration name. For each job, you can see fields such as the Input dataset, which includes the version of the dataset, and the Prediction type, which indicates whether the predictions were automatic or manual. If you choose the More options icon, you can choose View prediction or Download prediction.

  • Configuration – In this tab, you can see all of the automatic batch prediction configurations you’ve created for this model. For each configuration, you can see fields such as the timestamp for when it was Created, the Input dataset it tracks for updates, and the Next job scheduled. If you choose the More options icon, you can choose View all jobs to see the job history and in-progress jobs for the configuration.

Edit your automatic batch prediction configuration

You might want to make changes to your automatic batch prediction configuration, such as changing the dataset that it tracks. You might also want to turn off the configuration to pause your automatic batch predictions.

When you edit a batch prediction configuration, you can change the target dataset but not the frequency (since automatic batch predictions occur whenever the dataset is updated).

To edit your automatic batch prediction configuration, do the following:

  1. Go to the Predict tab of your model.

  2. Under Predictions, choose the Configuration tab.

  3. Find your configuration and choose the More options icon.

  4. From the dropdown menu, choose Update configuration.

  5. The Automate batch prediction dialog box opens. You can select another dataset and choose Set up to save your changes.

Your automatic batch predictions configuration is now updated.

To pause your automatic batch predictions, turn off your automatic configuration by doing the following:

  1. Go to the Predict tab of your model.

  2. Under Predictions, choose the Configuration tab.

  3. Find your configuration from the list and turn off the Auto update toggle.

Automatic batch predictions are now paused. You can turn the toggle back on at any time to resume automatic batch predictions.

Delete your automatic batch prediction configuration

To learn how to delete your automatic batch prediction configuration, see Delete an automatic configuration.

You can also delete your configuration by doing the following:

  1. Go to the Predict tab of your model.

  2. Under Predictions, choose the Configuration tab.

  3. Find your configuration from the list and choose the More options icon.

  4. From the dropdown menu, choose Delete configuration.

Your configuration is now deleted.