Managed Spot Training Lifecycle - Amazon SageMaker AI
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Managed Spot Training Lifecycle

You can monitor a training job using TrainingJobStatus and SecondaryStatus returned by DescribeTrainingJob. The list below shows how TrainingJobStatus and SecondaryStatus values change depending on the training scenario:

  • Spot instances acquired with no interruption during training

    1. InProgress: StartingDownloadingTrainingUploading

  • Spot instances interrupted once. Later, enough spot instances were acquired to finish the training job.

    1. InProgress: StartingDownloadingTrainingInterruptedStartingDownloadingTrainingUploading

  • Spot instances interrupted twice and MaxWaitTimeInSeconds exceeded.

    1. InProgress: StartingDownloadingTrainingInterruptedStartingDownloadingTrainingInterruptedDownloadingTraining

    2. Stopping: Stopping

    3. Stopped: MaxWaitTimeExceeded

  • Spot instances were never launched.

    1. InProgress: Starting

    2. Stopping: Stopping

    3. Stopped: MaxWaitTimeExceeded