Train a Model with Amazon SageMaker - Amazon SageMaker
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Train a Model with Amazon SageMaker

The following diagram shows how you train and deploy a model with Amazon SageMaker. Your training code accesses your training data and outputs model artifacts from an S3 bucket. Then you can make requests to a model endpoint to run inference. You can store both the training and inference container images in an Amazon Elastic Container Registry (ECR).


            Your code interacts with an S3 bucket, endpoint and ECR during model training and deployment.

The following guide highlights two components of SageMaker: model training and model deployment.

To train a model in SageMaker, you create a training job. The training job includes the following information:

  • The URL of the Amazon Simple Storage Service (Amazon S3) bucket where you've stored the training data.

  • The compute resources that you want SageMaker to use for model training. Compute resources are machine learning (ML) compute instances that are managed by SageMaker.

  • The URL of the S3 bucket where you want to store the output of the job.

  • The Amazon Elastic Container Registry path where the training code is stored. For more information, see Docker Registry Paths and Example Code.

Note

Your input dataset must be in the same Amazon Web Services Region as your training job.

You have the following options for a training algorithm:

After you create the training job, SageMaker launches the ML compute instances and uses the training code and the training dataset to train the model. It saves the resulting model artifacts and other output in the S3 bucket you specified for that purpose.

You can create a training job with the SageMaker console or the API. For information about creating a training job with the API, see the CreateTrainingJob API.

When you create a training job with the API, SageMaker replicates the entire dataset on ML compute instances by default. To make SageMaker replicate a subset of the data on each ML compute instance, you must set the S3DataDistributionType field to ShardedByS3Key. You can set this field using the low-level SDK. For more information, see S3DataDistributionType in S3DataSource.

Important

To prevent your algorithm container from contending for memory, we reserve memory for our SageMaker critical system processes on your ML compute instances and therefore you cannot expect to see all the memory for your instance type.