

# Amazon Batch support for SageMaker AI training jobs
<a name="training-job-queues"></a>

An [Amazon Batch job queue](https://docs.amazonaws.cn/batch/latest/userguide/job_queues.html) stores and prioritizes submitted jobs before they run on compute resources. You can submit SageMaker AI training jobs to a job queue to use the serverless job scheduling and prioritization tools that Amazon Batch provides.

## How it works
<a name="training-job-queues-how-it-works"></a>

The following steps describe how to use an Amazon Batch job queue with SageMaker AI training jobs. For more detailed tutorials and example notebooks, see the [Get started](#training-job-queues-get-started) section.
+ Set up Amazon Batch and any necessary permissions. For more information, see [Setting up Amazon Batch](https://docs.amazonaws.cn/batch/latest/userguide/get-set-up-for-aws-batch.html) in the *Amazon Batch User Guide*.
+ Create the following Amazon Batch resources in the console or using the Amazon CLI:
  + [Service environment](https://docs.amazonaws.cn/batch/latest/userguide/service-environments.html) – Contains configuration parameters for integrating with SageMaker AI.
  + [SageMaker AI training job queue](https://docs.amazonaws.cn/batch/latest/userguide/create-sagemaker-job-queue.html) – Integrates with SageMaker AI to submit training jobs.
+ Configure the request for your SageMaker AI training job, including details such as your training container image. To submit a training job to an Amazon Batch queue, you can use the Amazon CLI, the Amazon SDK for Python (Boto3), or the SageMaker AI Python SDK.
+ Submit your training jobs to the job queue. You can use the following options to submit jobs:
  + Use the Amazon Batch [SubmitServiceJob](https://docs.amazonaws.cn/batch/latest/APIReference/API_SubmitServiceJob.html) API.
  + Use the [`aws_batch` module](https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/aws_batch) from the SageMaker AI Python SDK. After creating a `TrainingQueue` object and a model training object (such as an `Estimator` or `ModelTrainer`), you can submit training jobs to the `TrainingQueue` using the `queue.submit()` method.
+ After submitting jobs, view your job queue and job status with the Amazon Batch console, the Amazon Batch [DescribeServiceJob](https://docs.amazonaws.cn/batch/latest/APIReference/API_DescribeServiceJob.html) API, or the SageMaker AI [DescribeTrainingJob](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_DescribeTrainingJob.html) API.
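
The submit and status steps above can be sketched with the Amazon SDK for Python (Boto3). This is a minimal sketch, not a definitive implementation. It assumes that the service environment and SageMaker AI training job queue already exist, and that `SubmitServiceJob` takes the training job request as a JSON string in `serviceRequestPayload`. The helper names (`build_submit_request`, `submit_and_check`), the job and queue names, and every value in angle brackets are hypothetical placeholders that you replace with your own.

```python
import json


def build_submit_request(job_name, job_queue, training_job_request):
    """Build keyword arguments for the Amazon Batch SubmitServiceJob API."""
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "serviceJobType": "SAGEMAKER_TRAINING",
        # The SageMaker AI training job request is passed as a JSON string.
        "serviceRequestPayload": json.dumps(training_job_request),
    }


# Hypothetical training job request. The field names follow the SageMaker AI
# CreateTrainingJob request; all values are placeholders.
training_job_request = {
    "TrainingJobName": "example-training-job",
    "AlgorithmSpecification": {
        "TrainingImage": "<training-container-image-uri>",
        "TrainingInputMode": "File",
    },
    "RoleArn": "<sagemaker-execution-role-arn>",
    "OutputDataConfig": {"S3OutputPath": "s3://<bucket>/<prefix>/"},
    "ResourceConfig": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 30,
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}

request = build_submit_request(
    "example-batch-job",        # hypothetical Amazon Batch job name
    "example-sagemaker-queue",  # hypothetical job queue name
    training_job_request,
)


def submit_and_check(submit_request):
    """Submit the job to the queue and return its ID and current status.

    Requires Boto3 and valid credentials, so it is defined here but not called.
    """
    import boto3  # imported lazily so the rest of the sketch runs without Boto3

    batch = boto3.client("batch")
    job_id = batch.submit_service_job(**submit_request)["jobId"]
    status = batch.describe_service_job(jobId=job_id)["status"]
    return job_id, status
```

Calling `submit_and_check(request)` in an environment with credentials and an existing queue would submit the job and return its current Amazon Batch status. Keeping the request builder separate from the API call makes it possible to inspect or validate the payload before anything is submitted.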

## Cost and availability
<a name="training-job-queues-cost-availability"></a>

For detailed pricing information about training jobs, see [Amazon SageMaker AI pricing](https://www.amazonaws.cn/sagemaker-ai/pricing/). With Amazon Batch, you pay only for the Amazon resources that you use, such as Amazon EC2 instances. For more information, see [Amazon Batch pricing](https://www.amazonaws.cn/batch/pricing/).

You can use Amazon Batch for SageMaker AI training jobs in any Amazon Web Services Region where training jobs are available. For more information, see [Amazon SageMaker AI endpoints and quotas](https://docs.amazonaws.cn/general/latest/gr/sagemaker.html).

To ensure that you have the required capacity when you need it, you can use SageMaker AI Flexible Training Plans (FTP). These plans allow you to reserve capacity for your training jobs. When you combine a plan with the queuing capabilities of Amazon Batch, you can maximize utilization of the reserved capacity for the duration of the plan. For more information, see [Reserve training plans for your training jobs or HyperPod clusters](https://docs.amazonaws.cn/sagemaker/latest/dg/reserve-capacity-with-training-plans.html).

## Get started
<a name="training-job-queues-get-started"></a>

For a tutorial on how to set up an Amazon Batch job queue and submit SageMaker AI training jobs, see [Getting started with Amazon Batch on SageMaker AI](https://docs.amazonaws.cn/batch/latest/userguide/getting-started-sagemaker.html) in the *Amazon Batch User Guide*.

For Jupyter notebooks that show how to use the `aws_batch` module in the SageMaker AI Python SDK, see the [Amazon Batch for SageMaker AI training jobs notebook examples](https://github.com/aws/amazon-sagemaker-examples/tree/default/build_and_train_models/sm-training-queues) in the amazon-sagemaker-examples GitHub repository.