Submitting jobs to a quota share - Amazon Batch

Submitting jobs to a quota share

Quota management job queues require every job to specify a quota share at submission. To submit a job to a quota share, specify quotaShareName in SubmitServiceJob. To limit how many times a job can be preempted before the job attempt enters FAILED, you can optionally supply a preemptionConfiguration (ServiceJobPreemptionConfiguration) at submission and set preemptionRetriesBeforeTermination within it.
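As a rough illustration of the request fields described above, the following sketch assembles a SubmitServiceJob request as a plain Python dictionary. The helper name build_submit_service_job_request is hypothetical. The camelCase keys jobName, jobQueue, serviceJobType, and serviceRequestPayload are assumed from the CLI flags shown later on this page, and placing preemptionConfiguration at the top level of the request is an assumption based on the paragraph above. No request is sent.

```python
import json


def build_submit_service_job_request(
    job_name,
    job_queue,
    quota_share_name,
    training_payload,
    preemption_retries=None,
):
    """Assemble SubmitServiceJob request fields as a plain dict (hypothetical helper)."""
    if not quota_share_name:
        # Quota management job queues require a quota share on every job.
        raise ValueError("quota_share_name is required")

    request = {
        # Key names below are assumed from the CLI flags on this page.
        "jobName": job_name,
        "jobQueue": job_queue,
        "serviceJobType": "SAGEMAKER_TRAINING",
        "quotaShareName": quota_share_name,
        # The payload is passed as a JSON string, not as a nested object.
        "serviceRequestPayload": json.dumps(training_payload),
    }
    if preemption_retries is not None:
        # Optional: cap preemptions before the job attempt enters FAILED.
        # The top-level placement of this field is an assumption.
        request["preemptionConfiguration"] = {
            "preemptionRetriesBeforeTermination": preemption_retries
        }
    return request


request = build_submit_service_job_request(
    job_name="my-sagemaker-training-job",
    job_queue="my-sagemaker-job-queue",
    quota_share_name="my_quota_share",
    training_payload={"TrainingJobName": "sagemaker-training-job-example"},
    preemption_retries=2,
)
print(json.dumps(request, indent=2))
```

If your SDK version exposes these fields, a dictionary like this could be passed as keyword arguments to the client call. Check the API reference for the exact shape before relying on it.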

Prerequisites

Before submitting jobs to a quota share, ensure you have:

Submit a service job to a quota share

The following sections show how to submit a service job to a quota share using either the SageMaker Python SDK or the Amazon CLI.

Submit using the SageMaker Python SDK

The SageMaker Python SDK has built-in support for submitting jobs to a quota management enabled job queue. The following examples show how to create a model trainer, create a training queue, and submit jobs to a quota share. For a complete example, see the full sample notebook on GitHub.

Create a ModelTrainer that defines the training job configuration.

from sagemaker.train.model_trainer import ModelTrainer
from sagemaker.train.configs import SourceCode, Compute, StoppingCondition

# Command that the training container runs
source_code = SourceCode(command="echo 'Hello World'")

model_trainer = ModelTrainer(
    training_image="123456789012.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.5-gpu-py311",
    source_code=source_code,
    base_job_name="my-training-job",
    compute=Compute(instance_type="ml.g5.xlarge", instance_count=1),
    stopping_condition=StoppingCondition(max_runtime_in_seconds=300),
)

Create a TrainingQueue object that references your quota management enabled job queue by name.

from sagemaker.train.aws_batch.training_queue import TrainingQueue

# Reference the existing job queue by name
queue = TrainingQueue("my-sagemaker-job-queue")

Submit jobs to a quota share by calling queue.submit and specifying the quota_share_name. You should set a priority to influence job ordering within the quota share. A real-world ModelTrainer will require inputs so that it has data to train on.

job = queue.submit(
    job_name="my-training-job",
    training_job=model_trainer,
    quota_share_name="my_quota_share",
    priority=3,
    inputs=None,
)

Submit using the Amazon CLI

The following example uses the submit-service-job command to submit a job to a quota share.

aws batch submit-service-job \
    --job-name "my-sagemaker-training-job" \
    --job-queue "my-sagemaker-job-queue" \
    --service-job-type "SAGEMAKER_TRAINING" \
    --quota-share-name "my_quota_share" \
    --timeout-config '{"attemptDurationSeconds":3600}' \
    --scheduling-priority 5 \
    --service-request-payload '{"TrainingJobName": "sagemaker-training-job-example", "AlgorithmSpecification": {"TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.8.0-cpu-py3", "TrainingInputMode": "File", "ContainerEntrypoint": ["sleep", "1"]}, "RoleArn": "arn:aws:iam::123456789012:role/SageMakerExecutionRole", "OutputDataConfig": {"S3OutputPath": "s3://example-bucket/model-output/"}, "ResourceConfig": {"InstanceType": "ml.m5.large", "InstanceCount": 1, "VolumeSizeInGB": 1}}'
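Quoting the JSON payload by hand is easy to get wrong. One possible approach, sketched below, is to build the payload as a Python dictionary, serialize it with json.dumps, and let shlex.join apply POSIX shell quoting to the whole command. The helper name build_cli_command is hypothetical, and the payload is a shortened version of the example above. The flags are the ones used in the CLI example. The command is only printed, not executed.

```python
import json
import shlex

# Shortened training job payload taken from the CLI example on this page.
payload = {
    "TrainingJobName": "sagemaker-training-job-example",
    "AlgorithmSpecification": {
        "TrainingInputMode": "File",
        "ContainerEntrypoint": ["sleep", "1"],
    },
    "ResourceConfig": {
        "InstanceType": "ml.m5.large",
        "InstanceCount": 1,
        "VolumeSizeInGB": 1,
    },
}


def build_cli_command(job_name, job_queue, quota_share_name, payload, priority):
    """Return a shell-quoted submit-service-job command line (hypothetical helper)."""
    args = [
        "aws", "batch", "submit-service-job",
        "--job-name", job_name,
        "--job-queue", job_queue,
        "--service-job-type", "SAGEMAKER_TRAINING",
        "--quota-share-name", quota_share_name,
        "--scheduling-priority", str(priority),
        # Serialize the payload once; shlex.join quotes it for the shell.
        "--service-request-payload", json.dumps(payload),
    ]
    return shlex.join(args)


command = build_cli_command(
    job_name="my-sagemaker-training-job",
    job_queue="my-sagemaker-job-queue",
    quota_share_name="my_quota_share",
    payload=payload,
    priority=5,
)
print(command)
```

Because shlex.join targets POSIX shells, the printed command is meant for bash or zsh. Other shells, such as Windows cmd, use different quoting rules.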