Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Submit a service job in Amazon Batch

To submit service jobs to Amazon Batch, you use the SubmitServiceJob API operation. You can submit jobs using the Amazon CLI or an Amazon SDK.
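As a sketch of the SDK path, the following Python snippet builds the arguments for a SubmitServiceJob request. The helper function and payload values are illustrative placeholders; the boto3 call is shown commented out and assumes an SDK version that includes the submit_service_job operation.

```python
import json

# Build the arguments for SubmitServiceJob. The serviceRequestPayload is a
# JSON string that Amazon Batch passes through to SageMaker AI unvalidated.
def build_submit_service_job_args(job_name, job_queue, payload):
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "serviceJobType": "SAGEMAKER_TRAINING",
        "serviceRequestPayload": json.dumps(payload),
    }

args = build_submit_service_job_args(
    "my-sagemaker-training-job",
    "my-sagemaker-job-queue",
    {"TrainingJobName": "sagemaker-training-job-example"},
)

# With boto3 (commented out so the sketch runs without credentials):
# import boto3
# batch = boto3.client("batch")
# response = batch.submit_service_job(**args)
# print(response["jobId"])
```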

If you don't already have an execution role, you must create one before you can submit your service job. To create the SageMaker AI execution role, see How to use SageMaker AI execution roles in the SageMaker AI Developer Guide.

Service job submission workflow

When you submit a service job, Amazon Batch follows this workflow:

  1. Amazon Batch receives your SubmitServiceJob request and validates the Amazon Batch-specific parameters. The serviceRequestPayload is passed through without validation.

  2. The job enters the SUBMITTED state and is placed in the specified job queue.

  3. Amazon Batch evaluates whether the service environment has available capacity for the RUNNABLE jobs at the front of the queue.

  4. If capacity is available, the job moves to SCHEDULED and Amazon Batch passes it to SageMaker AI.

  5. After capacity has been acquired and SageMaker AI has downloaded the service job data, the service job begins initialization and its state changes to STARTING.

  6. When SageMaker AI starts to run the job, its status changes to RUNNING.

  7. While SageMaker AI runs the job, Amazon Batch monitors its progress and maps service states to Amazon Batch job states. For details about how service job states are mapped, see Mapping Amazon Batch service job status to SageMaker AI status.

  8. When the service job completes, it moves to SUCCEEDED and any output is ready to download.
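The happy-path lifecycle above can be sketched as an ordered list of states. The state names come from the workflow steps; the helper functions themselves are hypothetical, and Amazon Batch has additional states (such as FAILED) not shown in this happy path.

```python
# Service job states in the order described in the workflow above.
LIFECYCLE = ["SUBMITTED", "RUNNABLE", "SCHEDULED", "STARTING", "RUNNING", "SUCCEEDED"]

def is_terminal(state):
    """A job in SUCCEEDED (or FAILED) no longer changes state."""
    return state in ("SUCCEEDED", "FAILED")

def has_reached(state, milestone):
    """True once `state` is at or past `milestone` on the happy path."""
    return LIFECYCLE.index(state) >= LIFECYCLE.index(milestone)
```

A poller, for example, could stop once `is_terminal(state)` is true rather than checking each terminal state by name.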

Prerequisites

Before submitting a service job, ensure you have:

  * A service environment and a job queue that's associated with it.

  * A SageMaker AI execution role.

Submit a service job with the Amazon CLI

The following example shows how to submit a service job using the Amazon CLI:

aws batch submit-service-job \
    --job-name "my-sagemaker-training-job" \
    --job-queue "my-sagemaker-job-queue" \
    --service-job-type "SAGEMAKER_TRAINING" \
    --service-request-payload '{"TrainingJobName": "sagemaker-training-job-example", "AlgorithmSpecification": {"TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.8.0-cpu-py3", "TrainingInputMode": "File", "ContainerEntrypoint": ["sleep", "1"]}, "RoleArn": "arn:aws:iam::123456789012:role/SageMakerExecutionRole", "OutputDataConfig": {"S3OutputPath": "s3://example-bucket/model-output/"}, "ResourceConfig": {"InstanceType": "ml.m5.large", "InstanceCount": 1, "VolumeSizeInGB": 1}}' \
    --client-token "unique-token-12345"

For more information about the serviceRequestPayload parameters, see Service job payloads in Amazon Batch.
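Quoting a long JSON payload on the command line is error-prone, so one hedged alternative is to build the payload programmatically and serialize it to a compact JSON string. The field values below mirror the CLI example above and are placeholders, not values from a real account.

```python
import json

# serviceRequestPayload for a SAGEMAKER_TRAINING service job, mirroring the
# CLI example. Amazon Batch passes this through to SageMaker AI as-is.
payload = {
    "TrainingJobName": "sagemaker-training-job-example",
    "AlgorithmSpecification": {
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.8.0-cpu-py3",
        "TrainingInputMode": "File",
        "ContainerEntrypoint": ["sleep", "1"],
    },
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    "OutputDataConfig": {"S3OutputPath": "s3://example-bucket/model-output/"},
    "ResourceConfig": {"InstanceType": "ml.m5.large", "InstanceCount": 1, "VolumeSizeInGB": 1},
}

# Compact JSON suitable for --service-request-payload or the SDK parameter.
payload_json = json.dumps(payload, separators=(",", ":"))
```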