Reference: Example of an array job workflow
A common workflow for AWS Batch customers is to run a prerequisite setup job, run a series of commands against a large number of input tasks, and then conclude with a job that aggregates results and writes summary data to Amazon S3, DynamoDB, Amazon Redshift, or Aurora.
For example:
- `JobA`: A standard, non-array job that performs a quick listing and metadata validation of objects in an Amazon S3 bucket, `BucketA`. The `SubmitJob` JSON syntax is as follows.

  ```json
  {
    "jobName": "JobA",
    "jobQueue": "ProdQueue",
    "jobDefinition": "JobA-list-and-validate:1"
  }
  ```
- `JobB`: An array job with 10,000 copies that is dependent upon `JobA`. It runs CPU-intensive commands against each object in `BucketA` and uploads results to `BucketB`. The `SubmitJob` JSON syntax is as follows.

  ```json
  {
    "jobName": "JobB",
    "jobQueue": "ProdQueue",
    "jobDefinition": "JobB-CPU-Intensive-Processing:1",
    "containerOverrides": {
      "resourceRequirements": [
        {
          "type": "MEMORY",
          "value": "4096"
        },
        {
          "type": "VCPU",
          "value": "32"
        }
      ]
    },
    "arrayProperties": {
      "size": 10000
    },
    "dependsOn": [
      {
        "jobId": "JobA_job_ID"
      }
    ]
  }
  ```
- `JobC`: Another 10,000-copy array job that's dependent upon `JobB` with an `N_TO_N` dependency model. It runs memory-intensive commands against each item in `BucketB`, writes metadata to DynamoDB, and uploads the resulting output to `BucketC`. The `SubmitJob` JSON syntax is as follows.

  ```json
  {
    "jobName": "JobC",
    "jobQueue": "ProdQueue",
    "jobDefinition": "JobC-Memory-Intensive-Processing:1",
    "containerOverrides": {
      "resourceRequirements": [
        {
          "type": "MEMORY",
          "value": "32768"
        },
        {
          "type": "VCPU",
          "value": "1"
        }
      ]
    },
    "arrayProperties": {
      "size": 10000
    },
    "dependsOn": [
      {
        "jobId": "JobB_job_ID",
        "type": "N_TO_N"
      }
    ]
  }
  ```
- `JobD`: An array job that performs 10 validation steps that each need to query DynamoDB and might interact with any of the above Amazon S3 buckets. Each of the steps in `JobD` runs the same command. However, the behavior is different based on the value of the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable within the job's container. These validation steps run sequentially (for example, `JobD:0` and then `JobD:1`). The `SubmitJob` JSON syntax is as follows.

  ```json
  {
    "jobName": "JobD",
    "jobQueue": "ProdQueue",
    "jobDefinition": "JobD-Sequential-Validation:1",
    "containerOverrides": {
      "resourceRequirements": [
        {
          "type": "MEMORY",
          "value": "32768"
        },
        {
          "type": "VCPU",
          "value": "1"
        }
      ]
    },
    "arrayProperties": {
      "size": 10
    },
    "dependsOn": [
      {
        "jobId": "JobC_job_ID"
      },
      {
        "type": "SEQUENTIAL"
      }
    ]
  }
  ```
- `JobE`: A final, non-array job that performs some simple cleanup operations and sends an Amazon SNS notification with a message that the pipeline has completed and a link to the output URL. The `SubmitJob` JSON syntax is as follows.

  ```json
  {
    "jobName": "JobE",
    "jobQueue": "ProdQueue",
    "jobDefinition": "JobE-Cleanup-and-Notification:1",
    "parameters": {
      "SourceBucket": "s3://amzn-s3-demo-source-bucket",
      "Recipient": "pipeline-notifications@mycompany.com"
    },
    "dependsOn": [
      {
        "jobId": "JobD_job_ID"
      }
    ]
  }
  ```
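A sketch of how the first few requests above might be chained programmatically. The `submit_request` helper is hypothetical; each dict it builds matches one of the JSON bodies above and could be passed to `boto3.client("batch").submit_job(**request)`, whose response includes the `jobId` that the next job's `dependsOn` needs. The `JobA_job_ID`/`JobB_job_ID` values here are the same placeholders used in the examples, standing in for the IDs a real submission would return.

```python
def submit_request(job_name, job_definition, depends_on=None, **extra):
    """Assemble one SubmitJob request body (hypothetical helper).

    In a real pipeline you would call
    boto3.client("batch").submit_job(**request) and read "jobId"
    from the response to wire up the next job's dependsOn.
    """
    request = {
        "jobName": job_name,
        "jobQueue": "ProdQueue",
        "jobDefinition": job_definition,
    }
    if depends_on:
        request["dependsOn"] = depends_on
    request.update(extra)
    return request

# JobA: a plain, non-array job with no dependencies.
job_a = submit_request("JobA", "JobA-list-and-validate:1")
job_a_id = "JobA_job_ID"  # placeholder; really submit_job(**job_a)["jobId"]

# JobB: a 10,000-copy array job that waits for all of JobA.
job_b = submit_request(
    "JobB",
    "JobB-CPU-Intensive-Processing:1",
    depends_on=[{"jobId": job_a_id}],
    arrayProperties={"size": 10000},
    containerOverrides={
        "resourceRequirements": [
            {"type": "MEMORY", "value": "4096"},
            {"type": "VCPU", "value": "32"},
        ]
    },
)
job_b_id = "JobB_job_ID"  # placeholder, as above

# JobC: N_TO_N means child index i of JobC can start as soon as
# child index i of JobB finishes, instead of waiting for the whole array.
job_c = submit_request(
    "JobC",
    "JobC-Memory-Intensive-Processing:1",
    depends_on=[{"jobId": job_b_id, "type": "N_TO_N"}],
    arrayProperties={"size": 10000},
    containerOverrides={
        "resourceRequirements": [
            {"type": "MEMORY", "value": "32768"},
            {"type": "VCPU", "value": "1"},
        ]
    },
)
```

JobD and JobE follow the same pattern, each reusing the `jobId` returned by the previous submission.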
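Because every child of `JobD` runs the same command, the per-child behavior has to branch on `AWS_BATCH_JOB_ARRAY_INDEX`, which AWS Batch sets in each child job's container. A minimal sketch of what such a container entrypoint might look like; the step names are hypothetical:

```python
import os

# AWS Batch sets AWS_BATCH_JOB_ARRAY_INDEX in each child job's container;
# default to 0 here so the sketch also runs outside of Batch.
index = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))

# Hypothetical validation steps, one per array index (JobD has size 10).
VALIDATION_STEPS = [f"validate-step-{i}" for i in range(10)]

def run_step(index):
    """Dispatch on the array index; each child runs exactly one step."""
    step = VALIDATION_STEPS[index]
    print(f"child {index}: running {step}")
    return step

run_step(index)
```

With the `SEQUENTIAL` dependency above, Batch runs `JobD:0` to completion before starting `JobD:1`, so these steps execute in index order.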