Tutorial: Getting started with Amazon Batch on Amazon EKS
Amazon Batch on Amazon EKS is a managed service for scheduling and scaling batch workloads into existing Amazon EKS clusters. Amazon Batch doesn't create, administer, or perform lifecycle operations on your Amazon EKS clusters on your behalf. Amazon Batch orchestration scales the nodes that Amazon Batch manages up and down and runs pods on those nodes.
Amazon Batch doesn't touch nodes, Auto Scaling node groups, or pod lifecycles that aren't associated with Amazon Batch compute environments within your Amazon EKS cluster. For Amazon Batch to operate effectively, its service-linked role needs Kubernetes role-based access control (RBAC) permissions in your existing Amazon EKS cluster. For more information, see Using RBAC Authorization in the Kubernetes documentation.
Amazon Batch requires a Kubernetes namespace into which it can scope pods as Amazon Batch jobs. We recommend a dedicated namespace to isolate the Amazon Batch pods from your other cluster workloads.
After Amazon Batch has been given RBAC access and a namespace has been established, you can associate that Amazon EKS cluster with an Amazon Batch compute environment using the CreateComputeEnvironment API operation. A job queue can be associated with this new Amazon EKS compute environment. Amazon Batch jobs are submitted to the job queue based on an Amazon EKS job definition using the SubmitJob API operation. Amazon Batch then launches Amazon Batch managed nodes and places jobs from the job queue as Kubernetes pods into the Amazon EKS cluster associated with the Amazon Batch compute environment.
The following sections cover how to get set up for Amazon Batch on Amazon EKS.
Prerequisites
Before starting this tutorial, you must install and configure the following tools and resources that you need to create and manage both Amazon Batch and Amazon EKS resources.
- Amazon CLI – A command line tool for working with Amazon services, including Amazon EKS. This guide requires that you use version 2.8.6 or later or 1.26.0 or later. For more information, see Installing, updating, and uninstalling the Amazon CLI in the Amazon Command Line Interface User Guide. After installing the Amazon CLI, we recommend that you also configure it. For more information, see Quick configuration with aws configure in the Amazon Command Line Interface User Guide.
- kubectl – A command line tool for working with Kubernetes clusters. This guide requires that you use version 1.23 or later. For more information, see Installing or updating kubectl in the Amazon EKS User Guide.
- eksctl – A command line tool for working with Amazon EKS clusters that automates many individual tasks. This guide requires that you use version 0.115.0 or later. For more information, see Installing or updating eksctl in the Amazon EKS User Guide.
- Required IAM permissions – The IAM security principal that you're using must have permissions to work with Amazon EKS IAM roles and service-linked roles, Amazon CloudFormation, and a VPC and related resources. For more information, see Actions, resources, and condition keys for Amazon Elastic Kubernetes Service and Using service-linked roles in the IAM User Guide. You must complete all steps in this guide as the same user.
- Creating an Amazon EKS cluster – For more information, see Getting started with Amazon EKS – eksctl in the Amazon EKS User Guide.
Note
Amazon Batch only supports Amazon EKS clusters whose API server endpoints have public access, meaning that they're accessible from the public internet. By default, Amazon EKS cluster API server endpoints have public access. For more information, see Amazon EKS cluster endpoint access control in the Amazon EKS User Guide.
Note
Amazon Batch doesn't provide managed-node orchestration for CoreDNS or other deployment pods. If you need CoreDNS, see Adding the CoreDNS Amazon EKS add-on in the Amazon EKS User Guide. Or, use eksctl create cluster to create the cluster; it includes CoreDNS by default.
- Permissions – Users calling the CreateComputeEnvironment API operation to create a compute environment that uses Amazon EKS resources require permissions to the eks:DescribeCluster API operation. Using the Amazon Web Services Management Console to create a compute resource using Amazon EKS resources requires permissions to both eks:DescribeCluster and eks:ListClusters.
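The minimum versions listed above can be checked before you start. The following is a minimal sketch and not part of the original tutorial: the function names version_ge and aws_cli_version_ok are hypothetical, the comparison assumes GNU sort with the -V option, and the commented usage assumes that aws --version prints a string beginning with aws-cli/<version>.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the tool versions this guide requires.

# version_ge A B: succeeds when dotted version A is greater than or equal to B.
# Assumes GNU sort, whose -V option orders version strings numerically.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# aws_cli_version_ok VERSION: applies the guide's rule for the Amazon CLI,
# which is 1.26.0 or later on the 1.x line, otherwise 2.8.6 or later.
aws_cli_version_ok() {
  case "$1" in
    1.*) version_ge "$1" 1.26.0 ;;
    *)   version_ge "$1" 2.8.6 ;;
  esac
}

# Example usage (not executed here):
#   installed="$(aws --version 2>&1 | cut -d/ -f2 | cut -d' ' -f1)"
#   aws_cli_version_ok "$installed" || echo "Amazon CLI ${installed} is too old"
```

The same version_ge helper can be pointed at the kubectl (1.23) and eksctl (0.115.0) minimums once you have extracted their version strings.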
Prepare your Amazon EKS cluster for Amazon Batch
All steps are required.
- Create a dedicated namespace for Amazon Batch jobs
Use kubectl to create a new namespace.
$ namespace=my-aws-batch-namespace
$ cat - <<EOF | kubectl create -f -
{
  "apiVersion": "v1",
  "kind": "Namespace",
  "metadata": {
    "name": "${namespace}",
    "labels": {
      "name": "${namespace}"
    }
  }
}
EOF
Output:
namespace/my-aws-batch-namespace created
- Enable access via role-based access control (RBAC)
Use kubectl to create a Kubernetes cluster role that allows Amazon Batch to watch nodes and pods, and to bind the role. You must do this once for each Amazon EKS cluster.
Note
For more information about using RBAC authorization, see Using RBAC Authorization in the Kubernetes documentation.
$ cat - <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: aws-batch-cluster-role
rules:
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["daemonsets", "deployments", "statefulsets", "replicasets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["rbac.authorization.k8s.io"]
    resources: ["clusterroles", "clusterrolebindings"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: aws-batch-cluster-role-binding
subjects:
  - kind: User
    name: aws-batch
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: aws-batch-cluster-role
  apiGroup: rbac.authorization.k8s.io
EOF
Output:
clusterrole.rbac.authorization.k8s.io/aws-batch-cluster-role created
clusterrolebinding.rbac.authorization.k8s.io/aws-batch-cluster-role-binding created
Create a namespace-scoped Kubernetes role that allows Amazon Batch to manage pods and their lifecycle, and bind it. You must do this once for each unique namespace.
$ namespace=my-aws-batch-namespace
$ cat - <<EOF | kubectl apply -f - --namespace "${namespace}"
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: aws-batch-compute-environment-role
  namespace: ${namespace}
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "get", "list", "watch", "delete", "patch"]
  - apiGroups: [""]
    resources: ["serviceaccounts"]
    verbs: ["get", "list"]
  - apiGroups: ["rbac.authorization.k8s.io"]
    resources: ["roles", "rolebindings"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: aws-batch-compute-environment-role-binding
  namespace: ${namespace}
subjects:
  - kind: User
    name: aws-batch
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: aws-batch-compute-environment-role
  apiGroup: rbac.authorization.k8s.io
EOF
Output:
role.rbac.authorization.k8s.io/aws-batch-compute-environment-role created
rolebinding.rbac.authorization.k8s.io/aws-batch-compute-environment-role-binding created
Update the Kubernetes aws-auth configuration map to map the preceding RBAC permissions to the Amazon Batch service-linked role.
$ eksctl create iamidentitymapping \
    --cluster my-cluster-name \
    --arn "arn:aws-cn:iam::<your-account>:role/AWSServiceRoleForBatch" \
    --username aws-batch
Output:
2022-10-25 20:19:57 [ℹ] adding identity "arn:aws-cn:iam::<your-account>:role/AWSServiceRoleForBatch" to auth ConfigMap
Note
The path aws-service-role/batch.amazonaws.com/ has been removed from the ARN of the service-linked role. This is because of an issue with the aws-auth configuration map. For more information, see Roles with paths don't work when the path is included in their ARN in the aws-auth configmap.
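The path removal described in the preceding note can be scripted when you look up the role ARN instead of typing it by hand. The following is a minimal sketch, not part of the original tutorial: strip_role_path is a hypothetical helper name, and the account ID in the example is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop the path component from an IAM role ARN so that
# it can be used in the aws-auth configuration map.
strip_role_path() {
  local arn="$1"
  local prefix="${arn%%:role/*}"  # everything before ":role/"
  local name="${arn##*/}"         # last path segment, which is the role name
  printf '%s:role/%s\n' "$prefix" "$name"
}

# Example usage (not executed here):
#   full_arn="$(aws iam get-role --role-name AWSServiceRoleForBatch \
#       --query Role.Arn --output text)"
#   eksctl create iamidentitymapping \
#       --cluster my-cluster-name \
#       --arn "$(strip_role_path "$full_arn")" \
#       --username aws-batch
```

An ARN that already has no path passes through unchanged, so the helper is safe to apply in either case.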
Create an Amazon EKS compute environment
Amazon Batch compute environments define compute resource parameters to meet your batch workload needs. In a managed compute environment, Amazon Batch helps you to manage the capacity and instance types of the compute resources (Kubernetes nodes) within your Amazon EKS cluster. This is based on the compute resource specification that you define when you create the compute environment. You can use EC2 On-Demand Instances or EC2 Spot Instances.
Now that the AWSServiceRoleForBatch service-linked role has access to your Amazon EKS cluster, you can create Amazon Batch resources. First, create a compute environment that points to your Amazon EKS cluster.
$ cat <<EOF > ./batch-eks-compute-environment.json
{
  "computeEnvironmentName": "My-Eks-CE1",
  "type": "MANAGED",
  "state": "ENABLED",
  "eksConfiguration": {
    "eksClusterArn": "arn:aws-cn:eks:<region>:123456789012:cluster/<cluster-name>",
    "kubernetesNamespace": "my-aws-batch-namespace"
  },
  "computeResources": {
    "type": "EC2",
    "allocationStrategy": "BEST_FIT_PROGRESSIVE",
    "minvCpus": 0,
    "maxvCpus": 128,
    "instanceTypes": [ "m5" ],
    "subnets": [ "<eks-cluster-subnets-with-access-to-internet-for-image-pull>" ],
    "securityGroupIds": [ "<eks-cluster-sg>" ],
    "instanceRole": "<eks-instance-profile>"
  }
}
EOF
$ aws batch create-compute-environment --cli-input-json file://./batch-eks-compute-environment.json
Notes
- Don't specify the serviceRole parameter. When it's omitted, the Amazon Batch service-linked role is used. Amazon Batch on Amazon EKS only supports the Amazon Batch service-linked role.
- Only the BEST_FIT_PROGRESSIVE, SPOT_CAPACITY_OPTIMIZED, and SPOT_PRICE_CAPACITY_OPTIMIZED allocation strategies are supported for Amazon EKS compute environments.
Note
We recommend that you use SPOT_PRICE_CAPACITY_OPTIMIZED rather than SPOT_CAPACITY_OPTIMIZED in most cases.
- For the instanceRole, see Creating the Amazon EKS node IAM role and Enabling IAM principal access to your cluster in the Amazon EKS User Guide. If you're using pod networking, see Configuring the Amazon VPC CNI plugin for Kubernetes to use IAM roles for service accounts in the Amazon EKS User Guide.
- One way to get working subnets for the subnets parameter is to use the Amazon EKS managed node group public subnets that eksctl created when it created the Amazon EKS cluster. Otherwise, use subnets that have a network path that supports pulling images.
- The securityGroupIds parameter can use the same security group as the Amazon EKS cluster. This command retrieves the security group ID for the cluster.
$ aws eks describe-cluster \
    --name <cluster-name> \
    --query cluster.resourcesVpcConfig.clusterSecurityGroupId
- Maintenance of an Amazon EKS compute environment is a shared responsibility. For more information, see Shared responsibility of the Kubernetes nodes.
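The allocation strategy restriction in the notes above is easy to check before calling create-compute-environment. The following is a minimal sketch, not part of the original tutorial; is_supported_eks_strategy is a hypothetical helper name, and the list of accepted values is taken only from the note above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only for the allocation strategies that the
# notes above list as supported for Amazon EKS compute environments.
is_supported_eks_strategy() {
  case "$1" in
    BEST_FIT_PROGRESSIVE|SPOT_CAPACITY_OPTIMIZED|SPOT_PRICE_CAPACITY_OPTIMIZED)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Example usage (not executed here):
#   strategy="BEST_FIT_PROGRESSIVE"
#   is_supported_eks_strategy "$strategy" \
#     || echo "allocationStrategy ${strategy} is not supported on Amazon EKS"
```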
Important
It's important to confirm that the compute environment is healthy before proceeding. The DescribeComputeEnvironments API operation can be used to do this.
$ aws batch describe-compute-environments --compute-environments My-Eks-CE1
Confirm that the status parameter is not INVALID. If it is, look at the statusReason parameter for the cause. For more information, see Troubleshooting Amazon Batch.
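The health check above can be reduced to a single pass or fail decision in a script. The following is a minimal sketch, not part of the original tutorial: check_ce_status is a hypothetical helper name, and the commented usage relies on the standard --query and --output options of the Amazon CLI to pull the status and statusReason fields out of the response.

```shell
#!/usr/bin/env bash
# Hypothetical helper: interpret the status value of a compute environment.
# Returns 0 for any status other than INVALID, 1 for INVALID, and 2 when the
# status is empty (for example, when the lookup itself failed).
check_ce_status() {
  case "$1" in
    INVALID) echo "unhealthy: check statusReason"; return 1 ;;
    "")      echo "unknown: empty status";         return 2 ;;
    *)       echo "ok: $1";                        return 0 ;;
  esac
}

# Example usage (not executed here):
#   status="$(aws batch describe-compute-environments \
#       --compute-environments My-Eks-CE1 \
#       --query 'computeEnvironments[0].status' --output text)"
#   check_ce_status "$status" || aws batch describe-compute-environments \
#       --compute-environments My-Eks-CE1 \
#       --query 'computeEnvironments[0].statusReason' --output text
```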
Create a job queue and attach the compute environment
$ aws batch describe-compute-environments --compute-environments My-Eks-CE1
Jobs submitted to this new job queue are run as pods on Amazon Batch managed nodes that joined the Amazon EKS cluster that's associated with your compute environment.
$ cat <<EOF > ./batch-eks-job-queue.json
{
  "jobQueueName": "My-Eks-JQ1",
  "priority": 10,
  "computeEnvironmentOrder": [
    {
      "order": 1,
      "computeEnvironment": "My-Eks-CE1"
    }
  ]
}
EOF
$ aws batch create-job-queue --cli-input-json file://./batch-eks-job-queue.json
Create a job definition
$ cat <<EOF > ./batch-eks-job-definition.json
{
  "jobDefinitionName": "MyJobOnEks_Sleep",
  "type": "container",
  "eksProperties": {
    "podProperties": {
      "hostNetwork": true,
      "containers": [
        {
          "image": "public.ecr.aws/amazonlinux/amazonlinux:2",
          "command": [ "sleep", "60" ],
          "resources": {
            "limits": { "cpu": "1", "memory": "1024Mi" }
          }
        }
      ],
      "metadata": {
        "labels": { "environment": "test" }
      }
    }
  }
}
EOF
$ aws batch register-job-definition --cli-input-json file://./batch-eks-job-definition.json
Notes
- Only single container jobs are supported.
- There are considerations for the cpu and memory parameters. For more information, see Memory and vCPU considerations for Amazon Batch on Amazon EKS.
Submit a job
$ aws batch submit-job --job-queue My-Eks-JQ1 \
    --job-definition MyJobOnEks_Sleep --job-name My-Eks-Job1
$ aws batch describe-jobs --jobs <jobId-from-submit-response>
Notes
- Only single container jobs are supported.
- Make sure you're familiar with all the relevant considerations for the cpu and memory parameters. For more information, see Memory and vCPU considerations for Amazon Batch on Amazon EKS.
- For more information about running jobs on Amazon EKS resources, see Amazon EKS jobs.
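The submit and describe commands in this section can be chained so that a script waits for the job to finish. The following is a minimal sketch, not part of the original tutorial: is_terminal_job_status and wait_for_job are hypothetical helper names, the 15-second polling interval is an arbitrary assumption, and the sketch treats SUCCEEDED and FAILED as the only final job states.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when a job status is final.
is_terminal_job_status() {
  case "$1" in
    SUCCEEDED|FAILED) return 0 ;;
    *)                return 1 ;;
  esac
}

# Hypothetical helper: poll describe-jobs until the job reaches a final state.
# Returns 0 if the job succeeded, 1 if it failed, 2 if the lookup itself failed.
wait_for_job() {
  local job_id="$1" status
  while :; do
    status="$(aws batch describe-jobs --jobs "$job_id" \
        --query 'jobs[0].status' --output text)" || return 2
    echo "job ${job_id}: ${status}"
    if is_terminal_job_status "$status"; then
      break
    fi
    sleep 15  # polling interval is an assumption; adjust as needed
  done
  [ "$status" = "SUCCEEDED" ]
}

# Example usage (not executed here):
#   job_id="$(aws batch submit-job --job-queue My-Eks-JQ1 \
#       --job-definition MyJobOnEks_Sleep --job-name My-Eks-Job1 \
#       --query jobId --output text)"
#   wait_for_job "$job_id"
```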
(Optional) Submit a job with overrides
This job overrides the command passed to the container.
$ cat <<EOF > ./submit-job-override.json
{
  "jobName": "EksWithOverrides",
  "jobQueue": "My-Eks-JQ1",
  "jobDefinition": "MyJobOnEks_Sleep",
  "eksPropertiesOverride": {
    "podProperties": {
      "containers": [
        {
          "command": [ "/bin/sh" ],
          "args": [ "-c", "echo hello world" ]
        }
      ]
    }
  }
}
EOF
$ aws batch submit-job --cli-input-json file://./submit-job-override.json
Notes
- Amazon Batch aggressively cleans up the pods after the jobs complete to reduce the load on Kubernetes. To examine the details of a job, logging must be configured. For more information, see Use CloudWatch Logs to monitor Amazon Batch on Amazon EKS jobs.
- For improved visibility into the details of the operations, enable Amazon EKS control plane logging. For more information, see Amazon EKS control plane logging in the Amazon EKS User Guide.
- DaemonSet and kubelet overhead affects the available vCPU and memory resources, and therefore scaling and job placement. For more information, see Memory and vCPU considerations for Amazon Batch on Amazon EKS.