Shared responsibility of the Kubernetes nodes - Amazon Batch

Shared responsibility of the Kubernetes nodes

Maintenance of the compute environments is a shared responsibility.

  • Don't change or remove Amazon Batch nodes, labels, taints, namespaces, launch templates, or auto scaling groups. Don't add taints to Amazon Batch managed nodes. If you make any of these changes, Amazon Batch can't support your compute environment, and failures such as idle instances can occur.

  • Don't target your pods to Amazon Batch managed nodes. If you do, scaling breaks and job queues get stuck. Run workloads that don't use Amazon Batch on self-managed nodes or managed node groups. For more information, see Managed node groups in the Amazon EKS User Guide.

  • You can target a DaemonSet to run on Amazon Batch managed nodes. For more information, see Run a DaemonSet on Amazon Batch managed nodes.

Amazon Batch doesn't automatically update compute environment AMIs. It's your responsibility to update them. Run the following command to update your AMIs to the latest AMI version.

$ aws batch update-compute-environment \
    --compute-environment <compute-environment-name> \
    --compute-resources 'updateToLatestImageVersion=true'

Amazon Batch doesn't automatically upgrade the Kubernetes version. It's your responsibility to upgrade it. For example, run the following command to update the Kubernetes version of your compute environment to 1.23.

$ aws batch update-compute-environment \
    --compute-environment <compute-environment-name> \
    --compute-resources \
      'ec2Configuration=[{imageType=EKS_AL2,imageKubernetesVersion=1.23}]'
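
Before you change the Kubernetes version, it can help to check what the compute environment currently reports. The following is a minimal sketch, assuming the AWS CLI is installed and configured. The names CE_NAME, QUERY, and show_ec2_configuration are hypothetical and chosen for illustration. The query path assumes the ec2Configuration field appears under computeResources in the describe-compute-environments output.

```shell
#!/bin/sh
# Placeholder: replace with the name of your compute environment.
CE_NAME="${CE_NAME:-<compute-environment-name>}"

# JMESPath query that selects the ec2Configuration list of the first result.
QUERY='computeEnvironments[0].computeResources.ec2Configuration'

# Print the image type and Kubernetes version settings for one compute environment.
show_ec2_configuration() {
  aws batch describe-compute-environments \
    --compute-environments "$1" \
    --query "$QUERY" \
    --output json
}

# Call the AWS CLI only after the placeholder has been replaced.
case "$CE_NAME" in
  "<"*) echo "Set CE_NAME to your compute environment name first." ;;
  *) command -v aws >/dev/null 2>&1 && show_ec2_configuration "$CE_NAME" ;;
esac
```

If the output lists an imageKubernetesVersion, compare it with the version that you plan to specify in the update command.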

When you update to a more recent AMI or Kubernetes version, you can specify whether running jobs are terminated during the update (terminateJobsOnUpdate) and how long to wait before an instance is replaced if running jobs don't finish (jobExecutionTimeoutMinutes). For more information, see Updating compute environments and the infrastructure update policy (UpdatePolicy) set in the UpdateComputeEnvironment API operation.
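
The two update policy settings can be passed in the same call as the AMI update. The following is a minimal sketch, assuming the AWS CLI is installed and configured. It uses the --update-policy option of update-compute-environment. CE_NAME, TIMEOUT_MINUTES, DRY_RUN, and update_ce are hypothetical names, and the 60-minute timeout is an illustrative value, not a recommendation. By default the sketch only prints the command so that you can review it.

```shell
#!/bin/sh
# Placeholders: replace with your own values.
CE_NAME="<compute-environment-name>"
TIMEOUT_MINUTES=60   # illustrative value only

# Update to the latest AMI without terminating running jobs. Instances are
# replaced after TIMEOUT_MINUTES if their jobs still haven't finished.
update_ce() {
  set -- aws batch update-compute-environment \
    --compute-environment "$CE_NAME" \
    --compute-resources 'updateToLatestImageVersion=true' \
    --update-policy "terminateJobsOnUpdate=false,jobExecutionTimeoutMinutes=$TIMEOUT_MINUTES"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"                  # run the command
  else
    printf '%s\n' "$*"    # print the command for review
  fi
}

update_ce
```

To run the update for real, set DRY_RUN=0 after you replace the placeholder name.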