Perform blue/green updates for compute environments
A blue/green update is an update strategy that reduces downtime and risk by creating a new compute environment (green) alongside your existing compute environment (blue). This approach allows you to gradually transition workloads to the new environment while keeping the existing environment operational. Blue/green updates provide the safest update path and work with any service role type or allocation strategy.
Overview
Blue/green updates offer several advantages for production environments. Your workloads keep running during the update, so there is no downtime. Rollback is straightforward: if issues arise, you can revert to the original environment. You can transition gradually, verifying the new environment's performance before switching over your production workloads. Risk is also limited because the original environment remains unchanged and operational until you choose to remove it.
When blue/green updates are required
You must use blue/green updates in the following situations:
- When your compute environment uses BEST_FIT allocation strategy (doesn't support infrastructure updates)
- When your compute environment doesn't use the AWSServiceRoleForBatch service-linked role
- When you need to transition between different service role types
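The first two conditions can be checked from a compute environment's description. The following is a minimal sketch, assuming the description is a dictionary shaped like one entry returned by the `describe_compute_environments` call, with `serviceRole` and `computeResources.allocationStrategy` fields. The function name `blue_green_reasons` is hypothetical. The third condition (changing service role types) depends on what you intend to change, so it isn't covered here.

```python
from typing import Any, Dict, List

# Name of the AWS Batch service-linked role, as given in this guide.
SERVICE_LINKED_ROLE_NAME = "AWSServiceRoleForBatch"


def blue_green_reasons(compute_environment: Dict[str, Any]) -> List[str]:
    """Return the reasons (possibly none) why a blue/green update is required.

    `compute_environment` is assumed to look like one entry of the
    "computeEnvironments" list in a describe_compute_environments response.
    """
    reasons: List[str] = []

    # Only an explicit BEST_FIT value is flagged; an absent field is not.
    resources = compute_environment.get("computeResources", {})
    if resources.get("allocationStrategy") == "BEST_FIT":
        reasons.append("BEST_FIT allocation strategy")

    # The role name is the last path segment of the role ARN.
    # A missing serviceRole field is also flagged by this check.
    role_arn = compute_environment.get("serviceRole", "")
    role_name = role_arn.rsplit("/", 1)[-1]
    if role_name != SERVICE_LINKED_ROLE_NAME:
        reasons.append("service role is not AWSServiceRoleForBatch")

    return reasons
```

An empty list means neither of the two checked conditions applies. You might still choose a blue/green update for the reasons described in the next section.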
When blue/green updates are recommended
Blue/green updates are particularly recommended for production environments where zero downtime is critical for your workloads. This approach works well when you need to test new configurations before transitioning production workloads, ensuring that changes meet your performance and reliability requirements. Choose blue/green updates when quick rollback capability is important for your operations, especially if you're updating custom AMIs with significant changes. This method is also ideal when you want to validate performance characteristics and behavior before fully committing to changes, providing confidence in your update process.
Prerequisites
Before performing a blue/green update, ensure you have:
- Appropriate IAM permissions to create and manage compute environments
- Access to view and modify job queue settings
- Job retry strategies configured for your job definitions to handle potential failures during the transition. For more information, see Automated job retries.
- The AMI ID for the new compute environment. This can be either:
  - A recent, approved version of the Amazon ECS optimized AMI (used by default)
  - A custom AMI that meets the Amazon ECS container instance AMI specification. When using a custom AMI, you can specify it in one of these ways:
    - Using the Image ID override field in the EC2 configuration
    - Specifying it in a launch template
    For more information about creating custom AMIs, see Tutorial: Create a compute resource AMI.
Before creating the new environment, you need to record the configuration of your existing compute environment. You can do this using either the Amazon Web Services Management Console or the Amazon CLI.
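Recording the existing configuration could be scripted along these lines. This is a sketch, not the procedure from this guide: it assumes a boto3 Batch client is passed in as `batch_client`, and the function name `record_compute_environment` and the output path are hypothetical.

```python
import json
from typing import Any, Dict


def record_compute_environment(batch_client: Any, name: str, path: str) -> Dict[str, Any]:
    """Save the current description of one compute environment to a JSON file.

    `batch_client` is assumed to be a boto3 Batch client (or any object with a
    compatible describe_compute_environments method).
    """
    response = batch_client.describe_compute_environments(computeEnvironments=[name])
    environments = response.get("computeEnvironments", [])
    if not environments:
        raise LookupError(f"compute environment not found: {name}")

    description = environments[0]
    # default=str guards against values that are not JSON-serializable as-is.
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(description, handle, indent=2, sort_keys=True, default=str)
    return description
```

With boto3 installed and credentials configured, a call might look like `record_compute_environment(boto3.client("batch"), "my-blue-env", "blue-env.json")`, where both names are placeholders.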
Note
The following procedures detail how to perform a blue/green update that only changes the AMI. You can update other settings for the new environment.
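As a rough illustration of an AMI-only change, the sketch below derives a request for the new (green) environment from a recorded description of the existing (blue) one. It assumes the AMI is set through the Image ID override (`imageIdOverride` in `ec2Configuration`). Environments that set the AMI in a launch template would need a different change. The function name `build_green_request` and the `image_type` fallback value are assumptions, not part of this guide.

```python
import copy
from typing import Any, Dict


def build_green_request(
    blue_env: Dict[str, Any],
    green_name: str,
    new_ami_id: str,
    image_type: str = "ECS_AL2",  # assumed fallback; pick the type that matches your AMI
) -> Dict[str, Any]:
    """Copy the blue environment's settings and change only the AMI.

    `blue_env` is assumed to be one entry from a describe_compute_environments
    response. The result is meant to be passed as keyword arguments to a
    create_compute_environment call after you have reviewed it.
    """
    # Deep copy so the recorded blue configuration is left untouched.
    resources = copy.deepcopy(blue_env["computeResources"])

    # Reuse existing EC2 configuration entries, or start one with the fallback type.
    ec2_configs = resources.get("ec2Configuration") or [{"imageType": image_type}]
    for config in ec2_configs:
        config["imageIdOverride"] = new_ami_id
    resources["ec2Configuration"] = ec2_configs

    request: Dict[str, Any] = {
        "computeEnvironmentName": green_name,
        "type": blue_env["type"],
        "state": "ENABLED",
        "computeResources": resources,
    }
    # Carry over the service role only if the blue description has one.
    if blue_env.get("serviceRole"):
        request["serviceRole"] = blue_env["serviceRole"]
    return request
```

Review the generated request before using it. The recorded description may contain settings you want to change for the new environment.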
Important
When you remove the old (blue) compute environment, any currently running jobs on those instances will fail because the instances will be terminated. Configure job retry strategies in your job definitions to handle these failures automatically. For more information, see Automated job retries.
Once you're confident in the new environment:
- Edit the job queue to remove the old compute environment.
- Wait for any running jobs in the old environment to complete.
- Delete the old compute environment.
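The first cleanup step, removing the old environment from the job queue, might be scripted as follows. This is a sketch that assumes a boto3 Batch client is passed in as `batch_client`. The helper names `order_without` and `remove_old_environment` are hypothetical, and the old environment can be given either by name or by ARN.

```python
from typing import Any, Dict, List


def order_without(order: List[Dict[str, Any]], old_env: str) -> List[Dict[str, Any]]:
    """Return a job queue's computeEnvironmentOrder without the old environment.

    Entries are matched either on the full value or on the trailing "/<name>"
    part of a compute environment ARN.
    """
    return [
        {"order": entry["order"], "computeEnvironment": entry["computeEnvironment"]}
        for entry in order
        if entry["computeEnvironment"] != old_env
        and not entry["computeEnvironment"].endswith("/" + old_env)
    ]


def remove_old_environment(batch_client: Any, queue_name: str, old_env: str) -> List[Dict[str, Any]]:
    """Update the job queue so that it no longer references the old environment."""
    queues = batch_client.describe_job_queues(jobQueues=[queue_name])["jobQueues"]
    if not queues:
        raise LookupError(f"job queue not found: {queue_name}")

    new_order = order_without(queues[0]["computeEnvironmentOrder"], old_env)
    # Refuse to leave the queue without any compute environment.
    if not new_order:
        raise ValueError("the job queue would have no compute environments left")

    batch_client.update_job_queue(jobQueue=queue_name, computeEnvironmentOrder=new_order)
    return new_order
```

The remaining steps are left out of the sketch on purpose. After running jobs finish, the old environment can be disabled and deleted, for example with the client's `update_compute_environment` and `delete_compute_environment` calls. The timing between those calls depends on your workload.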