Deployment circuit breaker - Amazon Elastic Container Service
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Deployment circuit breaker

The deployment circuit breaker is the rolling update mechanism that determines if the tasks reach a steady state. The deployment circuit breaker has an option that will automatically roll back a failed deployment to the deployment that is in the COMPLETED state.

When a service deployment changes state, Amazon ECS sends a service deployment state change event to EventBridge. This provides a programmatic way to monitor the status of your service deployments. For more information, see Service deployment state change events. We recommend that you create and monitor an EventBridge rule with an eventName of SERVICE_DEPLOYMENT_FAILED so that you can take manual action to start your deployment. For more information, see Creating an EventBridge Rule in the Amazon EventBridge User Guide.

When the deployment circuit breaker determines that a deployment failed, it looks for the most recent deployment that is in a COMPLETED state. This is the deployment that it uses as the roll-back deployment. When the rollback starts, the deployment changes from a COMPLETED to IN_PROGRESS. This means that the deployment is not eligible for another rollback until it reaches the a COMPLETED state. When the deployment circuit breaker does not find a deployment that is in a COMPLETED state, the circuit breaker does not launch new tasks and the deployment is stalled.

Example:

Deployment 1 is in a COMPLETED state.

Deployment 2 cannot start, so the circuit breaker rolls back to Deployment 1. Deployment 1 transitions to the IN_PROGRESS state.

Deployment 3 starts and there is no deployment in the COMPLETED state, so Deployment 3 cannot roll back, or launch tasks.

Consider the following when you use the deployment circuit breaker method on a service. EventBridge generates the rule.

  • The DescribeServices response provides insight into the state of a deployment, the rolloutState and rolloutStateReason. When a new deployment is started, the rollout state begins in an IN_PROGRESS state. When the service reaches a steady state, the rollout state transitions to COMPLETED. If the service fails to reach a steady state and circuit breaker is turned on, the deployment will transition to a FAILED state. A deployment in a FAILED state doesn't launch any new tasks.

  • In addition to the service deployment state change events Amazon ECS sends for deployments that have started and have completed, Amazon ECS also sends an event when a deployment with circuit breaker turned on fails. These events provide details about why a deployment failed or if a deployment was started because of a rollback. For more information, see Service deployment state change events.

  • If a new deployment is started because a previous deployment failed and a rollback occurred, the reason field of the service deployment state change event indicates the deployment was started because of a rollback.

  • The deployment circuit breaker is only supported for Amazon ECS services that use the rolling update (ECS) deployment controller.

  • You must use the new Amazon ECS console, or the Amazon CLI when you use the deployment circuit breaker with the CloudWatch option. For more information, see Create a service using defined parameters and create-service in the Amazon Command Line Interface Reference.

The following create-service Amazon CLI example shows how to create a Linux service when the deployment circuit breaker is used with rollback.

aws ecs create-service \ --service-name MyService \ --deployment-controller type=ECS \ --desired-count 2 \ --deployment-configuration "deploymentCircuitBreaker={enable=true,rollback=true}" \ --task-definition sample-fargate:1 \ --launch-type FARGATE \ --platform-family LINUX \ --platform-version 1.4.0 \ --network-configuration "awsvpcConfiguration={subnets=[subnet-12344321],securityGroups=[sg-12344321],assignPublicIp=ENABLED}"

Failure threshold

The deployment circuit breaker calculates the threshold value, and then uses the value to determine when to move the deployment to a FAILED state.

The deployment circuit breaker has a minimum threshold of 10 and a maximum threshold of 200. and uses the values in the following formula to determine the deployment failure.

Minimum threshold <= 0.5 * desired task count => maximum threshold

When the result of the calculation is greater than the minimum of 10, but smaller than the maximum of 200, the failure threshold is set to the calculated threshold (rounded up).

Note

You cannot change either of the threshold values.

There are two stages for the deployment status check.

  1. The deployment circuit breaker monitors tasks that are part of the deployment and checks for tasks that are in the RUNNING state. The scheduler ignores the failure criteria when a task in the current deployment is in the RUNNING state and proceeds to the next stage. When tasks fail to reach in the RUNNING state, the deployment circuit breaker increases the failure count by one. When the failure count equals the threshold, the deployment is marked as FAILED.

  2. This stage is entered when there are one of more tasks in the RUNNING state. The deployment circuit breaker performs health checks on the following resources for the tasks in the current deployment:

    • Elastic Load Balancing load balancers

    • Amazon Cloud Map service

    • Amazon ECS container health checks

    When a health check fails for the task, the deployment circuit breaker increases the failure count by one. When the failure count equals the threshold, the deployment is marked as FAILED.

The following table provides some examples.

Desired task count Calculation Threshold

1

10 <= 0.5 * 1 => 200
10 (the calculated value is less than the minimum)

25

10 <= 0.5 * 25 => 200
13 (the value is rounded up)

400

10 <= 0.5 * 400 => 200
200

800

10 <= 0.5 * 800 => 200
200 (the calculated value is greater than the maximum)

For additional examples about how to use the rollback option, see Announcing Amazon ECS deployment circuit breaker.