Troubleshoot Amazon ECS deployment issues
Topics
- A timeout occurs while waiting for replacement task set
- A timeout occurs while waiting for a notification to continue
- The IAM role does not have enough permissions
- The deployment timed out while waiting for a status callback
- The deployment failed because one or more of the lifecycle event validation functions failed
- The ELB could not be updated due to the following error: Primary taskset target group must be behind listener
- My deployment sometimes fails when using Auto Scaling
- Only ALB supports gradual traffic routing, use AllAtOnce Traffic routing instead when you create/update Deployment group
- Even though my deployment succeeded, the replacement task set fails the Elastic Load Balancing health checks, and my application is down
- Can I attach multiple load balancers to a deployment group?
- Can I perform CodeDeploy blue/green deployments without a load balancer?
- How can I update my Amazon ECS service with new information during a deployment?
A timeout occurs while waiting for replacement task set
Problem: You see the following error message while deploying your Amazon ECS application using CodeDeploy:
The deployment timed out while waiting for the replacement task set to become
healthy. This time out period is 60 minutes.
Possible cause: This error might occur if there is a mistake in your task definition file or other deployment-related files. For example, if there is a typo in the image field in your task definition file, Amazon ECS will try to pull the wrong container image and continuously fail, causing this error.
Possible fixes and next steps:
- Fix typographical errors and configuration problems in your task definition file and other files.
- Check the related Amazon ECS service event and find out why replacement tasks are not becoming healthy. For more information on Amazon ECS events, see Amazon ECS events in the Amazon Elastic Container Service Developer Guide.
- Check the Amazon ECS troubleshooting section in the Amazon Elastic Container Service Developer Guide for errors related to the messages in the event.
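As a minimal sketch of the second step above, the following Python function reads the most recent service events through the Amazon ECS DescribeServices API. It assumes you pass in a boto3 ECS client; the function name and the cluster and service names are hypothetical placeholders, not part of CodeDeploy.

```python
def recent_service_events(ecs_client, cluster, service, limit=10):
    """Return the newest event messages for one Amazon ECS service.

    ecs_client is assumed to be a boto3 ECS client, for example
    boto3.client("ecs"). cluster and service are placeholder names.
    """
    response = ecs_client.describe_services(cluster=cluster, services=[service])
    services = response.get("services", [])
    if not services:
        return []
    # Each event carries "id", "createdAt", and "message"; sort newest first.
    events = sorted(
        services[0].get("events", []),
        key=lambda event: event["createdAt"],
        reverse=True,
    )
    return [event["message"] for event in events[:limit]]


# Example call (requires boto3 and AWS credentials):
#   import boto3
#   for message in recent_service_events(boto3.client("ecs"), "my-cluster", "my-service"):
#       print(message)
```

Messages about failed image pulls or failed health checks in this output usually point to the part of the task definition that needs fixing.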
A timeout occurs while waiting for a notification to continue
Problem: You see the following error message while deploying your Amazon ECS application using CodeDeploy:
The deployment timed out while waiting for a notification to continue. This time out period is n minutes.
Possible cause: This error might occur if you specified a wait time in the Specify when to reroute traffic field when you created your deployment group, but the deployment couldn't finish before the wait time expired.
Possible fixes and next steps:
- In your deployment group, set the Specify when to reroute traffic field to a larger amount of time and redeploy. For more information, see Create a deployment group for an Amazon ECS deployment (console).
- In your deployment group, change Specify when to reroute traffic to Reroute traffic immediately and redeploy. For more information, see Create a deployment group for an Amazon ECS deployment (console).
- Redeploy and then run the aws deploy continue-deployment Amazon CLI command with the --deployment-wait-type option set to READY_WAIT. Make sure to run this command before the time specified in Specify when to reroute traffic expires.
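The third option above can also be driven from code. The following is a minimal sketch, assuming a boto3 CodeDeploy client is passed in; the function name and the deployment ID in the usage comment are hypothetical.

```python
def continue_when_ready(codedeploy_client, deployment_id):
    """Tell CodeDeploy to start rerouting traffic for a waiting deployment.

    codedeploy_client is assumed to be a boto3 CodeDeploy client, for
    example boto3.client("codedeploy"). This mirrors the
    `aws deploy continue-deployment` command with READY_WAIT.
    """
    codedeploy_client.continue_deployment(
        deploymentId=deployment_id,
        deploymentWaitType="READY_WAIT",
    )


# Example call (requires boto3 and AWS credentials; the ID is a placeholder):
#   import boto3
#   continue_when_ready(boto3.client("codedeploy"), "d-EXAMPLE111")
```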
The IAM role does not have enough permissions
Problem: You see the following error message while deploying your Amazon ECS application using CodeDeploy:
The IAM role role-arn does not give you permission to perform operations in the following Amazon service: AWSLambda.
Possible cause: This error might occur if you specified a Lambda function in the AppSpec file's Hooks section, but you did not give CodeDeploy permission to the Lambda service.
Possible fix: Add the lambda:InvokeFunction permission to the CodeDeploy service role. To add this permission, add one of the following Amazon-managed policies to the role: AWSCodeDeployRoleForECS or AWSCodeDeployRoleForECSLimited. For information about these policies and how to add them to the CodeDeploy service role, see Step 2: Create a service role for CodeDeploy.
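If you prefer to attach the managed policy programmatically, a sketch along the following lines might work. It assumes a boto3 IAM client, a hypothetical role name, and that the managed policy ARN follows the usual arn:PARTITION:iam::aws:policy/NAME pattern; confirm the exact ARN for your partition in the IAM console before relying on it.

```python
def managed_policy_arn(policy_name, partition="aws"):
    """Build a managed policy ARN.

    The ARN layout is an assumption; partition would be "aws-cn" in the
    China Regions. Verify the result against the IAM console.
    """
    return f"arn:{partition}:iam::aws:policy/{policy_name}"


def attach_codedeploy_ecs_policy(iam_client, role_name, partition="aws"):
    """Attach AWSCodeDeployRoleForECS to the CodeDeploy service role.

    iam_client is assumed to be a boto3 IAM client, for example
    boto3.client("iam"). role_name is a placeholder for your service role.
    """
    arn = managed_policy_arn("AWSCodeDeployRoleForECS", partition)
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    return arn


# Example call (requires boto3 and AWS credentials):
#   import boto3
#   attach_codedeploy_ecs_policy(boto3.client("iam"), "MyCodeDeployServiceRole")
```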
The deployment timed out while waiting for a status callback
Problem: You see the following error message while deploying your Amazon ECS application using CodeDeploy:
The deployment timed out while waiting for a status callback. CodeDeploy expects a status
callback within one hour after a deployment hook is invoked.
Possible cause: This error might occur if you specified a Lambda function in the AppSpec file's Hooks section, but the Lambda function could not call the necessary PutLifecycleEventHookExecutionStatus API to return a Succeeded or Failed status to CodeDeploy.
Possible fixes and next steps:
- Add the codedeploy:PutLifecycleEventHookExecutionStatus permission to the Lambda execution role used by the Lambda function that you specified in the AppSpec file. This permission grants the Lambda function the ability to return a status of Succeeded or Failed to CodeDeploy. For more information about the Lambda execution role, see Lambda execution role in the Amazon Lambda User Guide.
- Check your Lambda function code and execution logs to make sure your Lambda function is calling CodeDeploy's PutLifecycleEventHookExecutionStatus API to inform CodeDeploy about whether the lifecycle validation test Succeeded or Failed. For information about the PutLifecycleEventHookExecutionStatus API, see PutLifecycleEventHookExecutionStatus in the Amazon CodeDeploy API Reference. For information about Lambda execution logs, see Accessing Amazon CloudWatch logs for Amazon Lambda.
The deployment failed because one or more of the lifecycle event validation functions failed
Problem: You see the following error message while deploying your Amazon ECS application using CodeDeploy:
The deployment failed because one or more of the lifecycle event validation
functions failed.
Possible cause: This error might occur if you specified a Lambda function in the AppSpec file's Hooks section, but the Lambda function returned Failed to CodeDeploy when it called PutLifecycleEventHookExecutionStatus. This failure indicates to CodeDeploy that the lifecycle validation test failed.
Possible next step: Check your Lambda execution logs to see why the validation test code is failing. For information about Lambda execution logs, see Accessing Amazon CloudWatch logs for Amazon Lambda.
The ELB could not be updated due to the following error: Primary taskset target group must be behind listener
Problem: You see the following error message while deploying your Amazon ECS application using CodeDeploy:
The ELB could not be updated due to the following error: Primary taskset target
group must be behind listener
Possible cause: This error might occur if you have configured an optional test listener, and it is configured with the wrong target group. For more information about the test listener in CodeDeploy, see Before you begin an Amazon ECS deployment and What happens during an Amazon ECS deployment. For more information about task sets, see TaskSet in the Amazon Elastic Container Service API Reference and describe-task-set in the Amazon ECS section of the Amazon CLI Command Reference.
Possible fix: Make sure that the Elastic Load Balancing production listener and test listener are both pointing to the target group that's currently serving your workloads. There are three places to check:
- In Amazon EC2, in your load balancer's Listeners and rules settings. For more information, see Listeners for your Application Load Balancers in the User Guide for Application Load Balancers, or Listeners for your Network Load Balancers in the User Guide for Network Load Balancers.
- In Amazon ECS, in your cluster, under your service's Networking configuration. For more information, see Application Load Balancer and Network Load Balancer considerations in the Amazon Elastic Container Service Developer Guide.
- In CodeDeploy, in your deployment group settings. For more information, see Create a deployment group for an Amazon ECS deployment (console).
My deployment sometimes fails when using Auto Scaling
Problem: You are using Auto Scaling with CodeDeploy and you notice that your deployments occasionally fail. For more information about the symptoms of this problem, see the topic that reads For services configured to use service auto scaling and the blue/green deployment type, auto scaling is not blocked during a deployment but the deployment may fail under some circumstances in the Amazon Elastic Container Service Developer Guide.
Possible cause: This problem might occur if CodeDeploy and Auto Scaling processes conflict.
Possible fix: Suspend and resume Auto Scaling processes during the CodeDeploy deployment using the RegisterScalableTarget API (or the corresponding register-scalable-target Amazon CLI command). For more information, see Suspend and resume scaling for Application Auto Scaling in the Application Auto Scaling User Guide.
Note
CodeDeploy can't call RegisterScalableTarget directly. To use this API, you must configure CodeDeploy to send a notification or event to Amazon Simple Notification Service (or Amazon CloudWatch). You must then configure Amazon SNS (or CloudWatch) to call a Lambda function, and configure the Lambda function to call the RegisterScalableTarget API. The RegisterScalableTarget API must be called with the attributes of the SuspendedState parameter set to true to suspend Auto Scaling operations, and false to resume them.
The notification or event that CodeDeploy sends out must occur when a deployment starts (to trigger Auto Scaling suspend operations), or when a deployment succeeds, fails, or stops (to trigger Auto Scaling resume operations).
For information about how to configure CodeDeploy to generate Amazon SNS notifications or CloudWatch events, see Monitoring deployments with Amazon CloudWatch Events and Monitoring Deployments with Amazon SNS Event Notifications.
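The call that the Lambda function needs to make can be sketched as follows. This assumes a boto3 Application Auto Scaling client and an Amazon ECS service whose desired count is the scalable dimension; the function name and the cluster and service names are hypothetical. How your function decides between suspending and resuming depends on the notification or event format you configured, which this sketch does not cover.

```python
def set_scaling_suspended(autoscaling_client, cluster, service, suspended):
    """Suspend (True) or resume (False) scaling for one Amazon ECS service.

    autoscaling_client is assumed to be a boto3 client created with
    boto3.client("application-autoscaling").
    """
    autoscaling_client.register_scalable_target(
        ServiceNamespace="ecs",
        ResourceId=f"service/{cluster}/{service}",
        ScalableDimension="ecs:service:DesiredCount",
        # SuspendedState is a structure with one Boolean per scaling activity.
        SuspendedState={
            "DynamicScalingInSuspended": suspended,
            "DynamicScalingOutSuspended": suspended,
            "ScheduledScalingSuspended": suspended,
        },
    )


# Example calls (require boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("application-autoscaling")
#   set_scaling_suspended(client, "my-cluster", "my-service", True)   # deployment started
#   set_scaling_suspended(client, "my-cluster", "my-service", False)  # deployment ended
```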
Only ALB supports gradual traffic routing, use AllAtOnce Traffic routing instead when you create/update Deployment group
Problem: You see the following error message while creating or updating a deployment group in CodeDeploy:
Only ALB supports gradual traffic routing, use AllAtOnce Traffic routing instead when
you create/update Deployment group.
Possible cause: This error might occur if you're using a Network Load Balancer and tried to use a predefined deployment configuration other than CodeDeployDefault.ECSAllAtOnce.
Possible fixes:
- Change your predefined deployment configuration to CodeDeployDefault.ECSAllAtOnce. This is the only predefined deployment configuration supported by Network Load Balancers. For more information about predefined deployment configurations, see Predefined deployment configurations for an Amazon ECS compute platform.
- Change your load balancer to an Application Load Balancer. Application Load Balancers support all the predefined deployment configurations. For more information about creating an Application Load Balancer, see Set up a load balancer, target groups, and listeners for CodeDeploy Amazon ECS deployments.
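The first fix can be applied to an existing deployment group with the UpdateDeploymentGroup API. The following is a minimal sketch, assuming a boto3 CodeDeploy client; the function name and the application and deployment group names are hypothetical.

```python
def switch_to_all_at_once(codedeploy_client, application_name, deployment_group_name):
    """Point a deployment group at CodeDeployDefault.ECSAllAtOnce.

    codedeploy_client is assumed to be a boto3 CodeDeploy client, for
    example boto3.client("codedeploy").
    """
    codedeploy_client.update_deployment_group(
        applicationName=application_name,
        currentDeploymentGroupName=deployment_group_name,
        deploymentConfigName="CodeDeployDefault.ECSAllAtOnce",
    )


# Example call (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   switch_to_all_at_once(boto3.client("codedeploy"), "my-app", "my-deployment-group")
```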
Even though my deployment succeeded, the replacement task set fails the Elastic Load Balancing health checks, and my application is down
Problem: Even though CodeDeploy indicates that my deployment succeeded, the replacement task set fails the health checks from Elastic Load Balancing, and my application is down.
Possible cause: This issue might occur if you performed a CodeDeploy all-at-once deployment, and your replacement (green) task set contains bad code that is causing the Elastic Load Balancing health checks to fail. With the all-at-once deployment configuration, the load balancer’s health checks start running on the replacement task set after traffic has been shifted to it (that is, after CodeDeploy’s AllowTraffic lifecycle event occurs). That’s why you will see health checks failing on the replacement task set after traffic has shifted, but not before. For information about the lifecycle events that CodeDeploy generates, see What happens during an Amazon ECS deployment.
Possible fixes:
- Change your deployment configuration from all-at-once to canary or linear. In a canary or linear configuration, the load balancer’s health checks start running on the replacement task set while CodeDeploy installs your application in the replacement environment, and before traffic is shifted (that is, during the Install lifecycle event, and before the AllowTraffic event). By allowing the checks to run during the application installation but before traffic is shifted, bad application code will be detected and cause deployment failures before the application becomes publicly available.
  For information about how to configure canary or linear deployments, see Change deployment group settings with CodeDeploy.
  For information about CodeDeploy lifecycle events that run during an Amazon ECS deployment, see What happens during an Amazon ECS deployment.
  Note
  Canary and linear deployment configurations are only supported with Application Load Balancers.
- If you want to keep your all-at-once deployment configuration, set up a test listener and check the health status of the replacement task set with the BeforeAllowTraffic lifecycle hook. For more information, see List of lifecycle event hooks for an Amazon ECS deployment.
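One way to implement the health check in a BeforeAllowTraffic hook is to query the replacement target group through the Elastic Load Balancing DescribeTargetHealth API. The following is a minimal sketch, assuming a boto3 elbv2 client and a target group ARN that you supply; the function name is hypothetical, and a real hook would then report the result to CodeDeploy with PutLifecycleEventHookExecutionStatus.

```python
def replacement_is_healthy(elbv2_client, target_group_arn):
    """Return True only if the target group has targets and all are healthy.

    elbv2_client is assumed to be a boto3 client created with
    boto3.client("elbv2"). target_group_arn is the ARN of the replacement
    (green) target group, which you must supply.
    """
    response = elbv2_client.describe_target_health(TargetGroupArn=target_group_arn)
    descriptions = response.get("TargetHealthDescriptions", [])
    if not descriptions:
        # No registered targets is treated as unhealthy in this sketch.
        return False
    return all(d["TargetHealth"]["State"] == "healthy" for d in descriptions)
```

Treating an empty target group as unhealthy is a deliberate choice here, so that a replacement task set with no registered tasks does not pass the check by default.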
Can I attach multiple load balancers to a deployment group?
No. If you want to use multiple Application Load Balancers or Network Load Balancers, use Amazon ECS rolling updates instead of CodeDeploy blue/green deployments. For more information about rolling updates, see Rolling update in the Amazon Elastic Container Service Developer Guide. For more information about using multiple load balancers with Amazon ECS, see Registering multiple target groups with a service in the Amazon Elastic Container Service Developer Guide.
Can I perform CodeDeploy blue/green deployments without a load balancer?
No, you cannot perform CodeDeploy blue/green deployments without a load balancer. If you are unable to use a load balancer, use Amazon ECS's rolling updates feature instead. For more information about Amazon ECS rolling updates, see Rolling update in the Amazon Elastic Container Service Developer Guide.
How can I update my Amazon ECS service with new information during a deployment?
To have CodeDeploy update your Amazon ECS service with a new parameter while it conducts a deployment, specify the parameter in the resources section of the AppSpec file. Only a few Amazon ECS parameters are supported by CodeDeploy, such as the task definition file and container name parameters. For a full list of Amazon ECS parameters that CodeDeploy can update, see AppSpec 'resources' section for Amazon ECS deployments.
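As an illustration of where these parameters live, the following sketch builds a JSON-formatted AppSpec with a Resources section that sets the task definition and container name. The helper name, the task definition ARN, the container name, and the port are all placeholder values; check the AppSpec reference for the properties your deployment actually needs.

```python
import json


def build_appspec(task_definition_arn, container_name, container_port):
    """Return a minimal Amazon ECS AppSpec as a dictionary (illustrative only)."""
    return {
        "version": 0.0,
        "Resources": [
            {
                "TargetService": {
                    "Type": "AWS::ECS::Service",
                    "Properties": {
                        # Parameters CodeDeploy applies during the deployment.
                        "TaskDefinition": task_definition_arn,
                        "LoadBalancerInfo": {
                            "ContainerName": container_name,
                            "ContainerPort": container_port,
                        },
                    },
                }
            }
        ],
    }


# Placeholder values; substitute your own task definition and container.
appspec = build_appspec(
    "arn:aws:ecs:us-east-1:111122223333:task-definition/my-task:2",
    "my-container",
    80,
)
print(json.dumps(appspec, indent=2))
```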
Note
If you need to update your Amazon ECS service with a parameter that is not supported by CodeDeploy, complete these tasks:
- Call Amazon ECS's UpdateService API with the parameter you want to update. For a full list of parameters that can be updated, see UpdateService in the Amazon Elastic Container Service API Reference.
- To apply the change to the tasks, create a new Amazon ECS blue/green deployment. For more information, see Create an Amazon ECS Compute Platform deployment (console).