Troubleshoot Amazon EC2 Auto Scaling: Health checks - Amazon EC2 Auto Scaling

This page provides information about your EC2 instances that terminate due to a health check. It describes potential causes, and the steps that you can take to resolve the issues.

To retrieve an error message, see Retrieve an error message from scaling activities.
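
As a minimal sketch of retrieving these messages programmatically, the following Python function filters scaling activities down to the health check replacements described on this page. It assumes a boto3 Auto Scaling client is passed in as `client` and that the activity `Cause` text contains the phrase "taken out of service in response to", as in the section titles below. The function name and the group name `my-asg` are hypothetical.

```python
# Assumption: `client` is a boto3 Auto Scaling client, for example
# boto3.client("autoscaling"). It is passed in so the sketch has no
# hard dependency on boto3 or on AWS credentials.
HEALTH_CHECK_PHRASE = "taken out of service in response to"


def health_check_terminations(client, group_name, max_records=50):
    """Return scaling activities whose cause mentions a health check replacement."""
    response = client.describe_scaling_activities(
        AutoScalingGroupName=group_name,
        MaxRecords=max_records,
    )
    matches = []
    for activity in response.get("Activities", []):
        cause = activity.get("Cause", "")
        if HEALTH_CHECK_PHRASE in cause:
            matches.append(
                {
                    "description": activity.get("Description", ""),
                    "cause": cause,
                    "status_message": activity.get("StatusMessage", ""),
                }
            )
    return matches
```

Calling `health_check_terminations(client, "my-asg")` returns one dictionary per matching activity. The sketch reads a single page of results and does not follow `NextToken`.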

Note

You can be notified when Amazon EC2 Auto Scaling terminates the instances in your Auto Scaling group, including when the cause of instance termination is not the result of a scaling activity. For more information, see Amazon SNS notification options for Amazon EC2 Auto Scaling.

The sections that follow describe the most common health check errors and their causes. If you have a different issue, see the Amazon Knowledge Center articles for additional troubleshooting help.

An instance was taken out of service in response to an EC2 instance status check failure

Problem: Auto Scaling instances fail the Amazon EC2 status checks.

Cause 1: If there are issues that cause Amazon EC2 to consider the instances in your Auto Scaling group impaired, Amazon EC2 Auto Scaling automatically replaces the impaired instances as part of its health check. Status checks are built into Amazon EC2, so they cannot be disabled or deleted. When an instance status check fails, you typically must address the problem yourself by making instance configuration changes until your application is no longer exhibiting any problems.

Solution 1: To address this issue, follow these steps:

  1. Manually create an Amazon EC2 instance that is not part of the Auto Scaling group and investigate the problem. For general help with investigating impaired instances, see Troubleshoot instances with failed status checks in the Amazon EC2 User Guide for Linux Instances and Troubleshooting Windows Instances in the Amazon EC2 User Guide for Windows Instances.

  2. After you confirm that your instance launched successfully and is healthy, deploy a new, error-free instance configuration to the Auto Scaling group.

  3. Terminate the instance that you created to avoid ongoing charges to your Amazon account.

Cause 2: There is a mismatch between the health check grace period and the instance startup time.

Solution 2: Edit the health check grace period for your Auto Scaling group to an appropriate time period for your application. Instances launched in an Auto Scaling group require sufficient warm-up time (grace period) to prevent early termination due to a health check replacement. For more information, see Set the health check grace period for an Auto Scaling group.
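
The following sketch shows one way to apply Solution 2 from code. It assumes `client` is a boto3 Auto Scaling client and that you have measured your own instance startup time. The function name, the 60-second safety margin, and the group name are illustrative assumptions, not recommended values.

```python
# Assumption: `client` is a boto3 Auto Scaling client
# (for example boto3.client("autoscaling")).
def set_grace_period(client, group_name, startup_seconds, margin_seconds=60):
    """Set the health check grace period to the measured startup time plus a margin."""
    if startup_seconds < 0 or margin_seconds < 0:
        raise ValueError("startup_seconds and margin_seconds must be non-negative")
    grace_seconds = startup_seconds + margin_seconds
    client.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        HealthCheckGracePeriod=grace_seconds,
    )
    return grace_seconds
```

For example, `set_grace_period(client, "my-asg", startup_seconds=240)` requests a grace period of 300 seconds.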

An instance was taken out of service in response to an EC2 scheduled reboot

Problem: Auto Scaling instances are replaced when a scheduled event indicates a problem with the instance.

Cause: Amazon EC2 Auto Scaling replaces instances that have an upcoming scheduled maintenance or retirement event.

Solution: These events do not occur frequently. If you need something to happen on the instance that is terminating, or on the instance that is starting up, you can use lifecycle hooks. These hooks allow you to perform a custom action as Amazon EC2 Auto Scaling launches or terminates instances. For more information, see Amazon EC2 Auto Scaling lifecycle hooks.

If you do not want instances to be replaced due to a scheduled event, you can suspend the health check process for an Auto Scaling group. For more information, see Suspend and resume Amazon EC2 Auto Scaling processes.
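
As a hedged illustration, suspending and later resuming the health check process might look like the following. The sketch assumes `client` is a boto3 Auto Scaling client. The helper names are hypothetical. While the process is suspended, unhealthy instances are not replaced, so resume it when the maintenance window is over.

```python
# Assumption: `client` is a boto3 Auto Scaling client
# (for example boto3.client("autoscaling")).
HEALTH_CHECK_PROCESS = "HealthCheck"


def suspend_health_checks(client, group_name):
    """Stop Amazon EC2 Auto Scaling from marking instances in the group unhealthy."""
    client.suspend_processes(
        AutoScalingGroupName=group_name,
        ScalingProcesses=[HEALTH_CHECK_PROCESS],
    )


def resume_health_checks(client, group_name):
    """Turn health check evaluation back on for the group."""
    client.resume_processes(
        AutoScalingGroupName=group_name,
        ScalingProcesses=[HEALTH_CHECK_PROCESS],
    )
```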

An instance was taken out of service in response to an EC2 health check that indicated it had been terminated or stopped

Problem: Auto Scaling instances that have been stopped, rebooted, or terminated are replaced.

Cause 1: A user manually stopped, rebooted, or terminated the instance.

Solution 1: This behavior is by design. Amazon EC2 Auto Scaling health checks require an instance to be healthy and reachable, so an instance that a user manually stops, reboots, or terminates can fail the health check and be replaced. If you need to reboot the instances in your Auto Scaling group, we recommend that you put the instances on standby first. For more information, see Temporarily remove instances from your Auto Scaling group.
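
The standby recommendation can be sketched as follows, assuming `autoscaling` and `ec2` are boto3 clients for Amazon EC2 Auto Scaling and Amazon EC2. The function name and the `wait_for_step` callback are hypothetical. In real use the callback would need to wait until the instances reach the Standby state, and later until the reboot has finished; the sketch leaves that logic to the caller.

```python
# Assumptions: `autoscaling` and `ec2` are boto3 clients, for example
# boto3.client("autoscaling") and boto3.client("ec2").
# `wait_for_step` is a hypothetical caller-supplied callback that blocks
# until the named step has finished; the default does nothing.
def reboot_via_standby(autoscaling, ec2, group_name, instance_ids,
                       wait_for_step=lambda step: None):
    """Move instances to standby, reboot them, then return them to service."""
    ids = list(instance_ids)
    # Decrementing desired capacity keeps the group from launching replacements.
    autoscaling.enter_standby(
        InstanceIds=ids,
        AutoScalingGroupName=group_name,
        ShouldDecrementDesiredCapacity=True,
    )
    wait_for_step("enter_standby")
    ec2.reboot_instances(InstanceIds=ids)
    wait_for_step("reboot")
    autoscaling.exit_standby(
        InstanceIds=ids,
        AutoScalingGroupName=group_name,
    )
```

Decrementing the desired capacity can fail if it would take the group below its minimum size, so check the group's limits before you use this approach.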

Note that when you terminate instances manually, termination lifecycle hooks and Elastic Load Balancing deregistration (and connection draining) must be completed before the instance is actually terminated.

Cause 2: Amazon EC2 Auto Scaling attempts to replace Spot Instances after the Amazon EC2 Spot service interrupts the instances, because the Spot price increases above your maximum price or capacity is no longer available.

Solution 2: There is no guarantee that a Spot Instance exists to fulfill the request at any given point in time. However, you can try the following:

  • Use a higher Spot maximum price (possibly the On-Demand price). Setting a higher maximum price gives the Amazon EC2 Spot service a better chance of launching and maintaining your required amount of capacity.

  • Increase the number of different capacity pools that you can launch instances from by running multiple instance types in multiple Availability Zones. For more information, see Auto Scaling groups with multiple instance types and purchase options.

  • If you use multiple instance types, consider enabling the Capacity Rebalancing feature. This is useful if you want the Amazon EC2 Spot service to attempt to launch a new Spot Instance before a running instance is terminated. For more information, see Use Capacity Rebalancing to handle Amazon EC2 Spot interruptions.
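
To make the options above concrete, the following sketch builds the request parameters for a group that uses several instance types with Capacity Rebalancing enabled. It only constructs a dictionary; passing it to the boto3 call `update_auto_scaling_group(**params)` is left to you. The function name, launch template name, and instance types are hypothetical, and the group is assumed to already use a launch template.

```python
def spot_diversification_params(group_name, launch_template_name,
                                instance_types, spot_max_price=None):
    """Build update parameters for multiple instance types plus Capacity Rebalancing."""
    if not instance_types:
        raise ValueError("provide at least one instance type")
    policy = {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # One override per instance type widens the set of capacity pools.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        }
    }
    if spot_max_price is not None:
        # The maximum price is passed as a string, for example "0.10".
        policy["InstancesDistribution"] = {"SpotMaxPrice": str(spot_max_price)}
    return {
        "AutoScalingGroupName": group_name,
        "CapacityRebalance": True,
        "MixedInstancesPolicy": policy,
    }
```

Spreading the group across multiple Availability Zones is configured separately, through the subnets or zones assigned to the group, and is not covered by this sketch.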

Cause 3: With Capacity Blocks, Amazon EC2 terminates any instances that are still running 30 minutes before the end time of the Capacity Block. This abrupt termination causes your Auto Scaling group to try to launch new instances to maintain its desired capacity, even as the Capacity Block is ending.

Solution 3: To resolve this issue, try the following:

  • Decrease the desired capacity of the Auto Scaling group to prevent it from trying to launch new instances. For more information, see Manual scaling for Amazon EC2 Auto Scaling.

  • Make sure you scale in your Auto Scaling group 30 minutes before the Capacity Block end time so that you do not encounter this error frequently. Make sure any lifecycle hooks have completed 30 minutes before the Capacity Block end time. For more information, see Use Capacity Blocks for machine learning workloads.
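
The timing in the second bullet is simple arithmetic, sketched below. The function assumes you know the Capacity Block end time and the longest time that your termination lifecycle hooks can take. The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def latest_scale_in_start(capacity_block_end, hook_timeout_seconds=0, lead_minutes=30):
    """Return the latest time to begin scaling in.

    Lifecycle hooks must finish `lead_minutes` before the Capacity Block
    ends, so scale-in has to start at least `hook_timeout_seconds` earlier.
    """
    deadline = capacity_block_end - timedelta(minutes=lead_minutes)
    return deadline - timedelta(seconds=hook_timeout_seconds)


# Example: the block ends at 12:00 UTC and hooks can take up to 5 minutes.
end_time = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
start_by = latest_scale_in_start(end_time, hook_timeout_seconds=300)
```

In this example, `start_by` is 11:25 UTC: 30 minutes before the end time, minus 5 minutes for the lifecycle hooks.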

An instance was taken out of service in response to an ELB system health check failure

Problem: Auto Scaling instances might pass the EC2 status checks but fail the Elastic Load Balancing health checks for the target groups or Classic Load Balancers with which the Auto Scaling group is registered.

Cause: If your Auto Scaling group relies on health checks provided by Elastic Load Balancing, Amazon EC2 Auto Scaling determines the health status of your instances by checking the results of both the EC2 status checks and the Elastic Load Balancing health checks. The load balancer performs health checks by sending a request to each instance and waiting for the correct response, or by establishing a connection with the instance. An instance might fail the Elastic Load Balancing health check because an application running on the instance has issues that cause the load balancer to consider the instance out of service. For more information, see Health checks for instances in an Auto Scaling group.

Solution 1: To pass the Elastic Load Balancing health checks:

  • Make note of the success codes that the load balancer is expecting, and verify that your application is configured correctly to return these codes on success.

  • Verify that the security groups for your load balancer and Auto Scaling group are correctly configured.

  • Verify that the health check settings of your target groups are correctly configured. You define health check settings for your load balancer per target group.

  • Consider adding a launch lifecycle hook to the Auto Scaling group to ensure that the applications on the instances are ready to accept traffic before they are registered to the load balancer at the end of the lifecycle hook.

  • Set the health check grace period for your Auto Scaling group to a long enough time period to support the number of consecutive successful health checks required before Elastic Load Balancing considers a newly launched instance healthy.

  • Verify that the load balancer is configured in the same Availability Zones as your Auto Scaling group.
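
As a rough way to size the grace period from the bullet above, the following sketch adds the application startup time to the time that the load balancer needs to record the required number of consecutive successful checks. This is an assumed lower-bound estimate, not an official formula, and the function and parameter names are hypothetical. Take the threshold and interval from your own target group settings.

```python
def estimate_grace_period(startup_seconds, healthy_threshold, interval_seconds):
    """Estimate a minimum health check grace period in seconds.

    Assumes health checks can only start succeeding once the application
    has finished starting, and that `healthy_threshold` consecutive
    successes are then needed, one per `interval_seconds`.
    """
    if min(startup_seconds, healthy_threshold, interval_seconds) < 0:
        raise ValueError("all inputs must be non-negative")
    return startup_seconds + healthy_threshold * interval_seconds
```

For example, with a 120-second startup, a healthy threshold of 5, and a 30-second interval, the estimate is 270 seconds. Consider adding a margin on top of this value.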

Solution 2: Update the Auto Scaling group to disable Elastic Load Balancing health checks.
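
A minimal sketch of this update, assuming `client` is a boto3 Auto Scaling client and that setting the group's health check type back to `EC2` is how you turn off the Elastic Load Balancing health checks for the group. The function name is hypothetical.

```python
# Assumption: `client` is a boto3 Auto Scaling client
# (for example boto3.client("autoscaling")).
def disable_elb_health_checks(client, group_name):
    """Make the group rely on EC2 status checks only."""
    client.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        HealthCheckType="EC2",
    )
```

After this change, an instance that fails only the load balancer health check is no longer replaced by the group, so use it only when that trade-off is acceptable.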