Status checks for your instances - Amazon Elastic Compute Cloud
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Status checks for your instances

With instance status monitoring, you can quickly determine whether Amazon EC2 has detected any problems that might prevent your instances from running applications. Amazon EC2 performs automated checks on every running EC2 instance to identify hardware and software issues. You can view the results of these status checks to identify specific and detectable problems. The event status data augments the information that Amazon EC2 already provides about the state of each instance (such as pending, running, stopping) and the utilization metrics that Amazon CloudWatch monitors (CPU utilization, network traffic, and disk activity).

Status checks are performed every minute, returning a pass or a fail status. If all checks pass, the overall status of the instance is OK. If one or more checks fail, the overall status is impaired. Status checks are built into Amazon EC2, so they cannot be disabled or deleted.
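The aggregation rule above can be illustrated with a small shell function. This is only a sketch of the rule as described here, not of how Amazon EC2 implements it; the function name overall_status and the literal pass/fail inputs are hypothetical.

```shell
# Hypothetical helper: derive the overall status from individual check results.
# Each argument is the result of one status check: "pass" or "fail".
overall_status() {
    local result
    for result in "$@"; do
        if [ "$result" = "fail" ]; then
            echo "impaired"   # one or more checks failed
            return 0
        fi
    done
    echo "OK"                 # all checks passed
}

overall_status pass pass pass   # prints: OK
overall_status pass fail pass   # prints: impaired
```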

When a status check fails, the corresponding CloudWatch metric for status checks is incremented. For more information, see Status check metrics. You can use these metrics to create CloudWatch alarms that are triggered based on the result of the status checks. For example, you can create an alarm to warn you if status checks fail on a specific instance. For more information, see Create and edit status check alarms.

You can also create an Amazon CloudWatch alarm that monitors an Amazon EC2 instance and automatically recovers the instance if it becomes impaired due to an underlying issue. For more information, see Recover your instance.
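As a rough sketch, a recovery alarm of this kind might be created from the command line as follows. The function name, the alarm name, the 60-second period, and the two evaluation periods are illustrative assumptions. The recover action ARN format (arn:PARTITION:automate:REGION:ec2:recover) and the aws-cn default partition are also assumptions that you should confirm against Recover your instance before use.

```shell
# Sketch: create an alarm that recovers an instance when the system status
# check fails. All argument values are placeholders supplied by the caller.
create_recover_alarm() {
    local instance_id="$1"           # for example, i-1234567890abcdef0
    local region="$2"                # Region in which the instance runs
    local partition="${3:-aws-cn}"   # ARN partition; aws-cn is assumed here

    aws cloudwatch put-metric-alarm \
        --region "$region" \
        --alarm-name "Recover-$instance_id" \
        --namespace AWS/EC2 \
        --metric-name StatusCheckFailed_System \
        --dimensions "Name=InstanceId,Value=$instance_id" \
        --statistic Maximum \
        --period 60 \
        --evaluation-periods 2 \
        --threshold 1 \
        --comparison-operator GreaterThanOrEqualToThreshold \
        --alarm-actions "arn:$partition:automate:$region:ec2:recover"
}

# Example call (not run here):
#   create_recover_alarm i-1234567890abcdef0 cn-north-1
```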

Types of status checks

There are three types of status checks.

System status checks

System status checks monitor the Amazon systems on which your instance runs. These checks detect underlying problems with your instance that require Amazon involvement to repair. When a system status check fails, you can choose to wait for Amazon to fix the issue, or you can resolve it yourself. For instances backed by Amazon EBS, you can stop and start the instance yourself, which in most cases results in the instance being migrated to a new host. For Linux instances backed by instance store, you can terminate and replace the instance. For Windows instances, the root volume must be an Amazon EBS volume; instance store is not supported for the root volume. Note that instance store volumes are ephemeral and all data is lost when the instance is stopped.
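For an instance backed by Amazon EBS, the stop and start sequence described above might look like the following sketch. The function name is hypothetical. Because data on instance store volumes is lost when the instance stops, this is intended only for instances whose data resides on EBS volumes.

```shell
# Sketch: stop and then start an EBS-backed instance so that it can be
# placed on a new host. The instance ID is supplied by the caller.
stop_and_start_instance() {
    local instance_id="$1"

    aws ec2 stop-instances --instance-ids "$instance_id" || return 1
    # Block until the instance reaches the stopped state.
    aws ec2 wait instance-stopped --instance-ids "$instance_id" || return 1
    aws ec2 start-instances --instance-ids "$instance_id" || return 1
    # Block until the instance is running again.
    aws ec2 wait instance-running --instance-ids "$instance_id"
}

# Example call (not run here):
#   stop_and_start_instance i-1234567890abcdef0
```

A reboot from within the operating system is not equivalent, because the instance typically remains on the same host.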

The following are examples of problems that can cause system status checks to fail:

  • Loss of network connectivity

  • Loss of system power

  • Software issues on the physical host

  • Hardware issues on the physical host that impact network reachability

If a system status check fails, we increment the StatusCheckFailed_System metric.
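To inspect this metric from the command line, a query along the following lines could be used. This is a sketch: the function name is hypothetical, the 60-second period is an assumption based on the one-minute check frequency, and the caller supplies the time range as ISO 8601 timestamps.

```shell
# Sketch: fetch StatusCheckFailed_System datapoints for one instance.
# start_time and end_time are ISO 8601 timestamps chosen by the caller,
# for example 2024-01-01T00:00:00Z.
system_check_failures() {
    local instance_id="$1" start_time="$2" end_time="$3"

    aws cloudwatch get-metric-statistics \
        --namespace AWS/EC2 \
        --metric-name StatusCheckFailed_System \
        --dimensions "Name=InstanceId,Value=$instance_id" \
        --start-time "$start_time" \
        --end-time "$end_time" \
        --period 60 \
        --statistics Maximum
}

# Example call (not run here):
#   system_check_failures i-1234567890abcdef0 2024-01-01T00:00:00Z 2024-01-01T01:00:00Z
```

The same function could be adapted for StatusCheckFailed_Instance by changing the metric name.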

Bare metal instances

If you perform a restart from the operating system on a bare metal instance, the system status check might temporarily return a fail status. When the instance becomes available, the system status check should return a pass status.

Instance status checks

Instance status checks monitor the software and network configuration of your individual instance. Amazon EC2 checks the health of the instance by sending an address resolution protocol (ARP) request to the network interface (NIC). These checks detect problems that require your involvement to repair. When an instance status check fails, you typically must address the problem yourself (for example, by rebooting the instance or by making instance configuration changes).

The following are examples of problems that can cause instance status checks to fail:

  • Failed system status checks

  • Incorrect networking or startup configuration

  • Exhausted memory

  • Corrupted file system

  • Incompatible kernel

If an instance status check fails, we increment the StatusCheckFailed_Instance metric.
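As one possible first response to a failed instance status check, the following sketch reboots an instance only when its instance status is reported as impaired. The function name is hypothetical. A reboot does not help when the cause is a failed system status check or a configuration error, so treat this as a starting point rather than a general remedy.

```shell
# Sketch: reboot an instance if its instance status check is impaired.
reboot_if_instance_check_failed() {
    local instance_id="$1" status

    # Read only the instance status field for the given instance.
    status="$(aws ec2 describe-instance-status \
        --instance-ids "$instance_id" \
        --query 'InstanceStatuses[0].InstanceStatus.Status' \
        --output text)" || return 1

    if [ "$status" = "impaired" ]; then
        echo "Instance status check is impaired; rebooting $instance_id"
        aws ec2 reboot-instances --instance-ids "$instance_id"
    else
        echo "Instance status check is '$status'; no action taken"
    fi
}

# Example call (not run here):
#   reboot_if_instance_check_failed i-1234567890abcdef0
```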

Bare metal instances

If you perform a restart from the operating system on a bare metal instance, the instance status check might temporarily return a fail status. When the instance becomes available, the instance status check should return a pass status.

Attached EBS status checks

Attached EBS status checks monitor whether the Amazon EBS volumes attached to an instance are reachable and able to complete I/O operations. The StatusCheckFailed_AttachedEBS metric is a binary value that indicates impairment if one or more of the EBS volumes attached to the instance are unable to complete I/O operations. These status checks detect underlying issues with the compute or Amazon EBS infrastructure. When the attached EBS status check fails, you can either wait for Amazon to resolve the issue, or you can take action, such as replacing the affected volumes or stopping and restarting the instance.

The following are examples of issues that can cause attached EBS status checks to fail:

  • Hardware or software issues on the storage subsystems underlying the EBS volumes

  • Hardware issues on the physical host that impact reachability of the EBS volumes

  • Connectivity issues between the instance and EBS volumes

You can use the StatusCheckFailed_AttachedEBS metric to help improve the resilience of your workload. You can use this metric to create Amazon CloudWatch alarms that are triggered based on the result of the status check. For example, you could fail over to a secondary instance or Availability Zone when you detect a prolonged impact. Alternatively, you can monitor the I/O performance of each attached volume using EBS CloudWatch metrics to detect and replace the impaired volume. If your workload is not driving I/O to any of the EBS volumes attached to your instance, and the attached EBS status check indicates an impairment, you can stop and start the instance to address issues with the physical host that are impacting the reachability of the EBS volumes.

Note
  • The attached EBS status check metric is available only for Nitro instances.

  • You can monitor the attached EBS status check metric by creating a CloudWatch alarm based on the StatusCheckFailed_AttachedEBS metric. You can't view this status check by using the describe-instance-status Amazon CLI command.
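A CloudWatch alarm on this metric might be created as in the following sketch. The function name and alarm name are hypothetical, and the choice of ten consecutive one-minute periods to represent a prolonged impact is an assumption that you should tune for your workload.

```shell
# Sketch: alarm when the attached EBS status check fails for a sustained time.
# topic_arn is the ARN of an existing SNS topic supplied by the caller.
create_attached_ebs_alarm() {
    local instance_id="$1" topic_arn="$2"

    aws cloudwatch put-metric-alarm \
        --alarm-name "AttachedEBS-$instance_id" \
        --namespace AWS/EC2 \
        --metric-name StatusCheckFailed_AttachedEBS \
        --dimensions "Name=InstanceId,Value=$instance_id" \
        --statistic Maximum \
        --period 60 \
        --evaluation-periods 10 \
        --threshold 1 \
        --comparison-operator GreaterThanOrEqualToThreshold \
        --alarm-actions "$topic_arn"
}

# Example call (not run here):
#   create_attached_ebs_alarm i-1234567890abcdef0 "$TOPIC_ARN"
```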

Working with status checks

You can work with status checks using the console and command line tools, such as the Amazon CLI.

View status checks

To view status checks, use one of the following methods.

Console
To view status checks
  1. Open the Amazon EC2 console at https://console.amazonaws.cn/ec2/.

  2. In the navigation pane, choose Instances.

  3. On the Instances page, the Status check column lists the operational status of each instance.

  4. To view the status of a specific instance, select the instance, and then choose the Status and alarms tab.

    
    [Screenshot: the instance status checks on the Status and alarms tab.]

    If your instance has a failed status check, you typically must address the problem yourself (for example, by rebooting the instance or by making instance configuration changes). To troubleshoot system or instance status check failures yourself, see Troubleshoot instances with failed status checks.

  5. To review the CloudWatch metrics for status checks, on the Status and alarms tab, expand Metrics to see the graphs for the following metrics:

    • Status check failed for system

    • Status check failed for instance

    For more information, see Status check metrics.

Command line

You can view status checks for running instances by using the describe-instance-status (Amazon CLI) command.

To view the status of all instances, use the following command.

aws ec2 describe-instance-status

To get the status of all instances with an instance status of impaired, use the following command.

aws ec2 describe-instance-status \
    --filters Name=instance-status.status,Values=impaired

To get the status of a single instance, use the following command.

aws ec2 describe-instance-status \
    --instance-ids i-1234567890abcdef0
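When using these commands in scripts, it can help to reduce the output to the fields of interest. The following sketch wraps two such queries in functions with hypothetical names. The --query expressions assume the InstanceStatuses structure that describe-instance-status returns, and --include-all-instances is used to include instances that are not running.

```shell
# Sketch: print the IDs of all instances whose instance status is impaired.
list_impaired_instances() {
    aws ec2 describe-instance-status \
        --filters Name=instance-status.status,Values=impaired \
        --query 'InstanceStatuses[].InstanceId' \
        --output text
}

# Sketch: print ID, state, system status, and instance status as a table,
# including instances that are not in the running state.
status_table() {
    aws ec2 describe-instance-status \
        --include-all-instances \
        --query 'InstanceStatuses[].[InstanceId,InstanceState.Name,SystemStatus.Status,InstanceStatus.Status]' \
        --output table
}

# Example calls (not run here):
#   list_impaired_instances
#   status_table
```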

If you have an instance with a failed status check, see Troubleshoot instances with failed status checks.

Create and edit status check alarms

You can use the status check metrics to create CloudWatch alarms to notify you when an instance has a failed status check.

To create a status check alarm, use one of the following methods:

Console

Use the following procedure to configure an alarm that sends you a notification by email, or stops, terminates, or recovers an instance when it fails a status check.

To create a status check alarm
  1. Open the Amazon EC2 console at https://console.amazonaws.cn/ec2/.

  2. In the navigation pane, choose Instances.

  3. Select the instance, choose the Status and alarms tab, and choose Actions, Create status check alarm.

  4. On the Manage CloudWatch alarms page, under Add or edit alarm, choose Create an alarm.

  5. For Alarm notification, turn the toggle on to configure Amazon Simple Notification Service (Amazon SNS) notifications. Select an existing Amazon SNS topic or enter a name to create a new topic.

    If you add an email address to the list of recipients or create a new topic, Amazon SNS sends a subscription confirmation email message to each new address. Each recipient must confirm the subscription by choosing the link contained in that message. Alert notifications are sent only to confirmed addresses.

  6. For Alarm action, turn the toggle on to specify an action to take when the alarm is triggered. Select the action.

  7. For Alarm thresholds, specify the metric and criteria for the alarm.

    You can leave the default settings for Group samples by (Average) and Type of data to sample (Status check failed:either), or you can change them to suit your needs.

    For Consecutive period, set the number of periods to evaluate. For Period, enter the duration of each evaluation period. The alarm is triggered, and the email is sent, only after the criteria are met for that number of consecutive periods.

  8. (Optional) For Sample metric data, choose Add to dashboard.

  9. Choose Create.

If you need to make changes to an instance status alarm, you can edit it.

To edit a status check alarm
  1. Open the Amazon EC2 console at https://console.amazonaws.cn/ec2/.

  2. In the navigation pane, choose Instances.

  3. Select the instance and choose Actions, Monitoring, Manage CloudWatch alarms.

  4. On the Manage CloudWatch alarms page, under Add or edit alarm, choose Edit an alarm.

  5. For Search for alarm, choose the alarm.

  6. When you are finished making changes, choose Update.

Command line

In the following example, the alarm publishes a notification to an SNS topic, arn:aws-cn:sns:us-west-2:111122223333:my-sns-topic, when the instance fails either the instance status check or the system status check for at least two consecutive periods. The CloudWatch metric used is StatusCheckFailed.

To create a status check alarm using the Amazon CLI
  1. Select an existing SNS topic or create a new one. For more information, see Using the Amazon CLI with Amazon SNS in the Amazon Command Line Interface User Guide.

  2. Use the following list-metrics command to view the available Amazon CloudWatch metrics for Amazon EC2.

    aws cloudwatch list-metrics --namespace AWS/EC2

  3. Use the following put-metric-alarm command to create the alarm.

    aws cloudwatch put-metric-alarm \
        --alarm-name StatusCheckFailed-Alarm-for-i-1234567890abcdef0 \
        --metric-name StatusCheckFailed \
        --namespace AWS/EC2 \
        --statistic Maximum \
        --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
        --unit Count \
        --period 300 \
        --evaluation-periods 2 \
        --threshold 1 \
        --comparison-operator GreaterThanOrEqualToThreshold \
        --alarm-actions arn:aws-cn:sns:us-west-2:111122223333:my-sns-topic

    The period is the time frame, in seconds, in which Amazon CloudWatch metrics are collected. This example uses 300 seconds, which is 5 minutes (5 × 60 seconds). The evaluation period is the number of consecutive periods for which the value of the metric must be compared to the threshold. This example uses 2, so the metric must meet the threshold for two consecutive 5-minute periods (10 minutes in total) before the alarm is triggered. The alarm actions are the actions to perform when this alarm is triggered. This example configures the alarm to send an email using Amazon SNS.
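If you do not yet have an SNS topic for step 1 of the preceding procedure, one way to create a topic and subscribe an email address is sketched below. The function name, topic name, and email address are hypothetical placeholders. The recipient must still confirm the subscription from the confirmation email before notifications are delivered.

```shell
# Sketch: create an SNS topic, subscribe an email address, and print the
# topic ARN for use with --alarm-actions.
create_alarm_topic() {
    local topic_name="$1" email="$2" topic_arn

    # create-topic returns the ARN of the topic.
    topic_arn="$(aws sns create-topic --name "$topic_name" \
        --query TopicArn --output text)" || return 1

    aws sns subscribe \
        --topic-arn "$topic_arn" \
        --protocol email \
        --notification-endpoint "$email" >/dev/null || return 1

    echo "$topic_arn"
}

# Example call (not run here):
#   TOPIC_ARN="$(create_alarm_topic my-sns-topic ops@example.com)"
```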