Recover your instance
To automatically recover an instance when a system status check failure occurs, you can use the default configuration of the instance or create an Amazon CloudWatch alarm. If an instance becomes unreachable because of an underlying hardware failure or a problem that requires Amazon involvement to repair, the instance is automatically recovered.
A recovered instance is identical to the original instance, including the instance ID, private IP addresses, Elastic IP addresses, and all instance metadata. If the impaired instance has a public IPv4 address, the instance retains the public IPv4 address after recovery. If the impaired instance is in a placement group, the recovered instance runs in the placement group. During instance recovery, the instance is migrated as part of an instance reboot, and any in-memory data is lost.
Examples of problems that require instance recovery:
- Loss of network connectivity
- Loss of system power
- Software issues on the physical host
- Hardware issues on the physical host that impact network reachability
Simplified automatic recovery based on instance configuration
Instances that support simplified automatic recovery are configured by default to recover a failed instance. The default configuration applies to new instances that you launch and existing instances that you previously launched. Simplified automatic recovery is initiated in response to system status check failures. Simplified automatic recovery doesn't take place during Service Health Dashboard events, or any other events that impact the underlying hardware. For more information, see Troubleshoot instance recovery failures.
When a simplified automatic recovery event succeeds, you are notified by an Amazon Health Dashboard event. When a simplified automatic recovery event fails, you are notified by an Amazon Health Dashboard event and by email. You can also use Amazon EventBridge rules to monitor for simplified automatic recovery events using the following event codes:
- AWS_EC2_SIMPLIFIED_AUTO_RECOVERY_SUCCESS: successful events
- AWS_EC2_SIMPLIFIED_AUTO_RECOVERY_FAILURE: failed events
For more information, see Amazon EventBridge rules.
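As a rough illustration, an EventBridge event pattern that matches the two event codes above might be built as follows. This is a minimal sketch: the `aws.health` source, the `AWS Health Event` detail type, and the `service` and `eventTypeCode` detail fields are assumptions based on the usual shape of Health events, and `build_recovery_event_pattern` is a hypothetical helper name.

```python
import json

# Event codes taken from the list above.
RECOVERY_EVENT_CODES = [
    "AWS_EC2_SIMPLIFIED_AUTO_RECOVERY_SUCCESS",
    "AWS_EC2_SIMPLIFIED_AUTO_RECOVERY_FAILURE",
]


def build_recovery_event_pattern(codes=RECOVERY_EVENT_CODES):
    """Return an event pattern (as a dict) matching the given Health event codes."""
    return {
        # Assumed envelope fields for Health events delivered through EventBridge.
        "source": ["aws.health"],
        "detail-type": ["AWS Health Event"],
        "detail": {
            "service": ["EC2"],
            "eventTypeCode": list(codes),
        },
    }


if __name__ == "__main__":
    # Print the pattern as JSON, ready to paste into a rule definition.
    print(json.dumps(build_recovery_event_pattern(), indent=2))
```

To be notified of failures only, pass a one-element list containing the failure code.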
Requirements
Simplified automatic recovery is supported by an instance if the instance has the following characteristics:
- It uses default or dedicated instance tenancy.
- It does not use an Elastic Fabric Adapter.
- It uses one of the following instance types:
  - General purpose: M3 | M4 | M5 | M5a | M5n | M5zn | M6a | M6i | M6in | M7a | M7i | M7i-flex | T1 | T2 | T3 | T3a
  - Compute optimized: C3 | C4 | C5 | C5a | C5n | C6a | C6i | C6in | C7i | Hpc7a
  - Memory optimized: R3 | R4 | R5 | R5a | R5b | R5n | R6a | R6g | R6in | R7a | R7iz | u-3tb1 | u-6tb1 | u-9tb1 | u-12tb1 | u-18tb1 | u-24tb1 | X1 | X1e | X2iezn
  - Accelerated computing: G3 | G3s | P2 | P3
- It does not have instance store volumes. If a Nitro instance type has instance store volumes or if a Xen-based instance has mapped instance store volumes, the instance will not be automatically recovered. You should regularly back up your instance store volume data to more persistent storage, such as Amazon EBS, Amazon S3, or Amazon EFS. In the event of a system status check failure, you can stop and start instances with instance store volumes and then restore your instance store volume using the backed-up data.
Limitations
- Instances with instance store volumes and metal instance types are not supported by simplified automatic recovery.
- Simplified automatic recovery is not initiated for instances in an Auto Scaling group. If your instance is part of an Auto Scaling group with health checks enabled, then the instance is replaced when it becomes impaired.
- Simplified automatic recovery applies to unplanned events only. It does not apply to scheduled events.
- Terminated or stopped instances cannot be recovered.
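The requirements and limitations above can be read as a simple eligibility check. The following sketch encodes them in plain Python. The function name, its parameters, and the rule that a size starting with `metal` marks a metal instance type are illustrative assumptions, and the family list is copied from this page rather than queried from the service.

```python
# Instance families listed under Requirements, lowercased for comparison.
SUPPORTED_FAMILIES = {
    # General purpose
    "m3", "m4", "m5", "m5a", "m5n", "m5zn", "m6a", "m6i", "m6in",
    "m7a", "m7i", "m7i-flex", "t1", "t2", "t3", "t3a",
    # Compute optimized
    "c3", "c4", "c5", "c5a", "c5n", "c6a", "c6i", "c6in", "c7i", "hpc7a",
    # Memory optimized
    "r3", "r4", "r5", "r5a", "r5b", "r5n", "r6a", "r6g", "r6in", "r7a", "r7iz",
    "u-3tb1", "u-6tb1", "u-9tb1", "u-12tb1", "u-18tb1", "u-24tb1",
    "x1", "x1e", "x2iezn",
    # Accelerated computing
    "g3", "g3s", "p2", "p3",
}


def supports_simplified_recovery(
    instance_type,
    tenancy="default",
    uses_efa=False,
    has_instance_store=False,
    in_auto_scaling_group=False,
):
    """Return True if the described instance meets the conditions listed above."""
    # Split "m5.large" into family "m5" and size "large".
    family, _, size = instance_type.lower().partition(".")
    return (
        family in SUPPORTED_FAMILIES
        and not size.startswith("metal")          # metal types are excluded
        and tenancy in {"default", "dedicated"}   # host tenancy is excluded
        and not uses_efa
        and not has_instance_store
        and not in_auto_scaling_group
    )
```

For example, `supports_simplified_recovery("m5.large")` is true under these rules, while the same call with `has_instance_store=True` or with `"m5.metal"` is false.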
Set the recovery behavior
You can set the automatic recovery behavior to disabled or default during or after launching the instance. The default configuration does not enable simplified automatic recovery for an unsupported instance type.
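One way to change this setting after launch is through the AWS SDK for Python. The sketch below assumes a boto3 EC2 client is passed in and uses its `modify_instance_maintenance_options` call. The wrapper name `set_auto_recovery` and the instance ID in the usage comment are hypothetical.

```python
VALID_BEHAVIORS = ("default", "disabled")


def set_auto_recovery(ec2_client, instance_id, behavior):
    """Set the automatic recovery behavior of one instance.

    ec2_client is expected to be a boto3 EC2 client (or any object exposing
    the same modify_instance_maintenance_options method).
    """
    if behavior not in VALID_BEHAVIORS:
        raise ValueError(f"behavior must be one of {VALID_BEHAVIORS}, got {behavior!r}")
    return ec2_client.modify_instance_maintenance_options(
        InstanceId=instance_id,
        AutoRecovery=behavior,
    )


# Example usage (requires boto3 and valid credentials; placeholder instance ID):
#   import boto3
#   set_auto_recovery(boto3.client("ec2"), "i-0123456789abcdef0", "disabled")
```

Validating the value locally keeps an obvious typo from turning into a failed API call.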
Amazon CloudWatch action based recovery
Use Amazon CloudWatch action based recovery if you want to customize when to recover your instance.
When the StatusCheckFailed_System alarm is triggered, and the recovery action is initiated, you're notified by the Amazon SNS topic that you selected when you created the alarm and associated the recovery action. When the recovery action is complete, information is published to the Amazon SNS topic you configured for the alarm. Anyone who is subscribed to this Amazon SNS topic receives an email notification that includes the status of the recovery attempt and any further instructions. As a last step in the recovery action, the recovered instance reboots.
You can use Amazon CloudWatch alarms to recover an instance even if simplified automatic recovery is not disabled. For information about creating an Amazon CloudWatch alarm to recover an instance, see Add recover actions to Amazon CloudWatch alarms.
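To make the alarm setup concrete, the following sketch builds the parameters for a CloudWatch `put_metric_alarm` call that attaches the recover action and an SNS notification to the StatusCheckFailed_System metric. The alarm name, period, evaluation periods, and threshold are illustrative choices, not recommended values. The `partition` parameter is an assumption to cover regions whose ARNs use a prefix other than `aws`, and `build_recover_alarm` is a hypothetical helper.

```python
def build_recover_alarm(instance_id, region, sns_topic_arn, partition="aws"):
    """Return keyword arguments for CloudWatch put_metric_alarm with a recover action."""
    # Built-in EC2 recover action; the partition prefix varies by region group.
    recover_action = f"arn:{partition}:automate:{region}:ec2:recover"
    return {
        "AlarmName": f"recover-{instance_id}",  # illustrative naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,               # seconds per evaluation period (example value)
        "EvaluationPeriods": 2,     # consecutive failing periods (example value)
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [recover_action, sns_topic_arn],
    }


# Example usage (requires boto3 and valid credentials; placeholder values):
#   import boto3
#   params = build_recover_alarm(
#       "i-0123456789abcdef0", "us-east-1",
#       "arn:aws:sns:us-east-1:111122223333:my-topic")
#   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
```

Raising `EvaluationPeriods` delays recovery but reduces the chance of acting on a brief, self-resolving failure, which is the kind of customization this recovery mode is meant for.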
Supported instance types
All of the instance types supported by simplified automatic recovery are also supported by CloudWatch action based recovery. Additionally, Amazon CloudWatch action based recovery supports the following instance types with instance store volumes.
- General purpose: M3
- Compute optimized: C3
- Memory optimized: R3 | X1 | X1e | X2idn | X2iedn
Important
If the instance has instance store volumes attached, the data is lost during recovery.
Amazon CloudWatch action based recovery does not support recovery for instances with Amazon EC2 Dedicated Hosts tenancy and metal instances.
You can use the Amazon Web Services Management Console or the Amazon CLI to view the instance types that support Amazon CloudWatch action based recovery.
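The same lookup can be sketched with the AWS SDK for Python instead of the console or CLI. The example below assumes a boto3 EC2 client and filters `describe_instance_types` on `auto-recovery-supported`. The function name is hypothetical, and the result depends on the region the client is configured for.

```python
def list_recoverable_instance_types(ec2_client):
    """Return a sorted list of instance type names that report recovery support.

    ec2_client is expected to be a boto3 EC2 client (or an object with a
    compatible get_paginator method).
    """
    paginator = ec2_client.get_paginator("describe_instance_types")
    pages = paginator.paginate(
        Filters=[{"Name": "auto-recovery-supported", "Values": ["true"]}]
    )
    # Each page carries a list of instance type descriptions.
    names = [item["InstanceType"] for page in pages for item in page["InstanceTypes"]]
    return sorted(names)


# Example usage (requires boto3 and valid credentials):
#   import boto3
#   print(list_recoverable_instance_types(boto3.client("ec2")))
```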
Troubleshoot instance recovery failures
The following issues can cause the recovery of your instance to fail:
- During Service Health Dashboard events, simplified automatic recovery might not recover your instance. You might not receive recovery failure notifications for such events. Any ongoing Service Health Dashboard events might also prevent CloudWatch action based recovery from successfully recovering an instance. For the latest service availability information, see http://status.amazonaws.cn/.
- Temporary, insufficient capacity of replacement hardware.
- The instance has reached the maximum daily allowance of three recovery attempts.
The automatic recovery process attempts to recover your instance for up to three separate failures per day. If the instance system status check failure persists, we recommend that you manually stop and start the instance. Data on instance store volumes is lost when the instance is stopped. For more information, see Stop and start your instance.
Your instance might subsequently be retired if automatic recovery fails and a hardware degradation is determined to be the root cause for the original system status check failure.
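The manual stop and start recommended above could be scripted along these lines. This is a minimal sketch assuming a boto3 EC2 client. The function name is hypothetical, and, as noted above, any data on instance store volumes is lost when the instance stops.

```python
def stop_and_start(ec2_client, instance_id):
    """Stop an instance, wait until it is stopped, then start it again.

    ec2_client is expected to be a boto3 EC2 client (or an object exposing
    stop_instances, start_instances, and get_waiter with the same signatures).
    """
    ids = [instance_id]
    ec2_client.stop_instances(InstanceIds=ids)
    # Block until the instance reaches the stopped state before starting it.
    ec2_client.get_waiter("instance_stopped").wait(InstanceIds=ids)
    ec2_client.start_instances(InstanceIds=ids)
    ec2_client.get_waiter("instance_running").wait(InstanceIds=ids)


# Example usage (requires boto3 and valid credentials; placeholder instance ID):
#   import boto3
#   stop_and_start(boto3.client("ec2"), "i-0123456789abcdef0")
```

Waiting for the stopped state matters here: a start request sent while the instance is still stopping is rejected.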