Recover your instance - Amazon Elastic Compute Cloud
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Recover your instance

To automatically recover an instance when a system status check failure occurs, you can use the default configuration of the instance or create an Amazon CloudWatch alarm. If an instance becomes unreachable because of an underlying hardware failure or a problem that requires Amazon involvement to repair, the instance is automatically recovered.

A recovered instance is identical to the original instance, including the instance ID, private IP addresses, Elastic IP addresses, and all instance metadata. If the impaired instance has a public IPv4 address, the instance retains the public IPv4 address after recovery. If the impaired instance is in a placement group, the recovered instance runs in the placement group. During instance recovery, the instance is migrated as part of an instance reboot, and any in-memory data is lost.

Examples of problems that require instance recovery:

  • Loss of network connectivity

  • Loss of system power

  • Software issues on the physical host

  • Hardware issues on the physical host that impact network reachability

Simplified automatic recovery based on instance configuration

Instances that support simplified automatic recovery are configured by default to recover a failed instance. The default configuration applies to new instances that you launch and existing instances that you previously launched. Simplified automatic recovery is initiated in response to system status check failures. Simplified automatic recovery doesn't take place during Service Health Dashboard events, or any other events that impact the underlying hardware. For more information, see Troubleshoot instance recovery failures.

When a simplified automatic recovery event succeeds, you are notified by an Amazon Health Dashboard event. When a simplified automatic recovery event fails, you are notified by an Amazon Health Dashboard event and by email. You can also use Amazon EventBridge rules to monitor for simplified automatic recovery events using the following event codes:

  • AWS_EC2_SIMPLIFIED_AUTO_RECOVERY_SUCCESS — successful events

  • AWS_EC2_SIMPLIFIED_AUTO_RECOVERY_FAILURE — failed events

For more information, see Amazon EventBridge rules.
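As a minimal sketch of such a rule, the following shell snippet builds an event pattern that matches both event codes and wraps the put-rule call in a function. It assumes the events are delivered as Amazon Health events (source aws.health); the rule name is a hypothetical placeholder, so adjust both to your environment.

```shell
# Hypothetical rule name; choose your own.
RULE_NAME="ec2-simplified-auto-recovery"

# Event pattern that matches both outcome codes.
# Assumption: the events arrive as Health events with source "aws.health".
EVENT_PATTERN='{
  "source": ["aws.health"],
  "detail-type": ["AWS Health Event"],
  "detail": {
    "eventTypeCode": [
      "AWS_EC2_SIMPLIFIED_AUTO_RECOVERY_SUCCESS",
      "AWS_EC2_SIMPLIFIED_AUTO_RECOVERY_FAILURE"
    ]
  }
}'

# Create (or update) the rule. Nothing runs until you call this function.
create_recovery_rule() {
  aws events put-rule \
    --name "$RULE_NAME" \
    --event-pattern "$EVENT_PATTERN"
}
```

A rule does nothing until it has a target, such as an Amazon SNS topic, which you can attach separately with the put-targets command.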

Requirements

Simplified automatic recovery is supported by an instance if the instance has the following characteristics:

  • It uses default or dedicated instance tenancy.

  • It does not use an Elastic Fabric Adapter.

  • It uses one of the following instance types:

    • General purpose: A1 | M3 | M4 | M5 | M5a | M5n | M5zn | M6a | M6g | M6i | M6in | M7a | M7g | M7i | M7i-flex | T1 | T2 | T3 | T3a | T4g

    • Compute optimized: C3 | C4 | C5 | C5a | C5n | C6a | C6g | C6gn | C6i | C6in | C7a | C7g | C7gn | C7i

    • Memory optimized: R3 | R4 | R5 | R5a | R5b | R5n | R6a | R6g | R6i | R6in | R7a | R7g | R7i | R7iz | u-3tb1 | u-6tb1 | u-9tb1 | u-12tb1 | u-18tb1 | u-24tb1 | X1 | X1e | X2iezn

    • Accelerated computing: G3 | G3s | G5g | Inf1 | P2 | P3 | VT1

    • High-performance computing: Hpc6a | Hpc7a | Hpc7g

  • It does not have instance store volumes. If a Nitro instance type has instance store volumes, or if a Xen-based instance has mapped instance store volumes in the AMI being used, the instance can't be automatically recovered.

    Important

    If an instance has instance store volumes attached, stopping and starting the instance will cause any data on the instance store volumes to be lost. You should regularly back up your instance store volume data to more persistent storage, such as Amazon EBS, Amazon S3, or Amazon EFS. In the event of a system status check failure, you can stop and start instances with instance store volumes and then restore the instance store volumes using the backed-up data.

Limitations

  • Instances with instance store volumes and metal instance types are not supported by simplified automatic recovery.

  • Simplified automatic recovery is not initiated for instances in an Auto Scaling group. If your instance is part of an Auto Scaling group with health checks enabled, then the instance is replaced when it becomes impaired.

  • Simplified automatic recovery applies to unplanned events only. It does not apply to scheduled events.

  • Terminated or stopped instances can't be recovered.

Set the recovery behavior

You can set the automatic recovery behavior to disabled or default during or after launching the instance. The default configuration does not enable simplified automatic recovery for an unsupported instance type.

Console
To disable simplified automatic recovery during instance launch
  1. Open the Amazon EC2 console at https://console.amazonaws.cn/ec2/.

  2. In the navigation pane, choose Instances, and then choose Launch instance.

  3. In the Advanced details section, for Instance auto-recovery, select Disabled.

  4. Configure the remaining instance launch settings as needed and then launch the instance.

To disable simplified automatic recovery for a running or stopped instance
  1. Open the Amazon EC2 console at https://console.amazonaws.cn/ec2/.

  2. In the navigation pane, choose Instances.

  3. Select the instance, and then choose Actions, Instance settings, Change auto-recovery behavior.

  4. Choose Off, and then choose Save.

To set the automatic recovery behavior to default for a running or stopped instance
  1. Open the Amazon EC2 console at https://console.amazonaws.cn/ec2/.

  2. In the navigation pane, choose Instances.

  3. Select the instance, and then choose Actions, Instance settings, Change auto-recovery behavior.

  4. Choose Default (On), and then choose Save.

Amazon CLI
To disable simplified automatic recovery at launch

Use the run-instances command.

aws ec2 run-instances \
    --image-id ami-1a2b3c4d \
    --instance-type t2.micro \
    --key-name MyKeyPair \
    --maintenance-options AutoRecovery=Disabled \
    [...]

To disable simplified automatic recovery for a running or stopped instance

Use the modify-instance-maintenance-options command.

aws ec2 modify-instance-maintenance-options \
    --instance-id i-0abcdef1234567890 \
    --auto-recovery disabled

To set the automatic recovery behavior to default for a running or stopped instance

Use the modify-instance-maintenance-options command.

aws ec2 modify-instance-maintenance-options \
    --instance-id i-0abcdef1234567890 \
    --auto-recovery default
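
After changing the setting, you might want to confirm what the instance currently reports. The following is a small sketch that reads the MaintenanceOptions.AutoRecovery field from describe-instances; the function name show_auto_recovery is a hypothetical helper, not part of the Amazon CLI.

```shell
# Print the auto-recovery setting reported for one instance.
# $1 = instance ID
show_auto_recovery() {
  aws ec2 describe-instances \
    --instance-ids "$1" \
    --query "Reservations[].Instances[].MaintenanceOptions.AutoRecovery" \
    --output text
}
```

For example, run show_auto_recovery i-0abcdef1234567890 to print the setting for that instance.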

Amazon CloudWatch action based recovery

Use Amazon CloudWatch action based recovery if you want to customize when to recover your instance.

When the StatusCheckFailed_System alarm is triggered, and the recovery action is initiated, you're notified by the Amazon SNS topic that you selected when you created the alarm and associated the recovery action. When the recovery action is complete, information is published to the Amazon SNS topic you configured for the alarm. Anyone who is subscribed to this Amazon SNS topic receives an email notification that includes the status of the recovery attempt and any further instructions. As a last step in the recovery action, the recovered instance reboots.

You can use Amazon CloudWatch alarms to recover an instance even when simplified automatic recovery is enabled. For information about creating an Amazon CloudWatch alarm to recover an instance, see Add recover actions to Amazon CloudWatch alarms.
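
As a rough illustration of what such an alarm can look like from the Amazon CLI, the sketch below creates an alarm on the StatusCheckFailed_System metric with a recover action and an Amazon SNS notification. The alarm name, the 60-second period, and the two evaluation periods are illustrative assumptions, as are the helper function names. The ARN partition is passed in as a parameter because it is assumed to differ between partitions (for example, aws-cn for the China Regions); verify the exact ARN against the linked topic.

```shell
# Build the ARN of the EC2 recover action.
# $1 = partition (assumed "aws-cn" for the China Regions), $2 = Region
recover_action_arn() {
  printf 'arn:%s:automate:%s:ec2:recover\n' "$1" "$2"
}

# Create an alarm that recovers the instance after two consecutive
# one-minute periods in which the system status check failed.
# $1 = instance ID, $2 = partition, $3 = Region, $4 = your SNS topic ARN
create_recover_alarm() {
  aws cloudwatch put-metric-alarm \
    --alarm-name "recover-$1" \
    --namespace AWS/EC2 \
    --metric-name StatusCheckFailed_System \
    --dimensions "Name=InstanceId,Value=$1" \
    --statistic Maximum \
    --period 60 \
    --evaluation-periods 2 \
    --threshold 0 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$(recover_action_arn "$2" "$3")" "$4"
}
```

Raising the number of evaluation periods delays recovery but reduces the chance of reacting to a brief, self-correcting failure, which is the kind of tuning that simplified automatic recovery does not offer.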

Supported instance types

All of the instance types supported by simplified automatic recovery are also supported by Amazon CloudWatch action based recovery. Additionally, CloudWatch action based recovery supports bare metal variants of the supported instance types. It also supports the following instance families, which simplified automatic recovery does not:

  • Memory optimized: X2idn | X2iedn

Important

For supported instance types that have instance store volumes, any data on these volumes will be lost during a recovery. Stopping and starting the instance will also cause any data on the instance store volume to be lost. You should regularly back up your instance store volume data to more persistent storage, such as Amazon EBS, Amazon S3, or Amazon EFS. In the event of a system status check failure, you can stop and start instances with instance store volumes and then restore the instance store volumes using the backed-up data.

CloudWatch action based recovery does not support recovery for instances with Dedicated Host tenancy. For Amazon EC2 Dedicated Hosts, you can use Dedicated Host Auto Recovery to automatically recover unhealthy instances.

You can use the Amazon Web Services Management Console or the Amazon CLI to view the instance types that support CloudWatch action based recovery.

Console
To view the instance types that support Amazon CloudWatch action based recovery
  1. Open the Amazon EC2 console at https://console.amazonaws.cn/ec2/.

  2. In the left navigation pane, choose Instance Types.

  3. In the filter bar, enter Auto Recovery support: true. Alternatively, as you enter the characters and the filter name appears, you can select it.

    The Instance types table displays all the instance types that support Amazon CloudWatch action based recovery.

Amazon CLI
To view the instance types that support Amazon CloudWatch action based recovery

Use the describe-instance-types command.

aws ec2 describe-instance-types --filters Name=auto-recovery-supported,Values=true --query "InstanceTypes[*].[InstanceType]" --output text | sort
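
To check a single instance type rather than list all of them, a variation along the following lines may be convenient. It reads the AutoRecoverySupported field for one type; the function name is a hypothetical helper.

```shell
# Print whether one instance type supports CloudWatch action based recovery.
# $1 = instance type, for example m5.large
is_auto_recovery_supported() {
  aws ec2 describe-instance-types \
    --instance-types "$1" \
    --query "InstanceTypes[0].AutoRecoverySupported" \
    --output text
}
```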

Troubleshoot instance recovery failures

The following issues can cause the recovery of your instance to fail:

  • During Service Health Dashboard events, simplified automatic recovery might not recover your instance. You might not receive recovery failure notifications for such events. Any ongoing Service Health Dashboard events might also prevent CloudWatch action based recovery from successfully recovering an instance. For the latest service availability information, see http://status.amazonaws.cn/.

  • Temporary, insufficient capacity of replacement hardware.

  • The instance has reached the maximum daily allowance of three recovery attempts.

The automatic recovery process attempts to recover your instance for up to three separate failures per day. If the instance system status check failure persists, we recommend that you manually stop and start the instance. Data on instance store volumes is lost when the instance is stopped. For more information, see Stop and start your instance.
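
A manual stop and start can be scripted along these lines. This is a minimal sketch with a hypothetical function name; it waits for the instance to reach the stopped state before starting it again, and each step runs only if the previous one succeeded. Remember that data on instance store volumes is lost when the instance stops.

```shell
# Stop an instance, wait until it is fully stopped, then start it again.
# $1 = instance ID
stop_start_instance() {
  aws ec2 stop-instances --instance-ids "$1" &&
  aws ec2 wait instance-stopped --instance-ids "$1" &&
  aws ec2 start-instances --instance-ids "$1"
}
```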

Your instance might subsequently be retired if automatic recovery fails and a hardware degradation is determined to be the root cause for the original system status check failure.