Add checkpoints to an instance refresh - Amazon EC2 Auto Scaling

Add checkpoints to an instance refresh

When using an instance refresh, you can choose to replace instances in phases, so that you can perform verifications on your instances as you go. To do a phased replacement, you add checkpoints, which are points in time where the instance refresh pauses. Checkpoints give you greater control over how you update your Auto Scaling group and help you confirm that your application functions in a reliable, predictable manner.

How it works

When starting an instance refresh, you specify checkpoints as percentages of the total number of instances in the Auto Scaling group. These checkpoints indicate the minimum percentage of instances in the Auto Scaling group that must be new instances before the checkpoint is considered reached. For example, if your checkpoints are [20, 50, 100], the first checkpoint is reached when 20 percent of instances are new, the second when 50 percent are new, and the final checkpoint when all instances are new.
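As a rough illustration of this arithmetic, the following Python sketch computes the smallest number of new instances that satisfies each checkpoint for a given group size. The function name `instances_needed` is hypothetical, and rounding up to a whole instance is an assumption based on the "minimum percentage" wording above, not a documented formula.

```python
def instances_needed(group_size: int, checkpoints: list[int]) -> dict[int, int]:
    """Map each checkpoint percentage to the fewest new instances that meet it."""
    # Integer ceiling of group_size * pct / 100 (assumed rounding rule).
    return {pct: (group_size * pct + 99) // 100 for pct in checkpoints}


print(instances_needed(10, [20, 50, 100]))  # {20: 2, 50: 5, 100: 10}
```

For a group of 10 instances, the checkpoints [20, 50, 100] correspond to 2, 5, and 10 new instances.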

Amazon EC2 Auto Scaling paces instance replacements to honor the specified checkpoint percentages while maintaining the group's minimum healthy percentage. To reach a checkpoint percentage, Amazon EC2 Auto Scaling sometimes replaces fewer instances at a time than the minimum healthy percentage allows, but it never replaces more.

Consider the following Auto Scaling group that has 10 instances. The checkpoint percentages are [20,50,100], the minimum healthy percentage is 80 percent, and the maximum healthy percentage is 100 percent. To maintain the minimum healthy percentage, only two instances can be replaced at a time. The following diagram summarizes the process for replacing instances before a checkpoint is reached.


[Diagram: how checkpoints affect the flow of an instance refresh.]
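The pacing in this example can be approximated with a short Python sketch. It is a simplified model, not the service's actual algorithm: it ignores instance warmup, checkpoint delays, scaling activities, and the maximum healthy percentage, and the function name `replacement_batches` is hypothetical.

```python
def replacement_batches(group_size: int, checkpoints: list[int],
                        min_healthy_pct: int) -> dict[int, list[int]]:
    """Return the batch sizes used to reach each checkpoint, in order."""
    # Instances that must stay in service, rounded up to a whole instance.
    must_stay = (group_size * min_healthy_pct + 99) // 100
    # Assumption: replace at least one instance at a time so the loop terminates.
    max_batch = max(1, group_size - must_stay)

    replaced = 0
    plan: dict[int, list[int]] = {}
    for pct in checkpoints:
        # New instances needed before this checkpoint counts as reached.
        target = (group_size * pct + 99) // 100
        batches = []
        while replaced < target:
            # Never exceed the batch limit, and never overshoot the checkpoint.
            batch = min(max_batch, target - replaced)
            batches.append(batch)
            replaced += batch
        plan[pct] = batches
    return plan


print(replacement_batches(10, [20, 50, 100], 80))
# {20: [2], 50: [2, 1], 100: [2, 2, 1]}
```

In this model, the second checkpoint is reached with a batch of two followed by a batch of one, which is a case where fewer instances are replaced than the minimum healthy percentage would allow.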

In the above example, there is an instance warmup period for each new instance that starts. You might also have a lifecycle hook that puts an instance into a wait state and then performs a custom action as it's launching or terminating.

Amazon EC2 Auto Scaling emits events for each checkpoint except for the 100 percent complete checkpoint. You can add an EventBridge rule to send the events to a target such as Amazon SNS. This way, you are notified when you can run the required verifications. For more information, see Create EventBridge rules for instance refresh events.
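As a starting point for such a rule, the following Python sketch builds an EventBridge event pattern as a dictionary and prints it as JSON. The `source` and `detail-type` strings and the `AutoScalingGroupName` detail field are assumptions about the event format; confirm them against the instance refresh events topic before relying on them. The function name `checkpoint_event_pattern` is hypothetical.

```python
import json
from typing import Optional


def checkpoint_event_pattern(asg_name: Optional[str] = None) -> dict:
    """Build an event pattern that matches checkpoint-reached events."""
    pattern = {
        # Assumed values; verify against the events reference.
        "source": ["aws.autoscaling"],
        "detail-type": ["EC2 Auto Scaling Instance Refresh Checkpoint Reached"],
    }
    if asg_name is not None:
        # Optionally narrow the rule to one group (assumed detail field name).
        pattern["detail"] = {"AutoScalingGroupName": [asg_name]}
    return pattern


print(json.dumps(checkpoint_event_pattern("my-asg"), indent=2))
```

The printed JSON could then be supplied as the event pattern when creating the rule.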

Considerations

Keep the following considerations in mind when using checkpoints:

  • Because checkpoints are based on percentages, the number of instances to replace changes with the size of the group. When a scale-out activity occurs and the size of the group increases, an in-progress operation could reach a checkpoint again. If that happens, Amazon EC2 Auto Scaling sends another notification and repeats the wait time between checkpoints before continuing.

  • It's possible to skip a checkpoint under certain circumstances. For example, suppose that your Auto Scaling group has two instances and your checkpoint percentages are [10,40,100]. After the first instance is replaced, Amazon EC2 Auto Scaling calculates that 50 percent of the group was replaced. Because 50 percent is higher than the first two checkpoints, it skips the first checkpoint (10) and sends a notification for the second checkpoint (40).

  • Canceling the operation stops any further replacements from being made. If you cancel the operation or it fails before reaching the last checkpoint, any instances that were already replaced are not rolled back to their previous configuration.

  • For a partial refresh, when you rerun the operation, Amazon EC2 Auto Scaling doesn't restart from the point of the last checkpoint, nor does it stop when only the earlier instances are replaced. However, it targets earlier instances for replacement first, before targeting new instances.

  • The actual percentage complete might be higher than the percentage for that checkpoint when the checkpoint's percentage is too low relative to the number of instances in the group. For example, suppose the checkpoint's percentage is 20 percent and the group has four instances. If Amazon EC2 Auto Scaling replaces one of the four instances, the actual percentage replaced (25 percent) will be higher than the checkpoint's percentage (20 percent).

  • After a checkpoint is reached, the displayed overall percentage complete doesn't update until after the instances finish warming up. For example, suppose that your checkpoint percentages are [20,50], the checkpoint delay is 15 minutes, and the minimum healthy percentage is 80 percent. Your Auto Scaling group has 10 instances and makes the following replacements:

    • 0:00: Two earlier instances are replaced with new ones.

    • 0:10: Two new instances finish warming up. The 20 percent checkpoint is reached, and the 15-minute checkpoint delay begins.

    • 0:25: Two earlier instances are replaced with new ones. (To maintain the minimum healthy percentage, only two instances are replaced.)

    • 0:35: Two new instances finish warming up.

    • 0:35: One earlier instance is replaced with a new one.

    • 0:45: One new instance finishes warming up.

    At 0:35, the operation stops launching new instances. The percentage complete doesn't accurately reflect the number of completed replacements yet (50 percent), because the new instance isn't done warming up. After the new instance completes its warmup period at 0:45, the percentage complete shows 50 percent.
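The skipped-checkpoint and overshoot cases in the considerations above come down to comparing the actual percentage replaced with each checkpoint. The following Python sketch reproduces both examples; the function name `checkpoint_reached` is hypothetical, and the logic is inferred from the examples rather than taken from the service.

```python
from typing import Optional


def checkpoint_reached(group_size: int, checkpoints: list[int],
                       replaced: int) -> tuple[float, Optional[int]]:
    """Return the actual percentage replaced and the highest checkpoint it satisfies."""
    actual = 100 * replaced / group_size
    satisfied = [pct for pct in checkpoints if pct <= actual]
    # Lower checkpoints that are also satisfied are effectively skipped.
    return actual, (max(satisfied) if satisfied else None)


# Two instances, one replaced: 50 percent skips checkpoint 10 and reaches 40.
print(checkpoint_reached(2, [10, 40, 100], 1))  # (50.0, 40)

# Four instances, one replaced: 25 percent is higher than the 20 percent checkpoint.
print(checkpoint_reached(4, [20], 1))  # (25.0, 20)
```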

Enable checkpoints (console)

You can enable checkpoints before starting an instance refresh to replace instances using an incremental or phased approach. This provides additional time for verification.

To start an instance refresh that uses checkpoints
  1. Open the Amazon EC2 console at https://console.amazonaws.cn/ec2/, and choose Auto Scaling Groups from the navigation pane.

  2. Select the check box next to your Auto Scaling group.

    A split pane opens up at the bottom of the Auto Scaling groups page.

  3. On the Instance refresh tab, in Active instance refresh, choose Start instance refresh.

  4. On the Start instance refresh page, enter the values for Minimum healthy percentage and Instance warmup.

  5. Select the Enable checkpoints check box.

    This displays a box where you can define the percentage threshold for the first checkpoint.

  6. For Proceed until ____ % of the group is refreshed, enter a number (1–100). This sets the percentage for the first checkpoint.

  7. To add another checkpoint, choose Add checkpoint and then define the percentage for the next checkpoint.

  8. To specify how long Amazon EC2 Auto Scaling waits after a checkpoint is reached, update the fields in Wait for 1 hour between checkpoints. The time unit can be hours, minutes, or seconds.

  9. If you are finished with your instance refresh selections, choose Start instance refresh.

Enable checkpoints (Amazon CLI)

To start an instance refresh with checkpoints enabled using the Amazon CLI, you need a configuration file that defines the following parameters:

  • CheckpointPercentages: Specifies threshold values for the percentage of instances to be replaced. These threshold values provide the checkpoints. When the percentage of instances that are replaced and warmed up reaches one of the specified thresholds, the operation waits for a specified period of time. You specify the number of seconds to wait in CheckpointDelay. When the specified period of time has passed, the instance refresh continues until it reaches the next checkpoint (if applicable).

  • CheckpointDelay: Specifies the amount of time, in seconds, to wait after a checkpoint is reached before continuing. Choose a time period that provides enough time to perform your verifications.

The last value shown in the CheckpointPercentages array describes the percentage of the Auto Scaling group that needs to be successfully replaced. The operation transitions to Successful after this percentage is successfully replaced and each instance is considered to have finished initializing.
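If you generate the configuration file from a script, a small helper can convert the delay to seconds and catch obvious mistakes. The following Python sketch is one way to do that. The function name `build_refresh_config` and its defaults are hypothetical, the 1 to 100 range mirrors the console field, and the ascending-order check is an assumption rather than a documented requirement.

```python
import json


def build_refresh_config(asg_name: str, checkpoints: list[int], delay_minutes: float,
                         warmup_seconds: int = 60, min_healthy_pct: int = 80) -> dict:
    """Build the input for start-instance-refresh --cli-input-json."""
    if not checkpoints or any(not 1 <= pct <= 100 for pct in checkpoints):
        raise ValueError("checkpoint percentages must be between 1 and 100")
    if list(checkpoints) != sorted(checkpoints):
        # Assumption: checkpoints are listed in ascending order.
        raise ValueError("checkpoint percentages must be in ascending order")
    return {
        "AutoScalingGroupName": asg_name,
        "Preferences": {
            "InstanceWarmup": warmup_seconds,
            "MinHealthyPercentage": min_healthy_pct,
            "CheckpointPercentages": list(checkpoints),
            # CheckpointDelay is expressed in seconds.
            "CheckpointDelay": int(delay_minutes * 60),
        },
    }


print(json.dumps(build_refresh_config("my-asg", [1, 20, 100], delay_minutes=10), indent=2))
```

Saving the printed JSON as config.json gives a file in the format that the start-instance-refresh command expects.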

To create multiple checkpoints

To create multiple checkpoints, use the following example start-instance-refresh command. This example configures an instance refresh that initially refreshes one percent of the Auto Scaling group. After waiting 10 minutes, it then refreshes the next 19 percent and waits another 10 minutes. Finally, it refreshes the rest of the group before concluding the operation.

aws autoscaling start-instance-refresh --cli-input-json file://config.json

Contents of config.json:

{
    "AutoScalingGroupName": "my-asg",
    "Preferences": {
        "InstanceWarmup": 60,
        "MinHealthyPercentage": 80,
        "CheckpointPercentages": [1,20,100],
        "CheckpointDelay": 600
    }
}

To create a single checkpoint

To create a single checkpoint, use the following example start-instance-refresh command. This example configures an instance refresh that initially refreshes 20 percent of the Auto Scaling group. After waiting 10 minutes, it then refreshes the rest of the group before concluding the operation.

aws autoscaling start-instance-refresh --cli-input-json file://config.json

Contents of config.json:

{
    "AutoScalingGroupName": "my-asg",
    "Preferences": {
        "InstanceWarmup": 60,
        "MinHealthyPercentage": 80,
        "CheckpointPercentages": [20,100],
        "CheckpointDelay": 600
    }
}

To partially refresh the Auto Scaling group

To replace only a portion of your Auto Scaling group and then stop completely, use the following example start-instance-refresh command. This example configures an instance refresh that initially refreshes one percent of the Auto Scaling group. After waiting 10 minutes, it then refreshes the next 19 percent before concluding the operation.

aws autoscaling start-instance-refresh --cli-input-json file://config.json

Contents of config.json:

{
    "AutoScalingGroupName": "my-asg",
    "Preferences": {
        "InstanceWarmup": 60,
        "MinHealthyPercentage": 80,
        "CheckpointPercentages": [1,20],
        "CheckpointDelay": 600
    }
}
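To compare the three configurations, the following Python sketch turns the Preferences block into a list of plain-language steps. It follows the descriptions in this section (a wait after every checkpoint except the last) and is only a reading aid; the function name `describe_plan` is hypothetical.

```python
def describe_plan(preferences: dict) -> list[str]:
    """Summarize the phases implied by CheckpointPercentages and CheckpointDelay."""
    percentages = preferences["CheckpointPercentages"]
    delay = preferences["CheckpointDelay"]
    steps = []
    for index, pct in enumerate(percentages):
        steps.append(f"Replace instances until {pct}% of the group is new")
        if index < len(percentages) - 1:
            # The operation pauses between checkpoints, not after the last one.
            steps.append(f"Wait {delay} seconds")
    if percentages[-1] == 100:
        steps.append("Done: all instances replaced")
    else:
        steps.append("Stop: partial refresh")
    return steps


for step in describe_plan({"CheckpointPercentages": [1, 20], "CheckpointDelay": 600}):
    print(step)
# Replace instances until 1% of the group is new
# Wait 600 seconds
# Replace instances until 20% of the group is new
# Stop: partial refresh
```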