Considerations when you configure zonal autoshift - Amazon Route 53 Application Recovery Controller
Considerations when you configure zonal autoshift

Zonal autoshift in Amazon Route 53 Application Recovery Controller includes two types of traffic shifts: autoshifts and practice run zonal shifts. With an autoshift, Amazon helps reduce your time to recovery by shifting application resource traffic away from an Availability Zone during events, on your behalf. With practice runs, Route 53 ARC starts a zonal shift to shift traffic away from an Availability Zone for a resource, and back again, on a weekly cadence. Practice runs help you make sure that you have scaled up sufficient capacity across the Availability Zones in a Region for your application to tolerate the loss of one Availability Zone.

There are several considerations to keep in mind with autoshifts and practice runs. Review the following topics before you enable zonal autoshift or configure practice runs for a resource.

Resource capacity prescaling

When Amazon shifts traffic away from one Availability Zone, it's important that the remaining Availability Zones can service the increased request rates for your resource. Provisioning enough capacity in advance to absorb that increase is a pattern known as static stability. For more information, see the Static stability using Availability Zones whitepaper in the Amazon Builders' Library.

For example, if your application requires 30 instances to serve its clients, you should provision 15 instances in each of three Availability Zones, for a total of 45 instances. By doing this, when Amazon shifts traffic away from one Availability Zone (with an autoshift or during a practice run), the remaining 30 instances, across two Availability Zones, can still serve your application's clients.
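
The sizing arithmetic in this example can be sketched as a small helper. This is a minimal illustration that assumes load is spread evenly across Availability Zones; the function name and its parameters are hypothetical and are not part of any Amazon API.

```python
import math


def instances_per_zone(required: int, zones: int, zones_lost: int = 1) -> int:
    """Instances to provision in each zone so that the zones that remain
    after losing `zones_lost` zones still hold `required` instances."""
    surviving = zones - zones_lost
    if surviving < 1:
        raise ValueError("at least one Availability Zone must remain")
    # Round up so the surviving zones never fall short of the requirement.
    return math.ceil(required / surviving)


per_zone = instances_per_zone(required=30, zones=3)
print(per_zone, per_zone * 3)  # 15 45
```

With more Availability Zones, the overhead of prescaling shrinks: the same 30-instance requirement spread over four zones needs 10 instances per zone, for a total of 40.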

The zonal autoshift capability in Route 53 ARC helps you to quickly recover from Amazon events in an Availability Zone when you have an application with resources that are prescaled to work normally with the loss of one Availability Zone. Before you enable zonal autoshift for a resource, scale your resource capacity in all configured Availability Zones in an Amazon Web Services Region. Then, start zonal shifts for the resource, to test that your application still runs normally when traffic is shifted away from an Availability Zone.

After you test with zonal shifts, enable zonal autoshift and configure practice runs for application resources. Regular practice runs with zonal autoshift help you make sure, on an ongoing basis, that your capacity is still scaled appropriately. With sufficient capacity across Availability Zones, your application can continue to serve clients, without interruption, during an autoshift.

For more information about starting a zonal shift for a resource, see Zonal shift in Amazon Route 53 Application Recovery Controller.
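
As a rough sketch of what a test shift might look like from code, the following builds the arguments for the `start_zonal_shift` operation of the boto3 `arc-zonal-shift` client. The helper names, the placeholder ARN, and the Availability Zone ID are hypothetical, and the duration check (a number followed by `m` or `h`) reflects an assumption about the accepted format rather than a definitive rule.

```python
import re


def build_zonal_shift_request(resource_arn: str, away_from: str,
                              expires_in: str, comment: str) -> dict:
    """Build keyword arguments for a test zonal shift (hypothetical helper)."""
    # Assumed duration format: a positive integer followed by m (minutes) or h (hours).
    if not re.fullmatch(r"[1-9][0-9]*[mh]", expires_in):
        raise ValueError(f"unexpected duration: {expires_in!r}")
    return {
        "resourceIdentifier": resource_arn,
        "awayFrom": away_from,  # Availability Zone to shift traffic away from
        "expiresIn": expires_in,
        "comment": comment,
    }


def start_test_shift(request: dict) -> dict:
    """Send the request. Requires boto3 and valid credentials; not called here."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("arc-zonal-shift")
    return client.start_zonal_shift(**request)


request = build_zonal_shift_request(
    resource_arn="arn:aws:elasticloadbalancing:us-east-1:111122223333:"
                 "loadbalancer/app/example-alb/0123456789abcdef",  # placeholder
    away_from="use1-az1",  # placeholder Availability Zone ID
    expires_in="30m",
    comment="Capacity test before enabling zonal autoshift",
)
print(request["awayFrom"], request["expiresIn"])
```

Keeping the request builder separate from the call makes it easy to review exactly what would be sent before any traffic is shifted.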

Resource types and restrictions

Zonal autoshift supports shifting traffic out of an Availability Zone for all resources that are supported by zonal shift. In general, Network Load Balancers and Application Load Balancers with cross-zone load balancing turned off are supported. In a few specific scenarios, however, an autoshift does not shift traffic away from an Availability Zone.

For example, if the load balancer target groups in the Availability Zones don't have any instances, or if all of the instances are unhealthy, then the load balancer is in a fail open state. If Amazon starts an autoshift for a load balancer in this scenario, the autoshift does not change which Availability Zones the load balancer uses, because the load balancer is already in a fail open state. This is expected behavior. An autoshift cannot mark one Availability Zone as unhealthy and shift traffic to the other Availability Zones in an Amazon Web Services Region if all Availability Zones are failing open (unhealthy).
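
The fail open behavior described above can be illustrated with a deliberately simplified model. This sketch is not how the load balancer is implemented; the function name and the per-zone healthy target counts are hypothetical inputs chosen only to show why a shift has no effect when every zone is already failing open.

```python
from typing import Dict, Set


def zones_receiving_traffic(healthy_targets: Dict[str, int], away_from: str) -> Set[str]:
    """Simplified model: which zones keep receiving traffic during a shift."""
    if all(count == 0 for count in healthy_targets.values()):
        # Fail open: no zone has a healthy target, so traffic goes to every
        # zone and shifting away from one zone changes nothing.
        return set(healthy_targets)
    # Otherwise the shift removes the affected zone from rotation.
    return {zone for zone in healthy_targets if zone != away_from}


normal = zones_receiving_traffic({"az1": 5, "az2": 5, "az3": 5}, away_from="az1")
fail_open = zones_receiving_traffic({"az1": 0, "az2": 0, "az3": 0}, away_from="az1")
print(sorted(normal))     # ['az2', 'az3']
print(sorted(fail_open))  # ['az1', 'az2', 'az3']
```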

A second scenario is if Amazon starts an autoshift for an Application Load Balancer that is an endpoint for an accelerator in Amazon Global Accelerator. As with zonal shift, autoshift isn't supported for Application Load Balancers that are endpoints of accelerators in Global Accelerator.

To see details about supported resources, including all of the requirements and exceptions to be aware of, see Resources supported for zonal shift and zonal autoshift.

Alarms that you specify for practice runs

When you consider how to configure CloudWatch alarms for practice runs for your resource, keep in mind the following:

  • For the outcome alarm, which is required, we recommend that you configure a CloudWatch alarm to go into an ALARM state when metrics for the resource, or your application, indicate that shifting traffic away from the Availability Zone adversely impacts performance. For example, you can determine a threshold for request rates for your resource, and then configure an alarm to go into an ALARM state when the threshold is exceeded. You are responsible for configuring an appropriate alarm that causes Amazon to end the practice run and return a FAILED outcome.

  • We recommend that you follow the Amazon Well-Architected Framework, which advises you to implement key performance indicators (KPIs) as CloudWatch alarms. If you do so, you can use these alarms to create a composite alarm to use as a safety trigger, to prevent practice runs from starting if they might cause your application to miss a KPI. When the alarm is no longer in an ALARM state, Route 53 ARC starts practice runs the next time a practice run is scheduled for the resource.

  • For the practice run blocking alarm, if you choose to configure it, you might choose to track a specific metric that you use to indicate that you don't want a practice run to start.

  • For practice run alarms, you specify the Amazon Resource Name (ARN) for each alarm, which you must first configure in Amazon CloudWatch. The CloudWatch alarms that you specify can be composite alarms, to enable you to include several metrics and checks for your application and resource that can trigger the alarm to go into an ALARM state. For more information, see Combining alarms in the Amazon CloudWatch User Guide.

  • Make sure that the CloudWatch alarms that you specify for practice runs are in the same Region as the resource that you're configuring a practice run for.
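
The alarm requirements above (ARNs for each alarm, in the same Region as the resource) can be checked before a practice run is configured. The following is a minimal sketch: the helper names and placeholder ARNs are hypothetical, and the request shape is intended to match the `create_practice_run_configuration` operation of the boto3 `arc-zonal-shift` client, which is not called here.

```python
from typing import Optional


def arn_region(arn: str) -> str:
    """Return the Region field of an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":", 5)
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[3]


def practice_run_request(resource_arn: str, outcome_alarm_arn: str,
                         blocking_alarm_arn: Optional[str] = None) -> dict:
    """Build practice run arguments after checking that alarms share the resource's Region."""
    region = arn_region(resource_arn)
    alarms = [outcome_alarm_arn] + ([blocking_alarm_arn] if blocking_alarm_arn else [])
    for alarm in alarms:
        if arn_region(alarm) != region:
            raise ValueError(f"alarm {alarm!r} is not in Region {region!r}")
    request = {
        "resourceIdentifier": resource_arn,
        # The outcome alarm is required.
        "outcomeAlarms": [{"type": "CLOUDWATCH", "alarmIdentifier": outcome_alarm_arn}],
    }
    if blocking_alarm_arn:
        # The blocking alarm is optional.
        request["blockingAlarms"] = [
            {"type": "CLOUDWATCH", "alarmIdentifier": blocking_alarm_arn}
        ]
    return request


request = practice_run_request(
    "arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/app/example-alb/0123456789abcdef",
    "arn:aws:cloudwatch:us-east-1:111122223333:alarm:example-outcome-alarm",
    "arn:aws:cloudwatch:us-east-1:111122223333:alarm:example-blocking-alarm",
)
print(sorted(request))  # ['blockingAlarms', 'outcomeAlarms', 'resourceIdentifier']
```

The resulting dictionary could then be passed as keyword arguments to the client call. Either alarm ARN can refer to a composite alarm.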

Outcomes for practice runs

Route 53 ARC reports an outcome for each practice run. The following are the possible practice run outcomes:

  • SUCCEEDED: The outcome alarm did not enter an ALARM state during the practice run, and the practice run completed the full 30-minute test period.

  • FAILED: The outcome alarm entered an ALARM state during the practice run.

  • INTERRUPTED: The practice run ended for a reason that was not the outcome alarm entering an ALARM state. A practice run can be interrupted for a variety of reasons, including the following:

    • Practice run was ended because Amazon started an autoshift in the Amazon Web Services Region or there was an alarm condition in the Region.

    • Practice run was ended because the practice run configuration was deleted for the resource.

    • Practice run was ended because a customer-initiated zonal shift was started for the resource in the Availability Zone that the practice run zonal shift was shifting traffic away from.

    • Practice run was ended because a CloudWatch alarm that was specified for the practice run configuration can no longer be accessed.

    • Practice run was ended because the blocking alarm specified for the practice run entered an ALARM state.

    • Practice run was ended for an unknown reason.

  • PENDING: The practice run is active (in progress). There's no outcome to return yet.
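
The relationship between the four outcomes can be summarized as a small decision function. This is an illustrative sketch only: the function, its inputs, and the order of the checks are assumptions based on the definitions above, not a description of how Route 53 ARC computes outcomes internally.

```python
from enum import Enum


class Outcome(Enum):
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    INTERRUPTED = "INTERRUPTED"
    PENDING = "PENDING"


TEST_PERIOD_MINUTES = 30  # full practice run test period


def classify(outcome_alarm_fired: bool, ended_early_for_other_reason: bool,
             elapsed_minutes: float) -> Outcome:
    """Map observed practice run conditions to an outcome (hypothetical helper)."""
    if outcome_alarm_fired:
        return Outcome.FAILED
    if ended_early_for_other_reason:
        # For example: an autoshift started, the configuration was deleted,
        # or the blocking alarm entered an ALARM state.
        return Outcome.INTERRUPTED
    if elapsed_minutes >= TEST_PERIOD_MINUTES:
        return Outcome.SUCCEEDED
    return Outcome.PENDING  # still in progress, no outcome yet


print(classify(False, False, 30).value)  # SUCCEEDED
print(classify(False, True, 12).value)   # INTERRUPTED
```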