Testing zonal autoshift with Amazon FIS
You can use Amazon Fault Injection Service (Amazon FIS) to set up and run experiments that simulate real-world conditions, such as the AZ Availability: Power Interruption scenario. These experiments demonstrate what happens when Amazon starts a zonal autoshift on your autoshift-enabled resources during a potentially widespread AZ impairment.
The aws:arc:start-zonal-autoshift recovery action lets you demonstrate how Amazon automatically shifts traffic for zonal autoshift-enabled resources away from a potentially impaired AZ and reroutes it to healthy AZs in the same Amazon Web Services Region during the execution of the AZ availability scenario.
For example, you can use the Amazon FIS scenario library to simulate an AZ impairment that was caused by a power interruption. In this experiment, five minutes after the AZ power interruption begins, the recovery action aws:arc:start-zonal-autoshift automatically shifts resource traffic away from the specified AZ. The traffic is shifted for the remaining 25 minutes of the power interruption, to demonstrate how autoshift would be triggered when there is potentially widespread AZ impairment. When the experiment completes, the traffic shift ends and traffic begins flowing to all AZs again. This process demonstrates a complete recovery from a power event that impacts an AZ.
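The timing in this example can be worked through with a short sketch. It assumes a 30-minute interruption (the 5-minute delay plus the 25 minutes of shifted traffic described above) and formats the results as ISO 8601 durations, the format Amazon FIS uses for durations. The helper names iso8601_minutes and autoshift_window are hypothetical and are not part of any Amazon API.

```python
from datetime import timedelta
from typing import Tuple


def iso8601_minutes(delta: timedelta) -> str:
    """Format a whole-minute timedelta as an ISO 8601 duration such as 'PT5M'."""
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    if seconds:
        raise ValueError("expected a whole number of minutes")
    return f"PT{minutes}M"


def autoshift_window(interruption: timedelta, delay: timedelta) -> Tuple[timedelta, timedelta]:
    """Return (start offset, duration) of the traffic shift within the interruption.

    Hypothetical helper: the shift starts `delay` after the interruption begins
    and lasts until the interruption ends.
    """
    if delay >= interruption:
        raise ValueError("the shift must start before the interruption ends")
    return delay, interruption - delay


# Values taken from the example: 5-minute delay, 30-minute interruption (5 + 25).
start, duration = autoshift_window(timedelta(minutes=30), timedelta(minutes=5))
print(iso8601_minutes(start), iso8601_minutes(duration))  # prints "PT5M PT25M"
```

The shift duration is derived from the interruption length rather than set independently, which mirrors the statement that traffic stays shifted for the remainder of the power interruption.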
How experiments differ from zonal autoshift practice runs
Amazon FIS experiments differ from zonal autoshift practice runs. During a practice run, ARC shifts traffic for your resource away from one AZ as part of a normal process to ensure that your application can tolerate the loss of an AZ. During an Amazon FIS experiment, by contrast, Amazon FIS demonstrates how an autoshift would be triggered on your behalf for your autoshift-enabled resources during an AZ impairment, and then cancels the autoshift when the impairment has been resolved.
You cannot update an Amazon FIS-initiated zonal shift while it is running. In addition, if you cancel a zonal shift outside of Amazon FIS, the Amazon FIS experiment ends.
Amazon FIS expiration-based safety mechanism
Amazon FIS manages the zonal shift using the StartZonalShift, UpdateZonalShift, and CancelZonalShift API operations, with the expiresIn field for these requests set to 1 minute as a safety mechanism. This enables Amazon FIS to quickly roll back the zonal shift if there are unexpected events, such as network outages or system issues. In the ARC console, the expiration time field will display Amazon FIS-managed, and the actual expected expiration is determined by the duration specified in the zonal shift action. For more information on practice runs, see How zonal autoshift and practice runs work.
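The effect of a short expiration can be illustrated with a small in-memory simulation of a renewable lease. This is a sketch of the general pattern only, not Amazon FIS's implementation: it does not call the ARC API, the 30-second renewal interval is an assumption, and the function name shift_end_time is hypothetical. The only value taken from the text above is the 1-minute expiration.

```python
from typing import Optional

LEASE_SECONDS = 60  # the 1-minute expiresIn described above


def shift_end_time(duration_s: int, renew_every_s: int,
                   controller_lost_at_s: Optional[int] = None) -> int:
    """Return the simulated time (in seconds) at which the traffic shift ends.

    The shift starts at t=0 with a 1-minute expiry. A controller renews the
    expiry every `renew_every_s` seconds (assumed cadence) and cancels the
    shift at `duration_s`. If the controller disappears at
    `controller_lost_at_s`, no further renewals arrive.
    """
    if not 0 < renew_every_s < LEASE_SECONDS:
        raise ValueError("renewals must arrive before the lease expires")
    expires_at = LEASE_SECONDS  # set when the shift starts
    now = 0
    while True:
        now += renew_every_s
        if controller_lost_at_s is not None and now >= controller_lost_at_s:
            # No renewal is sent, so the shift lapses at its last expiry.
            return expires_at
        if now >= duration_s:
            # Planned end: the controller cancels the shift explicitly.
            return duration_s
        expires_at = now + LEASE_SECONDS  # renew the lease for one more minute


print(shift_end_time(1500, 30))                            # prints 1500
print(shift_end_time(1500, 30, controller_lost_at_s=600))  # prints 630
```

In the second call the simulated controller is lost 10 minutes into a 25-minute shift, and the shift ends 30 seconds later instead of running for the remaining 15 minutes. With a lease this short, a lost controller can leave traffic shifted for at most one minute.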
A resource can have no more than one applied zonal shift at a given time. That is, only one practice run zonal shift, customer-initiated zonal shift, autoshift, or Amazon FIS experiment can be in effect for the resource. When a second zonal shift is started, ARC follows an order of precedence to determine which zonal shift type is in effect for the resource. For more information, see Precedence for zonal shifts.
For more information about Amazon FIS recovery actions, refer to the Amazon FIS recovery action in the Amazon Fault Injection Service User Guide.