Testing zonal autoshift with Amazon FIS
You can use Amazon Fault Injection Service (Amazon FIS) to set up and run experiments that simulate real-world conditions, such as the AZ Availability: Power Interruption scenario. These experiments demonstrate what happens when Amazon starts a zonal autoshift on your autoshift-enabled resources during a potentially widespread AZ impairment.
The aws:arc:start-zonal-autoshift recovery action lets you demonstrate how Amazon automatically shifts traffic for zonal autoshift-enabled resources away from a potentially impaired AZ and reroutes it to healthy AZs in the same Amazon Region during the execution of the AZ availability scenario.
For example, you can use the Amazon FIS scenario library to simulate an AZ impairment due to a power interruption. In this experiment, five minutes after the AZ power interruption begins, the recovery action aws:arc:start-zonal-autoshift automatically shifts resource traffic away from the specified AZ for the remaining 25 minutes of the power interruption, demonstrating how autoshift would be triggered during a potentially widespread AZ impairment. When the experiment ends, traffic shifts back to the original AZ, demonstrating a complete recovery from the power event that impacted that AZ.
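The timing in this example can be worked through with a short sketch. It only does the arithmetic from the paragraph above (a 5-minute delay followed by a 25-minute shift); the function name and the sample start time are illustrative assumptions, and a real experiment template may use different durations.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the example scenario above; your template may differ.
AUTOSHIFT_DELAY = timedelta(minutes=5)      # autoshift starts this long after the interruption
AUTOSHIFT_DURATION = timedelta(minutes=25)  # autoshift stays in effect this long


def autoshift_window(interruption_start):
    """Return (shift_start, shift_end) for the example scenario.

    Hypothetical helper for illustration only; it is not part of any AWS SDK.
    """
    shift_start = interruption_start + AUTOSHIFT_DELAY
    shift_end = shift_start + AUTOSHIFT_DURATION
    return shift_start, shift_end


# Sample start time chosen arbitrarily for the illustration.
start = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
shift_start, shift_end = autoshift_window(start)
print(shift_start.isoformat())  # 2025-01-01T12:05:00+00:00
print(shift_end.isoformat())    # 2025-01-01T12:30:00+00:00
```

In this sketch the power interruption and the autoshift end together, 30 minutes after the interruption begins, which is when traffic returns to the original AZ.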
How experiments differ from zonal autoshift practice runs
Amazon FIS experiments differ from zonal autoshift practice runs. During a practice run, ARC shifts traffic for your resource away from one AZ as part of a normal process to ensure that your application can tolerate the loss of an AZ. During an Amazon FIS experiment, Amazon FIS demonstrates on your behalf how an AZ impairment and an autoshift would be triggered for your autoshift-enabled resources, and then cancels the autoshift when the impairment has been resolved.
You cannot update an Amazon FIS-initiated zonal shift while it is running, and cancelling a zonal shift outside of Amazon FIS will end the Amazon FIS experiment.
Amazon FIS expiration-based safety mechanism
Amazon FIS manages the zonal shift using the StartZonalShift, UpdateZonalShift, and CancelZonalShift APIs, with the expiresIn field for these requests set to 1 minute as a safety mechanism. This enables Amazon FIS to quickly roll back the zonal shift in the case of unexpected events such as network outages or system issues. In the ARC console, the expiration time field displays Amazon FIS-managed, and the actual expected expiration is determined by the duration specified in the zonal shift action. For more information on practice runs, see How zonal autoshift and practice runs work.
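The idea behind the expiration-based safety mechanism can be illustrated with a small simulation: a shift that always carries a short expiry stays in effect only while something keeps renewing it. This is a minimal sketch, not how Amazon FIS is implemented. The ManagedShift class, the simulate function, and the 30-second renewal interval are all assumptions made for illustration; only the 1-minute expiry comes from the text above.

```python
from dataclasses import dataclass

EXPIRES_IN_SECONDS = 60  # the 1-minute expiresIn value described above


@dataclass
class ManagedShift:
    """Hypothetical in-memory stand-in for a zonal shift (not an ARC API type)."""
    expires_at: int

    def renew(self, now):
        # Analogous to an update request that pushes the expiry out again.
        self.expires_at = now + EXPIRES_IN_SECONDS

    def is_active(self, now):
        return now < self.expires_at


def simulate(action_duration, renew_every, controller_fails_at=None):
    """Return the second at which the shift stops being in effect.

    renew_every is an assumed renewal cadence; the source does not state one.
    """
    shift = ManagedShift(expires_at=EXPIRES_IN_SECONDS)  # shift started at t=0
    now = 0
    while shift.is_active(now):
        now += 1
        alive = controller_fails_at is None or now < controller_fails_at
        if not alive:
            continue  # nobody renews, so the shift runs out on its own
        if now >= action_duration:
            return now  # controller cancels at the end of the action
        if now % renew_every == 0:
            shift.renew(now)
    return now


print(simulate(action_duration=1500, renew_every=30))                           # 1500
print(simulate(action_duration=1500, renew_every=30, controller_fails_at=100))  # 150
```

In the first run the shift lasts for the full 25-minute (1500-second) action. In the second run the controller stops at second 100, and the shift lapses at second 150, within one minute of the failure, without any explicit cancel request.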
There can be no more than one applied zonal shift at a given time; that is, only one practice run zonal shift, customer-initiated zonal shift, autoshift, or Amazon FIS experiment for the resource. When a second zonal shift is started, ARC follows a precedence order to determine which zonal shift type is in effect for a resource. For more information on precedence for zonal shifts, see Precedence for zonal shifts.
For more information about Amazon FIS recovery actions, refer to the Amazon FIS recovery action in the Amazon Fault Injection Service User Guide.