Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Fault testing on Amazon EBS

Use the Pause I/O action in Amazon Fault Injection Service (Amazon FIS) to temporarily stop I/O between an Amazon EBS volume and the instances to which it is attached, so that you can test how your workloads handle I/O interruptions. With Amazon FIS, you can run controlled experiments to test your architecture and monitoring, such as Amazon CloudWatch alarms and OS timeout configurations, and to improve resiliency to storage faults.

For more information about Amazon FIS, see the Amazon Fault Injection Service User Guide.

Considerations

Keep in mind the following considerations for pausing volume I/O:

  • You can pause I/O for all Amazon EBS volume types that are attached to instances built on the Nitro System.

  • You can pause I/O for the root volume.

  • You can pause I/O for Multi-Attach enabled volumes. If you pause I/O for a Multi-Attach enabled volume, I/O is paused between the volume and all of the instances to which it is attached.

  • To test your OS timeout configuration, set the experiment duration equal to or greater than the value specified for nvme_core.io_timeout. For more information, see NVMe I/O operation timeout for Amazon EBS volumes.

  • If you drive I/O to a volume that has I/O paused, the following happens:

    • The volume's status transitions to impaired within 120 seconds. For more information, see Amazon EBS volume status checks.

    • The CloudWatch metric for queue length (VolumeQueueLength) is non-zero. Configure your alarms and monitoring to watch for a non-zero queue length. For more information, see Metrics for Amazon EBS volumes.

    • The CloudWatch metrics VolumeReadOps and VolumeWriteOps are 0, which indicates that the volume is no longer processing I/O.
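
The metric behavior described above can be turned into a simple check and an alarm definition. The following is a minimal sketch, not a definitive implementation. It assumes you have already retrieved one sample each of VolumeQueueLength, VolumeReadOps, and VolumeWriteOps for the same period. The function names, the alarm naming scheme, and the 60-second period are assumptions made for illustration. The returned dictionary is shaped like the input of the CloudWatch PutMetricAlarm API, but the sketch does not call any Amazon API itself.

```python
def looks_paused(queue_length, read_ops, write_ops):
    """Return True when one period of samples matches the paused-I/O pattern.

    Pattern from the considerations above: I/O is queued against the volume
    (non-zero queue length) while no read or write operations complete.
    """
    return queue_length > 0 and read_ops == 0 and write_ops == 0


def queue_length_alarm_params(volume_id, period_seconds=60):
    """Build parameters shaped like CloudWatch PutMetricAlarm input.

    The alarm name and the period are illustrative assumptions; adjust them
    to match your own monitoring conventions.
    """
    return {
        "AlarmName": f"ebs-queue-nonzero-{volume_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EBS",
        "MetricName": "VolumeQueueLength",
        "Dimensions": [{"Name": "VolumeId", "Value": volume_id}],
        "Statistic": "Average",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",  # alarm on any non-zero queue
    }
```

A busy but healthy volume can also report a non-zero queue length, so combining the queue-length signal with zero read and write operations, as looks_paused does, is one way to reduce false positives.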

Limitations

Keep in mind the following limitations for pausing volume I/O:

  • Instance store volumes are not supported.

  • Xen-based instance types are not supported.

  • You can't pause I/O for volumes created on an Outpost in Amazon Outposts, in an Amazon Wavelength Zone, or in a Local Zone.

You can perform a basic experiment from the Amazon EC2 console, or you can perform more advanced experiments using the Amazon FIS console. For more information about performing advanced experiments using the Amazon FIS console, see Tutorials for Amazon FIS in the Amazon Fault Injection Service User Guide.
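
For an advanced experiment, you describe the Pause I/O action in an Amazon FIS experiment template. The following is a minimal sketch of what such a template might look like when built as a Python dictionary. It assumes the action ID aws:ebs:pause-volume-io, the resource type aws:ec2:ebs-volume, and the target key Volumes; verify these against the Amazon Fault Injection Service User Guide before use. The function name, the target label, the action label, and the ARNs are hypothetical placeholders, and the sketch only prints the template rather than calling Amazon FIS.

```python
import json

# Assumed action ID for the Pause I/O action; confirm in the Amazon FIS documentation.
PAUSE_IO_ACTION_ID = "aws:ebs:pause-volume-io"


def pause_io_template(volume_arn, role_arn, duration="PT2M"):
    """Build a dictionary shaped like an Amazon FIS experiment template.

    duration is an ISO 8601 duration string, for example "PT2M" for 2 minutes.
    """
    target_name = "PausedVolumes"  # hypothetical label linking the action to its target
    return {
        "description": "Pause I/O on one Amazon EBS volume",
        "roleArn": role_arn,
        # No stop condition is configured in this sketch.
        "stopConditions": [{"source": "none"}],
        "targets": {
            target_name: {
                "resourceType": "aws:ec2:ebs-volume",
                "resourceArns": [volume_arn],
                "selectionMode": "ALL",
            }
        },
        "actions": {
            "pause-io": {  # hypothetical action label
                "actionId": PAUSE_IO_ACTION_ID,
                "parameters": {"duration": duration},
                "targets": {"Volumes": target_name},
            }
        },
    }


if __name__ == "__main__":
    # Placeholder ARNs for illustration only.
    template = pause_io_template(
        "arn:aws-cn:ec2:cn-north-1:111122223333:volume/vol-0123456789abcdef0",
        "arn:aws-cn:iam::111122223333:role/ExampleFisRole",
    )
    print(json.dumps(template, indent=2))
```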

To perform a basic experiment using the Amazon EC2 console
  1. Open the Amazon EC2 console at https://console.amazonaws.cn/ec2/.

  2. In the navigation pane, choose Volumes.

  3. Select the volume for which to pause I/O and choose Actions, Fault injection, Pause volume I/O.

  4. For Duration, enter the duration for which to pause I/O between the volume and the instances. The field next to the Duration dropdown list shows the duration in ISO 8601 format.

  5. In the Service access section, select the IAM service role for Amazon FIS to assume to perform the experiment. You can use either the default role, or an existing role that you created. For more information, see Create an IAM role for Amazon FIS experiments.

  6. Choose Pause volume I/O. When prompted, enter start in the confirmation field and choose Start experiment.

  7. Monitor the progress and impact of your experiment. For more information, see Monitoring Amazon FIS in the Amazon Fault Injection Service User Guide.
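
When you choose a duration in the procedure above, two details from this topic come into play: the console shows the duration in ISO 8601 format, and a test of your OS timeout configuration needs a duration that is equal to or greater than nvme_core.io_timeout. The following is a minimal sketch of both calculations. The function names are hypothetical. The sysfs path is an assumption about a Linux instance with the nvme_core module loaded, and the sketch assumes that the value is expressed in seconds.

```python
from pathlib import Path

# Assumed location of the nvme_core.io_timeout parameter on a Linux instance.
NVME_IO_TIMEOUT_PATH = "/sys/module/nvme_core/parameters/io_timeout"


def to_iso8601_duration(total_seconds):
    """Format a positive whole number of seconds as an ISO 8601 duration."""
    if total_seconds <= 0:
        raise ValueError("duration must be a positive number of seconds")
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    text = "PT"
    if hours:
        text += f"{hours}H"
    if minutes:
        text += f"{minutes}M"
    if seconds:
        text += f"{seconds}S"
    return text


def read_io_timeout(path=NVME_IO_TIMEOUT_PATH):
    """Read the configured NVMe I/O timeout (assumed to be in seconds)."""
    return int(Path(path).read_text().strip())


def covers_os_timeout(duration_seconds, io_timeout_seconds):
    """Return True when the pause is long enough to trigger the OS timeout."""
    return duration_seconds >= io_timeout_seconds
```

For example, to_iso8601_duration(120) returns "PT2M", and covers_os_timeout(120, 30) returns True because a 120-second pause outlasts a 30-second timeout.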