Fault tolerance - Amazon Web Services Support
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Fault tolerance

You can use the following checks for the fault tolerance category.

Amazon EBS Snapshots

Description

Checks the age of the snapshots for your Amazon EBS volumes (either available or in-use). Even though Amazon EBS volumes are replicated, failures can still occur. Snapshots are persisted to Amazon S3 for durable storage and point-in-time recovery.

Check ID

H7IgTzjTYb

Alert Criteria
  • Yellow: The most recent volume snapshot is between 7 and 30 days old.

  • Red: The most recent volume snapshot is more than 30 days old.

  • Red: The volume does not have a snapshot.
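The alert criteria above map a volume's newest snapshot age to a status. The following is a minimal sketch of that mapping, not Trusted Advisor's implementation. The function name is hypothetical. Two further points are assumptions: the 7-day and 30-day boundaries are treated as Yellow, and the snapshot time is a timezone-aware datetime, such as the StartTime values that boto3's describe_snapshots returns.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def snapshot_status(latest_snapshot_time: Optional[datetime], now: datetime) -> str:
    """Return "Green", "Yellow", or "Red" for one volume's newest snapshot (hypothetical helper)."""
    if latest_snapshot_time is None:
        # Red: the volume does not have a snapshot.
        return "Red"
    age = now - latest_snapshot_time
    if age > timedelta(days=30):
        # Red: the most recent snapshot is more than 30 days old.
        return "Red"
    if age >= timedelta(days=7):
        # Yellow: between 7 and 30 days old (boundaries assumed inclusive).
        return "Yellow"
    return "Green"


# Example: a snapshot taken 10 days before a fixed "now".
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
print(snapshot_status(now - timedelta(days=10), now))  # Yellow
```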

Recommended Action

Create weekly or monthly snapshots of your volumes. For more information, see Creating an Amazon EBS Snapshot.

To automate the creation of EBS snapshots, consider using Amazon Backup or Amazon Data Lifecycle Manager.

Additional Resources

Amazon Elastic Block Store (Amazon EBS)

Amazon EBS Snapshots

Amazon Backup

Amazon Data Lifecycle Manager

Report columns
  • Status

  • Region

  • Volume ID

  • Volume Name

  • Snapshot ID

  • Snapshot Name

  • Snapshot Age

  • Volume Attachment

  • Reason

Amazon ElastiCache Multi-AZ clusters

Description

Checks for ElastiCache clusters that are deployed in a single Availability Zone (AZ). This check alerts you if Multi-AZ is inactive in a cluster.

Deployments in multiple AZs enhance ElastiCache cluster availability by asynchronously replicating to read-only replicas in a different AZ. When planned cluster maintenance occurs, or a primary node is unavailable, ElastiCache automatically promotes a replica to primary. This failover allows cluster write operations to resume, and doesn't require an administrator to intervene.

Note

Results for this check are automatically refreshed several times daily, and refresh requests are not allowed. It might take a few hours for changes to appear.

For Business, Enterprise On-Ramp, or Enterprise Support customers, you can use the BatchUpdateRecommendationResourceExclusion API to include or exclude one or more resources from your Trusted Advisor results.

Check ID

ECHdfsQ402

Alert criteria
  • Green: Multi-AZ is active in the cluster.

  • Yellow: Multi-AZ is inactive in the cluster.

Recommended action

Create at least one replica per shard in an AZ that is different from the primary's AZ.
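The recommended action asks for at least one replica per shard outside the primary's AZ. The sketch below checks only that placement rule, using a hypothetical, simplified Shard record. It does not read the cluster's Multi-AZ setting, which is what Trusted Advisor actually reports on.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shard:
    """Hypothetical, simplified view of one shard: the primary's AZ and the replicas' AZs."""
    primary_az: str
    replica_azs: List[str] = field(default_factory=list)


def shard_has_cross_az_replica(shard: Shard) -> bool:
    # Satisfied when at least one replica sits in an AZ other than the primary's.
    return any(az != shard.primary_az for az in shard.replica_azs)


def placement_status(shards: List[Shard]) -> str:
    # Every shard must satisfy the recommendation; an empty list does not.
    if shards and all(shard_has_cross_az_replica(s) for s in shards):
        return "Green"
    return "Yellow"
```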

Additional resources

For more information, see Minimizing downtime in ElastiCache (Redis OSS) with Multi-AZ.

Report columns
  • Status

  • Region

  • Cluster Name

  • Last Updated Time

Amazon MemoryDB Multi-AZ clusters

Description

Checks for MemoryDB clusters that are deployed in a single Availability Zone (AZ). This check alerts you if Multi-AZ is inactive in a cluster.

Deployments in multiple AZs enhance MemoryDB cluster availability by asynchronously replicating to read-only replicas in a different AZ. When planned cluster maintenance occurs, or a primary node is unavailable, MemoryDB automatically promotes a replica to primary. This failover allows cluster write operations to resume, and doesn't require an administrator to intervene.

Note

Results for this check are automatically refreshed several times daily, and refresh requests are not allowed. It might take a few hours for changes to appear.

For Business, Enterprise On-Ramp, or Enterprise Support customers, you can use the BatchUpdateRecommendationResourceExclusion API to include or exclude one or more resources from your Trusted Advisor results.

Check ID

MDBdfsQ401

Alert Criteria
  • Green: Multi-AZ is active in the cluster.

  • Yellow: Multi-AZ is inactive in the cluster.

Recommended Action

Create at least one replica per shard in an AZ that is different from the primary's AZ.

Additional Resources

For more information, see Minimizing downtime in MemoryDB with Multi-AZ.

Report columns
  • Status

  • Region

  • Cluster Name

  • Last Updated Time

Amazon RDS Backups

Description

Checks for automated backups of Amazon RDS DB instances.

By default, backups are enabled with a retention period of one day. Backups reduce the risk of unexpected data loss and allow for point-in-time recovery.

Note

This check reports the resources that are flagged by the criteria and the total number of resources evaluated, including OK resources. The resources table lists only the flagged resources.

Check ID

opQPADkZvH

Alert Criteria

  • Red: A DB instance has the backup retention period set to 0 days.

Recommended Action

Set the retention period for automated DB instance backups to 1 to 35 days, as appropriate for your application's requirements. See Working With Automated Backups.
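The following is a minimal sketch of the alert rule and the fix. The helper names and the "OK" label are hypothetical. The returned dictionary uses the DBInstanceIdentifier, BackupRetentionPeriod, and ApplyImmediately parameters of the RDS ModifyDBInstance call. It is meant to be passed to boto3, for example boto3.client("rds").modify_db_instance(**build_retention_update("my-database", 7)), where "my-database" is a placeholder.

```python
def retention_status(backup_retention_period: int) -> str:
    # Red: automated backups are disabled (retention period of 0 days).
    return "Red" if backup_retention_period == 0 else "OK"


def build_retention_update(db_instance_id: str, days: int, apply_immediately: bool = False) -> dict:
    """Build keyword arguments for ModifyDBInstance (hypothetical helper)."""
    # Automated backups accept a retention period of 1 to 35 days.
    if not 1 <= days <= 35:
        raise ValueError("retention period must be between 1 and 35 days")
    return {
        "DBInstanceIdentifier": db_instance_id,
        "BackupRetentionPeriod": days,
        # Whether RDS applies the change now; see the ModifyDBInstance
        # documentation for how this interacts with retention changes.
        "ApplyImmediately": apply_immediately,
    }
```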

Additional Resources

Getting Started with Amazon RDS

Report columns
  • Status

  • Region/AZ

  • DB Instance

  • VPC ID

  • Backup Retention Period

Amazon S3 Bucket Logging

Description

Checks the logging configuration of Amazon Simple Storage Service (Amazon S3) buckets.

When server access logging is enabled, detailed access logs are delivered hourly to a bucket that you choose. An access log record contains details about each request, such as the request type, the resources specified in the request, and the time and date the request was processed. By default, bucket logging is not enabled. You should enable logging if you want to perform security audits or learn more about users and usage patterns.

When logging is initially enabled, the configuration is automatically validated. However, future modifications can result in logging failures. This check examines explicit Amazon S3 bucket permissions, but it does not examine associated bucket policies that might override the bucket permissions.

Check ID

BueAdJ7NrP

Alert Criteria
  • Yellow: The bucket does not have server access logging enabled.

  • Yellow: The target bucket permissions do not include the root account, so Trusted Advisor cannot check it.

  • Red: The target bucket does not exist.

  • Red: The target bucket and the source bucket have different owners.

  • Red: The log deliverer does not have write permissions for the target bucket.
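The alert criteria above can be read as a small decision procedure. The sketch below is illustrative only. It takes facts that are assumed to be gathered elsewhere: target_bucket could come from the TargetBucket field under LoggingEnabled in an S3 GetBucketLogging response, and the other flags are hypothetical inputs. The order in which the rules are applied is also an assumption.

```python
from typing import List, Optional


def logging_findings(
    target_bucket: Optional[str],
    target_exists: bool = True,
    can_inspect_target: bool = True,
    same_owner: bool = True,
    write_enabled: bool = True,
) -> List[str]:
    """Map pre-computed facts about one source bucket to the alert criteria (hypothetical helper)."""
    if target_bucket is None:
        return ["Yellow: server access logging is not enabled"]
    if not target_exists:
        return ["Red: the target bucket does not exist"]
    if not can_inspect_target:
        # Without the root account as a grantee, the remaining checks cannot run.
        return ["Yellow: target bucket permissions cannot be checked"]
    findings = []
    if not same_owner:
        findings.append("Red: the target and source buckets have different owners")
    if not write_enabled:
        findings.append("Red: the log deliverer cannot write to the target bucket")
    return findings
```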

Recommended Action

Enable bucket logging for most buckets. See Enabling Logging Using the Console and Enabling Logging Programmatically.

If the target bucket permissions do not include the root account and you want Trusted Advisor to check the logging status, add the root account as a grantee. See Editing Bucket Permissions.

If the target bucket does not exist, select an existing bucket as a target or create a new one and select it. See Managing Bucket Logging.

If the target and source have different owners, change the target bucket to one that has the same owner as the source bucket. See Managing Bucket Logging.

If the log deliverer does not have write permissions for the target (write not enabled), grant Upload/Delete permissions to the Log Delivery group. See Editing Bucket Permissions.

Additional Resources
Report columns
  • Status

  • Region

  • Bucket Name

  • Target Name

  • Target Exists

  • Same Owner

  • Write Enabled

  • Reason

Auto Scaling Group Health Check

Description

Examines the health check configuration for Auto Scaling groups.

If Elastic Load Balancing is being used for an Auto Scaling group, the recommended configuration is to enable an Elastic Load Balancing health check. Without it, Auto Scaling can act only on the health of the Amazon Elastic Compute Cloud (Amazon EC2) instance itself, not on the health of the application running on the instance.

Check ID

CLOG40CDO8

Alert Criteria
  • Yellow: An Auto Scaling group has an associated load balancer, but the Elastic Load Balancing health check is not enabled.

  • Yellow: An Auto Scaling group does not have an associated load balancer, but the Elastic Load Balancing health check is enabled.
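The two Yellow conditions above are mirror images of each other. The sketch below evaluates one group described by a dictionary with the LoadBalancerNames, TargetGroupARNs, and HealthCheckType keys found in DescribeAutoScalingGroups responses. The function name is hypothetical, and counting target groups as "an associated load balancer" is an assumption.

```python
def health_check_status(group: dict) -> str:
    """Evaluate one Auto Scaling group description (hypothetical helper)."""
    # Assumption: either Classic Load Balancer names or target groups count as "associated".
    has_load_balancer = bool(group.get("LoadBalancerNames") or group.get("TargetGroupARNs"))
    elb_check_enabled = group.get("HealthCheckType") == "ELB"
    if has_load_balancer and not elb_check_enabled:
        return "Yellow: load balancer associated, ELB health check not enabled"
    if elb_check_enabled and not has_load_balancer:
        return "Yellow: ELB health check enabled, no load balancer associated"
    return "OK"
```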

Recommended Action

If the Auto Scaling group has an associated load balancer but the Elastic Load Balancing health check is not enabled, see Add an Elastic Load Balancing Health Check to your Auto Scaling Group.

If the Elastic Load Balancing health check is enabled but no load balancer is associated with the Auto Scaling group, see Set Up an Auto-Scaled and Load-Balanced Application.

Additional Resources

Amazon EC2 Auto Scaling User Guide

Report columns
  • Status

  • Region

  • Auto Scaling Group Name

  • Load Balancer Associated

  • Health Check

Auto Scaling Group Resources

Description

Checks the availability of resources associated with your launch configurations, launch templates, and Auto Scaling groups.

Auto Scaling groups that point to unavailable resources cannot launch new Amazon Elastic Compute Cloud (Amazon EC2) instances. When properly configured, Auto Scaling causes the number of Amazon EC2 instances to increase seamlessly during demand spikes, and decrease automatically during demand lulls. Auto Scaling groups and launch configurations/launch templates that point to unavailable resources do not operate as intended.

Note

This check reports the resources that are flagged by the criteria and the total number of resources evaluated, including OK resources. The resources table lists only the flagged resources.

Check ID

8CNsSllI5v

Alert Criteria
  • Red: An Auto Scaling group is associated with a deleted load balancer.

  • Red: A launch configuration is associated with a deleted Amazon Machine Image (AMI).

  • Red: A launch template is associated with a deleted Amazon Machine Image (AMI).
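The following is a minimal sketch of the Red conditions above, assuming you already know which load balancers and AMIs still exist. The helper and its parameters are hypothetical. Image IDs written in the launch template form "resolve:ssm:..." are skipped, which mirrors the later note that this check does not evaluate resources referenced through Amazon Systems Manager parameters.

```python
from typing import Iterable, List, Optional, Set


def asg_resource_findings(
    load_balancer_names: Iterable[str],
    image_id: Optional[str],
    existing_load_balancers: Set[str],
    existing_image_ids: Set[str],
    launch_type: str = "launch template",
) -> List[str]:
    """Flag references to deleted load balancers or AMIs (hypothetical helper)."""
    findings = []
    for name in load_balancer_names:
        if name not in existing_load_balancers:
            findings.append(f"Red: load balancer {name} has been deleted")
    # Skip AMIs resolved through Systems Manager parameters.
    if image_id and not image_id.startswith("resolve:ssm:"):
        if image_id not in existing_image_ids:
            findings.append(f"Red: {launch_type} uses deleted AMI {image_id}")
    return findings
```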

Recommended Action

If the load balancer has been deleted, either create a new load balancer or target group and associate it with the Auto Scaling group, or create a new Auto Scaling group without the load balancer. For information about creating a new Auto Scaling group with a new load balancer, see Set Up an Auto-Scaled and Load-Balanced Application. For information about creating a new Auto Scaling group without a load balancer, see Create Auto Scaling Group in Getting Started With Auto Scaling Using the Console.

If the AMI has been deleted, then create a new launch configuration or launch template version using a valid AMI and associate it with an Auto Scaling group. For information on how to create a new launch configuration, see Create a launch configuration in the Amazon EC2 Auto Scaling User Guide. For information on how to create a launch template, see Create a launch template for an Auto Scaling group in the Amazon EC2 Auto Scaling User Guide.

Note

For security reasons, the check results don’t include any resources referenced using Amazon Systems Manager parameters in the launch template.

If your launch templates include an Amazon Systems Manager parameter that contains an Amazon Machine Image (AMI) ID, review the launch template to make sure that the parameter references a valid AMI ID, or make the appropriate changes in the Amazon Systems Manager Parameter Store. For more information, see Use Amazon Systems Manager parameters instead of AMI IDs in the Amazon EC2 Auto Scaling User Guide.

Additional Resources
Report columns
  • Status

  • Region

  • Auto Scaling Group Name

  • Launch Type

  • Resource Type

  • Resource Name

CLB Connection Draining

Description

Checks for Classic load balancers that do not have connection draining enabled.

When connection draining is not enabled and you deregister an Amazon EC2 instance from a Classic load balancer, the Classic load balancer stops routing traffic to that instance and closes the connection. When connection draining is enabled, the Classic load balancer stops sending new requests to the deregistered instance but keeps the connection open to serve active requests.

Check ID

7qGXsKIUw

Alert Criteria
  • Yellow: Connection draining is not enabled for a Classic load balancer.

  • Green: Connection draining is enabled for a Classic load balancer.

Recommended Action

Enable connection draining for the Classic load balancer. For more information, see Connection Draining and Enable or Disable Connection Draining for Your Load Balancer.
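The following is a minimal sketch of reading and changing the setting. It assumes the attribute shape that the Classic Load Balancer DescribeLoadBalancerAttributes and ModifyLoadBalancerAttributes calls use, a ConnectionDraining entry with Enabled and Timeout. The helper names are hypothetical. The returned dictionary could be passed to boto3, for example boto3.client("elb").modify_load_balancer_attributes(**enable_draining_request("my-clb")), where "my-clb" is a placeholder.

```python
def draining_status(attributes: dict) -> str:
    """attributes: the LoadBalancerAttributes mapping for one Classic load balancer."""
    enabled = attributes.get("ConnectionDraining", {}).get("Enabled", False)
    return "Green" if enabled else "Yellow"


def enable_draining_request(load_balancer_name: str, timeout_seconds: int = 300) -> dict:
    """Build keyword arguments for ModifyLoadBalancerAttributes (hypothetical helper)."""
    # The draining timeout can be set between 1 and 3,600 seconds (default 300).
    if not 1 <= timeout_seconds <= 3600:
        raise ValueError("timeout must be between 1 and 3600 seconds")
    return {
        "LoadBalancerName": load_balancer_name,
        "LoadBalancerAttributes": {
            "ConnectionDraining": {"Enabled": True, "Timeout": timeout_seconds},
        },
    }
```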

Additional Resources

Elastic Load Balancing Concepts

Report columns
  • Status

  • Region

  • Load Balancer Name

  • Reason

Load Balancer Optimization

Description

Checks your load balancer configuration.

To help increase fault tolerance in Amazon Elastic Compute Cloud (Amazon EC2) when using Elastic Load Balancing, we recommend running an equal number of instances across multiple Availability Zones in a Region. A configured load balancer accrues charges, so this is also a cost-optimization check.

Check ID

iqdCTZKCUp

Alert Criteria
  • Yellow: A load balancer is enabled for a single Availability Zone.

  • Yellow: A load balancer is enabled for an Availability Zone that has no active instances.

  • Yellow: The Amazon EC2 instances that are registered with a load balancer are unevenly distributed across Availability Zones. (The difference between the highest and lowest instance counts in utilized Availability Zones is more than 1, and the difference is more than 20% of the highest count.)
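The uneven-distribution rule in the last criterion combines two thresholds. The sketch below is a minimal version of it and uses a hypothetical function name. It assumes that "utilized" zones are those with at least one registered instance, because a zone with no instances is covered by the previous criterion.

```python
from typing import Mapping


def is_unevenly_distributed(instances_per_zone: Mapping[str, int]) -> bool:
    # Assumption: utilized zones are those with at least one registered instance.
    counts = [n for n in instances_per_zone.values() if n > 0]
    if len(counts) < 2:
        return False
    highest, lowest = max(counts), min(counts)
    difference = highest - lowest
    # "More than 20% of the highest count" is tested as 5 * difference > highest
    # to stay in integer arithmetic.
    return difference > 1 and 5 * difference > highest


print(is_unevenly_distributed({"a": 10, "b": 7}))   # True: 3 > 1 and 3 > 2
print(is_unevenly_distributed({"a": 20, "b": 17}))  # False: 3 is not more than 4
```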

Recommended Action

Ensure that your load balancer points to active and healthy instances in at least two Availability Zones. For more information, see Add Availability Zone.

If your load balancer is configured for an Availability Zone with no healthy instances, or if instances are unevenly distributed across Availability Zones, determine whether all the Availability Zones are necessary. Remove any unnecessary Availability Zones and ensure that instances are evenly distributed across the remaining ones. For more information, see Remove Availability Zone.

Additional Resources
Report columns
  • Status

  • Region

  • Load Balancer Name

  • # of Zones

  • Zone a Instances

  • Zone b Instances

  • Zone c Instances

  • Zone d Instances

  • Zone e Instances

  • Zone f Instances

  • Reason