Failover process for an RDS Custom for Oracle Multi-AZ deployment - Amazon Relational Database Service

Failover process for an RDS Custom for Oracle Multi-AZ deployment

If you have turned on Multi-AZ and an infrastructure defect causes a planned or unplanned outage of your DB instance, Amazon RDS automatically switches to a standby replica in another Availability Zone. The time that the failover takes depends on the database activity and other conditions at the time that the primary DB instance became unavailable. Failover typically takes 60 to 120 seconds, but large transactions or a lengthy recovery process can increase that time. After the failover completes, the Amazon RDS console can take additional time to show the new Availability Zone.

Note

You can force a failover manually by stopping and starting your primary EC2 host while your DB instance is available.

Amazon RDS handles failovers automatically so you can resume database operations as quickly as possible without administrative intervention. The primary DB instance switches over automatically to the standby replica if any of the conditions described in the following table occurs. You can view these failover reasons in the Amazon RDS event log.

Failover reason: The operating system underlying the RDS database instance is being patched in an offline operation.
Description: A failover was triggered during the maintenance window for an OS patch or a security update. For more information, see Maintaining a DB instance.

Failover reason: The primary host of the RDS Multi-AZ instance is unhealthy.
Description: The Multi-AZ DB instance deployment detected an impaired primary DB instance and failed over.

Failover reason: The primary host of the RDS Multi-AZ instance is unreachable due to loss of network connectivity.
Description: RDS monitoring detected a network reachability failure to the primary DB instance and triggered a failover.

Failover reason: The RDS instance was modified by customer.
Description: An RDS DB instance modification triggered a failover. For more information, see Modifying your RDS Custom for Oracle DB instance.

Failover reason: The storage volume underlying the primary host of the RDS Multi-AZ instance experienced a failure.
Description: The Multi-AZ DB instance deployment detected a storage issue on the primary DB instance and failed over.

Failover reason: The RDS Multi-AZ primary instance is busy and unresponsive.
Description: The primary DB instance is unresponsive. We recommend that you do the following:

  • Examine the event and CloudWatch logs for excessive CPU, memory, or swap space usage. For more information, see Working with Amazon RDS event notification and Creating a rule that triggers on an Amazon RDS event.

  • Evaluate your workload to determine whether you're using the appropriate DB instance class. For more information, see DB instance classes.

To determine if your Multi-AZ DB instance has failed over, you can do the following:

  • Set up DB event subscriptions to notify you by email or SMS that a failover has been initiated. For more information about events, see Working with Amazon RDS event notification.

  • View your DB events by using the Amazon RDS console or API operations.

  • View the current state of your RDS Custom for Oracle Multi-AZ DB instance deployment by using the Amazon RDS console, CLI, or API operations.
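
The API-based checks in the list above can be sketched in Python. This is a minimal illustration, not an official procedure. It assumes a boto3-style RDS client is passed in as `rds_client`. The helper names `recent_failover_events` and `current_placement` are hypothetical, and the sketch reads only the first page of results.

```python
def recent_failover_events(rds_client, db_instance_id, minutes=60):
    """Return (date, message) pairs for failover-category events.

    rds_client is assumed to expose describe_events() the way the
    boto3 RDS client does; only the first page of results is read.
    """
    response = rds_client.describe_events(
        SourceIdentifier=db_instance_id,
        SourceType="db-instance",
        EventCategories=["failover"],
        Duration=minutes,  # look-back window, in minutes
    )
    return [(event["Date"], event["Message"])
            for event in response.get("Events", [])]


def current_placement(rds_client, db_instance_id):
    """Summarize the status and Availability Zones of a DB instance."""
    response = rds_client.describe_db_instances(
        DBInstanceIdentifier=db_instance_id
    )
    instance = response["DBInstances"][0]
    return {
        "status": instance.get("DBInstanceStatus"),
        "multi_az": instance.get("MultiAZ"),
        "primary_az": instance.get("AvailabilityZone"),
        "standby_az": instance.get("SecondaryAvailabilityZone"),
    }
```

With boto3 installed and credentials configured, you could pass `boto3.client("rds")` as `rds_client`. Comparing `primary_az` before and after an event is one way to confirm that the instance changed Availability Zones.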

Time to live (TTL) settings with applications using an RDS Custom for Oracle Multi-AZ deployment

The failover mechanism automatically changes the Domain Name System (DNS) record of the DB instance to point to the standby DB instance. As a result, you need to re-establish any existing connections to your DB instance. Keep any DNS cache time-to-live (TTL) value low, and verify that your application doesn't cache DNS entries for an extended time. A high TTL value might prevent your application from quickly reconnecting to the DB instance after failover.
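
Because existing connections must be re-established after a failover, applications often wrap connection setup in a retry loop. The following Python sketch shows one way to do that. The `connect` argument is a hypothetical callable, for example a function that opens a new database connection by using the DB instance endpoint hostname. The sketch assumes that each call resolves the hostname again. Whether that happens in practice depends on your driver, runtime, and operating system DNS cache settings.

```python
import time


def connect_with_retry(connect, attempts=5, initial_delay=1.0,
                       max_delay=30.0, retry_on=(OSError,),
                       sleep=time.sleep):
    """Call connect() until it succeeds, backing off between attempts.

    connect  -- hypothetical zero-argument callable that opens a new
                connection from the endpoint hostname on every call
    retry_on -- exception types treated as transient; add your
                driver's connection error type here
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

The default limits are illustrative values only. Choose a total retry window that comfortably covers the typical failover time for your workload.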