Troubleshooting latency issues in Amazon Database Migration Service

This section provides an overview of the common causes of Amazon DMS task latency during the ongoing replication phase (CDC). Amazon DMS replicates data asynchronously. Latency is the delay between when a change was committed on the source and when the change was replicated to the target. Latency can be caused by misconfiguration of replication components, such as the following:

  • Source endpoint or data source

  • Target endpoint or data source

  • Replication instances

  • The network between these components

We recommend that you use a test migration as a proof of concept to gather information about your replication. You can then use this information for tuning your replication configuration to minimize latency. For information about running a proof of concept migration, see Running a proof of concept.

Types of CDC latency

This section contains types of replication latency that may occur during CDC.

Source latency

The delay, in seconds, between the commit time of the last event captured from the source endpoint, and the current system timestamp of the replication instance. You can monitor the latency between the data source and your replication instance using the CDCLatencySource CloudWatch metric. A high CDCLatencySource metric indicates that the process of capturing changes from the source is delayed. For example, if your application commits an insert to the source at 10:00, and Amazon DMS consumes the change at 10:02, the CDCLatencySource metric is 120 seconds.

For information about CloudWatch metrics for Amazon DMS, see Replication task metrics.

Target latency

The delay, in seconds, between the commit time on the source of the first event waiting to commit to the target, and the current timestamp of the DMS replication instance. You can monitor the latency between commits on the data source and your data target using the CDCLatencyTarget CloudWatch metric. This means that CDCLatencyTarget includes any delays in reading from the source. As a result, CDCLatencyTarget is always greater than or equal to CDCLatencySource.

For example, if your application commits an insert to the source at 10:00, and Amazon DMS consumes it at 10:02 and writes it to the target at 10:05, the CDCLatencyTarget metric is 300 seconds.
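
The two worked examples above reduce to timestamp arithmetic. The following Python sketch is illustrative only: the function name `latency_seconds` is hypothetical, the date is arbitrary, and it treats the moment DMS reads or applies the change as the replication instance's current time, which is how the examples above read the metrics.

```python
from datetime import datetime


def latency_seconds(source_commit: datetime, instance_now: datetime) -> float:
    """Seconds elapsed between a commit on the source and the instance clock."""
    return (instance_now - source_commit).total_seconds()


commit = datetime(2024, 1, 1, 10, 0)    # insert committed on the source at 10:00
consumed = datetime(2024, 1, 1, 10, 2)  # DMS reads the change at 10:02
applied = datetime(2024, 1, 1, 10, 5)   # DMS writes it to the target at 10:05

source_latency = latency_seconds(commit, consumed)  # CDCLatencySource example
target_latency = latency_seconds(commit, applied)   # CDCLatencyTarget example

print(source_latency, target_latency)
```

Because the target figure is measured from the same source commit time, it already contains the time spent reading from the source, which is why it can never be smaller than the source figure.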

Common causes of CDC latency

This section contains causes of latency that your replication may experience during CDC.

Endpoint resources

The following factors significantly affect replication performance and latency:

  • Source and target database configurations

  • Instance size

  • Under-provisioned or misconfigured source or target data stores

To identify latency caused by endpoint issues for Amazon-hosted sources and targets, monitor the following CloudWatch metrics:

  • FreeMemory

  • CPUUtilization

  • Throughput and I/O metrics, such as WriteIOPS, WriteThroughput, or ReadLatency

  • Transaction volume metrics, such as CDCIncomingChanges

For information about monitoring CloudWatch metrics, see Amazon Database Migration Service metrics.

Replication instance resources

Replication instance resources are critical for replication, and you should make sure that there are no resource bottlenecks, as they can lead to both source and target latency.

To identify resource bottlenecks for your replication instance, verify the following:

  • Critical CloudWatch metrics such as CPU, memory, I/O per second, and storage are not experiencing spikes or consistently high values.

  • Your replication instance is sized appropriately for your workload. For information about determining the correct size of a replication instance, see Selecting the best size for a replication instance.
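
One way to tell a short spike from a consistently high value is to look for several consecutive datapoints above a threshold. The sketch below is a minimal illustration, not part of Amazon DMS: `summarize_metric` is a hypothetical helper, the sample CPU values are made up, and the 85 percent threshold and window of three datapoints are arbitrary assumptions that you would tune for your workload. In practice, the values would come from CloudWatch.

```python
def summarize_metric(values, threshold, window=3):
    """Classify metric datapoints (in time order) against a threshold.

    spikes:    indexes of individual datapoints above the threshold
    sustained: True if any `window` consecutive datapoints are all above it
    """
    spikes = [i for i, v in enumerate(values) if v > threshold]
    sustained = any(
        all(v > threshold for v in values[i:i + window])
        for i in range(len(values) - window + 1)
    )
    return {"spikes": spikes, "sustained": sustained}


# Hypothetical CPU utilization samples (percent), one per period.
cpu = [35.0, 92.0, 40.0, 88.0, 91.0, 95.0]
report = summarize_metric(cpu, threshold=85.0)
print(report)
```

With these sample values, the single reading of 92 counts only as a spike, while the last three readings form a sustained run that points to a resource bottleneck.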

Network speed and bandwidth

Network bandwidth is a factor that affects data transmission. To analyze the network performance of your replication, do one of the following:

  • Check the ReadThroughput and WriteThroughput metrics at the instance level. For information about monitoring CloudWatch metrics, see Amazon Database Migration Service metrics.

  • Use the Amazon DMS Diagnostic Support AMI. If the Diagnostic Support AMI is not available in your region, you can download it from any supported region and copy it to your region to perform your network analysis. For information about the Diagnostic Support AMI, see Working with the Amazon DMS diagnostic support AMI.

CDC in Amazon DMS is single-threaded to ensure data consistency. As a result, you can determine the data volume your network can support by calculating your single-threaded data transfer rate. For example, if your task connects to its source using a 100 Mbps (megabits per second) network, your replication has a theoretical maximum bandwidth allocation of 12.5 MBps (megabytes per second). This is equal to 45 gigabytes per hour. If the rate of transaction log generation on the source is greater than 45 gigabytes per hour, the task can't read changes as fast as they are generated, and CDC latency grows. For a 100 Mbps network, these rates are theoretical maximums; other factors such as network traffic and resource overhead on the source and target reduce the actual available bandwidth.
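
The bandwidth arithmetic can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, units are decimal (1 GB = 1,000 MB), and the result is a theoretical ceiling that ignores protocol overhead and competing traffic. A 100 Mbps link carries 12.5 MBps, which over one hour is 45 gigabytes (360 gigabits).

```python
def max_gigabytes_per_hour(link_mbps: float) -> float:
    """Theoretical ceiling for a link rated in megabits per second."""
    megabytes_per_second = link_mbps / 8          # 8 bits per byte
    return megabytes_per_second * 3600 / 1000     # seconds per hour, MB to GB


def link_keeps_up(log_gb_per_hour: float, link_mbps: float) -> bool:
    """True if the source's log generation rate fits within the link ceiling."""
    return log_gb_per_hour <= max_gigabytes_per_hour(link_mbps)


ceiling = max_gigabytes_per_hour(100)   # 100 Mbps link
print(ceiling)
print(link_keeps_up(60, 100))           # hypothetical source generating 60 GB of logs per hour
```

If `link_keeps_up` returns `False`, the network alone guarantees growing CDC latency, regardless of how the endpoints and replication instance are sized.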

DMS configuration

This section contains recommended replication configurations that can help reduce latency.

  • Endpoint settings: Your source and target endpoint settings can cause your replication instance to suffer poor performance. Endpoint settings that turn on resource-intensive features will impact performance. For example, for an Oracle endpoint, disabling LogMiner and using Binary Reader improves performance, since LogMiner is resource-intensive. The following endpoint setting improves performance for an Oracle endpoint:

    useLogminerReader=N;useBfile=Y

    For more information about endpoint settings, see the documentation for your source and target endpoint engine in the Working with Amazon DMS endpoints topic.

  • Task settings: Some task settings for your particular replication scenario can cause your replication instance to suffer poor performance. For example, Amazon DMS uses transactional apply mode by default (BatchApplyEnabled=false) for CDC for all endpoints except for Amazon Redshift. However, for sources with a large number of changes, setting BatchApplyEnabled to true may improve performance.

    For more information about task settings, see Specifying task settings for Amazon Database Migration Service tasks.

  • Start position of a CDC-only task: Starting a CDC-only task from a position or timestamp in the past will start the task with increased CDC source latency. Depending on the volume of changes on the source, task latency will take time to subside.

  • LOB settings: Large Object data types can hinder replication performance due to the way Amazon DMS replicates large binary data. For more information, see the following topics:

Replication scenarios

This section describes specific replication scenarios and how they may affect latency.

Stopping a task for an extended period of time

When you stop a task, Amazon DMS saves the position of the last transaction log that was read from the source. When you resume the task, DMS tries to continue reading from the same transaction log position. Resuming a task after several hours or days causes CDC source latency to increase until DMS finishes consuming the transaction backlog.
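
To get a rough sense of how long the backlog takes to clear, you can compare the rate at which DMS reads changes with the rate at which the source keeps generating them. The following sketch is a simplified model, not a DMS feature: `catch_up_hours` is a hypothetical helper, it assumes both rates stay constant, and all of the numbers are invented for illustration.

```python
from typing import Optional


def catch_up_hours(backlog_gb: float,
                   read_gb_per_hour: float,
                   generated_gb_per_hour: float) -> Optional[float]:
    """Hours needed to drain a transaction log backlog, or None if it never drains."""
    net_rate = read_gb_per_hour - generated_gb_per_hour
    if net_rate <= 0:
        return None  # the source generates changes at least as fast as they are read
    return backlog_gb / net_rate


# Hypothetical scenario: task stopped for 6 hours while the source wrote 5 GB of logs per hour.
backlog = 6 * 5.0
hours = catch_up_hours(backlog, read_gb_per_hour=20.0, generated_gb_per_hour=5.0)
print(hours)
```

In this scenario, the task needs about two hours after resuming before source latency returns to its usual level. A result of `None` means latency keeps growing until the read rate improves or the change volume on the source drops.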

Cached changes

Cached changes are changes that your application writes to the data source while Amazon DMS runs the full-load replication phase. DMS doesn't apply these changes until the full-load phase completes and the CDC phase starts. For a source with a large number of transactions, cached changes take longer to apply, so source latency increases when the CDC phase starts. We recommend that you run the full-load phase when transaction volumes are low to minimize the number of cached changes.

Cross-region replication

Locating your DMS endpoints or your replication instance in different Amazon regions increases network latency. This increases replication latency. For best performance, locate your source endpoint, target endpoint, and replication instance in the same Amazon region.