PostgreSQL Endpoint Troubleshooting - Amazon Database Migration Service
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

PostgreSQL Endpoint Troubleshooting

This section contains replication scenarios specific to PostgreSQL.

Long-running transaction on source

When there are long-running transactions in the source database, such as a few thousand inserts in a single transaction, the DMS CDC event and transaction counters do not increase until the transaction is complete. This delay can cause latency issues that you can measure using the CDCLatencyTarget metric.

To review long-running transactions, do one of the following:

  • Use the pg_replication_slots view. If the restart_lsn value isn't updating, it is likely that PostgreSQL is unable to release Write Ahead Logs (WALs) due to long-running active transactions. For information about the pg_replication_slots view, see pg_replication_slots in the PostgreSQL 15.4 Documentation.

  • Use the following query to return a list of all active queries in the database, along with related information:

    SELECT pid, age(clock_timestamp(), query_start), usename, query FROM pg_stat_activity WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%' ORDER BY query_start desc;

    In the query results, the age field shows the active duration of each query, which you can use to identify long-running queries.

High workload on source

If your source PostgreSQL has a high workload, check the following to reduce latency:

  • You may experience high latency when using the test_decoding plugin while migrating a subset of tables from the source database with a high transactions per second (TPS) value. This is because the test_decoding plugin sends all database changes to the replication instance which DMS then filters, based on the task’s table mapping. Events for tables that aren’t part of the task’s table mapping can increase source latency.

  • Check TPS throughput using one of the following methods.

    • For Aurora PostgreSQL sources, use the CommitThroughput CloudWatch metric.

    • For PostgreSQL running on Amazon RDS or on-premises, use the following query using a PSQL client version 11 or higher (Press enter during the query to advance the results):

      SELECT SUM(xact_commit)::numeric as temp_num_tx_ini FROM pg_stat_database; \gset select pg_sleep(60); SELECT SUM(xact_commit)::numeric as temp_num_tx_final FROM pg_stat_database; \gset select (:temp_num_tx_final - :temp_num_tx_ini)/ 60.0 as "Transactions Per Second";
  • To reduce latency when using the test_decoding plugin, consider using the pglogical plugin instead. Unlike the test_decoding plugin, the pglogical plugin filters write ahead log (WAL) changes at the source, and only sends relevant changes to the replication instance. For information about using the pglogical plugin with Amazon DMS, see Configuring the pglogical plugin.

High network throughput

Your replication may have high network bandwidth use when using the test_decoding plugin, especially during high-volume transactions. This is because the test_decoding plugin processes changes, and converts them into a human-readable format that is larger than the original binary format.

To improve performance, consider using the pglogical plugin instead, which is a binary plugin. Unlike the test_decoding plugin, the pglogical plugin generates binary format output, resulting in compressed write ahead log (WAL) stream changes.