PostgreSQL Endpoint Troubleshooting
This section contains replication scenarios specific to PostgreSQL.
Long-running transaction on source
When there are long-running transactions in the source database, such as a few thousand inserts in a single transaction, the DMS CDC event and transaction counters do not increase until the transaction is complete. This delay can cause latency issues that you can measure using the CDCLatencyTarget metric.
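For example, the CDC counters stay flat for the full duration of a transaction like the following sketch, which inserts many rows in a single transaction. The table name and row count are hypothetical:

-- Hypothetical example: a single large transaction.
-- DMS does not see any of these changes until the COMMIT completes.
BEGIN;
INSERT INTO public.orders (order_id, amount)
SELECT g, g * 1.5
FROM generate_series(1, 100000) AS g;
COMMIT;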
To review long-running transactions, do one of the following:
- Use the pg_replication_slots view. If the restart_lsn value isn't updating, it is likely that PostgreSQL is unable to release Write Ahead Logs (WALs) due to long-running active transactions. For information about the pg_replication_slots view, see pg_replication_slots in the PostgreSQL 15.4 Documentation. For a quick way to check this, see the sketch after this list.
- Use the following query to return a list of all active queries in the database, along with related information:

  SELECT pid, age(clock_timestamp(), query_start), usename, query
  FROM pg_stat_activity
  WHERE query != '<IDLE>'
  AND query NOT ILIKE '%pg_stat_activity%'
  ORDER BY query_start desc;

  In the query results, the age field shows the active duration of each query, which you can use to identify long-running queries.
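The following sketch reports how much WAL each replication slot is holding back by comparing its restart_lsn to the current WAL write position. A lag value that keeps growing while restart_lsn stays unchanged usually points to a long-running transaction preventing WAL release. It assumes a user with permission to read pg_replication_slots and call pg_current_wal_lsn:

-- Show each replication slot and how much WAL it is holding back.
SELECT slot_name,
       active,
       restart_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS restart_lag
FROM pg_replication_slots;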
High workload on source
If your source PostgreSQL has a high workload, check the following to reduce latency:
- You may experience high latency when using the test_decoding plugin while migrating a subset of tables from a source database with a high transactions per second (TPS) value. This is because the test_decoding plugin sends all database changes to the replication instance, which DMS then filters based on the task's table mapping. Events for tables that aren't part of the task's table mapping can increase source latency. To see which tables generate the most changes, see the write-activity query after this list.
- Check TPS throughput using one of the following methods. For Aurora PostgreSQL sources, use the CommitThroughput CloudWatch metric. For PostgreSQL running on Amazon RDS or on-premises, run the following query using a psql client version 11 or higher (press Enter during the query to advance the results):

  SELECT SUM(xact_commit)::numeric as temp_num_tx_ini FROM pg_stat_database; \gset
  select pg_sleep(60);
  SELECT SUM(xact_commit)::numeric as temp_num_tx_final FROM pg_stat_database; \gset
  select (:temp_num_tx_final - :temp_num_tx_ini)/ 60.0 as "Transactions Per Second";
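To compare actual write activity against the task's table mapping, the following query ranks tables by cumulative writes recorded in pg_stat_user_tables. This is a rough sketch; the counters are cumulative since the last statistics reset, so interpret them over a known interval:

-- Tables ranked by cumulative write activity (inserts + updates + deletes).
SELECT schemaname, relname,
       n_tup_ins + n_tup_upd + n_tup_del AS total_writes
FROM pg_stat_user_tables
ORDER BY total_writes DESC
LIMIT 20;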
To reduce latency when using the test_decoding plugin, consider using the pglogical plugin instead. Unlike the test_decoding plugin, the pglogical plugin filters write ahead log (WAL) changes at the source, and only sends relevant changes to the replication instance. For information about using the pglogical plugin with Amazon DMS, see Configuring the pglogical plugin.
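To confirm which decoding plugin your existing replication slots use, you can inspect the plugin column of pg_replication_slots, as in this minimal sketch (slot names depend on your DMS task):

-- List logical replication slots and their decoding plugin (for example, test_decoding or pglogical).
SELECT slot_name, plugin, slot_type, active
FROM pg_replication_slots
WHERE slot_type = 'logical';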
High network throughput
Your replication may have high network bandwidth use when using the test_decoding plugin, especially during high-volume transactions. This is because the test_decoding plugin processes changes and converts them into a human-readable format that is larger than the original binary format.
To improve performance, consider using the pglogical plugin instead. Unlike the test_decoding plugin, pglogical is a binary plugin that generates binary format output, resulting in a more compact write ahead log (WAL) change stream.
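If you want to see how verbose the test_decoding output is for your own workload, you can peek at a slot's pending changes without consuming them. This is a sketch; the slot name is hypothetical, the slot must use the test_decoding plugin, and the call requires replication or superuser-level privileges:

-- Peek at up to 10 pending changes on a test_decoding slot (peeking does not consume them).
-- The data column holds the human-readable text that DMS must transfer and filter.
SELECT lsn, xid, octet_length(data) AS change_size_bytes, data
FROM pg_logical_slot_peek_changes('dms_test_decoding_slot', NULL, 10);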