

# Migrating PostgreSQL Databases to Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL


This walkthrough gets you started with homogeneous database migration from PostgreSQL to Amazon Relational Database Service (Amazon RDS) for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition. It provides a quick overview of the data migration process and suggests how to select the option that best fits your requirements.

Customers looking to migrate self-managed PostgreSQL databases to Amazon RDS for PostgreSQL or Aurora PostgreSQL can use one of three main approaches.
+ Use a native or third-party database migration method such as pg_dump and pg_restore for full-load-only migrations.
+ Use a managed service such as Amazon Database Migration Service (Amazon DMS) for full load and ongoing replication.
+ Use a native tool for full load and Amazon DMS for ongoing replication. We call this strategy the *hybrid approach*.

This document describes the hybrid approach. The following diagram shows the components of the hybrid approach.

![\[PostgreSQL Databases to Amazon RDS for PostgreSQL hybrid migration approach\]](http://docs.amazonaws.cn/en_us/dms/latest/sbs/images/postgresql-rds-postgresql-migration-approach.png)


The hybrid approach provides the following advantages.
+ Automation of the creation of secondary database objects such as views, indexes, and constraints.
+  Amazon DMS data validation to ensure that your target data matches with the source, row by row and column by column.
+ Other capabilities provided by Amazon DMS, for example CloudWatch monitoring and table statistics. It may be simpler to use Amazon DMS to track migration progress, transactional workload, receive and transmit throughput, source and target latency, and so on.

This document describes the native options for the full load. It also includes a comparison so that you can evaluate the options against your migration requirements. Finally, it briefly describes how to use Amazon DMS for ongoing replication.

**Topics**
+ [Summary](#_summary)
+ [Full load PostgreSQL database migration](chap-manageddatabases.postgresql-rds-postgresql-full-load.md)
+ [Full load PostgreSQL database migration options performance comparison](chap-manageddatabases.postgresql-rds-postgresql-performance-comparison.md)
+ [Migrate PostgreSQL database with Amazon DMS ongoing replication](chap-manageddatabases.postgresql-rds-postgresql-ongoing-replication.md)

## Summary


This document describes the hybrid approach for migrating between PostgreSQL databases. We analyzed three options for full load and demonstrated the relative performance of each for a test database.

If you need to create secondary database objects, then pg_dump and pg_restore is the most appropriate option. However, this option incurs a performance tradeoff compared to the other options.

pglogical has a slight performance advantage over the publisher and subscriber model. However, you need to install the pglogical extension on your source database server.

You can use these guidelines to choose the option that best matches your migration goal.

For more information about the performance of these tools, see [Performance Comparison](chap-manageddatabases.postgresql-rds-postgresql-performance-comparison.md).

# Full load PostgreSQL database migration


The full load migration phase populates the target database with a copy of the source data. This chapter describes the following native methods to help you choose the one that best matches your migration scenario.
+ pg_dump and pg_restore
+ Publisher and Subscriber
+ pglogical

We recommend that you begin by reviewing the following table to understand the tools suitable for your use case.


| Method | Supported versions | Support of metadata migration | Suitable database sizes | Performance | 
| --- | --- | --- | --- | --- | 
|  pg_dump and pg_restore  |  All versions of PostgreSQL  |  Yes  |  100 GB or less  |  Medium  | 
|  Publisher and Subscriber  |  PostgreSQL 10.0 and higher  |  No  |  Any size  |  High  | 
|  pglogical  |  PostgreSQL 9.4 and higher  |  Yes  |  Any size  |  High  | 

The suitable database sizes provided in the preceding table are Amazon DMS recommendations. These recommendations are based on customer migration experiences and aren’t limitations of the native tools.

**Topics**
+ [Preparing for Ongoing Replication](chap-manageddatabases.postgresql-rds-postgresql-full-load-preparing.md)
+ [PostgreSQL pg_dump and pg_restore utility](chap-manageddatabases.postgresql-rds-postgresql-full-load-pd_dump.md)
+ [PostgreSQL publisher and subscriber model](chap-manageddatabases.postgresql-rds-postgresql-full-load-publisher.md)
+ [PostgreSQL pglogical extension](chap-manageddatabases.postgresql-rds-postgresql-full-load-pglogical.md)

# Preparing for Ongoing Replication


Before you start full load, make sure that you record the current log sequence number (LSN) as the starting position for ongoing replication. Use this LSN when you configure the ongoing replication task in Amazon DMS.

To avoid data loss or duplication with the hybrid approach, make sure of the following:
+ You create a replication slot on the source database before you start the full load using the following command:

  ```
  SELECT * FROM pg_create_logical_replication_slot('test_slot', 'test_decoding');
  ```
+ When you create a replication slot, your source database doesn’t have open transactions. To confirm, use the following command:

  ```
  SELECT * FROM pg_stat_activity where state <> 'idle';
  ```
+ If you have open transactions, wait for them to complete or cancel them.

To capture the LSN, use the following command.

```
SELECT slot_name, confirmed_flush_lsn from pg_replication_slots where slot_name like 'test_slot';

slot_name | confirmed_flush_lsn
test_slot | 12/68000000
```

From the output of the preceding command, copy the `confirmed_flush_lsn` value. In the example preceding, this value is set to `12/68000000`. After you complete the full load, you can use this value as the start position for the Amazon DMS task.
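PostgreSQL writes an LSN as two hexadecimal numbers separated by a slash: the high 32 bits, then the low 32 bits. If you want to compare two positions numerically, for example to confirm that replication has advanced past your captured start point, you can convert the text form into a single integer. The following is a minimal sketch; `lsn_to_int` is a hypothetical helper name, not a PostgreSQL utility.

```shell
# Hypothetical helper: convert an LSN such as 12/68000000 into a single
# 64-bit integer so that two positions can be compared numerically.
lsn_to_int() {
  hi="${1%%/*}"   # hex digits before the slash (high 32 bits)
  lo="${1##*/}"   # hex digits after the slash (low 32 bits)
  echo $(( (16#$hi << 32) | 16#$lo ))
}

lsn_to_int "12/68000000"
```

For example, the value `12/68000000` from the preceding output converts to 79054241792.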

# PostgreSQL pg_dump and pg_restore utility


pg_dump and pg_restore are native PostgreSQL client utilities. You can find them as part of the database installation. Together they produce and replay a set of SQL statements that reproduce the original database object definitions and table data.

pg_dump and pg_restore are suitable if:
+ Your database size is less than 100 GB.
+ You plan to migrate database metadata as well as table data.
+ You have a relatively large number of tables to migrate.

pg_dump and pg_restore may not be suitable if:
+ Your database size is greater than 100 GB.
+ You want to avoid downtime.

## Example


At a high level, you can use the following steps to migrate the [https://github.com/aws-samples/aws-database-migration-samples/tree/master/PostgreSQL/sampledb/v1](https://github.com/aws-samples/aws-database-migration-samples/tree/master/PostgreSQL/sampledb/v1) database.

1. Export data to one or more dump files.

1. Create a target database.

1. Import the dump file or files.

1. (Optional) Migrate database roles and users.
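The four steps above can be sketched as a single script. The following only prints each command for review rather than running it; the hostnames, user, and dump file path are placeholder values that you would replace with your own.

```shell
# Sketch only: print each migration command for review before running it.
# All hostnames, the user, and the dump file path are placeholder values.
SRC_HOST="source.example.com"
TGT_HOST="target.example.com"
DB_USER="postgres"
DUMP_FILE="/tmp/sampledb.dump"

print_plan() {
  echo "pg_dump -h $SRC_HOST -p 5432 -U $DB_USER -Fc -b -v -f $DUMP_FILE -d sampledb"
  echo "psql -h $TGT_HOST -p 5432 -U $DB_USER -c 'create database migrated_database;'"
  echo "pg_restore -v -h $TGT_HOST -U $DB_USER -d migrated_database -j 2 $DUMP_FILE"
  echo "pg_dumpall -U $DB_USER -h $SRC_HOST -f /tmp/globals.sql --no-role-passwords -g"
}

print_plan
```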

## Export Data


You can use the following command to create dump files for your source database.

```
pg_dump -h <hostname> -p 5432 -U <username> -Fc -b -v -f <dumpfilelocation.sql> -d <database_name>

-h: The hostname of the source server from which you are migrating.
-U: The name of a user on the source server.
-Fc: Sets the output to a custom-format archive suitable for input into pg_restore.
-b: Includes large objects in the dump.
-v: Specifies verbose mode.
-f: The dump file path.
-d: The name of the source database.
```

## Create a Database on Your Target Instance


First, log in to your target database server.

```
psql -h <hostname> -p 5432 -U <username> -d <database_name>

-h: The hostname of the target server to which you are migrating.
-U: The name of a user on the target server.
-d: The name of a database that already exists on the target.
```

Then, use the following command to create a database.

```
create database migrated_database;
```

## Import Dump Files


You can use the following command to import the dump file into your Amazon RDS instance.

```
pg_restore -v -h <hostname> -U <username> -d <database_name> -j 2 <dumpfilelocation.sql>

-h: The hostname of the target server to which you are migrating.
-U: The name of a user on the target server.
-d: The name of the database that you created in the previous step.
<dumpfilelocation.sql>: The dump file that you created with pg_dump.
```

## Migrate Database Roles and Users


To export such database objects as roles and users, you can use the `pg_dumpall` utility.

To generate a script for users and roles, run the following command on the source database.

```
pg_dumpall -U <username> -h <hostname> -f <dumpfilelocation.sql> --no-role-passwords -g

-h: The hostname of the source server from which you are migrating.
-U: The name of a user on the source server.
-f: The dump file path.
-g: Dumps only global objects (roles and tablespaces), no databases.
```

To restore users and roles, run the following command on your target database.

```
psql -h <hostname> -U <username> -f <dumpfilelocation.sql>

-h: The hostname of the target server to which you are migrating.
-U: The name of a user on the target server.
-f: The dump file path.
```

pg_dump and pg_restore require some time to complete the export and import operations. This time depends on the following parameters.
+ The size of your source database.
+ The number of jobs.
+ The resources that you provision for the instance used to invoke pg_dump and pg_restore.

# PostgreSQL publisher and subscriber model


In PostgreSQL, logical replication uses a publisher and subscriber model. In this model, one or more subscribers subscribe to one or more publications on a publisher node. Subscribers pull data from the publications they subscribe to and may subsequently re-publish data to allow cascading replication or more complex configurations. PostgreSQL version 10.0 and higher supports the native publisher and subscriber model.

The publisher and subscriber model is suitable if:
+ Your database size is greater than 100 GB.
+ You have an existing schema in your target database and migrate only data.
+ You want to capture ongoing changes.
+ You want to minimize downtime.

The publisher and subscriber model may not be suitable if:
+ You plan to migrate database metadata.
+ Your source PostgreSQL database version is lower than 10.x.
+ You want to replicate the schema, DDL, and sequences.

## Example


The migration process involves copying a snapshot of the publishing database to the subscriber. This step is also called the table synchronization phase. To reduce the time spent in this phase, you can spawn multiple table synchronization workers. However, you can only have one synchronization worker for each table.

The following example shows how to migrate all tables from a public schema.

## Configure the Source Database


To configure the source database with built-in logical replication, complete the following steps.

1. In the source database, edit the `postgresql.conf` file to add the following parameters.

   ```
   wal_level = 'logical'
   max_replication_slots = 10
   max_wal_senders = 10
   ```

   Make sure that you set `wal_level` to `logical`. This adds information necessary to support logical decoding. Also, make sure that you set `max_replication_slots` to at least the number of subscriptions expected to connect, plus some reserve for table synchronization. Finally, make sure that you set `max_wal_senders` to at least the same value as `max_replication_slots` plus the number of physical replicas that are connected at the same time.

1. Restart the source PostgreSQL instance for these parameters to take effect.

1. To make sure that you configured parameters on your source database correctly, run the following command.

   ```
   psql -h <hostname> -p 5432 -U <username> -d <database_name> -c "select name, setting from pg_settings where name in ('wal_level','max_worker_processes','max_replication_slots','max_wal_senders','shared_preload_libraries');"
   
   -h: The hostname of the source server.
   -U: The name of a user on the source server.
   -d: The name of the source database.
   ```
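The sizing guidance in step 1 can be expressed as simple arithmetic. The counts below are hypothetical example values; substitute your own subscription and replica counts.

```shell
# Hypothetical example counts; substitute your own environment's numbers.
SUBSCRIPTIONS=2        # subscriptions expected to connect
SYNC_RESERVE=4         # reserve for table synchronization workers
PHYSICAL_REPLICAS=2    # physical replicas connected at the same time

# max_replication_slots: subscriptions plus reserve for table sync.
MAX_REPLICATION_SLOTS=$(( SUBSCRIPTIONS + SYNC_RESERVE ))
# max_wal_senders: at least max_replication_slots plus physical replicas.
MAX_WAL_SENDERS=$(( MAX_REPLICATION_SLOTS + PHYSICAL_REPLICAS ))

echo "max_replication_slots = $MAX_REPLICATION_SLOTS"
echo "max_wal_senders = $MAX_WAL_SENDERS"
```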

## Set Up the Logical Replication


Now, you can configure built-in logical replication between your self-managed PostgreSQL databases and Amazon RDS for PostgreSQL or Aurora PostgreSQL.

To create the publication on the source database server, run the following command.

```
CREATE PUBLICATION my_publication FOR ALL TABLES;
```

You can specify only the tables that you want to publish. You can also limit the changes that will be published.

To replicate `DELETE` and `UPDATE` operations, make sure that the published table has a replica identity, such as a primary key. This makes it possible for the subscriber to identify the modified rows. You can replicate `INSERT` operations without a replica identity. After you create the publications, you can create subscriptions on the subscriber node.
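Before you create the publication, you may want to list public-schema tables that still use the default replica identity but have no primary key, because `UPDATE` and `DELETE` changes to such tables can't be replicated. The sketch below only prints a psql command and a catalog query for review; the query uses the standard `pg_class`, `pg_namespace`, and `pg_index` catalogs, and the connection placeholders are yours to fill in.

```shell
# Sketch: print a command that lists public-schema tables with the default
# replica identity ('d') but no primary key. Run the printed command against
# your source database; <hostname>, <username>, <database_name> are placeholders.
CHECK_SQL="SELECT c.relname
FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public' AND c.relkind = 'r' AND c.relreplident = 'd'
AND NOT EXISTS (SELECT 1 FROM pg_index i WHERE i.indrelid = c.oid AND i.indisprimary);"

echo "psql -h <hostname> -p 5432 -U <username> -d <database_name> -c \"$CHECK_SQL\""
```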

Before you create a subscriber node, make sure that you created the target Amazon RDS for PostgreSQL or Aurora PostgreSQL database. For more information, see [Creating a PostgreSQL DB instance and connecting to a database on a PostgreSQL DB instance](https://docs.amazonaws.cn/AmazonRDS/latest/UserGuide/CHAP_GettingStarted.CreatingConnecting.PostgreSQL.html).

To create a database on Amazon RDS for PostgreSQL or Aurora PostgreSQL, run the following command.

```
psql -h <hostname> -p 5432 -U <username> -d <database_name> -c "create database migrated_database;"

-h: The hostname of the target server.
-U: The name of a user on the target server.
-d: The name of a database that already exists on the target.
```

To create the subscription on your target database, run the following command.

```
CREATE SUBSCRIPTION <subscription_name>
CONNECTION 'host=<host> port=<port_number> dbname=<database_name> user=<username> password=<password>' PUBLICATION <publication_name> WITH (copy_data=true);

-subscription_name: Provide the name of the subscription to create on the target.
-host: Provide the hostname of the source database.
-port_number: Provide the port on which the source database is running.
-database_name: Provide the name of the database where the publication is created.
-publication_name: Provide the name of the publication created at source.
-copy_data: Specifies whether the existing data in the publications being subscribed to should be copied when replication starts. The default is true.
```

## Verify that the Data Replication Is Running


Make sure that no active transactions or data changes are happening on your source database. Then, check the status of your replication by running the following statement on your source database. Make sure that the WAL locations are the same for the `sent_location`, `write_location`, and `replay_location`. This indicates that the target database is at the same LSN position as the source database.

```
SELECT * FROM pg_stat_replication;
```

## Stop the Replication


When the data is in sync between your source and target databases, stop the subscriber on your target database.

```
ALTER SUBSCRIPTION <subscription_name> DISABLE;

-subscription_name: Provide the name of the subscription created at target.
```

Capture the slot name created by the publisher and subscriber on the target database.

```
select subslotname from pg_subscription where subname like '<subscription_name>';

-subscription_name: Provide the name of the subscription created at target.
```

Capture the `confirmed_flush_lsn` value for the replication slot that you fetched in the previous step. You can use this value as the start position for the Amazon DMS task.

```
SELECT slot_name, confirmed_flush_lsn from pg_replication_slots where slot_name like 'replication_slot_name';
```

## Drop Publication and Subscription Artifacts


To drop the subscription on your target database, run the following command.

```
DROP SUBSCRIPTION <subscription_name>;

-subscription_name: Provide the name of the subscription created at target.
```

To drop the publication on your source database, run the following command.

```
DROP PUBLICATION <publication_name>;

-publication_name: Provide the name of the publication created at source.
```

# PostgreSQL pglogical extension


The pglogical extension for PostgreSQL implements logical streaming replication, using a publish and subscribe model similar to the built-in approach.

The pglogical extension is suitable if:
+ Your database size is greater than 100 GB.
+ You want to replicate the schema, DDL, sequences, and table data.
+ You want to capture ongoing changes.
+ You want to avoid downtime.

The pglogical extension may not be suitable if:
+ You have `UNLOGGED` and `TEMPORARY` tables.
+ You plan to migrate database metadata.

## Example


The following example shows how to migrate the public schema.

## Configure the Source Database


To configure the source database with logical replication, complete the following steps.

1. In the source database, edit the `postgresql.conf` file to add the following parameters.

   ```
   wal_level = 'logical'
   max_worker_processes = 10
   max_replication_slots = 10
   max_wal_senders = 10
   shared_preload_libraries = 'pglogical'
   ```

1. Restart the source PostgreSQL instance for these parameters to take effect.

1. To make sure that you configured parameters on your source database correctly, run the following command.

   ```
   psql -h <hostname> -p 5432 -U <username> -d <database_name> -c "select name, setting from pg_settings where name in ('wal_level','max_worker_processes','max_replication_slots','max_wal_senders','shared_preload_libraries');"
   
   -h: The hostname of the source server.
   -U: The name of a user on the source server.
   -d: The name of the source database.
   ```

## Configure the Target Database


By default, Amazon RDS for PostgreSQL and Aurora PostgreSQL have the `pglogical` extension. To configure the target DB parameter group, complete the following steps.

1. To turn on the logical replication in the target database, set the following parameters in the database parameter group. For more information, see [Working with parameter groups](https://docs.amazonaws.cn/AmazonRDS/latest/UserGuide/USER_WorkingWithParamGroups.html).

   ```
   rds.logical_replication=1
   shared_preload_libraries = 'pglogical'
   ```

1. Reboot your Amazon RDS instance for these parameters to take effect.

1. To make sure that you configured parameters on your target database correctly, run the following command.

   ```
   psql -h <hostname> -p 5432 -U <username> -d <database_name> -c "select name, setting from pg_settings where name in ('rds.logical_replication','shared_preload_libraries');"
   
   -h: The hostname of the target server.
   -U: The name of a user on the target server.
   -d: The name of the target database.
   ```

## Set Up the Logical Replication


Now, you can configure the logical replication between your self-managed PostgreSQL databases and Amazon RDS for PostgreSQL or Aurora PostgreSQL.

1. Download the pglogical RPM and install it on your source database server.
   + For PostgreSQL 9.6, run the following command.

     ```
     curl https://access.2ndquadrant.com/api/repository/dl/default/release/9.6/rpm | bash
     yum install postgresql96-pglogical
     ```
   + For PostgreSQL 10, run the following command.

     ```
     curl https://access.2ndquadrant.com/api/repository/dl/default/release/10/rpm | bash
     yum install postgresql10-pglogical
     ```

1. Create the `pglogical` extension on your provider and subscriber.

   ```
   CREATE EXTENSION pglogical;
   ```

1. Create the publisher node on your source database.

   ```
   SELECT pglogical.create_node(
       node_name := 'publisher_name',
    dsn := 'host=<publisher_hostname> port=<port_number> dbname=<database_name>'
   );
   
   -publisher_name: Provide the name of the publication created at source.
   -publisher_hostname: Provide the hostname of the source database.
   -port_number: Provide the port on which the source database is running.
   -database_name: Provide the name of the database where the publication is created.
   ```

1. Add tables in public schema to the default replication set.

   ```
   SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);
   ```

1. Create the subscriber node on target database.

   ```
   SELECT pglogical.create_node(
       node_name := 'subscriber_name',
    dsn := 'host=<subscriber_hostname> port=<port_number> dbname=<database_name>'
   );
   
   -subscriber_hostname: Provide the hostname of the target database.
   -port_number: Provide the port on which the target database is running.
   -database_name: Provide the name of the database where the subscription is created.
   -subscriber_name: Provide the name of the subscription created at target.
   ```

1. Create the subscription on the subscriber node. This subscription starts the synchronization and replication processes in the background.

   ```
   SELECT pglogical.create_subscription(
       subscription_name := 'subscription_name',
    provider_dsn := 'host=<publisher_hostname> port=<port_number> dbname=<database_name>'
   );
   
   SELECT pglogical.wait_for_subscription_sync_complete('subscription_name');
   
   -publisher_hostname: Provide the hostname of the source database.
   -port_number: Provide the port on which the source database is running.
   -database_name: Provide the name of the database where the subscription is created.
   -subscription_name: Provide the name of the subscription created at target.
   ```
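After creating the subscription, you can check its state with the `pglogical.show_subscription_status` function on the target. The sketch below only prints the command to run; the connection placeholders are the same as in the previous steps.

```shell
# Sketch: print a command that reports the pglogical subscription state.
# A healthy subscription reports the status 'replicating'.
STATUS_SQL="SELECT subscription_name, status FROM pglogical.show_subscription_status();"

echo "psql -h <hostname> -p 5432 -U <username> -d <database_name> -c \"$STATUS_SQL\""
```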

## Verify that the Data Replication Is Running


Make sure that no active transactions or data changes are happening on your source database. Then, check the status of your replication by running the following statement on your source database. Make sure that the WAL locations are the same for the `sent_location`, `write_location`, and `replay_location`. This indicates that the target database is at the same LSN position as the source database.

```
SELECT * FROM pg_stat_replication;
```

## Stop the Replication


When the data is in sync between your source and target databases, stop the subscriber on your target database.

```
select pglogical.alter_subscription_disable('subscriber_name');

-subscriber_name: Provide the name of the subscription created at target.
```

Capture the `confirmed_flush_lsn` value from the replication slot created by the `pglogical` setup. You can use this value as the start position for the Amazon DMS task.

```
SELECT slot_name, confirmed_flush_lsn from pg_replication_slots where slot_name like 'replication_slot_name';
```

## Drop the Subscription


To drop the subscription on your target database, run the following command.

```
select pglogical.drop_subscription('subscriber_name');

-subscriber_name: Provide the name of the subscription created at target.
```

# Full load PostgreSQL database migration options performance comparison


We analyzed the performance of pg_dump and pg_restore, publisher and subscriber, and pglogical in a full load migration. We migrated a 70 GB database that includes six tables and LOB data. We lifted and shifted this database from the source to the target.

The following image represents the performance comparison of the three migration methods. We expect similar performance trends for larger datasets.

![\[Full load performance comparison\]](http://docs.amazonaws.cn/en_us/dms/latest/sbs/images/postgresql-rds-postgresql-performance.png)


We performed this test to provide a basic overview of the full load performance. This performance may vary because it depends on such factors as network bandwidth, data structure, data size, and so on.

The elapsed time shown in the diagram is the actual migration time. It doesn’t include the time spent on implementing prerequisites.

You can compare the results of all three methods.
+ 35 minutes is the total elapsed time for pglogical.
+ 37 minutes is the total elapsed time for publisher and subscriber.
+ 46 minutes is the total elapsed time for pg_dump and pg_restore. This time includes:
  + 19 minutes to unload data using pg_dump.
  + 27 minutes to load data using pg_restore.

From the comparison, you can see that pglogical has the best performance among the three full load options. Consider this approach if you don’t need to migrate secondary database objects such as views, stored procedures, triggers, and so on. This is the preferred approach where the database size is greater than 100 GB and when you don’t have transformation or filtering requirements.

Publisher and subscriber may be appropriate if you don’t need to migrate secondary database objects such as views, stored procedures, triggers, and so on. You can use publisher and subscriber for smaller migrations where ease of use considerations override the minor performance gains provided by pglogical.

Using pg_dump and pg_restore is slower than both pglogical and publisher and subscriber. However, pg_dump and pg_restore is the only option that migrates your secondary database objects. Additionally, data files created by pg_dump may be orders of magnitude larger than the original table size.

# Migrate PostgreSQL database with Amazon DMS ongoing replication


After you complete the full load, make sure that you perform ongoing replication using Amazon DMS to keep the source and target databases in sync. To configure the ongoing replication task, open the Database Migration Service console. On the **Create database migration task** page, follow these steps to create a migration task.

1. For **Migration type**, choose **Replicate ongoing changes**.

1. Under **CDC start mode for source transactions**, choose **Enable custom CDC start mode**.

1. Under **Custom CDC start point**, paste the native start point you captured when you prepared for ongoing replication. For more information, see [Preparing for Ongoing Replication](chap-manageddatabases.postgresql-rds-postgresql-full-load-preparing.md).
**Note**  
PostgreSQL as a source doesn’t support a custom CDC start time. This is because the PostgreSQL database engine doesn’t have a way to map a timestamp to an LSN or SCN as Oracle and SQL Server do.
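If you prefer the AWS CLI to the console, the same task can be created with `aws dms create-replication-task`, passing the captured LSN as the CDC start position. The sketch below only prints the command; the ARNs and the table-mappings file name are placeholders, and the LSN shown is the example value from the preparation step.

```shell
# Sketch: create the ongoing replication task from the CLI. Replace the ARN
# placeholders and table-mappings file with your own resources, and set
# CDC_START_LSN to the confirmed_flush_lsn value that you captured earlier.
CDC_START_LSN="12/68000000"

echo aws dms create-replication-task \
  --replication-task-identifier pg-ongoing-replication \
  --source-endpoint-arn "<source_endpoint_arn>" \
  --target-endpoint-arn "<target_endpoint_arn>" \
  --replication-instance-arn "<replication_instance_arn>" \
  --migration-type cdc \
  --table-mappings file://table-mappings.json \
  --cdc-start-position "$CDC_START_LSN"
```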

For more information, see [Creating tasks for ongoing replication](https://docs.amazonaws.cn/dms/latest/userguide/CHAP_Task.CDC.html) and [Migrate from PostgreSQL to Amazon RDS](https://www.amazonaws.cn/getting-started/hands-on/move-to-managed/migrate-postgresql-to-amazon-rds/).