

# Creating source data providers for homogeneous data migrations in Amazon DMS
Creating source data providers

Amazon DMS Schema Conversion supports the following databases as source data providers for [Homogeneous data migrations](data-migrations.md) projects: Microsoft SQL Server, Oracle, PostgreSQL, MySQL, IBM DB2 for z/OS, and SAP ASE (Sybase ASE). 

For supported database versions, see [Source data providers for DMS homogeneous data migrations](CHAP_Introduction.Sources.md#CHAP_Introduction.Sources.HomogeneousDataMigrations).

Your source data provider can be an on-premises, Amazon EC2, or Amazon RDS database.

**Topics**
+ [

# Using a MySQL compatible database as a source for homogeneous data migrations in Amazon DMS
](dm-data-providers-source-mysql.md)
+ [

# Using a PostgreSQL database as a source for homogeneous data migrations in Amazon DMS
](dm-data-providers-source-postgresql.md)
+ [

# Using a MongoDB compatible database as a source for homogeneous data migrations in Amazon DMS
](dm-data-providers-source-mongodb.md)

# Using a MySQL compatible database as a source for homogeneous data migrations in Amazon DMS
Using MySQL or MariaDB as a source

You can use a MySQL-compatible database (MySQL or MariaDB) as a source for [Homogeneous data migrations](data-migrations.md) in Amazon DMS. In this case, your source data provider can be an on-premises, Amazon EC2, or RDS for MySQL or MariaDB database.

To run homogeneous data migrations, you must use a database user with the `SELECT` privileges for the all source tables and secondary objects for replication. For change data capture (CDC) tasks, this user must also have the `REPLICATION CLIENT` (`BINLOG MONITOR` for MariaDB versions later than 10.5.2) and `REPLICATION SLAVE` privileges. For a full load data migration, you don't need these two privileges.

Use the following script to create a database user with the required permissions in your MySQL database. Run the `GRANT` queries for all databases that you migrate to Amazon.

```
CREATE USER 'your_user'@'%' IDENTIFIED BY 'your_password';

GRANT REPLICATION SLAVE, REPLICATION CLIENT  ON *.* TO 'your_user'@'%';
GRANT SELECT, RELOAD, LOCK TABLES, SHOW VIEW, EVENT, TRIGGER ON *.* TO 'your_user'@'%';

GRANT BACKUP_ADMIN ON *.* TO 'your_user'@'%';
```

In the preceding example, replace each *user input placeholder* with your own information. If your source MySQL database version is lower than 8.0, then you can skip the `GRANT BACKUP_ADMIN` command.

Use the following script to create a database user with the required permissions in your MariaDB database. Run the GRANT queries for all databases that you migrate to Amazon.

```
CREATE USER 'your_user'@'%' IDENTIFIED BY 'your_password';
GRANT SELECT, RELOAD, LOCK TABLES, REPLICATION SLAVE, BINLOG MONITOR, SHOW VIEW ON  *.* TO 'your_user'@'%';
```

In the preceding example, replace each *user input placeholder* with your own information. 

The following sections describe specific configuration prerequisites for self-managed and Amazon-managed MySQL databases.

**Topics**
+ [

## Using a self-managed MySQL compatible database as a source for homogeneous data migrations
](#dm-data-providers-source-mysql-sm)
+ [

## Using an Amazon-managed MySQL compatible database as a source for homogeneous data migrations in Amazon DMS
](#dm-data-providers-source-mysql-aws)
+ [

## Limitations for using a MySQL compatible database as a source for homogeneous data migrations
](#dm-data-providers-source-mysql-limitations)

## Using a self-managed MySQL compatible database as a source for homogeneous data migrations
Using self-managed MySQL as a source

This section describes how to configure your MySQL compatible databases that are hosted on-premises or on Amazon EC2 instances.

Check the version of your source MySQL or MariaDB database. Make sure that Amazon DMS supports your source MySQL or MariaDB database version as described in [Sources for DMS homogeneous data migrations](CHAP_Introduction.Sources.md#CHAP_Introduction.Sources.HomogeneousDataMigrations).

To use CDC, make sure to enable binary logging. To enable binary logging, configure the following parameters in the `my.ini` (Windows) or `my.cnf` (UNIX) file of your MySQL or MariaDB database.


| Parameter | Value | 
| --- | --- | 
| `server-id` | Set this parameter to a value of 1 or greater. | 
| `log-bin` | Set the path to the binary log file, such as `log-bin=E:\MySql_Logs\BinLog`. Don't include the file extension. | 
| `binlog_format` | Set this parameter to `ROW`. We recommend this setting during replication because in certain cases when `binlog_format` is set to `STATEMENT`, it can cause inconsistency when replicating data to the target. The database engine also writes similar inconsistent data to the target when `binlog_format` is set to `MIXED`, because the database engine automatically switches to `STATEMENT`-based logging. | 
| `expire_logs_days` | Set this parameter to a value of 1 or greater. To prevent overuse of disk space, we recommend that you don't use the default value of 0. | 
| `binlog_checksum` | Set this parameter to `NONE`. | 
| `binlog_row_image` | Set this parameter to `FULL`. | 
| `log_slave_updates` | Set this parameter to `TRUE` if you are using a MySQL or MariaDB replica as a source. | 

## Using an Amazon-managed MySQL compatible database as a source for homogeneous data migrations in Amazon DMS
Using Amazon-managed MySQL or MariaDB as a source

This section describes how to configure your Amazon RDS for MySQL and Amazon RDS for MariaDB database instances.

When you use an Amazon-managed MySQL or MariaDB database as a source for homogeneous data migrations in Amazon DMS, make sure that you have the following prerequisites for CDC:
+ To enable binary logs for RDS for MySQL and MariaDB, enable automatic backups at the instance level. To enable binary logs for an Aurora MySQL cluster, change the variable `binlog_format` in the parameter group. You don't need to enable automatic backups for an Aurora MySQL cluster.

  Next, set the `binlog_format` parameter to `ROW`.

  For more information about setting up automatic backups, see [Enabling automated backups](https://docs.amazonaws.cn/AmazonRDS/latest/UserGuide/USER_WorkingWithAutomatedBackups.html#USER_WorkingWithAutomatedBackups.Enabling) in the *Amazon RDS User Guide*.

  For more information about setting up binary logging for an Amazon RDS for MySQL or MariaDB database, see [ Setting the binary logging format](https://docs.amazonaws.cn/AmazonRDS/latest/UserGuide/USER_LogAccess.MySQL.BinaryFormat.html) in the *Amazon RDS User Guide*. 

  For more information about setting up binary logging for an Aurora MySQL cluster, see [ How do I turn on binary logging for my Amazon Aurora MySQL cluster?](https://aws.amazon.com/premiumsupport/knowledge-center/enable-binary-logging-aurora/). 
+ Ensure that the binary logs are available to Amazon DMS. Because Amazon-managed MySQL and MariaDB databases purge the binary logs as soon as possible, you should increase the length of time that the logs remain available. For example, to increase log retention to 24 hours, run the following command. 

  ```
  call mysql.rds_set_configuration('binlog retention hours', 24);
  ```
+ Set the `binlog_row_image` parameter to `Full`. 
+ Set the `binlog_checksum` parameter to `NONE`.
+ If you are using an Amazon RDS MySQL or MariaDB replica as a source, enable backups on the read replica, and ensure the `log_slave_updates` parameter is set to `TRUE`.

## Limitations for using a MySQL compatible database as a source for homogeneous data migrations
Using MySQL as a source: Limitations

The following limitations apply when using a MySQL compatible database as a source for homogeneous data migrations:
+ MariaDB objects such as sequences are not supported in homogeneous migration tasks.
+ Migration from MariaDB to Amazon RDS MySQL/Aurora MySQL might fail due to incompatible object differences.
+ The username you use to connect to your data source has the following limitations:
  + Can be 2 to 64 characters in length.
  + Can't have spaces.
  + Can include the following characters: a-z, A-Z, 0-9, underscore (\$1).
  + Must start with a-z or A-Z.
+ The password you use to connect to your data source has the following limitations:
  + Can be 1 to 128 characters in length.
  + Can't contain any of the following: single quote ('), double quote ("), semicolon (;) or space.
+ Amazon DMS homogeneous data migrations creates unencrypted MySQL and MariaDB objects on the target Amazon RDS instances even if the source objects were encrypted. RDS for MySQL doesn't support the MySQL keyring\$1aws Amazon Keyring Plugin required for encrypted objects. Refer to the [MySQL Keyring Plugin not supported documentation](https://docs.amazonaws.cn/AmazonRDS/latest/UserGuide/MySQL.KnownIssuesAndLimitations.html#MySQL.Concepts.Limits.KeyRing) in the Amazon RDS User Guide
+ Amazon DMS does not use Global Transaction Identifiers (GTIDs) for for data replication even if the source data contains them.

# Using a PostgreSQL database as a source for homogeneous data migrations in Amazon DMS
Using PostgreSQL as a source

You can use a PostgreSQL database as a source for [Homogeneous data migrations](data-migrations.md) in Amazon DMS. In this case, your source data provider can be an on-premises, Amazon EC2, or RDS for PostgreSQL database.

To run homogeneous data migrations, grant superuser permissions for the database user that you specified in Amazon DMS for your PostgreSQL source database. The database user needs superuser permissions to access replication-specific functions in the source. For a full load data migration, your database user needs `SELECT` permissions on tables to migrate them.

Use the following script to create a database user with the required permissions in your PostgreSQL source database. Run the `GRANT` query for all databases that you migrate to Amazon.

```
CREATE USER your_user WITH LOGIN PASSWORD 'your_password';
ALTER USER your_user WITH SUPERUSER;
GRANT SELECT ON ALL TABLES IN SCHEMA schema_name TO your_user;
```

In the preceding example, replace each *user input placeholder* with your own information.

Amazon DMS supports CDC for PostgreSQL tables with primary keys. If a table doesn't have a primary key, the write-ahead logs (WAL) don't include a before image of the database row. Here, you can use additional configuration settings and use table replica identity as a workaround. However, this approach can generate extra logs. We recommend that you use table replica identity as a workaround only after careful testing. For more information, see [Additional configuration settings when using a PostgreSQL database as a DMS source](CHAP_Source.PostgreSQL.md#CHAP_Source.PostgreSQL.Advanced).

The following sections describe specific configuration prerequisites for self-managed and Amazon-managed PostgreSQL databases.

**Topics**
+ [

## Using a self-managed PostgreSQL database as a source for homogeneous data migrations in Amazon DMS
](#dm-data-providers-source-postgresql-sm)
+ [

## Using an Amazon-managed PostgreSQL database as a source for homogeneous data migrations in Amazon DMS
](#dm-data-providers-source-postgresql-aws)
+ [

## Limitations for using a PostgreSQL compatible database as a source for homogeneous data migrations
](#dm-data-providers-source-postgresql-limitations)

## Using a self-managed PostgreSQL database as a source for homogeneous data migrations in Amazon DMS
Using self-managed PostgreSQL as a source

This section describes how to configure your PostgreSQL databases that are hosted on-premises or on Amazon EC2 instances.

Check the version of your source PostgreSQL database. Make sure that Amazon DMS supports your source PostgreSQL database version as described in [Sources for DMS homogeneous data migrations](CHAP_Introduction.Sources.md#CHAP_Introduction.Sources.HomogeneousDataMigrations).

Homogeneous data migrations support change data capture (CDC) using logical replication. To turn on logical replication on a self-managed PostgreSQL source database, set the following parameters and values in the `postgresql.conf` configuration file:
+ Set `wal_level` to `logical`.
+ Set `max_replication_slots` to a value greater than 1.

  Set the `max_replication_slots` value according to the number of tasks that you want to run. For example, to run five tasks you set a minimum of five slots. Slots open automatically as soon as a task starts and remain open even when the task is no longer running. Make sure to manually delete open slots.
+ Set `max_wal_senders` to a value greater than 1.

  The `max_wal_senders` parameter sets the number of concurrent tasks that can run.
+ The `wal_sender_timeout` parameter ends replication connections that are inactive longer than the specified number of milliseconds. The default is 60000 milliseconds (60 seconds). Setting the value to 0 (zero) disables the timeout mechanism, and is a valid setting for DMS.

Some parameters are static, and you can only set them at server start. Any changes to their entries in the configuration file are ignored until the server is restarted. For more information, see the [PostgreSQL documentation](https://www.postgresql.org/docs/current/intro-whatis.html).

## Using an Amazon-managed PostgreSQL database as a source for homogeneous data migrations in Amazon DMS
Using Amazon-managed PostgreSQL as a source

This section describes how to configure your Amazon RDS for PostgreSQL database instances.

Use the Amazon master user account for the PostgreSQL DB instance as the user account for the PostgreSQL source data provider for homogeneous data migrations in Amazon DMS. The master user account has the required roles that allow it to set up CDC. If you use an account other than the master user account, then the account must have the `rds_superuser` role and the `rds_replication` role. The `rds_replication` role grants permissions to manage logical slots and to stream data using logical slots.

Use the following code example grant the `rds_superuser` and `rds_replication` roles.

```
GRANT rds_superuser to your_user;
GRANT rds_replication to your_user;
```

In the preceding example, replace *your\$1user* with the name of your database user.

To turn on logical replication, set the `rds.logical_replication` parameter in your DB parameter group to 1. This static parameter requires a reboot of the DB instance to take effect.

## Limitations for using a PostgreSQL compatible database as a source for homogeneous data migrations
Using PostgreSQL as a source: Limitations

The following limitations apply when using a PostgreSQL compatible database as a source for homogeneous data migrations:
+ The username you use to connect to your data source has the following limitations:
  + Can be 2 to 64 characters in length.
  + Can't have spaces.
  + Can include the following characters: a-z, A-Z, 0-9, underscore (\$1).
  + Must start with a-z or A-Z.
+ The password you use to connect to your data source has the following limitations:
  + Can be 1 to 128 characters in length.
  + Can't contain any of the following: single quote ('), double quote ("), semicolon (;) or space.

# Using a MongoDB compatible database as a source for homogeneous data migrations in Amazon DMS
Using MongoDB or Amazon DocumentDB as a source

You can use a MongoDB-compatible database as a source for Homogeneous data migrations in Amazon DMS. In this case, your source data provider can be an on-premises, Amazon EC2 for MongoDB database or Amazon DocumentDB (with MongoDB compatibility) database.

For supported database versions, see [Source data providers for DMS homogeneous data migrations](CHAP_Introduction.Sources.md#CHAP_Introduction.Sources.HomogeneousDataMigrations).

The following sections describe specific configuration prerequisites for self-managed MongoDB databases and Amazon-managed Amazon DocumentDB databases.

**Topics**
+ [

## Using a self-managed MongoDB database as a source for homogeneous data migrations in Amazon DMS
](#dm-data-providers-source-mongodb-sm)
+ [

## Using an Amazon DocumentDB database as a source for homogeneous data migrations in Amazon DMS
](#dm-data-providers-source-mongodb-aws)
+ [

## Features for using a MongoDB-compatible database as a source for homogeneous data migrations
](#dm-data-providers-source-mongodb-features)
+ [

## Limitations for using a MongoDB-compatible database as a source for homogeneous data migrations
](#dm-data-providers-source-mongodb-limitations)
+ [

## Best practices for using a MongoDB-compatible database as a source for homogeneous data migrations
](#dm-data-providers-source-mongodb-bestpractices)

## Using a self-managed MongoDB database as a source for homogeneous data migrations in Amazon DMS
Using self-managed MongoDB as a source

This section describes how to configure your MongoDB databases that are hosted on-premises or on Amazon EC2 instances.

Check the version of your source MongoDB database. Make sure that Amazon DMS supports your source MongoDB database version as described in [Source data providers for DMS homogeneous data migrations](CHAP_Introduction.Sources.md#CHAP_Introduction.Sources.HomogeneousDataMigrations).

To run homogeneous data migrations with a MongoDB source, you can create either a user account with root privileges, or a user with permissions only on the database to migrate. For more information about user creation, see [Permissions needed when using MongoDB as a source for Amazon DMS](CHAP_Source.MongoDB.md#CHAP_Source.MongoDB.PrerequisitesCDC).

To use ongoing replication or CDC with MongoDB, Amazon DMS requires access to the MongoDB operations log (oplog). For more information, see [Configuring a MongoDB replica set for CDC](CHAP_Source.MongoDB.md#CHAP_Source.MongoDB.PrerequisitesCDC.ReplicaSet). 

For information about MongoDB authentication methods, see [Security requirements when using MongoDB as a source for Amazon DMS](CHAP_Source.MongoDB.md#CHAP_Source.MongoDB.Security).

For MongoDB as a source, homogeneous data migrations supports all of the datatypes that Amazon DocumentDB supports.

For MongoDB as a source, to store user credentials in Secrets Manager, you need to provide them in plain text, using the **Other type of secrets** type. For more information, see [Using secrets to access Amazon Database Migration Service endpoints](security_iam_secretsmanager.md).

The following code sample demonstrates how to store database secrets using plain text.

```
{
  "username": "dbuser",
  "password": "dbpassword"
}
```

## Using an Amazon DocumentDB database as a source for homogeneous data migrations in Amazon DMS
Using Amazon DocumentDB as a source

This section describes how to configure your Amazon DocumentDB database instances for use as a source for homogeneous data migrations.

Use the master username for the Amazon DocumentDB instance as the user account for the MongoDB-compatible source data provider for homogeneous data migrations in Amazon DMS. The master user account has the required roles that allow it to set up CDC. If you use an account other than the master user account, then the account must have the root role. For more information on the user creation as a root account, see [Setting permissions to use Amazon DocumentDB as a source](CHAP_Source.DocumentDB.md#CHAP_Source.DocumentDB.Permissions).

To turn on logical replication, set the `change_stream_log_retention_duration` parameter in your database parameter group to a setting appropriate for your transaction workload. Changing this static parameter requires you to reboot your DB instance to take effect. Before starting data migration for all the task types including Full Load Only, enable Amazon DocumentDB change streams for all collections within a given database, or only for selected collections. For more information about enabling change streams for Amazon DocumentDB, see [ Enabling Change Streams ](https://docs.amazonaws.cn/documentdb/latest/developerguide/change_streams.html#change_streams-enabling) in the *Amazon DocumentDB developer guide*. 

**Note**  
Amazon DMS uses the Amazon DocumentDB change stream to capture changes during ongoing replication. If Amazon DocumentDB flushes out the records from the change stream before DMS reads them, your tasks will fail. We recommend setting the `change_stream_log_retention_duration` parameter to retain changes for at least 24 hours.

To use Amazon DocumentDB for homogeneous data migration, store user credentials in Secrets Manager under **Credentials for Amazon DocumentDB database**.

## Features for using a MongoDB-compatible database as a source for homogeneous data migrations
Using MongoDB as a source: Features
+ You can migrate all the secondary indexes that Amazon DocumentDB supports during the Full load phase.
+ Amazon DMS migrates collections in parallel. homogeneous data migrations calculates segments at runtime based on the average size of each document in the collection for maximum performance.
+ DMS can replicate the secondary indexes that you create in the CDC phase. DMS supports this feature in MongoDB version 6.0.
+ DMS supports documents with a nesting level greater than 97.

## Limitations for using a MongoDB-compatible database as a source for homogeneous data migrations
Using MongoDB as a source: Limitations
+ Documents can't have field names with a `$` prefix.
+ Amazon DMS doesn't support time series collection migration.
+ Amazon DMS doesn't support `create`, `drop`, or `rename collection` DDL events during the CDC phase.
+ Amazon DMS doesn't suport inconsistent datatypes in the collection for the `_id` field. For example, the following unsupported collection has multiple data types for the `_id` field.

  ```
  rs0 [direct: primary] test> db.collection1.aggregate([
  ...   {
  ...     $group: {
  ...       _id: { $type: "$_id" },
  ...       count: { $sum: 1 }
  ...     }
  ...   }
  ... ])
  [ { _id: 'string', count: 6136 }, { _id: 'objectId', count: 848033 } ]
  ```
+ For CDC-only tasks, Amazon DMS only supports the `immediate` start mode.
+ Amazon DMS doesn't support documents with invalid UTF8 characters.
+ Amazon DMS doesn't support sharded collections.

## Best practices for using a MongoDB-compatible database as a source for homogeneous data migrations
Using MongoDB as a source: Best practices
+ For multiple large databases and collections hosted on same MongoDB instance, we recommend you use selection rules for each database and collection to split the task between multiple data migration tasks and projects. You can tune your database and collection divisions for maximum performance.