Choosing an instance class and storage size
Before you start migrating your database, you need to consider your source, target, and replication instance resources such as CPU, memory, disk space, and network bandwidth/latency. How much workload will be placed on the source database, how to determine the sizing of the replication instance, and what instance class should be used for the target database are common questions when starting a migration.
There is no single answer to these questions. The optimal configuration is hard to calculate in advance because it depends on the amount of data in your source database, your workload, your Amazon DMS task configuration, and the number of tasks running concurrently. One of the benefits of using Amazon is the ability to resize resources flexibly and easily. You can change your replication instance class or target database instance class in place with a few clicks, and the change takes only minutes. Test your migrations using a larger instance class first, then check the resource usage reported by CloudWatch metrics and resize if necessary.
In this walkthrough, we will use the following instance classes.
Source database
- Instance class: db.c5.4xlarge
- Allocated storage size: 1024 GiB
- Storage type: io1
- IOPS: 20000
Full load typically requires more resources from the source database than CDC because full load reads data from the source in parallel, using the degree of parallelism that you specify in the task settings. The default parallelism is 8, which means that data from the source tables is transferred to the target through the replication instance in 8 parallel threads. In this walkthrough, we allocate the resources described above so that the Amazon DMS task can use the maximum parallelism of 49 threads.
Note that this setting yields 250-300 MiB/s of read throughput on the source database. If you want to reduce the workload on the source database, you can lower the parallelism, as described in a later section.
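As a minimal sketch of how the parallelism might be expressed, the snippet below builds a task-settings fragment in Python. It assumes that the parallelism discussed above corresponds to the `MaxFullLoadSubTasks` value under `FullLoadSettings` in the task settings JSON; the helper name `full_load_settings` is hypothetical and used for illustration only.

```python
import json

DEFAULT_PARALLELISM = 8   # default stated in this walkthrough
MAX_PARALLELISM = 49      # maximum stated in this walkthrough


def full_load_settings(parallelism: int = DEFAULT_PARALLELISM) -> str:
    """Return a task-settings JSON fragment for the requested parallelism."""
    if not 1 <= parallelism <= MAX_PARALLELISM:
        raise ValueError(
            f"parallelism must be between 1 and {MAX_PARALLELISM}"
        )
    # Assumption: MaxFullLoadSubTasks is the setting that controls
    # the full-load parallelism described in the text.
    settings = {"FullLoadSettings": {"MaxFullLoadSubTasks": parallelism}}
    return json.dumps(settings, indent=2)


if __name__ == "__main__":
    # Fragment for the maximum parallelism used in this walkthrough.
    print(full_load_settings(49))
```

Lowering the argument (for example, back to the default of 8) is one way to reduce the read load on the source database, at the cost of a longer full load.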
Replication instance
- Instance class: dms.c5.9xlarge
- Allocated storage size: 100 GiB
Because we are performing a heterogeneous migration and using the parallel full-load option with a maximum of 49 parallel threads, we start with the relatively large compute-optimized instance class dms.c5.9xlarge for the replication instance. In our use case, this instance class is powerful enough to migrate source data to Amazon S3 in 49 parallel threads. You can also use a smaller instance class if you reduce the number of threads. We’ll discuss this in a later section.
When Amazon S3 is the target, storage throughput is the primary factor when determining the full-load performance. This is because when Amazon DMS outputs a CSV or a Parquet file to Amazon S3, Amazon DMS first writes the file to storage on your replication instance, and then Amazon DMS uploads the file to the Amazon S3 bucket.
Amazon DMS supports GP2 EBS storage. The baseline IOPS of a GP2 volume depends on its size and increases at a rate of 3 IOPS per GiB. This is also the rate at which EBS burst credits are added per second. A single GP2 volume can burst up to 3,000 IOPS as long as it has burst credits; once the credits run out, it delivers only its baseline rate of 3 IOPS per GiB. For example, a 100 GiB volume has a baseline of 300 IOPS.
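The credit arithmetic above can be sketched in Python to estimate how long a volume can hold its burst rate. This is a rough estimate under stated assumptions, not a sizing tool: it assumes the credit bucket starts full at 5.4 million I/O credits, a baseline floor of 100 IOPS, and a baseline cap of 16,000 IOPS, none of which are stated in this walkthrough. The function names are hypothetical.

```python
GP2_IOPS_PER_GIB = 3             # baseline rate from the text
GP2_BURST_IOPS = 3000            # burst ceiling from the text
GP2_MIN_BASELINE_IOPS = 100      # assumption: baseline floor for small volumes
GP2_MAX_BASELINE_IOPS = 16000    # assumption: baseline cap for large volumes
GP2_FULL_CREDITS = 5_400_000     # assumption: credits in a full bucket


def baseline_iops(size_gib: float) -> float:
    """Baseline IOPS for a volume of the given size."""
    scaled = GP2_IOPS_PER_GIB * size_gib
    return min(GP2_MAX_BASELINE_IOPS, max(GP2_MIN_BASELINE_IOPS, scaled))


def burst_minutes(size_gib: float, credits: float = GP2_FULL_CREDITS) -> float:
    """Minutes the volume can sustain the burst rate before credits run out.

    Credits drain at (burst - baseline) per second because the bucket
    refills at the baseline rate while the volume is bursting.
    """
    base = baseline_iops(size_gib)
    if base >= GP2_BURST_IOPS:
        return float("inf")  # baseline already meets the burst rate
    return credits / (GP2_BURST_IOPS - base) / 60


if __name__ == "__main__":
    for size in (100, 500, 1000):
        print(size, baseline_iops(size), burst_minutes(size))
```

Under these assumptions, a 100 GiB volume has a baseline of 300 IOPS and can burst for roughly half an hour from a full bucket. The actual duration is shorter if the bucket is not full or the workload does not match the I/O size that credits are counted in, so treat the result as an upper bound to confirm with a test task.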
In this scenario, we allocate 100 GiB of storage, which sustains the maximum burst throughput for about 20-30 minutes. This is sufficient for this workload. Find the optimal disk size for your workload by running a test task. You can change the storage size online, even while a task is running, although storage performance may be temporarily degraded during the change. Also, the storage size can be increased but not decreased; to reduce it, you must recreate the replication instance.