
Using Amazon Timestream as a target for Amazon Database Migration Service

You can use Amazon Database Migration Service to migrate data from your source database to an Amazon Timestream target endpoint, with support for Full Load and CDC data migrations.

Amazon Timestream is a fast, scalable, and serverless time series database service built for high-volume data ingestion. Time series data is a sequence of data points collected over a time interval, and is used for measuring events that change over time. Timestream is used to collect, store, and analyze metrics from IoT applications, DevOps applications, and analytics applications. After your data is in Timestream, you can visualize it and identify trends and patterns in near real time. For information about Amazon Timestream, see What is Amazon Timestream? in the Amazon Timestream Developer Guide.

Prerequisites for using Amazon Timestream as a target for Amazon Database Migration Service

Before you set up Amazon Timestream as a target for Amazon DMS, make sure that you create an IAM role. This role must allow Amazon DMS to gain access to the data being migrated into Amazon Timestream. The minimum set of access permissions for the role that you use to migrate to Timestream is shown in the following IAM policy.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowDescribeEndpoints",
      "Effect": "Allow",
      "Action": [
        "timestream:DescribeEndpoints"
      ],
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "timestream:ListTables",
        "timestream:DescribeDatabase"
      ],
      "Resource": "arn:aws:timestream:region:account_id:database/DATABASE_NAME"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "timestream:DeleteTable",
        "timestream:WriteRecords",
        "timestream:UpdateTable",
        "timestream:CreateTable"
      ],
      "Resource": "arn:aws:timestream:region:account_id:database/DATABASE_NAME/table/TABLE_NAME"
    }
  ]
}

If you intend to migrate all tables, use * for TABLE_NAME in the example above.
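As a rough sketch, the policy above can be generated for a specific database so that the ARNs are filled in consistently. The function name `timestream_target_policy` and the sample Region, account ID, and database name are illustrative assumptions, not part of Amazon DMS.

```python
import json


def timestream_target_policy(region, account_id, database, table="*"):
    """Return the minimum IAM policy shown above as a Python dict.

    Hypothetical helper: table="*" grants the table-level actions on
    every table in the database.
    """
    db_arn = f"arn:aws:timestream:{region}:{account_id}:database/{database}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowDescribeEndpoints",
                "Effect": "Allow",
                "Action": ["timestream:DescribeEndpoints"],
                "Resource": "*",
            },
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": ["timestream:ListTables", "timestream:DescribeDatabase"],
                "Resource": db_arn,
            },
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": [
                    "timestream:DeleteTable",
                    "timestream:WriteRecords",
                    "timestream:UpdateTable",
                    "timestream:CreateTable",
                ],
                "Resource": f"{db_arn}/table/{table}",
            },
        ],
    }


if __name__ == "__main__":
    # Sample values only; substitute your own Region, account, and database.
    policy = timestream_target_policy("us-east-1", "123456789012", "db_name")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console or saved to a file for use with your usual IAM tooling.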

Note the following about using Timestream as a target:

  • If you intend to ingest historical data with timestamps more than 1 year old, we recommend using Amazon DMS to write the data to Amazon S3 in comma-separated value (CSV) format. Then, use Timestream's batch load to ingest the data into Timestream. For more information, see Using batch load in Timestream in the Amazon Timestream Developer Guide.

  • For full-load migrations of data less than 1 year old, we recommend setting the memory store retention period of the Timestream table to a value greater than or equal to the age of the oldest timestamp. Then, after the migration completes, change the table's memory store retention to the desired value. For example, to migrate data with the oldest timestamp being 2 months old, do the following:

    • Set the Timestream target table's memory store retention to 2 months.

    • Start the data migration using Amazon DMS.

    • Once the data migration completes, change the retention period of the target Timestream table to your desired value.

    We recommend estimating the memory store cost prior to the migration, using the information on the following pages:

  • For CDC data migrations, we recommend setting the memory store retention period of the target table such that ingested data falls within the memory store retention bounds. For more information, see Writes Best Practices in the Amazon Timestream Developer Guide.
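The guidance in the list above can be sketched as a small planning function. This is only an illustration: `plan_ingestion` and its return shape are hypothetical, and using the 8,736-hour maximum of the MemoryDuration endpoint setting as the cutoff for "1 year" is an assumption.

```python
import math
from datetime import datetime, timedelta, timezone

# Assumption: the largest memory store retention that DMS can set
# (MemoryDuration maximum, in hours) is used as the "1 year" cutoff.
MAX_MEMORY_HOURS = 8736


def plan_ingestion(oldest, now):
    """Suggest an ingestion path from the age of the oldest timestamp."""
    age_hours = max(1, math.ceil((now - oldest) / timedelta(hours=1)))
    if age_hours > MAX_MEMORY_HOURS:
        # Too old for the memory store: write CSV to S3, then batch load.
        return {"path": "s3-batch-load", "memory_retention_hours": None}
    # Retention must cover the oldest record for the duration of the load.
    return {"path": "dms-direct", "memory_retention_hours": age_hours}


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(plan_ingestion(now - timedelta(days=60), now))
```

After the migration completes, the retention would still need to be lowered to its long-term value, as described above.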

Multithreaded full load task settings

To help increase the speed of data transfer, Amazon DMS supports a multithreaded full load migration task to a Timestream target endpoint with these task settings:

  • MaxFullLoadSubTasks – Use this option to indicate the maximum number of source tables to load in parallel. DMS loads each table into its corresponding Amazon Timestream target table using a dedicated subtask. The default is 8; the maximum value is 49.

  • ParallelLoadThreads – Use this option to specify the number of threads that Amazon DMS uses to load each table into its Amazon Timestream target table. The maximum value for a Timestream target is 32. You can ask to have this maximum limit increased.

  • ParallelLoadBufferSize – Use this option to specify the maximum number of records to store in the buffer that the parallel load threads use to load data to the Amazon Timestream target. The default value is 50. The maximum value is 1,000. Use this setting with ParallelLoadThreads. ParallelLoadBufferSize is valid only when there is more than one thread.

  • ParallelLoadQueuesPerThread – Use this option to specify the number of queues each concurrent thread accesses to take data records out of queues and generate a batch load for the target. The default is 1. However, for Amazon Timestream targets of various payload sizes, the valid range is 5–512 queues per thread.
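A minimal sketch of how these options might be assembled into a task-settings document follows. It assumes the general DMS task-settings layout, in which MaxFullLoadSubTasks sits under FullLoadSettings and the ParallelLoad* options sit under TargetMetadata. The function name and the lower bounds it checks are assumptions; the upper bounds come from the list above.

```python
import json


def full_load_task_settings(sub_tasks=8, threads=0, buffer_size=50,
                            queues_per_thread=None):
    """Build the full-load portion of a task-settings dict (hypothetical helper)."""
    if not 1 <= sub_tasks <= 49:
        raise ValueError("MaxFullLoadSubTasks must be between 1 and 49")
    if not 0 <= threads <= 32:
        raise ValueError("ParallelLoadThreads must be between 0 and 32")
    target = {"ParallelLoadThreads": threads}
    if threads > 1:
        # The buffer size only applies when more than one thread is used.
        if not 1 <= buffer_size <= 1000:
            raise ValueError("ParallelLoadBufferSize must be between 1 and 1000")
        target["ParallelLoadBufferSize"] = buffer_size
    if queues_per_thread is not None:
        # Range stated above for Timestream targets.
        if not 5 <= queues_per_thread <= 512:
            raise ValueError("ParallelLoadQueuesPerThread must be between 5 and 512")
        target["ParallelLoadQueuesPerThread"] = queues_per_thread
    return {
        "FullLoadSettings": {"MaxFullLoadSubTasks": sub_tasks},
        "TargetMetadata": target,
    }


if __name__ == "__main__":
    settings = full_load_task_settings(sub_tasks=4, threads=8,
                                       buffer_size=200, queues_per_thread=16)
    print(json.dumps(settings, indent=2))
```

Validating the values before creating the task surfaces out-of-range settings early rather than at task start.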

Multithreaded CDC load task settings

To promote CDC performance, Amazon DMS supports these task settings:

  • ParallelApplyThreads – Specifies the number of concurrent threads that Amazon DMS uses during a CDC load to push data records to a Timestream target endpoint. The default value is 0 and the maximum value is 32.

  • ParallelApplyBufferSize – Specifies the maximum number of records to store in each buffer queue for concurrent threads to push to a Timestream target endpoint during a CDC load. The default value is 100 and the maximum value is 1,000. Use this option when ParallelApplyThreads specifies more than one thread.

  • ParallelApplyQueuesPerThread – Specifies the number of queues that each thread accesses to take data records out of queues and generate a batch load for a Timestream endpoint during CDC. The default value is 1 and the maximum value is 512.
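The CDC options can be sketched the same way. The helper below is hypothetical; it assumes the ParallelApply* options belong under TargetMetadata in the task settings, and the lower bounds it enforces are assumptions, while the maximums are the ones listed above.

```python
import json


def cdc_apply_settings(threads, buffer_size=100, queues_per_thread=1):
    """Build the CDC parallel-apply portion of a task-settings dict."""
    if not 0 <= threads <= 32:
        raise ValueError("ParallelApplyThreads must be between 0 and 32")
    if not 1 <= buffer_size <= 1000:
        raise ValueError("ParallelApplyBufferSize must be between 1 and 1000")
    if not 1 <= queues_per_thread <= 512:
        raise ValueError("ParallelApplyQueuesPerThread must be between 1 and 512")
    target = {
        "ParallelApplyThreads": threads,
        "ParallelApplyQueuesPerThread": queues_per_thread,
    }
    if threads > 1:
        # The buffer size is meant for use with more than one apply thread.
        target["ParallelApplyBufferSize"] = buffer_size
    return {"TargetMetadata": target}


if __name__ == "__main__":
    print(json.dumps(cdc_apply_settings(threads=8, buffer_size=500), indent=2))
```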

Endpoint settings when using Timestream as a target for Amazon DMS

You can use endpoint settings to configure your Timestream target database similar to using extra connection attributes. You specify the settings when you create the target endpoint using the Amazon DMS console, or by using the create-endpoint command in the Amazon CLI, with the --timestream-settings '{"EndpointSetting": "value", ...}' JSON syntax.

The following table shows the endpoint settings that you can use with Timestream as a target.

Name Description

MemoryDuration

Set this attribute to specify the retention bound to store the data migrated in Timestream's memory store. Time is measured in units of hours. Timestream's memory store is optimized for high ingestion throughput and fast access.

Default value: 24 (hours)

Valid values: 1 to 8,736 (1 hour to 12 months measured in hours)

Example: --timestream-settings '{"MemoryDuration": 20}'

DatabaseName

Set this attribute to specify the target Timestream database name.

Type: string

Example: --timestream-settings '{"DatabaseName": "db_name"}'

TableName

Set this attribute to specify the target Timestream table name.

Type: string

Example: --timestream-settings '{"TableName": "table_name"}'

MagneticDuration

Set this attribute to specify the magnetic duration applied to the Timestream tables in days. This is the retention bound for the ingested data. Timestream deletes any timestamp exceeding the retention bound. For more information, see Storage in the Amazon Timestream Developer Guide.

Example: --timestream-settings '{"MagneticDuration": 3}'

CdcInsertsAndUpdates

Set this attribute to true to specify that Amazon DMS only applies inserts and updates, and not deletes. Timestream does not allow deleting records, so if this value is false, Amazon DMS nulls out the corresponding record in the Timestream database rather than deleting it. For more information, see Limitations following.

Default value: false

Example: --timestream-settings '{"CdcInsertsAndUpdates": true}'

EnableMagneticStoreWrites

Set this attribute to true to enable magnetic store writes. When this value is false, Amazon DMS does not write records with a timestamp older than the memory store retention period of the target table, because Timestream does not allow magnetic store writes by default. For more information, see Writes Best Practices in the Amazon Timestream Developer Guide.

Default value: false

Example: --timestream-settings '{"EnableMagneticStoreWrites": true}'
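As an illustration, the settings in the table above can be collected and checked in one place before they are passed to --timestream-settings. The helper name `timestream_settings` is hypothetical, and only the MemoryDuration range is validated because it is the only range stated above.

```python
import json


def timestream_settings(database_name, memory_duration=24,
                        magnetic_duration=None,
                        cdc_inserts_and_updates=False,
                        enable_magnetic_store_writes=False):
    """Return endpoint settings as a dict ready for JSON serialization."""
    if not 1 <= memory_duration <= 8736:
        raise ValueError("MemoryDuration must be between 1 and 8736 hours")
    settings = {
        "DatabaseName": database_name,
        "MemoryDuration": memory_duration,  # hours
        "CdcInsertsAndUpdates": cdc_inserts_and_updates,
        "EnableMagneticStoreWrites": enable_magnetic_store_writes,
    }
    if magnetic_duration is not None:
        settings["MagneticDuration"] = magnetic_duration  # days
    return settings


if __name__ == "__main__":
    # json.dumps writes Python booleans as JSON true/false.
    print(json.dumps(timestream_settings("db_name", memory_duration=20,
                                         magnetic_duration=3)))
```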

Creating and modifying an Amazon Timestream target endpoint

Once you have created an IAM role and established the minimum set of access permissions, you can create an Amazon Timestream target endpoint using the Amazon DMS console, or by using the create-endpoint command in the Amazon CLI, with the --timestream-settings '{"EndpointSetting": "value", ...}' JSON syntax.

The following examples show how to create and modify a Timestream target endpoint using the Amazon CLI.

Create Timestream target endpoint command

aws dms create-endpoint --endpoint-identifier timestream-target-demo --endpoint-type target --engine-name timestream --service-access-role-arn arn:aws:iam::123456789012:role/my-role --timestream-settings '{ "MemoryDuration": 20, "DatabaseName": "db_name", "MagneticDuration": 3, "CdcInsertsAndUpdates": true, "EnableMagneticStoreWrites": true }'

Modify Timestream target endpoint command

aws dms modify-endpoint --endpoint-identifier timestream-target-demo --endpoint-type target --engine-name timestream --service-access-role-arn arn:aws:iam::123456789012:role/my-role --timestream-settings '{ "MemoryDuration": 20, "MagneticDuration": 3 }'
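Quoting the JSON argument by hand is error-prone, so one option is to let a script build the command line. The sketch below uses Python's standard `shlex.join` (Python 3.8 or later) to quote each argument for a POSIX shell. The function name is hypothetical, and the identifier and role ARN are the sample values from the commands above.

```python
import json
import shlex


def create_endpoint_command(identifier, role_arn, settings):
    """Return a shell-safe `aws dms create-endpoint` command string."""
    argv = [
        "aws", "dms", "create-endpoint",
        "--endpoint-identifier", identifier,
        "--endpoint-type", "target",
        "--engine-name", "timestream",
        "--service-access-role-arn", role_arn,
        # Serialize the settings once; shlex.join adds the shell quoting.
        "--timestream-settings", json.dumps(settings),
    ]
    return shlex.join(argv)


if __name__ == "__main__":
    print(create_endpoint_command(
        "timestream-target-demo",
        "arn:aws:iam::123456789012:role/my-role",
        {"MemoryDuration": 20, "DatabaseName": "db_name", "MagneticDuration": 3},
    ))
```

Because the JSON is produced by `json.dumps`, it cannot contain trailing commas or unquoted keys, which are common causes of CLI parsing errors.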

Using object mapping to migrate data to a Timestream topic

Amazon DMS uses table-mapping rules to map data from the source to the target Timestream topic. To map data to a target topic, you use a type of table-mapping rule called object mapping. You use object mapping to define how data records in the source map to the data records published to a Timestream topic.

Timestream topics don't have a preset structure other than having a partition key.

Note

You don't have to use object mapping. You can use regular table mapping for various transformations. However, the partition key type will follow these default behaviors:

  • Primary Key is used as a partition key for Full Load.

  • If no parallel-apply task settings are used, schema.table is used as a partition key for CDC.

  • If parallel-apply task settings are used, Primary key is used as a partition key for CDC.

To create an object-mapping rule, specify rule-type as object-mapping. This rule specifies what type of object mapping you want to use. The structure for the rule is as follows.

{
  "rules": [
    {
      "rule-type": "object-mapping",
      "rule-id": "id",
      "rule-name": "name",
      "rule-action": "valid object-mapping rule action",
      "object-locator": {
        "schema-name": "case-sensitive schema name",
        "table-name": ""
      }
    }
  ]
}

{
  "rules": [
    {
      "rule-type": "object-mapping",
      "rule-id": "1",
      "rule-name": "timestream-map",
      "rule-action": "map-record-to-record",
      "target-table-name": "tablename",
      "object-locator": {
        "schema-name": "",
        "table-name": ""
      },
      "mapping-parameters": {
        "timestream-dimensions": [
          "column_name1",
          "column_name2"
        ],
        "timestream-timestamp-name": "time_column_name",
        "timestream-multi-measure-name": "column_name1or2",
        "timestream-hash-measure-name": true or false,
        "timestream-memory-duration": x,
        "timestream-magnetic-duration": y
      }
    }
  ]
}
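To make the template above concrete, the following sketch fills in the placeholders from function arguments and emits valid JSON. The function name and the sample schema, table, and column names (`iot`, `readings`, `device_id`, and so on) are hypothetical; the keys are the ones shown in the structure above.

```python
import json


def timestream_object_mapping(schema, table, target_table, dimensions,
                              timestamp_column, multi_measure_name,
                              hash_measure_name, memory_duration,
                              magnetic_duration):
    """Return a table-mapping document with one object-mapping rule."""
    return {
        "rules": [
            {
                "rule-type": "object-mapping",
                "rule-id": "1",
                "rule-name": "timestream-map",
                "rule-action": "map-record-to-record",
                "target-table-name": target_table,
                "object-locator": {"schema-name": schema, "table-name": table},
                "mapping-parameters": {
                    "timestream-dimensions": list(dimensions),
                    "timestream-timestamp-name": timestamp_column,
                    "timestream-multi-measure-name": multi_measure_name,
                    "timestream-hash-measure-name": hash_measure_name,
                    "timestream-memory-duration": memory_duration,
                    "timestream-magnetic-duration": magnetic_duration,
                },
            }
        ]
    }


if __name__ == "__main__":
    # Illustrative values only.
    mapping = timestream_object_mapping(
        schema="iot", table="readings", target_table="readings",
        dimensions=["device_id", "site"], timestamp_column="recorded_at",
        multi_measure_name="device_id", hash_measure_name=False,
        memory_duration=24, magnetic_duration=7,
    )
    print(json.dumps(mapping, indent=2))
```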

Amazon DMS currently supports map-record-to-record and map-record-to-document as the only valid values for the rule-action parameter. The map-record-to-record and map-record-to-document values specify what Amazon DMS does by default to records that aren't excluded as part of the exclude-columns attribute list. These values don't affect the attribute mappings in any way.

Use map-record-to-record when migrating from a relational database to a Timestream topic. This rule type uses the taskResourceId.schemaName.tableName value from the relational database as the partition key in the Timestream topic and creates an attribute for each column in the source database. When using map-record-to-record, for any column in the source table not listed in the exclude-columns attribute list, Amazon DMS creates a corresponding attribute in the target topic. This corresponding attribute is created regardless of whether that source column is used in an attribute mapping.

One way to understand map-record-to-record is to see it in action. For this example, assume that you are starting with a relational database table row with the following structure and data.

FirstName | LastName | StoreId | HomeAddress | HomePhone | WorkAddress | WorkPhone | DateofBirth
Randy | Marsh | 5 | 221B Baker Street | 1234567890 | 31 Spooner Street, Quahog | 9876543210 | 02/29/1988

To migrate this information from a schema named Test to a Timestream topic, you create rules to map the data to the target topic. The following rule illustrates the mapping.

{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "rule-action": "include",
      "object-locator": {
        "schema-name": "Test",
        "table-name": "%"
      }
    },
    {
      "rule-type": "object-mapping",
      "rule-id": "2",
      "rule-name": "DefaultMapToTimestream",
      "rule-action": "map-record-to-record",
      "object-locator": {
        "schema-name": "Test",
        "table-name": "Customers"
      }
    }
  ]
}

Given a Timestream topic and a partition key (in this case, taskResourceId.schemaName.tableName), the following illustrates the resulting record format using our sample data in the Timestream target topic:

{
  "FirstName": "Randy",
  "LastName": "Marsh",
  "StoreId": "5",
  "HomeAddress": "221B Baker Street",
  "HomePhone": "1234567890",
  "WorkAddress": "31 Spooner Street, Quahog",
  "WorkPhone": "9876543210",
  "DateOfBirth": "02/29/1988"
}
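The effect of map-record-to-record on the sample row can be imitated with a few lines of Python. This is an illustration of the described behavior only, not how Amazon DMS is implemented: the function name is hypothetical, and converting every value to a string is an assumption based on the sample output, where StoreId appears as "5".

```python
import json


def map_record_to_record(row, exclude_columns=()):
    """Create one target attribute per source column that is not excluded."""
    excluded = set(exclude_columns)
    return {column: str(value)
            for column, value in row.items()
            if column not in excluded}


if __name__ == "__main__":
    source_row = {
        "FirstName": "Randy",
        "LastName": "Marsh",
        "StoreId": 5,
        "HomeAddress": "221B Baker Street",
        "HomePhone": "1234567890",
        "WorkAddress": "31 Spooner Street, Quahog",
        "WorkPhone": "9876543210",
        "DateOfBirth": "02/29/1988",
    }
    print(json.dumps(map_record_to_record(source_row), indent=2))
```

Passing a column name in `exclude_columns` drops that attribute from the result, which mirrors the role of the exclude-columns attribute list described above.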

Limitations when using Amazon Timestream as a target for Amazon Database Migration Service

The following limitations apply when using Amazon Timestream as a target:

  • Dimensions and Timestamps: Timestream uses the dimensions and timestamps in the source data like a composite primary key, and also does not allow you to upsert these values. This means that if you change the timestamp or the dimensions for a record in the source database, the Timestream database will try to create a new record. It is thus possible that if you change the dimension or timestamp of a record such that they match those of another existing record, then Amazon DMS updates the values of the other record instead of creating a new record or updating the previous corresponding record.

  • DDL Commands: The current release of Amazon DMS only supports CREATE TABLE and DROP TABLE DDL commands.

  • Record Limitations: Timestream has limitations for records such as record size and measure size. For more information, see Quotas in the Amazon Timestream Developer Guide.

  • Deleting Records and Null Values: Timestream doesn't support deleting records. To support migrating records deleted from the source, Amazon DMS clears the corresponding fields in the records in the Timestream target database. Amazon DMS replaces the values in the fields of the corresponding target record with 0 for numeric fields, null for text fields, and false for boolean fields.

  • Timestream as a target doesn't support sources that aren't relational databases (RDBMS).

  • Amazon DMS only supports Timestream as a target in the following Regions:

    • US East (N. Virginia)

    • US East (Ohio)

    • US West (Oregon)

    • Europe (Ireland)

    • Europe (Frankfurt)

    • Asia Pacific (Sydney)

    • Asia Pacific (Tokyo)

  • Timestream as a target doesn't support setting TargetTablePrepMode to TRUNCATE_BEFORE_LOAD. We recommend using DROP_AND_CREATE for this setting.
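Two of the limitations above, the composite key formed by dimensions and timestamp and the clearing of deleted records, can be illustrated with a toy in-memory model. This is a simplified sketch, not Timestream or Amazon DMS code; the class `ToyTimestreamTable` and the column names are hypothetical.

```python
def null_out(measures):
    """Clear measure values the way the delete limitation describes."""
    cleared = {}
    for name, value in measures.items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            cleared[name] = False
        elif isinstance(value, (int, float)):
            cleared[name] = 0
        else:
            cleared[name] = None
    return cleared


class ToyTimestreamTable:
    """Records keyed by (dimension values, timestamp), like a composite key."""

    def __init__(self, dimension_names, timestamp_name):
        self.dimension_names = tuple(dimension_names)
        self.timestamp_name = timestamp_name
        self.records = {}

    def _split(self, row):
        key_columns = self.dimension_names + (self.timestamp_name,)
        key = tuple(row[c] for c in key_columns)
        measures = {c: v for c, v in row.items() if c not in key_columns}
        return key, measures

    def write(self, row):
        # A row whose key matches an existing record overwrites its measures.
        key, measures = self._split(row)
        self.records[key] = measures

    def delete(self, row):
        # Nothing is removed; the matching record's measures are cleared.
        key, _ = self._split(row)
        if key in self.records:
            self.records[key] = null_out(self.records[key])


if __name__ == "__main__":
    table = ToyTimestreamTable(["device_id"], "ts")
    table.write({"device_id": "a", "ts": 1, "temp": 20.5, "ok": True, "note": "x"})
    table.write({"device_id": "b", "ts": 1, "temp": 30.0, "ok": True, "note": "y"})
    # Changing a source row's dimension from "b" to "a" lands on record "a".
    table.write({"device_id": "a", "ts": 1, "temp": 30.0, "ok": True, "note": "y"})
    table.delete({"device_id": "b", "ts": 1})
    print(table.records)
```

In the example, the third write overwrites the measures of the existing record for device "a" rather than creating a new record, and the delete leaves a cleared record for device "b" in place.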