Migrate data using change data capture (CDC)
To successfully implement a change data capture (CDC) pipeline for migrating data from Cassandra to Amazon Keyspaces, we recommend using the Debezium connector for Apache Cassandra.
To address any potential data consistency issues, you can implement a process with Amazon Managed Streaming for Apache Kafka (Amazon MSK) where a consumer compares the keys or partitions in Cassandra with those in Amazon Keyspaces.
To implement this solution successfully, we recommend that you consider the following:
How to parse the CDC commit log, for example how to remove duplicate events.
How to maintain the CDC directory, for example how to delete old logs.
How to handle partial failures in Apache Cassandra, for example if a write only succeeds in one out of three replicas.
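As an illustration of the first consideration, duplicate events can appear because each replica records the same write in its own commit log. The following minimal sketch drops repeats by remembering which events it has already seen. The `ChangeEvent` type and its fields are hypothetical simplifications and not the Debezium event format.

```python
from typing import Iterable, Iterator, NamedTuple


class ChangeEvent(NamedTuple):
    """Hypothetical, simplified change event (not the Debezium schema)."""
    table: str
    key: tuple          # primary key values of the affected row
    write_time_us: int  # write timestamp in microseconds


def deduplicate(events: Iterable[ChangeEvent]) -> Iterator[ChangeEvent]:
    """Yield each (table, key, write_time_us) combination only once."""
    seen = set()  # unbounded here; a real consumer would expire old entries
    for event in events:
        if event not in seen:
            seen.add(event)
            yield event
```

Two events are treated as duplicates only if the table, key, and write timestamp all match, so a later write to the same key still passes through.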
This pattern treats changes from Cassandra as a "hint" that a key may have changed from its previous state.
To determine if there are changes to propagate to the destination database, you must first read from the source Cassandra cluster using a LOCAL_QUORUM operation to receive the latest records and then write them to Amazon Keyspaces.
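The hint pattern can be sketched as follows. This is an illustration under stated assumptions and not a reference implementation: `read_source`, `write_target`, and `delete_target` are hypothetical callables that you would back with your own driver code, and `read_source` is assumed to read from Cassandra at LOCAL_QUORUM consistency. Removing the key from the destination when the source no longer has it is also an assumption of this sketch.

```python
from typing import Callable, Hashable, Mapping, Optional

Row = Mapping[str, object]


def propagate_hint(
    key: Hashable,
    read_source: Callable[[Hashable], Optional[Row]],
    write_target: Callable[[Hashable, Row], None],
    delete_target: Callable[[Hashable], None],
) -> None:
    """Treat a CDC event as a hint: re-read the key, then copy its current state."""
    # Assumed to be a LOCAL_QUORUM read against the source Cassandra cluster.
    row = read_source(key)
    if row is None:
        # Assumption: a key missing from the source is removed from the destination.
        delete_target(key)
    else:
        write_target(key, row)
```

Because the event itself is never written, a duplicate or out-of-order event only causes an extra read of the current source state.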
In the case of range deletes or range updates, you may need to perform a comparison against the entire partition to determine which write or update events need to be written to your destination database.
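A partition comparison of this kind might look like the following sketch. It assumes that both partitions have already been read into memory as mappings from clustering key to row, which may not be practical for very large partitions. The function name and types are illustrative only.

```python
from typing import Dict, Hashable, List, Mapping, Tuple

Row = Mapping[str, object]


def diff_partition(
    source_rows: Mapping[Hashable, Row],
    target_rows: Mapping[Hashable, Row],
) -> Tuple[Dict[Hashable, Row], List[Hashable]]:
    """Return (rows to upsert, clustering keys to delete) for one partition."""
    # Rows that are missing from the destination or differ from the source.
    upserts = {
        ck: row for ck, row in source_rows.items() if target_rows.get(ck) != row
    }
    # Rows that exist only in the destination, for example after a range delete.
    deletes = [ck for ck in target_rows if ck not in source_rows]
    return upserts, deletes
```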
In cases where writes are not idempotent, you also need to compare your writes with what is already in the destination database before writing to Amazon Keyspaces.
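One way to guard a non-idempotent write, such as a list append that would add the same element again on replay, is to read the destination first and skip the write when it already matches. The sketch below assumes hypothetical `read_target` and `write_target` callables and a full-row comparison. Your own check may need to compare individual columns instead.

```python
from typing import Callable, Hashable, Mapping, Optional

Row = Mapping[str, object]


def write_if_different(
    key: Hashable,
    source_row: Row,
    read_target: Callable[[Hashable], Optional[Row]],
    write_target: Callable[[Hashable, Row], None],
) -> bool:
    """Write source_row to the destination unless it is already there.

    Returns True if a write was issued and False if it was skipped.
    """
    if read_target(key) == source_row:
        return False  # destination already holds this state
    write_target(key, source_row)
    return True
```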
The following diagram shows the typical architecture of a CDC pipeline using Debezium and Amazon MSK.