How to select the right tool for bulk uploading or migrating data to Amazon Keyspaces
In this section you can review the different tools that you can use to bulk upload or migrate data to Amazon Keyspaces, and learn how to select the correct tool based on your needs. In addition, this section provides an overview and use cases of the available step-by-step tutorials that demonstrate how to import data into Amazon Keyspaces.
To review the available strategies to migrate workloads from Apache Cassandra to Amazon Keyspaces, see Create a migration plan for migrating from Apache Cassandra to Amazon Keyspaces.
Migration tools
For large migrations, consider using an extract, transform, and load (ETL) tool. You can use AWS Glue to quickly and effectively perform data transformation migrations. For more information, see Offline migration process: Apache Cassandra to Amazon Keyspaces.
CQLReplicator – CQLReplicator is an open-source utility available on GitHub that helps you migrate data from Apache Cassandra to Amazon Keyspaces in near real time. For more information, see Migrate data using CQLReplicator.
To learn how to use the Apache Cassandra Spark connector to write data to Amazon Keyspaces, see Connecting to Amazon Keyspaces with Apache Spark.
Get started quickly with loading data into Amazon Keyspaces by using the cqlsh COPY FROM command. cqlsh is included with Apache Cassandra and is best suited for loading small datasets or test data. For step-by-step instructions, see Tutorial: Loading data into Amazon Keyspaces using cqlsh.

You can also use the DataStax Bulk Loader for Apache Cassandra to load data into Amazon Keyspaces using the dsbulk command. DSBulk provides more robust import capabilities than cqlsh and is available from the GitHub repository. For step-by-step instructions, see Tutorial: Loading data into Amazon Keyspaces using DSBulk.
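As a minimal sketch of the two commands, assuming a hypothetical table my_keyspace.my_table with columns id and name (the endpoint, TLS configuration, and tuning settings are covered in the linked tutorials):

```shell
# Create a small sample CSV to load (hypothetical schema: id, name).
printf 'id,name\n1,alice\n2,bob\n' > sample.csv

# cqlsh: best suited for small datasets. Requires cqlsh configured for
# the Amazon Keyspaces endpoint with SSL (shown, not executed here):
#   cqlsh cassandra.us-east-1.amazonaws.com 9142 --ssl \
#     -e "COPY my_keyspace.my_table (id, name) FROM 'sample.csv' WITH HEADER=true"

# DSBulk: more robust for larger loads (shown, not executed here):
#   dsbulk load -url sample.csv -k my_keyspace -t my_table -header true
```

Both commands read the same CSV; the cqlsh variant is quickest to try, while dsbulk adds retry, rate-limiting, and logging options for larger imports.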
General considerations for data uploads to Amazon Keyspaces
Break the data upload down into smaller components.
Consider the following units of migration and their potential footprint in terms of raw data size. Uploading smaller amounts of data in one or more phases may help simplify your migration.
By cluster – Migrate all of your Cassandra data at once. This approach may be fine for smaller clusters.
By keyspace or table – Break up your migration into groups of keyspaces or tables. This approach can help you migrate data in phases based on your requirements for each workload.
By data – Consider migrating data for a specific group of users or products to reduce the size of each migration phase even further.
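To illustrate the by-table approach, here is a minimal sketch that generates one export command per table as a runbook, so each table can be moved as its own small unit (the table names, host, and runbook file are hypothetical; COPY TO is the standard cqlsh export counterpart of COPY FROM):

```shell
# Hypothetical list of tables to migrate in this phase.
tables="users orders products"

# Write one cqlsh COPY TO command per table to a runbook file, so each
# table can be exported (and later loaded) independently.
: > phase1_commands.txt
for t in $tables; do
  echo "cqlsh cassandra-host 9042 -e \"COPY my_keyspace.${t} TO '${t}.csv' WITH HEADER=true\"" >> phase1_commands.txt
done
```

Reviewing the generated runbook before running it also gives you a natural checkpoint to estimate the raw data size of each unit.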
Prioritize what data to upload first based on simplicity.
Consider if you have data that could be migrated first more easily—for example, data that does not change during specific times, data from nightly batch jobs, data not used during offline hours, or data from internal apps.