How to estimate capacity consumption in Amazon Keyspaces - Amazon Keyspaces (for Apache Cassandra)

How to estimate capacity consumption in Amazon Keyspaces

When you read or write data in Amazon Keyspaces, the amount of read/write request units (RRUs/WRUs) or read/write capacity units (RCUs/WCUs) your query consumes depends on the total amount of data Amazon Keyspaces has to process to run the query. In some cases, the data returned to the client can be a subset of the data that Amazon Keyspaces had to read to process the query. For conditional writes, Amazon Keyspaces consumes write capacity even if the conditional check fails.

To estimate the total amount of data being processed for a request, you have to consider the encoded size of a row and the total number of rows. This topic covers some examples of common scenarios and access patterns to show how Amazon Keyspaces processes queries and how that affects capacity consumption. You can follow the examples to estimate the capacity requirements of your tables and use Amazon CloudWatch to observe the read and write capacity consumption for these use cases.

For information on how to calculate the encoded size of rows in Amazon Keyspaces, see Calculating row size in Amazon Keyspaces.

Range queries

To look at the read capacity consumption of a range query, consider the following example table, which uses on-demand capacity mode.

 pk1 | pk2 | pk3 | ck1 | ck2 | ck3 | value
-----+-----+-----+-----+-----+-----+-------
 a   | b   | 1   | a   | b   | 50  | <any value that results in a row size larger than 4KB>
 a   | b   | 1   | a   | b   | 60  | value_1
 a   | b   | 1   | a   | b   | 70  | <any value that results in a row size larger than 4KB>

Now run the following query on this table.

SELECT * FROM amazon_keyspaces.example_table_1 WHERE pk1='a' AND pk2='b' AND pk3=1 AND ck1='a' AND ck2='b' AND ck3 > 50 AND ck3 < 70;

You receive the following result set from the query and the read operation performed by Amazon Keyspaces consumes 2 RRUs in LOCAL_QUORUM consistency mode.

 pk1 | pk2 | pk3 | ck1 | ck2 | ck3 | value
-----+-----+-----+-----+-----+-----+-------
 a   | b   | 1   | a   | b   | 60  | value_1

Amazon Keyspaces consumes 2 RRUs to evaluate the rows with the values ck3=60 and ck3=70 to process the query. However, Amazon Keyspaces only returns the row where the WHERE condition specified in the query is true, which is the row with value ck3=60. To evaluate the range specified in the query, Amazon Keyspaces reads the row matching the upper bound of the range, in this case ck3 = 70, but doesn’t return that row in the result. The read capacity consumption is based on the data read when processing the query, not on the data returned.
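The arithmetic behind this example can be sketched in a few lines of Python. This is a rough estimate, not the service's billing logic. It assumes that one RRU covers up to 4 KB read at LOCAL_QUORUM and that a multi-row read is charged on the aggregate size of the rows read. The byte sizes and the `estimate_rrus` helper are hypothetical, chosen so that the row with ck3=70 is slightly larger than 4 KB.

```python
import math

RRU_BYTES = 4 * 1024  # assumption: one RRU covers up to 4 KB at LOCAL_QUORUM


def estimate_rrus(row_sizes_bytes):
    """Estimate RRUs from the encoded sizes of all rows READ, not rows returned."""
    total_bytes = sum(row_sizes_bytes)
    # A read that touches any data consumes at least one unit.
    return max(1, math.ceil(total_bytes / RRU_BYTES))


# Hypothetical encoded row sizes for the two rows evaluated by the range query.
rows_read = {"ck3=60": 100, "ck3=70": 5000}
rows_returned = {"ck3=60": 100}

print(estimate_rrus(rows_read.values()))      # 2, what the query consumes
print(estimate_rrus(rows_returned.values()))  # 1, what the result set alone would suggest
```

The gap between the two numbers is the cost of the upper-bound row that is read but not returned.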

Limit queries

When processing a query that uses the LIMIT clause, Amazon Keyspaces reads rows up to the maximum page size when trying to match the condition specified in the query. If Amazon Keyspaces can't find sufficient matching data that meets the LIMIT value on the first page, one or more paginated calls could be needed. To continue reads on the next page, you can use a pagination token. The default page size is 1MB. To consume less read capacity when using LIMIT clauses, you can reduce the page size. For more information about pagination, see Paginating results in Amazon Keyspaces.

For an example, let's look at the following query.

SELECT * FROM my_table WHERE partition_key=1234 LIMIT 1;

If you don't set the page size, Amazon Keyspaces reads 1 MB of data even though it returns only one row to you. To have Amazon Keyspaces read only one row, you can set the page size to 1 for this query. In this case, Amazon Keyspaces reads only one row, provided that you don't have expired rows based on Time to Live (TTL) settings or client-side timestamps. To consume less read capacity, we recommend setting the page size equal to the LIMIT value.
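To get a feel for the difference the page size makes, the following Python sketch computes an upper bound on the RRUs consumed by a single page. It assumes 4 KB per RRU at LOCAL_QUORUM and the 1 MB default page size mentioned above. The `worst_case_page_rrus` helper and the 1 KB average row size are hypothetical.

```python
import math

RRU_BYTES = 4 * 1024          # assumption: 4 KB per RRU at LOCAL_QUORUM
MAX_PAGE_BYTES = 1024 * 1024  # default page size of 1 MB


def worst_case_page_rrus(page_size_rows, avg_row_bytes):
    """Upper bound on RRUs for one page of a LIMIT query.

    page_size_rows=None models a query that does not set a page size,
    so the read can fill the whole 1 MB page.
    """
    if page_size_rows is None:
        page_bytes = MAX_PAGE_BYTES
    else:
        page_bytes = min(page_size_rows * avg_row_bytes, MAX_PAGE_BYTES)
    return max(1, math.ceil(page_bytes / RRU_BYTES))


print(worst_case_page_rrus(None, 1024))  # 256, page size not set
print(worst_case_page_rrus(1, 1024))     # 1, page size equal to LIMIT 1
```

How the page size is set depends on the client. In the Python cassandra-driver, for example, it is the `fetch_size` argument of a statement.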

Table scans

Queries that result in full table scans, for example queries that use the ALLOW FILTERING option, are another case where Amazon Keyspaces reads more data than it returns. As with range queries, the read capacity consumption is based on the data read, not the data returned.

For the table scan example we use the following example table in on-demand capacity mode.

 pk | ck | value
----+----+---------
 pk | 10 | <any value that results in a row size larger than 4KB>
 pk | 20 | value_1
 pk | 30 | <any value that results in a row size larger than 4KB>

Amazon Keyspaces creates a table in on-demand capacity mode with four partitions by default. In this example table, all the data is stored in one partition and the remaining three partitions are empty.

Now run the following query on the table.

SELECT * from amazon_keyspaces.example_table_2;

This query results in a table scan operation where Amazon Keyspaces scans all four partitions of the table and consumes 6 RRUs in LOCAL_QUORUM consistency mode. First, Amazon Keyspaces consumes 3 RRUs for reading the three rows with pk='pk'. Then, Amazon Keyspaces consumes an additional 3 RRUs for scanning the three empty partitions of the table. Because this query results in a table scan, Amazon Keyspaces scans all the partitions in the table, including partitions without data.
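The 6 RRUs in this example can be reproduced with a small Python estimate. The sketch assumes 4 KB per RRU at LOCAL_QUORUM, aggregate sizing across the rows read, and 1 RRU per empty partition scanned (the figure implied by the example above). The row sizes and the `estimate_scan_rrus` helper are hypothetical.

```python
import math

RRU_BYTES = 4 * 1024  # assumption: 4 KB per RRU at LOCAL_QUORUM


def estimate_scan_rrus(row_sizes_bytes, empty_partitions):
    """Estimate RRUs for a full table scan at LOCAL_QUORUM."""
    sizes = list(row_sizes_bytes)
    # Capacity for the rows that hold data, based on their aggregate size.
    data_rrus = math.ceil(sum(sizes) / RRU_BYTES) if sizes else 0
    # Assumption taken from the example: each empty partition costs 1 RRU.
    return data_rrus + empty_partitions


# Hypothetical encoded sizes for the rows with ck=10, ck=20 and ck=30.
print(estimate_scan_rrus([5000, 100, 5000], empty_partitions=3))  # 6
```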

Lightweight transactions

Lightweight transactions (LWT) allow you to perform conditional write operations against your table data. Conditional update operations are useful when inserting, updating and deleting records based on conditions that evaluate the current state.

In Amazon Keyspaces, all write operations require LOCAL_QUORUM consistency and there is no additional charge for using LWTs. The difference for LWTs is that a condition check that evaluates to FALSE still consumes write capacity units. The number of write capacity units consumed depends on the size of the row. If the row size is 2 KB, the failed conditional write consumes two write capacity units. If the row doesn't currently exist in the table, the operation consumes one write capacity unit. By monitoring the ConditionalCheckFailed metric in CloudWatch, you can determine the capacity consumed by LWT condition check failures.
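The rule for failed condition checks can be written as a short Python estimate. It assumes that one write capacity unit covers up to 1 KB of row data, which is consistent with the 2 KB example above. The `failed_lwt_wcus` helper is hypothetical.

```python
import math

WCU_BYTES = 1024  # assumption: one write capacity unit covers up to 1 KB


def failed_lwt_wcus(existing_row_bytes):
    """Estimate write capacity units consumed when an LWT condition check fails.

    Pass None when the row does not currently exist in the table.
    """
    if existing_row_bytes is None:
        return 1
    return max(1, math.ceil(existing_row_bytes / WCU_BYTES))


print(failed_lwt_wcus(2048))  # 2, existing row of 2 KB
print(failed_lwt_wcus(None))  # 1, row does not exist
```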

Estimate read and write capacity consumption with Amazon CloudWatch

To estimate and monitor read and write capacity consumption, you can use a CloudWatch dashboard. For more information about available metrics for Amazon Keyspaces, see Amazon Keyspaces metrics and dimensions.

To monitor read and write capacity units consumed by a specific statement with CloudWatch, you can follow these steps.

  1. Create a new table with sample data.

  2. Configure an Amazon Keyspaces CloudWatch dashboard for the table. To get started, you can use a dashboard template available on GitHub.

  3. Run the CQL statement, for example using the ALLOW FILTERING option, and check the read capacity units consumed for the full table scan in the dashboard.
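
As an alternative to reading the value off the dashboard in step 3, the consumed capacity could also be queried programmatically. The following Python sketch only builds the request parameters for the CloudWatch GetMetricStatistics API. The namespace `AWS/Cassandra`, the metric name `ConsumedReadCapacityUnits`, and the `Keyspace` and `TableName` dimensions are assumptions to verify against the Amazon Keyspaces metrics reference. The `consumed_read_capacity_query` helper is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def consumed_read_capacity_query(keyspace, table, minutes=15, now=None):
    """Build GetMetricStatistics parameters for a table's consumed read capacity."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Cassandra",               # assumed namespace
        "MetricName": "ConsumedReadCapacityUnits",  # assumed metric name
        "Dimensions": [
            {"Name": "Keyspace", "Value": keyspace},
            {"Name": "TableName", "Value": table},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,           # one datapoint per minute
        "Statistics": ["Sum"],  # total units consumed in each period
    }


if __name__ == "__main__":
    params = consumed_read_capacity_query("amazon_keyspaces", "example_table_2")
    print(params["MetricName"], params["Dimensions"])
    # With AWS credentials configured, the request could be sent with boto3:
    #   import boto3
    #   datapoints = boto3.client("cloudwatch").get_metric_statistics(**params)["Datapoints"]
```

Running the CQL statement first and then summing the datapoints for the following minutes gives a rough figure for that statement, provided that no other traffic reaches the table.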