Estimate row size in Amazon Keyspaces
Amazon Keyspaces provides fully managed storage that offers single-digit millisecond read and write performance and stores data durably across multiple Amazon Availability Zones. Amazon Keyspaces attaches metadata to all rows and primary key columns to support efficient data access and high availability.
This section provides details about how to estimate the encoded size of rows in Amazon Keyspaces. The encoded row size is used when calculating your bill and quota use. You should also use the encoded row size when calculating provisioned throughput capacity requirements for tables. To calculate the encoded size of rows in Amazon Keyspaces, you can use the following guidelines.
For regular columns, which are columns that aren't primary keys, clustering columns, or STATIC columns, use the raw size of the cell data based on the data type and add the required metadata. For more information about the data types supported in Amazon Keyspaces, see Data types. Some key differences in how Amazon Keyspaces stores data type values and metadata are listed below.
The space required for each column name is stored using a column identifier and added to each data value stored in the column. The storage value of the column identifier depends on the overall number of columns in your table:
1–62 columns: 1 byte
63–124 columns: 2 bytes
125–186 columns: 3 bytes
For each additional 62 columns add 1 byte. Note that in Amazon Keyspaces, up to 225 regular columns can be modified with a single INSERT or UPDATE statement. For more information, see Amazon Keyspaces service quotas.
Partition keys can contain up to 2048 bytes of data. Each key column in the partition key requires up to 3 bytes of metadata. When calculating the size of your row, you should assume each partition key column uses the full 3 bytes of metadata.
Clustering columns can store up to 850 bytes of data. In addition to the size of the data value, each clustering column requires up to 20% of the data value size for metadata. When calculating the size of your row, you should add 1 byte of metadata for each 5 bytes of clustering column data value.
Amazon Keyspaces stores the data value of each partition key and clustering key column twice. The extra overhead is used for efficient querying and built-in indexing.
Cassandra ASCII, TEXT, and VARCHAR string data types are all stored in Amazon Keyspaces using Unicode with UTF-8 binary encoding. The size of a string in Amazon Keyspaces equals the number of UTF-8 encoded bytes.
Cassandra INT, BIGINT, SMALLINT, and TINYINT data types are stored in Amazon Keyspaces as data values with variable length, with up to 38 significant digits. Leading and trailing zeroes are trimmed. The size of any of these data types is approximately 1 byte per two significant digits + 1 byte.
A BLOB in Amazon Keyspaces is stored with the value's raw byte length.
The size of a Null value or a Boolean value is 1 byte.
A column that stores collection data types like LIST or MAP requires 3 bytes of metadata, regardless of its contents. The size of a LIST or MAP is (column id) + sum (size of nested elements) + (3 bytes). The size of an empty LIST or MAP is (column id) + (3 bytes). Each individual LIST or MAP element also requires 1 byte of metadata.
STATIC column data doesn't count towards the maximum row size of 1 MB. To calculate the data size of static columns, see Calculate the static column size per logical partition in Amazon Keyspaces.
Client-side timestamps are stored for every column in each row when the feature is turned on. These timestamps take up approximately 20–40 bytes (depending on your data), and contribute to the storage and throughput cost for the row. For more information, see Client-side timestamps in Amazon Keyspaces.
Add 100 bytes to the size of each row for row metadata.
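The per-value rules above can be sketched as a few small Python helpers. This is only an illustrative estimate, not an official calculator: the function names are hypothetical, the integer rule rounds "1 byte per two significant digits" up to a whole byte, the value 0 is treated as one significant digit, and the 1 byte of per-element collection metadata is assumed to be added on top of each nested element's size.

```python
def column_id_bytes(total_columns: int) -> int:
    """Column identifier size: 1 byte for each started block of 62 columns."""
    if total_columns < 1:
        raise ValueError("a table has at least one column")
    return (total_columns + 61) // 62  # integer ceiling of total_columns / 62


def integer_bytes(value: int) -> int:
    """Approximate size of an INT, BIGINT, SMALLINT, or TINYINT value."""
    # str() never emits leading zeroes; trailing zeroes are trimmed explicitly.
    digits = str(abs(value)).rstrip("0")
    significant = max(len(digits), 1)  # assumption: 0 counts as one digit
    # About 1 byte per two significant digits (rounded up, an assumption) + 1 byte.
    return (significant + 1) // 2 + 1


def text_bytes(value: str) -> int:
    """ASCII, TEXT, and VARCHAR values take their UTF-8 encoded length."""
    return len(value.encode("utf-8"))


def collection_bytes(column_id: int, element_sizes: list[int]) -> int:
    """LIST or MAP size: column id + nested elements + 3 bytes of metadata."""
    # Assumption: each element's 1 byte of metadata is added to its own size.
    nested = sum(size + 1 for size in element_sizes)
    return column_id + nested + 3


print(column_id_bytes(5))                                  # table with 5 columns
print(integer_bytes(5))                                    # single-digit integer
print(text_bytes("héllo"))                                 # non-ASCII characters take more than 1 byte
print(collection_bytes(1, [integer_bytes(1), integer_bytes(2)]))
```

Because the integer rule is stated as approximate, treat the results as estimates rather than exact billing figures.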
The total size of an encoded row of data is based on the following formula:
partition key columns + clustering columns + regular columns + row metadata = total encoded size of row
Important
All column metadata, for example column ids, partition key metadata, clustering column metadata, as well as client-side timestamps and row metadata count towards the maximum row size of 1 MB.
Consider the following example of a table where all columns are of type integer. The table has two partition key columns, two clustering columns, and one regular column. Because this table has five columns, the space required for the column name identifier is 1 byte.
CREATE TABLE mykeyspace.mytable (pk_col1 int, pk_col2 int, ck_col1 int, ck_col2 int, reg_col1 int, PRIMARY KEY ((pk_col1, pk_col2), ck_col1, ck_col2));
In this example, we calculate the size of data when we write a row to the table as shown in the following statement:
INSERT INTO mykeyspace.mytable (pk_col1, pk_col2, ck_col1, ck_col2, reg_col1) VALUES (1, 2, 3, 4, 5);
To estimate the total bytes required by this write operation, you can use the following steps.
Calculate the size of a partition key column by adding the bytes for the data type stored in the column and the metadata bytes. Repeat this for all partition key columns.
Calculate the size of the first column of the partition key (pk_col1):
(2 bytes for the integer data type) x 2 + 1 byte for the column id + 3 bytes for partition key metadata = 8 bytes
Calculate the size of the second column of the partition key (pk_col2):
(2 bytes for the integer data type) x 2 + 1 byte for the column id + 3 bytes for partition key metadata = 8 bytes
Add both columns to get the total estimated size of the partition key columns:
8 bytes + 8 bytes = 16 bytes for the partition key columns
Calculate the size of each clustering column by adding the bytes for the data type stored in the column and the metadata bytes. Repeat this for all clustering columns.
Calculate the size of the first clustering column (ck_col1):
(2 bytes for the integer data type) x 2 + 1 byte for clustering column metadata (20% of the 2-byte data value, rounded up to a whole byte) + 1 byte for the column id = 6 bytes
Calculate the size of the second clustering column (ck_col2):
(2 bytes for the integer data type) x 2 + 1 byte for clustering column metadata (20% of the 2-byte data value, rounded up to a whole byte) + 1 byte for the column id = 6 bytes
Add both columns to get the total estimated size of the clustering columns:
6 bytes + 6 bytes = 12 bytes for the clustering columns
Add the size of the regular columns. In this example, there is only one regular column. It stores a single-digit integer, which requires 2 bytes for the data value plus 1 byte for the column id, for a total of 3 bytes.
Finally, to get the total encoded row size, add up the bytes for all columns and add the additional 100 bytes for row metadata:
16 bytes for the partition key columns + 12 bytes for clustering columns + 3 bytes for the regular column + 100 bytes for row metadata = 131 bytes.
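The worked example can be reproduced with a short Python sketch. The function and constant names are hypothetical and chosen for illustration only. The sketch assumes that clustering column metadata is rounded up to a whole byte (1 byte per started 5 bytes of data), and it takes the 2-byte integer size and the 1-byte column id from the example as given inputs.

```python
import math

ROW_METADATA_BYTES = 100          # fixed overhead added to every row
PARTITION_KEY_METADATA_BYTES = 3  # assume the full 3 bytes per partition key column


def partition_key_column_bytes(data_bytes: int, column_id: int) -> int:
    # Key column values are stored twice, plus column id and key metadata.
    return data_bytes * 2 + column_id + PARTITION_KEY_METADATA_BYTES


def clustering_column_bytes(data_bytes: int, column_id: int) -> int:
    # 1 byte of metadata per 5 bytes of data; rounding up is an assumption.
    metadata = math.ceil(data_bytes / 5)
    return data_bytes * 2 + metadata + column_id


def regular_column_bytes(data_bytes: int, column_id: int) -> int:
    # Regular columns are stored once, plus the column id.
    return data_bytes + column_id


INT_BYTES = 2        # single-digit integer, as in the example
COLUMN_ID_BYTES = 1  # the table has five columns, so 1 byte

partition_key_total = sum(
    partition_key_column_bytes(INT_BYTES, COLUMN_ID_BYTES)
    for _ in ("pk_col1", "pk_col2")
)
clustering_total = sum(
    clustering_column_bytes(INT_BYTES, COLUMN_ID_BYTES)
    for _ in ("ck_col1", "ck_col2")
)
regular_total = regular_column_bytes(INT_BYTES, COLUMN_ID_BYTES)

total_row_bytes = (
    partition_key_total + clustering_total + regular_total + ROW_METADATA_BYTES
)
print(partition_key_total, clustering_total, regular_total, total_row_bytes)
# prints: 16 12 3 131
```

If client-side timestamps are turned on, their per-column overhead would need to be added separately; the sketch leaves it out to match the example.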
To learn how to monitor serverless resources with Amazon CloudWatch, see Monitoring Amazon Keyspaces with Amazon CloudWatch.