Working with queries in Amazon Keyspaces
This section gives an introduction into working with queries in Amazon Keyspaces (for Apache Cassandra). The CQL
statements available to query, transform, and manage data are SELECT
,
INSERT
, UPDATE
, and DELETE
. The following topics
outline some of the more complex options available when working with queries. For the
complete language syntax with examples, see DML statements (data manipulation language) in Amazon Keyspaces.
Topics
Using the IN
operator with the SELECT
Statement in Amazon Keyspaces
SELECT IN
You can query data from tables using the SELECT
statement, which reads
one or more columns for one or more rows in a table and returns a result-set containing
the rows matching the request. A SELECT
statement contains a
select_clause
that determines which columns to read and to return in
the result-set. The clause can contain instructions to transform the data before
returning it. The optional WHERE
clause specifies which rows must be
queried and is composed of relations on the columns that are part of the primary key.
Amazon Keyspaces supports the IN
keyword in the WHERE
clause.
This section uses examples to show how Amazon Keyspaces processes SELECT
statements with the
IN
keyword.
This examples demonstrates how Amazon Keyspaces breaks down the SELECT
statement with the IN
keyword
into subqueries. In this example we use a table with the name my_keyspace.customers
.
The table has one primary key column department_id
, two clustering columns sales_region_id
and sales_representative_id
, and one column
that contains the name of the customer in the customer_name
column.
SELECT * FROM my_keyspace.customers; department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 0 | 0 | 0 | a 0 | 0 | 1 | b 0 | 1 | 0 | c 0 | 1 | 1 | d 1 | 0 | 0 | e 1 | 0 | 1 | f 1 | 1 | 0 | g 1 | 1 | 1 | h
Using this table, you can run the following SELECT
statement to find the customers in the departments and
sales regions that you are interested in with the IN
keyword in the WHERE
clause. The following statement is an example of this.
SELECT * FROM my_keyspace.customers WHERE department_id IN (0, 1) AND sales_region_id IN (0, 1);
Amazon Keyspaces divides this statement into four subqueries as shown in the following output.
SELECT * FROM my_keyspace.customers WHERE department_id = 0 AND sales_region_id = 0; department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 0 | 0 | 0 | a 0 | 0 | 1 | b SELECT * FROM my_keyspace.customers WHERE department_id = 0 AND sales_region_id = 1; department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 0 | 1 | 0 | c 0 | 1 | 1 | d SELECT * FROM my_keyspace.customers WHERE department_id = 1 AND sales_region_id = 0; department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 1 | 0 | 0 | e 1 | 0 | 1 | f SELECT * FROM my_keyspace.customers WHERE department_id = 1 AND sales_region_id = 1; department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 1 | 1 | 0 | g 1 | 1 | 1 | h
When the IN
keyword is used, Amazon Keyspaces automatically paginates the results in any of the following cases:
After every 10th subquery is processed.
After processing 1MB of logical IO.
If you configured a
PAGE SIZE
, Amazon Keyspaces paginates after reading the number of queries for processing based on the setPAGE SIZE
.When you use the
LIMIT
keyword to reduce the number of rows returned, Amazon Keyspaces paginates after reading the number of queries for processing based on the setLIMIT
.
The following table is used to illustrate this with an example.
For more information about pagination, see Paginating results in Amazon Keyspaces.
SELECT * FROM my_keyspace.customers; department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 2 | 0 | 0 | g 2 | 1 | 1 | h 2 | 2 | 2 | i 0 | 0 | 0 | a 0 | 1 | 1 | b 0 | 2 | 2 | c 1 | 0 | 0 | d 1 | 1 | 1 | e 1 | 2 | 2 | f 3 | 0 | 0 | j 3 | 1 | 1 | k 3 | 2 | 2 | l
You can run the following statement on this table to see how pagination works.
SELECT * FROM my_keyspace.customers WHERE department_id IN (0, 1, 2, 3) AND sales_region_id IN (0, 1, 2) AND sales_representative_id IN (0, 1);
Amazon Keyspaces processes this statement as 24 subqueries, because the cardinality of the
Cartesian product of all the IN
terms contained in this query is 24.
department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 0 | 0 | 0 | a 0 | 1 | 1 | b 1 | 0 | 0 | d 1 | 1 | 1 | e ---MORE--- department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 2 | 0 | 0 | g 2 | 1 | 1 | h 3 | 0 | 0 | j ---MORE--- department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 3 | 1 | 1 | k
This example shows how you can use the ORDER BY
clause in a SELECT
statement
with the IN
keyword.
SELECT * FROM my_keyspace.customers WHERE department_id IN (3, 2, 1) ORDER BY sales_region_id DESC; department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 3 | 2 | 2 | l 3 | 1 | 1 | k 3 | 0 | 0 | j 2 | 2 | 2 | i 2 | 1 | 1 | h 2 | 0 | 0 | g 1 | 2 | 2 | f 1 | 1 | 1 | e 1 | 0 | 0 | d
Subqueries are processed in the order in which the partition key and clustering key columns are presented in the query. In the example below, subqueries for partition key value ”2“ are processed first, followed by subqueries for partition key value ”3“ and ”1“. Results of a given subquery are ordered according to the query's ordering clause, if present, or the table's clustering order defined during table creation.
SELECT * FROM my_keyspace.customers WHERE department_id IN (2, 3, 1) ORDER BY sales_region_id DESC; department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 2 | 2 | 2 | i 2 | 1 | 1 | h 2 | 0 | 0 | g 3 | 2 | 2 | l 3 | 1 | 1 | k 3 | 0 | 0 | j 1 | 2 | 2 | f 1 | 1 | 1 | e 1 | 0 | 0 | d
Ordering results in Amazon Keyspaces
The ORDER BY
clause specifies the sort order of the results returned
in a SELECT
statement. The statement takes a list of column names as arguments and for each column you can
specify the sort order for the data. You can only specify clustering columns in ordering clauses, non-clustering columns are not allowed.
The two available sort order options for the returned results are ASC
for ascending and DESC
for descending sort order.
SELECT * FROM my_keyspace.my_table ORDER BY (col1 ASC, col2 DESC, col3 ASC); col1 | col2 | col3 ------+------+------ 0 | 6 | a 1 | 5 | b 2 | 4 | c 3 | 3 | d 4 | 2 | e 5 | 1 | f 6 | 0 | g
SELECT * FROM my_keyspace.my_table ORDER BY (col1 DESC, col2 ASC, col3 DESC); col1 | col2 | col3 ------+------+------ 6 | 0 | g 5 | 1 | f 4 | 2 | e 3 | 3 | d 2 | 4 | c 1 | 5 | b 0 | 6 | a
If you don't specify the sort order in the query statement, the default ordering of the clustering column is used.
The possible sort orders you can use in an ordering clause depend on the sort order assigned to each clustering column at table creation. Query results can only be sorted in the order defined for all clustering columns at table creation or the inverse of the defined sort order. Other possible combinations are not allowed.
For example, if the table's CLUSTERING ORDER
is (col1 ASC, col2 DESC, col3 ASC), then the valid
parameters for ORDER BY
are either (col1 ASC, col2 DESC, col3 ASC) or (col1 DESC,
col2 ASC, col3 DESC). For more information on CLUSTERING ORDER
, see table_options
under CREATE TABLE.
Paginating results in Amazon Keyspaces
Amazon Keyspaces automatically paginates the results from SELECT
statements when the data read to process the SELECT
statement exceeds 1 MB. With pagination, the SELECT
statement results are divided into "pages" of data that
are 1 MB in size (or less). An application can process the first page of
results, then the second page, and so on. Clients should always check for pagination tokens when processing SELECT
queries that return
multiple rows.
If a client supplies a PAGE SIZE
that requires reading more than 1 MB of data,
Amazon Keyspaces breaks up the results automatically into multiple pages based on the 1 MB data-read increments.
For example, if the average size of a row is 100 KB and you specify a PAGE SIZE
of 20, Amazon Keyspaces paginates data automatically after it
reads 10 rows (1000 KB of data read).
Because Amazon Keyspaces paginates results based on the number of rows that it reads to process a request and not the number of rows returned in the result set, some pages may not contain any rows if you are running filtered queries.
For example, if you set PAGE SIZE
to 10 and Keyspaces evaluates 30 rows
to process your SELECT
query, Amazon Keyspaces will return three pages. If only a
subset of the rows matched your query, some pages may have less than 10 rows. For an
example how the PAGE SIZE
of LIMIT
queries can affect read
capacity, see Limit queries.