Aurora PostgreSQL Limitless Database 中的单分片查询 - Amazon Aurora
Amazon Web Services 文档中描述的 Amazon Web Services 服务或功能可能因区域而异。要查看适用于中国区域的差异,请参阅 中国的 Amazon Web Services 服务入门 (PDF)

Aurora PostgreSQL Limitless Database 中的单分片查询

单分片查询是一种可以在保持 SQL ACID 语义的同时直接在分片上运行的查询。当查询计划器在路由器上遇到这样的查询时,计划器会检测到它并继续将整个 SQL 查询向下推送到相应的分片。

此优化可减少从路由器到分片的网络往返行程数,从而提高性能。目前,此优化是针对 INSERTSELECTUPDATEDELETE 查询执行的。

单分片查询示例

在以下示例中,我们有带有分片键 customer_id 的分片表 customers 和引用表 zipcodes

SELECT
postgres_limitless=> EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM customers WHERE customer_id = 100; QUERY PLAN --------------------------------------------------------- Foreign Scan Output: customer_id, other_id, customer_name, balance Remote SQL: SELECT customer_id, other_id, customer_name, balance FROM public.customers WHERE (customer_id = 100) Single Shard Optimized (9 rows)
postgres_limitless=> EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM orders LEFT JOIN zipcodes ON orders.zipcode_id = zipcodes.zipcode_id WHERE customer_id = 11; QUERY PLAN --------------------------------------------------------------------------------------------------------- Foreign Scan Output: customer_id, order_id, zipcode_id, customer_name, balance, zipcodes.zipcode_id, zipcodes.city Remote SQL: SELECT orders.customer_id, orders.order_id, orders.zipcode_id, orders.customer_name, orders.balance, zipcodes.zipcode_id, zipcodes.city FROM (public.orders LEFT JOIN public.zipcodes ON ((orders.zipcode_id = zipcodes.zipcode_id))) WHERE (orders.customer_id = 11) Single Shard Optimized (13 rows)
INSERT
postgres_limitless=> EXPLAIN (VERBOSE, COSTS OFF) INSERT INTO customers (customer_id, other_id, customer_name, balance) VALUES (1, 10, 'saikiran', 1000); QUERY PLAN ------------------------------------------------------- Insert on public.customers -> Result Output: 1, 10, 'saikiran'::text, '1000'::real Single Shard Optimized (4 rows)
UPDATE
postgres_limitless=> EXPLAIN (VERBOSE, COSTS OFF) UPDATE orders SET balance = balance + 100 WHERE customer_id = 100; QUERY PLAN --------------------------------------------------------------------------------------------- Update on public.orders Foreign Update on public.orders_fs00002 orders_1 -> Foreign Update Remote SQL: UPDATE public.orders SET balance = (balance + (100)::double precision) WHERE (customer_id = 100) Single Shard Optimized (6 rows)
DELETE
postgres_limitless=> EXPLAIN (VERBOSE, COSTS OFF) DELETE FROM orders WHERE customer_id = 100 and balance = 0; QUERY PLAN --------------------------------------------------------------------- Delete on public.orders Foreign Delete on public.orders_fs00002 orders_1 -> Foreign Delete Remote SQL: DELETE FROM public.orders WHERE ((customer_id = 100) AND (balance = (0)::double precision)) Single Shard Optimized (6 rows)

单分片查询的限制

单分片查询有以下限制:

函数

如果单分片查询包含函数,则仅当满足以下条件之一时,该查询才有资格进行单分片优化:

  • 该函数不可变。有关更多信息,请参阅 函数波动性

  • 该函数可变,但已在 rds_aurora.limitless_distributed_functions 视图中注册。有关更多信息,请参阅 函数分布

视图

如果查询包含一个或多个视图,则如果查询具有以下条件之一,将禁用单分片优化:

  • 任何一个视图具有 security_barrier 属性。

  • 查询中使用的对象需要多个用户权限。例如,一个查询包含两个视图,这些视图在两个不同的用户下运行。

CREATE VIEW v1 AS SELECT customer_name FROM customers c WHERE c.customer_id = 1; CREATE VIEW v2 WITH (security_barrier) AS SELECT customer_name FROM customers c WHERE c.customer_id = 1; postgres_limitless=> EXPLAIN VERBOSE SELECT * FROM v1; QUERY PLAN ------------------------------------------------------------------------------------ Foreign Scan (cost=100.00..101.00 rows=100 width=0) Output: customer_name Remote Plans from Shard postgres_s3: Seq Scan on public.customers_ts00001 c (cost=0.00..24.12 rows=6 width=32) Output: c.customer_name Filter: (c.customer_id = 1) Query Identifier: -6005737533846718506 Remote SQL: SELECT customer_name FROM ( SELECT c.customer_name FROM public.customers c WHERE (c.customer_id = 1)) v1 Query Identifier: -5754424854414896228 (12 rows) postgres_limitless=> EXPLAIN VERBOSE SELECT * FROM v2; QUERY PLAN -------------------------------------------------------------------------------------------- Foreign Scan on public.customers_fs00001 c (cost=100.00..128.41 rows=7 width=32) Output: c.customer_name Remote Plans from Shard postgres_s3: Seq Scan on public.customers_ts00001 customers (cost=0.00..24.12 rows=6 width=32) Output: customers.customer_name Filter: (customers.customer_id = 1) Query Identifier: 4136563775490008117 Remote SQL: SELECT customer_name FROM public.customers WHERE ((customer_id = 1)) Query Identifier: 5056054318010163757 (9 rows)
PREPARE 和 EXECUTE 语句

Aurora PostgreSQL Limitless Database 支持对预处理 SELECT 语句进行单分片优化。

但是,如果预处理语句是 UPDATEDELETE,并且包含预处理变量,则查询计划器会拒绝对该查询进行单分片优化。如下例所示:Single Shard Optimized 指标缺失。

postgres_limitless=> PREPARE testStmt AS SELECT customer_name FROM customers c WHERE c.customer_id = $1; PREPARE postgres_limitless=> EXECUTE testStmt(1); customer_name --------------- (0 rows)
postgres_limitless=> EXPLAIN verbose EXECUTE testStmt(1); QUERY PLAN -------------------------------------------------------------------------- Foreign Scan (cost=100.00..101.00 rows=100 width=0) Output: customer_name Remote Plans from Shard postgres_s3: Seq Scan on public.customers_ts00001 c (cost=0.00..24.12 rows=6 width=32) Output: c.customer_name Filter: (c.customer_id = 1) Query Identifier: 1520926022284463170 Remote SQL: SELECT customer_name FROM public.customers c WHERE (customer_id = $1) Query Identifier: 1520926022284463170 (11 rows)
PL/pgSQL

带有 PL/pgSQL 变量的查询将作为隐式预处理语句运行。如果查询包含任何 PL/pgSQL 变量,则查询计划器会拒绝单分片优化。

如果语句不包含任何 PL/pgSQL 变量,则支持在 PL/pgSQL 块中进行优化。

完全限定(显式)联接

单分片优化基于分区消除。PostgreSQL 优化器会根据恒定条件消除分区。如果 Aurora PostgreSQL Limitless Database 发现所有剩余的分区和表都在同一个分片上,则它会将查询标记为符合单分片优化的条件。所有筛选条件都必须是显式的,分区消除才能起作用。如果语句中每个分片表的分片键上没有一个或多个联接谓词或筛选谓词,则 Aurora PostgreSQL Limitless Database 无法消除分区。

假设我们已经根据 customer_id 列对 customersordersorder_details 表进行了分区。在此架构下,应用程序会尝试将客户的所有数据保存在单个分片上。

请考虑以下查询:

SELECT * FROM customers c, orders o, order_details od WHERE c.customer_id = o.customer_id AND od.order_id = o.order_id AND c.customer_id = 1;

此查询检索客户的所有数据 (c.customer_id = 1)。该客户的数据位于单个分片上,但是 Aurora PostgreSQL Limitless Database 不能将此查询限定为单分片查询。查询的优化程序流程如下所示:

  1. 优化器可以根据以下条件消除 customersorders 的分区:

    c.customer_id = 1 c.customer_id = o.customer_id o.customer_id = 1 (transitive implicit condition)
  2. 优化器无法消除 order_details 中的任何分区,因为表中没有恒定条件。

  3. 优化器得出结论,它已经从 order_details 中读取了所有分区。因此,该查询不符合单分片优化的资格。

为了使其成为单分片查询,我们添加了以下显式联接条件:

o.customer_id = od.customer_id

更改后的查询类似于以下示例:

SELECT * FROM customers c, orders o, order_details od WHERE c.customer_id = o.customer_id AND o.customer_id = od.customer_id AND od. order_id = o. order_id AND c.customer_id = 1;

现在,优化器可以消除 order_details 的分区。新查询变为单分片查询并有资格进行优化。

设置活动分片密钥

此功能允许您在查询数据库时设置单个分片键,从而使所有 SELECT 和 DML 查询都将分片键作为常量谓词附加到数据库中。如果您已迁移到 Aurora PostgreSQL Limitless Database,并已通过向表中添加分片键对架构进行去规范化处理,则此功能非常有用。

您可以自动将分片键谓词附加到现有 SQL 逻辑中,而不必更改查询的语义。添加活动分片键谓词仅适用于兼容表

活动分片键功能使用 rds_aurora.limitless_active_shard_key 变量,其语法如下:

SET [session | local] rds_aurora.limitless_active_shard_key = '{"col1_value", "col2_value", ...}';

假设我们有一个在 customer_id 列上进行了分片的 customers 表。

BEGIN; SET local rds_aurora.limitless_create_table_mode='sharded'; SET local rds_aurora.limitless_create_table_shard_key='{"customer_id"}'; CREATE TABLE customers(customer_id int PRIMARY KEY, name text , email text); COMMIT;

如果设置了活跃的分片键,则查询会进行以下转换。

SELECT
SET rds_aurora.limitless_active_shard_key = '{"123"}'; SELECT * FROM customers; -- This statement is changed to: SELECT * FROM customers WHERE customer_id = '123'::int;
INSERT
SET rds_aurora.limitless_active_shard_key = '{"123"}'; INSERT INTO customers(name, email) VALUES('Alex', 'alex@example.com'); -- This statement is changed to: INSERT INTO customers(customer_id, name, email) VALUES('123'::int, 'Alex', 'alex@example.com');
UPDATE
SET rds_aurora.limitless_active_shard_key = '{"123"}'; UPDATE customers SET email = 'alex_new_email@example.com'; -- This statement is changed to: UPDATE customers SET email = 'alex_new_email@example.com' WHERE customer_id = '123'::int;
DELETE
SET rds_aurora.limitless_active_shard_key = '{"123"}'; DELETE FROM customers; -- This statement is changed to: DELETE FROM customers WHERE customer_id = '123'::int;
联接

对具有活动分片键的表执行联接操作时,分片键谓词会自动添加到联接中涉及的所有表中。只有当查询中的所有表都属于同一个搭配组时,才会自动添加分片键谓词。如果查询涉及来自不同搭配组的表,则会引发错误。

假设我们还有 ordersorder_details 表与 customers 表并置。

SET local rds_aurora.limitless_create_table_mode='sharded'; SET local rds_aurora.limitless_create_table_collocate_with='customers'; SET local rds_aurora.limitless_create_table_shard_key='{"customer_id"}'; CREATE TABLE orders (id int , customer_id int, total_amount int, date date); CREATE TABLE order_details (id int , order_id int, customer_id int, product_name VARCHAR(100), price int); COMMIT;

检索客户 ID 为 10 的客户的最近 10 张订单发票。

SET rds_aurora.limitless_active_shard_key = '{"10"}'; SELECT * FROM customers, orders, order_details WHERE orders.customer_id = customers.customer_id AND order_details.order_id = orders.order_id AND customers.customer_id = 10 order by order_date limit 10;

该查询转换为以下内容:

SELECT * FROM customers, orders, order_details WHERE orders.customer_id = customers.customer_id AND orders.order_id = order_details.order_id AND customers.customer_id = 10 AND order_details.customer_id = 10 AND orders.customer_id = 10 AND ORDER BY "order_date" LIMIT 10;
活动分片键兼容表

分片键谓词仅添加到与活动分片键兼容的表中。如果表的分片键中的列数与 rds_aurora.limitless_active_shard_key 变量中指定的列数相同,则该表被视为兼容。如果查询涉及与活动分片键不兼容的表,则系统会提示错误并停止查询。

例如:

-- Compatible table SET rds_aurora.limitless_active_shard_key = '{"10"}'; -- The following query works because the customers table is sharded on one column. SELECT * FROM customers; -- Incompatible table SET rds_aurora.limitless_active_shard_key = '{"10","20"}'; -- The following query raises a error because the customers table isn't sharded on two columns. SELECT * FROM customers;