Amazon Redshift
数据库开发人员指南 (API Version 2012-12-01)
AWS 服务或AWS文档中描述的功能,可能因地区/位置而异。点 击 Getting Started with Amazon AWS to see specific differences applicable to the China (Beijing) Region.

步骤 7:优化后重新测试系统性能

重新创建具有选定排序键、分配方式和压缩编码的测试数据集后,重新测试系统性能。

优化后重新测试系统性能

  1. 记录存储使用情况。

    查询 STV_BLOCKLIST 表,确定每张表使用多少个 1 MB 的磁盘空间数据块,并将结果记录到基准表中。

    Copy
    select stv_tbl_perm.name as "table", count(*) as "blocks (mb)" from stv_blocklist, stv_tbl_perm where stv_blocklist.tbl = stv_tbl_perm.id and stv_blocklist.slice = stv_tbl_perm.slice and stv_tbl_perm.name in ('customer', 'part', 'supplier', 'dwdate', 'lineorder') group by stv_tbl_perm.name order by 1 asc;

    您的结果与如下显示类似:

    Copy
    table | blocks (mb) -----------+----------------- customer 604 dwdate 160 lineorder 27152 part 200 supplier 236
  2. 检查分配偏斜。

    不均匀的分配或数据分配偏斜会导致某些节点执行的工作比其他节点多,从而制约查询性能。

    要检查分配偏斜,请查询 SVV_DISKUSAGE 系统视图。SVV_DISKUSAGE 中的每行都记录着一个磁盘数据块的统计数据。num_values 列给出了该磁盘数据块中的行数,因此,sum(num_values) 返回每个切片上的行数。

    执行下面的查询以查看 SSB 数据库中所有表的分配。

    Copy
    select trim(name) as table, slice, sum(num_values) as rows, min(minvalue), max(maxvalue) from svv_diskusage where name in ('customer', 'part', 'supplier', 'dwdate', 'lineorder') and col =0 group by name, slice order by name, slice;

    您的结果与如下显示类似:

    Copy
    table | slice | rows | min | max -----------+-------+----------+----------+----------- customer | 0 | 3000000 | 1 | 3000000 customer | 2 | 3000000 | 1 | 3000000 customer | 4 | 3000000 | 1 | 3000000 customer | 6 | 3000000 | 1 | 3000000 dwdate | 0 | 2556 | 19920101 | 19981230 dwdate | 2 | 2556 | 19920101 | 19981230 dwdate | 4 | 2556 | 19920101 | 19981230 dwdate | 6 | 2556 | 19920101 | 19981230 lineorder | 0 | 75029991 | 3 | 599999975 lineorder | 1 | 75059242 | 7 | 600000000 lineorder | 2 | 75238172 | 1 | 599999975 lineorder | 3 | 75065416 | 1 | 599999973 lineorder | 4 | 74801845 | 3 | 599999975 lineorder | 5 | 75177053 | 1 | 599999975 lineorder | 6 | 74631775 | 1 | 600000000 lineorder | 7 | 75034408 | 1 | 599999974 part | 0 | 175006 | 15 | 1399997 part | 1 | 175199 | 1 | 1399999 part | 2 | 175441 | 4 | 1399989 part | 3 | 175000 | 3 | 1399995 part | 4 | 175018 | 5 | 1399979 part | 5 | 175091 | 11 | 1400000 part | 6 | 174253 | 2 | 1399969 part | 7 | 174992 | 13 | 1399996 supplier | 0 | 1000000 | 1 | 1000000 supplier | 2 | 1000000 | 1 | 1000000 supplier | 4 | 1000000 | 1 | 1000000 supplier | 6 | 1000000 | 1 | 1000000 (28 rows)

    下图显示了三个最大的表的分配。(列不按比例。)注意,CUSTOMER 使用了 ALL 分配,因此,它只分配到每个节点上的一个切片。

    分配相对均匀,因此,您无需为分配偏斜进行调整。

  3. 对每个查询运行 EXPLAIN 命令以查看查询计划。

    下面的示例显示了查询 2 的 EXPLAIN 命令。

    Copy
    explain select sum(lo_revenue), d_year, p_brand1 from lineorder, dwdate, part, supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_category = 'MFGR#12' and s_region = 'AMERICA' group by d_year, p_brand1 order by d_year, p_brand1;

    注意,在查询 2 的 EXPLAIN 计划中,DS_BCAST_INNER 标签已被 DS_DIST_ALL_NONE 和 DS_DIST_NONE 取代,这意味着无需为这些步骤执行重新分配操作,因此,查询的运行速度应会快得多。

    Copy
    QUERY PLAN XN Merge (cost=1000014243538.45..1000014243539.15 rows=280 width=20) Merge Key: dwdate.d_year, part.p_brand1 -> XN Network (cost=1000014243538.45..1000014243539.15 rows=280 width=20) Send to leader -> XN Sort (cost=1000014243538.45..1000014243539.15 rows=280 width=20) Sort Key: dwdate.d_year, part.p_brand1 -> XN HashAggregate (cost=14243526.37..14243527.07 rows=280 width=20) -> XN Hash Join DS_DIST_ALL_NONE (cost=30643.30..14211277.03 rows=4299912 Hash Cond: ("outer".lo_orderdate = "inner".d_datekey) -> XN Hash Join DS_DIST_ALL_NONE (cost=30611.35..14114497.06 Hash Cond: ("outer".lo_suppkey = "inner".s_suppkey) -> XN Hash Join DS_DIST_NONE (cost=17640.00..13758507.64 Hash Cond: ("outer".lo_partkey = "inner".p_partkey) -> XN Seq Scan on lineorder (cost=0.00..6000378.88 -> XN Hash (cost=17500.00..17500.00 rows=56000 width=16) -> XN Seq Scan on part (cost=0.00..17500.00 Filter: ((p_category)::text = 'MFGR#12'::text) -> XN Hash (cost=12500.00..12500.00 rows=188541 width=4) -> XN Seq Scan on supplier (cost=0.00..12500.00 Filter: ((s_region)::text = 'AMERICA'::text) -> XN Hash (cost=25.56..25.56 rows=2556 width=8) -> XN Seq Scan on dwdate (cost=0.00..25.56 rows=2556 width=8)
  4. 再次运行相同的测试查询。

    如您之前所做的那样,运行以下查询两次,以消除编译时间。在基准表中记录第二次执行每个查询的时间。

    Copy
    -- Query 1 -- Restrictions on only one dimension. select sum(lo_extendedprice*lo_discount) as revenue from lineorder, dwdate where lo_orderdate = d_datekey and d_year = 1997 and lo_discount between 1 and 3 and lo_quantity < 24; -- Query 2 -- Restrictions on two dimensions select sum(lo_revenue), d_year, p_brand1 from lineorder, dwdate, part, supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_category = 'MFGR#12' and s_region = 'AMERICA' group by d_year, p_brand1 order by d_year, p_brand1; -- Query 3 -- Drill down in time to just one month select c_city, s_city, d_year, sum(lo_revenue) as revenue from customer, lineorder, supplier, dwdate where lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_orderdate = d_datekey and (c_city='UNITED KI1' or c_city='UNITED KI5') and (s_city='UNITED KI1' or s_city='UNITED KI5') and d_yearmonth = 'Dec1997' group by c_city, s_city, d_year order by d_year asc, revenue desc;

下面的基准表显示了基于本示例中所用群集的结果。您的结果可能会因多方面的原因而有所变化,但应该相差不大。

基准 优化前 优化后
加载时间(五张表) 10m 23s 12m 15s
占用存储
LINEORDER 51024 27152
PART 200 200
CUSTOMER 384 604
DWDATE 160 160
SUPPLIER 152 236
总存储空间 51920 28352
查询执行时间
查询 1 6.97 3.19
查询 2 12.81 9.02
查询 3 13.39 10.54
总执行时间 33.17 22.75

下一步

步骤 8:评估结果