确定具有数据偏斜或未排序行的表 - Amazon Redshift
Amazon Web Services 文档中描述的 Amazon Web Services 服务或功能可能因区域而异。要查看适用于中国区域的差异,请参阅 中国的 Amazon Web Services 服务入门 (PDF)

确定具有数据偏斜或未排序行的表

以下查询标识数据分配不均匀(数据偏斜)或未排序行百分比较高的表。

skew 值较低表明表数据分配适当。如果表的 skew 值达到 4.00 或以上,可以考虑修改其数据分配方式。有关更多信息,请参阅 非最优数据分配

如果表的 pct_unsorted 值大于 20%,可以考虑运行 VACUUM 命令。有关更多信息,请参阅 未排序或排序错乱的行

还应检查每个表的 mbytespct_of_total 值。这些列标识表的大小及表占用的原始磁盘空间比例。原始磁盘空间包括 Amazon Redshift 保留供内部使用的空间,因此它大于名义磁盘容量(可供用户使用的磁盘空间量)。使用这些信息可验证可用磁盘空间等于最大表大小的 2.5 倍或以上。如果使用此空间,系统可以在处理复杂查询时将临时结果写入磁盘。

select trim(pgn.nspname) as schema, trim(a.name) as table, id as tableid, decode(pgc.reldiststyle,0, 'even',1,det.distkey ,8,'all') as distkey, dist_ratio.ratio::decimal(10,4) as skew, det.head_sort as "sortkey", det.n_sortkeys as "#sks", b.mbytes, decode(b.mbytes,0,0,((b.mbytes/part.total::decimal)*100)::decimal(5,2)) as pct_of_total, decode(det.max_enc,0,'n','y') as enc, a.rows, decode( det.n_sortkeys, 0, null, a.unsorted_rows ) as unsorted_rows , decode( det.n_sortkeys, 0, null, decode( a.rows,0,0, (a.unsorted_rows::decimal(32)/a.rows)*100) )::decimal(5,2) as pct_unsorted from (select db_id, id, name, sum(rows) as rows, sum(rows)-sum(sorted_rows) as unsorted_rows from stv_tbl_perm a group by db_id, id, name) as a join pg_class as pgc on pgc.oid = a.id join pg_namespace as pgn on pgn.oid = pgc.relnamespace left outer join (select tbl, count(*) as mbytes from stv_blocklist group by tbl) b on a.id=b.tbl inner join (select attrelid, min(case attisdistkey when 't' then attname else null end) as "distkey", min(case attsortkeyord when 1 then attname else null end ) as head_sort , max(attsortkeyord) as n_sortkeys, max(attencodingtype) as max_enc from pg_attribute group by 1) as det on det.attrelid = a.id inner join ( select tbl, max(mbytes)::decimal(32)/min(mbytes) as ratio from (select tbl, trim(name) as name, slice, count(*) as mbytes from svv_diskusage group by tbl, name, slice ) group by tbl, name ) as dist_ratio on a.id = dist_ratio.tbl join ( select sum(capacity) as total from stv_partitions where part_begin=0 ) as part on 1=1 where mbytes is not null order by mbytes desc;