Amazon Redshift
数据库开发人员指南 (API Version 2012-12-01)
AWS 服务或AWS文档中描述的功能,可能因地区/位置而异。点 击 Getting Started with Amazon AWS to see specific differences applicable to the China (Beijing) Region.

步骤 4:在 Amazon S3 中查询您的数据

在创建外部表之后,您可使用用于查询其他 Amazon Redshift 表的同一 SELECT 语句查询外部表。这些 SELECT 语句查询包括联接表、聚合数据和筛选谓词。

在 Amazon S3 中查询您的数据

  1. 获取 SPECTRUM.SALES 表中的行数。

    Copy
    select count(*) from spectrum.sales;
    count ------ 172462
  2. 作为最佳实践,将较大的事实数据表保存在 Amazon S3 中并将较小的维度表保存在 Amazon Redshift 中。如果您已加载 Amazon Redshift 入门中的示例数据,您的数据库中将有一个名为 EVENT 的表。如果没有,请使用以下命令创建 EVENT 表。

    Copy
    create table event( eventid integer not null distkey, venueid smallint not null, catid smallint not null, dateid smallint not null sortkey, eventname varchar(200), starttime timestamp);
  3. 通过将以下 COPY 命令中的 IAM 角色 ARN 替换为步骤 1 中创建的角色 ARN 来加载 EVENT 表。

    Copy
    copy event from 's3://awssampledbuswest2/tickit/allevents_pipe.txt' iam_role 'arn:aws:iam::123456789012:role/mySpectrumRole' delimiter '|' timeformat 'YYYY-MM-DD HH:MI:SS' region 'us-west-2';

    以下示例将外部表 SPECTRUM.SALES 与本地表 EVENT 联接以查找排名前十的活动的销量总额。

    Copy
    select top 10 spectrum.sales.eventid, sum(spectrum.sales.pricepaid) from spectrum.sales, event where spectrum.sales.eventid = event.eventid and spectrum.sales.pricepaid > 30 group by spectrum.sales.eventid order by 2 desc;
    eventid | sum --------+--------- 289 | 51846.00 7895 | 51049.00 1602 | 50301.00 851 | 49956.00 7315 | 49823.00 6471 | 47997.00 2118 | 47863.00 984 | 46780.00 7851 | 46661.00 5638 | 46280.00
  4. 查看上一查询的查询计划。注意针对 Amazon S3 上的数据执行的 S3 Seq ScanS3 HashAggregateS3 Query Scan 步骤。

    Copy
    explain select top 10 spectrum.sales.eventid, sum(spectrum.sales.pricepaid) from spectrum.sales, event where spectrum.sales.eventid = event.eventid and spectrum.sales.pricepaid > 30 group by spectrum.sales.eventid order by 2 desc;
    Copy
    QUERY PLAN ----------------------------------------------------------------------------- XN Limit (cost=1001055770628.63..1001055770628.65 rows=10 width=31) -> XN Merge (cost=1001055770628.63..1001055770629.13 rows=200 width=31) Merge Key: sum(sales.derived_col2) -> XN Network (cost=1001055770628.63..1001055770629.13 rows=200 width=31) Send to leader -> XN Sort (cost=1001055770628.63..1001055770629.13 rows=200 width=31) Sort Key: sum(sales.derived_col2) -> XN HashAggregate (cost=1055770620.49..1055770620.99 rows=200 width=31) -> XN Hash Join DS_BCAST_INNER (cost=3119.97..1055769620.49 rows=200000 width=31) Hash Cond: ("outer".derived_col1 = "inner".eventid) -> XN S3 Query Scan sales (cost=3010.00..5010.50 rows=200000 width=31) -> S3 HashAggregate (cost=3010.00..3010.50 rows=200000 width=16) -> S3 Seq Scan spectrum.sales location:"s3://awssampledbuswest2/tickit/spectrum/sales" format:TEXT (cost=0.00..2150.00 rows=172000 width=16) Filter: (pricepaid > 30.00) -> XN Hash (cost=87.98..87.98 rows=8798 width=4) -> XN Seq Scan on event (cost=0.00..87.98 rows=8798 width=4)