Amazon Redshift
数据库开发人员指南 (API Version 2012-12-01)
AWS 服务或AWS文档中描述的功能,可能因地区/位置而异。点 击 Getting Started with Amazon AWS to see specific differences applicable to the China (Beijing) Region.

CREATE EXTERNAL TABLE

在指定架构中创建一个新外部表。所有外部表必须在外部架构中创建。外部架构和外部表不支持搜索路径。有关更多信息,请参阅 CREATE EXTERNAL SCHEMA

对外部表的访问由对外部架构的访问权限控制。您无法对外部表的权限执行 GRANTREVOKE 操作。相反,您可授予或撤销对外部架构的权限。

除了使用 CREATE EXTERNAL TABLE 命令创建的外部表之外,Amazon Redshift 还可引用 Amazon Athena 数据目录或 Hive 元存储中定义的外部表。使用 CREATE EXTERNAL SCHEMA 命令可注册在 Athena 数据目录或 Hive 元存储中定义的外部数据库并使外部表可在 Amazon Redshift 中使用。如果外部表已存在于 Athena 数据目录或 Hive 元存储中,您无需使用 CREATE EXTERNAL TABLE 创建该表。要查看外部表,请查询 SVV_EXTERNAL_TABLES 系统视图。

您可使用用于其他 Amazon Redshift 表的同一 SELECT 语法查询外部表。外部表为只读。您无法对外部表进行写入。

您无法在事务 (BEGIN … END) 内执行 CREATE EXTERNAL TABLE。

语法

CREATE EXTERNAL TABLE external_schema.table_name (column_name data_type [, …] ) [ PARTITIONED BY (col_name data_type [, … ] )] [ ROW FORMAT DELIMITED row_format ] STORED AS format LOCATION 's3://bucket/folder/' [ TABLE PROPERTIES ( 'numRows'='row_count' ) ]

参数

external_schema.table_name

要创建的表的名称 (由外部架构名称进行限定)。外部表必须在外部架构中创建。有关更多信息,请参阅 CREATE EXTERNAL SCHEMA

表名称的最大长度为 127 个字节;更长的名称将被截断为 127 个字节。您可以使用 UTF-8 多字节字符,每个字符最多为四个字节。Amazon Redshift 强制实施每个集群 9,900 个表的限制,包括用户定义的临时表以及 Amazon Redshift 在查询处理或系统维护期间创建的临时表。您也可以使用数据库名称限定表名称。在下面的示例中,spectrum_db 是数据库名称,spectrum_schema 是外部架构名称,而 test 是表名称。

Copy
create external table spectrum_db.spectrum_schema.test (c1 int) stored as textfile location 's3://mybucket/myfolder/';

如果指定数据库或架构不存在,则不会创建表,并且语句将返回错误。您无法在系统数据库 template0template1padb_harvest 中创建表或视图。

表名称对于指定架构必须是唯一名称。

有关有效名称的更多信息,请参阅名称和标识符

( column_name data_type )

要创建的每个列的名称和数据类型。

列名称的最大长度为 127 个字节;更长的名称将被截断为 127 个字节。您可以使用 UTF-8 多字节字符,每个字符最多为四个字节。可在单个表中定义的列的最大数目为 1,600。有关有效名称的更多信息,请参阅名称和标识符

注意

如果您创建的是“宽表”,请注意,在加载和查询处理期间,不要让列列表超出中间结果的行宽度边界。有关更多信息,请参阅 使用说明

data_type

支持以下数据类型

  • SMALLINT (INT2)

  • INTEGER (INT, INT4)

  • BIGINT (INT8)

  • DECIMAL (NUMERIC)

  • REAL (FLOAT4)

  • DOUBLE PRECISION (FLOAT8)

  • BOOLEAN (BOOL)

  • CHAR (CHARACTER)

  • VARCHAR (CHARACTER VARYING)

  • DATE (DATE 数据类型只能用作分区列)

  • TIMESTAMP

文本文件中的时间戳值的格式必须为 yyyy-MM-dd HH:mm:ss.SSSSSS,如以下时间戳值所示:2017-05-01 11:30:59.000000

PARTITIONED BY (col_name data_type [, … ] )

用于定义包含一个或多个分区列的已分区表的子句。单独的数据目录用于每个指定的组合,这在某些情况下可提高查询性能。已分区列在表数据本身中不存在。如果您将与表列相同的某个值用于 col_name,则会产生错误。

在创建已分区表后,请使用 ALTER TABLE … ADD PARTITION 语句修改表以添加分区。在添加分区时,您应定义包含分区数据的子文件夹在 Amazon S3 上的位置。您在每个 ALTER TABLE 语句中只能添加一个分区。

例如,如果表 spectrum.lineitem_part 是使用 PARTITIONED BY (l_shipdate date) 定义的,请执行以下 ALTER TABLE 命令来添加分区。

Copy
ALTER TABLE spectrum.lineitem_part ADD PARTITION (l_shipdate='1992-01-29') LOCATION 's3://spectrum-public/lineitem_partition/l_shipdate=1992-01-29';

要查看分区,请查询 SVV_EXTERNAL_PARTITIONS 系统视图。

ROW FORMAT DELIMITED rowformat

用于指定基础数据的格式的子句。rowformat 的可能值如下所示:

  • FIELDS TERMINATED BY 'delimiter'

  • LINES TERMINATED BY 'delimiter'

如果忽略了 ROW FORMAT DELIMITED,则默认格式为由“\A”终止的字段和由“\n”终止的行。

STORED AS format

数据文件的文件格式。

有效格式如下所示:

  • AVRO

  • PARQUET

  • RCFILE (仅针对使用 ColumnarSerDe 而不是 LazyBinaryColumnarSerDe 的数据)

  • SEQUENCEFILE

  • TEXTFILE

LOCATION 's3://bucket/folder/'

包含数据文件的 Amazon S3 文件夹的路径 - 例如“s3://mybucket/custdata/”。该存储桶必须与集群位于同一区域。支持的区域包括 us-east-1、us-east-2 和 us-west-2。

Redshift Spectrum 扫描指定的文件夹和任意子文件夹中的文件。Redshift Spectrum 将忽略隐藏文件以及以句点、下划线或哈希标记 (.、_ 或 #) 开头或以波形符结尾的文件。

TABLE PROPERTIES ( 'numRows'='row_count' )

用于为表定义设置 numRows 值的子句。要明确更新外部表的统计数据,请设置 numRows 属性来指示表的大小。Amazon Redshift 不分析外部表来生成表统计数据,查询优化程序会使用这些统计数据来生成查询计划。如果没有为外部表设置表统计数据,则 Amazon Redshift 假设外部表是较大的表,本地表是较小的表,以此来生成查询执行计划。

注意

numRows 属性区分大小写。

使用说明

您无法查看所用的资源与您用于标准 Amazon Redshift 表 (如 PG_TABLE_DEFSTV_TBL_PERM、PG_CLASS 或 information_schema) 的资源相同的 Amazon Redshift Spectrum 表的详细信息。如果您的商业智能或分析工具无法识别 Redshift Spectrum 外部表,请将您的应用程序为配置查询 SVV_EXTERNAL_TABLESSVV_EXTERNAL_COLUMNS

示例

以下示例在名为 spectrum 的 Amazon Redshift 外部架构中创建一个名为 SALES 的表。数据位于制表符分隔的文本文件中。TABLE PROPERTIES 子句将 numRows 属性设置为 170000 行。

Copy
create external table spectrum.sales( salesid integer, listid integer, sellerid integer, buyerid integer, eventid integer, dateid smallint, qtysold smallint, pricepaid decimal(8,2), commission decimal(8,2), saletime timestamp) row format delimited fields terminated by '\t' stored as textfile location 's3://awssampledbuswest2/tickit/spectrum/sales/' table properties ('numRows'='170000');

要创建按日期分区的外部表,请运行以下命令。

Copy
create external table spectrum.sales_part( salesid integer, listid integer, sellerid integer, buyerid integer, eventid integer, dateid smallint, qtysold smallint, pricepaid decimal(8,2), commission decimal(8,2), saletime timestamp) partitioned by (saledate date) row format delimited fields terminated by '|' stored as textfile location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/' table properties ('numRows'='170000');

要添加分区,请运行以下 ALTER TABLE 命令。

Copy
alter table spectrum.sales_part add partition(saledate='2008-01-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-01/'; alter table spectrum.sales_part add partition(saledate='2008-02-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-02/'; alter table spectrum.sales_part add partition(saledate='2008-03-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-03/'; alter table spectrum.sales_part add partition(saledate='2008-04-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-04/'; alter table spectrum.sales_part add partition(saledate='2008-05-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-05/'; alter table spectrum.sales_part add partition(saledate='2008-06-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-06/'; alter table spectrum.sales_part add partition(saledate='2008-07-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-07/'; alter table spectrum.sales_part add partition(saledate='2008-08-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-08/'; alter table spectrum.sales_part add partition(saledate='2008-09-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-09/'; alter table spectrum.sales_part add partition(saledate='2008-10-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-10/'; alter table spectrum.sales_part add partition(saledate='2008-11-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-11/'; alter table spectrum.sales_part add partition(saledate='2008-12-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-12/';

运行以下查询可选择已分区表中的数据。

Copy
select top 10 spectrum.sales_part.eventid, sum(spectrum.sales_part.pricepaid) from spectrum.sales_part, event where spectrum.sales_part.eventid = event.eventid and spectrum.sales_part.pricepaid > 30 and saledate = '2008-12-01' group by spectrum.sales_part.eventid order by 2 desc;
eventid | sum --------+--------- 914 | 36173.00 5478 | 27303.00 5061 | 26383.00 4406 | 26252.00 5324 | 24015.00 1829 | 23911.00 3601 | 23616.00 3665 | 23214.00 6069 | 22869.00 5638 | 22551.00

要查看外部表分区,请查询 SVV_EXTERNAL_PARTITIONS 系统视图。

Copy
select schemaname, tablename, values, location from svv_external_partitions where tablename = 'sales_part';
Copy
schemaname | tablename | values | location -----------+------------+----------------+-------------------------------------------------- spectrum | sales_part | ["2008-01-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-01 spectrum | sales_part | ["2008-02-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-02 spectrum | sales_part | ["2008-03-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-03 spectrum | sales_part | ["2008-04-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-04 spectrum | sales_part | ["2008-05-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-05 spectrum | sales_part | ["2008-06-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-06 spectrum | sales_part | ["2008-07-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-07 spectrum | sales_part | ["2008-08-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-08 spectrum | sales_part | ["2008-09-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-09 spectrum | sales_part | ["2008-10-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-10 spectrum | sales_part | ["2008-11-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-11 spectrum | sales_part | ["2008-12-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-12

SVV_EXTERNAL_DATABASES 系统视图中查找外部数据目录中的现有数据库的列表

Copy
select eskind,databasename,esoptions from svv_external_databases order by databasename;
eskind | databasename | esoptions -------+--------------+---------------------------------------------------------------------------------- 1 | default | {"REGION":"us-west-2","IAM_ROLE":"arn:aws:iam::123456789012:role/mySpectrumRole"} 1 | sampledb | {"REGION":"us-west-2","IAM_ROLE":"arn:aws:iam::123456789012:role/mySpectrumRole"} 1 | spectrumdb | {"REGION":"us-west-2","IAM_ROLE":"arn:aws:iam::123456789012:role/mySpectrumRole"} 

要查看外部表的详细信息,请查询 SVV_EXTERNAL_TABLESSVV_EXTERNAL_COLUMNS 系统视图。

以下示例将查询 SVV_EXTERNAL_TABLES 视图。

Copy
select schemaname, tablename, location from svv_external_tables;
schemaname | tablename | location -----------+----------------------+-------------------------------------------------------- spectrum | sales | s3://awssampledbuswest2/tickit/spectrum/sales spectrum | sales_part | s3://awssampledbuswest2/tickit/spectrum/sales_partition spectrum | sales_part_timestamp | s3://rs-spectrum/sales_partition

以下示例将查询 SVV_EXTERNAL_COLUMNS 视图。

Copy
select * from svv_external_columns where schemaname like 'rsspectrum%' and tablename ='listing';

 schemaname | tablename | columnname | external_type | part_key ------------+-----------+----------------+---------------+---------- rsspectrum | listing | dateid | smallint | 0 rsspectrum | listing | eventid | int | 0 rsspectrum | listing | listid | int | 0 rsspectrum | listing | listtime | timestamp | 0 rsspectrum | listing | numtickets | smallint | 0 rsspectrum | listing | priceperticket | decimal(8,2) | 0 rsspectrum | listing | sellerid | int | 0 rsspectrum | listing | totalprice | decimal(8,2) | 0 (8 rows)

使用以下查询可查看表分区。

Copy
select schemaname, tablename, values, location from svv_external_partitions where tablename = 'sales_part';
Copy
schemaname | tablename | values | location -----------+------------+----------------+------------------------------------------------------------------------- spectrum | sales_part | ["2008-01-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-01 spectrum | sales_part | ["2008-02-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-02 spectrum | sales_part | ["2008-03-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-03 spectrum | sales_part | ["2008-04-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-04 spectrum | sales_part | ["2008-05-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-05 spectrum | sales_part | ["2008-06-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-06 spectrum | sales_part | ["2008-07-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-07 spectrum | sales_part | ["2008-08-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-08 spectrum | sales_part | ["2008-09-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-09 spectrum | sales_part | ["2008-10-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-10 spectrum | sales_part | ["2008-11-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-11 spectrum | sales_part | ["2008-12-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-12

本页内容: