读取时合并(MoR)创建表示例 - Amazon Athena
Amazon Web Services 文档中描述的 Amazon Web Services 服务或功能可能因区域而异。要查看适用于中国区域的差异,请参阅 中国的 Amazon Web Services 服务入门 (PDF)

读取时合并(MoR)创建表示例

Hudi 在元数据仓中为 MoR 创建了两个表:一个用于快照查询的表,一个用于读取优化查询的表。两个表都可查询。在 0.5.1 之前的 Hudi 版本中,用于读优化查询的表具有您在创建表时指定的名称。从 Hudi 版本 0.5.1 开始,表名将默认以 _ro 为后缀。用于快照查询的表的名称是指定的名称,其中附加了 _rt

未分区的读取时合并(MoR)表

以下示例在 Athena 中创建了一个未分区的 MoR 表用于读取优化查询。请注意,读取优化查询使用输入格式 HoodieParquetInputFormat

CREATE EXTERNAL TABLE `nonpartition_mor`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `event_id` string, `event_time` string, `event_name` string, `event_guests` int, `event_type` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/folder/nonpartition_mor/'

以下示例在 Athena 中创建了一个未分区的 MoR 表用于快照查询。对于快照查询,请使用输入格式 HoodieParquetRealtimeInputFormat

CREATE EXTERNAL TABLE `nonpartition_mor_rt`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `event_id` string, `event_time` string, `event_name` string, `event_guests` int, `event_type` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/folder/nonpartition_mor/'

已分区的读取时合并(MoR)表

以下示例在 Athena 中创建了一个已分区的 MoR 表用于读取优化查询。

CREATE EXTERNAL TABLE `partition_mor`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `event_id` string, `event_time` string, `event_name` string, `event_guests` int) PARTITIONED BY ( `event_type` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/folder/partition_mor/'

以下 ALTER TABLE ADD PARTITION 示例将两个分区添加到了示例 partition_mor 表。

ALTER TABLE partition_mor ADD PARTITION (event_type = 'one') LOCATION 's3://amzn-s3-demo-bucket/folder/partition_mor/one/' PARTITION (event_type = 'two') LOCATION 's3://amzn-s3-demo-bucket/folder/partition_mor/two/'

以下示例在 Athena 中创建了一个已分区的 MoR 表用于快照查询。

CREATE EXTERNAL TABLE `partition_mor_rt`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `event_id` string, `event_time` string, `event_name` string, `event_guests` int) PARTITIONED BY ( `event_type` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/folder/partition_mor/'

同样,以下 ALTER TABLE ADD PARTITION 示例将两个分区添加到了示例 partition_mor_rt 表。

ALTER TABLE partition_mor_rt ADD PARTITION (event_type = 'one') LOCATION 's3://amzn-s3-demo-bucket/folder/partition_mor/one/' PARTITION (event_type = 'two') LOCATION 's3://amzn-s3-demo-bucket/folder/partition_mor/two/'