使用 HCatalog - Amazon EMR
Amazon Web Services 文档中描述的 Amazon Web Services 服务或功能可能因区域而异。要查看适用于中国区域的差异,请参阅 中国的 Amazon Web Services 服务入门 (PDF)

使用 HCatalog

您可以在使用 Hive 元存储的各种应用程序中使用 HCatalog。此部分中的示例演示如何创建表以及如何在 Pig 和 Spark SQL 上下文中使用该表。

使用 HCatalog HStorer 时禁用直接写入

当应用程序使用 HCatStorer 对存储在 Amazon S3 中的 HCatalog 表进行写入时,禁用 Amazon EMR 的直接写入功能。例如,在使用 Pig STORE 命令或运行将 HCatalog 表写入 Amazon S3 的 Sqoop 作业时,禁用直接写入。您可以通过将 mapred.output.direct.NativeS3FileSystemmapred.output.direct.EmrFileSystem 配置设置为 false 来禁用直接写入功能。以下示例演示如何使用 Java 设置这些配置。

Configuration conf = new Configuration(); conf.set("mapred.output.direct.NativeS3FileSystem", "false"); conf.set("mapred.output.direct.EmrFileSystem", "false");

使用 HCat CLI 创建表并在 Pig 中使用该数据

在您的集群上创建以下脚本 impressions.q

CREATE EXTERNAL TABLE impressions ( requestBeginTime string, adId string, impressionId string, referrer string, userAgent string, userCookie string, ip string ) PARTITIONED BY (dt string) ROW FORMAT serde 'org.apache.hive.hcatalog.data.JsonSerDe' with serdeproperties ( 'paths'='requestBeginTime, adId, impressionId, referrer, userAgent, userCookie, ip' ) LOCATION 's3://[your region].elasticmapreduce/samples/hive-ads/tables/impressions/'; ALTER TABLE impressions ADD PARTITION (dt='2009-04-13-08-05');

使用 HCat CLI 执行脚本:

% hcat -f impressions.q Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties OK Time taken: 4.001 seconds OK Time taken: 0.519 seconds

打开 Grunt shell 并访问 impressions 中的数据:

% pig -useHCatalog -e "A = LOAD 'impressions' USING org.apache.hive.hcatalog.pig.HCatLoader(); B = LIMIT A 5; dump B;" <snip> (1239610346000,m9nwdo67Nx6q2kI25qt5On7peICfUM,omkxkaRpNhGPDucAiBErSh1cs0MThC,cartoonnetwork.com,Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; FunWebProducts; GTB6; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET,wcVWWTascoPbGt6bdqDbuWTPPHgOPs,69.191.224.234,2009-04-13-08-05) (1239611000000,NjriQjdODgWBKnkGJUP6GNTbDeK4An,AWtXPkfaWGOaNeL9OOsFU8Hcj6eLHt,cartoonnetwork.com,Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6; .NET CLR 1.1.4322),OaMU1F2gE4CtADVHAbKjjRRks5kIgg,57.34.133.110,2009-04-13-08-05) (1239610462000,Irpv3oiu0I5QNQiwSSTIshrLdo9cM1,i1LDq44LRSJF0hbmhB8Gk7k9gMWtBq,cartoonnetwork.com,Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; InfoPath.1),QSb3wkLR4JAIut4Uq6FNFQIR1rCVwU,42.174.193.253,2009-04-13-08-05) (1239611007000,q2Awfnpe0JAvhInaIp0VGx9KTs0oPO,s3HvTflPB8JIE0IuM6hOEebWWpOtJV,cartoonnetwork.com,Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; InfoPath.1),QSb3wkLR4JAIut4Uq6FNFQIR1rCVwU,42.174.193.253,2009-04-13-08-05) (1239610398000,c362vpAB0soPKGHRS43cj6TRwNeOGn,jeas5nXbQInGAgFB8jlkhnprN6cMw7,cartoonnetwork.com,Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6; .NET CLR 1.1.4322),k96n5PnUmwHKfiUI0TFP0TNMfADgh9,51.131.29.87,2009-04-13-08-05) 7120 [main] INFO org.apache.pig.Main - Pig script completed in 7 seconds and 199 milliseconds (7199 ms) 16/03/08 23:17:10 INFO pig.Main: Pig script completed in 7 seconds and 199 milliseconds (7199 ms)

使用 Spark SQL 访问表

此示例通过第一个示例中创建的表来创建 Spark DataFrame,并显示前 20 行:

% spark-shell --jars /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core-1.0.0-amzn-3.jar <snip> scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc); scala> val df = hiveContext.sql("SELECT * FROM impressions") scala> df.show() <snip> 16/03/09 17:18:46 INFO DAGScheduler: ResultStage 0 (show at <console>:32) finished in 10.702 s 16/03/09 17:18:46 INFO DAGScheduler: Job 0 finished: show at <console>:32, took 10.839905 s +----------------+--------------------+--------------------+------------------+--------------------+--------------------+--------------+----------------+ |requestbegintime| adid| impressionid| referrer| useragent| usercookie| ip| dt| +----------------+--------------------+--------------------+------------------+--------------------+--------------------+--------------+----------------+ | 1239610346000|m9nwdo67Nx6q2kI25...|omkxkaRpNhGPDucAi...|cartoonnetwork.com|Mozilla/4.0 (comp...|wcVWWTascoPbGt6bd...|69.191.224.234|2009-04-13-08-05| | 1239611000000|NjriQjdODgWBKnkGJ...|AWtXPkfaWGOaNeL9O...|cartoonnetwork.com|Mozilla/4.0 (comp...|OaMU1F2gE4CtADVHA...| 57.34.133.110|2009-04-13-08-05| | 1239610462000|Irpv3oiu0I5QNQiwS...|i1LDq44LRSJF0hbmh...|cartoonnetwork.com|Mozilla/4.0 (comp...|QSb3wkLR4JAIut4Uq...|42.174.193.253|2009-04-13-08-05| | 1239611007000|q2Awfnpe0JAvhInaI...|s3HvTflPB8JIE0IuM...|cartoonnetwork.com|Mozilla/4.0 (comp...|QSb3wkLR4JAIut4Uq...|42.174.193.253|2009-04-13-08-05| | 1239610398000|c362vpAB0soPKGHRS...|jeas5nXbQInGAgFB8...|cartoonnetwork.com|Mozilla/4.0 (comp...|k96n5PnUmwHKfiUI0...| 51.131.29.87|2009-04-13-08-05| | 1239610600000|cjBTpruoaiEtqLuMX...|XwlohBSs8Ipxs1bRa...|cartoonnetwork.com|Mozilla/4.0 (comp...|k96n5PnUmwHKfiUI0...| 51.131.29.87|2009-04-13-08-05| | 1239610804000|Ms3eJHNAEItpxvimd...|4SIj4pGmgVLl625BD...|cartoonnetwork.com|Mozilla/4.0 (comp...|k96n5PnUmwHKfiUI0...| 51.131.29.87|2009-04-13-08-05| | 1239610872000|h5bccHX6wJReDi1jL...|EFAWIiBdVfnxwAMWP...|cartoonnetwork.com|Mozilla/4.0 (comp...|k96n5PnUmwHKfiUI0...| 51.131.29.87|2009-04-13-08-05| | 1239610365000|874NBpGmxNFfxEPKM...|xSvE4XtGbdtXPF2Lb...|cartoonnetwork.com|Mozilla/5.0 (Maci...|eWDEVVUphlnRa273j...| 22.91.173.232|2009-04-13-08-05| | 1239610348000|X8gISpUTSqh1A5reS...|TrFblGT99AgE75vuj...| corriere.it|Mozilla/4.0 (comp...|tX1sMpnhJUhmAF7AS...| 55.35.44.79|2009-04-13-08-05| | 1239610743000|kbKreLWB6QVueFrDm...|kVnxx9Ie2i3OLTxFj...| corriere.it|Mozilla/4.0 (comp...|tX1sMpnhJUhmAF7AS...| 55.35.44.79|2009-04-13-08-05| | 1239610812000|9lxOSRpEi3bmEeTCu...|1B2sff99AEIwSuLVV...| corriere.it|Mozilla/4.0 (comp...|tX1sMpnhJUhmAF7AS...| 55.35.44.79|2009-04-13-08-05| | 1239610876000|lijjmCf2kuxfBTnjL...|AjvufgUtakUFcsIM9...| corriere.it|Mozilla/4.0 (comp...|tX1sMpnhJUhmAF7AS...| 55.35.44.79|2009-04-13-08-05| | 1239610941000|t8t8trgjNRPIlmxuD...|agu2u2TCdqWP08rAA...| corriere.it|Mozilla/4.0 (comp...|tX1sMpnhJUhmAF7AS...| 55.35.44.79|2009-04-13-08-05| | 1239610490000|OGRLPVNGxiGgrCmWL...|mJg2raBUpPrC8OlUm...| corriere.it|Mozilla/4.0 (comp...|r2k96t1CNjSU9fJKN...| 71.124.66.3|2009-04-13-08-05| | 1239610556000|OnJID12x0RXKPUgrD...|P7Pm2mPdW6wO8KA3R...| corriere.it|Mozilla/4.0 (comp...|r2k96t1CNjSU9fJKN...| 71.124.66.3|2009-04-13-08-05| | 1239610373000|WflsvKIgOqfIE5KwR...|TJHd1VBspNcua0XPn...| corriere.it|Mozilla/5.0 (Maci...|fj2L1ILTFGMfhdrt3...| 75.117.56.155|2009-04-13-08-05| | 1239610768000|4MJR0XxiVCU1ueXKV...|1OhGWmbvKf8ajoU8a...| corriere.it|Mozilla/5.0 (Maci...|fj2L1ILTFGMfhdrt3...| 75.117.56.155|2009-04-13-08-05| | 1239610832000|gWIrpDiN57i3sHatv...|RNL4C7xPi3tdar2Uc...| corriere.it|Mozilla/5.0 (Maci...|fj2L1ILTFGMfhdrt3...| 75.117.56.155|2009-04-13-08-05| | 1239610789000|pTne9k62kJ14QViXI...|RVxJVIQousjxUVI3r...| pixnet.net|Mozilla/5.0 (Maci...|1bGOKiBD2xmui9OkF...| 33.176.101.80|2009-04-13-08-05| +----------------+--------------------+--------------------+------------------+--------------------+--------------------+--------------+----------------+ only showing top 20 rows scala>