Amazon EMR
Amazon EMR Release Guide

Using Tez

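The following example shows how to use Tez with the data and script from the tutorial Getting Started: Analyzing Big Data with Amazon EMR (see step 3 of that tutorial).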

Comparing Hive run times with MapReduce and Tez

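  1. Create a cluster by following the procedure named Create a cluster with Tez installed using the console. In addition to Tez, select Hive as an application.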

  2. Connect to the cluster master node using SSH. For more information, see Connect to the master node using SSH.

  3. Run the Hive_CloudFront.q script using MapReduce with the following command, where region is the Region in which your cluster is located:

    hive -f s3://region.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q \
    -d INPUT=s3://region.elasticmapreduce.samples -d OUTPUT=s3://myBucket/mr-test/

    The output should look similar to the following:

    <snip>
    Starting Job = job_1464200677872_0002, Tracking URL = http://ec2-host:20888/proxy/application_1464200677872_0002/
    Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1464200677872_0002
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2016-05-27 04:53:11,258 Stage-1 map = 0%, reduce = 0%
    2016-05-27 04:53:25,820 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 10.45 sec
    2016-05-27 04:53:32,034 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 16.06 sec
    2016-05-27 04:53:35,139 Stage-1 map = 40%, reduce = 0%, Cumulative CPU 18.9 sec
    2016-05-27 04:53:37,211 Stage-1 map = 53%, reduce = 0%, Cumulative CPU 21.6 sec
    2016-05-27 04:53:41,371 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 25.08 sec
    2016-05-27 04:53:49,675 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 29.93 sec
    MapReduce Total cumulative CPU time: 29 seconds 930 msec
    Ended Job = job_1464200677872_0002
    Moving data to: s3://myBucket/mr-test/os_requests
    MapReduce Jobs Launched:
    Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 29.93 sec HDFS Read: 599 HDFS Write: 0 SUCCESS
    Total MapReduce CPU Time Spent: 29 seconds 930 msec
    OK
    Time taken: 49.699 seconds

  4. Using a text editor, change the value of hive.execution.engine to tez in /etc/hive/conf/hive-site.xml.

  5. Kill the HiveServer2 process with the following command:

    sudo kill -9 $(pgrep -f HiveServer2)

    Upstart automatically restarts the Hive server, which loads your configuration change.

  6. Now run the job using the Tez execution engine with the following command:

    hive -f s3://region.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q \
    -d INPUT=s3://region.elasticmapreduce.samples -d OUTPUT=s3://myBucket/tez-test/

    The output should look similar to the following:

    Time taken: 0.517 seconds
    Query ID = hadoop_20160527050505_dcdc075f-8338-4041-adc3-d2ffe69dfcdd
    Total jobs = 1
    Launching Job 1 out of 1

    Status: Running (Executing on YARN cluster with App id application_1464200677872_0003)

    --------------------------------------------------------------------------------
            VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
    --------------------------------------------------------------------------------
    Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
    Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
    --------------------------------------------------------------------------------
    VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 27.61 s
    --------------------------------------------------------------------------------
    Moving data to: s3://myBucket/tez-test/os_requests
    OK
    Time taken: 30.711 seconds

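    With Tez, the same script took about 19 seconds (roughly 40%) less time to run than with MapReduce: 30.7 seconds compared with 49.7 seconds.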
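The comparison can be reproduced from the two "Time taken" values in the sample output. The following is a minimal sketch rather than part of the original procedure: the function name time_saved is hypothetical, and it only assumes that awk is available on the node where you run it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how much faster the second run was.
# Usage: time_saved <baseline_seconds> <new_seconds>
time_saved() {
  awk -v base="$1" -v new="$2" 'BEGIN {
    diff = base - new                      # seconds saved
    printf "%.1f s saved (%.0f%%)\n", diff, 100 * diff / base
  }'
}

# "Time taken" values from the sample output: MapReduce first, then Tez.
time_saved 49.699 30.711
```

For the sample values this prints 19.0 s saved (38%). Your own timings depend on cluster size, instance types, and input data, so substitute the values from your runs.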