Amazon EMR
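Management Guide
AWS services or features described in AWS documentation may vary by region. To see the differences that apply to the China regions, see Getting Started with Amazon AWS.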

Test Your Cluster

This section explains how to test your cluster by running a word count on a sample input file.

To create a file and run a test job

  1. Connect to the master node over SSH as the user hadoop. Pass your .pem credentials file to ssh with the -i flag, as in this example:

    ssh -i /path_to_pemfile/credentials.pem hadoop@masterDNS.amazonaws.com.cn
  2. Create a simple text file:

    cd /mapr/MapR_EMR.amazonaws.com.cn
    mkdir in
    echo "the quick brown fox jumps over the lazy dog" > in/data.txt
  3. Run the following command to perform a word count on the text file:

    hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar wordcount /mapr/MapR_EMR.amazonaws.com.cn/in/ /mapr/MapR_EMR.amazonaws.com.cn/out/

    While the job runs, you see terminal output similar to the following:

    12/06/09 00:00:37 INFO fs.JobTrackerWatcher: Current running JobTracker is: ip-10-118-194-139.ec2.internal/10.118.194.139:9001
    12/06/09 00:00:37 INFO input.FileInputFormat: Total input paths to process : 1
    12/06/09 00:00:37 INFO mapred.JobClient: Running job: job_201206082332_0004
    12/06/09 00:00:38 INFO mapred.JobClient: map 0% reduce 0%
    12/06/09 00:00:50 INFO mapred.JobClient: map 100% reduce 0%
    12/06/09 00:00:57 INFO mapred.JobClient: map 100% reduce 100%
    12/06/09 00:00:58 INFO mapred.JobClient: Job complete: job_201206082332_0004
    12/06/09 00:00:58 INFO mapred.JobClient: Counters: 25
    12/06/09 00:00:58 INFO mapred.JobClient: Job Counters
    12/06/09 00:00:58 INFO mapred.JobClient: Launched reduce tasks=1
    12/06/09 00:00:58 INFO mapred.JobClient: Aggregate execution time of mappers(ms)=6193
    12/06/09 00:00:58 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
    12/06/09 00:00:58 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
    12/06/09 00:00:58 INFO mapred.JobClient: Launched map tasks=1
    12/06/09 00:00:58 INFO mapred.JobClient: Data-local map tasks=1
    12/06/09 00:00:58 INFO mapred.JobClient: Aggregate execution time of reducers(ms)=4875
    12/06/09 00:00:58 INFO mapred.JobClient: FileSystemCounters
    12/06/09 00:00:58 INFO mapred.JobClient: MAPRFS_BYTES_READ=385
    12/06/09 00:00:58 INFO mapred.JobClient: MAPRFS_BYTES_WRITTEN=276
    12/06/09 00:00:58 INFO mapred.JobClient: FILE_BYTES_WRITTEN=94449
    12/06/09 00:00:58 INFO mapred.JobClient: MapReduce Framework
    12/06/09 00:00:58 INFO mapred.JobClient: Map input records=1
    12/06/09 00:00:58 INFO mapred.JobClient: Reduce shuffle bytes=94
    12/06/09 00:00:58 INFO mapred.JobClient: Spilled Records=16
    12/06/09 00:00:58 INFO mapred.JobClient: Map output bytes=80
    12/06/09 00:00:58 INFO mapred.JobClient: CPU_MILLISECONDS=1530
    12/06/09 00:00:58 INFO mapred.JobClient: Combine input records=9
    12/06/09 00:00:58 INFO mapred.JobClient: SPLIT_RAW_BYTES=125
    12/06/09 00:00:58 INFO mapred.JobClient: Reduce input records=8
    12/06/09 00:00:58 INFO mapred.JobClient: Reduce input groups=8
    12/06/09 00:00:58 INFO mapred.JobClient: Combine output records=8
    12/06/09 00:00:58 INFO mapred.JobClient: PHYSICAL_MEMORY_BYTES=329244672
    12/06/09 00:00:58 INFO mapred.JobClient: Reduce output records=8
    12/06/09 00:00:58 INFO mapred.JobClient: VIRTUAL_MEMORY_BYTES=3252969472
    12/06/09 00:00:58 INFO mapred.JobClient: Map output records=9
    12/06/09 00:00:58 INFO mapred.JobClient: GC time elapsed (ms)=1
  4. Check the /mapr/MapR_EMR.amazonaws.com.cn/out directory for a file named part-r-00000 that contains the results of the job.

    cat out/part-r-00000
    brown 1
    dog 1
    fox 1
    jumps 1
    lazy 1
    over 1
    quick 1
    the 2
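
The result in step 4 can also be checked mechanically rather than by eye. The following is a minimal sketch, not part of the original procedure: it recomputes the counts locally with standard tools and compares them with the job output. The function names local_wordcount and check_wordcount are hypothetical helpers introduced here. The sketch assumes that the job writes one word per line followed by a tab and the count, sorted by word in byte order, and it uses the directory from step 4.

```shell
#!/bin/sh
# Sketch: compare the wordcount job output with counts computed locally.
set -eu

# Print "word<TAB>count" for every distinct whitespace-separated word in
# the file given as $1, sorted by word in byte order.
local_wordcount() {
  tr -s '[:space:]' '\n' < "$1" \
    | LC_ALL=C sort \
    | LC_ALL=C uniq -c \
    | awk 'NF == 2 { printf "%s\t%s\n", $2, $1 }'
}

# Succeed only if the job output file ($2) matches the counts computed
# locally from the input file ($1).
check_wordcount() {
  local_wordcount "$1" | diff - "$2" > /dev/null
}

# Assumed locations, taken from the steps above; adjust for your cluster.
base=/mapr/MapR_EMR.amazonaws.com.cn
input="$base/in/data.txt"
result="$base/out/part-r-00000"

# Run the comparison only when both files are present.
if [ -f "$input" ] && [ -f "$result" ]; then
  if check_wordcount "$input" "$result"; then
    echo "word count output matches"
  else
    echo "word count output differs" >&2
  fi
fi
```

Run the script on the master node after the job completes. If the files are stored elsewhere, change the base variable before running it.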