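To run WordCount on Hadoop, you must give the program an input path and an output path. The input path is the text file whose word frequencies we want to count, here a file named 20417.txt; the output path is where the word-count results are stored. The figure below shows the argument configuration, reached via WordCount.java -> right-click -> Run As -> Run Configurations.
The paths above are HDFS paths; the figure below shows where to look them up in HDFS: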
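The two values typed into the run configuration reach the program as the `args` array of `main`. The sketch below shows one way to validate them before any job is set up. It assumes the program takes exactly two arguments (input path first, output path second); the class `WordCountArgs` and its `parse` method are hypothetical helpers written for illustration and are not part of the original WordCount.java.

```java
// Hypothetical helper for illustration; not taken from the original WordCount.java.
final class WordCountArgs {
    final String inputPath;   // HDFS path of the text file to count, e.g. .../20417.txt
    final String outputPath;  // HDFS directory that will receive the results

    private WordCountArgs(String inputPath, String outputPath) {
        this.inputPath = inputPath;
        this.outputPath = outputPath;
    }

    // Assumption: exactly two program arguments, input first and output second.
    static WordCountArgs parse(String[] args) {
        if (args == null || args.length != 2) {
            throw new IllegalArgumentException("Usage: WordCount <input path> <output path>");
        }
        String in = args[0].trim();
        String out = args[1].trim();
        if (in.isEmpty() || out.isEmpty()) {
            throw new IllegalArgumentException("Input and output paths must not be empty");
        }
        return new WordCountArgs(in, out);
    }

    public static void main(String[] args) {
        try {
            WordCountArgs parsed = parse(args);
            System.out.println("input  = " + parsed.inputPath);
            System.out.println("output = " + parsed.outputPath);
        } catch (IllegalArgumentException e) {
            // Report the problem instead of starting a job with bad paths.
            System.err.println(e.getMessage());
        }
    }
}
```

Failing early on a missing or empty argument gives a clearer message than the error a job would raise later when it cannot find its input.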
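After entering the input and output paths in the first figure, click Apply, but do not click Run yet: Run here means running on the local machine, whereas we want to run on the Hadoop cluster. Instead, do the following: WordCount.java -> right-click -> Run As -> Run on Hadoop.
While the job runs, the console prints progress information such as the following: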
- 11/10/09 14:07:50 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
- 11/10/09 14:07:50 INFO input.FileInputFormat: Total input paths to process : 1
- 11/10/09 14:07:50 INFO mapred.JobClient: Running job: job_201110091333_0001
- 11/10/09 14:07:51 INFO mapred.JobClient: map 0% reduce 0%
- 11/10/09 14:07:59 INFO mapred.JobClient: map 100% reduce 0%
- 11/10/09 14:08:12 INFO mapred.JobClient: map 100% reduce 100%
- 11/10/09 14:08:14 INFO mapred.JobClient: Job complete: job_201110091333_0001
- 11/10/09 14:08:14 INFO mapred.JobClient: Counters: 17
- 11/10/09 14:08:14 INFO mapred.JobClient: Job Counters
- 11/10/09 14:08:14 INFO mapred.JobClient: Launched reduce tasks=1
- 11/10/09 14:08:14 INFO mapred.JobClient: Launched map tasks=1
- 11/10/09 14:08:14 INFO mapred.JobClient: Data-local map tasks=1
- 11/10/09 14:08:14 INFO mapred.JobClient: FileSystemCounters
- 11/10/09 14:08:14 INFO mapred.JobClient: FILE_BYTES_READ=143076
- 11/10/09 14:08:14 INFO mapred.JobClient: HDFS_BYTES_READ=674762
- 11/10/09 14:08:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=286184
- 11/10/09 14:08:14 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=205265
- 11/10/09 14:08:14 INFO mapred.JobClient: Map-Reduce Framework
- 11/10/09 14:08:14 INFO mapred.JobClient: Reduce input groups=0
- 11/10/09 14:08:14 INFO mapred.JobClient: Combine output records=10015
- 11/10/09 14:08:14 INFO mapred.JobClient: Map input records=12761
- 11/10/09 14:08:14 INFO mapred.JobClient: Reduce shuffle bytes=0
- 11/10/09 14:08:14 INFO mapred.JobClient: Reduce output records=0
- 11/10/09 14:08:14 INFO mapred.JobClient: Spilled Records=20030
- 11/10/09 14:08:14 INFO mapred.JobClient: Map output bytes=1082004
- 11/10/09 14:08:14 INFO mapred.JobClient: Combine input records=112607
- 11/10/09 14:08:14 INFO mapred.JobClient: Map output records=112607
- 11/10/09 14:08:14 INFO mapred.JobClient: Reduce input records=10015
- 11/10/09 14:08:14 INFO input.FileInputFormat: Total input paths to process : 1
- 11/10/09 14:08:14 INFO mapred.JobClient: Running job: job_201110091333_0002
- 11/10/09 14:08:15 INFO mapred.JobClient: map 0% reduce 0%
- 11/10/09 14:08:24 INFO mapred.JobClient: map 100% reduce 0%
- 11/10/09 14:08:36 INFO mapred.JobClient: map 100% reduce 100%
- 11/10/09 14:08:38 INFO mapred.JobClient: Job complete: job_201110091333_0002
- 11/10/09 14:08:38 INFO mapred.JobClient: Counters: 17
- 11/10/09 14:08:38 INFO mapred.JobClient: Job Counters
- 11/10/09 14:08:38 INFO mapred.JobClient: Launched reduce tasks=1
- 11/10/09 14:08:38 INFO mapred.JobClient: Launched map tasks=1
- 11/10/09 14:08:38 INFO mapred.JobClient: Data-local map tasks=1
- 11/10/09 14:08:38 INFO mapred.JobClient: FileSystemCounters
- 11/10/09 14:08:38 INFO mapred.JobClient: FILE_BYTES_READ=143076
- 11/10/09 14:08:38 INFO mapred.JobClient: HDFS_BYTES_READ=205265
- 11/10/09 14:08:38 INFO mapred.JobClient: FILE_BYTES_WRITTEN=286184
- 11/10/09 14:08:38 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=104533
- 11/10/09 14:08:38 INFO mapred.JobClient: Map-Reduce Framework
- 11/10/09 14:08:38 INFO mapred.JobClient: Reduce input groups=0
- 11/10/09 14:08:38 INFO mapred.JobClient: Combine output records=0
- 11/10/09 14:08:38 INFO mapred.JobClient: Map input records=10015
- 11/10/09 14:08:38 INFO mapred.JobClient: Reduce shuffle bytes=0
- 11/10/09 14:08:38 INFO mapred.JobClient: Reduce output records=0
- 11/10/09 14:08:38 INFO mapred.JobClient: Spilled Records=20030
- 11/10/09 14:08:38 INFO mapred.JobClient: Map output bytes=123040
- 11/10/09 14:08:38 INFO mapred.JobClient: Combine input records=0
- 11/10/09 14:08:38 INFO mapred.JobClient: Map output records=10015
- 11/10/09 14:08:38 INFO mapred.JobClient: Reduce input records=10015
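The counter lines in the log above all have the form `name=value`, so they are easy to pull out and compare. The following is a minimal sketch that extracts them with a regular expression; the class `JobCounterLog` and its method are hypothetical names, and the pattern assumes the `mapred.JobClient:` line format shown in this log rather than any documented format.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; it only reads text in the format shown above.
final class JobCounterLog {
    // Matches the tail of a line such as "... INFO mapred.JobClient: HDFS_BYTES_READ=674762".
    private static final Pattern COUNTER =
            Pattern.compile("mapred\\.JobClient:\\s*([^=]+?)\\s*=\\s*(\\d+)\\s*$");

    // Returns counter name -> value, in the order the counters appear.
    // Lines without a "name=value" tail (progress, job id) are skipped.
    static Map<String, Long> parseCounters(Iterable<String> lines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = COUNTER.matcher(line);
            if (m.find()) {
                counters.put(m.group(1), Long.parseLong(m.group(2)));
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        // Two lines copied from the log above, one from each job.
        Map<String, Long> first = parseCounters(Arrays.asList(
                "- 11/10/09 14:08:14 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=205265"));
        Map<String, Long> second = parseCounters(Arrays.asList(
                "- 11/10/09 14:08:38 INFO mapred.JobClient: HDFS_BYTES_READ=205265"));
        boolean same = first.get("HDFS_BYTES_WRITTEN").equals(second.get("HDFS_BYTES_READ"));
        System.out.println("second job read what the first job wrote: " + same);
    }
}
```

In this log the first job's `HDFS_BYTES_WRITTEN` and the second job's `HDFS_BYTES_READ` are both 205265, and the second job's `Map input records` (10015) equals the first job's `Combine output records`. That is consistent with the second job consuming the first job's output, although the log alone does not confirm it.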