cd /home/hadoop/ && hadoop jar ./test/wordcount/wordcount.jar org.codetree.hadoop.v1.WordCount /test/chqz/input /test/chqz/output
那麼這個命令內部到底做了些什麼呢?
1、首先,在 ${HADOOP_HOME}/bin/hadoop 腳本中我們可以看到有如下代碼:
由於這裏 $starting_secure_dn 的值爲 false(這裏可以參見hadoop腳本),所以最終會執行下面這行代碼:
從上面shell腳本中,我們可以明確看出當執行hadoop jar命令時,實際上執行了org.apache.hadoop.util.RunJar類。
下面#run it這一行代碼實質上是爲執行這個類的main方法設置所需的類路徑classpath。
2、繼續研究org.apache.hadoop.util.RunJar類內發生的事情:
因此,命令hadoop jar ./test/wordcount/wordcount.jar org.codetree.hadoop.v1.WordCount /test/chqz/input /test/chqz/output的各段的含義:
(1) hadoop:${HADOOP_HOME}/bin下的shell腳本名。
(2) jar:hadoop腳本需要的command參數。
(3) ./test/wordcount/wordcount.jar:要執行的jar包在本地文件系統中的完整路徑,傳遞給RunJar類。
(4) org.codetree.hadoop.v1.WordCount:main方法所在的類,傳遞給RunJar類。
(5) /test/chqz/input:傳遞給WordCount類,作爲DFS文件系統的路徑,指示輸入數據來源。
(6) /test/chqz/output:傳遞給WordCount類,作爲DFS文件系統的路徑,指示輸出數據路徑。
3、org.codetree.hadoop.v1.WordCount類代碼:
Source Code
WordCount.java:
1. | package org.myorg; |
2. | |
3. | import java.io.IOException; |
4. | import java.util.*; |
5. | |
6. | import org.apache.hadoop.fs.Path; |
7. | import org.apache.hadoop.conf.*; |
8. | import org.apache.hadoop.io.*; |
9. | import org.apache.hadoop.mapred.*; |
10. | import org.apache.hadoop.util.*; |
11. | |
12. | public class WordCount { |
13. | |
14. | public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> { |
15. | private final static IntWritable one = new IntWritable(1); |
16. | private Text word = new Text(); |
17. | |
18. | public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException { |
19. | String line = value.toString(); |
20. | StringTokenizer tokenizer = new StringTokenizer(line); |
21. | while (tokenizer.hasMoreTokens()) { |
22. | word.set(tokenizer.nextToken()); |
23. | output.collect(word, one); |
24. | } |
25. | } |
26. | } |
27. | |
28. | public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> { |
29. | public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException { |
30. | int sum = 0; |
31. | while (values.hasNext()) { |
32. | sum += values.next().get(); |
33. | } |
34. | output.collect(key, new IntWritable(sum)); |
35. | } |
36. | } |
37. | |
38. | public static void main(String[] args) throws Exception { |
39. | JobConf conf = new JobConf(WordCount.class); |
40. | conf.setJobName("wordcount"); |
41. | |
42. | conf.setOutputKeyClass(Text.class); |
43. | conf.setOutputValueClass(IntWritable.class); |
44. | |
45. | conf.setMapperClass(Map.class); |
46. | conf.setCombinerClass(Reduce.class); |
47. | conf.setReducerClass(Reduce.class); |
48. | |
49. | conf.setInputFormat(TextInputFormat.class); |
50. | conf.setOutputFormat(TextOutputFormat.class); |
51. | |
52. | FileInputFormat.setInputPaths(conf, new Path(args[0])); |
53. | FileOutputFormat.setOutputPath(conf, new Path(args[1])); |
54. | |
55. | JobClient.runJob(conf); |
57. | } |
58. | } |
59. |