With Hadoop + HBase installed, I set things up in 403 to see whether everything actually works and to run some experiments. Here comes the almighty WordCount!
1. Writing MyWordCount.java
The code is mostly copied from the web. Supposedly the Eclipse plugin doesn't support hadoop-2.2.0, so Eclipse can't be used to compile and package and everything has to be written by hand? Not sure. The copied code is shown below:
// must I really code this without Eclipse's help?
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyWordCount {

    // ----- mapper part -----
    public static class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // map: emit (word, 1) for every token in the line
        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer words = new StringTokenizer(line); // split the line into words
            while (words.hasMoreTokens()) {
                word.set(words.nextToken());
                context.write(word, one);
            }
        }
    }

    // ----- reducer part -----
    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable totalNum = new IntWritable();

        // reduce: sum the counts emitted for each word
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            Iterator<IntWritable> it = values.iterator();
            while (it.hasNext()) {
                sum += it.next().get();
            }
            totalNum.set(sum);
            context.write(key, totalNum);
        }
    }

    // ----- job configuration -----
    @SuppressWarnings("deprecation") // new Job(conf, name) is deprecated in 2.x
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();            // logging goes through commons-logging-1.1.1.jar
        Job job = new Job(conf, "MyWordCount");              // create the job; log4j-1.2.17.jar under /hadoop/yarn/lib
        job.setJarByClass(MyWordCount.class);                // locate the jar that contains this class
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setCombinerClass(WordCountReducer.class);        // the reducer doubles as a local combiner
        job.setOutputKeyClass(Text.class);                   // output key & value types
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input & output paths
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);    // block until the job finishes; exit 0 on success
    }
}
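Before going anywhere near the cluster, the counting logic itself can be sanity-checked with a plain shell pipeline (no Hadoop involved); splitting on whitespace and counting duplicates is exactly what the mapper and reducer do:

```shell
# Emulate WordCount locally: tokenize on whitespace, then count duplicates.
# "hello" appears twice, the other words once.
echo "hello world hello hadoop" | tr -s '[:space:]' '\n' | sort | uniq -c
```

This is only a sanity check of the logic, of course; the whole point of the MapReduce version is that it scales the same computation across a cluster.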
2. Compiling
javac -classpath ./lib/xxx.jar:./lib/xxx.jar -d ../wordcount_classes *.java -encoding gbk
-classpath lists the external jars this java file depends on, separated by ":";
-d puts the compiled .class files into the ../wordcount_classes directory;
*.java is the set of java source files to compile;
-encoding gbk is optional; in fact I didn't actually include it.
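On a stock Hadoop 2.2.0 install, the xxx.jar placeholders above would typically be the common and mapreduce-client-core jars. A sketch, assuming HADOOP_HOME points at the install root; the jar names and directory layout are my assumption, so check your own tree:

```shell
# Assumed Hadoop 2.2.0 layout; verify the jar names under your own install.
HADOOP_HOME=/usr/local/hadoop   # adjust to your environment
javac -classpath "$HADOOP_HOME/share/hadoop/common/hadoop-common-2.2.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar" \
  -d ../wordcount_classes MyWordCount.java
```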
3. Building the jar
The packaging command is below; a mistake in this very command was what caused the ClassNotFoundException later. Ah, the deep mysteries of Java...
jar -cvf ../MyWordCount.jar -C wordcount_classes/ .
Note the -C option and the trailing " ." at the end (a space, then a dot).
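Why does that -C matter? It changes into the directory before adding files, so the entries sit at the root of the jar, which is where the classloader looks for a class in the default package; without it every entry keeps the directory prefix, MyWordCount.class is never found at the root, and you get the ClassNotFoundException. jar's -C behaves like tar's -C, so the effect can be demonstrated with tar (no JDK needed); the demo_classes name and dummy file are just for illustration:

```shell
# Demo (using tar, whose -C behaves like jar's): compare archive entry paths.
mkdir -p demo_classes
touch demo_classes/MyWordCount.class      # stand-in for a real compiled class
tar -cf with_C.tar -C demo_classes .      # entries rooted at the dir: ./MyWordCount.class
tar -cf without_C.tar demo_classes        # entries keep the prefix: demo_classes/MyWordCount.class
tar -tf with_C.tar
tar -tf without_C.tar
```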
The whole compile-and-package run looked like the screenshot below:
4. Running
I stared at that ClassNotFound screen all day, so I'm keeping a screenshot of it here:
A successful run actually looks like this:
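For the record, the job is launched with the hadoop jar command. A sketch, assuming the jar sits one directory up as built above; the HDFS input/output paths are placeholders of my own, and the output directory must not exist yet:

```shell
# Placeholder HDFS paths (my assumption); hadoop must be on the PATH,
# and the output directory must not already exist.
hadoop jar ../MyWordCount.jar MyWordCount /user/hadoop/input /user/hadoop/output
```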
5. Asides
I tried tweaking HADOOP_CLASSPATH; I couldn't find it anywhere, so perhaps Hadoop 2.2.0 dropped it, and forcing it in did nothing anyway.
I compared the jars produced by jar -cvf xxx.jar ./classes/* and jar -cvf xxx.jar -C ./classes . ; the contents and sizes looked exactly the same, which puzzled me.
I was about to pull the files out of hadoop-mapreduce-examples-2.2.0-sources.jar under share/hadoop/mapreduce/sources, only to discover it contains nothing but .java files, as shown below. Luckily my junior labmate showed up just in time, and one space plus one dot solved the problem. There were also tasty nectarines and plums (blackcurrants?). Thanks! ><