- Create a new Maven project
- Create New Project… -> Maven -> Next
- Fill in the GroupId and ArtifactId, click Next -> Finish
- Write the wordcount project
1、 Create the project directory structure: right-click java -> New -> Package and enter the package path (com.hadoop.wdcount in this example). In the same way, create three classes under the new package: WordcountMain, WordcountMapper, and WordcountReducer.
2、 Write the pom.xml configuration (pull in the Hadoop jars the project needs)
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>sa.hadoop</groupId>
    <artifactId>wordcount</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.7</version>
            <!-- we are using Hadoop version 2.7.7 -->
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.7</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-common</artifactId>
            <version>2.7.7</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.7.7</version>
        </dependency>
    </dependencies>
</project>
3、 Write the project code
Implement the logic in the three classes created above.
(1) WordcountMapper.java
package com.hadoop.wdcount;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
public class WordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split each input line on spaces and emit (word, 1) for every token
        String line = value.toString();
        String[] words = line.split(" ");
        for (String word : words) {
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
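Outside of Hadoop, the mapper's tokenize-and-emit logic can be sketched in plain Java. The class name `MapStepSketch` and the tab-separated string pairs standing in for `context.write` are illustrative only:

```java
import java.util.ArrayList;
import java.util.List;

public class MapStepSketch {
    // Stand-in for the map step: split a line on spaces and emit (word, 1) pairs
    static List<String> map(String line) {
        List<String> pairs = new ArrayList<>();
        for (String word : line.split(" ")) {
            pairs.add(word + "\t1");  // each token becomes one key/value pair
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(map("I believe that I will succeed!"));
    }
}
```

Note that punctuation stays attached to words ("succeed!" is one token), exactly as in the mapper above, which splits on single spaces only.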
(2) WordcountReducer.java
package com.hadoop.wdcount;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
public class WordcountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum the 1s the mapper emitted for this word
        int counts = 0;
        for (IntWritable value : values) {
            counts += value.get();
        }
        context.write(key, new IntWritable(counts));
    }
}
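The reduce step is just a sum over the values grouped under one key. A minimal plain-Java sketch, where the hypothetical class `ReduceStepSketch` uses a `List<Integer>` as a stand-in for the `Iterable<IntWritable>` Hadoop passes in:

```java
import java.util.Arrays;
import java.util.List;

public class ReduceStepSketch {
    // Stand-in for the reduce step: sum the 1s collected for a single key
    static int reduce(List<Integer> values) {
        int counts = 0;
        for (int v : values) {
            counts += v;
        }
        return counts;
    }

    public static void main(String[] args) {
        // After the shuffle, a word that appeared twice arrives with values [1, 1]
        System.out.println(reduce(Arrays.asList(1, 1)));
    }
}
```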
(3) WordcountMain.java
package com.hadoop.wdcount;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordcountMain {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordcountMain.class);
        job.setMapperClass(WordcountMapper.class);
        job.setReducerClass(WordcountReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Final (reducer) output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));   // input path, e.g. /input1
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output path; must not already exist
        boolean flag = job.waitForCompletion(true);
        if (!flag) {
            System.out.println("wordcount failed!");
        }
    }
}
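Before packaging and running on the cluster, it can help to see what the whole map -> shuffle -> reduce pipeline should produce for one line of text. A plain-Java sketch with no Hadoop dependency (`WordcountSketch` is an illustrative name; a `TreeMap` mimics the sorted keys of the reducer output):

```java
import java.util.Map;
import java.util.TreeMap;

public class WordcountSketch {
    // Plain-Java equivalent of map -> shuffle -> reduce over one input line
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();  // sorted keys, like reducer output
        for (String word : text.split(" ")) {
            counts.merge(word, 1, Integer::sum);        // shuffle + sum per key
        }
        return counts;
    }

    public static void main(String[] args) {
        // The sample input used later in this tutorial
        System.out.println(count("I believe that I will succeed!"));
    }
}
```

For the sample line "I believe that I will succeed!", this yields I=2 and 1 for each of the other words, which is also what the job's output file should contain.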
- Package the project into a jar
- Right-click the project name -> Open Module Settings
- Artifacts -> + -> JAR -> From modules with dependencies…
- Fill in the Main Class (click … and choose WordcountMain), select "extract to the target JAR", then click OK.
- Check "Include in project build". "Output directory" is the final output directory, and the "Output Layout" below lists the jars to be produced; click OK.
- Open the menu Build -> Build Artifacts…
- Choose Build; the result can be found in the Output directory configured above, or in the project's out directory.
- Run and verify (this example uses Hadoop 2.7.6 on Windows; WSL has not been verified yet)
1、 In the directory where the jar was created (C:\Users\USTC\Documents\maxyi\Java\wordcount\out\artifacts\wordcount_jar), create a file input1.txt, add the content "I believe that I will succeed!", and save it. This txt file will be uploaded to Hadoop in a later step.
2、 Start Hadoop and bring up all the daemons
cd hadoop-2.7.6/sbin
start-all.cmd
3、 Once Hadoop is running, go back to the jar directory and upload the txt file to Hadoop
cd /
cd C:\Users\USTC\Documents\maxyi\Java\wordcount\out\artifacts\wordcount_jar
hadoop fs -put ./input1.txt /input1
You can check whether the upload succeeded with:
hadoop fs -ls /
4、 Delete wordcount.jar/META-INF/LICENSE; otherwise Hadoop cannot create the license directory at runtime and the job fails with an error.
5、 Run wordcount
hadoop jar wordcount.jar com.hadoop.wdcount.WordcountMain /input1 /output2
The jar command takes four arguments:
the first, wordcount.jar, is the packaged jar;
the second, com.hadoop.wdcount.WordcountMain, is the main class written in the Java project, including its package path;
the third, /input1, is the input just uploaded;
the fourth, /output2, is the wordcount output (it must be a new path; an existing output path cannot be reused).
6、 Download the output file and check whether the result is correct
hadoop fs -get /output2