Take the classic word-count program as an example. When the mapper receives a line of input (value):
```java
package com.datang.mr;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the incoming line (value) into individual words
        String[] words = value.toString().split(" ");
        for (int i = 0; i < words.length; i++) {
            // Emit one key-value pair per word: <key: word, value: 1>
            context.write(new Text(words[i]), new IntWritable(1));
        }
    }
}
```
The reducer receives the data after the shuffle phase:
```java
package com.datang.mr;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Example mapper output that the shuffle groups by key:
// a 1
// a 1
// b 1
// b 1
// b 1
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the values in the iterator; the sum is the total count of this word (key)
        int sum = 0;
        for (IntWritable i : values) {
            sum = sum + i.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```
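For completeness, the mapper and reducer above are normally wired together by a driver class that configures and submits the job. A minimal sketch follows; the class name `WordCountDriver` and the use of command-line arguments for the paths are illustrative assumptions, not from the original:

```java
package com.datang.mr;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // Output key/value types of the job (here the mapper emits the same types)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output paths are taken from the command line (an assumption)
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```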
- The input is split by line into key-value pairs, key (the line's offset in the file) and value (the line's contents), which are handed out to the mappers
- Each mapper transforms its input pairs into new pairs, key (word) and value (1), meaning this word appeared once in that line
- The shuffle phase groups the mappers' output, collecting the pairs that share the same key (word)
- Each reducer receives one shuffled group: the key (word) plus an iterator over the associated values; summing the values in the iterator gives this word's total count, which the reducer writes to the context
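The steps above can be sketched in plain Java, with no Hadoop required, by simulating the map, shuffle, and reduce phases on a tiny input; the class and variable names here are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSim {
    public static Map<String, Integer> wordCount(List<String> lines) {
        // Map phase: emit <word, 1> for every word in every line
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                mapped.add(Map.entry(word, 1));
            }
        }
        // Shuffle phase: group the 1s by word
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapped) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        // Reduce phase: sum each word's list of 1s to get its total count
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a a b", "b b");
        System.out.println(wordCount(lines)); // {a=2, b=3}
    }
}
```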
Diagram:
I am new to big data; the above is just my personal understanding, and I welcome corrections from readers.