Using a Custom Data Type as a Key in MapReduce

Original post

2020-06-23 21:40

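In the MapReduce programming model, the key is used for both sorting and partitioning. Sorting orders the <k,v> pairs by key; partitioning assigns each <k,v> pair to a Reducer node according to the hash code of its key, which is what the default partitioner does.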
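To make the partitioning step concrete, here is a minimal sketch in plain Java that mirrors the formula used by Hadoop's default HashPartitioner. The class name PartitionSketch and the sample keys are made up for illustration, and the sketch does not depend on Hadoop itself.

```java
class PartitionSketch {
    // Same formula as Hadoop's default HashPartitioner: mask off the sign bit
    // so the result is never negative, then take the remainder modulo the
    // number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 3; // hypothetical number of reduce tasks
        for (String key : new String[] {"apple", "banana", "apple"}) {
            // Equal keys have equal hash codes, so they reach the same reducer.
            System.out.println(key + " -> reducer " + partitionFor(key, reducers));
        }
    }
}
```

Because the partition depends only on hashCode(), two keys that are equal must return the same hash code, otherwise identical records can end up on different reducers.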

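Every key type in MapReduce must implement the WritableComparable interface. For convenience, Hadoop ships several built-in key types; common ones are IntWritable, LongWritable, Text, and FloatWritable. Sometimes, though, we need to use a data type of our own as the key.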

Below, I use the intersection of two data tables as an example to show how a custom data type can serve as a key.

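Consider two data tables, table1 and table2, that follow the same schema with the fields id, name, age, and grade. Their contents are shown below.

[Figure: contents of table1 and table2]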

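The goal of the intersection is to output the records that appear in both tables. For table1 and table2 above, the output should be the records with id 1 and 2.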

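The idea is to emit <r,1> for every record r in the Map phase, sum the counts in the Reduce phase, and output each record r whose count is 2. This assumes that no record appears more than once within a single table.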
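The counting idea can be tried out locally before writing any Hadoop code. The following is a minimal sketch in plain Java, with a TreeMap standing in for the shuffle and reduce steps; the class name IntersectionSketch and the sample rows are hypothetical and are not the data from the original tables.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class IntersectionSketch {
    // "Map": every record r contributes a pair <r, 1>.
    // "Reduce": the 1s are summed per record; a sum of 2 means the record
    // occurs in both tables (assuming no duplicates inside one table).
    static List<String> intersect(List<String> table1, List<String> table2) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String r : table1) counts.merge(r, 1, Integer::sum);
        for (String r : table2) counts.merge(r, 1, Integer::sum);

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() == 2) result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical tab-separated rows: id, name, age, grade
        List<String> t1 = Arrays.asList("1\ta\t20\t90", "2\tb\t21\t85", "3\tc\t22\t70");
        List<String> t2 = Arrays.asList("1\ta\t20\t90", "2\tb\t21\t85", "4\td\t23\t60");
        for (String r : intersect(t1, t2)) System.out.println(r);
    }
}
```

Here whole lines act as keys. The MapReduce version does the same thing, except that the key is a structured object, which is why the key class has to define equality and ordering itself.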

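We store each record in a Stu class and use Stu as the key. Stu must implement the WritableComparable interface, and the following points need attention:

It must have a no-argument constructor.
It must override hashCode() and equals() (both inherited from Object) and implement compareTo() from WritableComparable. compareTo() should return 0 exactly when equals() returns true, because by default the framework uses the comparison result to decide which keys are grouped into one reduce() call.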
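The two requirements above can be illustrated without Hadoop. This is a minimal sketch under stated assumptions: PairKey is a hypothetical two-field key that implements plain Comparable instead of WritableComparable, and newEmptyKey() imitates the way a framework creates an empty key reflectively before filling in its fields.

```java
import java.util.Objects;

class PairKey implements Comparable<PairKey> {
    int id;
    String name;

    PairKey() {}                       // needed for reflective instantiation

    PairKey(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public int compareTo(PairKey o) {  // returns 0 only when all fields match
        int c = Integer.compare(this.id, o.id);
        return c != 0 ? c : this.name.compareTo(o.name);
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof PairKey)) return false;
        PairKey other = (PairKey) obj;
        return id == other.id && name.equals(other.name);
    }

    @Override
    public int hashCode() {            // equal keys give equal hash codes
        return Objects.hash(id, name);
    }
}

class KeyContractSketch {
    // Creates a key through its no-argument constructor, as a framework would.
    static PairKey newEmptyKey() {
        try {
            return PairKey.class.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no usable no-argument constructor", e);
        }
    }

    public static void main(String[] args) {
        PairKey a = new PairKey(1, "x");
        PairKey b = new PairKey(1, "x");
        System.out.println(a.equals(b) + " " + (a.compareTo(b) == 0)
                + " " + (a.hashCode() == b.hashCode()));
        System.out.println(newEmptyKey() != null);
    }
}
```

If the no-argument constructor were removed, newEmptyKey() would fail at run time, which is the same kind of failure a job would hit when deserializing keys.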

The code for the Stu class is as follows:

package Eg1;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

/**
 * Design of the Stu class
 * @author liuchen
 */
public class Stu implements WritableComparable<Stu>
{
	private int id;
	private String name;
	private int age;
	private int grade;
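	// The no-argument constructor is required: the framework creates key objects reflectively before calling readFields()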
	public Stu(){	
	}
	public Stu(int a, String b, int c, int d){
		this.id = a;
		this.name = b;
		this.age = c;
		this.grade = d;
	}
	public void readFields(DataInput in) throws IOException {
		this.id = in.readInt();
		this.name = in.readUTF();
		this.age = in.readInt();
		this.grade = in.readInt();	
	}
	public void write(DataOutput out) throws IOException {
		out.writeInt(id);
		out.writeUTF(name);
		out.writeInt(age);
		out.writeInt(grade);
	}
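	// Sort by id in descending order, then break ties on the remaining fields.
	// compareTo must return 0 for identical records; otherwise the two copies of a
	// record are never grouped into the same reduce() call and the count never reaches 2.
	public int compareTo(Stu o) {
		int c = Integer.compare(o.id, this.id);
		if (c != 0) return c;
		c = this.name.compareTo(o.name);
		if (c != 0) return c;
		c = Integer.compare(this.age, o.age);
		if (c != 0) return c;
		return Integer.compare(this.grade, o.grade);
	}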
	@Override
	public int hashCode() {
		return this.id + this.name.hashCode() + this.age + this.grade;
	}
	@Override
	public boolean equals(Object obj) {
		if (!(obj instanceof Stu)) {
			return false;
		}
		Stu r = (Stu) obj;
		return this.id == r.id && this.name.equals(r.name) && this.age == r.age && this.grade == r.grade;
	}
	
	public String toString() {
		return Integer.toString(id) + "\t" + name + "\t" + Integer.toString(age) + "\t" + Integer.toString(grade);
	}
}
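The write() and readFields() pair can be checked locally, since both methods only need java.io streams. The following is a minimal sketch, assuming a stand-in class StuRecord with the same field layout as Stu but without the Hadoop interface; the class names and the sample values are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class StuRecord {
    int id;
    String name;
    int age;
    int grade;

    StuRecord() {}

    StuRecord(int id, String name, int age, int grade) {
        this.id = id;
        this.name = name;
        this.age = age;
        this.grade = grade;
    }

    // Fields must be read back in exactly the order they were written.
    void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(name);
        out.writeInt(age);
        out.writeInt(grade);
    }

    void readFields(DataInput in) throws IOException {
        id = in.readInt();
        name = in.readUTF();
        age = in.readInt();
        grade = in.readInt();
    }
}

class RoundTripSketch {
    // Serializes a record to bytes and reads it back into a fresh object.
    static StuRecord roundTrip(StuRecord original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            StuRecord copy = new StuRecord();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StuRecord copy = roundTrip(new StuRecord(1, "tom", 20, 90));
        System.out.println(copy.id + "\t" + copy.name + "\t" + copy.age + "\t" + copy.grade);
    }
}
```

A mismatch between the write order and the read order does not raise a compile error; it shows up as corrupted fields after the round trip, so a check like this is a cheap safeguard.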

The map() function of the Map phase is as follows:

@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
		final IntWritable one = new IntWritable(1);
		// Each input line is one tab-separated record: id, name, age, grade
		String[] arr = value.toString().split("\t");
		Stu stu = new Stu(Integer.parseInt(arr[0]), arr[1], Integer.parseInt(arr[2]), Integer.parseInt(arr[3]));
		// Emit <record, 1>
		context.write(stu, one);
}

The reduce() function of the Reduce phase is as follows:

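@Override
protected void reduce(Stu record, Iterable<IntWritable> counts, Context context) throws IOException, InterruptedException {
		// Sum the 1s emitted for this record by the Map phase
		int sum = 0;
		for (IntWritable val : counts) {
			sum += val.get();
		}
		// A count of 2 means the record appears in both tables
		if (sum == 2) {
			context.write(record, NullWritable.get());
		}
}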
