MapReduce編程實例(五)

前提準備:

1.hadoop安裝運行正常。Hadoop安裝配置請參考:Ubuntu下 Hadoop 1.2.1 配置安裝

2.集成開發環境正常。集成開發環境配置請參考 :Ubuntu 搭建Hadoop源碼閱讀環境


MapReduce編程實例:

MapReduce編程實例(一),詳細介紹在集成環境中運行第一個MapReduce程序 WordCount及代碼分析

MapReduce編程實例(二),計算學生平均成績

MapReduce編程實例(三),數據去重

MapReduce編程實例(四),排序

MapReduce編程實例(五),MapReduce實現單表關聯



單表關聯:

描述:

單表的自連接求解問題。如下表,根據child-parent表列出grandchild-grandparent表的值。

child parent
Tom Lucy
Tom Jim
Lucy David
Lucy Lili
Jim Lilei
Jim SuSan
Lily Green
Lily Bians
Green Well
Green MillShell
Havid James
James LiT
Richard Cheng
Cheng LiHua

問題分析:

顯然需要分解爲左右兩張表來進行自連接,而左右兩張表其實都是child-parent表,通過parent字段做key值進行連接。結合MapReduce的特性,MapReduce會在shuffle過程把相同的key放在一起傳到Reduce進行處理。OK,這下有思路了,將左表的parent作爲key輸出,將右表的child作爲key輸出,這樣shuffle之後很自然的,左右就連接在一起了,有木有!然後通過對左右表進行求笛卡爾積便得到所需的數據。

package com.t.hadoop;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

/**
 * 單表關聯
 * @author daT [email protected]
 *
 */
/**
 * Single-table self-join: given child-parent records, emit
 * grandchild-grandparent pairs by joining the table with itself on the
 * shared person name (the parent of the left copy = the child of the right copy).
 *
 * @author daT dev.tao@gmail.com
 */
public class STJoin {
	// Reducer-side flag so the "grandChild grandParent" header row is emitted
	// only once per reducer JVM (exactly once overall when a single reducer runs).
	public static int time = 0;

	/**
	 * Emits every child-parent record twice so the shuffle performs the join:
	 * once keyed by parent (tag "1", value carries the child — the "left table"),
	 * once keyed by child (tag "2", value carries the parent — the "right table").
	 */
	public static class STJoinMapper extends Mapper<Object, Text, Text, Text>{

		@Override
		protected void map(Object key, Text value, Context context)
				throws IOException, InterruptedException {
			String line = value.toString();
			// Split on the first space. The original charAt() scan threw
			// StringIndexOutOfBoundsException on lines without a space
			// (e.g. a trailing blank line); skip such lines instead.
			int sep = line.indexOf(' ');
			if (sep < 0) {
				return;
			}
			String childName = line.substring(0, sep);
			String parentName = line.substring(sep + 1);
			if (!"child".equals(childName)) { // skip the header row
				// Left table: key = parent, tagged value carries the child.
				context.write(new Text(parentName), new Text("1+" + childName));
				// Right table: key = child, tagged value carries the parent.
				context.write(new Text(childName), new Text("2+" + parentName));
			}
		}
	}

	/**
	 * For each person (the key), collects the children (tag "1") and parents
	 * (tag "2") seen in the values, then writes the Cartesian product:
	 * every child of the key paired with every parent of the key is a
	 * grandchild-grandparent pair.
	 */
	public static class STJoinReduce extends Reducer<Text, Text, Text, Text>{

		@Override
		protected void reduce(Text key, Iterable<Text> values, Context context)
				throws IOException, InterruptedException {
			if (time == 0) { // write the header before the first data row
				context.write(new Text("grandChild"), new Text("grandParent"));
				time++;
			}
			// Dynamic lists replace the original fixed String[10] arrays, which
			// overflowed whenever a person had more than 10 children or parents.
			List<String> grandChildren = new ArrayList<String>();
			List<String> grandParents = new ArrayList<String>();
			for (Text value : values) {
				String record = value.toString();
				if (record.length() < 2) { // need at least "<tag>+"
					continue;
				}
				char relation = record.charAt(0);
				String name = record.substring(2); // drop the "<tag>+" prefix
				if (relation == '1') {
					grandChildren.add(name); // left table: name is a child of key
				} else {
					grandParents.add(name);  // right table: name is a parent of key
				}
			}
			// Cartesian product of the two sides yields the joined pairs.
			for (String grandChild : grandChildren) {
				for (String grandParent : grandParents) {
					context.write(new Text(grandChild), new Text(grandParent));
				}
			}
		}
	}

	/**
	 * Configures and submits the job.
	 * Usage: STJoin &lt;input path&gt; &lt;output path&gt;
	 * Exits 2 on missing arguments, 0 on success, 1 on job failure.
	 */
	public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException{
		Configuration conf = new Configuration();
		String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
		if (otherArgs.length < 2) {
			System.err.println("parameter error"); // diagnostics belong on stderr
			System.exit(2);
		}

		Job job = new Job(conf); // Hadoop 1.x API, matching the tutorial's target version
		job.setJarByClass(STJoin.class);
		job.setMapperClass(STJoinMapper.class);
		job.setReducerClass(STJoinReduce.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(Text.class);

		FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
		FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}


傳入參數:

hdfs://localhost:9000/user/dat/stjon_input hdfs://localhost:9000/user/dat/stjon_output


輸出結果:

grandChild grandParent
Richard LiHua
Lily Well
Lily MillShell
Havid LiT
Tom Lilei
Tom SuSan
Tom Lili
Tom David


OK~!歡迎同學們多多交流~~

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章