1. Sample data (each line holds a child name followed by a parent name, separated by a space):
Tom Lucy
Tom Jack
Jone Lucy
Jone Jack
Lucy Mary
Lucy Ben
Jack Alice
Jack Jesse
Terry Alice
Terry Jesse
Philip Terry
Philip Alma
Mark Terry
Mark Alma
2. The map code is as follows:
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
// Assumption: Logger.getLogger(Class) matches log4j 1.x; adjust this import if another logging library is used
import org.apache.log4j.Logger;
// The mapper, the reducer and main are all members of this class
public class ChildParent2 {
public static class ChildParentMapper extends MapReduceBase implements Mapper<Object, Text, Text, Text> {
    private static Logger logger = Logger.getLogger(ChildParentMapper.class);
    String childname = new String();
    String parientname = new String();
    String flag = new String();// left/right table marker
    @Override
    public void map(Object ikey, Text ivalue, OutputCollector<Text, Text> output, Reporter arg3)
            throws IOException {
        String str[] = ivalue.toString().split(" ");// split the line into the child name and the parent name
        // Skip malformed lines and the header line (if the input has one)
        if (str.length >= 2 && str[0].compareTo("child") != 0) {
            childname = str[0];// child name
            parientname = str[1];// parent name
            // Left table: key = parent name, value = left marker + child name + parent name
            flag = "1";
            logger.info(new Text(parientname) + "," + new Text(flag + "+" + childname + "+" + parientname));
            output.collect(new Text(parientname), new Text(flag + "+" + childname + "+" + parientname));
            // Right table: key = child name, value = right marker + child name + parent name
            flag = "2";
            // Log the key that is actually emitted (the child name)
            logger.info(new Text(childname) + "," + new Text(flag + "+" + childname + "+" + parientname));
            output.collect(new Text(childname), new Text(flag + "+" + childname + "+" + parientname));
        }
    }
}
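Code walkthrough:
Step 1: define the following three fields:
1. Child name (childname)
2. Parent name (parientname)
3. A marker that distinguishes the left table from the right table (flag)
String childname = new String();
String parientname = new String();
String flag = new String();// left/right table marker
Step 2: split the input line to get the child name and the parent name
String str[] = ivalue.toString().split(" ");
childname = str[0];// child name
parientname = str[1];// parent name
Step 3: emit two key/value pairs, one marked as the left table and one as the right table
First: <parent name, left-table marker + child name + parent name>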
flag = "1";
output.collect(new Text(parientname), new Text(flag + "+" + childname + "+" + parientname));
Second: <child name, right-table marker + child name + parent name>
flag = "2";
output.collect(new Text(childname), new Text(flag + "+" + childname + "+" + parientname));
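The two emissions per input line can be tried out without a cluster. What follows is a minimal plain-Java sketch, not the Hadoop mapper itself: the class name MapEmitSketch and the method emit are hypothetical, and a single space stands in for the boundary between key and value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ChildParentMapper.map(); no Hadoop types involved.
final class MapEmitSketch {

    // Returns the two "key value" records produced for one "child parent" line.
    static List<String> emit(String line) {
        List<String> out = new ArrayList<>();
        String[] str = line.split(" ");
        if (str.length < 2 || str[0].equals("child")) {
            return out; // malformed line or header line: emit nothing
        }
        String childname = str[0];
        String parientname = str[1];
        // Left table: keyed by the parent name, marker "1"
        out.add(parientname + " " + "1+" + childname + "+" + parientname);
        // Right table: keyed by the child name, marker "2"
        out.add(childname + " " + "2+" + childname + "+" + parientname);
        return out;
    }

    public static void main(String[] args) {
        for (String record : emit("Tom Lucy")) {
            System.out.println(record);
        }
    }
}
```

For the input line "Tom Lucy" this prints "Lucy 1+Tom+Lucy" and then "Tom 2+Tom+Lucy": the same relation is filed once under the parent and once under the child, which is what lets a single key later see both its children and its parents.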
Step 4: the mapper output, listed here sorted by key:
Alice 1+Terry+Alice
Alice 1+Jack+Alice
Alma 1+Mark+Alma
Alma 1+Philip+Alma
Ben 1+Lucy+Ben
Jack 2+Jack+Alice
Jack 1+Tom+Jack
Jack 1+Jone+Jack
Jack 2+Jack+Jesse
Jesse 1+Jack+Jesse
Jesse 1+Terry+Jesse
Jone 2+Jone+Lucy
Jone 2+Jone+Jack
Lucy 1+Tom+Lucy
Lucy 2+Lucy+Ben
Lucy 2+Lucy+Mary
Lucy 1+Jone+Lucy
Mark 2+Mark+Alma
Mark 2+Mark+Terry
Mary 1+Lucy+Mary
Philip 2+Philip+Terry
Philip 2+Philip+Alma
Terry 1+Philip+Terry
Terry 1+Mark+Terry
Terry 2+Terry+Alice
Terry 2+Terry+Jesse
Tom 2+Tom+Lucy
Tom 2+Tom+Jack
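Between map and reduce, the framework groups the emitted values by key, so each reduce call receives one key together with all of its values. The sketch below imitates that grouping in memory. It is only an illustration under assumptions: ShuffleSketch and group are hypothetical names, records are "key value" strings as in the listing above, and values keep their input order, which a real job does not guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the grouping done between map and reduce.
final class ShuffleSketch {

    // Groups "key value" records by key; the TreeMap keeps the keys sorted.
    static Map<String, List<String>> group(List<String> records) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String record : records) {
            int space = record.indexOf(' ');
            if (space < 0) {
                continue; // not a "key value" record: skip it
            }
            String key = record.substring(0, space);
            String value = record.substring(space + 1);
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Mapper output for the input lines Tom Lucy, Jone Lucy, Lucy Mary, Lucy Ben
        List<String> records = List.of(
                "Lucy 1+Tom+Lucy", "Tom 2+Tom+Lucy",
                "Lucy 1+Jone+Lucy", "Jone 2+Jone+Lucy",
                "Mary 1+Lucy+Mary", "Lucy 2+Lucy+Mary",
                "Ben 1+Lucy+Ben", "Lucy 2+Lucy+Ben");
        group(records).forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```

The entry for Lucy ends up holding both marker-1 values (her children) and marker-2 values (her parents), which is exactly the input the reducer needs for the join.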
3. The reduce code is as follows:
public static class ChildParentReduce extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
    private static Logger logger = Logger.getLogger(ChildParentReduce.class);
    private int num = 0;
    @Override
    public void reduce(Text ikey, Iterator<Text> ivalue, OutputCollector<Text, Text> output, Reporter arg3)
            throws IOException {
        if (num == 0) {// emit the header row on the first call
            output.collect(new Text("grandchild"), new Text("grandparient"));
            num++;
        }
        int grandchildnum = 0;// number of grandchildren
        int grandparientnum = 0;// number of grandparents
        // Fixed-size buffers: at most 100 entries of each kind per key
        String[] grandchild = new String[100];
        String[] grandparient = new String[100];
        while (ivalue.hasNext()) {
            String[] record = ivalue.next().toString().split("\\+");// split on "+" into three fields
            // Left-table record
            if (record[0].compareTo("1") == 0) {
                grandchild[grandchildnum] = record[1];// store the child name in the array
                grandchildnum++;
            }
            // Right-table record
            else if (record[0].compareTo("2") == 0) {
                grandparient[grandparientnum] = record[2];// store the parent name in the array
                grandparientnum++;
            }
        }
        if (grandchildnum != 0 && grandparientnum != 0) {
            // Cartesian product: i indexes grandchild, j indexes grandparient
            for (int i = 0; i < grandchildnum; i++) {
                for (int j = 0; j < grandparientnum; j++) {
                    logger.info(new Text(grandchild[i]) + "," + new Text(grandparient[j]));
                    output.collect(new Text(grandchild[i]), new Text(grandparient[j]));
                }
            }
        }
    }
}
Code walkthrough:
Step 1: if a header row is needed, emit it as the first output row
if (num == 0) {// emit the header row
    output.collect(new Text("grandchild"), new Text("grandparient"));
    num++;
}
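Step 2: define four variables: two arrays that hold the grandchildren and the grandparents, and a counter for each
int grandchildnum = 0;// number of grandchildren
int grandparientnum = 0;// number of grandparents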
String[] grandchild = new String[100];
String[] grandparient = new String[100];
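Step 3: parse the value list produced by the map phase
First, the input looks like this. Taking the key Lucy from the mapper output, the reducer receives:
<Lucy, 1+Tom+Lucy,2+Lucy+Ben,2+Lucy+Mary,1+Jone+Lucy>
Loop over the values:
// Left-table record
if (record[0].compareTo("1") == 0) {
    grandchild[grandchildnum] = record[1];// store the child name in the array
    grandchildnum++;
}
Grandchildren: Tom, Jone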
// Right-table record
else if (record[0].compareTo("2") == 0) {
    grandparient[grandparientnum] = record[2];// store the parent name in the array
    grandparientnum++;
}
Grandparents: Ben, Mary
Take the Cartesian product of the two arrays to get the grandchild-grandparent pairs:
if (grandchildnum != 0 && grandparientnum != 0) {
    // Cartesian product: i indexes grandchild, j indexes grandparient
    for (int i = 0; i < grandchildnum; i++) {
        for (int j = 0; j < grandparientnum; j++) {
            logger.info(new Text(grandchild[i]) + "," + new Text(grandparient[j]));
            output.collect(new Text(grandchild[i]), new Text(grandparient[j]));
        }
    }
}
Tom,Ben
Tom,Mary
Jone,Ben
Jone,Mary
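The reduce step for a single key can also be reproduced outside Hadoop. The following is a rough sketch under assumptions, not the reducer itself: ReduceJoinSketch and join are hypothetical names, the value list is passed in as plain strings, and lists replace the fixed 100-element arrays so that the number of values per key is not capped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java version of the reduce step for one key; not Hadoop code.
final class ReduceJoinSketch {

    // Takes the value list of one key and returns "grandchild,grandparent" pairs.
    static List<String> join(List<String> values) {
        List<String> grandchildren = new ArrayList<>();
        List<String> grandparents = new ArrayList<>();
        for (String value : values) {
            String[] record = value.split("\\+"); // marker, child, parent
            if (record.length != 3) {
                continue; // ignore malformed values
            }
            if (record[0].equals("1")) {
                grandchildren.add(record[1]); // left table: a child of the key
            } else if (record[0].equals("2")) {
                grandparents.add(record[2]); // right table: a parent of the key
            }
        }
        List<String> pairs = new ArrayList<>();
        // Cartesian product; stays empty when either side is empty
        for (String grandchild : grandchildren) {
            for (String grandparent : grandparents) {
                pairs.add(grandchild + "," + grandparent);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> lucy = List.of("1+Tom+Lucy", "2+Lucy+Ben", "2+Lucy+Mary", "1+Jone+Lucy");
        join(lucy).forEach(System.out::println);
    }
}
```

Run on the value list for Lucy, it prints Tom,Ben, Tom,Mary, Jone,Ben and Jone,Mary, in that order. A key that has values from only one table, such as Ben, yields no pairs.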
The main method:
public static void main(String[] args) {
    try {
        String inputDir = "hdfs://192.168.1.61:9000/home/zhongml/childparent/input";
        String outputDir = "hdfs://192.168.1.61:9000/home/zhongml/childparent/output";
        JobConf con = new JobConf(ChildParent2.class);
        con.setJobName("childparent");
        con.setMapOutputKeyClass(Text.class);
        con.setMapOutputValueClass(Text.class);
        con.setOutputKeyClass(Text.class);
        con.setOutputValueClass(Text.class);
        con.setMapperClass(ChildParentMapper.class);
        con.setReducerClass(ChildParentReduce.class);
        con.setInputFormat(TextInputFormat.class);
        con.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(con, new Path(inputDir));
        FileOutputFormat.setOutputPath(con, new Path(outputDir));
        JobClient.runJob(con);
        System.exit(0);
    } catch (IllegalArgumentException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
}
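To sanity-check the job output on a small input, the same relation can be computed by a direct in-memory self-join. This is a cross-check sketch under assumptions, not part of the job: GrandparentCheck and grandPairs are hypothetical names, and the result is a sorted set, so a pair reachable through two different parents would appear once here even though the job would emit it twice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical cross-check: a direct self-join in memory, independent of MapReduce.
final class GrandparentCheck {

    // Returns sorted "grandchild,grandparent" pairs for the given "child parent" lines.
    static Set<String> grandPairs(List<String> lines) {
        Map<String, List<String>> parentsOf = new HashMap<>();
        for (String line : lines) {
            String[] str = line.split(" ");
            if (str.length < 2) {
                continue; // skip malformed lines
            }
            parentsOf.computeIfAbsent(str[0], k -> new ArrayList<>()).add(str[1]);
        }
        Set<String> pairs = new TreeSet<>();
        for (Map.Entry<String, List<String>> entry : parentsOf.entrySet()) {
            for (String parent : entry.getValue()) {
                // A grandparent is any parent of one of the child's parents
                for (String grandparent : parentsOf.getOrDefault(parent, List.of())) {
                    pairs.add(entry.getKey() + "," + grandparent);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "Tom Lucy", "Tom Jack", "Jone Lucy", "Jone Jack",
                "Lucy Mary", "Lucy Ben", "Jack Alice", "Jack Jesse",
                "Terry Alice", "Terry Jesse", "Philip Terry", "Philip Alma",
                "Mark Terry", "Mark Alma");
        grandPairs(lines).forEach(System.out::println);
    }
}
```

On the sample data this yields 12 pairs: four each for Tom and Jone, and two each for Philip and Mark. That is the same count the reducer produces from the keys Jack, Lucy and Terry.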