Case 3: MapReduce single-table join, deriving a grandchild-grandparent table from a child-parent table

1. Sample data. Each line holds a child followed by one of that child's parents, separated by a space:

Tom Lucy
Tom Jack
Jone Lucy
Jone Jack
Lucy Mary
Lucy Ben
Jack Alice
Jack Jesse
Terry Alice
Terry Jesse
Philip Terry
Philip Alma
Mark Terry
Mark Alma
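Before running the job, it helps to know what it should produce. The snippet below is a minimal in-memory sketch, not part of the MapReduce job: it joins the sample pairs with themselves using a plain nested loop, matching left.parent against right.child. The class ReferenceJoin and the method grandPairs are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference implementation: an in-memory nested-loop self-join,
// used only to know the expected output. It is not part of the MapReduce job.
class ReferenceJoin {
    // The sample data above, one {child, parent} pair per entry.
    static final String[][] PAIRS = {
        {"Tom", "Lucy"}, {"Tom", "Jack"}, {"Jone", "Lucy"}, {"Jone", "Jack"},
        {"Lucy", "Mary"}, {"Lucy", "Ben"}, {"Jack", "Alice"}, {"Jack", "Jesse"},
        {"Terry", "Alice"}, {"Terry", "Jesse"}, {"Philip", "Terry"}, {"Philip", "Alma"},
        {"Mark", "Terry"}, {"Mark", "Alma"}
    };

    // Join the table with itself on left.parent == right.child and
    // return "grandchild grandparent" strings.
    static List<String> grandPairs(String[][] pairs) {
        List<String> result = new ArrayList<>();
        for (String[] left : pairs) {           // left  = {grandchild, middle generation}
            for (String[] right : pairs) {      // right = {middle generation, grandparent}
                if (left[1].equals(right[0])) {
                    result.add(left[0] + " " + right[1]);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : grandPairs(PAIRS)) {
            System.out.println(line);
        }
    }
}
```

On the sample data this yields 12 pairs, for example Tom Mary and Mark Jesse; the MapReduce job should produce the same set.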

2. The map code:

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;
    import org.apache.log4j.Logger; // log4j 1.x, the logging library bundled with Hadoop 1.x/2.x

    public class ChildParent2 {

        public static class ChildParentMapper extends MapReduceBase implements Mapper<Object, Text, Text, Text> {

            private static Logger logger = Logger.getLogger(ChildParentMapper.class);

            String childname = new String();

            String parientname = new String();

            String flag = new String(); // left/right table flag

            @Override
            public void map(Object ikey, Text ivalue, OutputCollector<Text, Text> output, Reporter arg3)
                    throws IOException {
                String str[] = ivalue.toString().split(" "); // split into child and parent names
                if (str.length >= 2 && !str[0].equals("child")) { // skip malformed lines and the header row
                    childname = str[0]; // child name
                    parientname = str[1]; // parent name

                    // left table: key = parent name, value = left flag + child name + parent name
                    flag = "1";
                    logger.info(parientname + "," + flag + "+" + childname + "+" + parientname);
                    output.collect(new Text(parientname), new Text(flag + "+" + childname + "+" + parientname));

                    // right table: key = child name, value = right flag + child name + parent name
                    flag = "2";
                    logger.info(childname + "," + flag + "+" + childname + "+" + parientname);
                    output.collect(new Text(childname), new Text(flag + "+" + childname + "+" + parientname));
                }
            }
        }


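Code walkthrough:

Step 1: define three fields:

1. the child name (childname)
2. the parent name (parientname)
3. a flag that tells the left table from the right table (flag)

    String childname = new String();
    String parientname = new String();
    String flag = new String(); // left/right table flag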


Step 2: split the line to get the child name and the parent name.

    String str[] = ivalue.toString().split(" ");
    childname = str[0]; // child name
    parientname = str[1]; // parent name


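Step 3: emit two key/value pairs per line, one tagged as the left table and one as the right table. This is how the table is joined with itself: the join condition is left.parent = right.child, so the left copy is keyed by the parent and the right copy by the child. Anyone who is both a parent and a child therefore receives both kinds of records under the same key in the reducer.

First: <parent name, left-table flag + child name + parent name>

    flag = "1";
    output.collect(new Text(parientname), new Text(flag + "+" + childname + "+" + parientname));

Second: <child name, right-table flag + child name + parent name>

    flag = "2";
    output.collect(new Text(childname), new Text(flag + "+" + childname + "+" + parientname));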
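The two emits in step 3 can be checked without a cluster. Below is a hedged plain-Java sketch of the same mapper logic; the class MapSketch and its map method are hypothetical stand-ins that return the key/value pairs in a list instead of calling OutputCollector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java sketch of the mapper logic (no Hadoop): for one input
// line "child parent" it returns the two tagged {key, value} pairs the mapper emits.
class MapSketch {
    static List<String[]> map(String line) {
        List<String[]> out = new ArrayList<>();
        String[] str = line.trim().split("\\s+");
        if (str.length < 2 || str[0].equals("child")) {
            return out; // blank line or header row: emit nothing
        }
        String child = str[0];
        String parent = str[1];
        // Left table, keyed by the parent: under this key the child is a grandchild candidate.
        out.add(new String[] {parent, "1+" + child + "+" + parent});
        // Right table, keyed by the child: under this key the parent is a grandparent candidate.
        out.add(new String[] {child, "2+" + child + "+" + parent});
        return out;
    }

    public static void main(String[] args) {
        for (String[] kv : map("Tom Lucy ")) {
            System.out.println(kv[0] + "\t" + kv[1]);
        }
    }
}
```

For the line "Tom Lucy " it prints Lucy with 1+Tom+Lucy and Tom with 2+Tom+Lucy, matching the first and last entries for those keys in the listing below.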

Step 4: the mapper output. The listing below is grouped and sorted by key, i.e. what the reducer receives after the shuffle:

Alice  1+Terry+Alice
Alice  1+Jack+Alice
Alma   1+Mark+Alma
Alma   1+Philip+Alma
Ben    1+Lucy+Ben
Jack   2+Jack+Alice
Jack   1+Tom+Jack
Jack   1+Jone+Jack
Jack   2+Jack+Jesse
Jesse  1+Jack+Jesse
Jesse  1+Terry+Jesse
Jone   2+Jone+Lucy
Jone   2+Jone+Jack
Lucy   1+Tom+Lucy
Lucy   2+Lucy+Ben
Lucy   2+Lucy+Mary
Lucy   1+Jone+Lucy
Mark   2+Mark+Alma
Mark   2+Mark+Terry
Mary   1+Lucy+Mary
Philip 2+Philip+Terry
Philip 2+Philip+Alma
Terry  1+Philip+Terry
Terry  1+Mark+Terry
Terry  2+Terry+Alice
Terry  2+Terry+Jesse
Tom    2+Tom+Lucy
Tom    2+Tom+Jack
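The grouping shown above is done by the framework's shuffle, not by the mapper itself. As a rough local approximation, assuming the sample data is held in memory, a TreeMap can play the role of the sorted shuffle. ShuffleSketch and DATA are hypothetical names; the order of values inside each group here is simply input order, whereas Hadoop makes no promise about it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local stand-in for map + shuffle: each line is tagged and re-keyed
// as in the mapper, then grouped under its key. TreeMap keeps keys sorted, like
// Hadoop's sort phase; values keep input order, which Hadoop does not guarantee.
class ShuffleSketch {
    static final String[] DATA = {
        "Tom Lucy", "Tom Jack", "Jone Lucy", "Jone Jack", "Lucy Mary", "Lucy Ben",
        "Jack Alice", "Jack Jesse", "Terry Alice", "Terry Jesse",
        "Philip Terry", "Philip Alma", "Mark Terry", "Mark Alma"
    };

    static Map<String, List<String>> shuffle(String[] lines) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] str = line.split(" ");
            if (str.length < 2) {
                continue; // skip malformed lines
            }
            String child = str[0];
            String parent = str[1];
            // left table, keyed by the parent
            groups.computeIfAbsent(parent, k -> new ArrayList<>()).add("1+" + child + "+" + parent);
            // right table, keyed by the child
            groups.computeIfAbsent(child, k -> new ArrayList<>()).add("2+" + child + "+" + parent);
        }
        return groups;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, List<String>> e : shuffle(DATA).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

It produces the same 12 keys and 28 values as the listing above, though the values inside a group may appear in a different order.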

3. The reduce code:

        public static class ChildParentReduce extends MapReduceBase implements Reducer<Text, Text, Text, Text> {

            private static Logger logger = Logger.getLogger(ChildParentReduce.class);

            private int num = 0; // 0 until this reduce task has written the header row

            @Override
            public void reduce(Text ikey, Iterator<Text> ivalue, OutputCollector<Text, Text> output, Reporter arg3)
                    throws IOException {
                if (num == 0) { // build the output header
                    output.collect(new Text("grandchild"), new Text("grandparent"));
                    num++;
                }
                int grandchildnum = 0; // number of grandchildren
                int grandparientnum = 0; // number of grandparents
                // fixed capacity: assumes no key has more than 100 children or 100 parents
                String[] grandchild = new String[100];
                String[] grandparient = new String[100];
                while (ivalue.hasNext()) {
                    String[] record = ivalue.next().toString().split("\\+"); // split on "+" into flag, child, parent
                    // left-table record: the key is the parent, so the child is a grandchild
                    if (record[0].equals("1")) {
                        grandchild[grandchildnum] = record[1];
                        grandchildnum++;
                    }
                    // right-table record: the key is the child, so the parent is a grandparent
                    else if (record[0].equals("2")) {
                        grandparient[grandparientnum] = record[2];
                        grandparientnum++;
                    }
                }
                if (grandchildnum != 0 && grandparientnum != 0) {
                    // Cartesian product: every grandchild paired with every grandparent
                    for (int i = 0; i < grandchildnum; i++) {
                        for (int j = 0; j < grandparientnum; j++) {
                            logger.info(grandchild[i] + "," + grandparient[j]);
                            output.collect(new Text(grandchild[i]), new Text(grandparient[j]));
                        }
                    }
                }
            }
        }


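Code walkthrough:

Step 1: if a header row is wanted, write it before the first record. num is an instance field of the reducer, so the header is written once per reduce task; with more than one reducer, each output file starts with its own header.

    if (num == 0) { // build the output header
        output.collect(new Text("grandchild"), new Text("grandparent"));
        num++;
    }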

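Step 2: define four variables: an array for the grandchildren, an array for the grandparents, and a counter for each.

    int grandchildnum = 0; // number of grandchildren
    int grandparientnum = 0; // number of grandparents
    String[] grandchild = new String[100];
    String[] grandparient = new String[100];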

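Step 3: parse the value list produced by the map phase.

First, the input. Taking Lucy as the key from the mapper output, the reducer receives:

<Lucy, 1+Tom+Lucy, 2+Lucy+Ben, 2+Lucy+Mary, 1+Jone+Lucy>

Hadoop does not guarantee the order of the values within a key; the order shown is simply the one in the listing above.

Loop over the values. Each value is split on "+" into the flag, the child name and the parent name:

    String[] record = ivalue.next().toString().split("\\+");

A value with flag 1 comes from the left table, where Lucy is the parent, so its child is a grandchild candidate:

    // left-table record
    if (record[0].equals("1")) {
        grandchild[grandchildnum] = record[1]; // take the child name and store it in the array
        grandchildnum++;
    }

Grandchildren: Tom, Jone

A value with flag 2 comes from the right table, where Lucy is the child, so its parent is a grandparent candidate:

    // right-table record
    else if (record[0].equals("2")) {
        grandparient[grandparientnum] = record[2]; // take the parent name and store it in the array
        grandparientnum++;
    }

Grandparents: Ben, Mary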
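The parsing loop above can also be written with growable lists instead of fixed 100-element arrays, which removes the capacity limit. The following is a sketch under that assumption; ValueSplitSketch and its Split holder class are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the reducer's parsing loop with growable lists
// instead of fixed 100-element arrays.
class ValueSplitSketch {
    // Holder for the two sides of one key's value list.
    static class Split {
        final List<String> grandchildren = new ArrayList<>();
        final List<String> grandparents = new ArrayList<>();
    }

    static Split split(List<String> values) {
        Split s = new Split();
        for (String v : values) {
            String[] record = v.split("\\+"); // {flag, child, parent}
            if (record[0].equals("1")) {
                s.grandchildren.add(record[1]); // key is the parent: its child is a grandchild
            } else if (record[0].equals("2")) {
                s.grandparents.add(record[2]);  // key is the child: its parent is a grandparent
            }
        }
        return s;
    }

    public static void main(String[] args) {
        Split lucy = split(Arrays.asList("1+Tom+Lucy", "2+Lucy+Ben", "2+Lucy+Mary", "1+Jone+Lucy"));
        System.out.println("grandchildren: " + lucy.grandchildren); // [Tom, Jone]
        System.out.println("grandparents: " + lucy.grandparents);   // [Ben, Mary]
    }
}
```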


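Taking the Cartesian product of the two arrays gives the grandchild-grandparent pairs. Each loop index must stay within its own array: i runs over the grandchildren and j over the grandparents. Keys that have only one kind of record (for example Ben, who appears only as a parent) produce no output.

    if (grandchildnum != 0 && grandparientnum != 0) {
        // Cartesian product
        for (int i = 0; i < grandchildnum; i++) {
            for (int j = 0; j < grandparientnum; j++) {
                logger.info(grandchild[i] + "," + grandparient[j]);
                output.collect(new Text(grandchild[i]), new Text(grandparient[j]));
            }
        }
    }

Tom    Ben
Tom    Mary
Jone   Ben
Jone   Mary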
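The pairing step on its own, as a small sketch with hypothetical names (CartesianSketch.cross): the outer loop walks the grandchildren and the inner loop the grandparents, so it also works when the two lists have different lengths. The tab between the two names mimics TextOutputFormat's default key/value separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the Cartesian-product step. Each loop walks its own list,
// so the two lists may have different lengths.
class CartesianSketch {
    static List<String> cross(List<String> grandchildren, List<String> grandparents) {
        List<String> out = new ArrayList<>();
        for (String gc : grandchildren) {      // outer loop: grandchildren
            for (String gp : grandparents) {   // inner loop: grandparents
                out.add(gc + "\t" + gp);       // tab, like TextOutputFormat's default separator
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : cross(Arrays.asList("Tom", "Jone"), Arrays.asList("Ben", "Mary"))) {
            System.out.println(line);
        }
    }
}
```

For the Lucy group this prints Tom/Ben, Tom/Mary, Jone/Ben and Jone/Mary, the four pairs listed above.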



Finally, the main method:

        public static void main(String[] args) {
            try {
                String inputDir = "hdfs://192.168.1.61:9000/home/zhongml/childparent/input";
                String outputDir = "hdfs://192.168.1.61:9000/home/zhongml/childparent/output";
                JobConf con = new JobConf(ChildParent2.class);
                con.setJobName("childparent");
                con.setMapOutputKeyClass(Text.class);
                con.setMapOutputValueClass(Text.class);
                con.setOutputKeyClass(Text.class);
                con.setOutputValueClass(Text.class);
                con.setMapperClass(ChildParentMapper.class);
                con.setReducerClass(ChildParentReduce.class);
                con.setNumReduceTasks(1); // a single reducer, so the header row is written exactly once
                con.setInputFormat(TextInputFormat.class);
                con.setOutputFormat(TextOutputFormat.class);
                FileInputFormat.setInputPaths(con, new Path(inputDir));
                FileOutputFormat.setOutputPath(con, new Path(outputDir)); // the output directory must not exist yet
                JobClient.runJob(con); // blocks until the job finishes; throws IOException if it fails
                System.exit(0);
            } catch (IllegalArgumentException e) {
                e.printStackTrace();
                System.exit(1);
            } catch (IOException e) {
                e.printStackTrace();
                System.exit(1);
            }
        }
    }
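To try the whole data flow without a cluster, here is a local simulation that strings the three phases together on the sample data: map (tag and re-key), shuffle (group by sorted key) and reduce (Cartesian product). It is a sketch, not the Hadoop job; the class LocalPipeline is hypothetical, and it assumes a single reduce task so the header appears once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local simulation of the whole job (no Hadoop): map, shuffle, reduce.
// Assumes a single reduce task, so the header row is written exactly once.
class LocalPipeline {
    static final String[] DATA = {
        "Tom Lucy", "Tom Jack", "Jone Lucy", "Jone Jack", "Lucy Mary", "Lucy Ben",
        "Jack Alice", "Jack Jesse", "Terry Alice", "Terry Jesse",
        "Philip Terry", "Philip Alma", "Mark Terry", "Mark Alma"
    };

    static List<String> run(String[] lines) {
        // Map + shuffle: tag each record and group it under its key; TreeMap keeps keys sorted.
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] str = line.split(" ");
            if (str.length < 2 || str[0].equals("child")) {
                continue; // skip malformed lines and a header row
            }
            String child = str[0];
            String parent = str[1];
            groups.computeIfAbsent(parent, k -> new ArrayList<>()).add("1+" + child + "+" + parent);
            groups.computeIfAbsent(child, k -> new ArrayList<>()).add("2+" + child + "+" + parent);
        }

        // Reduce: one pass per key, then the Cartesian product of its two sides.
        List<String> output = new ArrayList<>();
        output.add("grandchild\tgrandparent"); // header row
        for (List<String> values : groups.values()) {
            List<String> grandchildren = new ArrayList<>();
            List<String> grandparents = new ArrayList<>();
            for (String v : values) {
                String[] record = v.split("\\+"); // {flag, child, parent}
                if (record[0].equals("1")) {
                    grandchildren.add(record[1]);
                } else if (record[0].equals("2")) {
                    grandparents.add(record[2]);
                }
            }
            for (String gc : grandchildren) {
                for (String gp : grandparents) {
                    output.add(gc + "\t" + gp);
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        for (String line : run(DATA)) {
            System.out.println(line);
        }
    }
}
```

Running it should print the header followed by twelve pairs; only the Jack, Lucy and Terry keys have both left-table and right-table records, and each contributes four pairs.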





                    





