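Scenario: some complex tasks cannot be finished in a single MapReduce pass and need several MapReduce jobs run one after another. For example, a log parsing system might be split into three jobs: splitting, session_id, and context.
Iterating map/reduce is conceptually simple. Much like a for loop, the output of one MapReduce job becomes the input of the next, and the intermediate results can all be deleted once the whole task has finished.
For example: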
Configuration conf = new Configuration();
// First MapReduce job: reads InputPath1, writes Outpath1
Job job1 = new Job(conf, "job1");
.....
FileInputFormat.addInputPath(job1, InputPath1);
FileOutputFormat.setOutputPath(job1, Outpath1);
job1.waitForCompletion(true);
// Second MapReduce job: reads job1's output (Outpath1), writes Outpath2
// Reuses conf from above; creating a new Configuration here is optional
Job job2 = new Job(conf, "job2");
.....
FileInputFormat.addInputPath(job2, Outpath1);
FileOutputFormat.setOutputPath(job2, Outpath2);
job2.waitForCompletion(true);
// Third MapReduce job: reads job2's output (Outpath2), writes Outpath3
// Reuses conf from above; creating a new Configuration here is optional
Job job3 = new Job(conf, "job3");
.....
FileInputFormat.addInputPath(job3, Outpath2);
FileOutputFormat.setOutputPath(job3, Outpath3);
job3.waitForCompletion(true);
.....
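The for-loop analogy can be made concrete with a small driver loop. The following is only a minimal sketch in plain Java with no Hadoop classes, so that it runs on its own. `JobChain`, `Stage`, `runChain`, `deleteRecursively`, and the `out1`, `out2`, ... directory names are all hypothetical names introduced for illustration. In a real driver, each `Stage` would presumably configure one `Job` and return the result of `job.waitForCompletion(true)`, and the cleanup would go through Hadoop's `FileSystem.delete(path, true)` rather than `java.nio.file`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Plain-Java model of a chain of jobs (hypothetical; not a Hadoop API). */
final class JobChain {

    /** Stand-in for "configure one job, run it, report success or failure". */
    interface Stage {
        boolean run(Path input, Path output) throws IOException;
    }

    /**
     * Runs the stages in order; stage i reads the directory written by stage i-1.
     * Returns the final output directory, or null if any stage reports failure.
     * Intermediate directories are deleted only after the whole chain succeeds.
     */
    static Path runChain(Path input, Path workDir, List<Stage> stages) throws IOException {
        List<Path> intermediates = new ArrayList<>();
        Path current = input;
        for (int i = 0; i < stages.size(); i++) {
            Path output = workDir.resolve("out" + (i + 1));
            if (!stages.get(i).run(current, output)) {
                return null; // stop here: later stages would read incomplete data
            }
            if (i < stages.size() - 1) {
                intermediates.add(output); // every output except the last is intermediate
            }
            current = output; // this output becomes the next stage's input
        }
        for (Path p : intermediates) {
            deleteRecursively(p);
        }
        return current;
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Unlike the listing above, this sketch checks each stage's return value before moving on, since a later job that reads the output of a failed job would work on incomplete data. On failure it also leaves the intermediate directories in place, which is a design choice made here to keep them available for inspection.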