1. Using Map/Reduce from an application works as follows:
1) Organize the data to be processed as key-value pairs and write them to a file;
2) Map these key-value pairs into another set of key-value pairs; the mapping logic (the algorithm) is encapsulated in a class implementing the Mapper interface;
public interface Mapper extends JobConfigurable, Closeable {
    void map(WritableComparable key, Writable value,
             OutputCollector output, Reporter reporter)
        throws IOException;
}
3) Turn the key-value pairs produced by map() in the previous step into the final result; this logic (the algorithm) is encapsulated in a class implementing the Reducer interface;
public interface Reducer extends JobConfigurable, Closeable {
    void reduce(WritableComparable key, Iterator values,
                OutputCollector output, Reporter reporter)
        throws IOException;
}
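The division of labor between the two interfaces can be illustrated with a self-contained sketch (plain Java, no Hadoop dependencies; the class, the word-count logic, and all names here are invented for illustration): map() turns each input pair into new pairs, the framework groups the intermediate values by key, and reduce() folds each group into a final value.

```java
import java.util.*;

public class MiniMapReduce {
    // Stands in for the Mapper contract: one input pair -> zero or more output pairs.
    // Here: emit (word, 1) for every word in the input line.
    static List<Map.Entry<String, Integer>> map(String key, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            out.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return out;
    }

    // Stands in for the Reducer contract: one key plus all its values -> final value.
    static int reduce(String key, Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) sum += values.next();
        return sum;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a b a", "b a");
        // The framework's part: run map, group intermediate pairs by key, run reduce.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : input) {
            for (Map.Entry<String, Integer> kv : map("input", line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            System.out.println(e.getKey() + "\t"
                               + reduce(e.getKey(), e.getValue().iterator()));
        }
    }
}
```

In the real framework the grouping step (shuffle and sort) runs across the cluster, but the per-key contract seen by user code is the same.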
4) Create (new) a JobConf object; as the name suggests, a JobConf is the definition of the job to be run;
JobConf genJob = new JobConf(conf);
5) Populate the JobConf: the input file path, the Mapper and Reducer defined above, and so on (see the comments in the code below);
genJob.setInputDir(randomIns);                        // path of the input data files
genJob.setInputKeyClass(IntWritable.class);           // key type of the input data
genJob.setInputValueClass(IntWritable.class);         // value type of the input data
genJob.setInputFormat(SequenceFileInputFormat.class); // file format of the input data
genJob.setMapperClass(RandomGenMapper.class);         // the Mapper
genJob.setOutputDir(randomOuts);                      // path of the final output files
genJob.setOutputKeyClass(IntWritable.class);          // key type of the output data
genJob.setOutputValueClass(IntWritable.class);        // value type of the output data
genJob.setOutputFormat(TextOutputFormat.class);       // file format of the output data
genJob.setReducerClass(RandomGenReducer.class);       // the Reducer
genJob.setNumReduceTasks(1);                          // the number of reduce tasks
6) Call JobClient's static method runJob(), passing in the JobConf object above, and wait for it to finish;
JobClient.runJob(genJob);
2. JobClient.runJob() is implemented as follows:
1) Construct a JobClient;
JobClient jc = new JobClient(job);
The constructor initializes jobSubmitClient: either a local LocalJobRunner, or a proxy to the JobTracker of a remote Map/Reduce cluster obtained via RPC.getProxy();
this.conf = conf;
String tracker = conf.get("mapred.job.tracker", "local");
if ("local".equals(tracker)) {
    this.jobSubmitClient = new LocalJobRunner(conf); // run locally with LocalJobRunner
} else {
    this.jobSubmitClient = (JobSubmissionProtocol)
        RPC.getProxy(JobSubmissionProtocol.class,
                     JobTracker.getAddress(conf), conf);
}
2) Submit the job;
running = jc.submitJob(job);
3) Loop until the job completes, reporting progress once a second;
while (!running.isComplete()) {
    try {
        Thread.sleep(1000);
    } catch (InterruptedException e) {}
    running = jc.getJob(jobId);
    String report = " map " + Math.round(running.mapProgress() * 100) +
        "% reduce " + Math.round(running.reduceProgress() * 100) + "%";
    if (!report.equals(lastReport)) {
        LOG.info(report);
        lastReport = report;
    }
}
if (!running.isSuccessful()) {
    throw new IOException("Job failed!");
}
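The wait-and-report pattern in runJob() can be sketched on its own; here a hypothetical FakeRunningJob stands in for Hadoop's RunningJob handle (the stub class and its fields are invented purely to make the loop runnable):

```java
public class PollJob {
    // Hypothetical stand-in for the RunningJob handle returned by submitJob().
    static class FakeRunningJob {
        int ticks = 0;
        boolean isComplete()   { return ticks >= 3; }
        boolean isSuccessful() { return true; }
        float mapProgress()    { return Math.min(1.0f, ticks / 3.0f); }
        float reduceProgress() { return Math.min(1.0f, ticks / 3.0f); }
        void advance() { ticks++; } // stands in for real cluster progress
    }

    public static void main(String[] args) throws Exception {
        FakeRunningJob running = new FakeRunningJob();
        String lastReport = null;
        while (!running.isComplete()) {
            Thread.sleep(10); // the real runJob() sleeps 1000 ms between polls
            running.advance();
            String report = " map " + Math.round(running.mapProgress() * 100) +
                "% reduce " + Math.round(running.reduceProgress() * 100) + "%";
            if (!report.equals(lastReport)) { // only log when progress changed
                System.out.println(report);
                lastReport = report;
            }
        }
        if (!running.isSuccessful()) {
            throw new java.io.IOException("Job failed!");
        }
        System.out.println("done");
    }
}
```

The report string is only logged when it differs from the previous one, so an idle job does not flood the log with identical lines.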
3. jc.submitJob(job) itself works as follows:
1) Write the JobConf defined earlier out as a file in the file system, so it can be used when the job runs on the cluster;
File submitJobDir = new File(job.getSystemDir(),
    "submit_" + Integer.toString(Math.abs(r.nextInt()), 36));
File submitJobFile = new File(submitJobDir, "job.xml");
// Write job file to JobTracker's fs
FSDataOutputStream out = fileSys.create(submitJobFile);
try {
    job.write(out);
} finally {
    out.close();
}
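The submit directory name takes a random int, makes it non-negative, and renders it in base 36 to get a short alphanumeric suffix. The effect can be checked in isolation (the fixed seed below is arbitrary, used only to make the demo reproducible):

```java
import java.util.Random;

public class SubmitDirName {
    public static void main(String[] args) {
        Random r = new Random(42); // fixed seed only for a reproducible demo
        // Same expression submitJob() uses to name the submit directory.
        // Base 36 uses digits 0-9 and letters a-z, so the suffix stays short.
        // (Edge case of this idiom: Math.abs(Integer.MIN_VALUE) is still negative.)
        String suffix = Integer.toString(Math.abs(r.nextInt()), 36);
        System.out.println("submit_" + suffix);
    }
}
```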
2) Then actually submit the job; from here on the Map/Reduce cluster takes over;
// Now, actually submit the job (using the submit name)
//
JobStatus status = jobSubmitClient.submitJob(submitJobFile.getPath());
That is the whole process of a client using Map/Reduce; from the client's point of view it is quite simple.