Hadoop Source Code: TaskTracker

TaskTracker is the service in Map/Reduce that executes tasks.

1. The TaskTracker is supported by the following threads:

1) One Listener thread and, by default, 10 Handler threads serving one set of RPC services (TaskUmbilicalProtocol); a sketch of how both RPC servers are started appears at the end of this section;

2) One Listener thread and, by default, 2 Handler threads serving a second set of RPC services (MapOutputProtocol);

3) The TaskTracker main thread, which runs offerService() and provides the following services:

a) By default, sends a heartbeat to the JobTracker every 10 seconds;

long waitTime = HEARTBEAT_INTERVAL - (now - lastHeartbeat);
if (waitTime > 0) {
  try {
    Thread.sleep(waitTime);
  } catch (InterruptedException ie) {
  }
  continue;
}
// code omitted
int resultCode = jobClient.emitHeartbeat(
    new TaskTrackerStatus(taskTrackerName, localHostname,
                          mapOutputPort, taskReports),
    justStarted);                               // send the heartbeat

b) Requests tasks from the JobTracker and executes them;

if (mapTotal < maxCurrentTasks || reduceTotal < maxCurrentTasks) {
  Task t = jobClient.pollForNewTask(taskTrackerName);     // request a task
  if (t != null) {
    TaskInProgress tip = new TaskInProgress(t, this.fConf);
    synchronized (this) {
      tasks.put(t.getTaskId(), tip);
      if (t.isMapTask()) {
        mapTotal++;
      } else {
        reduceTotal++;
      }
      runningTasks.put(t.getTaskId(), tip);
    }
    tip.launchTask();                                     // launch the task
  }
}

c) Kills tasks that time out without reporting progress;

for (Iterator it = runningTasks.values().iterator(); it.hasNext(); ) {
  TaskInProgress tip = (TaskInProgress) it.next();
  if ((tip.getRunState() == TaskStatus.RUNNING) &&
      (System.currentTimeMillis() - tip.getLastProgressReport() > this.taskTimeout)) {
    LOG.info("Task " + tip.getTask().getTaskId() + " timed out.  Killing.");
    tip.reportDiagnosticInfo("Timed out.");
    tip.killAndCleanup();                                 // kill the task
  }
}

d) Asks the JobTracker for tasks whose jobs have finished, and shuts them down;

String toCloseId = jobClient.pollForTaskWithClosedJob(taskTrackerName);
if (toCloseId != null) {
  synchronized (this) {
    TaskInProgress tip = (TaskInProgress) tasks.get(toCloseId);
    tip.jobHasFinished();
  }
}
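
To illustrate 1) and 2): the two RPC servers might be brought up roughly as follows. This is a minimal sketch rather than the verbatim source; the RPC.getServer(instance, port, handlerCount, verbose, conf) signature and the variable names are assumptions about this era's ipc code.

// Sketch only: the TaskTracker implements both protocols, so it can be
// passed as the instance for each server. Each server runs one Listener
// thread plus the given number of Handler threads.
Server taskReportServer =
    RPC.getServer(this, taskReportPort, 10, false, fConf);  // TaskUmbilicalProtocol
taskReportServer.start();

Server mapOutputServer =
    RPC.getServer(this, mapOutputPort, 2, false, fConf);    // MapOutputProtocol
mapOutputServer.start();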


2. In offerService(), after a task has been obtained, launching it (tip.launchTask) is implemented as follows:

1) Depending on whether the task is a Map or a Reduce, constructs a MapTaskRunner or a ReduceTaskRunner thread;

2) Both MapTaskRunner and ReduceTaskRunner implement the abstract class TaskRunner;

3) Assembles the command line for a Java child process whose entry point is TaskTracker's static Child class, then executes it. The command line is assembled from the parts below (a sketch follows the list):

a) the java executable;

b) the classpath;

c) javaOpts, such as the heap size;

d) the TaskTracker.Child class name;

e) the taskReport port;

f) the taskId;
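
Putting parts a) through f) together, the child command line amounts to something like the sketch below. This is illustrative, not the verbatim source; javaOpts, taskReportPort, and taskId stand in for the values described above.

// Sketch only: assemble and launch the child JVM.
Vector vargs = new Vector();
vargs.add(new File(new File(System.getProperty("java.home"), "bin"),
                   "java").toString());                  // a) java executable
vargs.add("-classpath");
vargs.add(System.getProperty("java.class.path"));        // b) classpath
vargs.add(javaOpts);                                     // c) e.g. "-Xmx200m"
vargs.add("org.apache.hadoop.mapred.TaskTracker$Child"); // d) entry class
vargs.add(Integer.toString(taskReportPort));             // e) taskReport port
vargs.add(taskId);                                       // f) taskId
String[] cmd = (String[]) vargs.toArray(new String[vargs.size()]);
Process process = Runtime.getRuntime().exec(cmd);        // run the child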

 

3. The Java child process that executes the task works as follows:

1) Obtains a local RPC proxy for the TaskTracker;

  TaskUmbilicalProtocol umbilical =
    (TaskUmbilicalProtocol) RPC.getProxy(TaskUmbilicalProtocol.class,
                                         new InetSocketAddress(port), conf);

2) Fetches the task through the proxy and loads the job information;

Task task = umbilical.getTask(taskid);
JobConf job = new JobConf(task.getJobFile());

3) Pings the TaskTracker every second;

startPinging(umbilical, taskid);        // start pinging parent
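
Conceptually the pinging is just a daemon thread in a sleep loop. A minimal sketch, assuming TaskUmbilicalProtocol exposes a ping(taskid) call and that a child whose parent is unreachable should exit:

// Sketch only: ping the parent TaskTracker once per second; if the parent
// cannot be reached, the child has been orphaned and exits.
Thread pingThread = new Thread(new Runnable() {
    public void run() {
      while (true) {
        try {
          umbilical.ping(taskid);               // "I am still alive"
          Thread.sleep(1000);                   // once per second
        } catch (Throwable t) {
          System.exit(66);                      // illustrative exit code
        }
      }
    }
  });
pingThread.setDaemon(true);
pingThread.start();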

4) Runs the task through MapTask or ReduceTask, both of which implement the abstract class Task;

task.run(job, umbilical);           // run the task


4. MapTask runs its task as follows:

1) Opens one Map output file per Reduce task, so the Map output is split into as many partitions as there are Reduce tasks;

    // open output files
    final int partitions = job.getNumReduceTasks();
    final SequenceFile.Writer[] outs = new SequenceFile.Writer[partitions];

2) Initializes each Writer with the Map output file format and the output Key/Value types;

      for (int i = 0; i < partitions; i++) {
        outs[i] =
          new SequenceFile.Writer(FileSystem.getNamed("local", job),
                                  this.mapOutputFile.getOutputFile(getTaskId(), i).toString(),
                                  job.getOutputKeyClass(),
                                  job.getOutputValueClass());
      }

3) Builds the output collector, implementing its collect method;

      OutputCollector partCollector = new OutputCollector() { // make collector
          public synchronized void collect(WritableComparable key,
                                           Writable value)
            throws IOException {
            outs[partitioner.getPartition(key, value, partitions)]
              .append(key, value);
            reportProgress(umbilical);
          }
        };

      OutputCollector collector = partCollector;
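
The partitioner used in collect decides which of the partitions output files a key lands in, and therefore which Reduce task will see it. The classic choice is hash partitioning; a minimal sketch against this era's untyped interface (the exact interface shape is an assumption):

import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

// Sketch only: route each key by hash code, masked with Integer.MAX_VALUE
// so the partition index is never negative.
public class HashPartitioner implements Partitioner {
  public void configure(JobConf job) {}
  public int getPartition(WritableComparable key, Writable value,
                          int numPartitions) {
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}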

4) If the map output needs to be combined before being fed to Reduce, wraps the collector in a combining collector;

      boolean combining = job.getCombinerClass() != null;
      if (combining) {                            // add combining collector
        collector = new CombiningCollector(job, partCollector, reporter);
      }
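
This path is only taken when the job configuration names a combiner class, usually the user's own Reducer reused for local pre-aggregation; for example (assuming a setter matching the getCombinerClass call above, with a hypothetical MyReducer):

job.setCombinerClass(MyReducer.class);    // reuse the reducer as a combiner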

5) Opens the Map input file;

      final RecordReader rawIn =                  // open input
        job.getInputFormat().getRecordReader
        (FileSystem.get(job), split, job, reporter);

6) Wraps the reader so that progress is updated every time a Key/Value pair is read;

      RecordReader in = new RecordReader() {      // wrap in progress reporter
          private float perByte = 1.0f /(float)split.getLength();

          public synchronized boolean next(Writable key, Writable value)
            throws IOException {

            float progress =                        // compute progress
              (float)Math.min((rawIn.getPos()-split.getStart())*perByte, 1.0f);
            reportProgress(umbilical, progress);

            return rawIn.next(key, value);
          }
          public long getPos() throws IOException { return rawIn.getPos(); }
          public void close() throws IOException { rawIn.close(); }
        };

7) Constructs the MapRunner;

      MapRunnable runner =
        (MapRunnable)job.newInstance(job.getMapRunnerClass());

8) Calls the MapRunner's run method;

runner.run(in, collector, reporter);      // run the map

9) Inside run, the Key and Value instances are allocated once, then the loop reads each pair, invokes the user-defined Mapper on it, and collects the map output through the collector, until all pairs have been processed.

  public void run(RecordReader input, OutputCollector output,
                  Reporter reporter)
    throws IOException {
    try {
      // allocate key & value instances that are re-used for all entries
      WritableComparable key =
        (WritableComparable)job.newInstance(inputKeyClass);
      Writable value = (Writable)job.newInstance(inputValueClass);
      while (input.next(key, value)) {
        // map pair to output
        mapper.map(key, value, output, reporter);
      }
    } finally {
        mapper.close();
    }
  }
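
To make the loop concrete: a user-defined Mapper plugged in here only implements map(key, value, output, reporter), plus configure and close. A hypothetical word-count Mapper in this era's style (using UTF8 and LongWritable as the key/value Writables is an assumption for the example):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

// Sketch only: emit (word, 1) for every token of every input line.
public class WordCountMapper implements Mapper {
  public void configure(JobConf job) {}

  public void map(WritableComparable key, Writable value,
                  OutputCollector output, Reporter reporter)
    throws IOException {
    StringTokenizer itr = new StringTokenizer(((UTF8) value).toString());
    while (itr.hasMoreTokens()) {
      output.collect(new UTF8(itr.nextToken()), new LongWritable(1));
    }
  }

  public void close() {}
}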

 

5. ReduceTask runs its task as follows:

1) Prepares a file to collect all of the Map phase's output;

    // open a file to collect map output
    String file = job.getLocalFile(getTaskId(), "all.1").toString();
    SequenceFile.Writer writer =
      new SequenceFile.Writer(lfs, file, keyClass, valueClass);

2) Appends all Map output files into that single file, reporting merge progress along the way;

    try {       // restored: this try pairs with the finally { writer.close(); } below
      for (int i = 0; i < mapTaskIds.length; i++) {
        appendPhase.addPhase();                 // one per file
      }

      DataOutputBuffer buffer = new DataOutputBuffer();

      for (int i = 0; i < mapTaskIds.length; i++) {
        File partFile =
          this.mapOutputFile.getInputFile(mapTaskIds[i], getTaskId());
        float progPerByte = 1.0f / lfs.getLength(partFile);
        Progress phase = appendPhase.phase();
        phase.setStatus(partFile.toString());

        SequenceFile.Reader in =
          new SequenceFile.Reader(lfs, partFile.toString(), job);
        try {
          int keyLen;
          while ((keyLen = in.next(buffer)) > 0) {
            writer.append(buffer.getData(), 0, buffer.getLength(), keyLen);
            phase.set(in.getPosition() * progPerByte);
            reportProgress(umbilical);
            buffer.reset();
          }
        } finally {
          in.close();
        }
        phase.complete();
      }

    } finally {
      writer.close();
    }

3) Builds a thread that keeps reporting progress while the sort runs;

    Thread sortProgress = new Thread() {
        public void run() {
          while (!sortComplete) {
            try {
              reportProgress(umbilical);
              Thread.sleep(PROGRESS_INTERVAL);
            } catch (InterruptedException e) {
              continue;
            } catch (Throwable e) {
              return;
            }
          }
        }
      };

4) Sorts the merged file into a new file, deletes the unsorted one, and reports progress;

    String sortedFile = job.getLocalFile(getTaskId(), "all.2").toString();

    WritableComparator comparator = job.getOutputKeyComparator();
    
    try {
      sortProgress.start();

      // sort the input file
      SequenceFile.Sorter sorter =
        new SequenceFile.Sorter(lfs, comparator, valueClass, job);
      sorter.sort(file, sortedFile);              // sort
      lfs.delete(new File(file));                 // remove unsorted

    } finally {
      sortComplete = true;
    }

5) Builds the Reduce output collector;

    // make output collector
    String name = getOutputName(getPartition());
    final RecordWriter out =
      job.getOutputFormat().getRecordWriter(FileSystem.get(job), job, name);
    OutputCollector collector = new OutputCollector() {
        public void collect(WritableComparable key, Writable value)
          throws IOException {
          out.write(key, value);
          reportProgress(umbilical);
        }
      };

6) Builds the reader and loops: for each key, invokes the user-defined Reducer and collects the reduce output through the collector, until all pairs have been processed.

    // apply the reduce function
    SequenceFile.Reader in = new SequenceFile.Reader(lfs, sortedFile, job);
    Reporter reporter = getReporter(umbilical, getProgress());
    long length = lfs.getLength(new File(sortedFile));
    try {
      ValuesIterator values = new ValuesIterator(in, length, comparator,
                                                 umbilical);
      while (values.more()) {
        reducer.reduce(values.getKey(), values, collector, reporter);
        values.nextKey();
      }
    } finally {                                 // restored close of the excerpt
      reducer.close();
      in.close();
    }
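
The matching user-defined Reducer receives each key together with an iterator over all of its values. A hypothetical summing Reducer paired with the word-count Mapper sketched earlier (same interface assumptions):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

// Sketch only: add up all counts for a key, emit one (word, total) pair.
public class WordCountReducer implements Reducer {
  public void configure(JobConf job) {}

  public void reduce(WritableComparable key, Iterator values,
                     OutputCollector output, Reporter reporter)
    throws IOException {
    long sum = 0;
    while (values.hasNext()) {
      sum += ((LongWritable) values.next()).get();
    }
    output.collect(key, new LongWritable(sum));
  }

  public void close() {}
}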

 

That is the complete structure and execution flow of the TaskTracker.

 

 
