The JobTracker Node's ExpireTrackers Background Thread

In a Hadoop cluster, every TaskTracker registers with the master JobTracker node after it starts up. Through this registration, the JobTracker learns which slave nodes in the cluster are available and can therefore be assigned tasks. Sometimes, however, a TaskTracker node crashes unexpectedly, and the JobTracker must detect the failure promptly so that, first, it stops assigning new tasks to that node and, second, it reassigns the tasks that had been given to the node but not yet completed to other TaskTrackers. This post describes how the JobTracker detects failed TaskTracker nodes and cleans up after them.

The way the JobTracker detects that a TaskTracker has failed is exactly the same as the mechanism the NameNode uses to detect failed DataNodes: a background thread periodically checks whether each slave node is still alive. Concretely, it works as follows:

1. Each TaskTracker periodically sends a heartbeat to the JobTracker. The heartbeat also reports the node's current status, such as its memory usage and the states of the tasks it is executing. The interval starts out at 3000 ms; after that, the JobTracker decides it based on the current size of the cluster, using the following algorithm:

public static final int CLUSTER_INCREMENT = 100;
public static final int HEARTBEAT_INTERVAL_MIN = 3 * 1000;

public int getNextHeartbeatInterval() {
  // Get the number of TaskTrackers in the cluster
  int clusterSize = getClusterStatus().getTaskTrackers();
  // One second per (started) block of 100 trackers, but never below the 3s minimum
  int heartbeatInterval = Math.max(
      (int) (1000 * Math.ceil((double) clusterSize / CLUSTER_INCREMENT)),
      HEARTBEAT_INTERVAL_MIN);
  return heartbeatInterval;
}
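For example, with 450 TaskTrackers in the cluster the next interval works out to max(1000 × ⌈450/100⌉, 3000) = 5000 ms, while any cluster of 300 or fewer trackers stays at the 3-second minimum.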

2. Whenever the JobTracker receives a status report from a TaskTracker, it replaces the status it last recorded for that tracker;
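In essence this bookkeeping is just a per-tracker map overwrite. The fragment below is a minimal sketch, not the Hadoop source; the heartbeatReceived hook is a hypothetical name, and only the TaskTrackerStatus type from the snippets in this post is assumed:

// Minimal sketch (hypothetical, not the Hadoop source): the JobTracker keeps
// a map from tracker name to its most recently reported status, and each
// heartbeat simply replaces the previous entry.
private final Map<String, TaskTrackerStatus> taskTrackers =
    new HashMap<String, TaskTrackerStatus>();

void heartbeatReceived(TaskTrackerStatus status) {
  synchronized (taskTrackers) {
    // Overwrite the old profile; the new getLastSeen() timestamp is what the
    // ExpireTrackers thread later compares against the expiry interval.
    taskTrackers.put(status.getTrackerName(), status);
  }
}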

3. When the JobTracker starts up, it launches a background thread, ExpireTrackers, which keeps checking the status recorded for each TaskTracker to decide whether that node has expired. To lighten the JobTracker's load, the thread pauses between checks; the check interval is TASKTRACKER_EXPIRY_INTERVAL/3 ms, where TASKTRACKER_EXPIRY_INTERVAL defaults to 10*60*1000 ms (10 minutes) and can be tuned by the user in the JobTracker's configuration file via the mapred.tasktracker.expiry.interval property. The source code for the check itself is as follows:

long now = System.currentTimeMillis();
TaskTrackerStatus leastRecent = null;
// Take from the queue the TaskTracker most likely to have expired
// (the one heard from least recently)
while ((trackerExpiryQueue.size() > 0)
       && ((leastRecent = trackerExpiryQueue.first()) != null)
       && (now - leastRecent.getLastSeen() > TASKTRACKER_EXPIRY_INTERVAL)) {

  // Remove profile from head of queue
  trackerExpiryQueue.remove(leastRecent);
  String trackerName = leastRecent.getTrackerName();

  // Fetch the latest status registered for this possibly-expired TaskTracker
  // (to avoid a false positive from a stale entry in the expiry queue)
  TaskTrackerStatus newProfile = taskTrackers.get(leastRecent.getTrackerName());
  // Items might leave the taskTracker set through other means; the
  // status stored in 'taskTrackers' might be null, which means the
  // tracker has already been destroyed.
  if (newProfile != null) {
    // The latest registration also says the tracker has expired
    if (now - newProfile.getLastSeen() > TASKTRACKER_EXPIRY_INTERVAL) {
      // Clean up after the lost TaskTracker node
      lostTaskTracker(leastRecent.getTrackerName());

      // If the lost TaskTracker was blacklisted, remove it from the blacklist count
      if (isBlacklisted(trackerName)) {
        faultyTrackers.numBlacklistedTrackers -= 1;
      }
      // Clear the lost TaskTracker's registered status
      updateTaskTrackerStatus(trackerName, null);
    } else {
      // Update time by inserting latest profile
      trackerExpiryQueue.add(newProfile);
    }
  }
}
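For context, the loop above runs inside ExpireTrackers.run(). The class below is a simplified, self-contained sketch of that driving loop rather than the actual Hadoop source; expiryInterval stands in for TASKTRACKER_EXPIRY_INTERVAL, and checkExpiredTrackers() stands in for the while-loop just shown:

// Simplified sketch of the ExpireTrackers driving loop (hypothetical names,
// not the Hadoop source): wake up every expiryInterval/3 ms and run the
// expiry check shown above.
class ExpireTrackers extends Thread {
  private final long expiryInterval;   // stands in for TASKTRACKER_EXPIRY_INTERVAL
  private volatile boolean running = true;

  ExpireTrackers(long expiryInterval) {
    this.expiryInterval = expiryInterval;
    setDaemon(true);                   // don't keep the JVM alive on shutdown
  }

  public void run() {
    while (running) {
      try {
        // Check at one third of the expiry interval, as described above
        Thread.sleep(expiryInterval / 3);
        checkExpiredTrackers();        // the expiry-check loop shown above
      } catch (InterruptedException ie) {
        // interrupted on shutdown; fall through and re-test 'running'
      }
    }
  }

  void checkExpiredTrackers() {
    // ... the expiry-check loop shown above ...
  }

  void shutdown() {
    running = false;
    interrupt();
  }
}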
   

The preceding discussion covered in some detail how the JobTracker performs this near-real-time expiry detection on TaskTracker nodes. So once the JobTracker detects that a TaskTracker has expired, how does it deal with the failed slave node? After all, the TaskTracker may have been assigned a batch of tasks, not all of which have finished, in which case the JobTracker has no choice but to hand the tracker's unfinished tasks to other TaskTrackers to redo. Inside the JobTracker, this cleanup proceeds roughly as follows:

1. Remove the TaskTracker ----> JobsToCleanup mapping;

2. Remove the TaskTracker ----> TasksToCleanup mapping;

3. Remove the TaskTracker node from the job recovery manager, RecoveryManager;

4. For every task assigned to the TaskTracker: if it is a map/reduce task that has not yet finished, or a finished map task belonging to a job that has a reduce phase, notify the task's job that this task is incomplete and another TaskTracker must be found to redo it;

5. Clear the records of completed tasks that the TaskTracker had registered.

The corresponding source code is worth pasting here as well:

void lostTaskTracker(String trackerName) {
    LOG.info("Lost tracker '" + trackerName + "'");
    
    // remove the tracker from the local structures
    synchronized (trackerToJobsToCleanup) {
      trackerToJobsToCleanup.remove(trackerName);
    }
    
    synchronized (trackerToTasksToCleanup) {
      trackerToTasksToCleanup.remove(trackerName);
    }
    
    // Inform the recovery manager
    recoveryManager.unMarkTracker(trackerName);
    
    Set<TaskAttemptID> lostTasks = trackerToTaskMap.get(trackerName);
    trackerToTaskMap.remove(trackerName);

    if (lostTasks != null) {
      // List of jobs which had any of their tasks fail on this tracker
      Set<JobInProgress> jobsWithFailures = new HashSet<JobInProgress>(); 
      for (TaskAttemptID taskId : lostTasks) {
        TaskInProgress tip = taskidToTIPMap.get(taskId);
        JobInProgress job = tip.getJob();

        // Completed reduce tasks never need to be failed, because 
        // their outputs go to dfs
        // And completed maps with zero reducers of the job 
        // never need to be failed. 
        if (!tip.isComplete() || 
            (tip.isMapTask() && !tip.isJobSetupTask() && job.desiredReduces() != 0)) {
          // if the job is done, we don't want to change anything
          if (job.getStatus().getRunState() == JobStatus.RUNNING ||
              job.getStatus().getRunState() == JobStatus.PREP) {
            // the state will be KILLED_UNCLEAN, if the task(map or reduce) 
            // was RUNNING on the tracker
            TaskStatus.State killState = (tip.isRunningTask(taskId) &&
                !tip.isJobSetupTask() && !tip.isJobCleanupTask())
                    ? TaskStatus.State.KILLED_UNCLEAN
                    : TaskStatus.State.KILLED;
            job.failedTask(tip, taskId, ("Lost task tracker: " + trackerName),
                           (tip.isMapTask() ? TaskStatus.Phase.MAP
                                            : TaskStatus.Phase.REDUCE),
                           killState, trackerName);
            jobsWithFailures.add(job);
          }
        } else {
          // Completed 'reduce' task and completed 'maps' with zero 
          // reducers of the job, not failed;
          // only removed from data-structures.
          markCompletedTaskAttempt(trackerName, taskId);
        }
      }
      
      // Penalize this tracker for each of the jobs which   
      // had any tasks running on it when it was 'lost' 
      for (JobInProgress job : jobsWithFailures) {
        job.addTrackerTaskFailure(trackerName);
      }
      
      // Purge 'marked' tasks, needs to be done  
      // here to prevent hanging references!
      removeMarkedTasks(trackerName);
    }
  }
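The condition in the middle of lostTaskTracker() is what decides whether a task must be redone. Distilled into a standalone predicate (a hypothetical helper written for readability, not part of the Hadoop source), the rule reads:

// Hypothetical helper (not in the Hadoop source): must a task that lived on
// the lost tracker be re-run elsewhere?
boolean mustRerun(TaskInProgress tip, JobInProgress job) {
  // Any task that has not finished obviously has to be redone.
  if (!tip.isComplete()) {
    return true;
  }
  // A finished map of a job that has reducers must also be redone: its map
  // output sat on the lost tracker's local disk, not in DFS. Finished reduces
  // (and finished maps of map-only jobs) wrote their output to DFS, so they
  // survive the loss of the tracker.
  return tip.isMapTask() && !tip.isJobSetupTask() && job.desiredReduces() != 0;
}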
