JobTracker Background Threads: ExpireTrackers

In a Hadoop cluster, every TaskTracker registers with the master JobTracker node after it starts up. Through this registration the JobTracker learns which slave nodes in the cluster are available, i.e. which nodes it may assign tasks to. Sometimes, however, a TaskTracker node crashes unexpectedly, and the JobTracker must detect promptly that the TaskTracker has failed so that it can, first, stop assigning new tasks to that node and, second, reassign the Tasks that were already given to it but not yet completed to other TaskTracker nodes. This article therefore focuses on how the JobTracker detects failed TaskTracker nodes and how it cleans up after them.

The way the JobTracker detects that a particular TaskTracker has failed is in fact exactly the same mechanism the NameNode uses to detect failed DataNodes: a background thread is started that periodically checks whether the slave nodes are still alive. Concretely, it works as follows:

1. Each TaskTracker periodically sends a heartbeat to the JobTracker. The heartbeat piggybacks some of the TaskTracker's current state, such as memory usage and the status of the Tasks it is executing. The heartbeat interval starts at 3000 ms; after that the JobTracker decides it, using a policy that takes the current cluster size into account. The algorithm is as follows:

public static final int CLUSTER_INCREMENT = 100;
public static final int HEARTBEAT_INTERVAL_MIN = 3 * 1000;

public int getNextHeartbeatInterval() {
  // get the no of task trackers
  int clusterSize = getClusterStatus().getTaskTrackers();
  int heartbeatInterval = Math.max(
      (int) (1000 * Math.ceil((double) clusterSize / CLUSTER_INCREMENT)),
      HEARTBEAT_INTERVAL_MIN);
  return heartbeatInterval;
}
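
To make the scaling concrete: with 250 TaskTrackers the next interval is max(ceil(250/100) * 1000, 3000) = 3000 ms, i.e. still the minimum, while with 450 TaskTrackers it grows to 5000 ms, so the aggregate heartbeat rate seen by the JobTracker stays roughly constant as the cluster grows.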

2. When the JobTracker receives a TaskTracker's status report, it updates the status information that this TaskTracker last registered (sketched below);
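
Conceptually, this registration step just refreshes the tracker's last-seen timestamp and its entry in the JobTracker's bookkeeping structures. The following is only a condensed sketch, not the verbatim Hadoop source; the method name registerHeartbeat is made up for illustration, while taskTrackers and trackerExpiryQueue are the same fields used in the snippets below:

// Condensed sketch: record the latest status reported by a TaskTracker so that
// ExpireTrackers can later compare its lastSeen timestamp against
// TASKTRACKER_EXPIRY_INTERVAL.
private void registerHeartbeat(TaskTrackerStatus status) {
  status.setLastSeen(System.currentTimeMillis()); // remember when we last heard from this tracker
  synchronized (taskTrackers) {
    taskTrackers.put(status.getTrackerName(), status); // latest registered status
    trackerExpiryQueue.add(status);                    // make it a candidate for expiry checks
  }
}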

3. When the JobTracker starts, it launches a background checker thread, ExpireTrackers, which continually uses the status last registered by each TaskTracker to decide whether that TaskTracker has failed. Of course, to lighten the JobTracker's load, the checker thread pauses between passes; the interval between two checks is TASKTRACKER_EXPIRY_INTERVAL/3 ms, where TASKTRACKER_EXPIRY_INTERVAL defaults to 10*60*1000 ms but can also be set by the user in the JobTracker's configuration file according to the actual situation, via the property mapred.tasktracker.expiry.interval. The source of this check is as follows:

long now = System.currentTimeMillis();
TaskTrackerStatus leastRecent = null;
// Take the TaskTracker most likely to have expired (longest unseen) from the queue
while ((trackerExpiryQueue.size() > 0) &&
       ((leastRecent = trackerExpiryQueue.first()) != null) &&
       (now - leastRecent.getLastSeen() > TASKTRACKER_EXPIRY_INTERVAL)) {

  // Remove profile from head of queue
  trackerExpiryQueue.remove(leastRecent);
  String trackerName = leastRecent.getTrackerName();

  // Fetch the latest registered status of the suspect TaskTracker
  // (reduces false positives caused by a stale entry in the queue)
  TaskTrackerStatus newProfile = taskTrackers.get(leastRecent.getTrackerName());
  // Items might leave the taskTracker set through other means; the
  // status stored in 'taskTrackers' might be null, which means the
  // tracker has already been destroyed.
  if (newProfile != null) {
    // The latest registered status also says it has expired
    if (now - newProfile.getLastSeen() > TASKTRACKER_EXPIRY_INTERVAL) {
      // Clean up after the lost TaskTracker
      lostTaskTracker(leastRecent.getTrackerName());

      // If the lost TaskTracker had already been blacklisted, it should no
      // longer be counted among the blacklisted trackers
      if (isBlacklisted(trackerName)) {
        faultyTrackers.numBlacklistedTrackers -= 1;
      }
      // Clear the registered status of the lost TaskTracker
      updateTaskTrackerStatus(trackerName, null);
    } else {
      // Update time by inserting latest profile
      trackerExpiryQueue.add(newProfile);
    }
  }
}
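
The snippet above is the body of a single expiry pass; the ExpireTrackers thread itself wraps it in a loop that sleeps between passes. Roughly, it looks like the simplified sketch below, based on the interval described in step 3 and not the verbatim Hadoop source; the stopped flag is made up for illustration:

// Simplified sketch of the ExpireTrackers run() loop; the real code also
// synchronizes on the JobTracker and on taskTrackers before scanning the queue.
public void run() {
  while (!stopped) {
    try {
      // pause between passes to keep the load on the JobTracker low
      Thread.sleep(TASKTRACKER_EXPIRY_INTERVAL / 3);
      synchronized (trackerExpiryQueue) {
        // ... the expiry check shown above runs here ...
      }
    } catch (InterruptedException ie) {
      break; // JobTracker is shutting down
    }
  }
}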
   

The above described in some detail how the JobTracker runs this near-real-time expiry check on TaskTracker nodes. So when the JobTracker decides that a TaskTracker has expired, how does it deal with the failed slave node? After all, the TaskTracker may have been assigned a batch of Tasks, not all of which have finished, so the JobTracker has to hand the unfinished work over to other TaskTrackers to be redone. In the JobTracker, this cleanup work roughly proceeds as follows:

1. Remove the TaskTracker ----> JobsToCleanup mapping;

2. Remove the TaskTracker ----> TasksToCleanup mapping;

3. Remove the TaskTracker from the job recovery manager, RecoveryManager;

4. For each Task assigned to this TaskTracker: if it is a map/reduce task that has not yet completed, or a map task that has completed but whose Job has a reduce phase, notify the Task's Job that this Task must be treated as unfinished and re-run on another TaskTracker;

5. Purge the records of completed Tasks registered for this TaskTracker.

The source code for this process is included below as well:

void lostTaskTracker(String trackerName) {
    LOG.info("Lost tracker '" + trackerName + "'");
    
    // remove the tracker from the local structures
    synchronized (trackerToJobsToCleanup) {
      trackerToJobsToCleanup.remove(trackerName);
    }
    
    synchronized (trackerToTasksToCleanup) {
      trackerToTasksToCleanup.remove(trackerName);
    }
    
    // Inform the recovery manager
    recoveryManager.unMarkTracker(trackerName);
    
    Set<TaskAttemptID> lostTasks = trackerToTaskMap.get(trackerName);
    trackerToTaskMap.remove(trackerName);

    if (lostTasks != null) {
      // List of jobs which had any of their tasks fail on this tracker
      Set<JobInProgress> jobsWithFailures = new HashSet<JobInProgress>(); 
      for (TaskAttemptID taskId : lostTasks) {
        TaskInProgress tip = taskidToTIPMap.get(taskId);
        JobInProgress job = tip.getJob();

        // Completed reduce tasks never need to be failed, because 
        // their outputs go to dfs
        // And completed maps with zero reducers of the job 
        // never need to be failed. 
        if (!tip.isComplete() || 
            (tip.isMapTask() && !tip.isJobSetupTask() && job.desiredReduces() != 0)) {
          // if the job is done, we don't want to change anything
          if (job.getStatus().getRunState() == JobStatus.RUNNING ||
              job.getStatus().getRunState() == JobStatus.PREP) {
            // the state will be KILLED_UNCLEAN, if the task(map or reduce) 
            // was RUNNING on the tracker
            TaskStatus.State killState = (tip.isRunningTask(taskId) &&
                !tip.isJobSetupTask() && !tip.isJobCleanupTask()) ?
                    TaskStatus.State.KILLED_UNCLEAN : TaskStatus.State.KILLED;
            job.failedTask(tip, taskId,
                           ("Lost task tracker: " + trackerName),
                           (tip.isMapTask() ? TaskStatus.Phase.MAP
                                            : TaskStatus.Phase.REDUCE),
                           killState, trackerName);
            jobsWithFailures.add(job);
          }
        } else {
          // Completed 'reduce' task and completed 'maps' with zero 
          // reducers of the job, not failed;
          // only removed from data-structures.
          markCompletedTaskAttempt(trackerName, taskId);
        }
      }
      
      // Penalize this tracker for each of the jobs which   
      // had any tasks running on it when it was 'lost' 
      for (JobInProgress job : jobsWithFailures) {
        job.addTrackerTaskFailure(trackerName);
      }
      
      // Purge 'marked' tasks, needs to be done  
      // here to prevent hanging references!
      removeMarkedTasks(trackerName);
    }
  }
