In a Hadoop cluster, every TaskTracker registers with the master node, the JobTracker, when it starts up. Through this registration the JobTracker learns which slave nodes in the cluster are available, i.e. which nodes it may assign tasks to. However, a TaskTracker node may crash unexpectedly, and the JobTracker must detect such a failure promptly so that, first, it stops assigning new tasks to that node, and second, it reassigns the tasks that were assigned to the node but not yet completed to other TaskTracker nodes. This article therefore describes how the JobTracker detects failed TaskTracker nodes and how it cleans up after them.
The mechanism by which the JobTracker detects that a TaskTracker has failed is exactly the same as the one the NameNode uses to detect failed DataNodes: a background thread periodically checks whether each slave node is still alive. Concretely:
1. Each TaskTracker periodically sends a heartbeat to the JobTracker. The heartbeat carries the tracker's current state, such as memory usage and the status of the tasks it is executing. The interval starts out at 3000 ms; after that it is decided by the JobTracker based on the current size of the cluster, using the following algorithm:
public static final int CLUSTER_INCREMENT = 100;
public static final int HEARTBEAT_INTERVAL_MIN = 3 * 1000;

public int getNextHeartbeatInterval() {
  // get the no of task trackers
  int clusterSize = getClusterStatus().getTaskTrackers();
  int heartbeatInterval =
      Math.max((int) (1000 * Math.ceil((double) clusterSize / CLUSTER_INCREMENT)),
               HEARTBEAT_INTERVAL_MIN);
  return heartbeatInterval;
}
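For example, with 250 TaskTrackers the computed value is ceil(250/100) * 1000 = 3000 ms, so the 3-second minimum still applies; with 350 TaskTrackers the interval rises to 4000 ms. In other words, the interval only grows beyond 3 seconds once the cluster exceeds 300 nodes, which keeps the aggregate heartbeat load on the JobTracker roughly constant (about 100 heartbeats per second) as the cluster scales.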
2. Whenever the JobTracker receives a heartbeat from a TaskTracker, it updates the status that tracker last registered, including its last-seen timestamp (a minimal sketch follows);
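What this registration update amounts to can be pictured as the sketch below. This is not the actual JobTracker.updateTaskTrackerStatus (which, among other things, also maintains cluster-wide slot counters); the method name registerHeartbeat and the bare map are illustrative assumptions:

// A minimal sketch of the last-seen bookkeeping, assuming a map from
// tracker name to its most recent status (names here are illustrative).
Map<String, TaskTrackerStatus> taskTrackers =
    new HashMap<String, TaskTrackerStatus>();

void registerHeartbeat(String trackerName, TaskTrackerStatus status) {
  if (status != null) {
    status.setLastSeen(System.currentTimeMillis()); // stamp arrival time
    taskTrackers.put(trackerName, status);          // replace old registration
  } else {
    taskTrackers.remove(trackerName);               // null clears the entry
  }
}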
3. At startup, the JobTracker launches a background thread, ExpireTrackers, which repeatedly inspects each TaskTracker's registered status to decide whether that node has failed. To keep the load on the JobTracker low, the thread pauses between checks; the pause is TASKTRACKER_EXPIRY_INTERVAL/3 ms, where TASKTRACKER_EXPIRY_INTERVAL defaults to 10*60*1000 ms (10 minutes) and can be tuned by the user in the JobTracker's configuration via the property mapred.tasktracker.expiry.interval. The source code for this check is as follows:
long now = System.currentTimeMillis();
TaskTrackerStatus leastRecent = null;
// Take the TaskTracker most likely to have expired from the head of the queue
while ((trackerExpiryQueue.size() > 0) &&
       ((leastRecent = trackerExpiryQueue.first()) != null) &&
       (now - leastRecent.getLastSeen() > TASKTRACKER_EXPIRY_INTERVAL)) {
  // Remove profile from head of queue
  trackerExpiryQueue.remove(leastRecent);
  String trackerName = leastRecent.getTrackerName();

  // Fetch the latest registered status of the possibly-expired TaskTracker
  // (to reduce false positives caused by a stale queue entry)
  TaskTrackerStatus newProfile = taskTrackers.get(leastRecent.getTrackerName());
  // Items might leave the taskTracker set through other means; the
  // status stored in 'taskTrackers' might be null, which means the
  // tracker has already been destroyed.
  if (newProfile != null) {
    // The latest registration also shows it as expired
    if (now - newProfile.getLastSeen() > TASKTRACKER_EXPIRY_INTERVAL) {
      // Clean up after the failed TaskTracker node
      lostTaskTracker(leastRecent.getTrackerName());
      // If the failed TaskTracker was already blacklisted,
      // remove it from the blacklist count
      if (isBlacklisted(trackerName)) {
        faultyTrackers.numBlacklistedTrackers -= 1;
      }
      // Clear the failed TaskTracker's registered status
      updateTaskTrackerStatus(trackerName, null);
    } else {
      // Update time by inserting latest profile
      trackerExpiryQueue.add(newProfile);
    }
  }
}
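For context, the polling loop around this check looks roughly like the sketch below (simplified: the real ExpireTrackers.run() also takes the JobTracker's locks around the check and handles shutdown more carefully; the shuttingDown flag is an illustrative assumption):

// Simplified sketch of the ExpireTrackers polling loop.
public void run() {
  while (!shuttingDown) {
    try {
      // Sleeping for a third of the expiry interval bounds how late
      // a dead tracker can be noticed.
      Thread.sleep(TASKTRACKER_EXPIRY_INTERVAL / 3);
      synchronized (trackerExpiryQueue) {
        // ... the expiry check shown above runs here ...
      }
    } catch (InterruptedException ie) {
      break; // the JobTracker is shutting down
    }
  }
}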
The above describes in some detail how the JobTracker performs near-real-time expiry detection on TaskTracker nodes. So what does the JobTracker do once it decides a TaskTracker has expired? After all, that TaskTracker may have been assigned a batch of tasks, not all of which have finished, so the JobTracker has to hand the unfinished work over to other TaskTrackers to redo. Inside the JobTracker, this cleanup proceeds roughly as follows:
1. Remove the TaskTracker ----> JobsToCleanup mapping;
2. Remove the TaskTracker ----> TasksToCleanup mapping;
3. Remove the TaskTracker from the job recovery manager, RecoveryManager;
4. For each task assigned to the TaskTracker: if it is a map/reduce task that has not yet completed, or a completed map task whose job has a reduce phase, notify the task's job that this task is not done and must be redone on another TaskTracker node;
5. Clear the records of completed tasks registered for this TaskTracker.
The corresponding source code is shown below:
void lostTaskTracker(String trackerName) {
  LOG.info("Lost tracker '" + trackerName + "'");
  // remove the tracker from the local structures
  synchronized (trackerToJobsToCleanup) {
    trackerToJobsToCleanup.remove(trackerName);
  }
  synchronized (trackerToTasksToCleanup) {
    trackerToTasksToCleanup.remove(trackerName);
  }
  // Inform the recovery manager
  recoveryManager.unMarkTracker(trackerName);
  Set<TaskAttemptID> lostTasks = trackerToTaskMap.get(trackerName);
  trackerToTaskMap.remove(trackerName);
  if (lostTasks != null) {
    // List of jobs which had any of their tasks fail on this tracker
    Set<JobInProgress> jobsWithFailures = new HashSet<JobInProgress>();
    for (TaskAttemptID taskId : lostTasks) {
      TaskInProgress tip = taskidToTIPMap.get(taskId);
      JobInProgress job = tip.getJob();
      // Completed reduce tasks never need to be failed, because
      // their outputs go to dfs
      // And completed maps with zero reducers of the job
      // never need to be failed.
      if (!tip.isComplete() ||
          (tip.isMapTask() && !tip.isJobSetupTask() &&
           job.desiredReduces() != 0)) {
        // if the job is done, we don't want to change anything
        if (job.getStatus().getRunState() == JobStatus.RUNNING ||
            job.getStatus().getRunState() == JobStatus.PREP) {
          // the state will be KILLED_UNCLEAN, if the task(map or reduce)
          // was RUNNING on the tracker
          TaskStatus.State killState =
              (tip.isRunningTask(taskId) && !tip.isJobSetupTask() &&
               !tip.isJobCleanupTask())
                  ? TaskStatus.State.KILLED_UNCLEAN
                  : TaskStatus.State.KILLED;
          job.failedTask(tip, taskId, ("Lost task tracker: " + trackerName),
                         (tip.isMapTask() ? TaskStatus.Phase.MAP
                                          : TaskStatus.Phase.REDUCE),
                         killState, trackerName);
          jobsWithFailures.add(job);
        }
      } else {
        // Completed 'reduce' task and completed 'maps' with zero
        // reducers of the job, not failed;
        // only removed from data-structures.
        markCompletedTaskAttempt(trackerName, taskId);
      }
    }
    // Penalize this tracker for each of the jobs which
    // had any tasks running on it when it was 'lost'
    for (JobInProgress job : jobsWithFailures) {
      job.addTrackerTaskFailure(trackerName);
    }
    // Purge 'marked' tasks, needs to be done
    // here to prevent hanging references!
    removeMarkedTasks(trackerName);
  }
}
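Note why a completed map task has to be redone while a completed reduce task does not: map outputs are written to the lost node's local disk and served to reducers from there, so once the node is gone they are unreachable, and any job that still has a reduce phase must re-execute those maps. Reduce outputs (and the outputs of maps in map-only jobs) are written to HDFS, so they survive the node failure and only need to be removed from the JobTracker's bookkeeping. The addTrackerTaskFailure() calls at the end additionally count these failures against the tracker on a per-job basis, which feeds into the per-job blacklisting logic.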