In a Hadoop cluster, every TaskTracker registers with the Master node, the JobTracker, after starting up. Through this registration the JobTracker learns which Slave nodes in the cluster are available, i.e., which nodes it can assign tasks to. Sometimes, however, a TaskTracker node crashes unexpectedly, and the JobTracker must be able to detect promptly that the TaskTracker has failed, so that it can, first, stop assigning tasks to that node and, second, reassign the Tasks that had already been assigned to it but that it had not yet completed to other TaskTracker nodes. This article therefore focuses on how the JobTracker detects failed TaskTracker nodes and how it cleans up after them.
How the JobTracker detects that a particular TaskTracker has failed is in fact exactly the same as the mechanism the NameNode uses to detect failed DataNodes: a background thread is started that periodically checks whether each Slave node is still alive. Concretely:
1. Each TaskTracker periodically sends a heartbeat to the JobTracker. The heartbeat also reports some of the TaskTracker's current state, such as its memory status and the status of the Tasks it is executing. The interval starts out at 3000 ms; after that it is decided by the JobTracker, following a policy that takes the current cluster size into account. The concrete algorithm is as follows:
public static final int CLUSTER_INCREMENT = 100;
public static final int HEARTBEAT_INTERVAL_MIN = 3 * 1000;

public int getNextHeartbeatInterval() {
  // get the number of task trackers currently in the cluster
  int clusterSize = getClusterStatus().getTaskTrackers();
  // one second of interval per (started) block of CLUSTER_INCREMENT
  // trackers, but never below the 3-second minimum
  int heartbeatInterval =
      Math.max((int) (1000 * Math.ceil((double) clusterSize / CLUSTER_INCREMENT)),
               HEARTBEAT_INTERVAL_MIN);
  return heartbeatInterval;
}
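For example, in a cluster of 350 TaskTrackers the next interval is max(ceil(350/100) * 1000, 3000) = 4000 ms, while any cluster of up to 300 trackers stays at the 3-second minimum. The heartbeat traffic the JobTracker has to absorb therefore grows only in steps as the cluster grows.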
2. When the JobTracker receives a status report from a TaskTracker, it updates the status that this TaskTracker registered last time (a sketch of this update follows);
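A minimal sketch of what this update might look like, as an illustration rather than the verbatim Hadoop source; it assumes the taskTrackers map and the lastSeen-ordered trackerExpiryQueue that also appear in the expiry-check code below:

// Illustrative sketch only: record the latest heartbeat status.
// 'taskTrackers' maps tracker name -> most recent TaskTrackerStatus;
// 'trackerExpiryQueue' keeps statuses sorted by lastSeen timestamp.
synchronized (taskTrackers) {
  TaskTrackerStatus oldStatus = taskTrackers.get(trackerName);
  if (oldStatus != null) {
    trackerExpiryQueue.remove(oldStatus);         // drop the stale entry
  }
  status.setLastSeen(System.currentTimeMillis()); // stamp the report time
  taskTrackers.put(trackerName, status);          // remember the new status
  trackerExpiryQueue.add(status);                 // re-insert in lastSeen order
}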
3. When the JobTracker starts up, it launches a background detection thread, ExpireTrackers, which keeps checking the status each TaskTracker registered to decide whether that TaskTracker has failed. Naturally, to lighten the JobTracker's load, this background thread pauses between checks: the check interval is TASKTRACKER_EXPIRY_INTERVAL/3 ms, where TASKTRACKER_EXPIRY_INTERVAL defaults to 10*60*1000 but can also be set by the user, according to the actual deployment, through the JobTracker configuration item mapred.tasktracker.expiry.interval. The source code of the check is as follows:
long now = System.currentTimeMillis();
TaskTrackerStatus leastRecent = null;
// take the tracker most likely to have expired off the head of the queue
while ((trackerExpiryQueue.size() > 0) &&
       ((leastRecent = trackerExpiryQueue.first()) != null) &&
       (now - leastRecent.getLastSeen() > TASKTRACKER_EXPIRY_INTERVAL)) {
  // Remove profile from head of queue
  trackerExpiryQueue.remove(leastRecent);
  String trackerName = leastRecent.getTrackerName();

  // fetch the latest registered status for this tracker, to reduce the
  // chance of a false positive caused by a stale queue entry
  TaskTrackerStatus newProfile = taskTrackers.get(leastRecent.getTrackerName());
  // Items might leave the taskTracker set through other means; the
  // status stored in 'taskTrackers' might be null, which means the
  // tracker has already been destroyed.
  if (newProfile != null) {
    // the latest registered status also says the tracker has expired
    if (now - newProfile.getLastSeen() > TASKTRACKER_EXPIRY_INTERVAL) {
      // clean up after the lost TaskTracker
      lostTaskTracker(leastRecent.getTrackerName());
      // if the lost TaskTracker was already blacklisted, remove it
      // from the blacklist count
      if (isBlacklisted(trackerName)) {
        faultyTrackers.numBlacklistedTrackers -= 1;
      }
      // clear the lost TaskTracker's registered status
      updateTaskTrackerStatus(trackerName, null);
    } else {
      // Update time by inserting latest profile
      trackerExpiryQueue.add(newProfile);
    }
  }
}
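The reason trackerExpiryQueue.first() always yields the tracker that has been silent the longest is that the queue is a TreeSet ordered by each status's lastSeen timestamp, with the tracker name as a tie-breaker. A sketch of such a comparator, close to how the Hadoop source constructs the queue:

TreeSet<TaskTrackerStatus> trackerExpiryQueue =
    new TreeSet<TaskTrackerStatus>(new Comparator<TaskTrackerStatus>() {
      public int compare(TaskTrackerStatus p1, TaskTrackerStatus p2) {
        if (p1.getLastSeen() < p2.getLastSeen()) {
          return -1;                       // older heartbeat sorts first
        } else if (p1.getLastSeen() > p2.getLastSeen()) {
          return 1;
        } else {
          // break ties by name so two distinct trackers never compare equal
          return p1.getTrackerName().compareTo(p2.getTrackerName());
        }
      }
    });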
The preceding sections described in some detail how the JobTracker runs this near-real-time expiry check on TaskTracker nodes. So once the JobTracker has detected that a TaskTracker has expired, how does it deal with the failed Slave node? After all, that TaskTracker may have been assigned a batch of Tasks, not all of which have finished, so the JobTracker has no choice but to hand the Tasks the TaskTracker left unfinished over to other TaskTrackers to redo. Inside the JobTracker, this cleanup work roughly consists of the following steps:
1. Remove the TaskTracker ----> JobsToCleanup mapping;
2. Remove the TaskTracker ----> TasksToCleanup mapping;
3. Remove the TaskTracker from the job recovery manager, RecoveryManager;
4. For each Task that was assigned to the TaskTracker: if it is an unfinished map/reduce task, or a completed map task whose Job has a reduce phase, notify the Task's Job that this Task did not complete and must be redone on another TaskTracker;
5. Clear the records of the completed Tasks registered for the TaskTracker.
The source code for this process is worth including here as well:
void lostTaskTracker(String trackerName) {
  LOG.info("Lost tracker '" + trackerName + "'");
  // remove the tracker from the local structures
  synchronized (trackerToJobsToCleanup) {
    trackerToJobsToCleanup.remove(trackerName);
  }
  synchronized (trackerToTasksToCleanup) {
    trackerToTasksToCleanup.remove(trackerName);
  }
  // Inform the recovery manager
  recoveryManager.unMarkTracker(trackerName);
  Set<TaskAttemptID> lostTasks = trackerToTaskMap.get(trackerName);
  trackerToTaskMap.remove(trackerName);
  if (lostTasks != null) {
    // List of jobs which had any of their tasks fail on this tracker
    Set<JobInProgress> jobsWithFailures = new HashSet<JobInProgress>();
    for (TaskAttemptID taskId : lostTasks) {
      TaskInProgress tip = taskidToTIPMap.get(taskId);
      JobInProgress job = tip.getJob();
      // Completed reduce tasks never need to be failed, because
      // their outputs go to dfs
      // And completed maps with zero reducers of the job
      // never need to be failed.
      if (!tip.isComplete() ||
          (tip.isMapTask() && !tip.isJobSetupTask() &&
           job.desiredReduces() != 0)) {
        // if the job is done, we don't want to change anything
        if (job.getStatus().getRunState() == JobStatus.RUNNING ||
            job.getStatus().getRunState() == JobStatus.PREP) {
          // the state will be KILLED_UNCLEAN, if the task(map or reduce)
          // was RUNNING on the tracker
          TaskStatus.State killState =
              (tip.isRunningTask(taskId) && !tip.isJobSetupTask() &&
               !tip.isJobCleanupTask()) ?
                  TaskStatus.State.KILLED_UNCLEAN : TaskStatus.State.KILLED;
          job.failedTask(tip, taskId,
                         ("Lost task tracker: " + trackerName),
                         (tip.isMapTask() ?
                             TaskStatus.Phase.MAP : TaskStatus.Phase.REDUCE),
                         killState, trackerName);
          jobsWithFailures.add(job);
        }
      } else {
        // Completed 'reduce' task and completed 'maps' with zero
        // reducers of the job, not failed;
        // only removed from data-structures.
        markCompletedTaskAttempt(trackerName, taskId);
      }
    }
    // Penalize this tracker for each of the jobs which
    // had any tasks running on it when it was 'lost'
    for (JobInProgress job : jobsWithFailures) {
      job.addTrackerTaskFailure(trackerName);
    }
    // Purge 'marked' tasks, needs to be done
    // here to prevent hanging references!
    removeMarkedTasks(trackerName);
  }
}
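Note the asymmetry encoded in the condition above: a completed reduce task never has to be re-run, because its output has already been written to DFS; a completed map task, however, normally does, since map output is stored on the lost tracker's local disk and served to the reducers from there. Only when the job has zero reducers (so the map output itself went to DFS) can a completed map be left alone and merely removed from the bookkeeping structures.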