TaskSchedulerImpl
Overview
Different cluster types correspond to different SchedulerBackend implementations: YarnSchedulerBackend, StandaloneSchedulerBackend, LocalSchedulerBackend, and so on. TaskSchedulerImpl handles the logic that is common to all of these SchedulerBackends, such as deciding the scheduling order among tasks.
A client must first call TaskSchedulerImpl's initialize() and start() methods before it can call submitTasks() to submit a TaskSet.
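This call-order contract can be modeled with a small sketch (illustrative Python, not Spark's actual API; the class and error messages are assumptions made for the example):

```python
class TaskSchedulerSketch:
    """Toy model of TaskSchedulerImpl's lifecycle contract:
    initialize() binds a backend, start() must precede submitTasks()."""

    def __init__(self):
        self.backend = None
        self.started = False

    def initialize(self, backend):
        # Bind the cluster-specific SchedulerBackend (Yarn/Standalone/Local).
        self.backend = backend

    def start(self):
        if self.backend is None:
            raise RuntimeError("initialize() must be called before start()")
        self.started = True

    def submit_tasks(self, task_set):
        if not self.started:
            raise RuntimeError("start() must be called before submitTasks()")
        return f"submitted {task_set}"
```

Calling submit_tasks() before start() raises, mirroring the required ordering described above.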
The TaskSchedulerImpl#submitTasks() method
Submits all tasks in one TaskSet (one stage) for execution.
The flow is as follows:
1. Create a TaskSetManager. The TaskSetManager manages a single TaskSet within TaskSchedulerImpl: it tracks each task, and if a task fails it retries that task until the maximum number of task retries is reached. It also performs locality-aware task placement via delay scheduling.
2. Add the TaskSetManager to schedulableBuilder. The type of schedulableBuilder is SchedulableBuilder, a trait with two implementations, FIFOSchedulableBuilder and FairSchedulableBuilder; FIFO is the default. schedulableBuilder is an important member of TaskScheduler: based on the scheduling policy, it determines the order in which TaskSetManagers are scheduled.
3. Call SchedulerBackend's reviveOffers() method to schedule the tasks, deciding which Executor each task actually runs on.
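The FIFO policy in step 2 orders TaskSetManagers by job, then by stage. A minimal Python sketch of that comparison rule (modeled on Spark's FIFO scheduling, where "priority" is the job ID; the stub class and field names are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class TaskSetManagerStub:
    # In Spark's FIFO scheduling, "priority" is effectively the job ID.
    priority: int
    stage_id: int
    name: str = ""

def fifo_order(managers):
    """Sort TaskSetManagers the way a FIFO policy compares them:
    earlier jobs first; within the same job, earlier stages first."""
    return sorted(managers, key=lambda m: (m.priority, m.stage_id))

queue = [
    TaskSetManagerStub(priority=2, stage_id=5, name="b"),
    TaskSetManagerStub(priority=1, stage_id=7, name="a"),
    TaskSetManagerStub(priority=1, stage_id=3, name="c"),
]
ordered = [m.name for m in fifo_order(queue)]  # job 1 before job 2; stage 3 before 7
```

A fair policy would instead weigh pools by running task counts, shares, and weights rather than a simple tuple comparison.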
override def submitTasks(taskSet: TaskSet) {
  // Get the array of tasks in the TaskSet
  val tasks = taskSet.tasks
  logInfo("Adding task set " + taskSet.id + " with " + tasks.length + " tasks")
  this.synchronized {
    // Create the TaskSetManager, which manages the lifecycle of every task in this TaskSet
    val manager = createTaskSetManager(taskSet, maxTaskFailures)
    // Get the TaskSet's stageId
    val stage = taskSet.stageId
    val stageTaskSets =
      taskSetsByStageIdAndAttempt.getOrElseUpdate(stage, new HashMap[Int, TaskSetManager])
    // Mark all the existing TaskSetManagers of this stage as zombie, as we are adding a new one.
    // This is necessary to handle a corner case. Let's say a stage has 10 partitions and has 2
    // TaskSetManagers: TSM1(zombie) and TSM2(active). TSM1 has a running task for partition 10
    // and it completes. TSM2 finishes tasks for partition 1-9, and thinks he is still active
    // because partition 10 is not completed yet. However, DAGScheduler gets task completion
    // events for all the 10 partitions and thinks the stage is finished. If it's a shuffle stage
    // and somehow it has missing map outputs, then DAGScheduler will resubmit it and create a
    // TSM3 for it. As a stage can't have more than one active task set managers, we must mark
    // TSM2 as zombie (it actually is).
    stageTaskSets.foreach { case (_, ts) =>
      ts.isZombie = true
    }
    stageTaskSets(taskSet.stageAttemptId) = manager
    // Add the TaskSetManager to schedulableBuilder
    schedulableBuilder.addTaskSetManager(manager, manager.taskSet.properties)
    if (!isLocal && !hasReceivedTask) {
      starvationTimer.scheduleAtFixedRate(new TimerTask() {
        override def run() {
          if (!hasLaunchedTask) {
            logWarning("Initial job has not accepted any resources; " +
              "check your cluster UI to ensure that workers are registered " +
              "and have sufficient resources")
          } else {
            this.cancel()
          }
        }
      }, STARVATION_TIMEOUT_MS, STARVATION_TIMEOUT_MS)
    }
    hasReceivedTask = true
  }
  // SchedulerBackend#reviveOffers()
  backend.reviveOffers()
}
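The zombie-marking bookkeeping inside the synchronized block can be sketched as follows (illustrative Python; the nested dict models taskSetsByStageIdAndAttempt, and the Manager class is an assumption made for the example):

```python
from collections import defaultdict

class Manager:
    """Stand-in for a TaskSetManager: one (stage, attempt) pair."""
    def __init__(self, stage_id, attempt):
        self.stage_id = stage_id
        self.attempt = attempt
        self.is_zombie = False

# stage id -> {attempt id -> Manager}, modeling taskSetsByStageIdAndAttempt
task_sets_by_stage_and_attempt = defaultdict(dict)

def register_manager(manager):
    """Mark every existing attempt of this stage as zombie, then record
    the new manager as the single active attempt for the stage."""
    attempts = task_sets_by_stage_and_attempt[manager.stage_id]
    for ts in attempts.values():
        ts.is_zombie = True
    attempts[manager.attempt] = manager

tsm1 = Manager(stage_id=0, attempt=0)
register_manager(tsm1)
tsm2 = Manager(stage_id=0, attempt=1)
register_manager(tsm2)  # tsm1 becomes zombie; tsm2 is the active attempt
```

This is exactly the invariant the in-code comment describes: a stage never has more than one active TaskSetManager.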
A typical log from running a Spark (on YARN) job looks like this:
[dag-scheduler-event-loop] INFO cluster.YarnScheduler:58: Adding task set 0.0 with 6 tasks
[dispatcher-event-loop-11] INFO scheduler.TaskSetManager:58: Starting task 0.0 in stage 0.0 (TID 0, hadoop5, partition 0,PROCESS_LOCAL, 9549 bytes)
[dispatcher-event-loop-11] INFO scheduler.TaskSetManager:58: Starting task 1.0 in stage 0.0 (TID 1, hadoop4, partition 1,PROCESS_LOCAL, 5974 bytes)
[dispatcher-event-loop-11] INFO scheduler.TaskSetManager:58: Starting task 2.0 in stage 0.0 (TID 2, hadoop9, partition 2,PROCESS_LOCAL, 4826 bytes)
[dispatcher-event-loop-11] INFO scheduler.TaskSetManager:58: Starting task 3.0 in stage 0.0 (TID 3, hadoop5, partition 3,PROCESS_LOCAL, 6937 bytes)
[dispatcher-event-loop-11] INFO scheduler.TaskSetManager:58: Starting task 4.0 in stage 0.0 (TID 4, hadoop4, partition 4,PROCESS_LOCAL, 5587 bytes)
[dispatcher-event-loop-11] INFO scheduler.TaskSetManager:58: Starting task 5.0 in stage 0.0 (TID 5, hadoop9, partition 5,PROCESS_LOCAL, 6734 bytes)
The CoarseGrainedSchedulerBackend#reviveOffers() method
This method sends a ReviveOffers message to driverEndpoint.
override def reviveOffers() {
  driverEndpoint.send(ReviveOffers)
}
After driverEndpoint receives the ReviveOffers message, it calls the makeOffers() method:
case ReviveOffers =>
  makeOffers()
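This send-and-dispatch hop can be modeled with a tiny queue-based endpoint (illustrative Python, not Spark's RPC layer; the class and counter are assumptions made for the example):

```python
import queue

REVIVE_OFFERS = "ReviveOffers"

class DriverEndpointSketch:
    """Toy single-threaded stand-in for the driver's RPC endpoint."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.make_offers_calls = 0

    def send(self, message):
        # What backend.reviveOffers() amounts to: enqueue a message.
        self.inbox.put(message)

    def process_one(self):
        # Mirrors the `case ReviveOffers => makeOffers()` receive branch.
        message = self.inbox.get()
        if message == REVIVE_OFFERS:
            self.make_offers()

    def make_offers(self):
        self.make_offers_calls += 1

endpoint = DriverEndpointSketch()
endpoint.send(REVIVE_OFFERS)
endpoint.process_one()
```

The point of the indirection is that scheduling decisions always happen on the driver endpoint's own message loop, never on the caller's thread.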
The DriverEndpoint#makeOffers() method
Regarding the executorDataMap used in the code below: by the time the client registers the Application with the Master, the Master has already allocated and launched Executors for the Application. Those Executors then register with CoarseGrainedSchedulerBackend, and their registration info is stored in the executorDataMap structure.
// Make fake resource offers on all executors
private def makeOffers() {
  // Make sure no executor is killed while some task is launching on it
  val taskDescs = withLock {
    // Filter out executors that are being killed
    val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
    val workOffers = activeExecutors.map {
      case (id, executorData) =>
        // Wrap each Executor as a WorkerOffer object
        new WorkerOffer(id, executorData.executorHost, executorData.freeCores,
          Some(executorData.executorAddress.hostPort))
    }.toIndexedSeq
    scheduler.resourceOffers(workOffers)
  }
  if (!taskDescs.isEmpty) {
    launchTasks(taskDescs)
  }
}
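The filter-and-offer step can be sketched like this (illustrative Python; the ExecutorData fields and the scheduler stub are assumptions made for the example):

```python
from dataclasses import dataclass

@dataclass
class ExecutorData:
    host: str
    free_cores: int
    alive: bool  # stands in for the executorIsAlive check

@dataclass
class WorkerOffer:
    executor_id: str
    host: str
    cores: int

def make_offers(executor_data_map, resource_offers):
    """Filter out dead/killed executors, wrap the rest as WorkerOffers,
    and hand them to the scheduler's resource-offer callback."""
    offers = [WorkerOffer(eid, d.host, d.free_cores)
              for eid, d in executor_data_map.items() if d.alive]
    return resource_offers(offers)

executors = {
    "exec-1": ExecutorData("hadoop4", 4, alive=True),
    "exec-2": ExecutorData("hadoop5", 2, alive=False),  # being killed
}
# A stub scheduler that just echoes which executors were offered.
offered = make_offers(executors, lambda offers: [o.executor_id for o in offers])
```

In the real code, scheduler.resourceOffers returns task descriptions, and launchTasks then ships them to the chosen executors; the sketch only shows the filtering and wrapping.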