I. Entering from StreamingContext.start()
===> StreamingContext holds a JobScheduler member that is initialized together with the context; StreamingContext.start() then calls the JobScheduler's start()
private[streaming] val scheduler = new JobScheduler(this)
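II. Step into JobScheduler's start() method. JobStarted and JobCompleted are implementations of JobSchedulerEvent; how do they get invoked through the EventLoop?
1. Look at processEvent() first: it has to be handed a concrete subclass before it can dispatch to handleJobStart()
// JobSchedulerEvent is the parent type of the cases matched below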
private def processEvent(event: JobSchedulerEvent) {
  try {
    event match {
      case JobStarted(job, startTime) => handleJobStart(job, startTime)
      case JobCompleted(job, completedTime) => handleJobCompletion(job, completedTime)
      case ErrorReported(m, e) => handleError(m, e)
    }
  } catch {
    case e: Throwable =>
      reportError("Error in job scheduler", e)
  }
}
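The dispatch pattern above can be sketched in isolation as follows. This is only a minimal illustration: SchedulerEvent, Started, Completed, Failed and Dispatcher are hypothetical stand-ins invented for this note, not the real Spark Streaming classes.

```scala
// Hypothetical, simplified stand-ins for JobSchedulerEvent and its subclasses.
sealed trait SchedulerEvent
final case class Started(jobId: Int, startTime: Long) extends SchedulerEvent
final case class Completed(jobId: Int, completedTime: Long) extends SchedulerEvent
final case class Failed(msg: String, e: Throwable) extends SchedulerEvent

object Dispatcher {
  // Pattern matching on the sealed parent type picks the handler for each concrete event.
  def describe(event: SchedulerEvent): String = event match {
    case Started(id, t)   => s"start job $id at $t"
    case Completed(id, t) => s"complete job $id at $t"
    case Failed(msg, e)   => s"error: $msg (${e.getMessage})"
  }
}
```

Because the parent trait is sealed, the compiler can warn when a case is missing from the match, which is one reason this style suits event dispatch.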
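2. How exactly does the EventLoop make the call?
===> Step into the following call made in JobScheduler's start() method:
eventLoop.start()
===> This leads to eventThread.start(); the thread uses a BlockingQueue from the JDK concurrency package to hold the JobSchedulerEvent subclasses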
def start(): Unit = {
  if (stopped.get) {
    throw new IllegalStateException(name + " has already been stopped")
  }
  // Call onStart before starting the event thread to make sure it happens before onReceive
  onStart()
  eventThread.start()
}
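===> The eventThread blocks on BlockingQueue.take(), which means the queue waits until a JobStarted or JobCompleted arrives before anything is invoked
===> When an element enters the queue, the EventLoop's onReceive() method is called back
===> Only then is processEvent() in JobScheduler actually executed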
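The blocking-queue mechanism described above can be sketched as below. This is a simplified illustration under my own assumptions, not Spark's actual EventLoop: the class name MiniEventLoop and its stop() behaviour are hypothetical, and the real class has extra hooks such as onStart and error handling.

```scala
import java.util.concurrent.{BlockingQueue, LinkedBlockingDeque}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical, simplified event loop; not Spark's actual EventLoop class.
abstract class MiniEventLoop[E](name: String) {
  private val eventQueue: BlockingQueue[E] = new LinkedBlockingDeque[E]()
  private val stopped = new AtomicBoolean(false)

  private val eventThread = new Thread(name) {
    setDaemon(true)
    override def run(): Unit = {
      try {
        while (!stopped.get) {
          val event = eventQueue.take() // blocks until some thread posts an event
          onReceive(event)              // callback runs on the event thread
        }
      } catch {
        case _: InterruptedException => // stop() interrupted the blocked take()
      }
    }
  }

  def start(): Unit = {
    if (stopped.get) {
      throw new IllegalStateException(name + " has already been stopped")
    }
    eventThread.start()
  }

  def stop(): Unit = {
    if (stopped.compareAndSet(false, true)) {
      eventThread.interrupt()
      eventThread.join()
    }
  }

  // Producers call post() from any thread; it only enqueues the event.
  def post(event: E): Unit = eventQueue.put(event)

  // Subclasses decide what to do with each event.
  protected def onReceive(event: E): Unit
}
```

A subclass only has to override onReceive; every posted event is then handled one at a time on the single event thread, which is why the handlers do not need their own locking for ordering.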
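3. Back in the JobScheduler class, the question becomes: who is responsible for putting JobStarted into the EventLoop's BlockingQueue?
===> Thinking it through: the jobs a DStream produces in each batch interval are generated by JobGenerator, so JobStarted very likely has something to do with that class
===> So set the question "who puts JobStarted into the EventLoop's BlockingQueue?" aside for now and analyze JobGenerator first, to see whether the answer turns up there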
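4. The JobScheduler class first instantiates its JobGenerator member:
private val jobGenerator = new JobGenerator(this)
===> and JobScheduler's start() method calls JobGenerator.start():
jobGenerator.start()
===> When JobGenerator is initialized, its RecurringTimer is initialized as well; the timer's callback puts a GenerateJobs event into JobGenerator's EventLoop
===> First, see how RecurringTimer invokes the anonymous function passed to its primary constructor:
longTime => eventLoop.post(GenerateJobs(new Time(longTime)))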
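===> Step into the RecurringTimer class
class RecurringTimer(clock: Clock, period: Long, callback: (Long) => Unit, name: String)
  extends Logging {
===> The anonymous function is called by RecurringTimer's triggerActionForNextInterval method ===> which in turn is called by the loop method
===> The loop method is called by the thread member of RecurringTimer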
private val thread = new Thread("RecurringTimer - " + name) {
  setDaemon(true)
  override def run() { loop }
}
===> This thread is started by RecurringTimer's start method
/**
 * Start at the given start time.
 */
def start(startTime: Long): Long = synchronized {
  nextTime = startTime
  thread.start()
  logInfo("Started timer for " + name + " at time " + nextTime)
  nextTime
}
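The call chain start ===> thread ===> loop ===> callback can be sketched roughly like this. It is a simplified illustration, not the real RecurringTimer: the name MiniRecurringTimer is hypothetical, and I assume System.currentTimeMillis() and Thread.sleep in place of the Clock abstraction and triggerActionForNextInterval used by Spark.

```scala
// Hypothetical, simplified timer; Spark's RecurringTimer works through a Clock instead.
class MiniRecurringTimer(periodMs: Long, callback: Long => Unit, name: String) {
  @volatile private var stopped = false
  private var nextTime = 0L

  private val thread = new Thread("MiniRecurringTimer - " + name) {
    setDaemon(true)
    override def run(): Unit = loop()
  }

  // Start at the given start time and return the time of the first tick.
  def start(startTime: Long): Long = synchronized {
    nextTime = startTime
    thread.start()
    nextTime
  }

  def stop(): Unit = {
    stopped = true
    thread.interrupt()
    thread.join()
  }

  // Wait until nextTime, fire the callback with that time, then advance by one period.
  private def loop(): Unit = {
    try {
      while (!stopped) {
        val waitMs = nextTime - System.currentTimeMillis()
        if (waitMs > 0) Thread.sleep(waitMs)
        callback(nextTime)
        nextTime += periodMs
      }
    } catch {
      case _: InterruptedException => // stop() was called while waiting
    }
  }
}
```

In JobGenerator the callback is the anonymous function shown earlier, so each tick amounts to one post of a GenerateJobs event into the EventLoop.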
===> RecurringTimer.start is called by startFirstTime() in JobGenerator, which is itself called by JobGenerator's start method
===> Once a GenerateJobs event has been put into JobGenerator's EventLoop, onReceive is triggered
===> onReceive calls processEvent()
override protected def onReceive(event: JobGeneratorEvent): Unit = processEvent(event)
===> and JobGenerator then calls generateJobs(time)
private def processEvent(event: JobGeneratorEvent) {
  logDebug("Got event " + event)
  event match {
    case GenerateJobs(time) => generateJobs(time)
    case ClearMetadata(time) => clearMetadata(time)
    case DoCheckpoint(time, clearCheckpointDataLater) =>
      doCheckpoint(time, clearCheckpointDataLater)
    case ClearCheckpointData(time) => clearCheckpointData(time)
  }
}
===> Inside JobGenerator's generateJobs method we get close to the answer to "who puts JobStarted into the EventLoop's BlockingQueue?"
===> It is this call:
jobScheduler.submitJobSet(JobSet(time, jobs, streamIdToInputInfos))
5. Now step into JobScheduler's submitJobSet method
def submitJobSet(jobSet: JobSet) {
  if (jobSet.jobs.isEmpty) {
    logInfo("No jobs added for time " + jobSet.time)
  } else {
    listenerBus.post(StreamingListenerBatchSubmitted(jobSet.toBatchInfo))
    jobSets.put(jobSet.time, jobSet)
    jobSet.jobs.foreach(job => jobExecutor.execute(new JobHandler(job)))
    logInfo("Added jobs for time " + jobSet.time)
  }
}
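===> Note the call jobExecutor.execute(new JobHandler(job))
===> In JobHandler, an inner class of JobScheduler, we find that JobStarted and JobCompleted are put into the EventLoop by the JobHandler thread. (So that's how it works... haha...)
===> Once JobStarted is posted to the EventLoop, onReceive is triggered, which calls processEvent, so the handling of every JobStarted begins in the handleJobStart method.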
private def processEvent(event: JobSchedulerEvent) {
  try {
    event match {
      case JobStarted(job, startTime) => handleJobStart(job, startTime)
      case JobCompleted(job, completedTime) => handleJobCompletion(job, completedTime)
      case ErrorReported(m, e) => handleError(m, e)
    }
  } catch {
    case e: Throwable =>
      reportError("Error in job scheduler", e)
  }
}
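To round off the trace, here is a rough sketch of the JobHandler idea: a Runnable submitted to a thread pool that posts one event before running the job and another after it. All names here (JobEvent, JobBegan, JobEnded, MiniJobHandler, MiniJobHandlerDemo) are hypothetical, and a plain queue stands in for the EventLoop; the real JobHandler carries more logic than shown.

```scala
import java.util.concurrent.{Executors, LinkedBlockingQueue, TimeUnit}

// Hypothetical stand-ins for JobStarted and JobCompleted.
sealed trait JobEvent
final case class JobBegan(name: String) extends JobEvent
final case class JobEnded(name: String) extends JobEvent

// Hypothetical handler: posts an event before and after running the job body.
class MiniJobHandler(name: String, body: () => Unit, post: JobEvent => Unit) extends Runnable {
  override def run(): Unit = {
    post(JobBegan(name))
    body()
    post(JobEnded(name))
  }
}

object MiniJobHandlerDemo {
  // Runs one job on a single-thread pool and returns the events in posting order.
  def runOnce(): List[JobEvent] = {
    val events = new LinkedBlockingQueue[JobEvent]() // stands in for the EventLoop queue
    val executor = Executors.newFixedThreadPool(1)
    executor.execute(new MiniJobHandler("job-1", () => (), e => events.put(e)))
    executor.shutdown()
    executor.awaitTermination(5, TimeUnit.SECONDS)
    List(events.poll(), events.poll())
  }
}
```

The point of the sketch is only the ordering: the handler thread produces the events and the event loop thread consumes them, so job execution never blocks the scheduler's event processing.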