Spark Streaming source code analysis: how JobScheduler's JobStarted and JobCompleted events get invoked

I. Entering from StreamingContext.start()


===> StreamingContext (not SparkContext) has a JobScheduler member that is initialized when the context is constructed; StreamingContext.start() then calls scheduler.start()

private[streaming] val scheduler = new JobScheduler(this)

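II. Step into JobScheduler.start(). JobStarted and JobCompleted are implementations of JobSchedulerEvent; how does the EventLoop end up dispatching them?

1. Step into processEvent() and take a look: a concrete subclass has to be passed in before handleJobStart() can be called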

// JobSchedulerEvent is the parent type of the cases matched below
private def processEvent(event: JobSchedulerEvent) {
  try {
    event match {
      case JobStarted(job, startTime) => handleJobStart(job, startTime)
      case JobCompleted(job, completedTime) => handleJobCompletion(job, completedTime)
      case ErrorReported(m, e) => handleError(m, e)
    }
  } catch {
    case e: Throwable =>
      reportError("Error in job scheduler", e)
  }
}
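The dispatch in processEvent() is ordinary pattern matching over a family of case classes. The following is a minimal sketch of that idea only; SchedulerEvent, Started, Completed, Failed and Dispatcher are hypothetical stand-ins (Spark's real events carry a Job object, not a String), and the returned strings merely name the handler that would run.

```scala
// Hypothetical stand-ins for the JobSchedulerEvent family; not Spark's classes.
sealed trait SchedulerEvent
case class Started(job: String, startTime: Long) extends SchedulerEvent
case class Completed(job: String, completedTime: Long) extends SchedulerEvent
case class Failed(msg: String, e: Throwable) extends SchedulerEvent

object Dispatcher {
  // Returns a description of the handler that would be chosen for the event.
  def process(event: SchedulerEvent): String = event match {
    case Started(job, t)   => s"handleJobStart($job, $t)"
    case Completed(job, t) => s"handleJobCompletion($job, $t)"
    case Failed(msg, _)    => s"handleError($msg)"
  }
}
```

Because the trait is sealed, the compiler can warn when a case is missing, which is one reason this style suits event dispatch.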

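2. So how exactly does EventLoop make the call?

==> In JobScheduler.start(), step into this call:

eventLoop.start()

===> eventThread.start() launches a thread that uses a BlockingQueue from the JDK concurrency package to receive JobSchedulerEvent subclasses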

def start(): Unit = {
  if (stopped.get) {
    throw new IllegalStateException(name + " has already been stopped")
  }
  // Call onStart before starting the event thread to make sure it happens before onReceive
  onStart()
  eventThread.start()
}

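===> The eventThread blocks on BlockingQueue.take(); in other words, nothing is called until an event such as JobStarted or JobCompleted is put on the queue

===> Once an element enters the queue, the thread calls back EventLoop's onReceive() method,

===> and only then is JobScheduler's processEvent() actually executed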
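The take-then-callback cycle described above can be sketched as follows. This is a stripped-down illustration under stated assumptions, not Spark's EventLoop: MiniEventLoop and MiniEventLoopDemo are hypothetical names, and error handling and the onStart/onStop hooks are omitted.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, CountDownLatch, LinkedBlockingDeque, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical, simplified event loop; not Spark's actual class.
abstract class MiniEventLoop[E](name: String) {
  private val queue = new LinkedBlockingDeque[E]()
  private val stopped = new AtomicBoolean(false)

  private val eventThread = new Thread(name) {
    setDaemon(true)
    override def run(): Unit = {
      try {
        while (!stopped.get) {
          val event = queue.take() // blocks until someone calls post()
          onReceive(event)         // callback supplied by the subclass
        }
      } catch {
        case _: InterruptedException => // stop() interrupts a blocked take()
      }
    }
  }

  def start(): Unit = {
    if (stopped.get) throw new IllegalStateException(name + " has already been stopped")
    eventThread.start()
  }

  def stop(): Unit = {
    if (stopped.compareAndSet(false, true)) {
      eventThread.interrupt()
      eventThread.join()
    }
  }

  // Producers on any thread enqueue events here.
  def post(event: E): Unit = queue.put(event)

  protected def onReceive(event: E): Unit
}

object MiniEventLoopDemo {
  // Posts two events and returns them in the order the loop thread received them.
  def run(): List[String] = {
    val received = new ConcurrentLinkedQueue[String]()
    val done = new CountDownLatch(2)
    val loop = new MiniEventLoop[String]("mini-loop") {
      override protected def onReceive(event: String): Unit = {
        received.add(event)
        done.countDown()
      }
    }
    loop.start()
    loop.post("JobStarted")
    loop.post("JobCompleted")
    done.await(5, TimeUnit.SECONDS)
    loop.stop()
    received.toArray(new Array[String](0)).toList
  }
}
```

The key point is that post() can be called from any thread, while onReceive() always runs on the single event thread, so handlers never run concurrently with each other.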


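3. Back in the JobScheduler class, the question becomes: who is responsible for putting JobStarted into the EventLoop's BlockingQueue?

===> Consider this: the jobs a DStream produces for each batch interval are generated by JobGenerator, so JobStarted is quite likely related to that class

===> So set the question "who puts JobStarted into the EventLoop's BlockingQueue?" aside for now and analyze JobGenerator first, to see whether the answer turns up there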

 

4. The JobScheduler class first instantiates its JobGenerator member,

private val jobGenerator = new JobGenerator(this)

==> and JobScheduler.start() calls JobGenerator.start():

jobGenerator.start()

===> When JobGenerator is initialized, a RecurringTimer is initialized as well; its callback posts a GenerateJobs event to JobGenerator's EventLoop

==> First, look at how RecurringTimer invokes the anonymous function passed to its primary constructor:

longTime => eventLoop.post(GenerateJobs(new Time(longTime)))

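===> Step into RecurringTimer

class RecurringTimer(clock: Clock, period: Long, callback: (Long) => Unit, name: String)
  extends Logging {

===> This anonymous function is called by RecurringTimer's triggerActionForNextInterval method ==> which is in turn called by the loop method

===> The loop method is in turn called by the thread member of RecurringTimer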

private val thread = new Thread("RecurringTimer - " + name) {
  setDaemon(true)
  override def run() { loop }
}

===> This thread is started by RecurringTimer's start method

/**
 * Start at the given start time.
 */
def start(startTime: Long): Long = synchronized {
  nextTime = startTime
  thread.start()
  logInfo("Started timer for " + name + " at time " + nextTime)
  nextTime
}
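The chain start -> thread -> loop -> callback can be sketched in a few lines. This is a simplified illustration, not Spark's RecurringTimer: MiniRecurringTimer and its demo are hypothetical, it sleeps on the system clock instead of using a Clock abstraction, and its start method returns the start time directly.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Hypothetical simplification: a daemon thread waits until the next period
// boundary, then invokes the callback with that boundary's timestamp.
class MiniRecurringTimer(period: Long, callback: Long => Unit, name: String) {
  @volatile private var stopped = false
  private var nextTime = 0L // only touched by the timer thread after start()

  private val thread = new Thread("MiniRecurringTimer - " + name) {
    setDaemon(true)
    override def run(): Unit = loop()
  }

  def start(startTime: Long): Long = synchronized {
    nextTime = startTime
    thread.start()
    startTime
  }

  def stop(): Unit = {
    stopped = true
    thread.interrupt()
    thread.join()
  }

  private def loop(): Unit = {
    try {
      while (!stopped) {
        val waitMs = nextTime - System.currentTimeMillis()
        if (waitMs > 0) Thread.sleep(waitMs)
        callback(nextTime) // in JobGenerator this is where GenerateJobs gets posted
        nextTime += period
      }
    } catch {
      case _: InterruptedException => // stop() was called while sleeping
    }
  }
}

object MiniRecurringTimerDemo {
  // Waits for three ticks and returns each tick's offset from the start time.
  def run(): List[Long] = {
    var ticks = Vector.empty[Long]
    val latch = new CountDownLatch(3)
    val timer = new MiniRecurringTimer(10L, t => {
      if (latch.getCount > 0) { ticks = ticks :+ t; latch.countDown() }
    }, "demo")
    val start = System.currentTimeMillis()
    timer.start(start)
    latch.await(5, TimeUnit.SECONDS)
    timer.stop()
    ticks.map(_ - start).toList
  }
}
```

Note that the callback receives the scheduled time rather than the wall-clock time at which it actually fired, so each tick lands exactly one period after the previous one.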

===> RecurringTimer.start is in turn called by startFirstTime() in JobGenerator, which is itself called by JobGenerator.start()

===> Since a GenerateJobs event has been put into JobGenerator's EventLoop, the onReceive method is triggered

===> onReceive calls processEvent()

override protected def onReceive(event: JobGeneratorEvent): Unit = processEvent(event)

===> and JobGenerator then calls generateJobs(time)

private def processEvent(event: JobGeneratorEvent) {
  logDebug("Got event " + event)
  event match {
    case GenerateJobs(time) => generateJobs(time)
    case ClearMetadata(time) => clearMetadata(time)
    case DoCheckpoint(time, clearCheckpointDataLater) =>
      doCheckpoint(time, clearCheckpointDataLater)
    case ClearCheckpointData(time) => clearCheckpointData(time)
  }
}

===> In JobGenerator's generateJobs method we find something close to the answer to "who puts JobStarted into the EventLoop's BlockingQueue?"

==> namely this line:

jobScheduler.submitJobSet(JobSet(time, jobs, streamIdToInputInfos))

5. Now step into JobScheduler's submitJobSet method

def submitJobSet(jobSet: JobSet) {
  if (jobSet.jobs.isEmpty) {
    logInfo("No jobs added for time " + jobSet.time)
  } else {
    listenerBus.post(StreamingListenerBatchSubmitted(jobSet.toBatchInfo))
    jobSets.put(jobSet.time, jobSet)
    jobSet.jobs.foreach(job => jobExecutor.execute(new JobHandler(job)))
    logInfo("Added jobs for time " + jobSet.time)
  }
}

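===> Note the call jobExecutor.execute(new JobHandler(job))

===> In JobHandler, an inner class of JobScheduler, we find that JobStarted and JobCompleted are posted to the EventLoop by JobHandler itself, running on a jobExecutor thread. (So that is how it works.)

===> Once JobStarted is posted to the EventLoop, onReceive is triggered, which calls processEvent, which in turn calls handleJobStart to begin handling each JobStarted event.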

private def processEvent(event: JobSchedulerEvent) {
  try {
    event match {
      case JobStarted(job, startTime) => handleJobStart(job, startTime)
      case JobCompleted(job, completedTime) => handleJobCompletion(job, completedTime)
      case ErrorReported(m, e) => handleError(m, e)
    }
  } catch {
    case e: Throwable =>
      reportError("Error in job scheduler", e)
  }
}
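To make the producer side concrete, here is a minimal sketch of a handler that brackets each job with a "started" and a "completed" event. It is an illustration under stated assumptions, not Spark's JobHandler: MiniJobEvent, MiniJobStarted, MiniJobCompleted and JobHandlerSketch are hypothetical names, a plain LinkedBlockingQueue stands in for the EventLoop, and a one-thread pool is used only to keep the ordering deterministic.

```scala
import java.util.concurrent.{Executors, LinkedBlockingQueue, TimeUnit}

// Hypothetical event types; Spark's real events carry a Job and a timestamp.
sealed trait MiniJobEvent
case class MiniJobStarted(id: Int) extends MiniJobEvent
case class MiniJobCompleted(id: Int) extends MiniJobEvent

object JobHandlerSketch {
  // Runs each job body on a worker thread. Every handler posts a "started"
  // event before the body and a "completed" event after it; the events are
  // then drained in the order they were posted.
  def simulate(jobs: Seq[() => Unit]): List[MiniJobEvent] = {
    val events = new LinkedBlockingQueue[MiniJobEvent]() // stands in for the EventLoop queue
    val executor = Executors.newFixedThreadPool(1)
    jobs.zipWithIndex.foreach { case (body, id) =>
      executor.execute(new Runnable {
        override def run(): Unit = {
          events.put(MiniJobStarted(id))   // analogous to posting JobStarted
          body()                           // the actual work
          events.put(MiniJobCompleted(id)) // analogous to posting JobCompleted
        }
      })
    }
    executor.shutdown()
    executor.awaitTermination(5, TimeUnit.SECONDS)
    Iterator.continually(events.poll()).takeWhile(_ != null).toList
  }
}
```

The worker thread only enqueues events; the bookkeeping they trigger happens later on the event-loop thread, which keeps the scheduler's state changes on a single thread.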
