Time-driven job generation in Spark

Original

2020-06-27 23:58

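JobGenerator holds a timer member that fires GenerateJobs events at the configured time interval; these events trigger job generation and are its starting point. The timer uses a clock as its source of time.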

val clock = {
  val clockClass = ssc.sc.conf.get(
    "spark.streaming.clock", "org.apache.spark.util.SystemClock")
  try {
    Utils.classForName(clockClass).newInstance().asInstanceOf[Clock]
  } catch {
    case e: ClassNotFoundException if clockClass.startsWith("org.apache.spark.streaming") =>
      val newClockClass = clockClass.replace("org.apache.spark.streaming", "org.apache.spark")
      Utils.classForName(newClockClass).newInstance().asInstanceOf[Clock]
  }
}

The default Clock implementation is SystemClock, which simply calls the system API to get the current time.

def getTimeMillis(): Long = System.currentTimeMillis()

It also implements waitTillTime(), which waits until the target time is reached.

def waitTillTime(targetTime: Long): Long = {
  var currentTime = 0L
  currentTime = System.currentTimeMillis()

  var waitTime = targetTime - currentTime
  if (waitTime <= 0) {
    return currentTime
  }

  val pollTime = math.max(waitTime / 10.0, minPollTime).toLong

  while (true) {
    currentTime = System.currentTimeMillis()
    waitTime = targetTime - currentTime
    if (waitTime <= 0) {
      return currentTime
    }
    val sleepTime = math.min(waitTime, pollTime)
    Thread.sleep(sleepTime)
  }
  -1
}

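A detail of this implementation: it does not compute the gap between the target time and the current system time and then sleep for that whole gap in one go. Instead, it first takes one tenth of the remaining wait (but no less than 0.025 seconds) as the maximum length of a single sleep. The thread then repeatedly sleeps for the smaller of that maximum and the time still remaining until the target, so that sleeping in segments lands as close to the target time as possible.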
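To make the segmented sleep concrete, here is a minimal sketch that only computes the sequence of sleep lengths instead of actually sleeping. The names WaitSketch and sleepSchedule are hypothetical, the value 25 ms for minPollTime is an assumption taken from the 0.025 seconds mentioned above, and the simulation pretends every sleep lasts exactly as long as requested, which a real Thread.sleep does not guarantee.

```scala
object WaitSketch {
  // Assumed lower bound for a single poll, in milliseconds (0.025 s).
  val minPollTime: Long = 25L

  /** Sleep lengths an idealized waitTillTime would use for a wait of waitMs. */
  def sleepSchedule(waitMs: Long): List[Long] = {
    if (waitMs <= 0) {
      Nil // target already reached, nothing to sleep
    } else {
      // One tenth of the initial wait, but never below minPollTime.
      val pollTime = math.max(waitMs / 10.0, minPollTime.toDouble).toLong
      val sleeps = scala.collection.mutable.ListBuffer.empty[Long]
      var remaining = waitMs
      while (remaining > 0) {
        // Never sleep past the target: take the smaller of the two.
        val sleepTime = math.min(remaining, pollTime)
        sleeps += sleepTime
        remaining -= sleepTime
      }
      sleeps.toList
    }
  }
}
```

For a 60 ms wait the floor of 25 ms applies, so the last segment is shortened to what is left; for a 1000 ms wait the poll time is 100 ms. In the real method each iteration re-reads the system time, so an oversleeping segment is corrected by the next one.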


private val timer = new RecurringTimer(clock, ssc.graph.batchDuration.milliseconds,
  longTime => eventLoop.post(GenerateJobs(new Time(longTime))), "JobGenerator")

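JobGenerator then passes the clock in as an argument to build a RecurringTimer, and supplies the callback function that posts a GenerateJobs event to the eventLoop when the time is reached, driving job generation. The interval between generated jobs is therefore the batch interval configured in StreamingContext.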
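The shape of that callback can be illustrated without Spark. The sketch below is an assumption-laden stand-in: Time, GenerateJobs and the queue are simplified local definitions rather than Spark's own classes, and a plain mutable queue plays the role of the event loop.

```scala
object CallbackSketch {
  // Simplified stand-ins for the Spark Streaming types (not the real classes).
  final case class Time(millis: Long)
  sealed trait JobGeneratorEvent
  final case class GenerateJobs(time: Time) extends JobGeneratorEvent

  /** Simulates `ticks` timer firings and returns the events that were posted. */
  def postedEvents(batchDurationMs: Long, ticks: Int): List[JobGeneratorEvent] = {
    // A plain queue stands in for the event loop's post() target.
    val eventQueue = scala.collection.mutable.Queue.empty[JobGeneratorEvent]

    // Same shape as the callback handed to the timer: Long => Unit.
    val callback: Long => Unit =
      longTime => eventQueue.enqueue(GenerateJobs(Time(longTime)))

    // Pretend the timer fired once per batch interval.
    (1 to ticks).foreach(i => callback(i * batchDurationMs))
    eventQueue.toList
  }
}
```

With a 2000 ms batch duration and three ticks, the queue ends up holding one GenerateJobs event per batch boundary, at 2000, 4000 and 6000 ms.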

private def triggerActionForNextInterval(): Unit = {
  clock.waitTillTime(nextTime)
  callback(nextTime)
  prevTime = nextTime
  nextTime += period
  logDebug("Callback for " + name + " called at time " + prevTime)
}

private def loop() {
  try {
    while (!stopped) {
      triggerActionForNextInterval()
    }
    triggerActionForNextInterval()
  } catch {
    case e: InterruptedException =>
  }
}

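RecurringTimer, the timer implementation, runs a thread that executes the loop method. Inside loop, triggerActionForNextInterval() is called repeatedly: it uses the clock's waitTillTime() described above to wait out the configured interval, then invokes the callback passed in as an argument, which posts a GenerateJobs event to the eventLoop and drives job generation.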
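The loop can be exercised in a single thread with a simulated clock. The following is a rough sketch, not Spark's class: FakeClock, MiniTimer and run are hypothetical names, the first firing time is passed in explicitly rather than computed, and waiting simply jumps the fake time forward. It also makes visible one thing the quoted loop() does: after the stopped flag is seen, triggerActionForNextInterval() runs one more time.

```scala
object LoopSketch {
  // Hypothetical simulated clock: waiting just moves the time forward.
  final class FakeClock(private var now: Long) {
    def getTimeMillis(): Long = now
    def waitTillTime(target: Long): Long = {
      if (target > now) now = target
      now
    }
  }

  // Stripped-down timer mirroring the quoted loop, run on the caller's thread.
  final class MiniTimer(clock: FakeClock, period: Long, startTime: Long,
                        callback: Long => Unit) {
    private var nextTime = startTime
    private var stopped = false

    def stop(): Unit = { stopped = true }

    private def triggerActionForNextInterval(): Unit = {
      clock.waitTillTime(nextTime)
      callback(nextTime)
      nextTime += period
    }

    def loop(): Unit = {
      while (!stopped) {
        triggerActionForNextInterval()
      }
      // As in the quoted code: one more trigger after the stop flag is seen.
      triggerActionForNextInterval()
    }
  }

  /** Runs the timer, requesting a stop from inside the stopAfter-th callback. */
  def run(period: Long, stopAfter: Int): List[Long] = {
    val fired = scala.collection.mutable.ListBuffer.empty[Long]
    val clock = new FakeClock(0L)
    lazy val timer: MiniTimer = new MiniTimer(clock, period, period, t => {
      fired += t
      if (fired.size == stopAfter) timer.stop()
    })
    timer.loop()
    fired.toList
  }
}
```

Calling LoopSketch.run(1000L, 3) records callbacks at 1000, 2000 and 3000 ms, plus a final one at 4000 ms from the trailing call after the loop exits.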
