A Detailed Look at SchedulerBackend and Its Source Code

SchedulerBackend touches on Netty, which I have not yet fully understood, so this article covers only part of the topic; more will be added later.

1 What is SchedulerBackend?

Let's start with how SchedulerBackend is used inside Spark.

As Listing 1 shows, SparkContext.scala holds a SchedulerBackend instance from the very beginning: the createTaskScheduler() method creates two objects, scheduler and backend, which are instances of TaskScheduler and SchedulerBackend respectively.
Across deploy modes the TaskScheduler implementation is always the same, TaskSchedulerImpl, whereas the SchedulerBackend implementation differs from mode to mode; the listing below shows only one of those cases (local mode).
As Listing 2 shows, the TaskScheduler is started through its start() method, which under the hood calls the backend's start() method. So why call the backend's start()? What does that method do? What is SchedulerBackend actually for? Section 2 takes up these questions.

    // Listing 1, from SparkContext.scala
    // Create and start the scheduler
    val (sched, ts) = SparkContext.createTaskScheduler(this, master, deployMode)
    _schedulerBackend = sched
    _taskScheduler = ts
    _dagScheduler = new DAGScheduler(this)
    _heartbeatReceiver.ask[Boolean](TaskSchedulerIsSet)

    // start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's
    // constructor
    _taskScheduler.start()

  /**
   * Create a task scheduler based on a given master URL.
   * Return a 2-tuple of the scheduler backend and the task scheduler.
   */
  private def createTaskScheduler(
      sc: SparkContext,
      master: String,
      deployMode: String): (SchedulerBackend, TaskScheduler) = {
    import SparkMasterRegex._
    ...
    master match {
      case "local" =>
        val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
        val backend = new LocalSchedulerBackend(sc.getConf, scheduler, 1)
        scheduler.initialize(backend)
        (backend, scheduler)
        ...
    }
    ...
  }
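
Listing 1 shows only the local branch of the match expression. For contrast, the standalone branch of the same method is sketched below (abridged and paraphrased from SparkContext.scala; the exact code can differ between Spark versions). The TaskScheduler is still a TaskSchedulerImpl, but the backend becomes a StandaloneSchedulerBackend.

    // Sketch of the standalone branch of createTaskScheduler() (abridged; may differ by Spark version)
    case SPARK_REGEX(sparkUrl) =>
      val scheduler = new TaskSchedulerImpl(sc)
      val masterUrls = sparkUrl.split(",").map("spark://" + _)
      // same TaskSchedulerImpl, but a different SchedulerBackend implementation
      val backend = new StandaloneSchedulerBackend(scheduler, sc, masterUrls)
      scheduler.initialize(backend)
      (backend, scheduler)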

// Listing 2, from TaskSchedulerImpl.scala
  override def start() {
    backend.start()
    ...
  }

2 The role of SchedulerBackend

Why call the backend's start() method? What does that method do? What is SchedulerBackend actually for?

Let's look at the SchedulerBackend source. It is a trait implemented by different subclasses, but all of the subclasses do essentially the same job: allocate compute resources (that is, Executors) to Tasks that are waiting for them, launch the Tasks on those Executors, and thereby complete the resource-scheduling process.

Important methods such as start(), stop(), and reviveOffers() are not explained here; they will be introduced one by one later, where each of them is used. A small toy sketch after the trait definition below illustrates how these pieces fit together.

/**
 * A backend interface for scheduling systems that allows plugging in different ones under
 * TaskSchedulerImpl. We assume a Mesos-like model where the application gets resource offers as
 * machines become available and can launch tasks on them.
 */
private[spark] trait SchedulerBackend {
  private val appId = "spark-application-" + System.currentTimeMillis

  def start(): Unit
  def stop(): Unit
  def reviveOffers(): Unit
  def defaultParallelism(): Int

  /**
   * Requests that an executor kills a running task.
   *
   * @param taskId Id of the task.
   * @param executorId Id of the executor the task is running on.
   * @param interruptThread Whether the executor should interrupt the task thread.
   * @param reason The reason for the task kill.
   */
  def killTask(
      taskId: Long,
      executorId: String,
      interruptThread: Boolean,
      reason: String): Unit =
    throw new UnsupportedOperationException

  def isReady(): Boolean = true

  /**
   * Get an application ID associated with the job.
   *
   * @return An application ID
   */
  def applicationId(): String = appId

  /**
   * Get the attempt ID for this run, if the cluster manager supports multiple
   * attempts. Applications run in client mode will not have attempt IDs.
   *
   * @return The application attempt id, if available.
   */
  def applicationAttemptId(): Option[String] = None

  /**
   * Get the URLs for the driver logs. These URLs are used to display the links in the UI
   * Executors tab for the driver.
   * @return Map containing the log names and their respective URLs
   */
  def getDriverLogUrls: Option[Map[String, String]] = None

  /**
   * Get the max number of tasks that can be concurrent launched currently.
   * Note that please don't cache the value returned by this method, because the number can change
   * due to add/remove executors.
   *
   * @return The max number of tasks that can be concurrent launched currently.
   */
  def maxNumConcurrentTasks(): Int

}
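
To make the division of labour between the scheduler and the backend concrete, here is a small self-contained toy model. This is not Spark code: ToyTask, ToyScheduler, ToyLocalBackend and the slot count are all invented for illustration, but the call pattern roughly mirrors the real one, where TaskSchedulerImpl.submitTasks() ends by calling backend.reviveOffers(), and the backend then offers executor resources back to the scheduler and launches whatever tasks it receives.

// A self-contained toy model of the scheduler/backend contract (NOT Spark code;
// all names below -- ToyTask, ToyScheduler, ToyLocalBackend, slots -- are invented).
import scala.collection.mutable

case class ToyTask(id: Long, body: () => Unit)

trait ToySchedulerBackend {
  def start(): Unit
  def stop(): Unit
  def reviveOffers(): Unit        // the scheduler asks the backend to offer free resources
  def defaultParallelism(): Int
}

class ToyScheduler {
  private val pending = mutable.Queue[ToyTask]()
  var backend: ToySchedulerBackend = _

  // Roughly mirrors TaskSchedulerImpl.submitTasks(): queue the work, then poke the backend.
  def submit(task: ToyTask): Unit = {
    pending.enqueue(task)
    backend.reviveOffers()
  }

  // Called back by the backend with the number of free slots it is offering.
  def resourceOffers(freeSlots: Int): Seq[ToyTask] =
    (1 to math.min(freeSlots, pending.size)).map(_ => pending.dequeue())
}

// A "local" backend: executors are just slots inside the current JVM.
class ToyLocalBackend(scheduler: ToyScheduler, slots: Int) extends ToySchedulerBackend {
  override def start(): Unit = println(s"backend started with $slots slots")
  override def stop(): Unit = println("backend stopped")
  override def defaultParallelism(): Int = slots

  override def reviveOffers(): Unit = {
    // Offer every slot to the scheduler, then launch whatever tasks it hands back.
    val tasks = scheduler.resourceOffers(slots)
    tasks.foreach { t => println(s"launching task ${t.id}"); t.body() }
  }
}

object ToyDemo extends App {
  val scheduler = new ToyScheduler
  val backend   = new ToyLocalBackend(scheduler, slots = 2)
  scheduler.backend = backend
  backend.start()                 // mirrors TaskSchedulerImpl.start() delegating to backend.start()
  scheduler.submit(ToyTask(1, () => println("task 1 ran")))
  scheduler.submit(ToyTask(2, () => println("task 2 ran")))
  backend.stop()
}

Running ToyDemo prints the launch messages for both tasks, which is essentially what a real SchedulerBackend does at a much larger scale: start, offer resources, launch tasks, stop.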