A Detailed Look at SchedulerBackend and Its Source Code

SchedulerBackend touches on Netty, which I have not yet fully understood, so this article covers only part of the topic; more will be added later.

1 What is SchedulerBackend?

Let's start with how SchedulerBackend is used inside Spark.

As Listing 1 shows, SparkContext.scala holds a SchedulerBackend instance from the very beginning: the createTaskScheduler() method creates two objects, scheduler and backend, which are instances of TaskScheduler and SchedulerBackend respectively.
Across deploy modes the TaskScheduler implementation is always the same, TaskSchedulerImpl, whereas the SchedulerBackend implementation differs from mode to mode; the listing below shows only one of those cases (local mode).
As Listing 2 shows, the TaskScheduler is started through its start() method, which under the hood calls the backend's start() method. So why call the backend's start()? What does that method do? What is SchedulerBackend actually for? Section 2 takes up these questions.

    // Listing 1, from SparkContext.scala
    // Create and start the scheduler
    val (sched, ts) = SparkContext.createTaskScheduler(this, master, deployMode)
    _schedulerBackend = sched
    _taskScheduler = ts
    _dagScheduler = new DAGScheduler(this)
    _heartbeatReceiver.ask[Boolean](TaskSchedulerIsSet)

    // start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's
    // constructor
    _taskScheduler.start()

  /**
   * Create a task scheduler based on a given master URL.
   * Return a 2-tuple of the scheduler backend and the task scheduler.
   */
  private def createTaskScheduler(
      sc: SparkContext,
      master: String,
      deployMode: String): (SchedulerBackend, TaskScheduler) = {
    import SparkMasterRegex._
    ...
    master match {
      case "local" =>
        val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
        val backend = new LocalSchedulerBackend(sc.getConf, scheduler, 1)
        scheduler.initialize(backend)
        (backend, scheduler)
        ...
    }
    ...
  }
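
Listing 1 shows only the local branch of the match expression. For contrast, the standalone branch of the same method is sketched below (abridged and paraphrased from SparkContext.scala; the exact code can differ between Spark versions). The TaskScheduler is still a TaskSchedulerImpl, but the backend becomes a StandaloneSchedulerBackend.

    // Sketch of the standalone branch of createTaskScheduler() (abridged; may differ by Spark version)
    case SPARK_REGEX(sparkUrl) =>
      val scheduler = new TaskSchedulerImpl(sc)
      val masterUrls = sparkUrl.split(",").map("spark://" + _)
      // same TaskSchedulerImpl, but a different SchedulerBackend implementation
      val backend = new StandaloneSchedulerBackend(scheduler, sc, masterUrls)
      scheduler.initialize(backend)
      (backend, scheduler)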

// Listing 2, from TaskSchedulerImpl.scala
  override def start() {
    backend.start()
    ...
  }

2 The role of SchedulerBackend

Why call the backend's start() method? What does that method do? What is SchedulerBackend actually for?

Let's look at the SchedulerBackend source. It is a trait implemented by different subclasses, but all of the subclasses do essentially the same job: allocate compute resources (that is, Executors) to Tasks that are waiting for them, launch the Tasks on those Executors, and thereby complete the resource-scheduling process.

Important methods such as start(), stop(), and reviveOffers() are not explained here; they will be introduced one by one later, where each of them is used. A small toy sketch after the trait definition below illustrates how these pieces fit together.

/**
 * A backend interface for scheduling systems that allows plugging in different ones under
 * TaskSchedulerImpl. We assume a Mesos-like model where the application gets resource offers as
 * machines become available and can launch tasks on them.
 */
private[spark] trait SchedulerBackend {
  private val appId = "spark-application-" + System.currentTimeMillis

  def start(): Unit
  def stop(): Unit
  def reviveOffers(): Unit
  def defaultParallelism(): Int

  /**
   * Requests that an executor kills a running task.
   *
   * @param taskId Id of the task.
   * @param executorId Id of the executor the task is running on.
   * @param interruptThread Whether the executor should interrupt the task thread.
   * @param reason The reason for the task kill.
   */
  def killTask(
      taskId: Long,
      executorId: String,
      interruptThread: Boolean,
      reason: String): Unit =
    throw new UnsupportedOperationException

  def isReady(): Boolean = true

  /**
   * Get an application ID associated with the job.
   *
   * @return An application ID
   */
  def applicationId(): String = appId

  /**
   * Get the attempt ID for this run, if the cluster manager supports multiple
   * attempts. Applications run in client mode will not have attempt IDs.
   *
   * @return The application attempt id, if available.
   */
  def applicationAttemptId(): Option[String] = None

  /**
   * Get the URLs for the driver logs. These URLs are used to display the links in the UI
   * Executors tab for the driver.
   * @return Map containing the log names and their respective URLs
   */
  def getDriverLogUrls: Option[Map[String, String]] = None

  /**
   * Get the max number of tasks that can be concurrent launched currently.
   * Note that please don't cache the value returned by this method, because the number can change
   * due to add/remove executors.
   *
   * @return The max number of tasks that can be concurrent launched currently.
   */
  def maxNumConcurrentTasks(): Int

}
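
To make the division of labour between the scheduler and the backend concrete, here is a small self-contained toy model. This is not Spark code: ToyTask, ToyScheduler, ToyLocalBackend and the slot count are all invented for illustration, but the call pattern roughly mirrors the real one, where TaskSchedulerImpl.submitTasks() ends by calling backend.reviveOffers(), and the backend then offers executor resources back to the scheduler and launches whatever tasks it receives.

// A self-contained toy model of the scheduler/backend contract (NOT Spark code;
// all names below -- ToyTask, ToyScheduler, ToyLocalBackend, slots -- are invented).
import scala.collection.mutable

case class ToyTask(id: Long, body: () => Unit)

trait ToySchedulerBackend {
  def start(): Unit
  def stop(): Unit
  def reviveOffers(): Unit        // the scheduler asks the backend to offer free resources
  def defaultParallelism(): Int
}

class ToyScheduler {
  private val pending = mutable.Queue[ToyTask]()
  var backend: ToySchedulerBackend = _

  // Roughly mirrors TaskSchedulerImpl.submitTasks(): queue the work, then poke the backend.
  def submit(task: ToyTask): Unit = {
    pending.enqueue(task)
    backend.reviveOffers()
  }

  // Called back by the backend with the number of free slots it is offering.
  def resourceOffers(freeSlots: Int): Seq[ToyTask] =
    (1 to math.min(freeSlots, pending.size)).map(_ => pending.dequeue())
}

// A "local" backend: executors are just slots inside the current JVM.
class ToyLocalBackend(scheduler: ToyScheduler, slots: Int) extends ToySchedulerBackend {
  override def start(): Unit = println(s"backend started with $slots slots")
  override def stop(): Unit = println("backend stopped")
  override def defaultParallelism(): Int = slots

  override def reviveOffers(): Unit = {
    // Offer every slot to the scheduler, then launch whatever tasks it hands back.
    val tasks = scheduler.resourceOffers(slots)
    tasks.foreach { t => println(s"launching task ${t.id}"); t.body() }
  }
}

object ToyDemo extends App {
  val scheduler = new ToyScheduler
  val backend   = new ToyLocalBackend(scheduler, slots = 2)
  scheduler.backend = backend
  backend.start()                 // mirrors TaskSchedulerImpl.start() delegating to backend.start()
  scheduler.submit(ToyTask(1, () => println("task 1 ran")))
  scheduler.submit(ToyTask(2, () => println("task 2 ran")))
  backend.stop()
}

Running ToyDemo prints the launch messages for both tasks, which is essentially what a real SchedulerBackend does at a much larger scale: start, offer resources, launch tasks, stop.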