Creating the Executor

Once the SparkContext has been created, the workers allocate executors for the application. The process is shown in the figure below:

(Figure: the executor creation flow, from SparkContext.createTaskScheduler through the Master and Worker down to the executor itself)

As the figure shows, an executor goes through quite a few steps before it is created.

  • SparkContext has a function called createTaskScheduler(), which creates the taskScheduler and the corresponding backend according to the type of the master URL. Its main code is shown below (a small self-contained sketch of this kind of dispatch follows the snippet):

private def createTaskScheduler(
    sc: SparkContext,
    master: String): (SchedulerBackend, TaskScheduler) = {
  master match {
    case "local" =>
      val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
      val backend = new LocalBackend(sc.getConf, scheduler, 1)
      scheduler.initialize(backend)
      (backend, scheduler)
    case LOCAL_N_REGEX(threads) => ......
    case LOCAL_N_FAILURES_REGEX(threads, maxFailures) => ......
    case SPARK_REGEX(sparkUrl) => ......
    case LOCAL_CLUSTER_REGEX(numSlaves, coresPerSlave, memoryPerSlave) => ......
    case "yarn-standalone" | "yarn-cluster" => ......
    case "yarn-client" => ......
    case MESOS_REGEX(mesosUrl) => ......
    case SIMR_REGEX(simrUrl) => ......
    case zkUrl if zkUrl.startsWith("zk://") => ......
    case _ => ......
  }
}
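For intuition about the pattern match above, here is a small, self-contained sketch of how a master URL such as "local[4]" or "spark://host:7077" is dispatched. The regexes and descriptions are illustrative stand-ins, not Spark's actual definitions:

object MasterUrlSketch {
  // Hypothetical regexes that mirror the kind used in createTaskScheduler
  val LOCAL_N_REGEX = """local\[([0-9]+|\*)\]""".r
  val SPARK_REGEX = """spark://(.*)""".r

  def describe(master: String): String = master match {
    case "local"                => "single-threaded local mode"
    case LOCAL_N_REGEX(threads) => s"local mode with $threads threads"
    case SPARK_REGEX(sparkUrl)  => s"standalone cluster at $sparkUrl"
    case _                      => "some other deployment mode"
  }

  def main(args: Array[String]): Unit = {
    println(describe("local[4]"))           // local mode with 4 threads
    println(describe("spark://host:7077"))  // standalone cluster at host:7077
  }
}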

SparkContext calls this function to create the taskScheduler and then starts it. The relevant code is:

val (sched, ts) = SparkContext.createTaskScheduler(this, master)
_taskScheduler = ts
_dagScheduler = new DAGScheduler(this)

// start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's constructor
_taskScheduler.start()

Note that the DAGScheduler reference must be set on the taskScheduler before the taskScheduler is started; this is done in DAGScheduler.scala with the following line:

taskScheduler.setDAGScheduler(this)
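A minimal sketch of why this ordering matters is shown below. SketchTaskScheduler and SketchDAGScheduler are hypothetical stand-ins, not the real Spark classes:

// The scheduler only receives its DAGScheduler reference inside the DAGScheduler
// constructor, so new DAGScheduler(...) has to run before taskScheduler.start().
class SketchTaskScheduler {
  private var dag: AnyRef = _
  def setDAGScheduler(d: AnyRef): Unit = { dag = d }
  def start(): Unit = require(dag != null, "setDAGScheduler must be called before start()")
}

class SketchDAGScheduler(ts: SketchTaskScheduler) {
  ts.setDAGScheduler(this)   // same idea as the line from DAGScheduler.scala above
}

object OrderingSketch {
  def main(args: Array[String]): Unit = {
    val ts = new SketchTaskScheduler
    new SketchDAGScheduler(ts)   // wire the reference first
    ts.start()                   // now safe to start
  }
}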
  • TaskScheduler has a start() method, which simply calls backend.start(). The core code is:
override def start() {
  backend.start()
  ......
}
Where does this backend come from? Looking back at the implementation of createTaskScheduler in SparkContext, the backend is handed to the scheduler by this line:
scheduler.initialize(backend)
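The following is a self-contained sketch of that relationship, using hypothetical MiniScheduler/MiniBackend types rather than Spark's classes: initialize() stores the backend, which is what makes backend.start() reachable from the scheduler's start():

trait MiniBackend { def start(): Unit }

class MiniScheduler {
  private var backend: MiniBackend = _
  def initialize(b: MiniBackend): Unit = { backend = b }  // what scheduler.initialize(backend) does, in spirit
  def start(): Unit = backend.start()                     // mirrors TaskSchedulerImpl.start() delegating to backend.start()
}

object InitializeSketch {
  def main(args: Array[String]): Unit = {
    val scheduler = new MiniScheduler
    scheduler.initialize(new MiniBackend {
      def start(): Unit = println("backend.start() called")
    })
    scheduler.start()
  }
}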

  • Next, let's look at what the backend's start() function does (note: the backend can differ between deployment modes; the code below is from SparkDeploySchedulerBackend). Its main code is:
val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
  args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
val appUIAddress = sc.ui.map(_.appUIAddress).getOrElse("")
val coresPerExecutor = conf.getOption("spark.executor.cores").map(_.toInt)
val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory,
  command, appUIAddress, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor)
client = new AppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
client.start()

The command variable is defined and initialized first; given this deployment mode, the main class passed in is "org.apache.spark.executor.CoarseGrainedExecutorBackend", and it is CoarseGrainedExecutorBackend that later creates the executor. The command variable is then needed to create appDesc, and appDesc in turn is needed to create the AppClient. Finally, the AppClient is started.
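As a hedged illustration of how the main-class name travels, here is a sketch with MiniCommand and MiniAppDescription as simplified, hypothetical versions of Command and ApplicationDescription (the real classes carry many more fields):

case class MiniCommand(mainClass: String, arguments: Seq[String])
case class MiniAppDescription(appName: String, memoryPerExecutorMB: Int, command: MiniCommand)

object AppDescSketch {
  def main(args: Array[String]): Unit = {
    val command = MiniCommand("org.apache.spark.executor.CoarseGrainedExecutorBackend", Seq.empty)
    val appDesc = MiniAppDescription("demo-app", 1024, command)
    // The AppClient ships the application description to the Master; a Worker later
    // reads command.mainClass to know which class its ExecutorRunner should launch.
    println(appDesc.command.mainClass)
  }
}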

  • What is this AppClient for? It registers the Application with the Master. The main code is as follows (the source is in AppClient.scala):

private def tryRegisterAllMasters() = {
  ...
  masterRef.send(RegisterApplication(appDescription, self))
  ...
}

override def receive: PartialFunction[Any, Unit] = {
  ...
  case RegisteredApplication(appId_, masterRef) => { ... }
  ...
}

This function sends a RegisterApplication message to the Master. The receive function in Master.scala handles that message, so it is the Master that actually creates the Application. Once the application has been created, the Master sends a confirmation back to the AppClient, which is handled by the receive function in AppClient.scala. The code that creates the application is:

override def receive: PartialFunction[Any, Unit] = {
  ...
  case RegisterApplication(description, driver) =>       // message sent by the AppClient
    ...
    val app = createApplication(description, driver)
    registerApplication(app)                              // register the newly created application
    driver.send(RegisteredApplication(app.id, self))      // reply to the AppClient; handled by its receive()
    ...
}

The Master then sends the worker a request to launch an ExecutorRunner:

private def launchExecutor(...): Unit = {
  ...
  worker.endpoint.send(LaunchExecutor(...))
}

  • When the Worker receives this message, it creates (and starts) an ExecutorRunner:
override def receive: PartialFunction[Any, Unit] = {
  case LaunchExecutor(...) =>
    val manager = new ExecutorRunner(...)   // the Worker then calls manager.start()
}
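To tie the AppClient, Master, and Worker exchanges together, here is a self-contained, heavily simplified re-enactment of the message flow described above. It uses plain function calls instead of RPC, and the case classes are hypothetical stand-ins for the real DeployMessages, which carry far more fields:

object HandshakeSketch {
  case class RegisterApplication(appName: String)                      // AppClient -> Master
  case class RegisteredApplication(appId: String)                      // Master -> AppClient
  case class LaunchExecutor(appId: String, execId: Int, memoryMB: Int) // Master -> Worker

  // The "Master": creates and registers the application, replies to the client,
  // and asks a worker to launch an executor.
  def masterReceive(msg: RegisterApplication): (RegisteredApplication, LaunchExecutor) = {
    val appId = s"app-${msg.appName}"
    (RegisteredApplication(appId), LaunchExecutor(appId, 0, 1024))
  }

  // The "Worker": plays the role of creating an ExecutorRunner for the request.
  def workerReceive(msg: LaunchExecutor): String =
    s"ExecutorRunner for ${msg.appId}/${msg.execId} with ${msg.memoryMB} MB"

  def main(args: Array[String]): Unit = {
    val (registered, launch) = masterReceive(RegisterApplication("demo-app"))
    println(registered)              // what AppClient.receive would see
    println(workerReceive(launch))   // what Worker.receive would do
  }
}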

  • The ExecutorRunner runs the executor according to the description in the ApplicationDescription:
private def fetchAndRunExecutor(): Unit = {
  ...
  val builder = CommandUtils.buildProcessBuilder(appDesc.command, ...)
  val command = builder.command()
  ...
}
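For a feel of what this amounts to, the sketch below uses plain java.lang.ProcessBuilder directly; the JVM options, class path, and arguments are illustrative, not the exact command Spark assembles:

import scala.collection.JavaConverters._

object ProcessBuilderSketch {
  def main(args: Array[String]): Unit = {
    // The "command" is just a list of strings naming a JVM, its options, and a main class.
    val cmd = Seq("java", "-Xmx1024m", "-cp", sys.props("java.class.path"),
      "org.apache.spark.executor.CoarseGrainedExecutorBackend")
    val builder = new ProcessBuilder(cmd.asJava)
    println(builder.command())   // the same call as builder.command() in fetchAndRunExecutor
    // builder.start() would actually fork the child JVM; omitted in this sketch.
  }
}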


Where do the appDesc in appDesc.command and the command itself come from? Both are created in the start() function of SparkDeploySchedulerBackend.scala (used when the deployment mode is local or standalone). As mentioned earlier, org.apache.spark.executor.CoarseGrainedExecutorBackend was passed in when the command was created, so the ExecutorRunner starts CoarseGrainedExecutorBackend, and CoarseGrainedExecutorBackend in turn creates the executor. Its main code is:

override def onStart(): Unit = {
  ...
  ref.ask[RegisterExecutorResponse](
    RegisterExecutor(executorId, self, hostPort, cores, extractLogUrls))
  ...
}

override def receive: PartialFunction[Any, Unit] = {
  case RegisteredExecutor(hostname) =>
    executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
  ...
}


At this point, the executor has been created.
