Spark Executor Execution Result Handling: A Source Code Walkthrough

Since Spark 1.6, communication between the Driver's BlockManagerMaster and the BlockManagers no longer uses AkkaUtil; it goes through RpcEndpoint instead.

A Spark cluster has many programs executing and therefore needs many Executors. CoarseGrainedExecutorBackend is the process an Executor runs in; the Executor is maintained and managed by its CoarseGrainedExecutorBackend. On startup, CoarseGrainedExecutorBackend registers with the Driver by sending a RegisterExecutor message; the registration payload is RegisterExecutor. RegisterExecutor is a case class; its source is as follows:

// Executor information registered with the Driver
case class RegisterExecutor(
    executorId: String,
    executorRef: RpcEndpointRef,
    hostPort: String,
    cores: Int,
    logUrls: Map[String, String])
  extends CoarseGrainedClusterMessage
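
Since RegisterExecutor extends CoarseGrainedClusterMessage, the receiving side consumes it by pattern matching on the case class. The standalone sketch below (using a simplified stand-in case class, not Spark's definition) shows that construct-and-match pattern, which receiveAndReply applies later in this article:

object MessageMatchSketch {
  // a simplified stand-in, not Spark's definition
  case class RegisterExecutor(executorId: String, hostPort: String, cores: Int)

  def handle(msg: Any): String = msg match {
    case RegisterExecutor(id, hostPort, cores) =>
      s"registering executor $id with $cores cores at $hostPort"
    case other => s"unhandled message: $other"
  }

  def main(args: Array[String]): Unit =
    println(handle(RegisterExecutor("0", "worker1:45123", 4)))
}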

The sending of the RegisterExecutor registration message is implemented in CoarseGrainedExecutorBackend's onStart method:

override def onStart() {
  rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
    driver = Some(ref)
    ref.ask[RegisterExecutorResponse](
      RegisterExecutor(executorId, self, hostPort, cores, extractLogUrls))
  }(ThreadUtils.sameThread).onComplete {
    case Success(msg) => Utils.tryLogNonFatalError {
      Option(self).foreach(_.send(msg)) // msg must be RegisterExecutorResponse
    }
    case Failure(e) => {
      logError(s"Cannot register with driver: $driverUrl", e)
      System.exit(1)
    }
  }(ThreadUtils.sameThread)
}
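
onStart chains two asynchronous steps: resolve the Driver's endpoint reference, ask it to register, and forward the eventual reply to itself. The self-contained sketch below reproduces that flatMap-then-onComplete pattern with plain Scala Futures; resolveDriver and ask are assumed stand-ins for rpcEnv.asyncSetupEndpointRefByURI and ref.ask, not Spark APIs:

import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

object AskPatternSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // stand-in for rpcEnv.asyncSetupEndpointRefByURI(driverUrl)
  def resolveDriver(url: String): Future[String] = Future(s"driverRef@$url")
  // stand-in for ref.ask[RegisterExecutorResponse](RegisterExecutor(...))
  def ask(driverRef: String, msg: String): Future[String] = Future(s"$driverRef acked $msg")

  def main(args: Array[String]): Unit = {
    resolveDriver("spark://driverHost:7077")
      .flatMap(ref => ask(ref, "RegisterExecutor"))
      .onComplete {
        case Success(msg) => println(s"send to self: $msg") // like self.send(msg) in onStart
        case Failure(e) =>
          println(s"Cannot register with driver: $e")
          sys.exit(1)
      }
    Thread.sleep(500) // give the async pipeline time to finish in this demo
  }
}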

When CoarseGrainedExecutorBackend starts, it sends the RegisterExecutor message to the Driver to register. The Driver receives RegisterExecutor and, once the Executor is registered successfully, replies to CoarseGrainedExecutorBackend with a RegisteredExecutor message. Note that what registers here has nothing to do with the Executor object that actually does the work: it is really the ExecutorBackend that registers, so RegisterExecutor is best read as "RegisterExecutorBackend".

Note:

  1. CoarseGrainedExecutorBackend is the name of the process in which the Executor runs; CoarseGrainedExecutorBackend itself performs no task computation.

  2. The Executor is the object that actually processes tasks; internally it completes Task computation through a thread pool.

  3. CoarseGrainedExecutorBackend and Executor correspond one to one.

  4. CoarseGrainedExecutorBackend is a message communication endpoint: it can receive messages from the Driver and send messages to the Driver, and it extends ThreadSafeRpcEndpoint (see the sketch after this list).
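
On point 4: a ThreadSafeRpcEndpoint is guaranteed to process its messages one at a time, in arrival order. The toy model below (my own sketch, not Spark source) captures that guarantee with a single dedicated thread draining an inbox queue:

import java.util.concurrent.LinkedBlockingQueue

object EndpointLoopSketch {
  private val inbox = new LinkedBlockingQueue[String]()

  def send(msg: String): Unit = inbox.put(msg)

  def main(args: Array[String]): Unit = {
    val loop = new Thread(new Runnable {
      override def run(): Unit =
        while (true) {
          val msg = inbox.take() // messages are handled one at a time, in order
          println(s"received: $msg")
        }
    })
    loop.setDaemon(true)
    loop.start()
    send("RegisteredExecutor")
    send("LaunchTask")
    Thread.sleep(200) // let the loop drain the inbox before the demo exits
  }
}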

CoarseGrainedExecutorBackend sends the RegisterExecutor message to the Driver, where it is handled by SparkDeploySchedulerBackend (renamed StandaloneSchedulerBackend in Spark 2.0). SparkDeploySchedulerBackend extends CoarseGrainedSchedulerBackend; when started it launches an AppClient (renamed StandaloneAppClient in Spark 2.0). Source of SparkDeploySchedulerBackend's start method:

override def start() {
  super.start() // invoke CoarseGrainedSchedulerBackend's start method
  launcherBackend.connect()

  // The endpoint for executors to talk to us
  val driverUrl = rpcEnv.uriOf(SparkEnv.driverActorSystemName,
    RpcAddress(sc.conf.get("spark.driver.host"), sc.conf.get("spark.driver.port").toInt),
    CoarseGrainedSchedulerBackend.ENDPOINT_NAME)
  val args = Seq(
    "--driver-url", driverUrl,
    "--executor-id", "{{EXECUTOR_ID}}",
    "--hostname", "{{HOSTNAME}}",
    "--cores", "{{CORES}}",
    "--app-id", "{{APP_ID}}",
    "--worker-url", "{{WORKER_URL}}")
  val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions")
    .map(Utils.splitCommandString).getOrElse(Seq.empty)
  val classPathEntries = sc.conf.getOption("spark.executor.extraClassPath")
    .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)
  val libraryPathEntries = sc.conf.getOption("spark.executor.extraLibraryPath")
    .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)
  val testingClassPath =
    if (sys.props.contains("spark.testing")) {
      sys.props("java.class.path").split(java.io.File.pathSeparator).toSeq
    } else {
      Nil
    }

  // Executors are launched with the command built from this information

  val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
  val javaOpts = sparkJavaOpts ++ extraJavaOpts
  val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
    args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
  val appUIAddress = sc.ui.map(_.appUIAddress).getOrElse("")
  val coresPerExecutor = conf.getOption("spark.executor.cores").map(_.toInt)
  val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory,
    command, appUIAddress, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor)
  client = new AppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
  client.start() // start the AppClient, which registers the application with the Master
  launcherBackend.setState(SparkAppHandle.State.SUBMITTED)
  waitForRegistration()
  launcherBackend.setState(SparkAppHandle.State.RUNNING)
}
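
Note the {{EXECUTOR_ID}}, {{HOSTNAME}}, {{CORES}} style arguments above: they are placeholders that get substituted with concrete values on the Worker side before the CoarseGrainedExecutorBackend process is launched. A toy sketch of that substitution idea (my own illustration, not the Worker's actual code):

object PlaceholderSketch {
  // Replace {{NAME}} tokens with concrete values; all other args pass through.
  def substitute(args: Seq[String], vars: Map[String, String]): Seq[String] =
    args.map {
      case a if a.startsWith("{{") && a.endsWith("}}") =>
        vars.getOrElse(a.stripPrefix("{{").stripSuffix("}}"), a)
      case a => a
    }

  def main(args: Array[String]): Unit = {
    val template = Seq("--executor-id", "{{EXECUTOR_ID}}", "--cores", "{{CORES}}")
    println(substitute(template, Map("EXECUTOR_ID" -> "3", "CORES" -> "4")))
    // List(--executor-id, 3, --cores, 4)
  }
}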

There are two very important endpoints in the Driver process:

  1. ClientEndpoint: responsible for registering the current application with the Master; it is an inner class of AppClient.

  2. DriverEndpoint: the driver of the whole program at runtime; it receives the RegisterExecutor message and completes the registration on the Driver side. It is an inner class of CoarseGrainedSchedulerBackend.

The Executor's RegisterExecutor registration message is delivered to DriverEndpoint, which writes into the executorDataMap data structure inside CoarseGrainedSchedulerBackend. This is how CoarseGrainedSchedulerBackend obtains all the ExecutorBackend processes allocated to the current application, while inside each ExecutorBackend process instance an Executor object is responsible for the actual task execution. Between Executor and CoarseGrainedSchedulerBackend, receiving the RegisterExecutor message and replying to it is implemented in receiveAndReply, a very important method. The registration handling inside receiveAndReply:

override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  case RegisterExecutor(executorId, executorRef, hostPort, cores, logUrls) =>
    if (executorDataMap.contains(executorId)) { // an Executor with this ID is already registered
      context.reply(RegisterExecutorFailed("Duplicate executor ID: " + executorId))
    } else {
      // If executorRef.address is non-null, use it as the executorAddress
      val executorAddress = if (executorRef.address != null) {
          executorRef.address
        } else { // if it is null, use the sender's address as the executorAddress
          context.senderAddress
        }
      addressToExecutorId(executorAddress) = executorId
      totalCoreCount.addAndGet(cores)
      totalRegisteredExecutors.addAndGet(1)
      val data = new ExecutorData(executorRef, executorRef.address, executorAddress.host,
        cores, cores, logUrls)
      CoarseGrainedSchedulerBackend.this.synchronized {
        executorDataMap.put(executorId, data)
        if (numPendingExecutors > 0) {
          numPendingExecutors -= 1
        }
      }
      context.reply(RegisteredExecutor(executorAddress.host))
      listenerBus.post(
        SparkListenerExecutorAdded(System.currentTimeMillis(), executorId, data))
      makeOffers()
    }
  case StopDriver =>
    context.reply(true)
    stop()
  case StopExecutors =>
    logInfo("Asking each executor to shut down")
    for ((_, executorData) <- executorDataMap) {
      executorData.executorEndpoint.send(StopExecutor)
    }
    context.reply(true)
  case RemoveExecutor(executorId, reason) =>
    removeExecutor(executorId, reason)
    context.reply(true)
  case RetrieveSparkProps =>
    context.reply(sparkProperties)
}

From CoarseGrainedSchedulerBackend's receiveAndReply method we can read off how the RegisterExecutor message is executed, i.e. the Executor registration process:

  1. Check whether executorDataMap already contains the executorId; if it does, reply with a RegisterExecutorFailed message, because an Executor with the same executorId is already running.

  2. Otherwise, proceed with the Executor's registration: determine the executorAddress.

  3. Update three data structures: addressToExecutorId (a DriverEndpoint structure mapping the RPC address, host name and port, to the executorId), totalCoreCount (the total number of cores in the cluster), and totalRegisteredExecutors (the number of Executors currently registered; the latter two are CoarseGrainedSchedulerBackend structures).

  4. Create an ExecutorData object from executorRef, executorRef.address, the host, cores, and other information.

  5. Guard the update with the block CoarseGrainedSchedulerBackend.this.synchronized: many Executors across the cluster register with the Driver concurrently, so the block is synchronized to prevent write conflicts.

  6. Reply to the sender with context.reply(RegisteredExecutor(...)); the sender is CoarseGrainedExecutorBackend, and upon receiving RegisteredExecutor it creates the Executor, which is what actually performs Task computation, as its receive method below shows.

override def receive: PartialFunction[Any, Unit] = {
  case RegisteredExecutor(hostname) =>
    logInfo("Successfully registered with driver")
    executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)

  case RegisterExecutorFailed(message) =>
    logError("Slave registration failed: " + message)
    System.exit(1)

  case LaunchTask(data) =>
    if (executor == null) {
      logError("Received LaunchTask command but executor was null")
      System.exit(1)
    } else {
      val taskDesc = ser.deserialize[TaskDescription](data.value)
      logInfo("Got assigned task " + taskDesc.taskId)
      executor.launchTask(this, taskId = taskDesc.taskId, attemptNumber = taskDesc.attemptNumber,
        taskDesc.name, taskDesc.serializedTask)
    }
  ......................

The threadPool created in the Executor executes the Tasks that Spark sends over efficiently, with concurrent execution and thread reuse. Once the thread pool is created, the Executor waits for the Driver to send tasks; they go to CoarseGrainedExecutorBackend rather than directly to the Executor, because the Executor is not a message loop.
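
In the Spark 1.6 source this pool is a cached pool of daemon threads (ThreadUtils.newDaemonCachedThreadPool). The minimal standalone sketch below rebuilds the same idea with plain java.util.concurrent; the names and the demo workload are my own, not Spark's:

import java.util.concurrent.{Executors, ThreadFactory}

object TaskPoolSketch {
  // Daemon threads: task workers never keep the JVM alive on their own.
  private val factory = new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "task-launch-worker")
      t.setDaemon(true)
      t
    }
  }
  // Cached pool: idle threads are reused, new ones spawn only under load.
  private val threadPool = Executors.newCachedThreadPool(factory)

  def main(args: Array[String]): Unit = {
    (1 to 3).foreach { i =>
      threadPool.execute(new Runnable {
        override def run(): Unit =
          println(s"task $i on ${Thread.currentThread().getName}")
      })
    }
    Thread.sleep(200) // let the daemon workers print before the demo exits
    threadPool.shutdown()
  }
}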

How exactly does the Executor work?

When the Driver sends over a Task, it actually sends it to the CoarseGrainedExecutorBackend RpcEndpoint, not directly to the Executor (the Executor is not a message loop, so it can never directly receive remotely sent messages).

The Driver sends LaunchTask to CoarseGrainedExecutorBackend, which turns around and hands the work to a thread in the thread pool. It first checks whether executor is null: if so, it exits immediately; otherwise it deserializes the task description and calls the Executor's launchTask, submitting the task to the Executor for execution.

Upon receiving the command to execute a Task, launchTask first wraps the Task in a TaskRunner, then puts it into runningTasks, the data structure that tracks in-flight tasks, and hands the TaskRunner to the thread pool.
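
A runnable toy of that wrap-record-execute flow (simplified, assumed types; not the Spark source):

import java.nio.ByteBuffer
import java.util.concurrent.{ConcurrentHashMap, Executors}

object LaunchTaskSketch {
  // Stand-in for Spark's TaskRunner: a Runnable that "executes" the task.
  final class TaskRunner(taskId: Long, serializedTask: ByteBuffer) extends Runnable {
    override def run(): Unit =
      println(s"task $taskId running on ${Thread.currentThread().getName}")
  }

  private val runningTasks = new ConcurrentHashMap[Long, TaskRunner]()
  private val threadPool = Executors.newCachedThreadPool()

  def launchTask(taskId: Long, serializedTask: ByteBuffer): Unit = {
    val tr = new TaskRunner(taskId, serializedTask)
    runningTasks.put(taskId, tr) // record the in-flight task
    threadPool.execute(tr)       // a pool thread will invoke tr.run()
  }

  def main(args: Array[String]): Unit = {
    launchTask(1L, ByteBuffer.allocate(0))
    launchTask(2L, ByteBuffer.allocate(0))
    Thread.sleep(200)
    threadPool.shutdown()
  }
}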

It is TaskRunner's run method that ultimately invokes Task's run method. The fragment of TaskRunner.run that calls Task.run:

............................

var threwException = true
val (value, accumUpdates) = try {
  val res = task.run(
    taskAttemptId = taskId,
    attemptNumber = attemptNumber,
    metricsSystem = env.metricsSystem)
  threwException = false
  res
}
...........................................
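
What happens to the returned value is the heart of result handling: TaskRunner serializes it and reports it to the Driver through a status update. A small result travels inline; one exceeding the RPC frame limit is stored in the BlockManager and only a reference is sent back; one exceeding the configured maximum result size is dropped. The sketch below models that size-based decision; the names, thresholds, and simplified result types are assumptions for illustration, not the Spark source:

object ResultPathSketch {
  sealed trait TaskResult
  case class DirectTaskResult(bytes: Array[Byte]) extends TaskResult           // shipped inline
  case class IndirectTaskResult(blockId: String, size: Int) extends TaskResult // fetched later

  def chooseResultPath(resultBytes: Array[Byte], maxDirect: Int, maxTotal: Int): TaskResult =
    if (resultBytes.length > maxTotal)
      IndirectTaskResult("dropped", resultBytes.length)      // too big: payload dropped
    else if (resultBytes.length > maxDirect)
      IndirectTaskResult("taskresult_1", resultBytes.length) // stored in the BlockManager
    else
      DirectTaskResult(resultBytes)                          // small enough to send inline

  def main(args: Array[String]): Unit = {
    println(chooseResultPath(new Array[Byte](16), maxDirect = 128, maxTotal = 1024))
    println(chooseResultPath(new Array[Byte](512), maxDirect = 128, maxTotal = 1024))
  }
}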

The source of the run method of class Task:

final def run(
    taskAttemptId: Long,
    attemptNumber: Int,
    metricsSystem: MetricsSystem): (T, AccumulatorUpdates) = {
  context = new TaskContextImpl(
    stageId, partitionId, taskAttemptId, attemptNumber, taskMemoryManager, metricsSystem,
    internalAccumulators, runningLocally = false)
  TaskContext.setTaskContext(context)
  context.taskMetrics.setHostname(Utils.localHostName())
  context.taskMetrics.setAccumulatorsUpdater(context.collectInternalAccumulators)
  taskThread = Thread.currentThread()
  if (_killed) {
    kill(interruptThread = false)
  }
  try { // the key step: invoke runTask
    (runTask(context), context.collectAccumulators())
  } finally {
    context.markTaskCompleted()
.................................

Task is an abstract class; its run method is a concrete instance method, while its runTask method is abstract. Task has two subclasses, ShuffleMapTask and ResultTask, and which subclass's runTask executes depends on the task's role in the job. The difference between the two runTask implementations is whether the task performs a shuffle operation (whether runTask executes a shuffle write).
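
A toy model of that hierarchy (my own sketch, not Spark source): run is concrete and final, runTask is abstract, and the two subclasses differ only in what their runTask does:

object TaskHierarchySketch {
  abstract class ToyTask[T] {
    // run is concrete and final, mirroring Task.run: set up, delegate, clean up.
    final def run(taskAttemptId: Long): T = {
      println(s"attempt $taskAttemptId: context set up")
      try runTask() finally println(s"attempt $taskAttemptId: task completed")
    }
    protected def runTask(): T // abstract, like Task.runTask
  }

  class ToyShuffleMapTask extends ToyTask[String] {
    // a ShuffleMapTask's runTask writes shuffle output for downstream stages
    protected def runTask(): String = "wrote shuffle files; returning map status"
  }

  class ToyResultTask extends ToyTask[Int] {
    // a ResultTask's runTask computes the final result for the Driver
    protected def runTask(): Int = 42
  }

  def main(args: Array[String]): Unit = {
    println(new ToyShuffleMapTask().run(0L))
    println(new ToyResultTask().run(1L))
  }
}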
