Spark On YARN啓動流程源碼分析

Sparn On Yarn啓動流程分析

1. YarnschedulerBackend啓動入口

YARN的啓動是在SparkContext初始化scheduler時啓動的,通過ClassLoader初始化YarnschedulerBackend和YARTaskscheduler。

    //scheduler的初始化, 調用createTaskScheduler()方法
    // Create and start the scheduler
    val (sched, ts) = SparkContext.createTaskScheduler(this, master, deployMode)
    _schedulerBackend = sched
    _taskScheduler = ts
    _dagScheduler = new DAGScheduler(this)
    _heartbeatReceiver.ask[Boolean](TaskSchedulerIsSet)

    // start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's
    // constructor
    _taskScheduler.start()

    /**
   * Create a task scheduler based on a given master URL.
   * Return a 2-tuple of the scheduler backend and the task scheduler.
   */
   // 該方法根據master字符串進行匹配,如果是local/standalone模式,匹配響應的schedulerBackend和taskscheduler,
   // 如果是yarn,則走默認形式
  private def createTaskScheduler(
      sc: SparkContext,
      master: String,
      deployMode: String): (SchedulerBackend, TaskScheduler) = {
    import SparkMasterRegex._

    // When running locally, don't try to re-execute tasks on failure.
    val MAX_LOCAL_TASK_FAILURES = 1

    master match {
      case "local" =>
        val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
        val backend = new LocalSchedulerBackend(sc.getConf, scheduler, 1)
        scheduler.initialize(backend)
        (backend, scheduler)

      case LOCAL_N_REGEX(threads) =>
       ...
      case LOCAL_N_FAILURES_REGEX(threads, maxFailures) =>
        ...
      case SPARK_REGEX(sparkUrl) =>
        ...
      case LOCAL_CLUSTER_REGEX(numSlaves, coresPerSlave, memoryPerSlave) =>
       ...
      case masterUrl =>
         // 這個方法如何實現基於classLoader調用YarnClusterManager.class的(scala語法不熟,待考證)
        val cm = getClusterManager(masterUrl) match {
          case Some(clusterMgr) => clusterMgr
          case None => throw new SparkException("Could not parse Master URL: '" + master + "'")
        }
        try {
          val scheduler = cm.createTaskScheduler(sc, masterUrl)
          val backend = cm.createSchedulerBackend(sc, masterUrl, scheduler)
          cm.initialize(scheduler, backend)
          (backend, scheduler)
        } catch {
          case se: SparkException => throw se
          case NonFatal(e) =>
            throw new SparkException("External scheduler cannot be instantiated", e)
        }
    }
  }

  //getClusterManager()通過類加載,加載ExternalClusterManager類,同時過濾出可以構造出yarn類型的schedulerBackend和taskscheduler
   private def getClusterManager(url: String): Option[ExternalClusterManager] = {
    val loader = Utils.getContextOrSparkClassLoader
    val serviceLoaders =
      ServiceLoader.load(classOf[ExternalClusterManager], loader).asScala.filter(_.canCreate(url))
    if (serviceLoaders.size > 1) {
      throw new SparkException(
        s"Multiple external cluster managers registered for the url $url: $serviceLoaders")
    }
    serviceLoaders.headOption
  }


  // createTaskScheduler()函數真正返回的schedulerBackend和taskscheduler是通過下面這個class
  private[spark] class YarnClusterManager extends ExternalClusterManager{
  }

2. 創建ApplicationMaster

SparkContext初始化過程中,會向YARN集羣初始化Application(Master),流程如下:

 /**
   * Submit an application running our ApplicationMaster to the ResourceManager.
   *
   * The stable Yarn API provides a convenience method (YarnClient#createApplication) for
   * creating applications and setting up the application submission context. This was not
   * available in the alpha API.
   */
  def submitApplication(user: Option[String] = None): ApplicationId = {
    var appId: ApplicationId = null
    try {
      launcherBackend.connect()
      // Setup the credentials before doing anything else,
      // so we have don't have issues at any point.
      setupCredentials(user)
      yarnClient.init(yarnConf)
      yarnClient.start()
      sparkUser = user

      logInfo(s"[DEVELOP] [sparkUser:${sparkUser}] Requesting a new application " +
        s"from cluster with %d NodeManagers"
        .format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))

      // Get a new application from our RM
      val newApp = yarnClient.createApplication()
      val newAppResponse = newApp.getNewApplicationResponse()
      appId = newAppResponse.getApplicationId()
      reportLauncherState(SparkAppHandle.State.SUBMITTED)
      launcherBackend.setAppId(appId.toString)

      new CallerContext("CLIENT", Option(appId.toString)).setCurrentContext()

      // Verify whether the cluster has enough resources for our AM
      verifyClusterResources(newAppResponse)

      // Set up the appropriate contexts to launch our AM

      // 關鍵是這兩個方法:
      // 1. 創建ApplicationMaster ContainerLaunch上下文,將ContainerLaunch命令、jar包、java變量等環境準備完畢;
      // 2. 創建Application提交至YARN的上下文,主要讀取配置文件設置調用YARN接口前的上下文變量。

      val containerContext = createContainerLaunchContext(newAppResponse)
      val appContext = createApplicationSubmissionContext(newApp, containerContext)

      // Finally, submit and monitor the application
      logInfo(s"Submitting application $appId to ResourceManager")
      yarnClient.submitApplication(appContext)
      appId
    } catch {
      case e: Throwable =>
        if (appId != null) {
          cleanupStagingDir(appId)
        }
        throw e
    }
  }

真正Application啓動是調用如下方法:

    val amClass =
      if (isClusterMode) {
        Utils.classForName("org.apache.spark.deploy.yarn.ApplicationMaster").getName
      } else {
        Utils.classForName("org.apache.spark.deploy.yarn.ExecutorLauncher").getName
      }

3. 啓動ApplicationMaster

基於YARN-client的模式啓動,所以直接跳轉至org.apache.spark.deploy.yarn.ExecutorLauncher, 該類也是封裝在ApplicationMaseter中,順着main()函數往下走,調用ApplicationMaster.run()函數-> runExecutorLauncher(securityMgr)

  private def runExecutorLauncher(securityMgr: SecurityManager): Unit = {
    val port = sparkConf.getInt("spark.yarn.am.port", 0)

    // 創建RPCEndpoint同driver交互
    rpcEnv = RpcEnv.create("sparkYarnAM", Utils.localHostName, port, sparkConf, securityMgr,
      clientMode = true)
    val driverRef = waitForSparkDriver()
    // WHY?
    addAmIpFilter()

    // 關鍵函數,向Driver註冊AM
    registerAM(sparkConf, rpcEnv, driverRef, sparkConf.get("spark.driver.appUIAddress", ""),
      securityMgr)

    // In client mode the actor will stop the reporter thread.
    reporterThread.join()
  }



   private def registerAM(
      _sparkConf: SparkConf,
      _rpcEnv: RpcEnv,
      driverRef: RpcEndpointRef,
      uiAddress: String,
      securityMgr: SecurityManager) = {
    val appId = client.getAttemptId().getApplicationId().toString()
    val attemptId = client.getAttemptId().getAttemptId().toString()
    val historyAddress =
      _sparkConf.get(HISTORY_SERVER_ADDRESS)
        .map { text => SparkHadoopUtil.get.substituteHadoopVariables(text, yarnConf) }
        .map { address => s"${address}${HistoryServer.UI_PATH_PREFIX}/${appId}/${attemptId}" }
        .getOrElse("")

    val driverUrl = RpcEndpointAddress(
      _sparkConf.get("spark.driver.host"),
      _sparkConf.get("spark.driver.port").toInt,
      CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString

    // Before we initialize the allocator, let's log the information about how executors will
    // be run up front, to avoid printing this out for every single executor being launched.
    // Use placeholders for information that changes such as executor IDs.
    logInfo {
      val executorMemory = sparkConf.get(EXECUTOR_MEMORY).toInt
      val executorCores = sparkConf.get(EXECUTOR_CORES)

      //  申請Executor資源(debug log)
      val dummyRunner = new ExecutorRunnable(None, yarnConf, sparkConf, driverUrl, "<executorId>",
        "<hostname>", executorMemory, executorCores, appId, securityMgr, localResources)
      dummyRunner.launchContextDebugInfo()
    }

    //向RM註冊driver地址
    allocator = client.register(driverUrl,
      driverRef,
      yarnConf,
      _sparkConf,
      uiAddress,
      historyAddress,
      securityMgr,
      localResources)

    //申請Executor資源
    allocator.allocateResources()
    reporterThread = launchReporterThread()
  }

調用yarn RM接口完成資源申請,同時初始化ApplicationMaster容器:

 /**
   * Request resources such that, if YARN gives us all we ask for, we'll have a number of containers
   * equal to maxExecutors.
   *
   * Deal with any containers YARN has granted to us by possibly launching executors in them.
   *
   * This must be synchronized because variables read in this method are mutated by other methods.
   */
  def allocateResources(): Unit = synchronized {
    updateResourceRequests()

    val progressIndicator = 0.1f
    // Poll the ResourceManager. This doubles as a heartbeat if there are no pending container
    // requests.
    // 調用YARN接口,分配container
    val allocateResponse = amClient.allocate(progressIndicator)

     // 獲取分配container資源狀態
    val allocatedContainers = allocateResponse.getAllocatedContainers()

    if (allocatedContainers.size > 0) {
      logInfo("Allocated containers: %d. Current executor count: %d. Cluster resources: %s."
        .format(
          allocatedContainers.size,
          numExecutorsRunning,
          allocateResponse.getAvailableResources))

        // 當申請完畢資源後,處理函數:會初始化該executor環境,等待分配task       
       handleAllocatedContainers(allocatedContainers.asScala)
    }

    val completedContainers = allocateResponse.getCompletedContainersStatuses()
    if (completedContainers.size > 0) {
      logInfo("Completed %d containers".format(completedContainers.size))
      processCompletedContainers(completedContainers.asScala)
      logInfo("Finished processing %d completed containers. Current running executor count: %d."
        .format(completedContainers.size, numExecutorsRunning))
    }
  }

繼續往下走,當想RM申請完資源後,會調用ExecutorLaunch初始化Executor環境,具體如下:

/**
   * Handle containers granted by the RM by launching executors on them.
   *
   * Due to the way the YARN allocation protocol works, certain healthy race conditions can result
   * in YARN granting containers that we no longer need. In this case, we release them.
   *
   * Visible for testing.
   */
  def handleAllocatedContainers(allocatedContainers: Seq[Container]): Unit = {
    val containersToUse = new ArrayBuffer[Container](allocatedContainers.size)

    // Match incoming requests by host
    val remainingAfterHostMatches = new ArrayBuffer[Container]
    for (allocatedContainer <- allocatedContainers) {
      matchContainerToRequest(allocatedContainer, allocatedContainer.getNodeId.getHost,
        containersToUse, remainingAfterHostMatches)
    }

    // Match remaining by rack
    val remainingAfterRackMatches = new ArrayBuffer[Container]
    for (allocatedContainer <- remainingAfterHostMatches) {
      val rack = RackResolver.resolve(conf, allocatedContainer.getNodeId.getHost).getNetworkLocation
      matchContainerToRequest(allocatedContainer, rack, containersToUse,
        remainingAfterRackMatches)
    }

    // Assign remaining that are neither node-local nor rack-local
    val remainingAfterOffRackMatches = new ArrayBuffer[Container]
    for (allocatedContainer <- remainingAfterRackMatches) {
      matchContainerToRequest(allocatedContainer, ANY_HOST, containersToUse,
        remainingAfterOffRackMatches)
    }

    if (!remainingAfterOffRackMatches.isEmpty) {
      logDebug(s"Releasing ${remainingAfterOffRackMatches.size} unneeded containers that were " +
        s"allocated to us")
      for (container <- remainingAfterOffRackMatches) {
        internalReleaseContainer(container)
      }
    }

     // 以上執行爲剔除不可用的container之後最終執行可以使用的Container
    runAllocatedContainers(containersToUse)

    logInfo("Received %d containers from YARN, launching executors on %d of them."
      .format(allocatedContainers.size, containersToUse.size))
  }


  /**
   * Launches executors in the allocated containers.
   */
  private def runAllocatedContainers(containersToUse: ArrayBuffer[Container]): Unit = {
    for (container <- containersToUse) {
      executorIdCounter += 1
      val executorHostname = container.getNodeId.getHost
      val containerId = container.getId
      val executorId = executorIdCounter.toString

      assert(container.getResource.getMemory >= resource.getMemory)
      logInfo(s"Launching container $containerId on host $executorHostname")

      def updateInternalState(): Unit = synchronized {
        numExecutorsRunning += 1
        executorIdToContainer(executorId) = container
        containerIdToExecutorId(container.getId) = executorId

        val containerSet = allocatedHostToContainersMap.getOrElseUpdate(executorHostname,
          new HashSet[ContainerId])
        containerSet += containerId
        allocatedContainerToHostMap.put(containerId, executorHostname)
      }

      if (numExecutorsRunning < targetNumExecutors) {
        if (launchContainers) {
            // 將創建exector任務提交至線程池
          launcherPool.execute(new Runnable {

           // 真正完成executer初始化的是ExecutorRunnable()類
            override def run(): Unit = {
              try {
                new ExecutorRunnable(
                  Some(container),
                  conf,
                  sparkConf,
                  driverUrl,
                  executorId,
                  executorHostname,
                  executorMemory,
                  executorCores,
                  appAttemptId.getApplicationId.toString,
                  securityMgr,
                  localResources
                ).run()
                updateInternalState()
              } catch {
                case NonFatal(e) =>
                  logError(s"Failed to launch executor $executorId on container $containerId", e)
                  // Assigned container should be released immediately to avoid unnecessary resource
                  // occupation.
                  amClient.releaseAssignedContainer(containerId)
              }
            }
          })
        } else {
          // For test only
          updateInternalState()
        }
      } else {
        logInfo(("Skip launching executorRunnable as runnning Excecutors count: %d " +
          "reached target Executors count: %d.").format(numExecutorsRunning, targetNumExecutors))
      }
    }
  }

4. Executor的啓動

在ExecutorRunnable.run()方法中,會啓動executor的執行命令,具體如下:

private def prepareCommand(): List[String] = {
    // Extra options for the JVM
    val javaOpts = ListBuffer[String]()

    // java/spark  運行時環境變量
    ....

    YarnSparkHadoopUtil.addOutOfMemoryErrorArgument(javaOpts)

    // executor真正的啓動命令,真正調用的是`org.apache.spark.executor.CoarseGrainedExecutorBackend`

    val commands = prefixEnv ++ Seq(
      YarnSparkHadoopUtil.expandEnvironment(Environment.JAVA_HOME) + "/bin/java",
      "-server") ++
      javaOpts ++
      Seq("org.apache.spark.executor.CoarseGrainedExecutorBackend",
        "--driver-url", masterAddress,
        "--executor-id", executorId,
        "--hostname", hostname,
        "--cores", executorCores.toString,
        "--app-id", appId) ++
      userClassPath ++
      Seq(
        s"1>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stdout",
        s"2>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stderr")

    // TODO: it would be nicer to just make sure there are no null commands here
    commands.map(s => if (s == null) "null" else s).toList
  }

org.apache.spark.executor.CoarseGrainedExecutorBackend的實現邏輯比較簡單,在run()函數中創建了一個RPCEndPoint,等待LaunchTask(data)消息接受,接受之後,調用exector.launchTask()執行任務,執行任務的流程則是將task加入runningTasks,並調用threadPool進行execute。

二、運行結果

YARN集羣的日誌由於分散在多臺機器上,比較分散,所以想通過日誌來跟蹤啓動流程比較困難,但是如果集羣小的話,通過這個方式來驗證整個流程還是挺不錯的方式。

1. ApplicationMaster日誌

ApplicationMaster的執行日誌,可以看到最終調用的org.apache.spark.executor.CoarseGrainedExecutorBackend 來啓動executor。

17/05/05 16:54:58 INFO ApplicationMaster: Preparing Local resources
17/05/05 16:54:59 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
17/05/05 16:54:59 WARN Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
17/05/05 16:54:59 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1493803865684_0180_000002
17/05/05 16:54:59 INFO SecurityManager: Changing view acls to: hzlishuming
17/05/05 16:54:59 INFO SecurityManager: Changing modify acls to: hzlishuming
17/05/05 16:54:59 INFO SecurityManager: Changing view acls groups to: 
17/05/05 16:54:59 INFO SecurityManager: Changing modify acls groups to: 
17/05/05 16:54:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hzlishuming); groups with view permissions: Set(); users  with modify permissions: Set(hzlishuming); groups with modify permissions: Set()
17/05/05 16:54:59 INFO AMCredentialRenewer: Scheduling login from keytab in 61745357 millis.
17/05/05 16:54:59 INFO ApplicationMaster: Waiting for Spark driver to be reachable.
17/05/05 16:54:59 INFO ApplicationMaster: Driver now available: xxxx:47065
17/05/05 16:54:59 INFO TransportClientFactory: Successfully created connection to /xxxx:47065 after 110 ms (0 ms spent in bootstraps)
17/05/05 16:54:59 INFO ApplicationMaster$AMEndpoint: Add WebUI Filter. AddWebUIFilter(org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter,Map(PROXY_HOSTS -> ....)
17/05/05 16:55:00 INFO ApplicationMaster: 
===============================================================================
YARN executor launch context:
  env:
    CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/*<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/lib/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*
    SPARK_YARN_STAGING_DIR -> hdfs://hz-test01/user/hzlishuming/.sparkStaging/application_1493803865684_0180
    SPARK_USER -> hzlishuming
    SPARK_YARN_MODE -> true

  command:
    {{JAVA_HOME}}/bin/java \ 
      -server \ 
      -Xmx4096m \ 
      '-XX:PermSize=1024m' \ 
      '-XX:MaxPermSize=1024m' \ 
      '-verbose:gc' \ 
      '-XX:+PrintGCDetails' \ 
      '-XX:+PrintGCDateStamps' \ 
      '-XX:+PrintTenuringDistribution' \ 
      -Djava.io.tmpdir={{PWD}}/tmp \ 
      '-Dspark.driver.port=47065' \ 
      -Dspark.yarn.app.container.log.dir=<LOG_DIR> \ 
      -XX:OnOutOfMemoryError='kill %p' \ 
      org.apache.spark.executor.CoarseGrainedExecutorBackend \ 
      --driver-url \ 
      spark://CoarseGrainedScheduler@....:47065 \ 
      --executor-id \ 
      <executorId> \ 
      --hostname \ 
      <hostname> \ 
      --cores 

2. Driver日誌

在Driver端,註冊完executor之後留下日誌如下:

 433 17/05/05 16:04:59 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) () with ID 1
 434 17/05/05 16:04:59 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) () with ID 2
 435 17/05/05 16:04:59 INFO BlockManagerMasterEndpoint: Registering block manager xxxx with 2004.6 MB RAM, BlockManagerId(1, h, 54063, None)
 436 17/05/05 16:04:59 INFO BlockManagerMasterEndpoint: Registering block manager xxxx with 2004.6 MB RAM, BlockManagerId(2, xxx, 42904, None)

3. Executor日誌

executor的啓動日誌,可以通過SparkUI上查看,處理流程上面已經交代,執行的爲 org.apache.spark.executor.CoarseGrainedExecutorBackend邏輯。

17/05/05 16:55:15 INFO MemoryStore: MemoryStore started with capacity 2004.6 MB
17/05/05 16:55:16 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@xxx.35:47065
17/05/05 16:55:16 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
17/05/05 16:55:16 INFO Executor: Starting executor ID 4 on host hadoop694.lt.163.org
17/05/05 16:55:16 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40418.
17/05/05 16:55:16 INFO NettyBlockTransferService: Server created on xxx:40418
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章