The Spark Task Submission Flow

In production, Spark jobs are usually submitted to YARN for execution. The overall flow is summarized in the following steps.

1. The client submits the application to the RM (ResourceManager).
2. The RM starts the AM (ApplicationMaster).
3. The AM starts the Driver thread and requests resources from the RM.
4. The RM returns a list of usable resources.
5. The AM starts containers through nmClient and launches the CoarseGrainedExecutorBackend processes in them.
6. The Executors register back with the Driver.
7. The Executors start running tasks.

Walking through the source

Internally, spark-submit.sh ends up running the org.apache.spark.deploy.SparkSubmit class. (Not repeated here; open the script with vim if you are curious.)

Find this class in IDEA and locate its main function; the (abridged) code is shown below.

  override def main(args: Array[String]): Unit = {
    // Track whether logging needs to be re-initialized by the child main class.
    val uninitLog = initializeLogIfNecessary(true, silent = true)
    val appArgs = new SparkSubmitArguments(args)   // parse the spark-submit command line
    appArgs.action match {
      case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
      // the KILL and REQUEST_STATUS cases are omitted here
    }
  }

appArgs.action is assigned a default value when the arguments are initialized:

// Action should be SUBMIT unless otherwise specified
action = Option(action).getOrElse(SUBMIT)

Click into submit(appArgs, uninitLog) to jump to the corresponding method.

private def submit(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
    val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)

    def doRunMain(): Unit = {
      if (args.proxyUser != null) {
        val proxyUser = UserGroupInformation.createProxyUser(args.proxyUser,
          UserGroupInformation.getCurrentUser())
        try {
          proxyUser.doAs(new PrivilegedExceptionAction[Unit]() {
            override def run(): Unit = {
              runMain(childArgs, childClasspath, sparkConf, childMainClass, args.verbose)
            }
          })
        } catch {
          case e: Exception =>
            // Hadoop's AuthorizationException suppresses the exception's stack trace, which
            // makes the message printed to the output by the JVM not very helpful. Instead,
            // detect exceptions with empty stack traces here, and treat them differently.
            if (e.getStackTrace().length == 0) {
              // scalastyle:off println
              printStream.println(s"ERROR: ${e.getClass().getName()}: ${e.getMessage()}")
              // scalastyle:on println
              exitFn(1)
            } else {
              throw e
            }
        }
      } else {
        runMain(childArgs, childClasspath, sparkConf, childMainClass, args.verbose)
      }
    }

    // Let the main class re-initialize the logging system once it starts.
    if (uninitLog) {
      Logging.uninitialize()
    }

    // In standalone cluster mode, there are two submission gateways:
    //   (1) The traditional RPC gateway using o.a.s.deploy.Client as a wrapper
    //   (2) The new REST-based gateway introduced in Spark 1.3
    // The latter is the default behavior as of Spark 1.3, but Spark submit will fail over
    // to use the legacy gateway if the master endpoint turns out to be not a REST server.
    if (args.isStandaloneCluster && args.useRest) {
      try {
        // scalastyle:off println
        printStream.println("Running Spark using the REST application submission protocol.")
        // scalastyle:on println
        doRunMain()
      } catch {
        // Fail over to use the legacy submission gateway
        case e: SubmitRestConnectionException =>
          printWarning(s"Master endpoint ${args.master} was not a REST server. " +
            "Falling back to legacy submission gateway instead.")
          args.useRest = false
          submit(args, false)
      }
    // In all other modes, just run the main class as prepared
    } else {
      doRunMain()
    }
  }

Two things here deserve the most attention.

First, val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args). Inside this method a pattern match decides the child main class: in YARN cluster mode, childMainClass is "org.apache.spark.deploy.yarn.YarnClusterApplication", so submitting a job really means submitting the command for a Java process built around this class.
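Before moving on to the second point, here is a rough sketch of that selection. This is not the actual Spark source: the object, method, and flag names are made up for illustration, and only the class-name string matches what Spark uses.

object ChildMainClassSketch {
  // Illustrative only: in yarn-cluster mode spark-submit does not run the user's class
  // directly; it launches the YARN client class and forwards --class through childArgs.
  val YarnClusterSubmitClass = "org.apache.spark.deploy.yarn.YarnClusterApplication"

  def chooseChildMainClass(isYarnCluster: Boolean, userMainClass: String): String =
    if (isYarnCluster) YarnClusterSubmitClass // childArgs then carry ("--class", userMainClass)
    else userMainClass                        // client mode: the user's main class runs in this JVM
}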

Second, runMain(childArgs, childClasspath, sparkConf, childMainClass, args.verbose), which is what we step into next. Inside it, the part to focus on is:

val app: SparkApplication = if (classOf[SparkApplication].isAssignableFrom(mainClass)) {
  mainClass.newInstance().asInstanceOf[SparkApplication]
} else {
  // classes that only define a plain main() are wrapped in a JavaMainApplication adapter
  new JavaMainApplication(mainClass)
}

The mainClass here is the "org.apache.spark.deploy.yarn.YarnClusterApplication" from above. An instance of the class is obtained via reflection, and then we call

app.start(childArgs.toArray, sparkConf)

Let's step into the "org.apache.spark.deploy.yarn.YarnClusterApplication" class.

private[spark] class YarnClusterApplication extends SparkApplication {

  override def start(args: Array[String], conf: SparkConf): Unit = {
    // SparkSubmit would use yarn cache to distribute files & jars in yarn mode,
    // so remove them from sparkConf here for yarn mode.
    conf.remove("spark.jars")
    conf.remove("spark.files")

    new Client(new ClientArguments(args), conf).run()
  }

}

This class overrides the start method and then calls new Client(new ClientArguments(args), conf).run().

The run method contains this line:

this.appId = submitApplication()

This method does the actual submission: it instantiates a yarnClient and sets up the parameters (the launch command) for starting the AM; what is really submitted to YARN is that command.

In cluster mode the class in that command is "org.apache.spark.deploy.yarn.ApplicationMaster", and the application is then submitted with yarnClient.submitApplication(appContext). (A rough sketch of the AM launch command follows the submitApplication listing below.)

  def submitApplication(): ApplicationId = {
    var appId: ApplicationId = null
    try {
      launcherBackend.connect()
      // Setup the credentials before doing anything else,
      // so we have don't have issues at any point.
      setupCredentials()
      yarnClient.init(hadoopConf)
      yarnClient.start()

      logInfo("Requesting a new application from cluster with %d NodeManagers"
        .format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))

      // Get a new application from our RM
      val newApp = yarnClient.createApplication()
      val newAppResponse = newApp.getNewApplicationResponse()
      appId = newAppResponse.getApplicationId()

      new CallerContext("CLIENT", sparkConf.get(APP_CALLER_CONTEXT),
        Option(appId.toString)).setCurrentContext()

      // Verify whether the cluster has enough resources for our AM
      verifyClusterResources(newAppResponse)

      // Set up the appropriate contexts to launch our AM
      val containerContext = createContainerLaunchContext(newAppResponse)
      val appContext = createApplicationSubmissionContext(newApp, containerContext)

      // Finally, submit and monitor the application
      logInfo(s"Submitting application $appId to ResourceManager")
      yarnClient.submitApplication(appContext)
      launcherBackend.setAppId(appId.toString)
      reportLauncherState(SparkAppHandle.State.SUBMITTED)

      appId
    } catch {
      case e: Throwable =>
        if (appId != null) {
          cleanupStagingDir(appId)
        }
        throw e
    }
  }
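For intuition, the AM launch command that createContainerLaunchContext assembles looks roughly like the string built below. This is an illustrative reconstruction under simplifying assumptions, not the real method: the actual code adds many more JVM options, class-path entries, and resources, and the {{JAVA_HOME}} and <LOG_DIR> placeholders are expanded by YARN on the NodeManager.

object AmCommandSketch {
  // Rough shape of the AM container command in yarn-cluster mode (heavily simplified).
  def amLaunchCommand(amMemoryMb: Int, userClass: String, userJar: String): String =
    Seq(
      "{{JAVA_HOME}}/bin/java", "-server",
      s"-Xmx${amMemoryMb}m",
      "org.apache.spark.deploy.yarn.ApplicationMaster", // client mode uses ExecutorLauncher instead
      "--class", userClass,
      "--jar", userJar,
      "1>", "<LOG_DIR>/stdout",
      "2>", "<LOG_DIR>/stderr"
    ).mkString(" ")
}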

Let's step into the "org.apache.spark.deploy.yarn.ApplicationMaster" class and find its main function.

  def main(args: Array[String]): Unit = {
    SignalUtils.registerLogger(log)
    val amArgs = new ApplicationMasterArguments(args)
    master = new ApplicationMaster(amArgs)
    System.exit(master.run())
  }

Follow master.run and click through the call chain; along the way there are several key points:

First, userClassThread = startUserApplication(). This starts our driver thread: the class passed with --class is invoked via reflection, and the thread is named "Driver" (see the sketch after the runDriver listing below).

Second, registerAM(sc.getConf, rpcEnv, driverRef, sc.ui.map(_.webUrl)). This registers the AM with the RM so it can obtain the resources needed to run the rest of the application.

Third, allocator.allocateResources(). After resources are requested, this applies the locality strategy: node-local, rack-local, and so on.

Fourth, runAllocatedContainers(containersToUse): once usable containers are obtained, it asks the NodeManagers to launch them.

private def runDriver(): Unit = {
    addAmIpFilter(None)
    userClassThread = startUserApplication()

    // This a bit hacky, but we need to wait until the spark.driver.port property has
    // been set by the Thread executing the user class.
    logInfo("Waiting for spark context initialization...")
    val totalWaitTime = sparkConf.get(AM_MAX_WAIT_TIME)
    try {
      val sc = ThreadUtils.awaitResult(sparkContextPromise.future,
        Duration(totalWaitTime, TimeUnit.MILLISECONDS))
      if (sc != null) {
        rpcEnv = sc.env.rpcEnv
        val driverRef = createSchedulerRef(
          sc.getConf.get("spark.driver.host"),
          sc.getConf.get("spark.driver.port"))
        registerAM(sc.getConf, rpcEnv, driverRef, sc.ui.map(_.webUrl))
        registered = true
      } else {
        // Sanity check; should never happen in normal operation, since sc should only be null
        // if the user app did not create a SparkContext.
        throw new IllegalStateException("User did not initialize spark context!")
      }
      resumeDriver()
      userClassThread.join()
    } catch {
      case e: SparkException if e.getCause().isInstanceOf[TimeoutException] =>
        logError(
          s"SparkContext did not initialize after waiting for $totalWaitTime ms. " +
           "Please check earlier log output for errors. Failing the application.")
        finish(FinalApplicationStatus.FAILED,
          ApplicationMaster.EXIT_SC_NOT_INITED,
          "Timed out waiting for SparkContext.")
    } finally {
      resumeDriver()
    }
  }
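As promised above, here is a minimal sketch of what startUserApplication does. It is not the real method, which also installs the user class loader, propagates the exit code back to the AM, and reports exceptions; it only shows the core idea of invoking the user's main() via reflection on a thread named "Driver".

object StartUserApplicationSketch {
  def startUserApplication(userClassName: String, userArgs: Array[String]): Thread = {
    // Look up the user's main(String[]) via reflection, as described above.
    val mainMethod = Class.forName(userClassName).getMethod("main", classOf[Array[String]])
    val userThread = new Thread {
      override def run(): Unit = mainMethod.invoke(null, userArgs)
    }
    userThread.setName("Driver") // this thread is what yarn-cluster mode calls the Driver
    userThread.start()
    userThread
  }
}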

Let's look at the core of the runAllocatedContainers(containersToUse) implementation:

      if (runningExecutors.size() < targetNumExecutors) {
        numExecutorsStarting.incrementAndGet()
        if (launchContainers) {
          launcherPool.execute(new Runnable {
            override def run(): Unit = {
              try {
                new ExecutorRunnable(
                  Some(container),
                  conf,
                  sparkConf,
                  driverUrl,
                  executorId,
                  executorHostname,
                  executorMemory,
                  executorCores,
                  appAttemptId.getApplicationId.toString,
                  securityMgr,
                  localResources
                ).run()
                updateInternalState()
              } catch {
                case e: Throwable =>
                  numExecutorsStarting.decrementAndGet()
                  if (NonFatal(e)) {
                    logError(s"Failed to launch executor $executorId on container $containerId", e)
                    // Assigned container should be released immediately
                    // to avoid unnecessary resource occupation.
                    amClient.releaseAssignedContainer(containerId)
                  } else {
                    throw e
                  }
              }
            }
          })
        } else {
          // For test only
          updateInternalState()
        }
      }

Next, the implementation of new ExecutorRunnable(...).run():

  def run(): Unit = {
    logDebug("Starting Executor Container")
    nmClient = NMClient.createNMClient()
    nmClient.init(conf)
    nmClient.start()
    startContainer()
  }

From this we can see that once a container has been allocated, an NMClient is created and started first, and the container is then launched through it.

startContainer() is fairly long; essentially it wraps the launch of "org.apache.spark.executor.CoarseGrainedExecutorBackend" into a command and starts that process via nmClient.startContainer(container.get, ctx), where ctx encapsulates that command. A rough, illustrative sketch of the command is shown below.
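As with the AM, this is a reconstruction under simplifying assumptions, not the real code: the actual command adds more JVM options, the user class path, and log URLs, and the placeholders are expanded by YARN.

object ExecutorCommandSketch {
  // Rough shape of the command that launches CoarseGrainedExecutorBackend in a container.
  def executorLaunchCommand(
      executorMemoryMb: Int,
      driverUrl: String,
      executorId: String,
      hostname: String,
      cores: Int,
      appId: String): String =
    Seq(
      "{{JAVA_HOME}}/bin/java", "-server",
      s"-Xmx${executorMemoryMb}m",
      "org.apache.spark.executor.CoarseGrainedExecutorBackend",
      "--driver-url", driverUrl,
      "--executor-id", executorId,
      "--hostname", hostname,
      "--cores", cores.toString,
      "--app-id", appId,
      "1>", "<LOG_DIR>/stdout",
      "2>", "<LOG_DIR>/stderr"
    ).mkString(" ")
}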

 

Now switch into the "org.apache.spark.executor.CoarseGrainedExecutorBackend" class and find its main function (abridged below).

  def main(args: Array[String]) {
    // ... parsing of driverUrl, executorId, hostname, cores, appId, workerUrl and
    // userClassPath from the command-line arguments is elided here ...
    run(driverUrl, executorId, hostname, cores, appId, workerUrl, userClassPath)
    System.exit(0)
  }

  private def run(
      driverUrl: String,
      executorId: String,
      hostname: String,
      cores: Int,
      appId: String,
      workerUrl: Option[String],
      userClassPath: Seq[URL]) {

    SparkHadoopUtil.get.runAsSparkUser { () =>
      // ... fetching the driver's SparkConf (driverConf) and the executor config (cfg)
      // over RPC from the driver is elided here ...

      val env = SparkEnv.createExecutorEnv(
        driverConf, executorId, hostname, cores, cfg.ioEncryptionKey, isLocal = false)

      env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(
        env.rpcEnv, driverUrl, executorId, hostname, cores, userClassPath, env))
      workerUrl.foreach { url =>
        env.rpcEnv.setupEndpoint("WorkerWatcher", new WorkerWatcher(env.rpcEnv, url))
      }
      env.rpcEnv.awaitTermination()
    }
  }

In the run function, the main step is to register an RPC endpoint named "Executor", which is an instance of CoarseGrainedExecutorBackend.

CoarseGrainedExecutorBackend is an RPC endpoint; its lifecycle is constructor -> onStart -> receive* -> onStop.
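To make that lifecycle concrete, here is a minimal endpoint skeleton. It is illustrative only: RpcEndpoint is private[spark], so a real implementation such as CoarseGrainedExecutorBackend lives inside Spark's own packages, and EchoEndpoint is a made-up example.

package org.apache.spark.demo // needed only because RpcEndpoint is private[spark]

import org.apache.spark.rpc.{RpcCallContext, RpcEndpoint, RpcEnv}

// constructor -> onStart -> receive* -> onStop
class EchoEndpoint(override val rpcEnv: RpcEnv) extends RpcEndpoint {
  override def onStart(): Unit = {
    // called once the endpoint is registered; CoarseGrainedExecutorBackend uses this
    // hook to register itself with the driver
  }

  override def receive: PartialFunction[Any, Unit] = {
    case msg: String => println(s"one-way message: $msg") // messages sent with send()
  }

  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case msg: String => context.reply(msg) // messages sent with ask()
  }

  override def onStop(): Unit = {
    // release resources before the endpoint goes away
  }
}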

So let's look at the onStart method first; its main job is to register the executor with the Driver:

  override def onStart() {
    rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
      // This is a very fast action so we can use "ThreadUtils.sameThread"
      driver = Some(ref)
      ref.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, extractLogUrls))
    }(ThreadUtils.sameThread).onComplete {
      // This is a very fast action so we can use "ThreadUtils.sameThread"
      case Success(msg) =>
        // Always receive `true`. Just ignore it
      case Failure(e) =>
        exitExecutor(1, s"Cannot register with driver: $driverUrl", e, notifyDriver = false)
    }(ThreadUtils.sameThread)
  }

The receive method

When registration succeeds and the RegisteredExecutor reply comes back, an Executor is instantiated.

When the message matches LaunchTask(data), that executor starts running the task (a sketch of this launch path follows the listing below).

override def receive: PartialFunction[Any, Unit] = {
    case RegisteredExecutor =>
      try {
        executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
      } catch {
        case NonFatal(e) =>
          exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
      }

    case LaunchTask(data) =>
      if (executor == null) {
        exitExecutor(1, "Received LaunchTask command but executor was null")
      } else {
        val taskDesc = TaskDescription.decode(data.value)
        logInfo("Got assigned task " + taskDesc.taskId)
        executor.launchTask(this, taskDesc)
      }
  }
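As a rough sketch of where executor.launchTask goes from here: in the Spark source it wraps the decoded TaskDescription in a TaskRunner, records it as running, and submits it to the executor's cached thread pool. The simplified version below only mirrors that pattern; the names and the runTask callback are illustrative.

import java.util.concurrent.{ConcurrentHashMap, Executors}

object LaunchTaskSketch {
  private val threadPool = Executors.newCachedThreadPool()
  private val runningTasks = new ConcurrentHashMap[Long, Runnable]()

  def launchTask(taskId: Long, runTask: () => Unit): Unit = {
    val runner: Runnable = new Runnable {
      override def run(): Unit = runTask() // the real TaskRunner deserializes and runs the task
    }
    runningTasks.put(taskId, runner)
    threadPool.execute(runner)
  }
}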

The flow above is an overview of how Spark submits a job.

To recap, here is the flow of a job submitted with spark-submit in YARN cluster mode:

1. The client submits the application to the RM.
2. The RM starts the AM.
3. The AM starts the Driver thread and requests resources from the RM.
4. The RM returns a list of usable resources.
5. The AM starts containers through nmClient and launches the CoarseGrainedExecutorBackend processes in them.
6. The Executors register back with the Driver.
7. The Executors start running tasks.
