Spark Source Code Analysis: The SparkSubmit Submission Flow

Environment and Versions

Component  Version
JDK        java version "1.8.0_231" (HotSpot)
Scala      2.11.12
Spark      2.4.4

Preface

  • To run a Spark application, we usually submit it with ./bin/spark-submit, for example
    spark-submit \
    --master yarn --deploy-mode cluster \
    --num-executors 10 --executor-memory 8G --executor-cores 4 \
    --driver-memory 4G \
    --conf spark.network.timeout=300 \
    --class com.skey.spark.app.MyApp /home/jerry/spark-demo.jar
    
  • After spark-submit runs, the script parses the arguments and, depending on the deploy mode, submits the Spark application to the cluster in different ways, for example
    • Standalone
      • client -> the main method of the user class runs locally, i.e. the Driver is started locally
      • cluster -> ClientApp asks the cluster for a node on which to start the Driver
    • ON YARN
      • client -> runs locally, same as Standalone
      • cluster -> YarnClusterApplication asks the cluster for a node on which to start the Driver
  • The overall SparkSubmit submission flow is shown below
    (Figure: SparkSubmit submission flow diagram)
  • Let's now walk through the source code of the SparkSubmit submission flow

The Shell Command Part

  • First, we invoke the ./bin/spark-submit shell script to pass the arguments and submit. It in turn calls ./bin/spark-class; the key line is
    exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"
    
  • Note that org.apache.spark.deploy.SparkSubmit is passed to spark-class as its first argument
  • ./bin/spark-class first loads the environment variables via ./bin/load-spark-env.sh, with the following code
    . "${SPARK_HOME}"/bin/load-spark-env.sh
    
  • The important point is that ./bin/load-spark-env.sh sources ./conf/spark-env.sh, i.e. the default environment settings we normally configure, such as SPARK_MASTER_HOST, SPARK_WORKER_MEMORY, HADOOP_CONF_DIR and so on. In other words, this configuration file is re-read every time an application is submitted.
  • Next, ./bin/spark-class locates the java command and the jars and starts the Java process; the most important code is as follows
    build_command() {
      # RUNNER is the java command resolved earlier
      # LAUNCH_CLASSPATH is usually SPARK_HOME/jars/*
      # org.apache.spark.launcher.Main prints the parsed arguments separated by the NUL character ('\0')
      # "$@" are the arguments passed to spark-submit; note that the first one is org.apache.spark.deploy.SparkSubmit
      "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
      printf "%d\0" $?
    }
    
    set +o posix
    CMD=()
    # Run (build_command "$@") and redirect its output into the while loop,
    # where the read command consumes the NUL-separated strings
    # and appends each one to the CMD array
    while IFS= read -d '' -r ARG; do
      CMD+=("$ARG")
    done < <(build_command "$@")
    
    # ... some code omitted ...
    # Execute the command
    CMD=("${CMD[@]:0:$LAST}")
    exec "${CMD[@]}"
    

Argument Parsing: Main

  • org.apache.spark.launcher.Main
  • This class mainly parses the arguments and prints the command to run, depending on the mode. The code is short, as shown below
    class Main {
    
      public static void main(String[] argsArray) throws Exception {
        checkArgument(argsArray.length > 0, "Not enough arguments: missing class name.");
    
        List<String> args = new ArrayList<>(Arrays.asList(argsArray));
        // Take the first argument, i.e. the org.apache.spark.deploy.SparkSubmit passed in earlier
        String className = args.remove(0);
    
        boolean printLaunchCommand = !isEmpty(System.getenv("SPARK_PRINT_LAUNCH_COMMAND"));
        Map<String, String> env = new HashMap<>();
        List<String> cmd;
        if (className.equals("org.apache.spark.deploy.SparkSubmit")) {
          try {
            // Parse arguments such as --class, --conf, etc.
            // and build the command
            AbstractCommandBuilder builder = new SparkSubmitCommandBuilder(args);
            // cmd mainly contains: java -cp classpath org.apache.spark.deploy.SparkSubmit
            cmd = buildCommand(builder, env, printLaunchCommand);
          } catch (IllegalArgumentException e) {
            // some code omitted
          }
        } else {
          // If the class to launch is not SparkSubmit (other classes started via spark-class go through here)
          AbstractCommandBuilder builder = new SparkClassCommandBuilder(className, args);
          cmd = buildCommand(builder, env, printLaunchCommand);
        }
    
        // Print the command in the way appropriate for the operating system
        if (isWindows()) {
          System.out.println(prepareWindowsCommand(cmd, env));
        } else {
          List<String> bashCmd = prepareBashCommand(cmd, env);
          for (String c : bashCmd) {
            System.out.print(c);
            System.out.print('\0'); // separate the entries with the NUL character
          }
        }
      }
      
      // some code omitted
    }
    
  • Finally, the printed command (mainly java -cp classpath org.apache.spark.deploy.SparkSubmit plus its arguments) is received by the CMD array in ./bin/spark-class and executed with exec. A small illustration of this NUL-separated hand-off follows.
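  • The snippet below is a purely illustrative Scala sketch (not Spark code; the command string is made up) showing how a NUL-separated command like the one launcher.Main prints can be split back into its individual arguments, which is exactly what the read -d '' loop in ./bin/spark-class does:
    object NulSplitDemo {
      def main(args: Array[String]): Unit = {
        // A made-up launcher output: arguments joined by the NUL character '\u0000'
        val launcherOutput = "java\u0000-cp\u0000/opt/spark/jars/*\u0000org.apache.spark.deploy.SparkSubmit\u0000--class\u0000com.skey.spark.app.MyApp\u0000"
        // Split on NUL and drop the trailing empty entry, the same idea as read -d ''
        val cmd = launcherOutput.split('\u0000').filter(_.nonEmpty)
        cmd.foreach(println)
      }
    }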

SparkSubmit

  • org.apache.spark.deploy.SparkSubmit

  • This is the class that actually submits the Spark application. As described above, it parses the received arguments and submits the application according to the deploy mode.

  • SparkSubmit consists of a class and a companion object. We start with the companion object's main method, which is the entry point of the Java process

    override def main(args: Array[String]): Unit = {
    // Instantiate SparkSubmit, overriding a few methods
    val submit = new SparkSubmit() {
      self => // alias for this, so the anonymous SparkSubmitArguments below can refer to it
    
      override protected def parseArguments(args: Array[String]): SparkSubmitArguments = {
        // Override the logging methods of SparkSubmitArguments
        // so that they delegate to SparkSubmit's logInfo and logWarning
        new SparkSubmitArguments(args) {
          override protected def logInfo(msg: => String): Unit = self.logInfo(msg)
    
          override protected def logWarning(msg: => String): Unit = self.logWarning(msg)
        }
      }
    
      override protected def logInfo(msg: => String): Unit = printMessage(msg)
    
      override protected def logWarning(msg: => String): Unit = printMessage(s"Warning: $msg")
    
      override def doSubmit(args: Array[String]): Unit = {
        try {
          // Still calls the parent's doSubmit, just adding exception handling afterwards
          super.doSubmit(args)
        } catch {
          case e: SparkUserAppException =>
            exitFn(e.exitCode)
        }
      }
    
    }
    // Call SparkSubmit's doSubmit to submit the application
    submit.doSubmit(args)
    }
    
  • This code ends up calling SparkSubmit's doSubmit, whose code is as follows

    def doSubmit(args: Array[String]): Unit = {
    // Initialize logging if it hasn't been done yet. Keep track of whether logging needs to
    // be reset before the application starts.
    val uninitLog = initializeLogIfNecessary(true, silent = true)
    
    // parseArguments instantiates SparkSubmitArguments
    // note that the companion object above has already overridden its logging methods
    val appArgs = parseArguments(args)
    if (appArgs.verbose) {
      logInfo(appArgs.toString)
    }
    // When submitting an application, the action taken here is SparkSubmitAction.SUBMIT
    // If you are curious, SUBMIT is derived in SparkSubmitArguments.loadEnvironmentArguments
    appArgs.action match {
      case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
      case SparkSubmitAction.KILL => kill(appArgs)
      case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
      case SparkSubmitAction.PRINT_VERSION => printVersion()
    }
    }
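  • For reference, the relevant line in SparkSubmitArguments.loadEnvironmentArguments looks roughly like the following (paraphrased from the 2.4.x source, so treat the details as approximate; --kill, --status and --version set KILL, REQUEST_STATUS and PRINT_VERSION while the options are parsed, otherwise the default applies):
    // Paraphrased: unless another action was set while parsing the options,
    // the action defaults to SUBMIT, which is why submit(appArgs, uninitLog) runs above
    action = Option(action).getOrElse(SUBMIT)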
    
  • Since we are submitting an application, the code goes on to call submit, shown below

    private def submit(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
      // Defines doRunMain, which is invoked further down
      // and ultimately calls runMain(...)
      def doRunMain(): Unit = {
        // proxyUser is the proxy user specified with --proxy-user
        // it lets the job run under another user's identity, e.g. you are jerry but submit as tom and operate on tom's files with tom's permissions
        if (args.proxyUser != null) {
          val proxyUser = UserGroupInformation.createProxyUser(args.proxyUser,
            UserGroupInformation.getCurrentUser())
          try {
            proxyUser.doAs(new PrivilegedExceptionAction[Unit]() {
              override def run(): Unit = {
                runMain(args, uninitLog)
              }
            })
          } catch {
            // some code omitted
          }
        } else {
          runMain(args, uninitLog)
        }
      }
    
      // Check the launch mode; in every case doRunMain() ends up being called
      if (args.isStandaloneCluster && args.useRest) {
        try {
          logInfo("Running Spark using the REST application submission protocol.")
          doRunMain()
        } catch {
          // some code omitted
        }
      } else {
        doRunMain()
      }
    }
    
  • As we can see, submit mainly deals with the proxy-user case and then calls runMain(...), which is the heart of SparkSubmit. Its code is as follows

    private def runMain(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
      // prepareSubmitEnvironment parses the arguments and, most importantly, determines the launch mode
      val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)
      
      // some code omitted
    
      // Decide which ClassLoader to use, controlled by spark.driver.userClassPathFirst (default false)
      val loader =
        if (sparkConf.get(DRIVER_USER_CLASS_PATH_FIRST)) {
          // This ClassLoader prefers the jars provided by the user
          new ChildFirstURLClassLoader(new Array[URL](0),
            Thread.currentThread.getContextClassLoader)
        } else {
            // The default ClassLoader
            new MutableURLClassLoader(new Array[URL](0),
            Thread.currentThread.getContextClassLoader)
        }
      Thread.currentThread.setContextClassLoader(loader)
    
      for (jar <- childClasspath) {
        addJarToClasspath(jar, loader)
      }
    
      var mainClass: Class[_] = null
    
      try {
        // Obtain the Class object for childMainClass
        mainClass = Utils.classForName(childMainClass)
      } catch {
        // some code omitted
      }
     
      val app: SparkApplication = if (classOf[SparkApplication].isAssignableFrom(mainClass)) {
        // If mainClass is a SparkApplication, simply instantiate it
        mainClass.newInstance().asInstanceOf[SparkApplication]
      } else {
        // Otherwise wrap it in a JavaMainApplication
        if (classOf[scala.App].isAssignableFrom(mainClass)) {
          logWarning("Subclasses of scala.App may not work correctly. Use a main() method instead.")
        }
        new JavaMainApplication(mainClass)
      }
    
      // some code omitted
    
      try {
        // Call the start method
        // for a JavaMainApplication, start reflects on the class and invokes its main method
        app.start(childArgs.toArray, sparkConf)
      } catch {
        case t: Throwable =>
          throw findCause(t)
      }
    }
    
  • Clearly, the most important value in runMain(...) is childMainClass, because it determines which class runs next. To find out what it is, we follow it into prepareSubmitEnvironment(...), which has a variable of the same name that is eventually returned. The analysis below revolves around childMainClass.

    • The client-mode case
      if (deployMode == CLIENT) {
        // In client mode, args.mainClass is assigned to childMainClass directly
        // args.mainClass is the class specified with --class at submission time
        childMainClass = args.mainClass
        if (localPrimaryResource != null && isUserJar(localPrimaryResource)) {
          childClasspath += localPrimaryResource
        }
        if (localJars != null) { childClasspath ++= localJars.split(",") }
      }
      
    • The standalone cluster-mode case
      // First check whether this is standalone cluster mode
      if (args.isStandaloneCluster) {
        if (args.useRest) {
          // With REST enabled, use org.apache.spark.deploy.rest.RestSubmissionClientApp
          childMainClass = REST_CLUSTER_SUBMIT_CLASS
          // Pass in args.mainClass
          childArgs += (args.primaryResource, args.mainClass)
        } else {
          // Otherwise, use org.apache.spark.deploy.ClientApp
          childMainClass = STANDALONE_CLUSTER_SUBMIT_CLASS
          if (args.supervise) { childArgs += "--supervise" }
          Option(args.driverMemory).foreach { m => childArgs += ("--memory", m) }
          Option(args.driverCores).foreach { c => childArgs += ("--cores", c) }
          childArgs += "launch"
          // Pass in args.mainClass
          childArgs += (args.master, args.primaryResource, args.mainClass)
        }
        if (args.childArgs != null) {
          childArgs ++= args.childArgs
        }
      }
      
    • The YARN cluster-mode case
       if (isYarnCluster) {
         // In YARN cluster mode, use org.apache.spark.deploy.yarn.YarnClusterApplication
         childMainClass = YARN_CLUSTER_SUBMIT_CLASS
         if (args.isPython) {
           childArgs += ("--primary-py-file", args.primaryResource)
           childArgs += ("--class", "org.apache.spark.deploy.PythonRunner")
         } else if (args.isR) {
           val mainFile = new Path(args.primaryResource).getName
           childArgs += ("--primary-r-file", mainFile)
           childArgs += ("--class", "org.apache.spark.deploy.RRunner")
         } else {
           if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
             childArgs += ("--jar", args.primaryResource)
           }
           // Pass in args.mainClass
           childArgs += ("--class", args.mainClass)
         }
         if (args.childArgs != null) {
           args.childArgs.foreach { arg => childArgs += ("--arg", arg) }
         }
       }
      
    • The remaining modes (MesosCluster, KubernetesCluster) follow the same pattern and can be read on your own
  • So in client mode, the main method of the user class is invoked directly. In cluster mode, depending on the deployment, RestSubmissionClientApp, ClientApp, YarnClusterApplication or KubernetesClientApplication takes over the next step.

  • In cluster mode, these SparkApplication implementations start locally; each one requests a node from the cluster and starts the Driver there (which then calls the main method of the user class). Their source code is examined below, after a short sketch of the SparkApplication abstraction itself.
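  • For orientation, here is a simplified sketch (not the exact Spark source; the "Sketch" names are made up) of the SparkApplication contract that runMain relies on: the cluster-mode entry points above implement start(), while a plain user class is wrapped so that its static main method is invoked via reflection, which is what JavaMainApplication does.
    import org.apache.spark.SparkConf
    
    // Simplified contract: every submission entry point exposes start(args, conf)
    trait SparkApplicationSketch {
      def start(args: Array[String], conf: SparkConf): Unit
    }
    
    // Wraps an ordinary class that only has a main method, mirroring JavaMainApplication
    class JavaMainApplicationSketch(klass: Class[_]) extends SparkApplicationSketch {
      override def start(args: Array[String], conf: SparkConf): Unit = {
        val mainMethod = klass.getMethod("main", classOf[Array[String]])
        // main must be static, so the receiver is null; the child args are passed through
        mainMethod.invoke(null, args)
      }
    }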

ClientApp in Standalone Mode

  • org.apache.spark.deploy.ClientApp
  • Its code is as follows
    private[spark] class ClientApp extends SparkApplication {
    
      override def start(args: Array[String], conf: SparkConf): Unit = {
        // ClientArguments internally calls parse(args.toList) to parse the arguments
        val driverArgs = new ClientArguments(args)
    
        if (!conf.contains("spark.rpc.askTimeout")) {
          conf.set("spark.rpc.askTimeout", "10s")
        }
        Logger.getRootLogger.setLevel(driverArgs.logLevel)
        // Create the NettyRpcEnv
        val rpcEnv =
          RpcEnv.create("driverClient", Utils.localHostName(), 0, conf, new SecurityManager(conf))
        // Use the Master URLs to obtain their RpcEndpointRefs
        val masterEndpoints = driverArgs.masters.map(RpcAddress.fromSparkURL).
          map(rpcEnv.setupEndpointRef(_, Master.ENDPOINT_NAME))
        // Instantiate ClientEndpoint and register it
        rpcEnv.setupEndpoint("client", new ClientEndpoint(rpcEnv, driverArgs, masterEndpoints, conf))
    
        rpcEnv.awaitTermination()
      }
    
    }
    
  • This part involves the communication mechanism covered earlier in Spark源碼剖析——RpcEndpoint、RpcEnv; if you are not familiar with it, please read that first.
  • After ClientEndpoint is instantiated, its onStart method is invoked; the code is as follows
    override def onStart(): Unit = {
      driverArgs.cmd match {
        case "launch" =>
          // Remember this class, DriverWrapper; it is launched later and calls the main method of the user class
          val mainClass = "org.apache.spark.deploy.worker.DriverWrapper"
    
          // some code omitted
    
          // Build the Command
          // the driverArgs.mainClass passed here is the user class
          val command = new Command(mainClass,
            Seq("{{WORKER_URL}}", "{{USER_JAR}}", driverArgs.mainClass) ++ driverArgs.driverOptions,
            sys.env, classPathEntries, libraryPathEntries, javaOpts)
    
          val driverDescription = new DriverDescription(
            driverArgs.jarUrl,
            driverArgs.memory,
            driverArgs.cores,
            driverArgs.supervise,
            command)
          // Send a RequestSubmitDriver message to the Master
          asyncSendToMasterAndForwardReply[SubmitDriverResponse](
            RequestSubmitDriver(driverDescription))
    
        case "kill" =>
          val driverId = driverArgs.driverId
          asyncSendToMasterAndForwardReply[KillDriverResponse](RequestKillDriver(driverId))
      }
    }
    
  • Next, let's see what the Master's receiveAndReply does when it receives the RequestSubmitDriver message
    override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
      case RequestSubmitDriver(description) =>
        // In HA deployments there are multiple Masters, hence different states
        if (state != RecoveryState.ALIVE) {
          // If it is not ALIVE, reply with a SubmitDriverResponse carrying a failure message
          val msg = s"${Utils.BACKUP_STANDALONE_MASTER_PREFIX}: $state. " +
            "Can only accept driver submissions in ALIVE state."
          context.reply(SubmitDriverResponse(self, false, None, msg))
        } else {
          logInfo("Driver submitted " + description.command.mainClass)
          // Build the driver from the description that was sent over
          val driver = createDriver(description) // this mainly builds a DriverInfo
          persistenceEngine.addDriver(driver)
          waitingDrivers += driver
          drivers.add(driver)
          // The code that actually gets the Driver started
          schedule()
    
          // Reply with a SubmitDriverResponse containing driver.id
          context.reply(SubmitDriverResponse(self, true, Some(driver.id),
            s"Driver successfully submitted as ${driver.id}"))
        }
    
      // some code omitted
    }
    
  • When receiveAndReply receives a RequestSubmitDriver message, it builds a DriverInfo and calls schedule() to get the Driver started. Let's look at schedule(), which actually launches the Driver
    private def schedule(): Unit = {
      if (state != RecoveryState.ALIVE) {
        return
      }
      // Take the ALIVE workers and shuffle them randomly
      // shuffling prevents too many drivers from piling onto a few workers, so drivers spread evenly across the cluster
      val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
      val numWorkersAlive = shuffledAliveWorkers.size
      var curPos = 0
      // Iterate over waitingDrivers, i.e. the DriverInfos built earlier
      for (driver <- waitingDrivers.toList) {
        var launched = false
        var numWorkersVisited = 0
        // Keep going while fewer workers than are alive have been visited and the driver has not been launched
        while (numWorkersVisited < numWorkersAlive && !launched) {
          // Take one worker from the shuffled sequence
          val worker = shuffledAliveWorkers(curPos)
          numWorkersVisited += 1
          // the worker's free memory must be at least the requested memory
          // and its free cores at least the requested cores
          if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
            // If both conditions hold, launch the driver on this worker
            launchDriver(worker, driver)
            // Remove the launched driver from the waiting list so it is not launched twice
            waitingDrivers -= driver
            launched = true
          }
          curPos = (curPos + 1) % numWorkersAlive
        }
      }
      // This part starts Executors on the workers; we ignore it for now
      startExecutorsOnWorkers()
    }
    
  • This code first collects the available workers and shuffles them. It then iterates over the previously built DriverInfos, walks through the workers, and checks whether one meets the requirements (memory, cores). If so, launchDriver(...) is called to start the Driver on that worker.
    private def launchDriver(worker: WorkerInfo, driver: DriverInfo) {
      logInfo("Launching driver " + driver.id + " on worker " + worker.id)
      worker.addDriver(driver)
      driver.worker = Some(worker)
      // Send a LaunchDriver message to the worker node
      worker.endpoint.send(LaunchDriver(driver.id, driver.desc))
      driver.state = DriverState.RUNNING
    }
    
  • When the Master calls launchDriver, it sends a LaunchDriver message to the worker. Let's see what the Worker does when it receives it.
    override def receive: PartialFunction[Any, Unit] = synchronized {
      // some code omitted
    
      case LaunchDriver(driverId, driverDesc) =>
        logInfo(s"Asked to launch driver $driverId")
        // Build a DriverRunner and call start to launch it
        val driver = new DriverRunner(
          conf,
          driverId,
          workDir,
          sparkHome,
          driverDesc.copy(command = Worker.maybeUpdateSSLSettings(driverDesc.command, conf)),
          self,
          workerUri,
          securityMgr)
        drivers(driverId) = driver
        driver.start()
    
        // Update the resources used on this worker node
        coresUsed += driverDesc.cores
        memoryUsed += driverDesc.mem
    
      // some code omitted
    }
    
  • On the Worker node, receiving the LaunchDriver message builds a DriverRunner and calls its start method. start creates and starts a new Thread whose main job is to call prepareAndRunDriver(), which launches a new process with a ProcessBuilder and then blocks there.
  • The command used to start that process is the command inside the driverDesc that the DriverRunner was constructed with. That command was originally sent by the ClientEndpoint in ClientApp to the Master, and then forwarded by the Master to the Worker; it is the org.apache.spark.deploy.worker.DriverWrapper we asked you to remember (look back at ClientEndpoint's onStart), and it also carries the user class. A minimal sketch of this launch step follows.
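  • The sketch below is illustrative only (the command and work directory are made up; the real command is assembled from the DriverDescription): it mirrors what DriverRunner's launch thread does, namely start the driver command as a child process and block until it exits.
    import java.io.File
    
    object LaunchDriverSketch {
      def main(args: Array[String]): Unit = {
        // Made-up command; the real one also appends workerUrl, the user jar and the user main class
        val command = Seq("java", "-cp", "/opt/spark/jars/*",
          "org.apache.spark.deploy.worker.DriverWrapper")
        val builder = new ProcessBuilder(command: _*)
        builder.directory(new File("/tmp/driver-workdir")) // DriverRunner uses the driver's work dir
        builder.redirectErrorStream(true)
        val process = builder.start()
        val exitCode = process.waitFor() // the launch thread blocks here until the driver exits
        println(s"Driver process exited with code $exitCode")
      }
    }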
  • Finally, we arrive at DriverWrapper's main method
    def main(args: Array[String]) {
      args.toList match {
        case workerUrl :: userJar :: mainClass :: extraArgs =>
          val conf = new SparkConf()
          val host: String = Utils.localHostName()
          val port: Int = sys.props.getOrElse("spark.driver.port", "0").toInt
          // Create the NettyRpcEnv
          val rpcEnv = RpcEnv.create("Driver", host, port, conf, new SecurityManager(conf))
          logInfo(s"Driver address: ${rpcEnv.address}")
          // Instantiate WorkerWatcher and register it
          rpcEnv.setupEndpoint("workerWatcher", new WorkerWatcher(rpcEnv, workerUrl))
    
          val currentLoader = Thread.currentThread.getContextClassLoader
          val userJarUrl = new File(userJar).toURI().toURL()
          // Same as runMain() in SparkSubmit:
          // choose the ClassLoader according to spark.driver.userClassPathFirst
          val loader =
            if (sys.props.getOrElse("spark.driver.userClassPathFirst", "false").toBoolean) {
              // This ClassLoader prefers the jars provided by the user
              new ChildFirstURLClassLoader(Array(userJarUrl), currentLoader)
            } else {
              new MutableURLClassLoader(Array(userJarUrl), currentLoader)
            }
          Thread.currentThread.setContextClassLoader(loader)
          setupDependencies(loader, userJar)
    
          // Invoke the main method of the user class via reflection
          val clazz = Utils.classForName(mainClass)
          val mainMethod = clazz.getMethod("main", classOf[Array[String]])
          mainMethod.invoke(null, extraArgs.toArray[String])
    
          rpcEnv.shutdown()
    
        case _ =>
          // scalastyle:off println
          System.err.println("Usage: DriverWrapper <workerUrl> <userJar> <driverMainClass> [options]")
          // scalastyle:on println
          System.exit(-1)
      }
    }
    
  • At this point DriverWrapper has been started on the Worker and has finally invoked the main method of the user class via reflection; from here on, everything proceeds exactly as in client mode.

YarnClusterApplication in ON YARN Mode

  • org.apache.spark.deploy.yarn.YarnClusterApplication
  • Its start method is very simple: it mainly instantiates a Client locally and calls its run method, roughly as sketched below.
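  • A rough paraphrase of that start method (based on Client.scala in Spark 2.4.x; treat the details as approximate, and note the class lives in the org.apache.spark.deploy.yarn package, so the references resolve there): it drops spark.jars and spark.files from the configuration, because in YARN mode these are distributed through the YARN cache, and then hands everything to Client.run().
    // Paraphrased sketch of org.apache.spark.deploy.yarn.YarnClusterApplication
    private[spark] class YarnClusterApplication extends SparkApplication {
      override def start(args: Array[String], conf: SparkConf): Unit = {
        // In YARN mode, jars and files are distributed via the YARN cache,
        // so they are removed from the SparkConf here
        conf.remove("spark.jars")
        conf.remove("spark.files")
        new Client(new ClientArguments(args), conf).run()
      }
    }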
  • Now let's look at the code of org.apache.spark.deploy.yarn.Client
    def run(): Unit = {
      // Submit the application to the ResourceManager
      this.appId = submitApplication()
      
      if (!launcherBackend.isConnected() && fireAndForget) {
        // If it failed, throw a SparkException
        val report = getApplicationReport(appId)
        val state = report.getYarnApplicationState
        logInfo(s"Application report for $appId (state: $state)")
        logInfo(formatReportDetails(report))
        if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
          throw new SparkException(s"Application $appId finished with status: $state")
        }
      } else {
        // Otherwise, monitor the application's state until it finishes
        val YarnAppReport(appState, finalState, diags) = monitorApplication(appId)
        if (appState == YarnApplicationState.FAILED || finalState == FinalApplicationStatus.FAILED) {
          diags.foreach { err =>
            logError(s"Application diagnostics message: $err")
          }
          throw new SparkException(s"Application $appId finished with failed status")
        }
        if (appState == YarnApplicationState.KILLED || finalState == FinalApplicationStatus.KILLED) {
          throw new SparkException(s"Application $appId is killed")
        }
        if (finalState == FinalApplicationStatus.UNDEFINED) {
          throw new SparkException(s"The final status of application $appId is undefined")
        }
      }
    }
    
  • The most important part here is the call to submitApplication, which requests resources from YARN's ResourceManager to run the ApplicationMaster. Its code is as follows
    def submitApplication(): ApplicationId = {
      var appId: ApplicationId = null
      try {
        // launcherBackend was created when this Client was instantiated
        launcherBackend.connect()
        // yarnClient was created via YarnClient.createYarnClient when this Client was instantiated
        yarnClient.init(hadoopConf)
        yarnClient.start()
    
        logInfo("Requesting a new application from cluster with %d NodeManagers"
          .format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))
    
        // Ask the ResourceManager to create a new application
        val newApp = yarnClient.createApplication()
        val newAppResponse = newApp.getNewApplicationResponse()
        appId = newAppResponse.getApplicationId()
    
        new CallerContext("CLIENT", sparkConf.get(APP_CALLER_CONTEXT),
          Option(appId.toString)).setCurrentContext()
    
        // Verify that the YARN cluster has enough resources to run the ApplicationMaster
        verifyClusterResources(newAppResponse)
    
        // Set up the context used to launch the ApplicationMaster
        // It mainly covers:
        // 1. parsing of some arguments, e.g. --class, --jar, etc.
        // 2. building the java command that is passed in via amContainer.setCommands and later used to start the user class
        // 3. the resource request configuration for the ApplicationMaster
        val containerContext = createContainerLaunchContext(newAppResponse)
        val appContext = createApplicationSubmissionContext(newApp, containerContext)
    
        logInfo(s"Submitting application $appId to ResourceManager")
        // Launch the ApplicationMaster
        yarnClient.submitApplication(appContext)
        launcherBackend.setAppId(appId.toString)
        reportLauncherState(SparkAppHandle.State.SUBMITTED)
    
        appId
      } catch {
        // some code omitted
      }
    }
    
  • submitApplication() connects to YARN and requests and creates the ApplicationMaster. It passes in the command that uses java to launch the user class; the ApplicationMaster later runs that command, and from there everything proceeds just as in client mode.