Spark Source Code Analysis: The SparkSubmit Submission Flow
Current Environment and Versions

| Environment | Version |
|---|---|
| JDK | java version "1.8.0_231" (HotSpot) |
| Scala | Scala 2.11.12 |
| Spark | spark-2.4.4 |
Preface
- When running a Spark application, we usually submit it with `./bin/spark-submit`, for example:

```bash
spark-submit \
  --master yarn --deploy-mode cluster \
  --num-executors 10 --executor-memory 8G --executor-cores 4 \
  --driver-memory 4G \
  --conf spark.network.timeout=300 \
  --class com.skey.spark.app.MyApp /home/jerry/spark-demo.jar
```
- After spark-submit runs, the program parses the arguments for us and, depending on the deploy mode, uses a different mechanism to submit the Spark application to the cluster, for example:
  - Standalone
    - client -> runs the main method of the user's class locally, i.e. the Driver is started locally
    - cluster -> uses ClientApp to request a node from the cluster on which to start the Driver
  - ON YARN
    - client -> runs locally, same as Standalone
    - cluster -> uses YarnClusterApplication to request a node from the cluster on which to start the Driver
- The overall SparkSubmit submission flow is shown in the diagram below.
- Now let's walk through the source code of the SparkSubmit submission flow.
Shell Commands
- First we invoke the `./bin/spark-submit` shell script, passing in the arguments and submitting. It in turn calls `./bin/spark-class`; the key line is:

```bash
exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"
```
- Note that the class name `org.apache.spark.deploy.SparkSubmit` is passed to spark-class as the first argument.
- `./bin/spark-class` first loads environment variables via `bin/load-spark-env.sh`:

```bash
. "${SPARK_HOME}"/bin/load-spark-env.sh
```
- The important part is that `bin/load-spark-env.sh` sources `./conf/spark-env.sh`, which is where we normally configure the default environment settings such as SPARK_MASTER_HOST, SPARK_WORKER_MEMORY, HADOOP_CONF_DIR, and so on. So we know this configuration file is re-read every time an application is submitted.
- Next, `./bin/spark-class` locates the java command and the jar files, and finally starts a Java process. The most important part of the script is:

```bash
build_command() {
  # RUNNER is the java command resolved earlier
  # LAUNCH_CLASSPATH is usually SPARK_HOME/jars/*
  # org.apache.spark.launcher.Main prints the parsed arguments separated by the NUL character ('\0')
  # "$@" are the arguments passed in by spark-submit; note that the first one is org.apache.spark.deploy.SparkSubmit
  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
  printf "%d\0" $?
}

set +o posix
CMD=()
# Run (build_command "$@") and redirect its output into the while loop.
# Inside the loop, read splits the received string on the NUL character
# and each piece is appended to the CMD array.
while IFS= read -d '' -r ARG; do
  CMD+=("$ARG")
done < <(build_command "$@")

# ... some code omitted ...

# Execute the command
CMD=("${CMD[@]:0:$LAST}")
exec "${CMD[@]}"
```
Argument Parsing: Main
- org.apache.spark.launcher.Main
- This class is mainly responsible for parsing the arguments and printing the command to run, which differs by mode. There is not much code:

```java
class Main {

  public static void main(String[] argsArray) throws Exception {
    checkArgument(argsArray.length > 0, "Not enough arguments: missing class name.");

    List<String> args = new ArrayList<>(Arrays.asList(argsArray));
    // Take the first argument, i.e. the org.apache.spark.deploy.SparkSubmit passed in earlier
    String className = args.remove(0);

    boolean printLaunchCommand = !isEmpty(System.getenv("SPARK_PRINT_LAUNCH_COMMAND"));
    Map<String, String> env = new HashMap<>();
    List<String> cmd;
    if (className.equals("org.apache.spark.deploy.SparkSubmit")) {
      try {
        // Parse arguments such as --class, --conf, etc. and build the command
        AbstractCommandBuilder builder = new SparkSubmitCommandBuilder(args);
        // cmd essentially contains: java -cp <classpath> org.apache.spark.deploy.SparkSubmit ...
        cmd = buildCommand(builder, env, printLaunchCommand);
      } catch (IllegalArgumentException e) {
        // some code omitted
      }
    } else {
      // This branch is taken when a custom SparkSubmit class is used
      AbstractCommandBuilder builder = new SparkClassCommandBuilder(className, args);
      cmd = buildCommand(builder, env, printLaunchCommand);
    }

    // Print the output differently depending on the operating system
    if (isWindows()) {
      System.out.println(prepareWindowsCommand(cmd, env));
    } else {
      List<String> bashCmd = prepareBashCommand(cmd, env);
      for (String c : bashCmd) {
        System.out.print(c);
        System.out.print('\0'); // separate with the NUL character
      }
    }
  }

  // some code omitted
}
```
- Finally, the printed command (essentially java -cp <classpath> org.apache.spark.deploy.SparkSubmit plus the arguments) is read into the CMD array in `./bin/spark-class` and executed with exec.
SparkSubmit
- org.apache.spark.deploy.SparkSubmit
- This is the class that actually submits the Spark application. As described above, it parses the received arguments and submits the application according to the deploy mode.
- SparkSubmit consists of a class and a companion object. Let's start with the main method of the companion object, which is the entry point of the Java process:

```scala
override def main(args: Array[String]): Unit = {
  // Instantiate SparkSubmit and override some of its methods
  val submit = new SparkSubmit() {
    self => // define an alias for this, for use inside SparkSubmitArguments below

    override protected def parseArguments(args: Array[String]): SparkSubmitArguments = {
      // Override the logging methods of SparkSubmitArguments so that they
      // delegate to SparkSubmit's logInfo / logWarning
      new SparkSubmitArguments(args) {
        override protected def logInfo(msg: => String): Unit = self.logInfo(msg)

        override protected def logWarning(msg: => String): Unit = self.logWarning(msg)
      }
    }

    override protected def logInfo(msg: => String): Unit = printMessage(msg)

    override protected def logWarning(msg: => String): Unit = printMessage(s"Warning: $msg")

    override def doSubmit(args: Array[String]): Unit = {
      try {
        // Still calls the parent's doSubmit, just with extra exception handling added
        super.doSubmit(args)
      } catch {
        case e: SparkUserAppException =>
          exitFn(e.exitCode)
      }
    }
  }

  // Call SparkSubmit's doSubmit to submit the task
  submit.doSubmit(args)
}
```
- This code eventually calls SparkSubmit's doSubmit:

```scala
def doSubmit(args: Array[String]): Unit = {
  // Initialize logging if it hasn't been done yet. Keep track of whether logging needs to
  // be reset before the application starts.
  val uninitLog = initializeLogIfNecessary(true, silent = true)

  // parseArguments instantiates SparkSubmitArguments.
  // Note that the companion object above already overrode its logging methods.
  val appArgs = parseArguments(args)
  if (appArgs.verbose) {
    logInfo(appArgs.toString)
  }
  // When submitting a task, the action is SparkSubmitAction.SUBMIT.
  // If you are interested, SUBMIT is resolved by the loadEnvironmentArguments
  // method of SparkSubmitArguments.
  appArgs.action match {
    case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
    case SparkSubmitAction.KILL => kill(appArgs)
    case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
    case SparkSubmitAction.PRINT_VERSION => printVersion()
  }
}
```
- Since we are submitting a task, this code then calls submit:

```scala
private def submit(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {

  // Define doRunMain, which is called below and eventually calls runMain(...)
  def doRunMain(): Unit = {
    // proxyUser is the proxy user specified with --proxy-user.
    // It is mainly used to impersonate another user: e.g. the current user is jerry,
    // but you can act as tom and, bypassing your own permissions, work on tom's files.
    if (args.proxyUser != null) {
      val proxyUser = UserGroupInformation.createProxyUser(args.proxyUser,
        UserGroupInformation.getCurrentUser())
      try {
        proxyUser.doAs(new PrivilegedExceptionAction[Unit]() {
          override def run(): Unit = {
            runMain(args, uninitLog)
          }
        })
      } catch {
        // some code omitted
      }
    } else {
      runMain(args, uninitLog)
    }
  }

  // Check the launch mode; either way doRunMain() is called in the end
  if (args.isStandaloneCluster && args.useRest) {
    try {
      logInfo("Running Spark using the REST application submission protocol.")
      doRunMain()
    } catch {
      // some code omitted
    }
  } else {
    doRunMain()
  }
}
```
- We can see that submit mainly deals with whether a proxy user is used, and finally calls runMain(...), which is the core of SparkSubmit:

```scala
private def runMain(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
  // prepareSubmitEnvironment parses the arguments and essentially decides the launch mode
  val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)

  // some code omitted

  // Decide which ClassLoader to use, controlled by spark.driver.userClassPathFirst (default false)
  val loader =
    if (sparkConf.get(DRIVER_USER_CLASS_PATH_FIRST)) {
      // This ClassLoader prefers the jars provided by the user
      new ChildFirstURLClassLoader(new Array[URL](0),
        Thread.currentThread.getContextClassLoader)
    } else {
      // Default ClassLoader
      new MutableURLClassLoader(new Array[URL](0),
        Thread.currentThread.getContextClassLoader)
    }
  Thread.currentThread.setContextClassLoader(loader)

  for (jar <- childClasspath) {
    addJarToClasspath(jar, loader)
  }

  var mainClass: Class[_] = null

  try {
    // Load the class object named by childMainClass
    mainClass = Utils.classForName(childMainClass)
  } catch {
    // some code omitted
  }

  val app: SparkApplication = if (classOf[SparkApplication].isAssignableFrom(mainClass)) {
    // If mainClass is a SparkApplication, instantiate it directly
    mainClass.newInstance().asInstanceOf[SparkApplication]
  } else {
    // Otherwise wrap it in a JavaMainApplication
    if (classOf[scala.App].isAssignableFrom(mainClass)) {
      logWarning("Subclasses of scala.App may not work correctly. Use a main() method instead.")
    }
    new JavaMainApplication(mainClass)
  }

  // some code omitted

  try {
    // Call start, which reflectively invokes the class's main method
    app.start(childArgs.toArray, sparkConf)
  } catch {
    case t: Throwable =>
      throw findCause(t)
  }
}
```
- Clearly, the most important variable in runMain(...) is childMainClass, because it decides which class runs next. To see what it actually is, we follow prepareSubmitEnvironment(...), which has a variable of the same name that it eventually returns. The analysis below revolves around childMainClass (the `*_SUBMIT_CLASS` constants it can be set to are listed after the mode checks below).
- Client mode check

```scala
if (deployMode == CLIENT) {
  // In CLIENT mode, childMainClass is simply set to args.mainClass,
  // i.e. the class specified with --class at submission time
  childMainClass = args.mainClass
  if (localPrimaryResource != null && isUserJar(localPrimaryResource)) {
    childClasspath += localPrimaryResource
  }
  if (localJars != null) { childClasspath ++= localJars.split(",") }
}
```
- StandaloneCluster mode check

```scala
// First, check whether this is StandaloneCluster mode
if (args.isStandaloneCluster) {
  if (args.useRest) {
    // If REST is used, take org.apache.spark.deploy.rest.RestSubmissionClientApp
    childMainClass = REST_CLUSTER_SUBMIT_CLASS
    // Pass args.mainClass along
    childArgs += (args.primaryResource, args.mainClass)
  } else {
    // Otherwise, use org.apache.spark.deploy.ClientApp
    childMainClass = STANDALONE_CLUSTER_SUBMIT_CLASS
    if (args.supervise) { childArgs += "--supervise" }
    Option(args.driverMemory).foreach { m => childArgs += ("--memory", m) }
    Option(args.driverCores).foreach { c => childArgs += ("--cores", c) }
    childArgs += "launch"
    // Pass args.mainClass along
    childArgs += (args.master, args.primaryResource, args.mainClass)
  }
  if (args.childArgs != null) {
    childArgs ++= args.childArgs
  }
}
```
- YarnCluster mode check

```scala
if (isYarnCluster) {
  // In YarnCluster mode, use org.apache.spark.deploy.yarn.YarnClusterApplication
  childMainClass = YARN_CLUSTER_SUBMIT_CLASS
  if (args.isPython) {
    childArgs += ("--primary-py-file", args.primaryResource)
    childArgs += ("--class", "org.apache.spark.deploy.PythonRunner")
  } else if (args.isR) {
    val mainFile = new Path(args.primaryResource).getName
    childArgs += ("--primary-r-file", mainFile)
    childArgs += ("--class", "org.apache.spark.deploy.RRunner")
  } else {
    if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
      childArgs += ("--jar", args.primaryResource)
    }
    // Pass args.mainClass along
    childArgs += ("--class", args.mainClass)
  }
  if (args.childArgs != null) {
    args.childArgs.foreach { arg => childArgs += ("--arg", arg) }
  }
}
```
- The remaining modes (MesosCluster, KubernetesCluster) follow the same pattern and can be examined on your own.
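For reference, the `*_SUBMIT_CLASS` constants assigned above are just class-name constants defined on the SparkSubmit object. A small self-contained sketch of what they resolve to, based on Spark 2.4.x (the object name SubmitClassSketch is invented here; verify the exact definitions in your own source tree):

```scala
// Sketch (Spark 2.4.x): the class names that prepareSubmitEnvironment can assign
// to childMainClass, one per cluster-mode flavour.
object SubmitClassSketch {
  val RestCluster       = "org.apache.spark.deploy.rest.RestSubmissionClientApp"
  val StandaloneCluster = "org.apache.spark.deploy.ClientApp"
  val YarnCluster       = "org.apache.spark.deploy.yarn.YarnClusterApplication"
  val KubernetesCluster = "org.apache.spark.deploy.k8s.submit.KubernetesClientApplication"

  def main(args: Array[String]): Unit = {
    Seq(RestCluster, StandaloneCluster, YarnCluster, KubernetesCluster).foreach(println)
  }
}
```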
- We can see that in client mode the main method of the user's class is invoked directly. In cluster mode, depending on the deployment, RestSubmissionClientApp, ClientApp, YarnClusterApplication, KubernetesClientApplication, etc. take over the next step.
- In cluster mode these SparkApplications are started locally; each one requests a node from the cluster and starts the Driver on that node (which then calls the main method of the user's class). Let's look at the source of these SparkApplications.
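All of these entry points implement the SparkApplication trait that runMain() invokes through app.start(...), and JavaMainApplication is the wrapper used for a plain --class in client mode. A minimal sketch of both, paraphrased from Spark 2.4.x (access modifiers and packaging are simplified here):

```scala
import java.lang.reflect.Modifier

import org.apache.spark.SparkConf

// The common entry-point contract: runMain() only ever calls start().
trait SparkApplication {
  def start(args: Array[String], conf: SparkConf): Unit
}

// Wrapper used when --class names a plain "main class" rather than a SparkApplication:
// it copies the SparkConf entries into system properties and reflectively calls main().
class JavaMainApplication(klass: Class[_]) extends SparkApplication {

  override def start(args: Array[String], conf: SparkConf): Unit = {
    val mainMethod = klass.getMethod("main", classOf[Array[String]])
    if (!Modifier.isStatic(mainMethod.getModifiers)) {
      throw new IllegalStateException("The main method in the given main class must be static")
    }

    conf.getAll.foreach { case (k, v) => sys.props(k) = v }

    mainMethod.invoke(null, args)
  }
}
```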
ClientApp in Standalone Mode
- org.apache.spark.deploy.ClientApp
- Its code is as follows:

```scala
private[spark] class ClientApp extends SparkApplication {

  override def start(args: Array[String], conf: SparkConf): Unit = {
    // ClientArguments internally calls parse(args.toList) to parse the arguments
    val driverArgs = new ClientArguments(args)

    if (!conf.contains("spark.rpc.askTimeout")) {
      conf.set("spark.rpc.askTimeout", "10s")
    }
    Logger.getRootLogger.setLevel(driverArgs.logLevel)

    // Create the NettyRpcEnv
    val rpcEnv =
      RpcEnv.create("driverClient", Utils.localHostName(), 0, conf, new SecurityManager(conf))

    // Use the Master URLs to obtain their RpcEndpointRefs
    val masterEndpoints = driverArgs.masters.map(RpcAddress.fromSparkURL).
      map(rpcEnv.setupEndpointRef(_, Master.ENDPOINT_NAME))
    // Instantiate and register the ClientEndpoint
    rpcEnv.setupEndpoint("client", new ClientEndpoint(rpcEnv, driverArgs, masterEndpoints, conf))

    rpcEnv.awaitTermination()
  }
}
```
- This code involves the RPC communication flow covered earlier in Spark源碼剖析——RpcEndpoint、RpcEnv; readers unfamiliar with it may want to read that article first.
- Once ClientEndpoint is instantiated, its onStart method is invoked:

```scala
override def onStart(): Unit = {
  driverArgs.cmd match {
    case "launch" =>
      // Remember this class, DriverWrapper: it will be launched later
      // and will invoke the main method of the user's class
      val mainClass = "org.apache.spark.deploy.worker.DriverWrapper"

      // some code omitted

      // Build the Command.
      // The driverArgs.mainClass passed in here is the user's class
      val command = new Command(mainClass,
        Seq("{{WORKER_URL}}", "{{USER_JAR}}", driverArgs.mainClass) ++ driverArgs.driverOptions,
        sys.env, classPathEntries, libraryPathEntries, javaOpts)

      val driverDescription = new DriverDescription(
        driverArgs.jarUrl,
        driverArgs.memory,
        driverArgs.cores,
        driverArgs.supervise,
        command)
      // Send a RequestSubmitDriver message to the Master
      asyncSendToMasterAndForwardReply[SubmitDriverResponse](
        RequestSubmitDriver(driverDescription))

    case "kill" =>
      val driverId = driverArgs.driverId
      asyncSendToMasterAndForwardReply[KillDriverResponse](RequestKillDriver(driverId))
  }
}
```
- Next, let's see what the Master does in receiveAndReply when it receives the RequestSubmitDriver message:

```scala
override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  case RequestSubmitDriver(description) =>
    // In high-availability mode there are multiple Masters, hence different states
    if (state != RecoveryState.ALIVE) {
      // If this Master is not ALIVE, reply with a SubmitDriverResponse carrying a failure message
      val msg = s"${Utils.BACKUP_STANDALONE_MASTER_PREFIX}: $state. " +
        "Can only accept driver submissions in ALIVE state."
      context.reply(SubmitDriverResponse(self, false, None, msg))
    } else {
      logInfo("Driver submitted " + description.command.mainClass)
      // Create the driver from the received description; this mainly builds a DriverInfo
      val driver = createDriver(description)
      persistenceEngine.addDriver(driver)
      waitingDrivers += driver
      drivers.add(driver)
      // The code that actually launches the Driver
      schedule()

      // Reply with a SubmitDriverResponse containing driver.id
      context.reply(SubmitDriverResponse(self, true, Some(driver.id),
        s"Driver successfully submitted as ${driver.id}"))
    }

  // some code omitted
}
```
- After receiving RequestSubmitDriver in receiveAndReply, the Master builds a DriverInfo and calls schedule() to create the Driver. Let's look at schedule(), which actually triggers the launch:

```scala
private def schedule(): Unit = {
  if (state != RecoveryState.ALIVE) {
    return
  }
  // Take the available workers and shuffle them randomly.
  // Shuffling prevents too many drivers from piling onto a few workers,
  // spreading the drivers evenly across the cluster.
  val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
  val numWorkersAlive = shuffledAliveWorkers.size
  var curPos = 0
  // Iterate over waitingDrivers, i.e. the DriverInfos built earlier
  for (driver <- waitingDrivers.toList) {
    var launched = false
    var numWorkersVisited = 0
    // Keep going while there are unvisited workers and the driver has not been launched
    while (numWorkersVisited < numWorkersAlive && !launched) {
      // Take one worker from the shuffled sequence
      val worker = shuffledAliveWorkers(curPos)
      numWorkersVisited += 1
      // The worker's free memory must cover the requested memory,
      // and its free cores must cover the requested cores
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        // If both conditions hold, launch the driver on this worker
        launchDriver(worker, driver)
        // Remove the launched driver from the waiting list to avoid launching it twice
        waitingDrivers -= driver
        launched = true
      }
      curPos = (curPos + 1) % numWorkersAlive
    }
  }
  // This launches the Executors, which we'll ignore for now
  startExecutorsOnWorkers()
}
```
- This code first collects the available workers and shuffles them. It then iterates over the previously built DriverInfos, takes a worker and checks whether it satisfies the requirements (memory, cores). If so, launchDriver(...) is called to start the Driver on that worker.

```scala
private def launchDriver(worker: WorkerInfo, driver: DriverInfo) {
  logInfo("Launching driver " + driver.id + " on worker " + worker.id)
  worker.addDriver(driver)
  driver.worker = Some(worker)
  // Send a LaunchDriver message to the worker node
  worker.endpoint.send(LaunchDriver(driver.id, driver.desc))
  driver.state = DriverState.RUNNING
}
```
- In launchDriver the Master sends a LaunchDriver message to the worker. Let's see what the Worker does when it receives that message:

```scala
override def receive: PartialFunction[Any, Unit] = synchronized {
  // some code omitted

  case LaunchDriver(driverId, driverDesc) =>
    logInfo(s"Asked to launch driver $driverId")
    // Build a DriverRunner and call start to launch it
    val driver = new DriverRunner(
      conf,
      driverId,
      workDir,
      sparkHome,
      driverDesc.copy(command = Worker.maybeUpdateSSLSettings(driverDesc.command, conf)),
      self,
      workerUri,
      securityMgr)
    drivers(driverId) = driver
    driver.start()

    // Update the resources already used on this worker
    coresUsed += driverDesc.cores
    memoryUsed += driverDesc.mem

  // some code omitted
}
```
- When the Worker receives LaunchDriver, it builds a DriverRunner and calls its start method. start creates and starts a new Thread, whose main job is to call prepareAndRunDriver(), which uses a ProcessBuilder to start a new process and then blocks on it.
- The command used to start that process is the command inside the driverDesc passed to the DriverRunner. It was originally sent by the ClientEndpoint (in the client application) to the Master, which forwarded it to the Worker. It is exactly the org.apache.spark.deploy.worker.DriverWrapper we asked you to remember (see the onStart method of ClientEndpoint above), and it also carries the user's class. A simplified sketch of this process launch is shown below.
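The essence of what DriverRunner does with that command, reduced to a hypothetical, self-contained sketch (the object and method names here are invented for illustration; the real DriverRunner additionally handles retries, supervision, and redirecting output to the driver's log files):

```scala
import java.io.File

import scala.collection.JavaConverters._

object DriverLaunchSketch {

  // Turn the Command received from the Master into an OS process
  // (as DriverRunner's prepareAndRunDriver does) and block until it exits.
  def runDriverCommand(command: Seq[String], workDir: File): Int = {
    val builder = new ProcessBuilder(command.asJava)
      .directory(workDir)
      .redirectErrorStream(true) // merge stderr into stdout for simplicity
    val process = builder.start()
    process.waitFor() // block here until the DriverWrapper process finishes
  }

  def main(args: Array[String]): Unit = {
    val exitCode = runDriverCommand(Seq("java", "-version"), new File("."))
    println(s"driver process exited with code $exitCode")
  }
}
```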
- Finally, we arrive at DriverWrapper's main method:

```scala
def main(args: Array[String]) {
  args.toList match {
    case workerUrl :: userJar :: mainClass :: extraArgs =>
      val conf = new SparkConf()
      val host: String = Utils.localHostName()
      val port: Int = sys.props.getOrElse("spark.driver.port", "0").toInt
      // Create the NettyRpcEnv
      val rpcEnv = RpcEnv.create("Driver", host, port, conf, new SecurityManager(conf))
      logInfo(s"Driver address: ${rpcEnv.address}")
      // Instantiate and register the WorkerWatcher
      rpcEnv.setupEndpoint("workerWatcher", new WorkerWatcher(rpcEnv, workerUrl))

      val currentLoader = Thread.currentThread.getContextClassLoader
      val userJarUrl = new File(userJar).toURI().toURL()
      // Same as runMain() in SparkSubmit:
      // choose the ClassLoader according to spark.driver.userClassPathFirst
      val loader =
        if (sys.props.getOrElse("spark.driver.userClassPathFirst", "false").toBoolean) {
          // This ClassLoader prefers the jars provided by the user
          new ChildFirstURLClassLoader(Array(userJarUrl), currentLoader)
        } else {
          new MutableURLClassLoader(Array(userJarUrl), currentLoader)
        }
      Thread.currentThread.setContextClassLoader(loader)
      setupDependencies(loader, userJar)

      // Reflectively call the main method of the user's class
      val clazz = Utils.classForName(mainClass)
      val mainMethod = clazz.getMethod("main", classOf[Array[String]])
      mainMethod.invoke(null, extraArgs.toArray[String])

      rpcEnv.shutdown()

    case _ =>
      // scalastyle:off println
      System.err.println("Usage: DriverWrapper <workerUrl> <userJar> <driverMainClass> [options]")
      // scalastyle:on println
      System.exit(-1)
  }
}
```
- At this point DriverWrapper has been started on the Worker and, at the end, reflectively invokes the main method of the user's class; from here on, everything proceeds just as in client mode.
YarnClusterApplication in YARN Mode
- org.apache.spark.deploy.yarn.YarnClusterApplication
- Its start method is very simple: it instantiates a Client locally and calls its run method, roughly as sketched below.
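A sketch of that start method, paraphrased from the Spark 2.4.x yarn module (check your own source tree for the exact code):

```scala
private[spark] class YarnClusterApplication extends SparkApplication {

  override def start(args: Array[String], conf: SparkConf): Unit = {
    // In yarn mode SparkSubmit distributes files and jars through the YARN cache,
    // so these entries are removed from the SparkConf here.
    conf.remove("spark.jars")
    conf.remove("spark.files")

    new Client(new ClientArguments(args), conf).run()
  }
}
```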
- Now let's look at the code of org.apache.spark.deploy.yarn.Client:

```scala
def run(): Unit = {
  // Submit the application to the ResourceManager
  this.appId = submitApplication()
  if (!launcherBackend.isConnected() && fireAndForget) {
    // If it failed, throw a SparkException
    val report = getApplicationReport(appId)
    val state = report.getYarnApplicationState
    logInfo(s"Application report for $appId (state: $state)")
    logInfo(formatReportDetails(report))
    if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
      throw new SparkException(s"Application $appId finished with status: $state")
    }
  } else {
    // Otherwise, monitor the application's state until it finishes
    val YarnAppReport(appState, finalState, diags) = monitorApplication(appId)
    if (appState == YarnApplicationState.FAILED || finalState == FinalApplicationStatus.FAILED) {
      diags.foreach { err =>
        logError(s"Application diagnostics message: $err")
      }
      throw new SparkException(s"Application $appId finished with failed status")
    }
    if (appState == YarnApplicationState.KILLED || finalState == FinalApplicationStatus.KILLED) {
      throw new SparkException(s"Application $appId is killed")
    }
    if (finalState == FinalApplicationStatus.UNDEFINED) {
      throw new SparkException(s"The final status of application $appId is undefined")
    }
  }
}
```
- The most important part is the call to submitApplication, which asks YARN's ResourceManager for resources to run the ApplicationMaster:

```scala
def submitApplication(): ApplicationId = {
  var appId: ApplicationId = null
  try {
    // launcherBackend is instantiated when the Client is instantiated
    launcherBackend.connect()
    // yarnClient is created via YarnClient.createYarnClient when the Client is instantiated
    yarnClient.init(hadoopConf)
    yarnClient.start()

    logInfo("Requesting a new application from cluster with %d NodeManagers"
      .format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))

    // Ask the ResourceManager to create a new application
    val newApp = yarnClient.createApplication()
    val newAppResponse = newApp.getNewApplicationResponse()
    appId = newAppResponse.getApplicationId()

    new CallerContext("CLIENT", sparkConf.get(APP_CALLER_CONTEXT),
      Option(appId.toString)).setCurrentContext()

    // Verify that the YARN cluster has enough resources to run the ApplicationMaster
    verifyClusterResources(newAppResponse)

    // Set up the context used to launch the ApplicationMaster.
    // This mainly includes:
    // 1. parsing some of the arguments, e.g. --class, --jar, etc.
    // 2. building the java command (passed in via amContainer.setCommands) that the
    //    ApplicationMaster will use to launch the user's class
    // 3. the resource request configuration for the ApplicationMaster
    val containerContext = createContainerLaunchContext(newAppResponse)
    val appContext = createApplicationSubmissionContext(newApp, containerContext)

    logInfo(s"Submitting application $appId to ResourceManager")
    // Launch the ApplicationMaster
    yarnClient.submitApplication(appContext)
    launcherBackend.setAppId(appId.toString)
    reportLauncherState(SparkAppHandle.State.SUBMITTED)

    appId
  } catch {
    // some code omitted
  }
}
```
- submitApplication() connects to YARN, then requests and creates the ApplicationMaster, passing in the command that will invoke java to start the user's class. That command is later executed inside the ApplicationMaster, and from there everything proceeds just as in client mode.
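To make that last step concrete, here is a heavily simplified, hypothetical sketch of the kind of launch command createContainerLaunchContext assembles (the amLaunchCommand helper and the placeholders are invented for illustration; the real code adds many more JVM options, environment expansions, and resource settings):

```scala
// Illustrative only: roughly the shape of the container launch command for the
// ApplicationMaster, which then runs the user's --class inside the YARN container.
object AmCommandSketch {

  def amLaunchCommand(userClass: String, userJar: String, amMemoryMb: Int): Seq[String] = {
    Seq(
      "{{JAVA_HOME}}/bin/java",
      "-server",
      s"-Xmx${amMemoryMb}m",
      "org.apache.spark.deploy.yarn.ApplicationMaster",
      "--class", userClass, // the user's main class, forwarded from SparkSubmit's childArgs
      "--jar", userJar,
      "1>", "<LOG_DIR>/stdout",
      "2>", "<LOG_DIR>/stderr"
    )
  }

  def main(args: Array[String]): Unit = {
    println(amLaunchCommand("com.skey.spark.app.MyApp", "/home/jerry/spark-demo.jar", 1024)
      .mkString(" "))
  }
}
```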