Submitting a Spark application from a production client, using the spark-submit script
spark.version: 2.4.0
scala.version: 2.12
Source walkthrough:
spark-submit:
--main()
new SparkSubmit()
Parse the arguments
->parseArguments -> SparkSubmitArguments
Pattern match on the action: submit
action match -> SUBMIT -> submit()
Prepare the submit environment
prepareSubmitEnvironment
childMainClass = YARN_CLUSTER_SUBMIT_CLASS = "org.apache.spark.deploy.yarn.YarnClusterApplication"
runMain() -> mainClass = Utils.classForName(childMainClass) # hand off to YARN
-> app.start() -> mainMethod.invoke() # invoked via reflection
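The reflective hand-off in runMain() can be sketched as follows. This is a minimal self-contained sketch, not Spark's actual code: `DemoApp` is a hypothetical stand-in for the child main class, and plain `Class.forName` stands in for Spark's `Utils.classForName`.

```scala
// Hypothetical stand-in for the child main class resolved by runMain()
object DemoApp {
  def main(args: Array[String]): Unit =
    println(s"started with: ${args.mkString(" ")}")
}

object ReflectiveLaunch {
  def main(args: Array[String]): Unit = {
    // Utils.classForName is essentially Class.forName with Spark's class loader;
    // a Scala object compiles to a class with a static main forwarder
    val mainClass  = Class.forName("DemoApp")
    val mainMethod = mainClass.getMethod("main", classOf[Array[String]])
    // invoke the static main; the receiver is null for a static method
    mainMethod.invoke(null, Array("--deploy-mode", "cluster"))
  }
}
```

This mirrors the pattern the notes describe: resolve the class by name, then call its entry point via `mainMethod.invoke()`.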
yarn: org.apache.spark.deploy.yarn.YarnClusterApplication
Overrides the start() method of SparkApplication
class YarnClusterApplication extends SparkApplication
# Start the YARN client; at this point yarnClient is not running inside YARN, it runs in the same process as spark-submit
start() -> new Client(new ClientArguments(args), conf).run()
new ClientArguments(args) # prepare the resource configuration
new Client() ->
Prepare the YARN client
val yarnClient = YarnClient.createYarnClient -> YarnClient client = new YarnClientImpl()
Cluster configuration
run() -> submitApplication() [set up the backend connection; initialize the YARN client; start the YARN client]
// Get a new application from our RM  (the application id comes from the ResourceManager; the AM container is launched later on one of the NodeManagers)
val newApp = yarnClient.createApplication()
// Set up the appropriate contexts to launch our AM
val containerContext = createContainerLaunchContext(newAppResponse)
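The YARN-side calls that submitApplication() makes can be sketched with the Hadoop YARN client API directly. This is a sketch under the assumption that `hadoop-yarn-client` is on the classpath; error handling and all the context fields Spark fills in are omitted.

```scala
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

// Sketch of the RM interaction inside Client.submitApplication()
val conf = new YarnConfiguration()
val yarnClient = YarnClient.createYarnClient()
yarnClient.init(conf)   // initialize with the cluster configuration
yarnClient.start()      // open the connection to the ResourceManager

// Ask the RM for a new application id; no container exists yet at this point
val newApp = yarnClient.createApplication()
val appContext = newApp.getApplicationSubmissionContext

// In Spark, createContainerLaunchContext / createApplicationSubmissionContext
// fill in the AM command line and resources here, then submit:
yarnClient.submitApplication(appContext)
```

Only after this submit does the ResourceManager pick a NodeManager and launch the AM container there.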
createContainerLaunchContext
val amClass =
if (isClusterMode) {
Utils.classForName("org.apache.spark.deploy.yarn.ApplicationMaster").getName
} else {
# With spark-shell, `xcall jps` shows ExecutorLauncher: spark-shell cannot run in cluster mode, only client mode
Utils.classForName("org.apache.spark.deploy.yarn.ExecutorLauncher").getName
}
org.apache.spark.deploy.yarn.ApplicationMaster
ApplicationMaster -> val amArgs = new ApplicationMasterArguments(args)
master = new ApplicationMaster(amArgs)
Start the ApplicationMaster
System.exit(master.run())
run() -> runImpl() -> runDriver() / runExecutorLauncher() ->
userClassThread = startUserApplication() -> userThread.setName("Driver"); userThread.start()
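startUserApplication() runs the user's main() in a plain JVM thread named "Driver" inside the ApplicationMaster process, which is why `jps` shows no separate Driver process in cluster mode. A simplified sketch (the user class name and arguments are hypothetical; the real code also propagates exceptions back to the AM):

```scala
// Sketch of startUserApplication(): run the user's main() on a thread named "Driver"
val userArgs = Array("input-path")  // hypothetical user arguments

val userThread = new Thread {
  override def run(): Unit = {
    val mainMethod = Class.forName("com.example.UserApp")  // hypothetical user class
      .getMethod("main", classOf[Array[String]])
    mainMethod.invoke(null, userArgs)
  }
}
userThread.setName("Driver")
userThread.start()
```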
# Allocate resources
createAllocator -> allocator.allocateResources()
handleAllocatedContainers(allocatedContainers.asScala)
runAllocatedContainers(containersToUse)
ExecutorRunnable.run() ->
# Launch a container on one of the NodeManagers
nmClient = NMClient.createNMClient()
nmClient.init(conf)
nmClient.start()
startContainer() -> prepareCommand() -> org.apache.spark.executor.CoarseGrainedExecutorBackend
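prepareCommand() assembles the java command line that the NodeManager executes to start the executor JVM. An illustrative sketch of its shape (all values are placeholders, not literal Spark output):

```scala
// Illustrative shape of the command prepareCommand() builds; the NodeManager
// substitutes {{...}} environment placeholders before executing it
val commands = List(
  "{{JAVA_HOME}}/bin/java",
  "-server",
  "-Xmx1024m",
  "org.apache.spark.executor.CoarseGrainedExecutorBackend",
  "--driver-url", "spark://CoarseGrainedScheduler@driver-host:7077",
  "--executor-id", "1",
  "--hostname", "{{HOSTNAME}}",
  "--cores", "2",
  "--app-id", "application_0000000000000_0001"
)
```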
-> main() -> run() -> env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(...))
# The executor first registers with the driver, receives the driver's registration-success response, and finally launches tasks
-> onStart() -> receive: [RegisteredExecutor, RegisterExecutorFailed, LaunchTask]
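The onStart()/receive handshake above can be sketched as a simplified message handler. The message types here are simplified stand-ins for Spark's real case classes in CoarseGrainedClusterMessages, and the RPC machinery is reduced to a plain function:

```scala
// Simplified stand-ins for the RPC messages exchanged during executor startup
sealed trait Message
case object RegisteredExecutor extends Message
case class RegisterExecutorFailed(reason: String) extends Message
case class LaunchTask(taskId: Long) extends Message

// Sketch of CoarseGrainedExecutorBackend.receive: onStart() has already sent
// RegisterExecutor to the driver; these are the driver's possible responses
def receive(msg: Message): String = msg match {
  case RegisteredExecutor          => "create the Executor, ready for tasks"
  case RegisterExecutorFailed(why) => s"exit: registration failed ($why)"
  case LaunchTask(id)              => s"deserialize and run task $id"
}

println(receive(RegisteredExecutor))
println(receive(LaunchTask(42)))
```

The real backend exits the JVM on RegisterExecutorFailed and only accepts LaunchTask after a successful registration.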