Spark Architecture
1. Standalone Architecture
- The cluster is divided into Master nodes and Worker nodes, analogous to Hadoop's Master and Slave nodes.
- A Master daemon runs resident on the Master node and manages all of the Worker nodes.
- A Worker daemon runs resident on each Worker node; it communicates with the Master node and manages the executors on that node.
- The official definition of the Driver is "The process running the main() function of the application and creating the SparkContext". The Application is the Spark program (driver program) written by the user, such as WordCount.scala. If the driver program runs on the Master (for example, running SparkPi there), then SparkPi is the Driver on the Master. In a YARN cluster, the Driver may instead be scheduled onto a Worker node (for example, Worker Node 2 in the figure above).
- One or more ExecutorBackend processes exist on each Worker (a given application has only one ExecutorBackend per Worker). Each process contains one Executor object, which holds a thread pool; each thread can execute one task.
- Each application consists of one driver and multiple executors; all tasks running inside a given executor belong to the same application.
- The Worker controls the start and stop of CoarseGrainedExecutorBackend through the ExecutorRunner thread it holds.
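The task-slot model described in the bullets above can be sketched in a few lines of Scala. This is an illustrative sketch, not Spark's actual code: `MiniExecutor` is a hypothetical name, and the fixed-size thread pool stands in for the Executor's task thread pool, where pool size corresponds to the executor's core count and each thread runs one task at a time.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch: an Executor holds a thread pool sized to its core
// count; each pool thread executes one task at a time.
class MiniExecutor(cores: Int) {
  private val pool = Executors.newFixedThreadPool(cores)
  private val completed = new AtomicInteger(0)

  def launchTask(taskId: Int): Unit = {
    pool.submit(new Runnable {
      // A real Executor would deserialize and run a Task here.
      def run(): Unit = completed.incrementAndGet()
    })
  }

  def shutdownAndCount(): Int = {
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.SECONDS)
    completed.get()
  }
}

val exec = new MiniExecutor(cores = 4)
(1 to 8).foreach(exec.launchTask)
println(exec.shutdownAndCount())  // 8
```

With 4 cores, at most 4 of the 8 submitted tasks run concurrently; the rest queue, which is exactly why the number of cores per executor bounds task parallelism.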
2. Spark on YARN Architecture
The first figure above shows yarn-client mode; the second shows yarn-cluster mode.
At a deeper level, the difference between yarn-cluster and yarn-client mode comes down to where the Spark Driver runs. In yarn-cluster mode, the Driver runs inside the AM (Application Master), which requests resources from YARN and monitors the job's progress; once the job has been submitted, the Client can be closed and the job keeps running on YARN. For the same reason, yarn-cluster mode is unsuitable for interactive jobs. In yarn-client mode, the Driver runs in the Client, and the application's run-time information is visible there, which means the Client cannot be shut down.
Note: this article and the following ones analyze everything under the standalone architecture.
Job Submission
export HADOOP_CONF_DIR=/home/fw/hadoop-2.2.0/etc/hadoop
$SPARK_HOME/bin/spark-submit --class cn.edu.hdu.threshold.Threshold \
--master yarn-client \
--num-executors 2 \
--driver-memory 4g \
--executor-memory 4g \
--executor-cores 4 \
/home/fw/Cluster2.jar
Opening the spark-submit script, you will find that it ultimately invokes the org.apache.spark.deploy.SparkSubmit class.
SparkSubmit.main
def main(args: Array[String]) {
  val appArgs = new SparkSubmitArguments(args)
  if (appArgs.verbose) {
    printStream.println(appArgs)
  }
  val (childArgs, classpath, sysProps, mainClass) = createLaunchEnv(appArgs)
  launch(childArgs, classpath, sysProps, mainClass, appArgs.verbose)
}
First, look at the createLaunchEnv method.
private[spark] def createLaunchEnv(args: SparkSubmitArguments)
    : (ArrayBuffer[String], ArrayBuffer[String], Map[String, String], String) = {
  if (args.master.startsWith("local")) {
    clusterManager = LOCAL
  } else if (args.master.startsWith("yarn")) {
    clusterManager = YARN
  } else if (args.master.startsWith("spark")) {
    clusterManager = STANDALONE
  } else if (args.master.startsWith("mesos")) {
    clusterManager = MESOS
  } else {
    printErrorAndExit("Master must start with yarn, mesos, spark, or local")
  }
  ......
  (childArgs, childClasspath, sysProps, childMainClass)
}
This method reads the submitted run-time parameters and decides whether the job runs on Standalone or on Spark on YARN. The childMainClass variable it returns is important: under the Standalone architecture, childMainClass is org.apache.spark.deploy.Client; under Spark on YARN, it is org.apache.spark.deploy.yarn.Client.
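The prefix dispatch above can be condensed into a small illustrative function. The class-name strings are the ones quoted in this section; `launcherClassFor` itself is our name, not Spark's:

```scala
// Illustrative sketch of how the --master prefix selects childMainClass
// (only the two cases discussed in this article are shown).
def launcherClassFor(master: String): String =
  if (master.startsWith("spark")) "org.apache.spark.deploy.Client"          // Standalone
  else if (master.startsWith("yarn")) "org.apache.spark.deploy.yarn.Client" // Spark on YARN
  else sys.error("Master must start with yarn, mesos, spark, or local")

println(launcherClassFor("spark://master:7077"))  // org.apache.spark.deploy.Client
println(launcherClassFor("yarn-client"))          // org.apache.spark.deploy.yarn.Client
```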
Next comes the launch method.
private def launch(
    childArgs: ArrayBuffer[String],
    childClasspath: ArrayBuffer[String],
    sysProps: Map[String, String],
    childMainClass: String,
    verbose: Boolean = false) {
  ......
  val mainClass = Class.forName(childMainClass, true, loader)
  val mainMethod = mainClass.getMethod("main", new Array[String](0).getClass)
  try {
    mainMethod.invoke(null, childArgs.toArray)
  } catch {
    case e: InvocationTargetException => e.getCause match {
      case cause: Throwable => throw cause
      case null => throw e
    }
  }
}
It uses reflection to load and run childMainClass.
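The same Class.forName / getMethod / invoke pattern can be demonstrated on a plain JDK static method instead of a Spark launcher class; the mechanics are identical to how launch() calls childMainClass's main(), with a static method invoked on a null receiver:

```scala
// Load a class by name, look up a static method, and invoke it reflectively,
// mirroring the launch() code above. Integer.parseInt stands in for main().
val clazz  = Class.forName("java.lang.Integer")
val method = clazz.getMethod("parseInt", classOf[String])
val result = method.invoke(null, "42")  // null receiver = static invocation
println(result)  // 42
```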
Starting the Client
def main(args: Array[String]) {
  ......
  // TODO: See if we can initialize akka so return messages are sent back using the same TCP
  // flow. Else, this (sadly) requires the DriverClient be routable from the Master.
  val (actorSystem, _) = AkkaUtils.createActorSystem(
    "driverClient", Utils.localHostName(), 0, conf, new SecurityManager(conf))
  actorSystem.actorOf(Props(classOf[ClientActor], driverArgs, conf))
  actorSystem.awaitTermination()
}
Before going back to Client, here is a minimal Akka example showing how an Actor is defined and how messages are sent to it:
// Define a case class used to carry the message payload
case class Greeting(who: String)
// Define an Actor; the key method is receive, which handles incoming messages
class GreetingActor extends Actor with ActorLogging {
  def receive = {
    case Greeting(who) => log.info("Hello " + who)
  }
}
// Create an ActorSystem
val system = ActorSystem("MySystem")
// Create an Actor inside the ActorSystem
val greeter = system.actorOf(Props[GreetingActor], name = "greeter")
// Send greeter a message, carried by a Greeting
greeter ! Greeting("Charlie Parker")
Now look again at Client's main function: it creates the ClientActor class. Like every Actor, ClientActor has two key methods, preStart and receive; the first performs initialization, the second handles messages sent by other Actors.
override def preStart() = {
  masterActor = context.actorSelection(Master.toAkkaUrl(driverArgs.master))
  ......
  driverArgs.cmd match {
    case "launch" =>
      ......
      masterActor ! RequestSubmitDriver(driverDescription)
    case "kill" =>
      val driverId = driverArgs.driverId
      masterActor ! RequestKillDriver(driverId)
  }
}
1. Obtaining a remote reference to the Master (via actorSelection) to launch the Driver. On the Master side:
override def receive = {
  ......
  case RequestSubmitDriver(description) => {
    if (state != RecoveryState.ALIVE) {
      val msg = s"Can only accept driver submissions in ALIVE state. Current state: $state."
      sender ! SubmitDriverResponse(false, None, msg)
    } else {
      logInfo("Driver submitted " + description.command.mainClass)
      val driver = createDriver(description)
      persistenceEngine.addDriver(driver)
      waitingDrivers += driver
      drivers.add(driver)
      schedule()
      // TODO: It might be good to instead have the submission client poll the master to determine
      // the current status of the driver. For now it's simply "fire and forget".
      sender ! SubmitDriverResponse(true, Some(driver.id),
        s"Driver successfully submitted as ${driver.id}")
    }
  }
}
Inside schedule(), the Master assigns each waiting driver to a Worker and calls launchDriver:
def launchDriver(worker: WorkerInfo, driver: DriverInfo) {
  logInfo("Launching driver " + driver.id + " on worker " + worker.id)
  worker.addDriver(driver)
  driver.worker = Some(worker)
  worker.actor ! LaunchDriver(driver.id, driver.desc)
  driver.state = DriverState.RUNNING
}
The Master thereby notifies the Worker node to start the Driver. On the Worker side:
override def receive = {
  ......
  case LaunchDriver(driverId, driverDesc) => {
    logInfo(s"Asked to launch driver $driverId")
    val driver = new DriverRunner(driverId, workDir, sparkHome, driverDesc, self, akkaUrl)
    drivers(driverId) = driver
    driver.start()
    coresUsed += driverDesc.cores
    memoryUsed += driverDesc.mem
  }
  ......
}
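The DriverRunner idea, i.e. each driver gets a dedicated thread on the Worker that runs it and records when it finishes, can be sketched in plain Scala. `MiniDriverRunner` is our illustrative name; the real DriverRunner also downloads the user jar and launches the driver as a separate process:

```scala
// Sketch: a per-driver thread on the Worker, named like the real
// DriverRunner's thread, that runs the driver body and marks completion.
class MiniDriverRunner(driverId: String, body: () => Unit) {
  @volatile var finished = false
  private val thread = new Thread(s"DriverRunner for $driverId") {
    override def run(): Unit = { body(); finished = true }
  }
  def start(): Unit = thread.start()
  def join(): Unit = thread.join()
}

val runner = new MiniDriverRunner("driver-001", () => println("driver main() ran"))
runner.start()
runner.join()
println(runner.finished)  // true
```

Running the driver on its own thread is what lets the Worker's receive loop return immediately and keep handling other messages while the driver runs.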