Since Spark 1.6, communication between the Driver-side BlockManagerMaster and the BlockManager no longer goes through AkkaUtil but through RpcEndpoint.
A Spark cluster runs many executors concurrently. CoarseGrainedExecutorBackend is the process an Executor lives in; the Executor is maintained and managed by its CoarseGrainedExecutorBackend. On startup, CoarseGrainedExecutorBackend registers with the Driver by sending a RegisterExecutor message. RegisterExecutor is a case class; its source:
// The Executor information registered with the Driver
case class RegisterExecutor(
    executorId: String,
    executorRef: RpcEndpointRef,
    hostPort: String,
    cores: Int,
    logUrls: Map[String, String])
  extends CoarseGrainedClusterMessage
Sending the RegisterExecutor registration message is implemented in CoarseGrainedExecutorBackend's onStart method:
override def onStart() {
  rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
    driver = Some(ref)
    ref.ask[RegisterExecutorResponse](
      RegisterExecutor(executorId, self, hostPort, cores, extractLogUrls))
  }(ThreadUtils.sameThread).onComplete {
    case Success(msg) => Utils.tryLogNonFatalError {
      Option(self).foreach(_.send(msg)) // msg must be RegisterExecutorResponse
    }
    case Failure(e) => {
      logError(s"Cannot register with driver: $driverUrl", e)
      System.exit(1)
    }
  }(ThreadUtils.sameThread)
}
On startup, CoarseGrainedExecutorBackend sends the RegisterExecutor message to the Driver to register; once the Executor registers successfully, the Driver replies to CoarseGrainedExecutorBackend with a RegisteredExecutor message. Note that what is registered here is not the Executor that does the actual work: it is really the ExecutorBackend that registers, so RegisterExecutor can be read as "register the ExecutorBackend".
Note:
1. CoarseGrainedExecutorBackend is the name of the process the Executor runs in; CoarseGrainedExecutorBackend itself does not perform any task computation.
2. The Executor is the object that actually processes tasks; internally it completes Task computation through a thread pool.
3. CoarseGrainedExecutorBackend and Executor correspond one to one.
4. CoarseGrainedExecutorBackend is a message communication endpoint that can receive messages from and send messages to the Driver; it extends ThreadSafeRpcEndpoint.
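Point 4 can be illustrated with a toy message loop. This is a hypothetical sketch, not Spark's actual RpcEndpoint API, and every name in it is illustrative: only the backend endpoint receives messages, and the Executor it manages acts only when the loop hands it work.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Toy sketch (NOT Spark source): a message-receiving backend that creates
// and drives an executor object, mirroring the division of labour between
// CoarseGrainedExecutorBackend (the endpoint) and Executor (the worker).
object EndpointSketch {
  sealed trait Message
  case object RegisteredExecutor extends Message
  case class LaunchTask(taskId: Long) extends Message

  class ToyExecutor {
    var launched: List[Long] = Nil
    def launchTask(id: Long): Unit = launched = id :: launched
  }

  class ToyBackend {
    private val inbox = new LinkedBlockingQueue[Message]()
    var executor: ToyExecutor = null

    // Only the backend receives messages; the executor never does.
    private def receive(msg: Message): Unit = msg match {
      case RegisteredExecutor => executor = new ToyExecutor
      case LaunchTask(id)     => executor.launchTask(id)
    }

    def send(msg: Message): Unit = inbox.put(msg)
    def drainOnce(): Unit = receive(inbox.take())
  }
}
```

The point of the sketch: a remote peer can only `send` to the endpoint; the worker object is invisible to the outside and is driven purely by the endpoint's loop.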
CoarseGrainedExecutorBackend sends the RegisterExecutor message to the Driver, where it is handled by SparkDeploySchedulerBackend (renamed StandaloneSchedulerBackend in Spark 2.0). SparkDeploySchedulerBackend extends CoarseGrainedSchedulerBackend, and its start method launches an AppClient (renamed StandaloneAppClient in Spark 2.0). The source of SparkDeploySchedulerBackend's start method:
override def start() {
  super.start() // invoke CoarseGrainedSchedulerBackend's start method
  launcherBackend.connect()

  // The endpoint for executors to talk to us
  val driverUrl = rpcEnv.uriOf(SparkEnv.driverActorSystemName,
    RpcAddress(sc.conf.get("spark.driver.host"), sc.conf.get("spark.driver.port").toInt),
    CoarseGrainedSchedulerBackend.ENDPOINT_NAME)
  val args = Seq(
    "--driver-url", driverUrl,
    "--executor-id", "{{EXECUTOR_ID}}",
    "--hostname", "{{HOSTNAME}}",
    "--cores", "{{CORES}}",
    "--app-id", "{{APP_ID}}",
    "--worker-url", "{{WORKER_URL}}")
  val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions")
    .map(Utils.splitCommandString).getOrElse(Seq.empty)
  val classPathEntries = sc.conf.getOption("spark.executor.extraClassPath")
    .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)
  val libraryPathEntries = sc.conf.getOption("spark.executor.extraLibraryPath")
    .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)
  val testingClassPath = if (sys.props.contains("spark.testing")) {
    sys.props("java.class.path").split(java.io.File.pathSeparator).toSeq
  } else {
    Nil
  }

  // Start Executors with the registration information sent over
  val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
  val javaOpts = sparkJavaOpts ++ extraJavaOpts
  val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
    args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
  val appUIAddress = sc.ui.map(_.appUIAddress).getOrElse("")
  val coresPerExecutor = conf.getOption("spark.executor.cores").map(_.toInt)
  val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory,
    command, appUIAddress, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor)
  client = new AppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
  client.start() // start the AppClient, which registers the application with the Master
  launcherBackend.setState(SparkAppHandle.State.SUBMITTED)
  waitForRegistration()
  launcherBackend.setState(SparkAppHandle.State.RUNNING)
}
There are two very important Endpoints in the Driver process:
- ClientEndpoint: responsible for registering the current application with the Master; it is a member class inside AppClient.
- DriverEndpoint: the driving engine of the whole application at runtime; it receives the RegisterExecutor message and completes the registration on the Driver side; it is a member class inside CoarseGrainedSchedulerBackend.
The Executor's RegisterExecutor registration message is delivered to DriverEndpoint, which writes the data into the executorDataMap data structure inside CoarseGrainedSchedulerBackend. CoarseGrainedSchedulerBackend thereby knows all ExecutorBackend processes allocated to the current application; inside each ExecutorBackend process instance, an Executor object is responsible for the actual task execution. The exchange of the RegisterExecutor message between the Executor side and CoarseGrainedSchedulerBackend is implemented by the very important receiveAndReply method, which contains the RegisterExecutor registration logic:
override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  case RegisterExecutor(executorId, executorRef, hostPort, cores, logUrls) =>
    if (executorDataMap.contains(executorId)) { // the RegisterExecutor registration info
      context.reply(RegisterExecutorFailed("Duplicate executor ID: " + executorId))
    } else {
      // If the address is non-null, use executorRef.address as executorAddress
      val executorAddress = if (executorRef.address != null) {
        executorRef.address
      } else { // otherwise use the sender's address as executorAddress
        context.senderAddress
      }
      addressToExecutorId(executorAddress) = executorId
      totalCoreCount.addAndGet(cores)
      totalRegisteredExecutors.addAndGet(1)
      val data = new ExecutorData(executorRef, executorRef.address, executorAddress.host,
        cores, cores, logUrls)
      CoarseGrainedSchedulerBackend.this.synchronized {
        executorDataMap.put(executorId, data)
        if (numPendingExecutors > 0) {
          numPendingExecutors -= 1
        }
      }
      context.reply(RegisteredExecutor(executorAddress.host))
      listenerBus.post(
        SparkListenerExecutorAdded(System.currentTimeMillis(), executorId, data))
      makeOffers()
    }

  case StopDriver =>
    context.reply(true)
    stop()

  case StopExecutors =>
    logInfo("Asking each executor to shut down")
    for ((_, executorData) <- executorDataMap) {
      executorData.executorEndpoint.send(StopExecutor)
    }
    context.reply(true)

  case RemoveExecutor(executorId, reason) =>
    removeExecutor(executorId, reason)
    context.reply(true)

  case RetrieveSparkProps =>
    context.reply(sparkProperties)
}
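The bookkeeping receiveAndReply performs for RegisterExecutor can be sketched in isolation as follows. The names mirror the source, but this is a simplified stand-in, not Spark code: the map values and the reply are plain strings here.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable

// Simplified sketch (NOT Spark source) of the RegisterExecutor bookkeeping:
// reject duplicates, record the address mapping, bump the counters, and
// reply to the registering backend.
object RegistrationSketch {
  val executorDataMap = mutable.HashMap[String, String]()     // executorId -> host:port
  val addressToExecutorId = mutable.HashMap[String, String]() // host:port -> executorId
  val totalCoreCount = new AtomicInteger(0)
  val totalRegisteredExecutors = new AtomicInteger(0)

  def register(executorId: String, address: String, cores: Int): String =
    if (executorDataMap.contains(executorId)) {
      s"Duplicate executor ID: $executorId"  // stands in for RegisterExecutorFailed
    } else synchronized {                    // guard against concurrent registrations
      addressToExecutorId(address) = executorId
      totalCoreCount.addAndGet(cores)
      totalRegisteredExecutors.addAndGet(1)
      executorDataMap(executorId) = address
      "RegisteredExecutor"                   // stands in for context.reply(...)
    }
}
```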
From CoarseGrainedSchedulerBackend's receiveAndReply method we can read off the handling of the RegisterExecutor message, i.e. the Executor registration process:
- Check whether executorDataMap already contains executorId; if it does, reply with a RegisterExecutorFailed message, because an Executor with that executorId is already running.
- Otherwise proceed with the registration and determine executorAddress.
- Update three data structures: addressToExecutorId (a DriverEndpoint structure mapping the RPC address, host name and port, to the executorId), totalCoreCount (the total number of cores in the cluster), and totalRegisteredExecutors (the current number of registered Executors; these last two belong to CoarseGrainedSchedulerBackend).
- Create an ExecutorData object from executorRef, executorRef.address, hostname, cores and so on.
- Update executorDataMap inside the CoarseGrainedSchedulerBackend.this.synchronized block: many Executors across the cluster register with the Driver concurrently, so the update is synchronized to prevent write conflicts.
- Reply to the sender with context.reply(RegisteredExecutor(...)). The sender is CoarseGrainedExecutorBackend; upon receiving RegisteredExecutor it creates the Executor, which is responsible for the real Task computation.
override def receive: PartialFunction[Any, Unit] = {
  case RegisteredExecutor(hostname) =>
    logInfo("Successfully registered with driver")
    executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)

  case RegisterExecutorFailed(message) =>
    logError("Slave registration failed: " + message)
    System.exit(1)

  case LaunchTask(data) =>
    if (executor == null) {
      logError("Received LaunchTask command but executor was null")
      System.exit(1)
    } else {
      val taskDesc = ser.deserialize[TaskDescription](data.value)
      logInfo("Got assigned task " + taskDesc.taskId)
      executor.launchTask(this, taskId = taskDesc.taskId,
        attemptNumber = taskDesc.attemptNumber, taskDesc.name, taskDesc.serializedTask)
    }
  // ...
The threadPool created here executes the Tasks Spark sends over concurrently, reusing threads for efficiency. Once the thread pool is created, the process waits for the Driver to send tasks to CoarseGrainedExecutorBackend, not directly to the Executor, because the Executor is not a message loop.
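The thread-reuse pattern can be sketched with a cached thread pool, a simplified stand-in for the Executor's internal threadPool (`TaskPoolSketch` and `runTasks` are illustrative names, not Spark API):

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Sketch (NOT Spark source): a cached thread pool runs each submitted task
// on a pooled thread, reusing idle threads instead of spawning new ones.
object TaskPoolSketch {
  def runTasks(n: Int): Int = {
    val pool = Executors.newCachedThreadPool()
    val completed = new AtomicInteger(0)
    (1 to n).foreach { _ =>
      pool.execute(new Runnable {
        override def run(): Unit = completed.incrementAndGet()
      })
    }
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.SECONDS) // wait for all tasks to drain
    completed.get()
  }
}
```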
How exactly does the Executor work?
When the Driver sends over a Task, it actually sends it to the CoarseGrainedExecutorBackend RpcEndpoint, not directly to the Executor (since the Executor is not a message loop, it can never directly receive remotely sent messages).
The Driver sends LaunchTask to CoarseGrainedExecutorBackend, which in turn hands the work to a thread in the thread pool. It first checks whether the Executor is null: if so, it exits immediately; otherwise it deserializes the task and calls the Executor's launchTask to submit it for execution.
On receiving the command to execute a Task, launchTask first wraps the Task in a TaskRunner, then puts it into runningTasks, a data structure that tracks the tasks in flight, before submitting it to the thread pool.
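The shape of launchTask described here can be sketched as follows. The names mirror the source, but the bodies are simplified assumptions, not Spark's implementation:

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}

// Sketch (NOT Spark source): wrap the work in a TaskRunner, record it in
// runningTasks, then hand it to the thread pool; the runner removes itself
// from runningTasks when it finishes.
object LaunchTaskSketch {
  class TaskRunner(val taskId: Long, body: () => Unit) extends Runnable {
    override def run(): Unit =
      try body() finally runningTasks.remove(taskId) // cleanup when done
  }

  val runningTasks = new ConcurrentHashMap[Long, TaskRunner]()
  private val pool = Executors.newCachedThreadPool()

  def launchTask(taskId: Long)(body: => Unit): Unit = {
    val tr = new TaskRunner(taskId, () => body)
    runningTasks.put(taskId, tr) // bookkeeping before execution
    pool.execute(tr)
  }

  def shutdown(): Unit = {
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.SECONDS)
  }
}
```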
Executing a task in the Executor ultimately goes through the Task's run method. The snippet in Executor that invokes Task's run method:
// ...
var threwException = true
val (value, accumUpdates) = try {
  val res = task.run(
    taskAttemptId = taskId,
    attemptNumber = attemptNumber,
    metricsSystem = env.metricsSystem)
  threwException = false
  res
}
// ...
The source of the Task class's run method:
final def run(
    taskAttemptId: Long,
    attemptNumber: Int,
    metricsSystem: MetricsSystem): (T, AccumulatorUpdates) = {
  context = new TaskContextImpl(
    stageId, partitionId, taskAttemptId, attemptNumber, taskMemoryManager,
    metricsSystem, internalAccumulators, runningLocally = false)
  TaskContext.setTaskContext(context)
  context.taskMetrics.setHostname(Utils.localHostName())
  context.taskMetrics.setAccumulatorsUpdater(context.collectInternalAccumulators)
  taskThread = Thread.currentThread()
  if (_killed) {
    kill(interruptThread = false)
  }
  try { // the key step: invoke runTask
    (runTask(context), context.collectAccumulators())
  } finally {
    context.markTaskCompleted()
    // ...
Task is an abstract class: its run method is a concrete method, while its runTask method is abstract. Task has two subclasses, ShuffleMapTask and ResultTask; which subclass's runTask is executed depends on the task at hand. The difference between the two runTask implementations is whether the task performs shuffle (i.e. whether runTask executes a shuffle write).
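The template-method relationship can be sketched like this (a simplification: the real runTask takes a TaskContext and the signatures vary across Spark versions):

```scala
// Sketch (NOT Spark source): the concrete run delegates to the abstract
// runTask, which ShuffleMapTask and ResultTask implement differently.
object TaskHierarchySketch {
  abstract class Task[T] {
    final def run(): T = runTask() // the concrete entry point
    protected def runTask(): T     // implemented per subclass
  }

  class ShuffleMapTask extends Task[String] {
    // performs the shuffle write for downstream stages
    override protected def runTask(): String = "shuffle output written"
  }

  class ResultTask extends Task[String] {
    // computes the final result; no shuffle write
    override protected def runTask(): String = "result computed"
  }
}
```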