Spark-executor
@(spark)[executor]
ExecutorExitCode
/**
* These are exit codes that executors should use to provide the master with information about
* executor failures assuming that cluster management framework can capture the exit codes (but
* perhaps not log files). The exit code constants here are chosen to be unlikely to conflict
* with "natural" exit statuses that may be caused by the JVM or user code. In particular,
* exit codes 128+ arise on some Unix-likes as a result of signals, and it appears that the
* OpenJDK JVM may use exit code 1 in some of its own "last chance" code.
*/
private[spark]
object ExecutorExitCode {
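  // (Body recalled from the Spark 1.x source, so treat exact names/values as approximate.)
  // Values start at 50 to stay clear of signal-related exit codes (128+) and the JVM's own exit code 1.
  val UNCAUGHT_EXCEPTION = 50               // the default uncaught exception handler was reached
  val UNCAUGHT_EXCEPTION_TWICE = 51         // the handler itself failed while logging the exception
  val OOM = 52                              // the uncaught exception was an OutOfMemoryError
  val DISK_STORE_FAILED_TO_CREATE_DIR = 53  // DiskStore could not create a local temporary directory
  // ... plus a def explainExitCode(exitCode: Int): String that maps a code back to a message
}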
ExecutorSource
This class is essentially just a collection of metrics.
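Concretely, ExecutorSource registers Dropwizard/Codahale gauges over things like the executor's task-launch thread pool and file-system usage. A minimal sketch of that pattern (the metric names here are representative examples, not the full list in ExecutorSource):

import java.util.concurrent.ThreadPoolExecutor
import com.codahale.metrics.{Gauge, MetricRegistry}

// Sketch: gauges that read live values off the executor's task-launch thread pool.
class ExecutorMetricsSketch(threadPool: ThreadPoolExecutor) {
  val metricRegistry = new MetricRegistry()

  metricRegistry.register(MetricRegistry.name("threadpool", "activeTasks"),
    new Gauge[Int] { override def getValue: Int = threadPool.getActiveCount })

  metricRegistry.register(MetricRegistry.name("threadpool", "completeTasks"),
    new Gauge[Long] { override def getValue: Long = threadPool.getCompletedTaskCount })
}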
CoarseGrainedExecutorBackend
The class CoarseGrainedExecutorBackend is actually an Actor, and it has a main function that does the following (a simplified sketch follows the list):
1. Start an actor system called fetcher and fetch the SparkConf from the driver
2. Shut fetcher down
3. Call createExecutorEnv, i.e. build the SparkEnv
4. Start the CoarseGrainedExecutorBackend actor, which
- registers itself with the driver
- waits for incoming messages
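A compressed sketch of that ordering, with every Spark internal replaced by a hypothetical stub (none of the helper names below are real Spark APIs; only the sequence matters):

// Illustrative only: hypothetical stand-ins showing the order of operations above.
object BackendStartupSketch {
  def main(args: Array[String]): Unit = {
    val fetcher = startActorSystem("driverPropsFetcher")  // 1. short-lived actor system
    val driverConf = fetchSparkConfFromDriver(fetcher)    // 1. ask the driver for its SparkConf
    shutdown(fetcher)                                      // 2. fetcher has served its purpose
    val env = createExecutorEnv(driverConf)                // 3. build the executor-side SparkEnv
    startBackendActor(env)                                 // 4. register with the driver, then wait
  }

  // Stubs so the sketch compiles; the real logic lives inside Spark.
  private def startActorSystem(name: String): AnyRef = new Object
  private def fetchSparkConfFromDriver(sys: AnyRef): Map[String, String] = Map.empty
  private def shutdown(sys: AnyRef): Unit = ()
  private def createExecutorEnv(conf: Map[String, String]): AnyRef = new Object
  private def startBackendActor(env: AnyRef): Unit = ()
}

The receive handler that the actor then runs is: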
override def receiveWithLogging = {
  case RegisteredExecutor =>
    logInfo("Successfully registered with driver")
    val (hostname, _) = Utils.parseHostPort(hostPort)
    executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)

  case RegisterExecutorFailed(message) =>
    logError("Slave registration failed: " + message)
    System.exit(1)

  case LaunchTask(data) =>
    if (executor == null) {
      logError("Received LaunchTask command but executor was null")
      System.exit(1)
    } else {
      val ser = env.closureSerializer.newInstance()
      val taskDesc = ser.deserialize[TaskDescription](data.value)
      logInfo("Got assigned task " + taskDesc.taskId)
      executor.launchTask(this, taskId = taskDesc.taskId, attemptNumber = taskDesc.attemptNumber,
        taskDesc.name, taskDesc.serializedTask)
    }

  case KillTask(taskId, _, interruptThread) =>
    if (executor == null) {
      logError("Received KillTask command but executor was null")
      System.exit(1)
    } else {
      executor.killTask(taskId, interruptThread)
    }

  case x: DisassociatedEvent =>
    if (x.remoteAddress == driver.anchorPath.address) {
      logError(s"Driver $x disassociated! Shutting down.")
      System.exit(1)
    } else {
      logWarning(s"Received irrelevant DisassociatedEvent $x")
    }

  case StopExecutor =>
    logInfo("Driver commanded a shutdown")
    executor.stop()
    context.stop(self)
    context.system.shutdown()
}
The message worth paying attention to is LaunchTask: it ends up calling executor.launchTask.
Executor
/**
* Spark executor used with Mesos, YARN, and the standalone scheduler.
* In coarse-grained mode, an existing actor system is provided.
*/
private[spark] class Executor(
    executorId: String,
    executorHostname: String,
    env: SparkEnv,
    userClassPath: Seq[URL] = Nil,
    isLocal: Boolean = false)
  extends Logging
Its most important function is:
def launchTask(
    context: ExecutorBackend,
    taskId: Long,
    attemptNumber: Int,
    taskName: String,
    serializedTask: ByteBuffer) {
  val tr = new TaskRunner(context, taskId = taskId, attemptNumber = attemptNumber, taskName,
    serializedTask)
  runningTasks.put(taskId, tr)
  threadPool.execute(tr)
}
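For context, the threadPool and runningTasks members used here behave roughly like the plain java.util.concurrent equivalents below (in the real Executor the pool is a daemon cached thread pool built through a Spark utility, with threads named "Executor task launch worker"; this is only an approximation):

import java.util.concurrent.{ConcurrentHashMap, Executors, ThreadFactory}

// Approximate stand-ins for Executor's threadPool and runningTasks members.
object ExecutorMembersSketch {
  val threadPool = Executors.newCachedThreadPool(new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r)
      t.setDaemon(true) // daemon threads, so worker threads never keep the JVM alive
      t
    }
  })
  val runningTasks = new ConcurrentHashMap[Long, Runnable]() // taskId -> its TaskRunner
}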
This is clearly asynchronous execution on top of a thread pool. The logic of TaskRunner's run function is as follows (a compressed sketch appears after the list):
1. Call Task.deserializeWithDependencies to obtain the files and jars the task depends on
2. Depending on the cache state, either download each file or reuse the cached copy; whether a file needs refreshing is decided by its filename together with its timestamp
3. Deserialize the remaining bytes to obtain the actual Task
4. Call task.run to really execute the task
5. Depending on the size of the result, either return it directly or write it to the blockManager
6. Send the result back to the driver through the ExecutorBackend's statusUpdate
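A self-contained sketch of those six steps, with every Spark-specific piece replaced by a hypothetical stand-in passed in as a parameter (only the control flow is meant to match the real TaskRunner):

import java.nio.ByteBuffer

// Control-flow sketch of TaskRunner.run; all types and helpers here are hypothetical stand-ins.
object TaskRunnerFlowSketch {
  case class Deps(files: Map[String, Long], jars: Map[String, Long], taskBytes: ByteBuffer)
  trait Backend { def statusUpdate(taskId: Long, state: String, data: ByteBuffer): Unit }

  def run(
      taskId: Long,
      serializedTask: ByteBuffer,
      backend: Backend,
      deserializeDeps: ByteBuffer => Deps,              // step 1
      fetchMissingOrStale: Deps => Unit,                // step 2: keyed on (filename, timestamp)
      deserializeTask: ByteBuffer => () => Array[Byte], // step 3
      maxDirectResultBytes: Int,
      putInBlockManager: Array[Byte] => ByteBuffer): Unit = {
    val deps = deserializeDeps(serializedTask)    // 1. split off the file/jar dependency lists
    fetchMissingOrStale(deps)                     // 2. download only what the cache lacks
    val task = deserializeTask(deps.taskBytes)    // 3. the real Task object
    val result = task()                           // 4. run it
    val payload =                                 // 5. small results inline, big ones via BlockManager
      if (result.length <= maxDirectResultBytes) ByteBuffer.wrap(result)
      else putInBlockManager(result)
    backend.statusUpdate(taskId, "FINISHED", payload) // 6. report back to the driver
  }
}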
From the flow above, the following kinds of data move over the network during a task's lifetime:
1. The URLs of the taskFiles and taskJars
2. The taskFiles and taskJars themselves, which may not need to travel at all if they are already cached
3. The serialized bytes of the Task
4. The result: either a block address or a reasonably small result sent inline (see the size check sketched below)
In other words, a single task does not generate much network traffic.
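The size check behind point 4 in Spark 1.x, paraphrased (spark.driver.maxResultSize and spark.akka.frameSize are real configuration keys, but the helper below is only an illustration, not the literal source):

// Paraphrase of the result-size decision in Spark 1.x's TaskRunner.
object ResultSizeDecisionSketch {
  def describe(resultBytes: Long, maxResultSize: Long, akkaFrameSize: Long): String =
    if (maxResultSize > 0 && resultBytes > maxResultSize)
      "drop the result and report a failure to the driver"              // over spark.driver.maxResultSize
    else if (resultBytes > akkaFrameSize)
      "write the result to the BlockManager and send only the block id" // IndirectTaskResult
    else
      "send the serialized result bytes directly"                       // DirectTaskResult
}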
How a task actually executes, and how tasks are scheduled, live in the scheduler, which is covered separately.