Blog: http://blog.csdn.net/yueqian_zhu/
This section walks through the logic of SparkContext.
Let's start with the simplest example that ships with Spark:
object SparkPi {
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("Spark Pi")
val spark = new SparkContext(conf)
val slices = if (args.length > 0) args(0).toInt else 2
val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
val count = spark.parallelize(1 until n, slices).map { i =>
val x = random * 2 - 1
val y = random * 2 - 1
if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / n)
spark.stop()
}
}
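To see what the job itself computes, independent of any cluster, here is a minimal local sketch of the same Monte Carlo sampling using a plain Scala range instead of an RDD. The object name LocalPi and the explicit seed parameter are assumptions added for illustration; they are not part of the Spark example.

```scala
import scala.util.Random

object LocalPi {
  // Same sampling logic as SparkPi, but on a local range instead of an RDD.
  // The seed is an illustrative addition that makes runs repeatable.
  def estimate(n: Int, seed: Long): Double = {
    val rng = new Random(seed)
    val count = (1 until n).map { _ =>
      val x = rng.nextDouble() * 2 - 1
      val y = rng.nextDouble() * 2 - 1
      // Count the points that fall inside the unit circle.
      if (x * x + y * y < 1) 1 else 0
    }.sum
    4.0 * count / n
  }
}
```

Calling LocalPi.estimate(200000, 42L) returns a value close to Pi. In SparkPi the only real difference is that the range is split into slices and the map and reduce steps run on Executors.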
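The Spark programs we write usually follow a similar flow. Starting from this simple program, we will analyze the internals step by step. Personally, I think this is the real essence of Spark; the master and worker startup flows covered earlier do not differ much from those of a typical distributed system.
First, a SparkConf is created, which loads Spark's configuration settings.
Next, a SparkContext is created. preferredNodeLocationData may optionally be specified at creation time.
Creating a SparkContext is fairly involved, so we only cover the more important objects and methods.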
1. Various SparkListener instances can be added to listenerBus. Whenever a SparkListenerEvent arrives, it is delivered to every registered listener.
// An asynchronous listener bus for Spark events
private[spark] val listenerBus = new LiveListenerBus
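The fan-out behaviour described in item 1 can be sketched as follows. This is a minimal synchronous sketch under stated assumptions, not Spark's LiveListenerBus (which, as its comment says, is asynchronous). The names SimpleListenerBus, Event, Listener, JobStart, JobEnd, and RecordingListener are hypothetical and exist only for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical event and listener types; Spark's real ones are
// SparkListenerEvent and SparkListener.
sealed trait Event
case class JobStart(jobId: Int) extends Event
case class JobEnd(jobId: Int) extends Event

trait Listener {
  def onEvent(event: Event): Unit
}

// A synchronous stand-in for a listener bus: every posted event is
// delivered to every registered listener, in registration order.
class SimpleListenerBus {
  private val listeners = ArrayBuffer.empty[Listener]

  def addListener(listener: Listener): Unit = listeners += listener

  def post(event: Event): Unit = listeners.foreach(_.onEvent(event))
}

// A listener that simply records the events it has seen.
class RecordingListener extends Listener {
  val seen = ArrayBuffer.empty[Event]
  override def onEvent(event: Event): Unit = seen += event
}
```

Registering two RecordingListener instances and posting JobStart(1) followed by JobEnd(1) leaves both listeners holding the same two events in the same order.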
2. persistentRdds keeps track of every RDD that has been persisted (cached)
// Keeps track of all persisted RDDs
private[spark] val persistentRdds = new TimeStampedWeakValueHashMap[Int, RDD[_]]
3. Create the SparkEnv -> this calls createDriverEnv
// Create the Spark execution environment (cache, map output tracker, etc)
_env = createSparkEnv(_conf, isLocal, listenerBus)
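Flow: 1) Create the driver's ActorRef and wrap it in rpcEnv.
2) Create mapOutputTracker, whose concrete type is MapOutputTrackerMaster; it tracks map output information. This object is registered with a MapOutputTrackerMasterEndpoint. A note on what registration does: it returns mapOutputTracker.trackerEndpoint (of type ActorRef), and messages later sent to that ActorRef trigger callbacks into the corresponding methods. For example, sending an AkkaMessage invokes the receiveAndReply or receive method of MapOutputTrackerMasterEndpoint.
3) Create shuffleManager; the default is org.apache.spark.shuffle.hash.HashShuffleManager.
4) Create shuffleMemoryManager.
5) Create blockTransferService; the default is netty. This is the service used to fetch blocks during a shuffle.
6) Create blockManagerMaster, which records which Worker every BlockId is stored on.
7) Create blockManager, which provides the actual read/write interface.
8) Create cacheManager, which depends on blockManager. When an RDD is computed, it obtains data through the CacheManager and stores its computed results through the CacheManager.
9) Create broadcastManager.
10) Create httpFileServer. Both the Driver and the Executors may depend on third-party packages at runtime. The Driver case is simple: spark-submit specifies at submission time where the required jar files are read from. Executors are launched by workers, and the worker has to download the jar files an Executor needs at startup. To solve this, the Driver starts an HttpFileServer at startup to host the third-party jars, and the workers fetch them from it.
11) Create outputCommitCoordinator.
12) Create executorMemoryManager.
All of the objects above are then wrapped together into a SparkEnv.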
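The registration mechanism described in step 2) above (register an endpoint, get back a reference, and have messages sent to that reference call back into the endpoint) can be sketched as follows. This is a simplified in-process sketch, not Spark's RPC or Akka code: Endpoint, EndpointRef, Env, setupEndpoint, and TrackerEndpoint are hypothetical names, and the message shapes (a tuple for updates, an Int for queries) are assumptions made purely for illustration.

```scala
import scala.collection.mutable

object EndpointSketch {
  // Hypothetical callback interface, loosely modeled on the
  // receive / receiveAndReply pair mentioned above.
  trait Endpoint {
    def receive(message: Any): Unit        // one-way messages
    def receiveAndReply(message: Any): Any // request/response messages
  }

  // A reference that forwards messages to the endpoint it was created for.
  class EndpointRef(endpoint: Endpoint) {
    def send(message: Any): Unit = endpoint.receive(message)
    def ask(message: Any): Any = endpoint.receiveAndReply(message)
  }

  // Registering an endpoint under a name returns the reference used to reach it.
  class Env {
    private val endpoints = mutable.Map.empty[String, EndpointRef]

    def setupEndpoint(name: String, endpoint: Endpoint): EndpointRef = {
      val ref = new EndpointRef(endpoint)
      endpoints(name) = ref
      ref
    }
  }

  // Toy tracker: remembers one output location per shuffle id.
  class TrackerEndpoint extends Endpoint {
    private val outputs = mutable.Map.empty[Int, String]

    def receive(message: Any): Unit = message match {
      case (shuffleId: Int, location: String) => outputs(shuffleId) = location
      case _ => ()
    }

    def receiveAndReply(message: Any): Any = message match {
      case shuffleId: Int => outputs.get(shuffleId)
      case _ => None
    }
  }
}
```

After env.setupEndpoint("MapOutputTracker", new TrackerEndpoint) returns a reference, a send on that reference reaches receive and an ask reaches receiveAndReply, which is the callback relationship the text describes.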
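4. Create _metadataCleaner, which periodically cleans up metadata.
5. Create executorEnvs, the Executor-related configuration.
6. Create _heartbeatReceiver, which receives Executor heartbeats and also starts a timer that checks whether any Executor has expired.
7. Call createTaskScheduler to create _taskScheduler and _schedulerBackend.
1) The logic branches on the master URL. We use standalone mode (URLs starting with spark://) as the example.
2) The task scheduler actually created is a TaskSchedulerImpl and the backend is a SparkDeploySchedulerBackend, which itself extends CoarseGrainedSchedulerBackend. CoarseGrainedSchedulerBackend is a coarse-grained resource scheduling class built on Akka actors. For the lifetime of a Spark job it monitors and holds the Executor resources registered with it, accepts Executor registrations and status updates, responds to requests from the scheduler, and starts the task-scheduling flow based on the Executors currently available. In short, the two cooperate, each with its own responsibilities, to carry out the whole task-scheduling flow.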
case SPARK_REGEX(sparkUrl) =>
val scheduler = new TaskSchedulerImpl(sc) // task-level scheduling
val masterUrls = sparkUrl.split(",").map("spark://" + _)
val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls)
scheduler.initialize(backend)
(backend, scheduler)
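The URL handling in the branch above can be tried out in isolation. The sketch below assumes the pattern behind SPARK_REGEX is a plain spark://(.*) regex; MasterUrlSketch and masterUrls are hypothetical names for illustration. It shows how a comma-separated list of masters turns into one URL per master.

```scala
object MasterUrlSketch {
  // Assumed to mirror the pattern used for standalone master URLs.
  val SparkRegex = """spark://(.*)""".r

  // Returns one "spark://host:port" URL per master, or Nil when the
  // string is not a standalone master URL.
  def masterUrls(master: String): List[String] = master match {
    case SparkRegex(sparkUrl) => sparkUrl.split(",").map("spark://" + _).toList
    case _ => Nil
  }
}
```

For example, MasterUrlSketch.masterUrls("spark://host1:7077,host2:7077") yields two URLs, one per master, while a non-standalone value such as "local[4]" yields Nil.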
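3) Scheduler initialization. A note on the role of Pool: a SparkContext may have several runnable task sets with no dependencies between them at the same time, and the pool decides how they are scheduled relative to one another. The default is FIFO; a Fair scheduler is also available.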
def initialize(backend: SchedulerBackend) {
this.backend = backend
// temporarily set rootPool name to empty
rootPool = new Pool("", schedulingMode, 0, 0)
schedulableBuilder = {
schedulingMode match {
case SchedulingMode.FIFO =>
new FIFOSchedulableBuilder(rootPool)
case SchedulingMode.FAIR =>
new FairSchedulableBuilder(rootPool, conf)
}
}
schedulableBuilder.buildPools()
}
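As a rough illustration of what FIFO ordering between task sets means, the sketch below sorts task sets by job and then by stage. It assumes FIFO runs earlier jobs first and, within a job, earlier stages first; that ordering rule, along with the names FifoSketch, TaskSet, and fifoOrder, is an assumption made for illustration and not code taken from the scheduler.

```scala
object FifoSketch {
  // Hypothetical stand-in for a schedulable task set.
  case class TaskSet(name: String, jobId: Int, stageId: Int)

  // Assumed FIFO order: earlier jobs first, then earlier stages within a job.
  def fifoOrder(taskSets: Seq[TaskSet]): Seq[TaskSet] =
    taskSets.sortBy(ts => (ts.jobId, ts.stageId))
}
```

A Fair pool would instead weigh the running task counts and configured shares of each pool, which is why the two modes need different SchedulableBuilder implementations.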
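8. Create _dagScheduler, which splits our program into stages and builds task sets with dependencies between them. Internally, the DAGScheduler starts an event loop that polls for and processes incoming events.
9. Call _taskScheduler.start() -> backend.start(). This creates driverEndpoint for communicating with the outside world and assembles the environment needed to run an Executor: the app name, the cores and memory each Executor needs, the classpath, jars, and arguments, with org.apache.spark.executor.CoarseGrainedExecutorBackend as the class to run. All of this is wrapped in an ApplicationDescription. The ApplicationDescription, together with the masters and other details, is then wrapped in an AppClient, which serves as the entry point for submitting the app to the masters.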
override def start() {
super.start()
// ... (omitted)
val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
val appUIAddress = sc.ui.map(_.appUIAddress).getOrElse("")
val coresPerExecutor = conf.getOption("spark.executor.cores").map(_.toInt)
val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory,
command, appUIAddress, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor)
client = new AppClient(sc.env.actorSystem, masters, appDesc, this, conf)
client.start()
waitForRegistration()
}
Looking inside client.start(): it creates an ActorRef backed by a ClientActor. Follow on into preStart() -> registerWithMaster:
def tryRegisterAllMasters() {
for (masterAkkaUrl <- masterAkkaUrls) {
logInfo("Connecting to master " + masterAkkaUrl + "...")
val actor = context.actorSelection(masterAkkaUrl)
actor ! RegisterApplication(appDescription)
}
}
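As you can see, this simply sends a RegisterApplication message to the ActorRef of each master. How does the master handle this message?
When the active master receives it, it saves the app's details, creates an appId, persists the app, replies with a RegisteredApplication message, and then runs scheduling. The scheduling flow was already covered in the earlier section "Spark Core Source Analysis 2: Master Startup Flow".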
case RegisterApplication(description) => {
if (state == RecoveryState.STANDBY) {
// ignore, don't send response
} else {
logInfo("Registering app " + description.name)
val app = createApplication(description, sender)
registerApplication(app) // save the app's details in the master's in-memory data structures
logInfo("Registered app " + description.name + " with ID " + app.id)
persistenceEngine.addApplication(app) // persist the app so it can be rebuilt on master failover
sender ! RegisteredApplication(app.id, masterUrl)
schedule() // run scheduling
}
}
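The handshake above (the client contacts every master, a standby master stays silent, and the active master records the app and replies with an id) can be sketched without actors. This is a deliberately reduced model: RegistrationSketch, Master, register, registerWithAll, and the app-0000 id format are all hypothetical, and real persistence and scheduling are left out.

```scala
import scala.collection.mutable

object RegistrationSketch {
  sealed trait MasterState
  case object Alive extends MasterState
  case object Standby extends MasterState

  // Hypothetical, much-reduced master: only the Alive one answers.
  class Master(val url: String, val state: MasterState) {
    // appId -> app name, standing in for the master's in-memory bookkeeping.
    val apps = mutable.LinkedHashMap.empty[String, String]

    // Some(appId) plays the role of the RegisteredApplication reply;
    // None means the request was silently ignored.
    def register(appName: String): Option[String] = state match {
      case Standby => None
      case Alive =>
        val appId = f"app-${apps.size}%04d"
        apps(appId) = appName
        Some(appId)
    }
  }

  // The client contacts every master and keeps the first reply it gets,
  // returned as (appId, masterUrl).
  def registerWithAll(masters: Seq[Master], appName: String): Option[(String, String)] =
    masters.flatMap(m => m.register(appName).map(id => (id, m.url))).headOption
}
```

With one standby and one active master, registerWithAll returns the id and URL from the active master only, which is how the client learns which master to talk to from then on.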
After AppClient receives the RegisteredApplication message, it identifies the active master, marks the app as registered, and stores the appId returned by the master:
case RegisteredApplication(appId_, masterUrl) =>
appId = appId_
registered = true
changeMaster(masterUrl)
listener.connected(appId)
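In "Spark Core Source Analysis 2: Master Startup Flow" we covered the master side of scheduling. At that point no app had registered yet, so the master never sent a command to a worker to launch an Executor. Now that an app is registered, the master calls launchExecutor(worker, exec), which sends a LaunchExecutor message to the worker. It also sends an ExecutorAdded message to the AppClient.
On receiving the message, the worker creates the working directory and an ExecutorRunner. Once started, the ExecutorRunner does its work on a separate thread and launches a process from the command that was packaged earlier; the main class is CoarseGrainedExecutorBackend. The run arguments and related details are all contained in appDesc, passed from the driver via the master. When this is done, the worker reports back to the master with an ExecutorStateChanged message.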
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
if (masterUrl != activeMasterUrl) {
logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor.")
} else {
try {
logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))
// Create the executor's working directory
val executorDir = new File(workDir, appId + "/" + execId)
if (!executorDir.mkdirs()) {
throw new IOException("Failed to create directory " + executorDir)
}
// Create local dirs for the executor. These are passed to the executor via the
// SPARK_EXECUTOR_DIRS environment variable, and deleted by the Worker when the
// application finishes.
val appLocalDirs = appDirectories.get(appId).getOrElse {
Utils.getOrCreateLocalRootDirs(conf).map { dir =>
Utils.createDirectory(dir, namePrefix = "executor").getAbsolutePath()
}.toSeq
}
appDirectories(appId) = appLocalDirs
val manager = new ExecutorRunner(
appId,
execId,
appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
cores_,
memory_,
self,
workerId,
host,
webUi.boundPort,
publicAddress,
sparkHome,
executorDir,
akkaUrl,
conf,
appLocalDirs, ExecutorState.LOADING)
executors(appId + "/" + execId) = manager
manager.start()
coresUsed += cores_
memoryUsed += memory_
master ! ExecutorStateChanged(appId, execId, manager.state, None, None)
} catch {
case e: Exception => {
logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
if (executors.contains(appId + "/" + execId)) {
executors(appId + "/" + execId).kill()
executors -= appId + "/" + execId
}
master ! ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
Some(e.toString), None)
}
}
}
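The master handles this message according to the Executor's state. When are these messages sent?
(1) When the CoarseGrainedExecutorBackend process exits, an ExecutorStateChanged message with state EXITED is sent to the master.
(2) When the AppClient receives the ExecutorAdded message, it sends an ExecutorStateChanged message with state RUNNING to the master.
(3) When the ExecutorRunner fails to start the process, an ExecutorStateChanged message with state FAILED is sent to the master.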
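One way to picture how the master tells these states apart is to separate terminal states from live ones. The sketch below is modeled loosely on the state names that appear in this section (LOADING, RUNNING, EXITED, FAILED) plus a few more; the full list of states and the choice of which count as finished are assumptions for illustration, and ExecutorStateSketch is a hypothetical name.

```scala
object ExecutorStateSketch {
  sealed trait State
  case object Launching extends State
  case object Loading extends State
  case object Running extends State
  case object Killed extends State
  case object Failed extends State
  case object Lost extends State
  case object Exited extends State

  // Assumed rule: an executor in one of these states will not run any
  // more tasks, so its resources can be reclaimed.
  def isFinished(state: State): Boolean = state match {
    case Killed | Failed | Lost | Exited => true
    case _ => false
  }
}
```

Under this reading, cases (1) and (3) above report terminal states, while case (2) reports a live one.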
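The startup of the CoarseGrainedExecutorBackend process, that is, the startup of an Executor, is covered in the next section. The real tasks run inside Executors, and assigned tasks can only run once the Executor process has started successfully. For now we continue with the logic that follows _taskScheduler.start().
10. What follows is mainly the initialization of blockManager: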
_applicationId = _taskScheduler.applicationId()
_applicationAttemptId = taskScheduler.applicationAttemptId()
_conf.set("spark.app.id", _applicationId)
_env.blockManager.initialize(_applicationId)
def initialize(appId: String): Unit = {
blockTransferService.init(this) // serves block reads
shuffleClient.init(appId) // related to the external shuffle service; does nothing if that service is disabled
blockManagerId = BlockManagerId(
executorId, blockTransferService.hostName, blockTransferService.port) // metadata identifying this blockManager
shuffleServerId = if (externalShuffleServiceEnabled) { // related to the external shuffle service; not covered here
BlockManagerId(executorId, blockTransferService.hostName, externalShuffleServicePort)
} else {
blockManagerId
}
// Register with the driver, passing along our own ActorRef. The driver stores the blockManagerId and that ActorRef in a hash map.
master.registerBlockManager(blockManagerId, maxMemory, slaveEndpoint)
// Register Executors' configuration with the local shuffle service, if one should exist.
if (externalShuffleServiceEnabled && !blockManagerId.isDriver) {
registerWithExternalShuffleServer()
}
}
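The driver-side bookkeeping mentioned in the registration comment (the driver keeps each blockManagerId together with the registering block manager's reference in a hash map) can be sketched like this. BlockManagerRegistrySketch, Registry, and the reduced BlockManagerId and EndpointRef case classes are hypothetical stand-ins, not Spark's classes.

```scala
import scala.collection.mutable

object BlockManagerRegistrySketch {
  // Hypothetical, reduced versions of BlockManagerId and an endpoint reference.
  case class BlockManagerId(executorId: String, host: String, port: Int)
  case class EndpointRef(address: String)

  // Driver-side bookkeeping: which block managers exist and how to reach them.
  class Registry {
    private val endpoints = mutable.HashMap.empty[BlockManagerId, EndpointRef]

    def register(id: BlockManagerId, ref: EndpointRef): Unit =
      endpoints.update(id, ref)

    def lookup(id: BlockManagerId): Option[EndpointRef] = endpoints.get(id)
  }
}
```

Once a block manager is registered, the driver can look up its reference by id whenever it needs to send that block manager a message; an id that never registered simply has no entry.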