Resource Scheduling Source Code Analysis

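Resource Scheduling Flow

1. When the cluster starts, each worker reports its information to the master; the information is wrapped in a WorkerInfo object and added to the workers collection.
2. When a client submits an Application to the cluster, a SparkSubmit process is first started on the client.
3. The client asks the Master for resources for the driver. The Master walks the waitingDrivers collection; a non-empty collection means a client has requested a driver, so the Master then walks the workers collection in random order, picks a worker with enough free resources, and starts the driver process on it.
4. Once the driver has been launched, its request is removed from waitingDrivers.
5. The driver asks the Master for resources for the current Application. The request is wrapped in an ApplicationInfo object and added to the waitingApps collection.
6. On receiving the request, the Master checks waitingApps. If it is not empty, the Master walks the workers collection, looks for workers that meet the requirements, and starts a batch of executor processes on them. By default each executor takes 1 GB of memory and all the cores its worker can offer.
7. After an executor starts, it registers back with the TaskScheduler.
8. The TaskScheduler can then dispatch tasks to the executor processes.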

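Resource Scheduling Principles

The workers collection

When the cluster starts, each worker node reports its information to the master. The information is first wrapped in a WorkerInfo object, and these objects are then added to the workers collection.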

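The waitingDrivers collection

waitingDrivers holds the pending requests that clients have sent to the master for resources (the resource here is really the driver itself).
When waitingDrivers is not empty, a client has asked the master for resources. The master then checks the cluster's current resources (by walking the workers collection), finds a node that meets the requirements, and starts the driver there. Once the driver has been launched, the request is removed from waitingDrivers.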

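The waitingApps collection

waitingApps holds the resource requests that drivers have sent to the Master on behalf of their Applications.
When waitingApps is not empty, a driver has asked the Master for resources for its Application. The Master checks the cluster's resources (the workers collection), finds suitable worker nodes, and starts executor processes on them. By default, each worker starts only one executor for the current Application, and that executor uses 1 GB of memory and all the cores the worker manages.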

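The schedule method

The waitingApps and waitingDrivers collections change all the time, so the Master has to react whenever their state changes.

For that reason the Master has a schedule() method. Whenever an element is added to either collection, schedule() is called. The method contains two pieces of logic, one per collection, and each follows the process described above.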
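The trigger mechanism just described can be sketched with a toy model. This is only an illustration: ToyMaster, submitDriver, registerApp and the log buffer are hypothetical names invented for the sketch, and the real Master keeps far more state.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy model only: every name here is hypothetical, not Spark's API.
class ToyMaster {
  val waitingDrivers = ArrayBuffer[String]() // driver ids waiting for a worker
  val waitingApps = ArrayBuffer[String]()    // app ids waiting for executors
  val log = ArrayBuffer[String]()            // what schedule() decided, in order

  // Adding to either collection ends with a call to schedule().
  def submitDriver(id: String): Unit = { waitingDrivers += id; schedule() }
  def registerApp(id: String): Unit = { waitingApps += id; schedule() }

  // Two pieces of logic, one per collection; drivers are handled first.
  def schedule(): Unit = {
    for (d <- waitingDrivers.toList) { // iterate over a copy while removing
      log += s"launch driver $d"
      waitingDrivers -= d
    }
    for (a <- waitingApps) log += s"start executors for $a"
  }
}
```

Calling submitDriver("driver-1") and then registerApp("app-1") on a fresh ToyMaster leaves waitingDrivers empty and records the driver launch before the executor start.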

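Resource Scheduling Source Code Analysis

The Master performs resource scheduling through the schedule method, which, among other things, tells workers to start executors.

The three scheduling collections in the source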

val workers = new HashSet[WorkerInfo]   // basic information about every Worker node
val waitingApps = new ArrayBuffer[ApplicationInfo]   // applications waiting for executors
private val waitingDrivers = new ArrayBuffer[DriverInfo]   // drivers waiting to be launched

WorkerInfo

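host: the node the Worker runs on
port: the port number
cores: total number of cores on the worker
memory: total memory on the worker
endpoint: Spark's internal RPC endpoint, similar to a mailbox
webUiAddress: address of the worker's web UI (default port 8081)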
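To make the later resource checks easier to follow, here is a simplified stand-in for WorkerInfo. It is a sketch under assumptions: ToyWorkerInfo and reserve are hypothetical names, the RPC endpoint and web UI address are left out, and memory is assumed to be counted in MB. The coresFree and memoryFree accessors mirror the names the scheduling code reads.

```scala
// Simplified stand-in for WorkerInfo; ToyWorkerInfo and reserve are hypothetical.
class ToyWorkerInfo(val host: String, val port: Int, val cores: Int, val memory: Int) {
  var coresUsed = 0  // cores already handed to drivers and executors
  var memoryUsed = 0 // memory (assumed MB) already handed out

  // What the scheduler compares against a driver's or executor's demand.
  def coresFree: Int = cores - coresUsed
  def memoryFree: Int = memory - memoryUsed

  // Reserve resources when a driver or executor is placed on this worker.
  def reserve(c: Int, memMb: Int): Unit = {
    coresUsed += c
    memoryUsed += memMb
  }
}
```

For example, a worker created with 8 cores and 16384 MB that reserves 2 cores and 1024 MB reports 6 free cores and 15360 MB of free memory.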

DriverInfo

startTime: the start time
id: the driver id
desc: description of the resources the driver needs

ApplicationInfo

startTime: the start time
id: the application id
desc: description of the resources the application requests

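1. The schedule method

The analysis so far covered how resources such as workers, executors and Applications are added to their respective waiting queues (including the failure, completion and error paths).
schedule() schedules the currently available resources among the waiting applications.
It is called every time a new application joins or the available resources change.

The most important part of the Master is its resource scheduling algorithm (in essence, starting executors on workers):

1. Check the master's state. Only a master in the ALIVE state may schedule resources; a STANDBY master may not.
2. Shuffle the usable worker nodes, which helps balance drivers across them.
3. Schedule drivers: walk the queue of waiting drivers and launch them.
4. Start executor processes on the workers.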

private def schedule(): Unit = {
  // Check the Master's state: only an ALIVE master may schedule resources,
  // a STANDBY master may not.
  if (state != RecoveryState.ALIVE) { return }

  // Shuffle the workers so that drivers are spread more evenly.
  val shuffledWorkers = Random.shuffle(workers)
  for (worker <- shuffledWorkers if worker.state == WorkerState.ALIVE) {
    // Driver scheduling: walk the queue of waiting drivers.
    for (driver <- waitingDrivers) {
      // The worker needs at least as much free memory and as many free cores
      // as the driver asks for.
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        // Requirements met: launch the driver.
        launchDriver(worker, driver)
        // Remove the driver from the waiting queue.
        waitingDrivers -= driver
      }
    }
  }
  // Start executor processes on the workers.
  startExecutorsOnWorkers()
}
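The driver pass can be tried out in isolation with a small simulation. This is a sketch, not the Master's code: DriverPass, Worker, Driver and placeDrivers are hypothetical, and unlike the listing above it iterates over a copy of the waiting queue so that removing an element during the loop is clearly safe.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Hypothetical toy types; only the resource check mirrors schedule().
object DriverPass {
  case class Worker(id: String, var coresFree: Int, var memoryFree: Int, alive: Boolean = true)
  case class Driver(id: String, cores: Int, mem: Int)

  // Returns driverId -> workerId and removes placed drivers from the queue.
  def placeDrivers(workers: Seq[Worker], waiting: ArrayBuffer[Driver],
                   rnd: Random): Map[String, String] = {
    var placed = Map.empty[String, String]
    // waiting.toList is re-evaluated per worker, so placed drivers are not revisited.
    for (w <- rnd.shuffle(workers) if w.alive; d <- waiting.toList) {
      if (w.memoryFree >= d.mem && w.coresFree >= d.cores) {
        w.coresFree -= d.cores // reserve the driver's resources on the worker
        w.memoryFree -= d.mem
        waiting -= d
        placed += (d.id -> w.id)
      }
    }
    placed
  }
}
```

With one worker that has 1 free core and another that has 4, a driver asking for 2 cores and 1024 MB always lands on the second worker, whatever order the shuffle produces.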

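1.1 Random.shuffle (a scala.util.Random method)

Random.shuffle comes from Scala's scala.util.Random, not from Java. It returns a new collection holding the elements of its argument in random order. schedule() passes in every registered worker; the ALIVE filter is applied afterwards, in the for loop that walks the shuffled result.

The workers are copied into an ArrayBuffer, which is assigned to buf.
swap function: exchanges the two Workers at the given indices.
for loop: n runs from buf.length down to 2. On each pass nextInt(n) draws a random k between 0 and n-1, and swap() exchanges the elements at n-1 and k. When the loop ends, the order of the Workers in buf is fully randomized (this is the Fisher-Yates shuffle).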

def shuffle[T, CC[X] <: TraversableOnce[X]](xs: CC[T])(implicit bf: CanBuildFrom[CC[T], T, CC[T]]): CC[T] = {
  val buf = new ArrayBuffer[T] ++= xs

  def swap(i1: Int, i2: Int) {
    val tmp = buf(i1)
    buf(i1) = buf(i2)
    buf(i2) = tmp
  }

  for (n <- buf.length to 2 by -1) {
    val k = nextInt(n)
    swap(n - 1, k)
  }

  (bf(xs) ++= buf).result()
}
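The same loop can be run on a plain Array to see its effect. The sketch below is an illustration only: ShuffleDemo and fisherYates are hypothetical names, and the Random instance is passed in so that a fixed seed gives a repeatable order.

```scala
import scala.util.Random

// Hypothetical helper that mirrors the loop in Random.shuffle on an Array.
object ShuffleDemo {
  def fisherYates[T](arr: Array[T], rnd: Random): Array[T] = {
    for (n <- arr.length to 2 by -1) {
      val k = rnd.nextInt(n) // uniform in [0, n-1]
      val tmp = arr(n - 1)   // swap positions n-1 and k in place
      arr(n - 1) = arr(k)
      arr(k) = tmp
    }
    arr
  }

  def main(args: Array[String]): Unit = {
    val shuffled = fisherYates(Array("w1", "w2", "w3", "w4"), new Random(42))
    println(shuffled.mkString(", "))
  }
}
```

Whatever the seed, the result contains exactly the original elements; only their order changes.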

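1.2 launchDriver: launching the driver

Drivers are scheduled first, ahead of Applications. Why schedule the driver first?
A driver is registered with the Master, and then scheduled, only when the application is submitted in cluster deploy mode, and in that mode the Application cannot ask for executors until its driver is running.
In client deploy mode the driver starts locally, inside the submitting process; it is never registered with the Master, let alone scheduled by it.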


/**
 * Launch the given driver on the given worker. The caller, schedule(), has
 * already checked that the worker has enough free memory and cores.
 * @param worker
 * @param driver
 */
private def launchDriver(worker: WorkerInfo, driver: DriverInfo) {
  // Log the launch.
  logInfo("Launching driver " + driver.id + " on worker " + worker.id)
  // Record the driver on the WorkerInfo and the chosen worker on the driver.
  worker.addDriver(driver)
  driver.worker = Some(worker)
  // Send a LaunchDriver message to the worker.
  worker.endpoint.send(LaunchDriver(driver.id, driver.desc))
  // Mark the driver as RUNNING.
  driver.state = DriverState.RUNNING
}

2. startExecutorsOnWorkers: starting executor processes on workers

/**
 * Schedule and launch executors on workers
 */
private def startExecutorsOnWorkers(): Unit = {
  // Walk the waiting applications that still need cores.
  // coresLeft = cores requested by the app - cores already granted to its executors
  for (app <- waitingApps if app.coresLeft > 0) {
    // Cores required by each executor (None if not configured)
    val coresPerExecutor: Option[Int] = app.desc.coresPerExecutor
    // Keep only ALIVE workers whose free memory and free cores are at least what
    // one executor of this app needs, sorted by free cores in descending order.
    val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
      .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
        worker.coresFree >= coresPerExecutor.getOrElse(1))
      .sortBy(_.coresFree).reverse

    // Decide how many cores each usable worker contributes. Two strategies:
    // 1. spreadOutApps: spread the application over as many workers as possible (even distribution).
    // 2. non-spreadOutApps: pack it onto as few workers as possible, typically for
    //    compute-intensive jobs (closer to on-demand allocation).
    // Cores per executor are configurable. If a worker has enough memory and cores, one
    // application can start several executors on it; otherwise further executors have to
    // go to other workers.
    val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)

    // Hand the cores assigned on each worker to executors.
    for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
      allocateWorkerResourceToExecutors(
        app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
    }
  }
}
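The filter-and-sort step that produces usableWorkers can be checked on toy data. This is a sketch with hypothetical names (UsableWorkers, Worker, usable); only the predicate and the descending sort follow the listing above.

```scala
// Hypothetical toy types; the filter and sort mirror startExecutorsOnWorkers.
object UsableWorkers {
  case class Worker(id: String, alive: Boolean, coresFree: Int, memoryFree: Int)

  // Ids of workers that can host at least one executor, most free cores first.
  def usable(workers: Seq[Worker], memPerExecutorMb: Int,
             coresPerExecutor: Option[Int]): Seq[String] =
    workers
      .filter(_.alive)
      .filter(w => w.memoryFree >= memPerExecutorMb &&
        w.coresFree >= coresPerExecutor.getOrElse(1))
      .sortBy(_.coresFree).reverse
      .map(_.id)
}
```

Given four workers where one is dead and one has too little memory, only the remaining two come back, with the one that has more free cores first.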

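3. scheduleExecutorsOnWorkers: scheduling resources on each worker

It decides whether each worker can host one or more executors and, if so, assigns the number of cores those executors need. The result is an array holding the number of cores assigned on each usable worker.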

private def scheduleExecutorsOnWorkers(
      app: ApplicationInfo,
      usableWorkers: Array[WorkerInfo],
      spreadOutApps: Boolean): Array[Int] = {
    // Cores required by each Executor; None if not configured
    val coresPerExecutor = app.desc.coresPerExecutor
    // If it is not set, fall back to the minimum of 1
    val minCoresPerExecutor = coresPerExecutor.getOrElse(1)
    // With no per-executor core count, each worker runs a single executor for this app
    val oneExecutorPerWorker = coresPerExecutor.isEmpty
    // Memory required by each Executor
    val memoryPerExecutor = app.desc.memoryPerExecutorMB
    // Number of usable Workers
    val numUsable = usableWorkers.length
    val assignedCores = new Array[Int](numUsable) // cores assigned on each Worker
    val assignedExecutors = new Array[Int](numUsable) // number of new executors on each Worker
    // Cores to hand out: what the app still needs, capped by the free cores of all usable workers
    var coresToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)

    /** Return whether the specified worker can launch an executor for this app. */
    def canLaunchExecutor(pos: Int): Boolean = {
      val keepScheduling = coresToAssign >= minCoresPerExecutor
      val enoughCores = usableWorkers(pos).coresFree - assignedCores(pos) >= minCoresPerExecutor

      // If we allow multiple executors per worker, then we can always launch new executors.
      // Otherwise, if there is already an executor on this worker, just give it more cores.
      val launchingNewExecutor = !oneExecutorPerWorker || assignedExecutors(pos) == 0
      if (launchingNewExecutor) {
        val assignedMemory = assignedExecutors(pos) * memoryPerExecutor
        val enoughMemory = usableWorkers(pos).memoryFree - assignedMemory >= memoryPerExecutor
        val underLimit = assignedExecutors.sum + app.executors.size < app.executorLimit
        keepScheduling && enoughCores && enoughMemory && underLimit
      } else {
        // We're adding cores to an existing executor, so no need
        // to check memory and executor limits
        keepScheduling && enoughCores
      }
    }

    // Keep launching executors until no more workers can accommodate any
    // more executors, or if we have reached this application's limits
    var freeWorkers = (0 until numUsable).filter(canLaunchExecutor)
    while (freeWorkers.nonEmpty) {
      freeWorkers.foreach { pos =>
        var keepScheduling = true
        while (keepScheduling && canLaunchExecutor(pos)) {
          coresToAssign -= minCoresPerExecutor
          assignedCores(pos) += minCoresPerExecutor

          // If we are launching one executor per worker, then every iteration assigns 1 core
          // to the executor. Otherwise, every iteration assigns cores to a new executor.
          if (oneExecutorPerWorker) {
            assignedExecutors(pos) = 1
          } else {
            assignedExecutors(pos) += 1
          }

          // Spreading out an application means spreading out its executors across as
          // many workers as possible. If we are not spreading out, then we should keep
          // scheduling executors on this worker until we use all of its resources.
          // Otherwise, just move on to the next worker.
          if (spreadOutApps) {
            keepScheduling = false
          }
        }
      }
      freeWorkers = freeWorkers.filter(canLaunchExecutor)
    }
    assignedCores
  }
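The difference between the two strategies is easiest to see on numbers. The sketch below is a cut-down version of the loop above that models cores only: the memory check and the executor limit are deliberately dropped, and CoreAssignment and assign are hypothetical names.

```scala
// Cores-only simplification of scheduleExecutorsOnWorkers; names are hypothetical.
object CoreAssignment {
  def assign(coresFree: Array[Int], coresWanted: Int,
             coresPerExecutor: Option[Int], spreadOut: Boolean): Array[Int] = {
    val minCores = coresPerExecutor.getOrElse(1)
    val assigned = new Array[Int](coresFree.length)
    var toAssign = math.min(coresWanted, coresFree.sum)

    // A worker can take another step if cores remain to hand out and it has room.
    def canLaunch(pos: Int): Boolean =
      toAssign >= minCores && coresFree(pos) - assigned(pos) >= minCores

    var free = coresFree.indices.filter(canLaunch)
    while (free.nonEmpty) {
      free.foreach { pos =>
        var keep = true
        while (keep && canLaunch(pos)) {
          toAssign -= minCores
          assigned(pos) += minCores
          if (spreadOut) keep = false // one step, then move to the next worker
        }
      }
      free = free.filter(canLaunch)
    }
    assigned
  }
}
```

With three workers that each have 4 free cores and an application that wants 6 cores with no per-executor setting, spreading out yields 2, 2, 2, while packing yields 4, 2, 0. If each executor needs 4 cores, only one worker receives 4 cores, because the remaining 2 wanted cores are fewer than one executor needs.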

4. allocateWorkerResourceToExecutors: allocating concrete resources on a worker

private def allocateWorkerResourceToExecutors(
    app: ApplicationInfo,
    assignedCores: Int,
    coresPerExecutor: Option[Int],
    worker: WorkerInfo): Unit = {

  // Number of executors to start on this worker: assignedCores / coresPerExecutor,
  // or a single executor if coresPerExecutor is not set
  val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
  // Cores per executor; if not specified, the single executor takes all assigned cores
  val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
  for (i <- 1 to numExecutors) {
    // Add an executor on the worker: creates an ExecutorDesc and updates the cores granted to the application
    val exec = app.addExecutor(worker, coresToAssign)
    // Launch the executor
    launchExecutor(worker, exec)
    // Update the application's state
    app.state = ApplicationState.RUNNING
  }
}
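The two expressions at the top of the method are plain arithmetic, shown here on their own. ExecutorSplit and split are hypothetical names used only for this illustration.

```scala
// Hypothetical helper isolating the arithmetic of allocateWorkerResourceToExecutors.
object ExecutorSplit {
  // Returns (number of executors, cores per executor) for one worker.
  def split(assignedCores: Int, coresPerExecutor: Option[Int]): (Int, Int) = {
    val numExecutors = coresPerExecutor.map(assignedCores / _).getOrElse(1)
    val coresEach = coresPerExecutor.getOrElse(assignedCores)
    (numExecutors, coresEach)
  }
}
```

A worker that was assigned 6 cores starts 3 executors of 2 cores each when coresPerExecutor is Some(2), and a single executor holding all 6 cores when it is None.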

5. launchExecutor: launching the executor

/**
 * Launch an executor on a worker
 * @param worker the target WorkerInfo
 * @param exec the ExecutorDesc to launch
 */
private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
  logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
  // Record the executor on the worker, which updates the worker's used cores and memory
  worker.addExecutor(exec)
  // Tell the worker to start the executor
  worker.endpoint.send(LaunchExecutor(masterUrl,
    exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
  // Send an ExecutorAdded message to the application's driver
  exec.application.driver.send(
    ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
}
