Spark Memory Management

Overview

When Spark runs an application, it launches two kinds of JVM processes: the Driver and the Executors. The Driver process creates the SparkContext, submits the job, and distributes tasks; the Executor processes run the tasks, return results to the Driver, and provide the memory needed for RDD persistence. When we talk about Spark memory management, we mean the Executor's memory management.

  1. Executor memory management comes in two flavors
  • Static memory management (the default before Spark 1.6)
    As the name suggests, under this scheme the sizes of the storage, execution, and other memory regions are fixed for the duration of the application's run, but they can be configured before the application starts.
  • Unified (dynamic) memory management (the default since Spark 1.6)
    The difference from static memory management is that storage and execution share one pool of memory and can borrow space from each other. Static memory management can still be enabled by setting spark.memory.useLegacyMode to true.

Source Code (2.4.4)

SparkEnv.scala


val useLegacyMemoryManager = conf.getBoolean("spark.memory.useLegacyMode", false)
val memoryManager: MemoryManager =
  if (useLegacyMemoryManager) {
    new StaticMemoryManager(conf, numUsableCores)
  } else {
    UnifiedMemoryManager(conf, numUsableCores)
  }

As this snippet shows, unified memory management is now the default, but static memory management can be enabled with the spark.memory.useLegacyMode parameter.
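For example, a minimal sketch of switching back to legacy mode when building the SparkConf; the app name and master here are placeholders:

import org.apache.spark.{SparkConf, SparkContext}

// Re-enable static (legacy) memory management; the default is false (unified).
val conf = new SparkConf()
  .setAppName("memory-demo")   // hypothetical app name
  .setMaster("local[*]")       // hypothetical master, for local testing
  .set("spark.memory.useLegacyMode", "true")

val sc = new SparkContext(conf)

The same setting can also be passed on the command line, e.g. spark-submit --conf spark.memory.useLegacyMode=true.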

Static Memory Management

  1. Source code
StaticMemoryManager.scala

private[spark] object StaticMemoryManager {

  private val MIN_MEMORY_BYTES = 32 * 1024 * 1024

  /**
   * Return the total amount of memory available for the storage region, in bytes.
   */
  private def getMaxStorageMemory(conf: SparkConf): Long = {
    val systemMaxMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)
    val memoryFraction = conf.getDouble("spark.storage.memoryFraction", 0.6)
    val safetyFraction = conf.getDouble("spark.storage.safetyFraction", 0.9)
    (systemMaxMemory * memoryFraction * safetyFraction).toLong
  }

  /**
   * Return the total amount of memory available for the execution region, in bytes.
   */
  private def getMaxExecutionMemory(conf: SparkConf): Long = {
    val systemMaxMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)

    if (systemMaxMemory < MIN_MEMORY_BYTES) {
      throw new IllegalArgumentException(s"System memory $systemMaxMemory must " +
        s"be at least $MIN_MEMORY_BYTES. Please increase heap size using the --driver-memory " +
        s"option or spark.driver.memory in Spark configuration.")
    }
    if (conf.contains("spark.executor.memory")) {
      val executorMemory = conf.getSizeAsBytes("spark.executor.memory")
      if (executorMemory < MIN_MEMORY_BYTES) {
        throw new IllegalArgumentException(s"Executor memory $executorMemory must be at least " +
          s"$MIN_MEMORY_BYTES. Please increase executor memory using the " +
          s"--executor-memory option or spark.executor.memory in Spark configuration.")
      }
    }
    val memoryFraction = conf.getDouble("spark.shuffle.memoryFraction", 0.2)
    val safetyFraction = conf.getDouble("spark.shuffle.safetyFraction", 0.8)
    (systemMaxMemory * memoryFraction * safetyFraction).toLong
  }

}
  2. Analysis
  • getMaxStorageMemory()

This method computes the amount of memory allocated to the storage region. With the default settings, storage gets

systemMaxMemory * 0.6 * 0.9

i.e. 54% of the heap (a worked example follows this list).

  • getMaxExecutionMemory()
    This method computes the amount of memory allocated to the execution region. With the default settings, execution gets

systemMaxMemory * 0.2 * 0.8

i.e. 16% of the heap.
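To make both defaults concrete, here is a quick back-of-the-envelope check in Scala, assuming a hypothetical 10 GB executor heap:

// Hypothetical 10 GB heap, expressed in MB, for illustration only.
val systemMaxMemory = 10240L

// storage: spark.storage.memoryFraction (0.6) * spark.storage.safetyFraction (0.9)
val storageMemory = (systemMaxMemory * 0.6 * 0.9).toLong   // 5529 MB, i.e. 54%

// execution: spark.shuffle.memoryFraction (0.2) * spark.shuffle.safetyFraction (0.8)
val executionMemory = (systemMaxMemory * 0.2 * 0.8).toLong // 1638 MB, i.e. 16%

// The remaining ~30% is left for internal metadata, user data structures, etc.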

Unified Memory Management

  1. Source code
object UnifiedMemoryManager {

  // Set aside a fixed amount of memory for non-storage, non-execution purposes.
  // This serves a function similar to `spark.memory.fraction`, but guarantees that we reserve
  // sufficient memory for the system even for small heaps. E.g. if we have a 1GB JVM, then
  // the memory used for execution and storage will be (1024 - 300) * 0.6 = 434MB by default.
  private val RESERVED_SYSTEM_MEMORY_BYTES = 300 * 1024 * 1024

  def apply(conf: SparkConf, numCores: Int): UnifiedMemoryManager = {
    val maxMemory = getMaxMemory(conf)
    new UnifiedMemoryManager(
      conf,
      maxHeapMemory = maxMemory,
      onHeapStorageRegionSize =
        (maxMemory * conf.getDouble("spark.memory.storageFraction", 0.5)).toLong,
      numCores = numCores)
  }

  /**
   * Return the total amount of memory shared between execution and storage, in bytes.
   */
  private def getMaxMemory(conf: SparkConf): Long = {
    val systemMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)
    val reservedMemory = conf.getLong("spark.testing.reservedMemory",
      if (conf.contains("spark.testing")) 0 else RESERVED_SYSTEM_MEMORY_BYTES)
    val minSystemMemory = (reservedMemory * 1.5).ceil.toLong
    if (systemMemory < minSystemMemory) {
      throw new IllegalArgumentException(s"System memory $systemMemory must " +
        s"be at least $minSystemMemory. Please increase heap size using the --driver-memory " +
        s"option or spark.driver.memory in Spark configuration.")
    }
    // SPARK-12759 Check executor memory to fail fast if memory is insufficient
    if (conf.contains("spark.executor.memory")) {
      val executorMemory = conf.getSizeAsBytes("spark.executor.memory")
      if (executorMemory < minSystemMemory) {
        throw new IllegalArgumentException(s"Executor memory $executorMemory must be at least " +
          s"$minSystemMemory. Please increase executor memory using the " +
          s"--executor-memory option or spark.executor.memory in Spark configuration.")
      }
    }
    val usableMemory = systemMemory - reservedMemory
    val memoryFraction = conf.getDouble("spark.memory.fraction", 0.6)
    (usableMemory * memoryFraction).toLong
  }
}
  2. Analysis
  • First, compute the total memory shared by storage and execution: 60% of what remains after subtracting the reserved memory from the system memory. The other 40% of the usable memory is left for internal metadata, user data structures, and the like.
  • By default storage and execution each take 50% of that shared pool, but the split is not fixed: at runtime they can borrow each other's space (see the sketch after this list).
    Memory occupied by execution cannot be forcibly taken by storage; storage has to wait until execution releases it. Memory already used by storage, however, can be forcibly evicted when execution needs it, although not all of it: a portion (spark.memory.storageFraction) is retained. The reasoning behind this design is that data kept by storage can be recomputed if it is lost, whereas losing intermediate data produced by execution would affect all subsequent computation.
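A minimal sketch of the default sizing for a 1 GB executor heap, mirroring the comment in the UnifiedMemoryManager source above:

val systemMemory   = 1024L * 1024 * 1024           // 1 GB heap
val reservedMemory = 300L * 1024 * 1024            // RESERVED_SYSTEM_MEMORY_BYTES
val usableMemory   = systemMemory - reservedMemory // 724 MB
val maxMemory      = (usableMemory * 0.6).toLong   // spark.memory.fraction -> ~434 MB
val storageRegion  = (maxMemory * 0.5).toLong      // spark.memory.storageFraction -> ~217 MB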

Note

The value returned by Runtime.getRuntime.maxMemory is not equal to the value set with --executor-memory; it is smaller, and the exact value may also differ across JDK versions.
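A quick way to inspect the actual value on your JVM:

// On HotSpot this is typically somewhat less than -Xmx, because one
// survivor space of the garbage collector is excluded from the maximum.
println(s"maxMemory = ${Runtime.getRuntime.maxMemory / (1024 * 1024)} MB")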
