The executor-memory Parameter in Spark Explained

Original

wisgood

2020-02-24 20:03

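As we know, when running a Spark job you can use --executor-memory to set the memory each executor needs. If the value is too large, however, the program fails with an error.

So what is the largest value that can be set? This article works through the answer.
The setup used here is Spark 1.6.1 installed on Hadoop 2.7.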

1. The two related parameters

1.1 yarn.scheduler.maximum-allocation-mb

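This parameter is the maximum amount of memory a single container can request, and it is usually configured once for the whole cluster. A Spark executor process runs inside a container, so the container's maximum memory directly caps the memory available to the executor. If you request too much memory, the log reports an error and prints the value of this parameter. In this example the value is 6144 MB, i.e. 6 GB.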

1.2 spark.yarn.executor.memoryOverhead

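While an executor is running, it may use more memory than executor-memory, so some extra memory is reserved for it. spark.yarn.executor.memoryOverhead is the size of that reserve. If the parameter is not set, it is computed automatically (the formula is in ClientArguments.scala) as:

  max(MEMORY_OVERHEAD_FACTOR * executorMemory, MEMORY_OVERHEAD_MIN)

Here MEMORY_OVERHEAD_FACTOR defaults to 0.1, executorMemory is the configured executor-memory, and MEMORY_OVERHEAD_MIN defaults to 384 MB. MEMORY_OVERHEAD_FACTOR and MEMORY_OVERHEAD_MIN generally cannot be changed directly, because they are hard-coded in the Spark source.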
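The default overhead rule described above can be sketched in a few lines of Scala. This is an illustration rather than the verbatim Spark source: the object and method names are hypothetical, and truncating the fractional part with toInt is an assumption about how the product is turned into whole megabytes.

```scala
// Sketch of the default overhead rule; names are hypothetical, not Spark's own.
object OverheadSketch {
  // Defaults quoted in the text for Spark 1.6.1.
  val MemoryOverheadFactor: Double = 0.10
  val MemoryOverheadMinMb: Int = 384

  /** Default overhead in MB for a given executor-memory in MB.
    * Assumption: the fractional part of factor * memory is dropped (toInt). */
  def defaultOverheadMb(executorMemoryMb: Int): Int =
    math.max((MemoryOverheadFactor * executorMemoryMb).toInt, MemoryOverheadMinMb)
}
```

Under these assumptions, a 1024 MB executor gets the 384 MB minimum, while the 10% term only takes over once executor-memory is larger than 3840 MB.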

2. Calculating executor-memory

The formula:

  val executorMem = args.executorMemory + executorMemoryOverhead

Let executor-memory be X (an integer, in MB). Then:
1) If spark.yarn.executor.memoryOverhead is not set,

executorMem = X + max(X*0.1, 384)

2) If spark.yarn.executor.memoryOverhead is set (an integer, in MB),

executorMem = X + spark.yarn.executor.memoryOverhead

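The condition that must hold:

executorMem <= yarn.scheduler.maximum-allocation-mb

Note: the code above is in Client.scala, which rejects the request only when executorMem exceeds this limit.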
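To make the two cases and the limit check concrete, here is a small Scala sketch. It is not Spark's Client.scala: the object and method names are hypothetical, the overhead is assumed to be truncated to whole megabytes, and a request exactly equal to the limit is assumed to be accepted, which is what the worked example in this article (6144 on both sides) implies.

```scala
// Sketch of the container-size check; names are hypothetical, not Spark's own.
object ContainerCheckSketch {
  /** Total memory requested from YARN for one executor, in MB.
    * overheadMb = Some(n) models an explicit spark.yarn.executor.memoryOverhead;
    * None falls back to the default max(0.1 * X, 384), truncated to an Int. */
  def executorMemMb(executorMemoryMb: Int, overheadMb: Option[Int]): Int = {
    val overhead = overheadMb.getOrElse(math.max((0.10 * executorMemoryMb).toInt, 384))
    executorMemoryMb + overhead
  }

  /** True when the request fits under yarn.scheduler.maximum-allocation-mb.
    * Assumption: a request equal to the limit is still accepted. */
  def fits(executorMemoryMb: Int, overheadMb: Option[Int], maxAllocationMb: Int): Boolean =
    executorMemMb(executorMemoryMb, overheadMb) <= maxAllocationMb
}
```

For instance, ContainerCheckSketch.fits(4096, Some(1024), 6144) checks a 4096 MB executor with an explicit 1024 MB overhead against a 6144 MB limit. Raising the overhead explicitly lowers the largest executor-memory that still fits.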
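In this example the limit is 6144 MB, and X*0.1 is larger than 384 for any X close to that limit, so:

6144=X+max(X*0.1,384) 
X=5585.45

The overhead is a whole number of MB with its fractional part dropped, so X=5586 gives an overhead of 558 and a total of 5586+558=6144, which still fits, while X=5587 gives 6145, which does not. The largest value that can be set is therefore 5586M.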
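The same answer can be found by brute force instead of solving the equation by hand. The Scala sketch below assumes the default overhead rule (10% truncated to whole MB, with a 384 MB floor) and that a request equal to the limit is accepted. The names are hypothetical and the code is only meant to illustrate the arithmetic.

```scala
// Sketch: search for the largest executor-memory that fits a given container limit.
object MaxExecutorMemorySketch {
  /** Total request in MB for executor-memory x, using the default overhead rule. */
  def executorMemMb(x: Int): Int = x + math.max((0.10 * x).toInt, 384)

  /** Largest executor-memory (MB) whose total request does not exceed the limit,
    * or None if even 1 MB would not fit. The total grows with x, so the first
    * match when scanning downward is the maximum. */
  def maxExecutorMemoryMb(maxAllocationMb: Int): Option[Int] =
    (maxAllocationMb to 1 by -1).find(x => executorMemMb(x) <= maxAllocationMb)
}
```

Calling MaxExecutorMemorySketch.maxExecutorMemoryMb(6144) returns Some(5586), matching the figure above. With a 2048 MB limit the 384 MB floor dominates and the result is Some(1664).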
