A Detailed Look at Spark's Shuffle Implementation

Background

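In the MapReduce framework, shuffle is the bridge between Map and Reduce: Map output must pass through the shuffle stage before Reduce can consume it, so shuffle performance directly affects the performance and throughput of the whole program. Spark, as an implementation of the MapReduce model, naturally implements shuffle logic as well. This article examines how Spark's shuffle is implemented, what its strengths and weaknesses are, and how it differs from the shuffle in Hadoop MapReduce.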

Shuffle

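Shuffle is a specific phase in the MapReduce framework that sits between the Map phase and the Reduce phase. When Map output is to be consumed by Reduce, the output must be hashed by key and distributed to each Reducer; this process is the shuffle. Because shuffle involves disk reads and writes as well as network transfer, its performance directly affects the running efficiency of the whole program.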
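As a minimal sketch of "hash by key, then distribute to a Reducer", the snippet below maps a key to a reducer index. The names HashPartitionSketch and partitionFor are hypothetical and chosen for illustration; this is not Spark's own partitioner code.

```scala
object HashPartitionSketch {
  // Map a key to a reducer index in [0, numReducers).
  // hashCode may be negative, so the remainder is shifted back into range.
  // Assumption: key is non-null and numReducers > 0.
  def partitionFor(key: Any, numReducers: Int): Int = {
    val raw = key.hashCode % numReducers
    if (raw < 0) raw + numReducers else raw
  }

  def main(args: Array[String]): Unit = {
    val records = Seq("apple" -> 1, "banana" -> 2, "apple" -> 3)
    // Records that share a key are always routed to the same reducer.
    records.foreach { case (k, v) =>
      println(s"$k -> reducer ${partitionFor(k, 4)} (value $v)")
    }
  }
}
```

Because the index depends only on the key, every Mapper independently sends all records for a given key to the same Reducer.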

The figure below shows the overall flow of the MapReduce algorithm, with the shuffle phase sitting between the Map phase and the Reduce phase.

[Figure: MapReduce running process]

Conceptually, shuffle is a bridge that connects data between stages. How is it actually implemented? The rest of this article uses Spark as the example.

The Evolution of Spark Shuffle

The figure below gives a brief overview of the whole shuffle flow in Spark:

[Figure: Spark shuffle process]

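  • First, each Mapper creates one bucket per Reducer, so the total number of buckets is M×R, where M is the number of Map tasks and R is the number of Reduce tasks.
  • Next, the results produced by a Mapper are placed into buckets according to the configured partition algorithm. The partition algorithm can be customized; by default, records are hashed by key into different buckets.
  • When a Reducer starts, it uses its own task id and the ids of the Mappers it depends on to fetch the corresponding buckets from remote or local block managers, and processes them as its input.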

A bucket here is an abstract concept. In an implementation, each bucket may correspond to a file, to part of a file, or to something else.
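The three steps above can be sketched as an in-memory simulation. Everything here (BucketFlowSketch, writeBuckets, fetch, and the use of a Map keyed by (mapId, reduceId)) is a hypothetical illustration of the bucket abstraction, not Spark's actual data structures; a real system would back the buckets with files and fetch them over the network.

```scala
import scala.collection.mutable

object BucketFlowSketch {
  type Record = (String, Int)

  // Default-style partitioning: hash the key and keep the index non-negative.
  def partitionFor(key: String, numReducers: Int): Int = {
    val raw = key.hashCode % numReducers
    if (raw < 0) raw + numReducers else raw
  }

  // Map side: every mapper owns one bucket per reducer, addressed by (mapId, reduceId),
  // so M mappers and R reducers yield M * R buckets in total.
  def writeBuckets(mapInputs: Seq[Seq[Record]], numReducers: Int): Map[(Int, Int), Seq[Record]] = {
    val buckets = mutable.Map.empty[(Int, Int), mutable.ArrayBuffer[Record]]
    for (mapId <- mapInputs.indices; reduceId <- 0 until numReducers)
      buckets((mapId, reduceId)) = mutable.ArrayBuffer.empty[Record]
    for ((records, mapId) <- mapInputs.zipWithIndex; record <- records)
      buckets((mapId, partitionFor(record._1, numReducers))) += record
    buckets.map { case (id, buf) => id -> buf.toSeq }.toMap
  }

  // Reduce side: reducer `reduceId` collects its bucket from every mapper it depends on.
  def fetch(buckets: Map[(Int, Int), Seq[Record]], numMappers: Int, reduceId: Int): Seq[Record] =
    (0 until numMappers).flatMap(mapId => buckets((mapId, reduceId)))
}
```

For example, `BucketFlowSketch.writeBuckets(Seq(Seq("a" -> 1, "b" -> 2), Seq("a" -> 3)), 2)` creates 2 × 2 = 4 buckets, and `fetch(buckets, 2, r)` returns the input of reducer `r`.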

Next, we trace the evolution of Spark's shuffle in two parts: shuffle write and shuffle fetch.

Shuffle Write

In Spark 0.6 and 0.7, shuffle data was stored as files in the block manager, using the same strategy as rdd.persist(StorageLevel.DISK_ONLY). See:

    override def run(attemptId: Long): MapStatus = {
      val numOutputSplits = dep.partitioner.numPartitions

      ...
        // Partition the map output.
        val buckets = Array.fill(numOutputSplits)(new ArrayBuffer[(Any, Any)])
        for (elem <- rdd.iterator(split, taskContext)) {
          val pair = elem.asInstanceOf[(Any, Any)]
          val bucketId = dep.partitioner.getPartition(pair._1)
          buckets(bucketId) += pair
        }

        ...

        val blockManager = SparkEnv.get.blockManager
        for (i <- 0 until numOutputSplits) {
          val blockId = "shuffle_" + dep.shuffleId + "_" + partition + "_" + i
          // Get a Scala iterator from Java map
          val iter: Iterator[(Any, Any)] = buckets(i).iterator
          val size = blockManager.put(blockId, iter, StorageLevel.DISK_ONLY, false)
          totalBytes += size
        }
      ...
    }

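Some distracting code has been removed. As the code shows, each Mapper creates one bucket per Reducer and puts the computed RDD results into those buckets. Note that each bucket is an ArrayBuffer, which means Map output is first held in memory.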

Spark then writes the Map output held in the ArrayBuffers to disk managed by the block manager. The files are named as follows: "shuffle_" + shuffle_id + "_" + map partition id + "_" + shuffle partition id
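The naming scheme can be written as a small helper. ShuffleBlockNaming, shuffleBlockId, and parse are hypothetical names introduced for illustration; only the "shuffle_<shuffleId>_<mapPartitionId>_<reducePartitionId>" pattern itself comes from the code above.

```scala
object ShuffleBlockNaming {
  // Build a block id following the pattern described in the text.
  def shuffleBlockId(shuffleId: Int, mapPartitionId: Int, reducePartitionId: Int): String =
    "shuffle_" + shuffleId + "_" + mapPartitionId + "_" + reducePartitionId

  // Split an id back into (shuffleId, mapPartitionId, reducePartitionId).
  // Returns None for anything that does not follow the pattern.
  // Assumption: the numeric parts are small enough to fit in an Int.
  def parse(blockId: String): Option[(Int, Int, Int)] = blockId.split("_") match {
    case Array("shuffle", s, m, r)
        if Seq(s, m, r).forall(x => x.nonEmpty && x.forall(_.isDigit)) =>
      Some((s.toInt, m.toInt, r.toInt))
    case _ => None
  }
}
```

Since the map partition id and the shuffle partition id are both part of the name, a Reducer that knows which Mappers it depends on can compute the names of all blocks it needs without any lookup.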

The early shuffle write had two major problems:

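  1. Map output must be held entirely in memory before being written to disk. This is a very large memory cost, and when memory cannot hold all of the Map output, an OOM occurs.
  2. Every Mapper produces as many shuffle files as there are Reducers. With 1k Mappers and 1k Reducers, that is 1M shuffle files, a heavy burden on the file system. Moreover, when the amount of shuffle data is small but the number of shuffle files is very large, random writes also severely degrade I/O performance.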
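The arithmetic behind the second problem is easy to check. In the sketch below, ShuffleFileCount and its methods are hypothetical names, and the 10 GiB total shuffle size is an assumed figure picked only to show how small the individual files become.

```scala
object ShuffleFileCount {
  // One shuffle file per (Mapper, Reducer) pair in the early design.
  def shuffleFiles(numMappers: Long, numReducers: Long): Long =
    numMappers * numReducers

  // Average file size (integer division) if the shuffle data is spread evenly.
  def avgFileBytes(totalShuffleBytes: Long, numMappers: Long, numReducers: Long): Long =
    totalShuffleBytes / shuffleFiles(numMappers, numReducers)

  def main(args: Array[String]): Unit = {
    val files = shuffleFiles(1000L, 1000L)
    // Assumed workload: 10 GiB of shuffle data in total.
    val avg = avgFileBytes(10L * 1024 * 1024 * 1024, 1000L, 1000L)
    println(s"$files files, about $avg bytes each")
  }
}
```

Under that assumption each file holds only about 10 KB, so the disk spends most of its time seeking between a million tiny files rather than writing sequentially.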

In Spark 0.8, shuffle write took a different approach from RDD block write, and a dedicated ShuffleBlockManager was created for shuffle write. This partially solved the problems encountered in 0.6 and 0.7.

First, let us look at the Spark 0.8 implementation:

<ol class="linenums" style="padding: 0px; margin: 0px 0px 0px 25px; color: rgb(174, 174, 174);"><li class="L0" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="kwd" style="color: rgb(240, 230, 140); font-weight: bold;">override</span><span class="pln"> </span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: bold;">def</span><span class="pln"> run</span><span class="pun" style="color: rgb(255, 255, 255);">(</span><span class="pln">attemptId</span><span class="pun" style="color: rgb(255, 255, 255);">:</span><span class="pln"> </span><span class="typ" style="color: rgb(152, 251, 152);">Long</span><span class="pun" style="color: rgb(255, 255, 255);">):</span><span class="pln"> </span><span class="typ" style="color: rgb(152, 251, 152);">MapStatus</span><span class="pln"> </span><span class="pun" style="color: rgb(255, 255, 255);">=</span><span class="pln"> </span><span class="pun" style="color: rgb(255, 255, 255);">{</span></code></li><li class="L1" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"></code></li><li class="L2" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">  </span><span class="pun" style="color: rgb(255, 255, 255);">...</span></code></li><li class="L3" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"></code></li><li class="L4"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier 
New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">  val blockManager </span><span class="pun" style="color: rgb(255, 255, 255);">=</span><span class="pln"> </span><span class="typ" style="color: rgb(152, 251, 152);">SparkEnv</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: bold;">get</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">blockManager</span></code></li><li class="L5" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">  </span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: bold;">var</span><span class="pln"> shuffle</span><span class="pun" style="color: rgb(255, 255, 255);">:</span><span class="pln"> </span><span class="typ" style="color: rgb(152, 251, 152);">ShuffleBlocks</span><span class="pln"> </span><span class="pun" style="color: rgb(255, 255, 255);">=</span><span class="pln"> </span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: bold;">null</span></code></li><li class="L6" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">  </span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: bold;">var</span><span class="pln"> buckets</span><span class="pun" style="color: rgb(255, 255, 255);">:</span><span class="pln"> </span><span class="typ" style="color: rgb(152, 251, 152);">ShuffleWriterGroup</span><span class="pln"> </span><span class="pun" style="color: rgb(255, 255, 255);">=</span><span class="pln"> </span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: 
bold;">null</span></code></li><li class="L7" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"></code></li><li class="L8" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">  </span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: bold;">try</span><span class="pln"> </span><span class="pun" style="color: rgb(255, 255, 255);">{</span></code></li><li class="L9"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    </span><span class="com" style="color: rgb(135, 206, 235);">// Obtain all the block writers for shuffle blocks.</span></code></li><li class="L0" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    val ser </span><span class="pun" style="color: rgb(255, 255, 255);">=</span><span class="pln"> </span><span class="typ" style="color: rgb(152, 251, 152);">SparkEnv</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: bold;">get</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">serializerManager</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: bold;">get</span><span class="pun" style="color: rgb(255, 255, 255);">(</span><span class="pln">dep</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span 
class="pln">serializerClass</span><span class="pun" style="color: rgb(255, 255, 255);">)</span></code></li><li class="L1" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    shuffle </span><span class="pun" style="color: rgb(255, 255, 255);">=</span><span class="pln"> blockManager</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">shuffleBlockManager</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">forShuffle</span><span class="pun" style="color: rgb(255, 255, 255);">(</span><span class="pln">dep</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">shuffleId</span><span class="pun" style="color: rgb(255, 255, 255);">,</span><span class="pln"> numOutputSplits</span><span class="pun" style="color: rgb(255, 255, 255);">,</span><span class="pln"> ser</span><span class="pun" style="color: rgb(255, 255, 255);">)</span></code></li><li class="L2" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    buckets </span><span class="pun" style="color: rgb(255, 255, 255);">=</span><span class="pln"> shuffle</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">acquireWriters</span><span class="pun" style="color: rgb(255, 255, 255);">(</span><span class="pln">partition</span><span class="pun" style="color: rgb(255, 255, 255);">)</span></code></li><li class="L3" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"></code></li><li class="L4"><code style="padding: 0px; 
font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    </span><span class="com" style="color: rgb(135, 206, 235);">// Write the map output to its associated buckets.</span></code></li><li class="L5" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    </span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: bold;">for</span><span class="pln"> </span><span class="pun" style="color: rgb(255, 255, 255);">(</span><span class="pln">elem </span><span class="pun" style="color: rgb(255, 255, 255);"><-</span><span class="pln"> rdd</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">iterator</span><span class="pun" style="color: rgb(255, 255, 255);">(</span><span class="pln">split</span><span class="pun" style="color: rgb(255, 255, 255);">,</span><span class="pln"> taskContext</span><span class="pun" style="color: rgb(255, 255, 255);">))</span><span class="pln"> </span><span class="pun" style="color: rgb(255, 255, 255);">{</span></code></li><li class="L6" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">      val pair </span><span class="pun" style="color: rgb(255, 255, 255);">=</span><span class="pln"> elem</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">asInstanceOf</span><span class="pun" style="color: rgb(255, 255, 255);">[</span><span class="typ" style="color: rgb(152, 251, 152);">Product2</span><span class="pun" style="color: rgb(255, 255, 255);">[</span><span class="typ" style="color: rgb(152, 251, 152);">Any</span><span class="pun" 
style="color: rgb(255, 255, 255);">,</span><span class="pln"> </span><span class="typ" style="color: rgb(152, 251, 152);">Any</span><span class="pun" style="color: rgb(255, 255, 255);">]]</span></code></li><li class="L7" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">      val bucketId </span><span class="pun" style="color: rgb(255, 255, 255);">=</span><span class="pln"> dep</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">partitioner</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">getPartition</span><span class="pun" style="color: rgb(255, 255, 255);">(</span><span class="pln">pair</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">_1</span><span class="pun" style="color: rgb(255, 255, 255);">)</span></code></li><li class="L8" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">      buckets</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">writers</span><span class="pun" style="color: rgb(255, 255, 255);">(</span><span class="pln">bucketId</span><span class="pun" style="color: rgb(255, 255, 255);">).</span><span class="pln">write</span><span class="pun" style="color: rgb(255, 255, 255);">(</span><span class="pln">pair</span><span class="pun" style="color: rgb(255, 255, 255);">)</span></code></li><li class="L9"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    </span><span class="pun" style="color: rgb(255, 255, 255);">}</span></code></li><li 
class="L0" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"></code></li><li class="L1" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    </span><span class="com" style="color: rgb(135, 206, 235);">// Commit the writes. Get the size of each bucket block (total block size).</span></code></li><li class="L2" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    </span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: bold;">var</span><span class="pln"> totalBytes </span><span class="pun" style="color: rgb(255, 255, 255);">=</span><span class="pln"> </span><span class="lit" style="color: rgb(205, 92, 92);">0L</span></code></li><li class="L3" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    val compressedSizes</span><span class="pun" style="color: rgb(255, 255, 255);">:</span><span class="pln"> </span><span class="typ" style="color: rgb(152, 251, 152);">Array</span><span class="pun" style="color: rgb(255, 255, 255);">[</span><span class="typ" style="color: rgb(152, 251, 152);">Byte</span><span class="pun" style="color: rgb(255, 255, 255);">]</span><span class="pln"> </span><span class="pun" style="color: rgb(255, 255, 255);">=</span><span class="pln"> buckets</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">writers</span><span class="pun" style="color: 
rgb(255, 255, 255);">.</span><span class="pln">map </span><span class="pun" style="color: rgb(255, 255, 255);">{</span><span class="pln"> writer</span><span class="pun" style="color: rgb(255, 255, 255);">:</span><span class="pln">   </span><span class="typ" style="color: rgb(152, 251, 152);">BlockObjectWriter</span><span class="pln"> </span><span class="pun" style="color: rgb(255, 255, 255);">=></span></code></li><li class="L4"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">      writer</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">commit</span><span class="pun" style="color: rgb(255, 255, 255);">()</span></code></li><li class="L5" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">      writer</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">close</span><span class="pun" style="color: rgb(255, 255, 255);">()</span></code></li><li class="L6" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">      val size </span><span class="pun" style="color: rgb(255, 255, 255);">=</span><span class="pln"> writer</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">size</span><span class="pun" style="color: rgb(255, 255, 255);">()</span></code></li><li class="L7" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">      totalBytes 
</span><span class="pun" style="color: rgb(255, 255, 255);">+=</span><span class="pln"> size</span></code></li><li class="L8" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">      </span><span class="typ" style="color: rgb(152, 251, 152);">MapOutputTracker</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">compressSize</span><span class="pun" style="color: rgb(255, 255, 255);">(</span><span class="pln">size</span><span class="pun" style="color: rgb(255, 255, 255);">)</span></code></li><li class="L9"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    </span><span class="pun" style="color: rgb(255, 255, 255);">}</span></code></li><li class="L0" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"></code></li><li class="L1" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    </span><span class="pun" style="color: rgb(255, 255, 255);">...</span></code></li><li class="L2" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"></code></li><li class="L3" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span 
class="pln">  </span><span class="pun" style="color: rgb(255, 255, 255);">}</span><span class="pln"> </span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: bold;">catch</span><span class="pln"> </span><span class="pun" style="color: rgb(255, 255, 255);">{</span><span class="pln"> </span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: bold;">case</span><span class="pln"> e</span><span class="pun" style="color: rgb(255, 255, 255);">:</span><span class="pln"> </span><span class="typ" style="color: rgb(152, 251, 152);">Exception</span><span class="pln"> </span><span class="pun" style="color: rgb(255, 255, 255);">=></span></code></li><li class="L4"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    </span><span class="com" style="color: rgb(135, 206, 235);">// If there is an exception from running the task, revert the partial writes</span></code></li><li class="L5" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    </span><span class="com" style="color: rgb(135, 206, 235);">// and throw the exception upstream to Spark.</span></code></li><li class="L6" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    </span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: bold;">if</span><span class="pln"> </span><span class="pun" style="color: rgb(255, 255, 255);">(</span><span class="pln">buckets </span><span class="pun" style="color: rgb(255, 255, 255);">!=</span><span class="pln"> </span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: 
bold;">null</span><span class="pun" style="color: rgb(255, 255, 255);">)</span><span class="pln"> </span><span class="pun" style="color: rgb(255, 255, 255);">{</span></code></li><li class="L7" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">      buckets</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">writers</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: bold;">foreach</span><span class="pun" style="color: rgb(255, 255, 255);">(</span><span class="pln">_</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">revertPartialWrites</span><span class="pun" style="color: rgb(255, 255, 255);">())</span></code></li><li class="L8" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    </span><span class="pun" style="color: rgb(255, 255, 255);">}</span></code></li><li class="L9"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    </span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: bold;">throw</span><span class="pln"> e</span></code></li><li class="L0" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">  </span><span class="pun" style="color: rgb(255, 255, 255);">}</span><span class="pln"> </span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: 
bold;">finally</span><span class="pln"> </span><span class="pun" style="color: rgb(255, 255, 255);">{</span></code></li><li class="L1" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    </span><span class="com" style="color: rgb(135, 206, 235);">// Release the writers back to the shuffle block manager.</span></code></li><li class="L2" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    </span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: bold;">if</span><span class="pln"> </span><span class="pun" style="color: rgb(255, 255, 255);">(</span><span class="pln">shuffle </span><span class="pun" style="color: rgb(255, 255, 255);">!=</span><span class="pln"> </span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: bold;">null</span><span class="pln"> </span><span class="pun" style="color: rgb(255, 255, 255);">&&</span><span class="pln"> buckets </span><span class="pun" style="color: rgb(255, 255, 255);">!=</span><span class="pln"> </span><span class="kwd" style="color: rgb(240, 230, 140); font-weight: bold;">null</span><span class="pun" style="color: rgb(255, 255, 255);">)</span><span class="pln"> </span><span class="pun" style="color: rgb(255, 255, 255);">{</span></code></li><li class="L3" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">      shuffle</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">releaseWriters</span><span class="pun" style="color: rgb(255, 255, 255);">(</span><span 
class="pln">buckets</span><span class="pun" style="color: rgb(255, 255, 255);">)</span></code></li><li class="L4"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    </span><span class="pun" style="color: rgb(255, 255, 255);">}</span></code></li><li class="L5" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    </span><span class="com" style="color: rgb(135, 206, 235);">// Execute the callbacks on task completion.</span></code></li><li class="L6" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    taskContext</span><span class="pun" style="color: rgb(255, 255, 255);">.</span><span class="pln">executeOnCompleteCallbacks</span><span class="pun" style="color: rgb(255, 255, 255);">()</span></code></li><li class="L7" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">    </span><span class="pun" style="color: rgb(255, 255, 255);">}</span></code></li><li class="L8" style="list-style-type: decimal;"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; border: 0px;"><span class="pln">  </span><span class="pun" style="color: rgb(255, 255, 255);">}</span></code></li><li class="L9"><code style="padding: 0px; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12px; color: inherit; background-color: transparent; 
border: 0px;"><span class="pun" style="color: rgb(255, 255, 255);">}</span></code></li></ol>

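This version adds a new class for shuffle write, ShuffleBlockManager, which allocates and manages buckets. ShuffleBlockManager also assigns each bucket a DiskObjectWriter (typed as BlockObjectWriter in the code), and each write handler has a 100KB buffer by default; Map output is written to files through these write handlers. As the code shows, the write path is now buckets.writers(bucketId).write(pair): the key-value pairs of the Map output are written to disk one at a time, instead of first being held entirely in memory and then flushed to disk as a whole.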

The code of ShuffleBlockManager is shown below:

<pre><code>private[spark]
class ShuffleBlockManager(blockManager: BlockManager) {

  def forShuffle(shuffleId: Int, numBuckets: Int, serializer: Serializer): ShuffleBlocks = {
    new ShuffleBlocks {
      // Get a group of writers for a map task.
      override def acquireWriters(mapId: Int): ShuffleWriterGroup = {
        val bufferSize = System.getProperty("spark.shuffle.file.buffer.kb", "100").toInt * 1024
        val writers = Array.tabulate[BlockObjectWriter](numBuckets) { bucketId =>
          val blockId = ShuffleBlockManager.blockId(shuffleId, bucketId, mapId)
          blockManager.getDiskBlockWriter(blockId, serializer, bufferSize)
        }
        new ShuffleWriterGroup(mapId, writers)
      }

      override def releaseWriters(group: ShuffleWriterGroup) = {
        // Nothing really to release here.
      }
    }
  }
}</code></pre>

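Spark 0.8 significantly reduces the memory pressure of shuffle: Map output no longer has to be held entirely in memory before being flushed to disk, and is instead written to disk record by record. Shuffle files are also managed separately by the new ShuffleBlockManager rather than being mixed with RDD cache files.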
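The record-by-record write path can be illustrated with a minimal sketch. It is not Spark's implementation: BucketWriter and RecordByRecordWrite are hypothetical names, an in-memory byte array stands in for the bucket file so the example stays self-contained, and hashing the key is assumed as the partition function.

```scala
import java.io.{BufferedOutputStream, ByteArrayOutputStream, DataOutputStream}

// Hypothetical stand-in for a per-bucket write handler: each record is
// serialized through a fixed-size buffer as soon as it is produced.
final class BucketWriter(bufferSize: Int) {
  private val sink = new ByteArrayOutputStream() // stands in for the bucket file
  private val out = new DataOutputStream(new BufferedOutputStream(sink, bufferSize))
  private var records = 0L

  def write(pair: (String, Int)): Unit = {
    out.writeUTF(pair._1)
    out.writeInt(pair._2)
    records += 1
  }

  // Flush the buffer and report how many bytes have reached the sink.
  def commit(): Long = { out.flush(); sink.size().toLong }

  def recordCount: Long = records
}

object RecordByRecordWrite {
  // Route every pair to the writer of its bucket; nothing is collected first.
  def writeAll(pairs: Iterator[(String, Int)],
               numBuckets: Int,
               bufferSize: Int): Array[BucketWriter] = {
    val writers = Array.fill(numBuckets)(new BucketWriter(bufferSize))
    for (pair <- pairs) {
      val bucketId = java.lang.Math.floorMod(pair._1.hashCode, numBuckets)
      writers(bucketId).write(pair)
    }
    writers
  }
}
```

Because the input is an Iterator, only the per-bucket buffers are held in memory at any time, which is the property the paragraph above describes.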

However, shuffle write in Spark 0.8 still leaves two major problems unsolved:

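  • First, there are still too many shuffle files. Too many shuffle files put heavy pressure on the file system and also reduce IO throughput.
  • Second, although Map output no longer needs to be evaluated in memory in advance, which significantly reduces memory pressure, the buffers of the newly introduced DiskObjectWriter are a memory cost that cannot be ignored. Suppose we have 1k Mappers and 1k Reducers: there are then 1M buckets and therefore 1M write handlers, and since each write handler needs 100KB of memory by default, 100GB of memory is needed in total, for buffers alone. In practice, if the 1k Mappers run in waves rather than all at once, the memory needed is only cores × number of Reducers × 100KB. With a large number of Reducers, though, this buffer overhead is still considerable.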
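The buffer arithmetic in the second point can be checked with a short sketch. ShuffleBufferCost and its method names are made up for illustration; the only figure taken from the article is the 100KB default buffer per write handler.

```scala
object ShuffleBufferCost {
  // 100KB per write handler, the default quoted in the text.
  val DefaultBufferBytes: Long = 100L * 1024

  // Worst case: the writers of every Mapper are open at the same time.
  def allMappersAtOnce(mappers: Int, reducers: Int,
                       bufferBytes: Long = DefaultBufferBytes): Long =
    mappers.toLong * reducers * bufferBytes

  // Mappers run in waves, so only `cores` Mappers hold writers at once.
  def perWave(cores: Int, reducers: Int,
              bufferBytes: Long = DefaultBufferBytes): Long =
    cores.toLong * reducers * bufferBytes
}
```

Taking 1k as 1024, allMappersAtOnce(1024, 1024) gives 100GB, while perWave(8, 1024) for a hypothetical 8-core setup gives 800MB, which is still a large amount of memory to spend on buffers alone.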

To address the excessive number of shuffle files, Spark 0.8.1 introduced shuffle consolidation, which aims to significantly reduce the number of shuffle files.

Let us first use a figure to illustrate how shuffle consolidation works.

[Figure: Spark shuffle consolidation process]

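Suppose the job has 4 Mappers and 4 Reducers and runs on 2 cores, so 2 tasks can run in parallel. Spark's shuffle write then needs 16 buckets in total, and hence 16 write handlers. In earlier Spark versions each bucket corresponds to one file, so 16 shuffle files would be produced here.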

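With shuffle consolidation, a bucket no longer corresponds to a file but to a segment within a file, and the number of shuffle files produced also depends on the number of Spark cores. In the figure above, the job's 4 Mappers run in two batches. The first batch of 2 Mappers requests 8 buckets and produces 8 shuffle files. The second batch also requests 8 buckets, but these do not produce 8 new files; they are appended to the 8 existing files. In total there are only 8 shuffle files, which together contain 16 distinct segments. In theory, therefore, shuffle consolidation produces C×R shuffle files, where C is the number of cores in the Spark cluster and R is the number of Reducers.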

Note that when M = C, shuffle consolidation produces the same number of files as the previous implementation.
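The file and segment counts from the example can be reproduced with a small simulation. This is only a sketch: ConsolidationSketch is a hypothetical name, and Mappers are assumed to be assigned to core slots round-robin (mapId % cores), which is a simplification of real task scheduling.

```scala
object ConsolidationSketch {
  // A consolidated file is identified by (coreSlot, reducerId). The value lists
  // the Mappers whose bucket was appended to that file as a segment, in order.
  def simulate(mappers: Int, reducers: Int, cores: Int): Map[(Int, Int), Vector[Int]] = {
    val segments = for {
      mapId <- 0 until mappers
      reduceId <- 0 until reducers
    } yield ((mapId % cores, reduceId), mapId) // assumed round-robin slot assignment
    segments.groupBy(_._1).map { case (file, segs) => file -> segs.map(_._2).toVector }
  }
}
```

For simulate(4, 4, 2) the result has 8 files holding 16 segments in total, and the file for slot 0 and Reducer 0 contains the segments of Mappers 0 and 2. For simulate(2, 4, 2), where M = C, there are 8 files, the same as M×R.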

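Shuffle consolidation significantly reduces the number of shuffle files and solves a fairly serious problem of earlier versions, but the excessive buffer overhead of the write handlers remains. The only way to reduce that overhead is to reduce the number of Reducers, which introduces a new problem that is discussed in detail below.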

That concludes the evolution of shuffle write. Next comes shuffle fetch, together with Spark's aggregator, which is critical to the performance of real Spark applications.

Shuffle Fetch and Aggregator

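For the data written by shuffle write to be used by a Reducer, a shuffle fetcher has to fetch the required data. The fetch can be local or remote, because part of the shuffle data may be stored locally. Spark implements two different frameworks for the shuffle fetcher: NIO, which fetches data over socket connections, and OIO, which fetches data through a netty server. The corresponding classes are BasicBlockFetcherIterator and NettyBlockFetcherIterator.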

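Spark 0.7 and earlier versions support only BasicBlockFetcherIterator, whose performance was consistently poor when the amount of shuffle data was large because it could not make full use of the network bandwidth. A new shuffle fetcher was added to try to achieve better performance. For an evaluation of early shuffle performance, see the Spark usergroup. The performance of BasicBlockFetcherIterator has since improved a lot, so it is worth testing and comparing both implementations in practice.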

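Now for the aggregator. In the shuffle of Hadoop MapReduce, the data brought in by shuffle fetch goes through a merge sort, so that the values under the same key are merged together in order for the Reducer to consume. The process is shown in the figure below: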

[Figure: MapReduce shuffle process]

All of the merge sort is done on disk, which keeps memory usage under control at the cost of more disk IO.
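The effect of the merge step, grouping values by key from several sorted runs, can be sketched as a k-way merge. This is not Hadoop's code: MergeSortedRuns is a hypothetical name, and the runs are in-memory sequences here, whereas Hadoop merges sorted files on disk.

```scala
import scala.collection.mutable

object MergeSortedRuns {
  // Merge runs that are each sorted by key, grouping the values of equal keys.
  def mergeAndGroup(runs: Seq[Seq[(String, Int)]]): Vector[(String, Vector[Int])] = {
    // Heap entries are (key, value, runIndex). PriorityQueue is a max-heap,
    // so the key ordering is reversed to pop the smallest key first.
    val heap = mutable.PriorityQueue.empty[(String, Int, Int)](
      Ordering.by[(String, Int, Int), String](_._1).reverse)
    val iterators = runs.map(_.iterator).toVector

    // Push the next record of run i, if it has one.
    def refill(i: Int): Unit =
      if (iterators(i).hasNext) {
        val (k, v) = iterators(i).next()
        heap.enqueue((k, v, i))
      }

    iterators.indices.foreach(refill)
    val out = Vector.newBuilder[(String, Vector[Int])]
    while (heap.nonEmpty) {
      val key = heap.head._1
      val values = Vector.newBuilder[Int]
      // Drain every record that carries the current smallest key.
      while (heap.nonEmpty && heap.head._1 == key) {
        val (_, v, i) = heap.dequeue()
        values += v
        refill(i)
      }
      out += ((key, values.result()))
    }
    out.result()
  }
}
```

Only one record per run sits in the heap at a time, which is why this style of merge keeps memory bounded. The order of values within one key depends on heap tie-breaking and is not guaranteed in this sketch.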

Does Spark also use merge sort, or does it do this some other way? The details follow.

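First, although Spark belongs to the MapReduce family, it changes the traditional MapReduce algorithm to some extent. Spark assumes that for most user cases sorting the shuffle data is not necessary, word count being one example, and that forcing a sort only hurts performance. Spark therefore does not do a merge sort on the Reducer side. Without merge sort, how does Spark reduce? This is where the aggregator comes in.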

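The aggregator is essentially a hashmap whose keys are the keys of the map output and whose values are of whatever type is being combined. When a word count reduce computes the count values, each key-value pair obtained by shuffle fetch is inserted into or updated in the hashmap (inserted if the key is not found, otherwise the value is updated). There is no need to merge sort all key-value pairs in advance; each pair is processed as it arrives, which saves the external sort step. Note, however, that the Reducer's memory must be large enough to hold all the keys and count values of its partition, so there is a certain memory requirement.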
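The insert-or-update behaviour described above can be sketched as follows. This is an illustration rather than Spark's aggregator class: HashAggregator is a hypothetical name, and the parameter names createCombiner and mergeValue are chosen here for readability.

```scala
import scala.collection.mutable

object HashAggregator {
  // Fold fetched pairs into a hash map one at a time:
  // insert on first sight of a key, otherwise update the stored combiner.
  def combineByKey[K, V, C](pairs: Iterator[(K, V)],
                            createCombiner: V => C,
                            mergeValue: (C, V) => C): mutable.HashMap[K, C] = {
    val combiners = mutable.HashMap.empty[K, C]
    for ((k, v) <- pairs) {
      combiners.get(k) match {
        case Some(c) => combiners.update(k, mergeValue(c, v))
        case None    => combiners.update(k, createCombiner(v))
      }
    }
    combiners
  }

  // Word count: the combiner is a running sum, so one Int is kept per key.
  def wordCount(pairs: Iterator[(String, Int)]): Map[String, Int] =
    combineByKey[String, Int, Int](pairs, v => v, _ + _).toMap
}
```

The input is consumed as an Iterator and never sorted; what must fit in memory is the map itself, that is, one entry per distinct key in the partition.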

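In the word count example above, memory usage stays fairly low because each value is updated in place and the individual values do not all have to be kept in memory. Now consider an operation such as group by key, where the Reducer needs all the values for each key. In Hadoop MapReduce, thanks to merge sort, the data handed to the Reducer is already grouped by key. Spark has no such step, so it has to store every key with all of its values in the hashmap and merge the values into an array. To hold all this data, the user must make sure that each partition is small enough to fit in memory, which is a severe test of memory. The Spark documentation therefore recommends increasing the number of partitions for this kind of operation, that is, increasing the number of Mappers and Reducers.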
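To make the contrast with word count concrete, here is a minimal sketch of hash-based group by key. HashGroupByKey and retainedValues are hypothetical names used only for this illustration.

```scala
import scala.collection.mutable

object HashGroupByKey {
  // Every value of every key is appended to a buffer held in the hash map.
  def groupByKey[K, V](pairs: Iterator[(K, V)]): Map[K, Vector[V]] = {
    val groups = mutable.HashMap.empty[K, mutable.ArrayBuffer[V]]
    for ((k, v) <- pairs) {
      groups.getOrElseUpdate(k, mutable.ArrayBuffer.empty[V]) += v
    }
    groups.iterator.map { case (k, vs) => k -> vs.toVector }.toMap
  }

  // Number of values that had to stay in memory: one per input record.
  def retainedValues[K, V](groups: Map[K, Vector[V]]): Int =
    groups.valuesIterator.map(_.size).sum
}
```

For word count the map grows with the number of distinct keys; here it grows with the number of records, so the whole partition has to fit in memory.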

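Increasing the number of Mappers and Reducers does reduce the partition size so that a partition fits in memory. But as mentioned in the shuffle write section, the number of buckets and their write handlers is determined by the number of Mappers and Reducers: the more tasks, the more buckets, and the more buffer memory the write handlers need. On the one hand we increase the number of tasks to reduce memory usage; on the other hand more tasks mean a larger buffer overhead. The result is a memory dilemma.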
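The dilemma can be put into numbers with a rough sketch. PartitionTradeoff and Point are hypothetical names, shuffle data is assumed to be spread evenly over the Reducers, and the 1TB data size and 16 cores used in the note below are invented inputs, not measurements.

```scala
object PartitionTradeoff {
  // 100KB per write handler, the default quoted earlier in the article.
  val BufferBytes: Long = 100L * 1024

  final case class Point(reducers: Int, partitionBytes: Long, writeBufferBytes: Long)

  // For each candidate Reducer count, compute the average partition size a
  // Reducer must hold and the write-buffer memory of one wave of Mappers.
  def sweep(totalShuffleBytes: Long, cores: Int, reducerCounts: Seq[Int]): Seq[Point] =
    reducerCounts.map { r =>
      Point(r, totalShuffleBytes / r, cores.toLong * r * BufferBytes)
    }
}
```

With 1TB of shuffle data on 16 cores, going from 1024 to 8192 Reducers shrinks the average partition from 1GB to 128MB, while the write buffers of one wave grow from 1600MB to 12800MB.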

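To reduce memory usage, the aggregator's work has to be moved from memory to disk, and the Spark community is aware of the problems Spark has when the data is far larger than memory. PR303 therefore provides an implementation of external sorting. This patch will probably be merged by the Spark 0.9 release, at which point memory usage should drop significantly.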

End

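This article has described in detail how Spark's shuffle implementation evolved, and how problems were met and solved along the way. Shuffle is an important part of a Spark program and directly affects its performance. The shuffle implementation in current Spark versions still has various problems, but it has improved greatly over earlier versions. This is how open source code moves forward, one iteration after another. As Spark becomes more widely used and gains more contributors, later versions should improve even further.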
