Spark Internals: Shuffle in Detail (Part 2)

This post looks at how the shuffle read of a ShuffledRDD fetches data from other nodes.
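
A shuffle read happens on the reduce side of an operation such as reduceByKey. As a minimal, hypothetical driver program (the names and data are mine for illustration), the following ends up exercising the code path discussed in this post:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._ // pair-RDD implicits; only needed on Spark <= 1.2

    // reduceByKey produces a ShuffledRDD: the map tasks write shuffle files,
    // and each reduce task fetches its blocks (locally, or from remote
    // executors) when counts is materialized by collect().
    object ShuffleReadExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("ShuffleReadExample").setMaster("local[2]")
        val sc = new SparkContext(conf)
        val words = sc.parallelize(Seq("a", "b", "a", "c", "b", "a"))
        val counts = words.map(w => (w, 1)).reduceByKey(_ + _)
        counts.collect().foreach(println)
        sc.stop()
      }
    }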

The previous post showed that the strategy for deciding where and how to fetch blocks lives in org.apache.spark.storage.BlockFetcherIterator.BasicBlockFetcherIterator#splitLocalRemoteBlocks; see the annotated source below.

    protected def splitLocalRemoteBlocks(): ArrayBuffer[FetchRequest] = {
      // Make remote requests at most maxBytesInFlight / 5 in length; the reason to keep them
      // smaller than maxBytesInFlight is to allow multiple, parallel fetches from up to 5
      // nodes, rather than blocking on reading output from one node.
      // To fetch data quickly, up to 5 fetches run in parallel against up to 5 nodes,
      // and each request carries at most spark.reducer.maxMbInFlight (default 48MB) / 5 bytes.
      // The reasons:
      // 1. It avoids hogging the target machine's bandwidth. With gigabit NICs still the
      //    mainstream, bandwidth matters: a single connection pulling 48MB could make
      //    network I/O the bottleneck.
      // 2. Requests proceed in parallel, which greatly reduces fetch time: the total time
      //    becomes that of the slowest request rather than the sum of all request times.
      // spark.reducer.maxMbInFlight itself exists to bound how much memory the
      // in-flight data can occupy.
      val targetRequestSize = math.max(maxBytesInFlight / 5, 1L)
      logInfo("maxBytesInFlight: " + maxBytesInFlight + ", targetRequestSize: " + targetRequestSize)

      // Split local and remote blocks. Remote blocks are further split into FetchRequests of size
      // at most maxBytesInFlight in order to limit the amount of data in flight.
      val remoteRequests = new ArrayBuffer[FetchRequest]
      var totalBlocks = 0
      for ((address, blockInfos) <- blocksByAddress) { // address is, in effect, the executor's id (a BlockManagerId)
        totalBlocks += blockInfos.size
        if (address == blockManagerId) { // the data is local, so take the local-read path
          // Filter out zero-sized blocks
          localBlocksToFetch ++= blockInfos.filter(_._2 != 0).map(_._1)
          _numBlocksToFetch += localBlocksToFetch.size
        } else {
          val iterator = blockInfos.iterator
          var curRequestSize = 0L
          var curBlocks = new ArrayBuffer[(BlockId, Long)]
          while (iterator.hasNext) {
            // blockId is an org.apache.spark.storage.ShuffleBlockId,
            // formatted as "shuffle_" + shuffleId + "_" + mapId + "_" + reduceId
            val (blockId, size) = iterator.next()
            // Skip empty blocks
            if (size > 0) { // filter out zero-sized blocks
              curBlocks += ((blockId, size))
              remoteBlocksToFetch += blockId
              _numBlocksToFetch += 1
              curRequestSize += size
            } else if (size < 0) {
              throw new BlockException(blockId, "Negative block size " + size)
            }
            if (curRequestSize >= targetRequestSize) { // avoid making a single request too large
              // Add this FetchRequest
              remoteRequests += new FetchRequest(address, curBlocks)
              curBlocks = new ArrayBuffer[(BlockId, Long)]
              logDebug(s"Creating fetch request of $curRequestSize at $address")
              curRequestSize = 0
            }
          }
          // Add in the final request
          if (!curBlocks.isEmpty) { // put any remaining blocks into one final request
            remoteRequests += new FetchRequest(address, curBlocks)
          }
        }
      }
      logInfo("Getting " + _numBlocksToFetch + " non-empty blocks out of " +
        totalBlocks + " blocks")
      remoteRequests
    }
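
To see the splitting behaviour in isolation, here is a minimal, self-contained sketch; the object SplitSketch, the function splitIntoRequests, and the sample block sizes are mine for illustration, not Spark's, but the grouping logic mirrors the loop above:

    import scala.collection.mutable.ArrayBuffer

    // A sketch of the grouping above: pack (blockId, size) pairs into
    // requests of roughly targetRequestSize bytes each.
    object SplitSketch {
      def splitIntoRequests(
          blocks: Seq[(String, Long)],
          targetRequestSize: Long): ArrayBuffer[ArrayBuffer[(String, Long)]] = {
        val requests = new ArrayBuffer[ArrayBuffer[(String, Long)]]
        var curBlocks = new ArrayBuffer[(String, Long)]
        var curRequestSize = 0L
        for ((blockId, size) <- blocks if size > 0) { // skip empty blocks
          curBlocks += ((blockId, size))
          curRequestSize += size
          if (curRequestSize >= targetRequestSize) { // close the current request
            requests += curBlocks
            curBlocks = new ArrayBuffer[(String, Long)]
            curRequestSize = 0
          }
        }
        if (curBlocks.nonEmpty) requests += curBlocks // the final, partial request
        requests
      }

      def main(args: Array[String]): Unit = {
        // With maxBytesInFlight = 48MB, targetRequestSize = 48MB / 5 (about 9.6MB),
        // so ten 4MB blocks are grouped three per request (12MB >= 9.6MB closes one),
        // plus a final request holding the remainder.
        val target = 48L * 1024 * 1024 / 5
        val blocks = (0 until 10).map(i => (s"shuffle_0_${i}_0", 4L * 1024 * 1024))
        splitIntoRequests(blocks, target).foreach(r => println(r.map(_._1).mkString(", ")))
      }
    }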

