Spark 2.4.0 Source Code Study Series, Communication Framework: Dispatcher

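    Dispatcher is the message dispatcher in Spark's communication framework. It is created when NettyRpcEnv is initialized. We will cover the initialization of NettyRpcEnv in a later post; for now, let's look at what Dispatcher does when it is initialized.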

  private val endpoints: ConcurrentMap[String, EndpointData] =
    new ConcurrentHashMap[String, EndpointData]
  private val endpointRefs: ConcurrentMap[RpcEndpoint, RpcEndpointRef] =
    new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef]

  // Track the receivers whose inboxes may contain messages.
  private val receivers = new LinkedBlockingQueue[EndpointData]

  /**
   * True if the dispatcher has been stopped. Once stopped, all messages posted will be bounced
   * immediately.
   */
  @GuardedBy("this")
  private var stopped = false

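1. Create endpoints: ConcurrentMap to hold the endpoints; the key is the endpoint name and the value is its EndpointData.

2. Create endpointRefs: ConcurrentMap to hold the endpoint references; the key is the RpcEndpoint itself and the value is its RpcEndpointRef.

3. Create receivers: LinkedBlockingQueue[EndpointData] to track the EndpointData whose inboxes may contain messages waiting to be processed.

4. Declare the variable stopped, which marks whether the Dispatcher has been stopped; it defaults to false.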
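
To make the roles of these fields concrete, here is a minimal sketch of how a name-keyed registry and a work queue can cooperate. It is not Spark's code: EndpointEntry, RegistrySketch, register and pending are hypothetical names, and the idea that registration relies on putIfAbsent and then enqueues the new entry is an assumption made for illustration.

```scala
import java.util.concurrent.{ConcurrentHashMap, ConcurrentMap, LinkedBlockingQueue}

// Hypothetical stand-in for EndpointData; only the name is kept here.
final case class EndpointEntry(name: String)

object RegistrySketch {
  // name -> entry, safe to update from several threads
  private val endpoints: ConcurrentMap[String, EndpointEntry] =
    new ConcurrentHashMap[String, EndpointEntry]

  // entries that may have work waiting for a consumer thread
  private val receivers = new LinkedBlockingQueue[EndpointEntry]

  /** Registers a name once; returns false if the name is already taken. */
  def register(name: String): Boolean = {
    val entry = EndpointEntry(name)
    // putIfAbsent returns null only when there was no previous mapping.
    if (endpoints.putIfAbsent(name, entry) != null) {
      false
    } else {
      receivers.offer(entry) // assumption: a new entry is queued right away
      true
    }
  }

  /** Number of entries currently waiting in the queue. */
  def pending: Int = receivers.size()
}
```

Because putIfAbsent is atomic, two threads racing to register the same name cannot both succeed, which is the main reason to prefer a ConcurrentMap over a plain map guarded by ad hoc checks.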

  /** Thread pool used for dispatching messages. */
  private val threadpool: ThreadPoolExecutor = {
    val availableCores =
      if (numUsableCores > 0) numUsableCores else Runtime.getRuntime.availableProcessors()
    val numThreads = nettyEnv.conf.getInt("spark.rpc.netty.dispatcher.numThreads",
      math.max(2, availableCores))
    val pool = ThreadUtils.newDaemonFixedThreadPool(numThreads, "dispatcher-event-loop")
    for (i <- 0 until numThreads) {
      pool.execute(new MessageLoop)
    }
    pool
  }
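
The sizing rule in the block above can be tried in isolation. The sketch below is an approximation that uses only the JDK: PoolSketch, defaultThreads and newDaemonFixedPool are hypothetical names, and newDaemonFixedPool only imitates what a helper such as ThreadUtils.newDaemonFixedThreadPool presumably provides (a fixed-size pool of named daemon threads). It also ignores the spark.rpc.netty.dispatcher.numThreads override.

```scala
import java.util.concurrent.{ExecutorService, Executors, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger

object PoolSketch {
  /** Default shown above: one thread per usable core, but never fewer than 2. */
  def defaultThreads(numUsableCores: Int): Int = {
    val availableCores =
      if (numUsableCores > 0) numUsableCores
      else Runtime.getRuntime.availableProcessors()
    math.max(2, availableCores)
  }

  /** Plain-JDK approximation of a fixed pool whose threads are named daemons. */
  def newDaemonFixedPool(numThreads: Int, prefix: String): ExecutorService = {
    val counter = new AtomicInteger(0)
    val factory = new ThreadFactory {
      override def newThread(r: Runnable): Thread = {
        val t = new Thread(r, s"$prefix-${counter.getAndIncrement()}")
        t.setDaemon(true) // daemon threads do not keep the JVM alive
        t
      }
    }
    Executors.newFixedThreadPool(numThreads, factory)
  }
}
```

With this rule a single-core allocation still gets two loop threads, so one long-running message handler cannot stall every other endpoint.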

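 5. Define a thread pool threadpool: ThreadPoolExecutor. Its size is read from spark.rpc.netty.dispatcher.numThreads and defaults to max(2, availableCores). pool.execute(new MessageLoop) is called once per thread, so every thread in the pool runs its own MessageLoop. Let's look at MessageLoop: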

 /** Message loop used for dispatching messages. */
  private class MessageLoop extends Runnable {
    override def run(): Unit = {
      try {
        while (true) {
          try {
            val data = receivers.take() // Take one EndpointData from receivers (blocks while the queue is empty)
            if (data == PoisonPill) { // PoisonPill is a shutdown sentinel that tells the loop to exit
              // Put PoisonPill back so that other MessageLoops can see it.
              receivers.offer(PoisonPill)
              return
            }
            data.inbox.process(Dispatcher.this) // Process the messages in this endpoint's inbox
          } catch {
            case NonFatal(e) => logError(e.getMessage, e)
          }
        }
      } catch {
        case ie: InterruptedException => // exit
      }
    }
  }

  /** A poison endpoint that indicates MessageLoop should exit its message loop. */
  private val PoisonPill = new EndpointData(null, null, null)

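    5.1 Take one EndpointData from receivers (take() blocks until one is available) and check whether it is the PoisonPill. If it is, put it back so the other MessageLoop threads can see it, then return, which ends this thread's loop.

    5.2 If it is not the PoisonPill, call inbox.process(Dispatcher) to process the messages in that EndpointData's inbox.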
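
The put-it-back step in 5.1 is what lets a single sentinel stop every worker. The following is a minimal sketch of that pattern with plain strings instead of EndpointData; PoisonPillSketch and drain are hypothetical names, and the CountDownLatch is only there so the example can wait for its workers.

```scala
import java.util.concurrent.{CountDownLatch, LinkedBlockingQueue}
import java.util.concurrent.atomic.AtomicInteger

object PoisonPillSketch {
  // Sentinel compared by reference, so no real item can be mistaken for it.
  private val PoisonPill = new String("poison")

  /** Starts `workers` consumers, feeds them `items`, then stops them all with one pill. */
  def drain(items: Seq[String], workers: Int): Int = {
    val queue = new LinkedBlockingQueue[String]
    val processed = new AtomicInteger(0)
    val done = new CountDownLatch(workers)

    (0 until workers).foreach { _ =>
      val t = new Thread(new Runnable {
        override def run(): Unit = {
          try {
            var running = true
            while (running) {
              val item = queue.take() // blocks while the queue is empty
              if (item eq PoisonPill) {
                queue.offer(PoisonPill) // put it back so the other workers see it too
                running = false
              } else {
                processed.incrementAndGet()
              }
            }
          } catch {
            case _: InterruptedException => // exit quietly
          } finally {
            done.countDown()
          }
        }
      })
      t.setDaemon(true)
      t.start()
    }

    items.foreach(item => queue.offer(item))
    queue.offer(PoisonPill) // one pill is enough for every worker
    done.await()
    processed.get()
  }
}
```

Calling PoisonPillSketch.drain(Seq("a", "b", "c"), 3) returns once all three workers have exited. Since the queue is FIFO and the pill is offered last, every real item is taken before any worker sees the pill.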

    Next, let's see what Inbox does when it is initialized:

/**
 * An inbox that stores messages for an [[RpcEndpoint]] and posts messages to it thread-safely.
 */
private[netty] class Inbox(
    val endpointRef: NettyRpcEndpointRef,
    val endpoint: RpcEndpoint)
  extends Logging {

  inbox =>  // Give this an alias so we can use it more clearly in closures.

  @GuardedBy("this")
  protected val messages = new java.util.LinkedList[InboxMessage]()

  /** True if the inbox (and its associated endpoint) is stopped. */
  @GuardedBy("this")
  private var stopped = false

  /** Allow multiple threads to process messages at the same time. */
  @GuardedBy("this")
  private var enableConcurrent = false

  /** The number of threads processing messages for this inbox. */
  @GuardedBy("this")
  private var numActiveThreads = 0

  // OnStart should be the first message to process
  inbox.synchronized {
    messages.add(OnStart)
  }

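        5.2.1 Declare messages: java.util.LinkedList[InboxMessage], the queue of messages waiting to be processed.

        5.2.2 Declare the variable stopped, which marks whether the inbox has been stopped; it defaults to false.

        5.2.3 Declare the variable enableConcurrent, which marks whether multiple threads may process this inbox's messages at the same time; it defaults to false.

        5.2.4 Declare the variable numActiveThreads, the number of threads currently processing messages for this inbox; it defaults to 0.

        5.2.5 Add an OnStart object to messages, so that OnStart is the first message processed.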

Next, let's look at the inbox.process(Dispatcher) method:

 /**
   * Process stored messages.
   */
  def process(dispatcher: Dispatcher): Unit = {
    var message: InboxMessage = null
    inbox.synchronized {
      if (!enableConcurrent && numActiveThreads != 0) {
        return
      }
      message = messages.poll()
      if (message != null) {
        numActiveThreads += 1
      } else {
        return
      }
    }

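        5.2.6 Declare the variable var message: InboxMessage = null.

        5.2.7 Check whether concurrent processing is disabled and the number of active threads is non-zero. If both hold, another thread is already processing this inbox, so the method returns.

        5.2.8 Otherwise (concurrent processing is enabled, or no thread is currently active), take one message from messages.

        5.2.9 If the message is non-null, increment the number of active threads; otherwise the method returns.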

 while (true) {
      safelyCall(endpoint) {
        message match {
          case RpcMessage(_sender, content, context) =>
            try {
              endpoint.receiveAndReply(context).applyOrElse[Any, Unit](content, { msg =>
                throw new SparkException(s"Unsupported message $message from ${_sender}")
              })
            } catch {
              case NonFatal(e) =>
                context.sendFailure(e)
                // Throw the exception -- this exception will be caught by the safelyCall function.
                // The endpoint's onError function will be called.
                throw e
            }

          case OneWayMessage(_sender, content) =>
            endpoint.receive.applyOrElse[Any, Unit](content, { msg =>
              throw new SparkException(s"Unsupported message $message from ${_sender}")
            })

          case OnStart =>
            endpoint.onStart()
            if (!endpoint.isInstanceOf[ThreadSafeRpcEndpoint]) {
              inbox.synchronized {
                if (!stopped) {
                  enableConcurrent = true
                }
              }
            }

          case OnStop =>
            val activeThreads = inbox.synchronized { inbox.numActiveThreads }
            assert(activeThreads == 1,
              s"There should be only a single active thread but found $activeThreads threads.")
            dispatcher.removeRpcEndpointRef(endpoint)
            endpoint.onStop()
            assert(isEmpty, "OnStop should be the last message")

          case RemoteProcessConnected(remoteAddress) =>
            endpoint.onConnected(remoteAddress)

          case RemoteProcessDisconnected(remoteAddress) =>
            endpoint.onDisconnected(remoteAddress)

          case RemoteProcessConnectionError(cause, remoteAddress) =>
            endpoint.onNetworkError(cause, remoteAddress)
        }
      }

      inbox.synchronized {
        // "enableConcurrent" will be set to false after `onStop` is called, so we should check it
        // every time.
        if (!enableConcurrent && numActiveThreads != 1) {
          // If we are not the only one worker, exit
          numActiveThreads -= 1
          return
        }
        message = messages.poll()
        if (message == null) {
          numActiveThreads -= 1
          return
        }
      }
    }
  }

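        5.2.10 Pattern-match on the type of message and handle each message type accordingly.

        5.2.11 Check whether concurrent processing is disabled and the number of active threads is not 1. If both hold, decrement the number of active threads and return.

        5.2.12 Otherwise, take the next message from messages.

        5.2.13 If the message is null, decrement the number of active threads and return.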
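
The entry and exit checks of process are easier to follow with the message handling stripped away. The sketch below is a simplified stand-in, not Spark's Inbox: MiniInbox, post and allowConcurrent are hypothetical names, messages are plain strings, the handle function takes the place of the endpoint, and the error handling that safelyCall provides is left out.

```scala
import java.util.LinkedList

// Hypothetical, simplified inbox; `handle` stands in for the endpoint callbacks.
class MiniInbox(handle: String => Unit) {
  private val messages = new LinkedList[String]()
  private var enableConcurrent = false
  private var numActiveThreads = 0

  def post(msg: String): Unit = synchronized { messages.add(msg) }

  /** Lets several threads drain this inbox at once (off by default). */
  def allowConcurrent(): Unit = synchronized { enableConcurrent = true }

  /** Drains the queue unless another thread already owns this non-concurrent inbox. */
  def process(): Unit = {
    var message: String = null
    synchronized {
      // Entry check: a non-concurrent inbox admits only one thread at a time.
      if (!enableConcurrent && numActiveThreads != 0) return
      message = messages.poll()
      if (message == null) return
      numActiveThreads += 1
    }
    while (true) {
      handle(message)
      synchronized {
        // Exit check: leave if concurrency was switched off while others are active.
        if (!enableConcurrent && numActiveThreads != 1) {
          numActiveThreads -= 1
          return
        }
        message = messages.poll()
        if (message == null) {
          numActiveThreads -= 1
          return
        }
      }
    }
  }
}
```

Posting "a" and "b" and then calling process() once handles both messages in order on the calling thread. A second call finds the queue empty and returns without touching numActiveThreads.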

 

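    At this point the initialization of Dispatcher is complete. Let's summarize what it does:

    It defines two ConcurrentMaps (endpoints and endpointRefs) that map endpoint names to endpoint data and endpoints to endpoint references, a receivers: LinkedBlockingQueue[EndpointData] that tracks the EndpointData whose inboxes may hold messages, and a thread pool threadpool: ThreadPoolExecutor on which it runs pool.execute(new MessageLoop) once per thread. Each MessageLoop keeps taking entries from receivers. If an entry is not the PoisonPill, it calls inbox.process, which handles the messages queued in that inbox according to their type.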
