Boost.ASIO Source Code: deadline_timer Source-Level Analysis (Part 3), What Really Happens in io_service::run()

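I have read the run function over and over, more times than I can count, and never fully understood its logic. Only recently, while studying deadline_timer, did it finally click, and some of the logic from earlier posts fell into place. I'll summarize it here, which also fills in a few points my previous posts never managed to explain. (I hadn't fully understood them back then, so I had to leave those holes open.)
All of my previous posts are groundwork for this one, and this post refers back to them many times. I recommend consulting them when needed; otherwise some parts may be hard to follow.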

Recap

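Of the previous two posts, one covered how deadline_timer is invoked and the other covered how epoll_reactor fires. The most important part, the actual processing logic, was never covered, and that logic lives in io_service::run().
The first post stopped at epoll_reactor::schedule_timer. The second built on the first but stopped at epoll_reactor::run(). I still recommend reading at least those two posts first. You can get a sense of io_service::run() from this post alone, but starting from the outermost call gives a better big-picture view of the code.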

Background on io_service::run

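Most of this was covered in earlier posts, but here is a summary.
io_service is an alias; its real name is io_context. io_context inherits from execution_context, which represents an execution context. execution_context holds a service_registry object that manages all services, and each service type has at most one instance in the registry, which is enforced mainly by the use_service function. Put plainly, a service is the collection of processing logic (plus the associated data) that ASIO's interface classes call into. All services inherit from execution_context::service, and this base class requires every service to hold its execution context, i.e. the execution_context (to keep things concrete, just think of execution_context as the io_service here). io_context itself does little real work: it has a member named impl_, which can be thought of as the class that implements io_context's logic, and almost every io_context function simply forwards to impl_. Here, impl_ is the scheduler we have mentioned many times before.
So the real logic of io_service.run() is in scheduler::run().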

From constructing the scheduler to scheduler::run(): what actually happens

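The heading mentions constructing the scheduler because the scheduler does some rather non-obvious work at construction time. If you overlook it, the logic of scheduler::run() does not add up (which is exactly why I failed to understand it before).
First, here is a basic deadline_timer usage example. The rest of the post uses it to walk through run():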

void Print(const boost::system::error_code &ec);
boost::asio::io_service io;
boost::asio::deadline_timer t(io, boost::posix_time::seconds(5));
t.async_wait(Print);
io.run();

Start with the second line, which constructs the io_service. What actually runs is io_context's default constructor, which constructs a scheduler with default settings:

io_context::io_context()
  : impl_(add_impl(new impl_type(*this, BOOST_ASIO_CONCURRENCY_HINT_DEFAULT)))  // impl_type is an alias for scheduler
{
}

io_context::impl_type& io_context::add_impl(io_context::impl_type* impl)  // impl_type is an alias for scheduler
{
  boost::asio::detail::scoped_ptr<impl_type> scoped_impl(impl);  // pointer wrapper: deletes the pointee when destroyed
  boost::asio::add_service<impl_type>(*this, scoped_impl.get());  // register the scheduler in the service_registry
  return *scoped_impl.release();  // give up ownership so the scheduler is not deleted when scoped_impl goes away
  // Why use scoped_ptr at all if ownership is released at the end? Consider an
  // exception thrown part-way through: without it the scheduler would leak.
}
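The idea in the last comment of add_impl, holding the raw pointer in a smart pointer and releasing it only after registration succeeds, is a general exception-safety idiom. Below is a minimal standalone sketch of that idiom using std::unique_ptr in place of scoped_ptr. Impl, Registry and the fail flag are hypothetical stand-ins for scheduler and service_registry, not ASIO types.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins: Impl plays the scheduler, Registry plays service_registry.
// Only the ownership pattern of io_context::add_impl is modelled.
struct Impl {
  static int live;  // number of Impl objects currently alive
  Impl() { ++live; }
  ~Impl() { --live; }
};
int Impl::live = 0;

struct Registry {
  bool fail = false;                         // simulate a throwing add_service
  std::vector<std::unique_ptr<Impl>> owned;  // the registry owns services once added

  void add(Impl* p) {
    if (fail) throw std::runtime_error("registration failed");
    owned.emplace_back(p);                   // ownership moves here only on success
  }
};

// Same shape as add_impl: the guard deletes the object if add() throws;
// on success, release() hands it over without deleting it.
Impl& add_impl(Registry& reg, Impl* impl) {
  std::unique_ptr<Impl> guard(impl);
  reg.add(guard.get());
  return *guard.release();
}
```

If registration throws, the guard's destructor deletes the object during unwinding, so nothing leaks; if it succeeds, the registry becomes the sole owner.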

Next, the scheduler constructor that gets called:

scheduler::scheduler(
    boost::asio::execution_context& ctx, int concurrency_hint)
  : boost::asio::detail::execution_context_service_base<scheduler>(ctx),
    one_thread_(concurrency_hint == 1
        || !BOOST_ASIO_CONCURRENCY_HINT_IS_LOCKING(
          SCHEDULER, concurrency_hint)
        || !BOOST_ASIO_CONCURRENCY_HINT_IS_LOCKING(
          REACTOR_IO, concurrency_hint)),
    mutex_(BOOST_ASIO_CONCURRENCY_HINT_IS_LOCKING(
          SCHEDULER, concurrency_hint)),
    task_(0),  // the epoll_reactor; still null at this point
    task_interrupted_(true),  // flag: whether the epoll_reactor is in the interrupted state
    outstanding_work_(0),  // number of unfinished work items
    stopped_(false),
    shutdown_(false),
    concurrency_hint_(concurrency_hint)
{
}

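The only member to remember here is outstanding_work_. This count of unfinished work is the key to how scheduler::run() behaves.
Note that at construction time the scheduler's reactor is still null. So where is the epoll_reactor initialized? In the deadline_timer constructor, at least in the example above. (epoll_reactor is not only initialized there; many related service classes trigger its initialization.) As described in earlier posts, constructing a deadline_timer causes its service class, deadline_timer_service, to be constructed, and the deadline_timer_service constructor calls epoll_reactor::init_task(), which in turn calls scheduler::init_task():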

  deadline_timer_service(boost::asio::io_context& io_context)
    : service_base<deadline_timer_service<Time_Traits> >(io_context),
      scheduler_(boost::asio::use_service<timer_scheduler>(io_context))
  {
    scheduler_.init_task();  // here
    scheduler_.add_timer_queue(timer_queue_);
  }

void epoll_reactor::init_task()
{
  scheduler_.init_task();
}

void scheduler::init_task()
{
  mutex::scoped_lock lock(mutex_);
  if (!shutdown_ && !task_)
  {
    task_ = &use_service<reactor>(this->context());
    op_queue_.push(&task_operation_);
    wake_one_thread_and_unlock(lock);
  }
}

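Pay particular attention to task_operation_ in scheduler::init_task. op_queue_ is the scheduler's queue of all pending handlers (in wrapped form), and task_operation_ is an empty wrapper that merely serves as a placeholder for the scheduler's epoll_reactor.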

  // Operation object to represent the position of the task in the queue.
  struct task_operation : operation
  {
    task_operation() : operation(0) {}
  } task_operation_;

When the op taken from the queue is task_operation_, it means the scheduler should now service the epoll_reactor. (Otherwise, who would ever call epoll_reactor::run()?)
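To make the placeholder idea concrete, here is a simplified single-threaded model, assuming plain pointers in a std::deque instead of ASIO's op_queue and strings in place of real handlers. MiniScheduler, Operation and the log are hypothetical names. The sketch only shows that init_task enqueues the sentinel once and that the dispatcher treats the sentinel as "run the reactor" rather than as a handler.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical model: one special object in the queue stands for
// "run the reactor now", like scheduler::task_operation_.
struct Operation {
  std::string name;
};

struct MiniScheduler {
  std::deque<Operation*> op_queue;
  Operation task_operation{"<reactor>"};  // sentinel, never completed as a handler
  bool has_task = false;                  // plays the role of task_ != 0
  std::vector<std::string> log;           // records what each step did

  // Like scheduler::init_task: register the reactor once and enqueue its sentinel.
  void init_task() {
    if (!has_task) {
      has_task = true;
      op_queue.push_back(&task_operation);
    }
  }

  // One dispatch step: the sentinel means "poll the reactor", anything else is a handler.
  bool step() {
    if (op_queue.empty()) return false;
    Operation* o = op_queue.front();
    op_queue.pop_front();
    if (o == &task_operation)
      log.push_back("run reactor");
    else
      log.push_back("complete " + o->name);
    return true;
  }
};
```

Calling init_task() a second time is harmless because the sentinel is only enqueued while no reactor has been registered yet.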
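Next, the example calls async_wait. As described in the earlier post, this eventually reaches epoll_reactor::schedule_timer. At this point shutdown_ is false, so execution takes the path after the if block: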

template <typename Time_Traits>
void epoll_reactor::schedule_timer(timer_queue<Time_Traits>& queue,
    const typename Time_Traits::time_type& time,
    typename timer_queue<Time_Traits>::per_timer_data& timer, wait_op* op)
{
  mutex::scoped_lock lock(mutex_);

  if (shutdown_)
  {
    scheduler_.post_immediate_completion(op, false);
    return;
  }

  bool earliest = queue.enqueue_timer(time, timer, op);
  scheduler_.work_started();
  if (earliest)
    update_timeout();
}

// scheduler.hpp
  void work_started()
  {
    ++outstanding_work_;
  }

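The shutdown_ == true case is covered later. Once this returns, the only things that have changed are epoll_reactor's timer queue, the scheduler's outstanding_work_ count, and the expiry time of the timerfd registered with epoll. To stress it again: outstanding_work_ is no longer 0 at this point (it is 1 here).
Now comes the key part, io_service::run(), i.e. io.run() in the example. What it actually calls is scheduler::run():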

std::size_t scheduler::run(boost::system::error_code& ec)
{
  ec = boost::system::error_code();
  if (outstanding_work_ == 0)
  {
    stop();
    return 0;
  }

  thread_info this_thread;
  this_thread.private_outstanding_work = 0;
  thread_call_stack::context ctx(this, this_thread);

  mutex::scoped_lock lock(mutex_);

  std::size_t n = 0;
  for (; do_run_one(lock, this_thread, ec); lock.lock())
    if (n != (std::numeric_limits<std::size_t>::max)())   // stop counting at the maximum value to avoid overflow
      ++n;
  return n;
}

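Remember how I stressed that outstanding_work_ is non-zero when run() is called? If it were 0, run() would call stop() and return immediately. That is why an io_service must be given work before run() is called.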
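The role of outstanding_work_ can be sketched with a small single-threaded model, assuming no reactor and std::function handlers. WorkScheduler and its members are hypothetical names. run() returns 0 straight away when no work was registered, and otherwise keeps dispatching until the counter drops to zero, which is also why a handler that posts new work keeps run() alive.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical single-threaded model of the work counter in scheduler::run().
struct WorkScheduler {
  std::size_t outstanding_work = 0;  // plays outstanding_work_
  bool stopped = false;              // plays stopped_
  std::deque<std::function<void()>> op_queue;

  // Like work_started() followed by pushing the handler onto op_queue_.
  void post(std::function<void()> f) {
    ++outstanding_work;
    op_queue.push_back(std::move(f));
  }

  // Like scheduler::work_finished(): stop once nothing is left.
  void work_finished() {
    if (--outstanding_work == 0) stopped = true;
  }

  std::size_t run() {
    if (outstanding_work == 0) {  // no work registered: stop and return at once
      stopped = true;
      return 0;
    }
    std::size_t n = 0;
    while (!stopped && !op_queue.empty()) {
      std::function<void()> f = std::move(op_queue.front());
      op_queue.pop_front();
      f();              // a handler may post more work here
      work_finished();  // this handler's unit of work is done
      ++n;
    }
    return n;
  }
};
```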
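thread_info holds the private state of the current worker thread (see my earlier post for the details). run() then calls do_run_one in a loop, and each call processes one item from op_queue_: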

std::size_t scheduler::do_run_one(mutex::scoped_lock& lock,
    scheduler::thread_info& this_thread,
    const boost::system::error_code& ec)
{
  while (!stopped_)
  {
    if (!op_queue_.empty())   // if non-empty, take one item; if empty, block until woken
    {
      // Prepare to execute first handler from queue.
      operation* o = op_queue_.front();
      op_queue_.pop();
      bool more_handlers = (!op_queue_.empty());

      if (o == &task_operation_)   // the task_operation_ placeholder for the epoll_reactor mentioned above
      {
        task_interrupted_ = more_handlers;

        if (more_handlers && !one_thread_)   // more handlers are queued and this is a multi-threaded setup
          wakeup_event_.unlock_and_signal_one(lock);  // wake another thread to run them while this one services the epoll_reactor
        else
          lock.unlock();

        task_cleanup on_exit = { this, &lock, &this_thread };   // used only for its destructor
        (void)on_exit;   // cast to void to suppress the unused-variable warning

        task_->run(more_handlers ? 0 : -1, this_thread.private_op_queue);  // this calls epoll_reactor::run
      }
      else
      {
      
        // all other op_queue_ items are handled here (elided for now)

        return 1;
      }
    }
    else
    {   // op_queue_ is empty: block
      wakeup_event_.clear(lock);
      wakeup_event_.wait(lock);
    }
  }

  return 0;
}

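In short, do_run_one takes the next item from op_queue_, and if op_queue_ is empty the thread blocks (see my earlier post for the details of wakeup_event_).
In this post's example, when run() is called op_queue_ holds only one item, the task_operation_ placeholder for the epoll_reactor, so epoll_reactor::run is called next. Since no other handlers are queued, it is called with a timeout of -1 and blocks until the timer expires. It then adds the timer's handler op to the private queue this_thread.private_op_queue and returns (see the previous post for this part).
After it returns, the private queue holds the handler (wrapper) we want to run, but op_queue_ is empty. So where are the ops in the private queue moved into op_queue_? This is done rather cleverly with task_cleanup. Notice that the on_exit variable is never actually used; we rely only on its destructor, which runs automatically when the if block ends, even if an exception is thrown (that is the whole point). Here is the source of task_cleanup, a nested class of scheduler: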

struct scheduler::task_cleanup
{
  ~task_cleanup()
  {
    if (this_thread_->private_outstanding_work > 0)
    {
      boost::asio::detail::increment(
          scheduler_->outstanding_work_,
          this_thread_->private_outstanding_work);
    }
    this_thread_->private_outstanding_work = 0;

    // Enqueue the completed operations and reinsert the task at the end of
    // the operation queue.
    lock_->lock();
    scheduler_->task_interrupted_ = true;
    scheduler_->op_queue_.push(this_thread_->private_op_queue);
    scheduler_->op_queue_.push(&scheduler_->task_operation_);
  }

  scheduler* scheduler_;
  mutex::scoped_lock* lock_;
  thread_info* this_thread_;
};

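The destructor first adds any private_outstanding_work to the scheduler's outstanding_work_. It then moves the tasks (let's keep calling them tasks; they are really handler wrappers) from this_thread_'s private_op_queue into the scheduler's op_queue_, and sets the scheduler's task_interrupted_ to true, because by this point task_->run has returned and the epoll_reactor is temporarily not running. Note one more detail: after moving this_thread's private tasks (private_op_queue) into the scheduler, it appends task_operation_ to the end of the queue again. This guarantees that the epoll_reactor runs again only after all the queued tasks have been executed. With that in mind, the following official comment from epoll_reactor::run is easy to understand (don't skip it; I recommend reading this English passage carefully):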

  // This code relies on the fact that the scheduler queues the reactor task
  // behind all descriptor operations generated by this function. This means,
  // that by the time we reach this point, any previously returned descriptor
  // operations have already been dequeued. Therefore it is now safe for us to
  // reuse and return them for the scheduler to queue again.
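The RAII trick can be reproduced in isolation. The sketch below is a hypothetical model, assuming strings as operations and no mutex. TaskCleanup's destructor moves the private queue into the shared queue, appends the reactor sentinel behind the new handlers, and runs whether run_reactor returns normally or throws. The real task_cleanup also merges private_outstanding_work and relocks the scheduler mutex; both are left out here.

```cpp
#include <cassert>
#include <deque>
#include <stdexcept>
#include <string>

// Hypothetical model of scheduler::task_cleanup.
struct Queues {
  std::deque<std::string> shared;       // like scheduler::op_queue_
  std::deque<std::string> private_ops;  // like thread_info::private_op_queue
  bool task_interrupted = false;        // like task_interrupted_
};

struct TaskCleanup {
  Queues* q;
  ~TaskCleanup() {
    q->task_interrupted = true;
    for (const std::string& op : q->private_ops) q->shared.push_back(op);
    q->private_ops.clear();
    q->shared.push_back("<reactor>");  // reactor goes behind the new handlers
  }
};

// Stand-in for epoll_reactor::run: reports a completed op into the private queue.
void run_reactor(Queues& q, bool fail) {
  TaskCleanup on_exit{&q};  // destructor runs on normal return and on throw
  (void)on_exit;
  q.private_ops.push_back("Print");
  if (fail) throw std::runtime_error("reactor error");
}
```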

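Back in scheduler::do_run_one: once the if block finishes, op_queue_ contains tasks, so the next loop iteration naturally enters the else block that was elided in the source above. Here is that part: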

// ...
      if (o == &task_operation_)
      {
        // ...
      }
      else
      {
        std::size_t task_result = o->task_result_;

        if (more_handlers && !one_thread_)
          wake_one_thread_and_unlock(lock);
        else
          lock.unlock();

        // Ensure the count of outstanding work is decremented on block exit.
        work_cleanup on_exit = { this, &lock, &this_thread };
        (void)on_exit;

        // Complete the operation. May throw an exception. Deletes the object.
        o->complete(this, ec, task_result);   // this is where the task is processed (the handler is invoked)

        return 1;
      }
// ...

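Finally we reach the code that actually processes a task: the call to complete. It uses work_cleanup, which, as you can probably guess, works the same way as task_cleanup. Here is the source of work_cleanup: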

struct scheduler::work_cleanup
{
  ~work_cleanup()
  {
    if (this_thread_->private_outstanding_work > 1)
    {
      boost::asio::detail::increment(
          scheduler_->outstanding_work_,
          this_thread_->private_outstanding_work - 1);
    }
    else if (this_thread_->private_outstanding_work < 1)
    {
      scheduler_->work_finished();
      // void work_finished()
      // {
      //   if (--outstanding_work_ == 0)
      //     stop();
      // }
    }
    this_thread_->private_outstanding_work = 0;

    if (!this_thread_->private_op_queue.empty())
    {
      lock_->lock();
      scheduler_->op_queue_.push(this_thread_->private_op_queue);
    }
  }

  scheduler* scheduler_;
  mutex::scoped_lock* lock_;
  thread_info* this_thread_;
};

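In this post's example, private_outstanding_work is 0 when the destructor starts, so it simply decrements the scheduler's count of unfinished work (outstanding_work_) by one. outstanding_work_ now drops to 0 as well (Print was the only task), so stop() is called and scheduler::stopped_ becomes true. The next call to scheduler::do_run_one exits its loop and returns 0, and the for loop in scheduler::run that repeatedly calls do_run_one ends as a result.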
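The counting rule in work_cleanup is easy to misread, so here is a hypothetical pure-function model of it. The handler that just completed counts as one unit of work, so if it produced k units of private work, the net change to outstanding_work_ is k - 1. apply_work_cleanup and CleanupResult are illustrative names, not ASIO code.

```cpp
#include <cassert>

// Hypothetical model of the counting rule in scheduler::work_cleanup.
// 'outstanding' plays outstanding_work_; 'private_work' plays
// this_thread->private_outstanding_work accumulated while the handler ran.
struct CleanupResult {
  long outstanding;  // outstanding_work_ after cleanup
  bool stopped;      // whether work_finished() would have called stop()
};

CleanupResult apply_work_cleanup(long outstanding, long private_work) {
  if (private_work > 1) {
    // k new units minus the one that just finished
    outstanding += private_work - 1;
  } else if (private_work < 1) {
    // work_finished(): decrement and stop when nothing is left
    if (--outstanding == 0) return {0, true};
  }
  // private_work == 1 falls through unchanged: one unit finished, one added
  return {outstanding, false};
}
```

The first test case below is exactly this post's example: one outstanding unit (Print), no private work, so the count reaches 0 and the scheduler stops.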
To complete the picture, here is the source of stop():

void scheduler::stop()
{
  mutex::scoped_lock lock(mutex_);
  stop_all_threads(lock);
}

void scheduler::stop_all_threads(
    mutex::scoped_lock& lock)
{
  stopped_ = true;
  wakeup_event_.signal_all(lock);

  if (!task_interrupted_ && task_)
  {
    task_interrupted_ = true;
    task_->interrupt();
  }
}

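For the details of wakeup_event_, see my earlier post.
stop() tells every thread that the scheduler should stop calling do_run_one. Recall that do_run_one blocks in wakeup_event_.wait() when op_queue_ is empty. When several threads run the same io_service, this happens all the time: if there is only one task, only one thread is actually working and the others are blocked in wait(). When the working thread finishes, stopped_ is set to true, and that thread calls wakeup_event_.signal_all to wake every thread blocked in wait(). Those threads see that stopped_ is true and return from their run(), which is exactly what we want. If the reactor is currently running (task_interrupted_ is false), stop_all_threads also calls task_->interrupt() so that the thread blocked inside epoll_reactor::run returns as well.
To sum up, the distinction between task and work in the scheduler should now be fairly clear: task refers to the epoll_reactor, and work refers to ordinary tasks passed in from outside, such as handlers. (Before I understood the complete logic, this distinction took me a very long time to figure out.)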
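The stop-and-wake-everyone pattern can be modelled with standard C++ threads. This is a sketch under stated assumptions: std::condition_variable stands in for wakeup_event_, there is no reactor (so no interrupt()), and SharedScheduler is a hypothetical class. Idle threads wait on the condition variable; whichever thread finishes the last unit of work sets stopped_ and calls notify_all, so the others wake, see the flag, and return from run().

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical multi-threaded model of scheduler::run / stop_all_threads.
class SharedScheduler {
 public:
  void post(std::function<void()> f) {
    std::lock_guard<std::mutex> lock(mutex_);
    ++outstanding_;  // work_started()
    queue_.push_back(std::move(f));
    cv_.notify_one();  // wake one idle thread
  }

  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    if (outstanding_ == 0) {  // no work at all: stop, as scheduler::run does
      stop_locked();
      return;
    }
    while (!stopped_) {
      if (queue_.empty()) {  // nothing queued: block until signalled
        cv_.wait(lock);
        continue;  // re-check stopped_ and the queue after waking
      }
      std::function<void()> f = std::move(queue_.front());
      queue_.pop_front();
      lock.unlock();
      f();  // run the handler without holding the lock
      lock.lock();
      if (--outstanding_ == 0) stop_locked();  // like work_finished()
    }
  }

  bool stopped() {
    std::lock_guard<std::mutex> lock(mutex_);
    return stopped_;
  }

 private:
  void stop_locked() {  // caller holds mutex_
    stopped_ = true;
    cv_.notify_all();  // like wakeup_event_.signal_all
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  std::size_t outstanding_ = 0;
  bool stopped_ = false;
};
```

Waiting inside a loop that re-checks stopped_ also covers spurious wakeups, which condition variables are allowed to produce.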

Back to deadline_timer

Part 1 of this deadline_timer series ended at the epoll_reactor::schedule_timer function:

template <typename Time_Traits>
void epoll_reactor::schedule_timer(timer_queue<Time_Traits>& queue,
    const typename Time_Traits::time_type& time,
    typename timer_queue<Time_Traits>::per_timer_data& timer, wait_op* op)
{
  mutex::scoped_lock lock(mutex_);

  if (shutdown_)
  {
    scheduler_.post_immediate_completion(op, false);
    return;
  }

  bool earliest = queue.enqueue_timer(time, timer, op);  // add the timer to the queue; this is deadline_timer_service's timer_queue_ member
  scheduler_.work_started();
  if (earliest)
    update_timeout();  // if this timer expires earliest, update the epoll_reactor's timerfd
}

The if (shutdown_) branch (lines 8 to 12) has not been explained yet. Now that we have the background on how scheduler::run works, post_immediate_completion is easy to understand. Here is its source:

  // Request invocation of the given operation and return immediately. Assumes
  // that work_started() has not yet been called for the operation.
void scheduler::post_immediate_completion(
    scheduler::operation* op, bool is_continuation)
{
  if (one_thread_ || is_continuation)
  {
    if (thread_info_base* this_thread = thread_call_stack::contains(this))
    {
      ++static_cast<thread_info*>(this_thread)->private_outstanding_work;
      static_cast<thread_info*>(this_thread)->private_op_queue.push(op);
      return;
    }
  }

  work_started();
  mutex::scoped_lock lock(mutex_);
  op_queue_.push(op);
  wake_one_thread_and_unlock(lock);
}

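Ignore the logic inside the if for now (deadline_timer does not take that path anyway). As the word immediate in the name suggests, this is the immediate route: the task is pushed straight into the scheduler's op_queue_, bypassing this_thread->private_op_queue, so it does not need to go through the logic in epoll_reactor::run. Recall that schedule_timer calls post_immediate_completion only when epoll_reactor::shutdown_ is true, and the logic is now clear.
Now for the logic inside the if. The meaning of is_continuation puzzled me for a long time; after much thought, I found it most helpful to read continuation as "deferred": is_continuation == true means execution is deferred. Why? The code first checks whether the current thread is in the call stack. What does call stack mean here? Whether the current thread is currently inside this scheduler's run(). Recall that scheduler::run declares a this_thread variable and a thread_call_stack::context ctx(this, this_thread), which effectively pushes this scheduler onto the current thread's call stack (see my earlier post for the details). If the thread is in the call stack, its thread_info is retrieved, the task is added to that thread_info's private_op_queue, and the function returns. The task then waits until the current do_run_one step ends, when task_cleanup or work_cleanup moves private_op_queue into op_queue_; after that it follows the normal processing path. That is why I read continuation as "deferred". (In ASIO's own terminology, a continuation is a handler that continues the work of the currently executing handler, such as the next step of a chained asynchronous operation, which is why it can take this path without locking mutex_.)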
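The decision in post_immediate_completion can be modelled with a thread_local variable playing the role of thread_call_stack. This is a hypothetical sketch, assuming strings as operations and no locking; ThreadInfo, CallStackEntry, Sched and run_with are illustrative names. Inside run_with, a continuation goes to the calling thread's private queue with a private count, while anything posted from outside run() goes to the shared queue and bumps the shared counter.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical model of thread_call_stack + scheduler::post_immediate_completion.
struct ThreadInfo {                       // per-thread, private (like thread_info)
  std::deque<std::string> private_ops;    // private_op_queue
  long private_outstanding = 0;           // private_outstanding_work
};

struct Sched;
struct CallStackEntry {
  Sched* sched;
  ThreadInfo* info;
};
// Which scheduler's run() this thread is currently inside, if any.
thread_local CallStackEntry current_entry{nullptr, nullptr};

struct Sched {                            // shared (like scheduler)
  std::deque<std::string> shared;         // op_queue_ (mutex-protected in real code)
  long outstanding = 0;                   // outstanding_work_
  bool one_thread = false;                // one_thread_

  void post_immediate_completion(const std::string& op, bool is_continuation) {
    if ((one_thread || is_continuation) && current_entry.sched == this) {
      // Inside this scheduler's run(): park the op privately; a cleanup
      // object later moves it (and the count) into the shared state.
      ++current_entry.info->private_outstanding;
      current_entry.info->private_ops.push_back(op);
      return;
    }
    ++outstanding;          // work_started()
    shared.push_back(op);   // the real code then wakes one thread
  }

  // Stand-in for run(): registers (this, info) on the thread's call stack while f runs.
  template <typename F>
  void run_with(ThreadInfo& info, F f) {
    CallStackEntry prev = current_entry;
    current_entry = CallStackEntry{this, &info};
    f();
    current_entry = prev;
  }
};
```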

Summary: the private and public queues of io_service

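Let me restate how private_op_queue and the scheduler's op_queue_ relate, because the logic above seems to contradict the usual intuition about io_service: isn't private_op_queue the private queue and op_queue_ the public one? Why are tasks moved from the private queue into the public queue, and not the other way round?
This puzzled me for a while too, but reading the code carefully shows that private_op_queue's main job is to buffer tasks that are about to be processed. It is called private because those tasks were discovered by a particular thread, somewhere inside the epoll_reactor::run called from that thread's own scheduler::run. (Recall that to use io_service with multiple threads, you create several threads yourself and have each of them call run() on the same io_service. Each of those run() calls has its own local this_thread variable, and this thread_info object in effect represents the private data of that worker thread; private_op_queue is one of its members.) A thread running scheduler::run does not process these tasks directly from its own private_op_queue. It first moves them into scheduler::op_queue_, which all threads share, and then takes tasks from there to process them.
To avoid confusion, here is once more which data is shared between threads and which is private to each thread (here, "threads" means the threads calling run() on the same io_service object). All member variables of scheduler are shared; the local variables in scheduler::run, chiefly the thread_info variable this_thread, are private.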
