[Go Source Code Analysis] Go Scheduler Source Code Analysis

Author: 孫偉

1. Basic Concepts: Processes, Threads, and Coroutines

  • A process can have many threads. Each thread normally gets a fixed-size block of memory (commonly about 2MB) as its stack, which holds the local variables of the functions currently being called or suspended. What the CPU switches between when scheduling is threads: if the next thread belongs to the same process, only a thread switch is needed and it completes quickly; if the next thread belongs to a different process, a process switch is required, which takes noticeably longer.
  • Threads come in two kinds: kernel-level threads and user-level threads. A user-level thread must be bound to a kernel-level thread; the CPU is unaware of user-level threads and only knows it is running one thread, which is in fact a kernel-level thread.
  • User-level threads have a more familiar name: coroutines (co-routines). To keep the distinction clear, in this article "coroutine" means a user-level thread and "thread" means a kernel-level thread.
  • Coroutines differ from threads. Threads are scheduled preemptively by the operating system; coroutines are scheduled cooperatively in user space: only after one coroutine yields the CPU does the next coroutine run.

Coroutines can be bound to threads in three ways:

  • N:1 — N coroutines bound to one thread. The advantage is that a coroutine switch happens entirely in user space without trapping into the kernel, so it is extremely lightweight and fast. The drawbacks are serious, though: because all of a process's coroutines sit on one thread, the program cannot use the hardware's multiple cores, and once any coroutine blocks it blocks the thread, so none of the process's other coroutines can run and there is effectively no concurrency at all.
  • 1:1 — one coroutine bound to one thread. This is the easiest to implement and coroutine scheduling is handled entirely by the kernel, so the N:1 drawbacks disappear; the downside is that creating, destroying, and switching coroutines all become kernel-thread operations, which is rather expensive.
  • M:N — M coroutines bound to N threads. This combines N:1 and 1:1, overcoming the drawbacks of both, but it is the most complex to implement.

2. Golang Overview

2.1 The Goroutine Concept

Because switching threads involves a large context and burns a lot of CPU time, Go's unit of concurrency is not the traditional thread but a much lighter coroutine, the goroutine, which greatly raises the achievable degree of parallelism; this is why Go is sometimes called "the most parallel language".
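
As a small illustration (my own sketch, not from the original article), launching thousands of goroutines is cheap, whereas launching the same number of OS threads would not be:

package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    const n = 10000 // ten thousand goroutines are cheap; ten thousand OS threads would not be

    results := make([]int, n)
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func(i int) { // each goroutine starts with a small, growable stack
            defer wg.Done()
            results[i] = i * i
        }(i)
    }
    wg.Wait()
    fmt.Println("done, last result:", results[n-1])
}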

2.2 Comparison with Other Concurrency Models

  • Interpreted languages such as Python typically use a multi-process concurrency model. A process carries the largest context of all, so switching is very costly, and because processes can only communicate through sockets or specially arranged shared memory, programming becomes awkward and inconvenient.
  • Languages such as C++ usually adopt a multi-threaded concurrency model. A thread's context is much smaller than a process's, and threads already share memory, so programming is considerably easier. However, starting, destroying, and switching threads still costs a lot of CPU time. Thread pools appeared to address this: a fixed number of threads is kept alive to avoid the cost of frequently creating and destroying them. But this basic technique has its own problem: a thread stuck in blocking IO keeps occupying its slot, so later tasks queue up with no thread available to run them.
  • Go's concurrency machinery is more involved. Go uses a lighter data structure in place of the thread; it has its own stack and switches faster. Execution is still ultimately done by threads: the scheduler places goroutines onto threads, creating and releasing threads as needed, and when a running goroutine blocks (most commonly while waiting for IO) it is detached from its thread and another runnable goroutine is put on that thread instead. Through this fairly elaborate scheduling, the system achieves a very high degree of parallelism without consuming large amounts of CPU.

2.3 Characteristics of Goroutines

  • Non-blocking. Goroutines exist to make highly concurrent programs easy to write. When a goroutine performs a blocking operation (such as a system call), the other goroutines on the current thread are moved to other threads and keep running, so the program as a whole does not block.
  • Its own scheduler. Golang has garbage collection (GC), and goroutines must be stopped while GC runs; because Go implements its own scheduler, this is easy to arrange. Building concurrent programs out of many goroutines gives the benefits of asynchronous IO together with the convenience of ordinary multi-threaded or multi-process programming.
  • Its own stack. Goroutines do, of course, add considerable complexity: a goroutine must carry not only the code it runs but also the stack, PC (the article describes the saved PC as the current execution position plus 8), and SP needed to run that code, and the saved stack and pointers must keep the program consistent across the different execution modes.

Since every goroutine has its own stack, a stack has to be created along with the goroutine, and the stack keeps growing while the goroutine executes. Stacks traditionally grow contiguously, and because all threads of a process share one virtual address space, each thread's stack must start at a different address, which means estimating every stack's size before allocating it. With a very large number of threads, stack overflow becomes easy to hit.

Split Stacks were invented to solve this: a stack starts as a small allocation, and when some function call finds the stack too small, a new block is allocated somewhere else; the new block does not have to be contiguous with the old one. The call's arguments are copied into the new block and execution continues there. Golang's stack management works along the same lines, but for better performance it uses contiguous stacks: it likewise begins with a fixed-size stack, and when space runs out it allocates a larger stack and copies the whole old stack into the new one. This avoids the frequent allocation and freeing that Split Stacks can cause.
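
A minimal sketch (my own example, not from the article): deep recursion inside a goroutine works precisely because the runtime grows and copies the stack transparently; the initial stack is only a few kilobytes.

package main

import "fmt"

// depth recurses n times; every call adds a frame, forcing the runtime
// to grow (and copy) the goroutine's stack several times along the way.
func depth(n int) int {
    if n == 0 {
        return 0
    }
    return depth(n-1) + 1
}

func main() {
    done := make(chan int)
    go func() {
        done <- depth(1000000) // far deeper than the initial few-KB stack allows
    }()
    fmt.Println("recursed to depth:", <-done)
}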

Goroutine execution can be preempted: if a goroutine keeps holding the CPU and has not been rescheduled for a long time, the runtime preempts it and hands the CPU time to other goroutines. This behaviour can be examined through the runtime's debug and trace facilities for blocked goroutines.

2.4 The Core Structures

  • M: a worker thread in Go, the unit that actually executes code;
  • P: a context for scheduling goroutines; a goroutine depends on a P to be scheduled, and P is the real unit of parallelism;
  • G: a goroutine, a piece of Go code (presented as a function), the smallest unit of parallelism;

A P must be bound to an M to run, and an M must be bound to a P to run Go code. Normally there are at most GOMAXPROCS Ps (usually equal to the number of CPUs), but there can be many more Ms. Only the Ps that are bound to an M actually run, which is why P is the true unit of parallelism.

Every P has its own queue of runnable Gs from which it takes a G to run, and there is also a global runnable G queue; a G attaches to an M through a P in order to execute. The reason for not relying solely on the global runnable queue is that distributed queues shrink the critical section: imagine several threads asking for a runnable G at the same time with only the global resource available, and how many threads that global lock would leave waiting.

If a running G blocks, the typical example being waiting on IO, it stays parked there together with its M, while the context P is passed on to another available M, so the blocking does not hurt the program's parallelism.
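
As a small self-contained sketch (mine, not the article's), these standard runtime calls expose the counts discussed above:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    // GOMAXPROCS(0) reports the current number of Ps without changing it.
    fmt.Println("Ps (GOMAXPROCS):", runtime.GOMAXPROCS(0))
    fmt.Println("CPUs:           ", runtime.NumCPU())

    // The number of Ms is managed by the runtime and can exceed the number
    // of Ps, for example when goroutines are blocked in system calls.
    fmt.Println("goroutines (Gs):", runtime.NumGoroutine())
}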

The g struct

type g struct {
   // Stack parameters.
   // stack describes the actual stack memory: [stack.lo, stack.hi).
   // stackguard0 is the stack pointer compared in the Go stack growth prologue.
   // It is stack.lo+StackGuard normally, but can be StackPreempt to trigger a preemption.
   // stackguard1 is the stack pointer compared in the C stack growth prologue.
   // It is stack.lo+StackGuard on g0 and gsignal stacks.
   // It is ~0 on other goroutine stacks, to trigger a call to morestackc (and crash).
   stack       stack   // offset known to runtime/cgo; describes the actual stack memory, including its bounds [lo, hi)
   stackguard0 uintptr // offset known to liblink
   stackguard1 uintptr // offset known to liblink
 
   _panic         *_panic // innermost panic - offset known to liblink
   _defer         *_defer // innermost defer
   m              *m      // current M; offset known to arm liblink
   sched          gobuf   // saves the g's context (sp, pc, ...) when the goroutine is switched out
   syscallsp      uintptr        // if status==Gsyscall, syscallsp = sched.sp to use during gc
   syscallpc      uintptr        // if status==Gsyscall, syscallpc = sched.pc to use during gc
   stktopsp       uintptr        // expected sp at top of stack, to check in traceback
   param          unsafe.Pointer // passed parameter on wakeup: while this g sleeps another goroutine can set param, and this g reads it when it wakes up
   atomicstatus   uint32
   stackLock      uint32 // sigprof/scang lock; TODO: fold in to atomicstatus
   goid           int64   // goroutine ID
   waitsince      int64  // approximate time when the g became blocked
   waitreason     string // if status==Gwaiting
   schedlink      guintptr
   preempt        bool     // preemption signal, duplicates stackguard0 = stackpreempt
   paniconfault   bool     // panic (instead of crash) on unexpected fault address
   preemptscan    bool     // preempted g does scan for gc
   gcscandone     bool     // g has scanned stack; protected by _Gscan bit in status
   gcscanvalid    bool     // false at start of gc cycle, true if G has not run since last scan; TODO: remove?
   throwsplit     bool     // must not split stack
   raceignore     int8     // ignore race detection events
   sysblocktraced bool     // StartTrace has emitted EvGoInSyscall about this goroutine
   sysexitticks   int64    // cputicks when syscall has returned (for tracing)
   traceseq       uint64   // trace event sequencer
   tracelastp     puintptr // last P emitted an event for this goroutine
   lockedm        muintptr // the G is locked to run only on this M
   sig            uint32
   writebuf       []byte
   sigcode0       uintptr
   sigcode1       uintptr
   sigpc          uintptr
   gopc           uintptr // pc of go statement that created this goroutine
   startpc        uintptr // pc of goroutine function
   racectx        uintptr
   waiting        *sudog         // sudog structures this g is waiting on (that have a valid elem ptr); in lock order
   cgoCtxt        []uintptr      // cgo traceback context
   labels         unsafe.Pointer // profiler labels
   timer          *timer         // cached timer for time.Sleep
   selectDone     uint32         // are we participating in a select and did someone win the race?
 
   // Per-G GC state
 
   // gcAssistBytes is this G's GC assist credit in terms of
   // bytes allocated. If this is positive, then the G has credit
   // to allocate gcAssistBytes bytes without assisting. If this
   // is negative, then the G must correct this by performing
   // scan work. We track this in bytes to make it fast to update
   // and check for debt in the malloc hot path. The assist ratio
   // determines how this corresponds to scan work debt.
   gcAssistBytes int64
}

The gobuf struct

type gobuf struct {
    sp   uintptr        // stack pointer at the moment of the switch
    pc   uintptr        // program counter to resume at
    g    guintptr       // back-pointer to the g this buffer belongs to
    ctxt unsafe.Pointer // closure context, if any
    ret  sys.Uintreg    // saved return value register
    lr   uintptr        // saved link register (on architectures that have one)
    bp   uintptr        // for GOEXPERIMENT=framepointer
}

The most important field in g is sched, which holds the goroutine's execution context. Unlike a thread, whose context is managed by the OS, a goroutine's context is kept in a gobuf object, which keeps switching lightweight; the gobuf structure is shown above.

The m struct

type m struct {
    g0      *g     // goroutine with the scheduling stack
    gsignal       *g         // goroutine that handles signals
    tls           [6]uintptr // thread-local storage
    mstartfn      func()
    curg          *g       // the goroutine currently running on this m
    caughtsig     guintptr
    p             puintptr // the attached P for executing Go code (nil if not executing Go code)
    nextp         puintptr
    id            int32
    mallocing     int32 // allocation status
    spinning      bool // whether this m is out of work and actively looking for it
    blocked       bool // whether this m is blocked on a note
    inwb          bool // whether this m is executing a write barrier
    printlock     int8
    incgo         bool // whether this m is executing a cgo call
    fastrand      uint32
    ncgocall      uint64      // total number of cgo calls ever made
    ncgo          int32       // number of cgo calls currently in progress
    park          note
    alllink       *m // on allm
    schedlink     muintptr
    mcache        *mcache // memory cache of the current m
    lockedg       guintptr // g locked to this m; it will not be moved to another m
    createstack   [32]uintptr // stack that created this thread
}

Two Gs in the m struct deserve special attention:

  • curg, the G that this M is currently bound to and running.
  • g0, the goroutine that carries the scheduling stack; it is a rather special goroutine. An ordinary goroutine's stack is a growable stack allocated on the heap, whereas g0's stack is the stack of the OS thread backing this M. All scheduling-related code first switches onto this goroutine's stack before it runs. In other words, even the thread's stack is represented by a g rather than being used as a raw OS concept.

The p struct

type p struct {
    lock mutex
    id          int32
    status      uint32 // one of pidle/prunning/psyscall/pgcstop/pdead
    link        puintptr
    schedtick   uint32     // incremented on every scheduler call
    syscalltick uint32     // incremented on every system call
    sysmontick  sysmontick
    m           muintptr   // back-link to the associated m (nil if idle)
    mcache      *mcache
    racectx     uintptr
    goidcache    uint64 // cache of goroutine IDs allocated in batches from sched.goidgen
    goidcacheend uint64
    // queue of runnable goroutines
    runqhead uint32
    runqtail uint32
    runq     [256]guintptr
    runnext guintptr // next g to run, if any; has priority over runq
    sudogcache []*sudog
    sudogbuf   [128]*sudog
    palloc persistentAlloc // per-P allocator cache, to avoid a mutex
    pad [sys.CacheLineSize]byte
}

A P can be in one of the states Pidle, Prunning, Psyscall, Pgcstop, or Pdead. Its internal run queue (runqhead/runqtail/runq) holds runnable goroutines, and a P takes Gs from its own queue first, which is more efficient.

The schedt struct

type schedt struct {
    goidgen  uint64
    lastpoll uint64
    lock mutex
    midle        muintptr // idle m's waiting for work
    nmidle       int32    // number of idle m's waiting for work
    nmidlelocked int32    // number of locked m's waiting for work
    mcount       int32    // number of m's that have been created
    maxmcount    int32    // maximum number of m's allowed
    ngsys uint32 // number of system goroutines; updated atomically
    pidle      puintptr // idle p's
    npidle     uint32
    nmspinning uint32
    // global queue of runnable g's
    runqhead guintptr
    runqtail guintptr
    runqsize int32
    // global cache of dead G's
    gflock       mutex
    gfreeStack   *g
    gfreeNoStack *g
    ngfree       int32
    // central cache of sudog structures
    sudoglock  mutex
    sudogcache *sudog
}

Most of the information the scheduler needs already lives in the M, G, and P structs; schedt is little more than a shell. It holds the idle M list, the idle P list, and the global queue of ready Gs. The lock in schedt is essential: whenever an M or P performs a non-local operation, it generally has to lock the scheduler first.

2.5 Key Functions

The goroutine scheduler's code lives in /src/runtime/proc.go; some of the key functions are analysed below.

2.5.1 The schedule function

schedule runs whenever the runtime needs to schedule: it finds a runnable G for the current P and executes it. The search order is:

  • 1) Call runqget to take a runnable G from the P's own run queue;
  • 2) If 1) fails, call findrunnable to look for a runnable G;
  • 3) If 2) also produces no runnable G, scheduling ends and execution resumes from the previous context.
  • 4) Note: once in a while the global runnable queue is checked first, to guarantee fairness; otherwise two goroutines could keep respawning each other and completely occupy the local run queue. This is enforced with the schedtick counter (schedtick % 61).

The code is as follows:

// One round of scheduler: find a runnable goroutine and execute it.
// Never returns.
func schedule() {
   _g_ := getg()
 
   if _g_.m.locks != 0 {
      throw("schedule: holding locks")
   }
 
   if _g_.m.lockedg != 0 {
      stoplockedm()
      execute(_g_.m.lockedg.ptr(), false) // Never returns.
   }
 
   // We should not schedule away from a g that is executing a cgo call,
   // since the cgo call is using the m's g0 stack.
   if _g_.m.incgo {
      throw("schedule: in cgo")
   }
 
top:
   if sched.gcwaiting != 0 {
      gcstopm()
      goto top
   }
   if _g_.m.p.ptr().runSafePointFn != 0 {
      runSafePointFn()
   }
 
   var gp *g
   var inheritTime bool
   if trace.enabled || trace.shutdown {
      gp = traceReader()
      if gp != nil {
         casgstatus(gp, _Gwaiting, _Grunnable)
         traceGoUnpark(gp, 0)
      }
   }
   if gp == nil && gcBlackenEnabled != 0 {
      gp = gcController.findRunnableGCWorker(_g_.m.p.ptr())
   }
   if gp == nil {
      // Check the global runnable queue once in a while to ensure fairness.
      // Otherwise two goroutines can completely occupy the local runqueue
      // by constantly respawning each other.
      if _g_.m.p.ptr().schedtick%61 == 0 && sched.runqsize > 0 {
         lock(&sched.lock)
         gp = globrunqget(_g_.m.p.ptr(), 1)
         unlock(&sched.lock)
      }
   }
   if gp == nil {
      gp, inheritTime = runqget(_g_.m.p.ptr())
      if gp != nil && _g_.m.spinning {
         throw("schedule: spinning with local work")
      }
   }
   if gp == nil {
      gp, inheritTime = findrunnable() // blocks until work is available
   }
 
   // This thread is going to run a goroutine and is not spinning anymore,
   // so if it was marked as spinning we need to reset it now and potentially
   // start a new spinning M.
   if _g_.m.spinning {
      resetspinning()
   }
 
   if gp.lockedm != 0 {
      // Hands off own p to the locked m,
      // then blocks waiting for a new p.
      startlockedm(gp)
      goto top
   }
 
   execute(gp, inheritTime)
}

2.5.2 The findrunnable function

findrunnable finds a runnable G for a P. The search order is:

  • 1) Call runqget to take a runnable G from the P's own run queue;
  • 2) If 1) fails, call globrunqget to take a runnable G from the global runnable G queue;
  • 3) If 2) fails, call netpoll (non-blocking) to pick up a G whose asynchronous callback is ready;
  • 4) If 3) fails, try to steal half of the Gs from some other P;
  • 5) If 4) fails, call globrunqget again to take a runnable G from the global runnable G queue;
  • 6) If 5) fails, call netpoll (blocking) to pick up a G whose asynchronous callback is ready;
  • 7) If 6) still yields no G, call stopm to stop this M.

The code is as follows:

// Finds a runnable goroutine to execute.
// Tries to steal from other P's, get g from global queue, poll network.
func findrunnable() (gp *g, inheritTime bool) {
   _g_ := getg()
 
   // The conditions here and in handoffp must agree: if
   // findrunnable would return a G to run, handoffp must start
   // an M.
 
top:
   _p_ := _g_.m.p.ptr()
   if sched.gcwaiting != 0 {
      gcstopm()
      goto top
   }
   if _p_.runSafePointFn != 0 {
      runSafePointFn()
   }
   if fingwait && fingwake {
      if gp := wakefing(); gp != nil {
         ready(gp, 0, true)
      }
   }
   if *cgo_yield != nil {
      asmcgocall(*cgo_yield, nil)
   }
 
   // local runq
   if gp, inheritTime := runqget(_p_); gp != nil {
      return gp, inheritTime
   }
 
   // global runq
   if sched.runqsize != 0 {
      lock(&sched.lock)
      gp := globrunqget(_p_, 0)
      unlock(&sched.lock)
      if gp != nil {
         return gp, false
      }
   }
 
   // Poll network.
   // This netpoll is only an optimization before we resort to stealing.
   // We can safely skip it if there are no waiters or a thread is blocked
   // in netpoll already. If there is any kind of logical race with that
   // blocked thread (e.g. it has already returned from netpoll, but does
   // not set lastpoll yet), this thread will do blocking netpoll below
   // anyway.
   if netpollinited() && atomic.Load(&netpollWaiters) > 0 && atomic.Load64(&sched.lastpoll) != 0 {
      if gp := netpoll(false); gp != nil { // non-blocking
         // netpoll returns list of goroutines linked by schedlink.
         injectglist(gp.schedlink.ptr())
         casgstatus(gp, _Gwaiting, _Grunnable)
         if trace.enabled {
            traceGoUnpark(gp, 0)
         }
         return gp, false
      }
   }
 
   // Steal work from other P's.
   procs := uint32(gomaxprocs)
   if atomic.Load(&sched.npidle) == procs-1 {
      // Either GOMAXPROCS=1 or everybody, except for us, is idle already.
      // New work can appear from returning syscall/cgocall, network or timers.
      // Neither of that submits to local run queues, so no point in stealing.
      goto stop
   }
   // If number of spinning M's >= number of busy P's, block.
   // This is necessary to prevent excessive CPU consumption
   // when GOMAXPROCS>>1 but the program parallelism is low.
   if !_g_.m.spinning && 2*atomic.Load(&sched.nmspinning) >= procs-atomic.Load(&sched.npidle) {
      goto stop
   }
   if !_g_.m.spinning {
      _g_.m.spinning = true
      atomic.Xadd(&sched.nmspinning, 1)
   }
   for i := 0; i < 4; i++ {
      for enum := stealOrder.start(fastrand()); !enum.done(); enum.next() {
         if sched.gcwaiting != 0 {
            goto top
         }
         stealRunNextG := i > 2 // first look for ready queues with more than 1 g
         if gp := runqsteal(_p_, allp[enum.position()], stealRunNextG); gp != nil {
            return gp, false
         }
      }
   }
 
stop:
 
   // We have nothing to do. If we're in the GC mark phase, can
   // safely scan and blacken objects, and have work to do, run
   // idle-time marking rather than give up the P.
   if gcBlackenEnabled != 0 && _p_.gcBgMarkWorker != 0 && gcMarkWorkAvailable(_p_) {
      _p_.gcMarkWorkerMode = gcMarkWorkerIdleMode
      gp := _p_.gcBgMarkWorker.ptr()
      casgstatus(gp, _Gwaiting, _Grunnable)
      if trace.enabled {
         traceGoUnpark(gp, 0)
      }
      return gp, false
   }
 
   // Before we drop our P, make a snapshot of the allp slice,
   // which can change underfoot once we no longer block
   // safe-points. We don't need to snapshot the contents because
   // everything up to cap(allp) is immutable.
   allpSnapshot := allp
 
   // return P and block
   lock(&sched.lock)
   if sched.gcwaiting != 0 || _p_.runSafePointFn != 0 {
      unlock(&sched.lock)
      goto top
   }
   if sched.runqsize != 0 {
      gp := globrunqget(_p_, 0)
      unlock(&sched.lock)
      return gp, false
   }
   if releasep() != _p_ {
      throw("findrunnable: wrong p")
   }
   pidleput(_p_)
   unlock(&sched.lock)
 
   // Delicate dance: thread transitions from spinning to non-spinning state,
   // potentially concurrently with submission of new goroutines. We must
   // drop nmspinning first and then check all per-P queues again (with
   // #StoreLoad memory barrier in between). If we do it the other way around,
   // another thread can submit a goroutine after we've checked all run queues
   // but before we drop nmspinning; as the result nobody will unpark a thread
   // to run the goroutine.
   // If we discover new work below, we need to restore m.spinning as a signal
   // for resetspinning to unpark a new worker thread (because there can be more
   // than one starving goroutine). However, if after discovering new work
   // we also observe no idle Ps, it is OK to just park the current thread:
   // the system is fully loaded so no spinning threads are required.
   // Also see "Worker thread parking/unparking" comment at the top of the file.
   wasSpinning := _g_.m.spinning
   if _g_.m.spinning {
      _g_.m.spinning = false
      if int32(atomic.Xadd(&sched.nmspinning, -1)) < 0 {
         throw("findrunnable: negative nmspinning")
      }
   }
 
   // check all runqueues once again
   for _, _p_ := range allpSnapshot {
      if !runqempty(_p_) {
         lock(&sched.lock)
         _p_ = pidleget()
         unlock(&sched.lock)
         if _p_ != nil {
            acquirep(_p_)
            if wasSpinning {
               _g_.m.spinning = true
               atomic.Xadd(&sched.nmspinning, 1)
            }
            goto top
         }
         break
      }
   }
 
   // Check for idle-priority GC work again.
   if gcBlackenEnabled != 0 && gcMarkWorkAvailable(nil) {
      lock(&sched.lock)
      _p_ = pidleget()
      if _p_ != nil && _p_.gcBgMarkWorker == 0 {
         pidleput(_p_)
         _p_ = nil
      }
      unlock(&sched.lock)
      if _p_ != nil {
         acquirep(_p_)
         if wasSpinning {
            _g_.m.spinning = true
            atomic.Xadd(&sched.nmspinning, 1)
         }
         // Go back to idle GC check.
         goto stop
      }
   }
 
   // poll network
   if netpollinited() && atomic.Load(&netpollWaiters) > 0 && atomic.Xchg64(&sched.lastpoll, 0) != 0 {
      if _g_.m.p != 0 {
         throw("findrunnable: netpoll with p")
      }
      if _g_.m.spinning {
         throw("findrunnable: netpoll with spinning")
      }
      gp := netpoll(true) // block until new work is available
      atomic.Store64(&sched.lastpoll, uint64(nanotime()))
      if gp != nil {
         lock(&sched.lock)
         _p_ = pidleget()
         unlock(&sched.lock)
         if _p_ != nil {
            acquirep(_p_)
            injectglist(gp.schedlink.ptr())
            casgstatus(gp, _Gwaiting, _Grunnable)
            if trace.enabled {
               traceGoUnpark(gp, 0)
            }
            return gp, false
         }
         injectglist(gp)
      }
   }
   stopm()
   goto top
}

2.5.3 The newproc function

newproc creates a runnable G and places it in the current P's runnable G queue; it is what a statement such as go func() { … } is actually compiled into, and the core work happens in newproc1. The function proceeds as follows (a small usage illustration follows the step list):

  • 1) Get the P of the current G, then try to take a G from the free G list;
  • 2) If 1) found one, configure it with the new parameters; otherwise allocate a brand-new G;
  • 3) Put the G into the P's runnable G queue.
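
As a simple illustration (my own example), every go statement below is lowered by the compiler into a runtime call that ends up in newproc/newproc1, which builds a G and enqueues it on the current P:

package main

import (
    "fmt"
    "sync"
)

func worker(id int, wg *sync.WaitGroup) {
    defer wg.Done()
    fmt.Println("worker", id, "running")
}

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 4; i++ {
        wg.Add(1)
        go worker(i, &wg) // compiled into a runtime call (newproc) that creates and enqueues a G
    }
    wg.Wait()
}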

The code is as follows:

// In Go 1.10.8 the default goroutine stack size is 2KB

_StackMin = 2048
// Create a g object and put it on a run queue,
// where it waits to be executed.

// Create a new g running fn with narg bytes of arguments starting
// at argp. callerpc is the address of the go statement that created
// this. The new g is put on the queue of g's waiting to run.
func newproc1(fn *funcval, argp *uint8, narg int32, callerpc uintptr) {
   _g_ := getg()
 
   if fn == nil {
      _g_.m.throwing = -1 // do not dump full stacks
      throw("go of nil func value")
   }
   _g_.m.locks++ // disable preemption because it can be holding p in a local var
   siz := narg
   siz = (siz + 7) &^ 7
 
   // We could allocate a larger initial stack if necessary.
   // Not worth it: this is almost always an error.
   // 4*sizeof(uintreg): extra space added below
   // sizeof(uintreg): caller's LR (arm) or return address (x86, in gostartcall).
   if siz >= _StackMin-4*sys.RegSize-sys.RegSize {
      throw("newproc: function arguments too large for new goroutine")
   }
 
   _p_ := _g_.m.p.ptr()
   newg := gfget(_p_)
   if newg == nil {
      newg = malg(_StackMin)
      casgstatus(newg, _Gidle, _Gdead)
      allgadd(newg) // publishes with a g->status of Gdead so GC scanner doesn't look at uninitialized stack.
   }
   if newg.stack.hi == 0 {
      throw("newproc1: newg missing stack")
   }
 
   if readgstatus(newg) != _Gdead {
      throw("newproc1: new g is not Gdead")
   }
 
   totalSize := 4*sys.RegSize + uintptr(siz) + sys.MinFrameSize // extra space in case of reads slightly beyond frame
   totalSize += -totalSize & (sys.SpAlign - 1)                  // align to spAlign
   sp := newg.stack.hi - totalSize
   spArg := sp
   if usesLR {
      // caller's LR
      *(*uintptr)(unsafe.Pointer(sp)) = 0
      prepGoExitFrame(sp)
      spArg += sys.MinFrameSize
   }
   if narg > 0 {
      memmove(unsafe.Pointer(spArg), unsafe.Pointer(argp), uintptr(narg))
      // This is a stack-to-stack copy. If write barriers
      // are enabled and the source stack is grey (the
      // destination is always black), then perform a
      // barrier copy. We do this *after* the memmove
      // because the destination stack may have garbage on
      // it.
      if writeBarrier.needed && !_g_.m.curg.gcscandone {
         f := findfunc(fn.fn)
         stkmap := (*stackmap)(funcdata(f, _FUNCDATA_ArgsPointerMaps))
         // We're in the prologue, so it's always stack map index 0.
         bv := stackmapdata(stkmap, 0)
         bulkBarrierBitmap(spArg, spArg, uintptr(narg), 0, bv.bytedata)
      }
   }
 
   memclrNoHeapPointers(unsafe.Pointer(&newg.sched), unsafe.Sizeof(newg.sched))
   newg.sched.sp = sp
   newg.stktopsp = sp
   newg.sched.pc = funcPC(goexit) + sys.PCQuantum // +PCQuantum so that previous instruction is in same function
   newg.sched.g = guintptr(unsafe.Pointer(newg))
   gostartcallfn(&newg.sched, fn)
   newg.gopc = callerpc
   newg.startpc = fn.fn
   if _g_.m.curg != nil {
      newg.labels = _g_.m.curg.labels
   }
   if isSystemGoroutine(newg) {
      atomic.Xadd(&sched.ngsys, +1)
   }
   newg.gcscanvalid = false
   casgstatus(newg, _Gdead, _Grunnable)
 
   if _p_.goidcache == _p_.goidcacheend {
      // Sched.goidgen is the last allocated id,
      // this batch must be [sched.goidgen+1, sched.goidgen+GoidCacheBatch].
      // At startup sched.goidgen=0, so main goroutine receives goid=1.
      _p_.goidcache = atomic.Xadd64(&sched.goidgen, _GoidCacheBatch)
      _p_.goidcache -= _GoidCacheBatch - 1
      _p_.goidcacheend = _p_.goidcache + _GoidCacheBatch
   }
   newg.goid = int64(_p_.goidcache)
   _p_.goidcache++
   if raceenabled {
      newg.racectx = racegostart(callerpc)
   }
   if trace.enabled {
      traceGoCreate(newg, newg.startpc)
   }
   runqput(_p_, newg, true)
 
   if atomic.Load(&sched.npidle) != 0 && atomic.Load(&sched.nmspinning) == 0 && mainStarted {
      wakep()
   }
   _g_.m.locks--
   if _g_.m.locks == 0 && _g_.preempt { // restore the preemption request in case we've cleared it in newstack
      _g_.stackguard0 = stackPreempt
   }
}

2.5.4 The goexit0 function

goexit runs when a G exits; its continuation goexit0 (shown below) resets some of the G's fields, puts the G on the free G list for later reuse, and then calls schedule to pick the next goroutine.

// goexit continuation on g0.
func goexit0(gp *g) {
   _g_ := getg()
 
   // change the g's status from _Grunning to _Gdead
   casgstatus(gp, _Grunning, _Gdead)
   if isSystemGoroutine(gp) {
      atomic.Xadd(&sched.ngsys, -1)
   }
   // release the g's fields, resetting most of them to nil/0
   gp.m = nil
   locked := gp.lockedm != 0
   gp.lockedm = 0
   _g_.m.lockedg = 0
   gp.paniconfault = false
   gp._defer = nil // should be true already but just in case.
   gp._panic = nil // non-nil for Goexit during panic. points at stack-allocated data.
   gp.writebuf = nil
   gp.waitreason = ""
   gp.param = nil
   gp.labels = nil
   gp.timer = nil
 
   if gcBlackenEnabled != 0 && gp.gcAssistBytes > 0 {
      // Flush assist credit to the global pool. This gives
      // better information to pacing if the application is
      // rapidly creating an exiting goroutines.
      scanCredit := int64(gcController.assistWorkPerByte * float64(gp.gcAssistBytes))
      atomic.Xaddint64(&gcController.bgScanCredit, scanCredit)
      gp.gcAssistBytes = 0
   }
 
   // Note that gp's stack scan is now "valid" because it has no
   // stack.
   gp.gcscanvalid = true
   dropg()
 
   if _g_.m.lockedInt != 0 {
      print("invalid m->lockedInt = ", _g_.m.lockedInt, "\n")
      throw("internal lockOSThread error")
   }
   _g_.m.lockedExt = 0
   // put this g on the free G list
   gfput(_g_.m.p.ptr(), gp)
   if locked {
      // The goroutine may have locked this thread because
      // it put it in an unusual kernel state. Kill it
      // rather than returning it to the thread pool.
 
      // Return to mstart, which will release the P and exit
      // the thread.
      if GOOS != "plan9" { // See golang.org/issue/22227.
         gogo(&_g_.m.g0.sched)
      }
   }
   schedule()
}

2.5.5 The handoffp function

handoffp hands a P off from an M that is in a system call or otherwise blocked. If the P still has work in its runnable G queue (or the global queue is non-empty), a new M is started via startm, and that M is started non-spinning.

// Hands off P from syscall or locked M.
// Always runs without a P, so write barriers are not allowed.
//go:nowritebarrierrec
func handoffp(_p_ *p) {
   // handoffp must start an M in any situation where
   // findrunnable would return a G to run on _p_.
 
 
   // if this P's local run queue or the global run queue is non-empty, start an M right away (not spinning)
   if !runqempty(_p_) || sched.runqsize != 0 {
      startm(_p_, false)
      return
   }
   // likewise if there is GC mark work available
   if gcBlackenEnabled != 0 && gcMarkWorkAvailable(_p_) {
      startm(_p_, false)
      return
   }
   // no local work: if nobody is spinning and no P is idle,
   // conservatively start a spinning M to cover work that may appear
   if atomic.Load(&sched.nmspinning)+atomic.Load(&sched.npidle) == 0 && atomic.Cas(&sched.nmspinning, 0, 1) { // TODO: fast atomic
      startm(_p_, true)
      return
   }
   // lock the scheduler and prepare to park this P
   lock(&sched.lock)
   if sched.gcwaiting != 0 {
      _p_.status = _Pgcstop
      sched.stopwait--
      if sched.stopwait == 0 {
         notewakeup(&sched.stopnote)
      }
      unlock(&sched.lock)
      return
   }
   if _p_.runSafePointFn != 0 && atomic.Cas(&_p_.runSafePointFn, 1, 0) {
      sched.safePointFn(_p_)
      sched.safePointWait--
      if sched.safePointWait == 0 {
         notewakeup(&sched.safePointNote)
      }
   }
   if sched.runqsize != 0 {
      unlock(&sched.lock)
      startm(_p_, false)
      return
   }
   // If this is the last running P and nobody is polling network,
   // need to wakeup another M to poll network.
   if sched.npidle == uint32(gomaxprocs-1) && atomic.Load64(&sched.lastpoll) != 0 {
      unlock(&sched.lock)
      startm(_p_, false)
      return
   }
   pidleput(_p_)
   unlock(&sched.lock)
}

2.5.6 The startm function

startm schedules some M, creating one if necessary, to run the given P.

// Schedules some M to run the p (creates an M if necessary).
// If p==nil, tries to get an idle P, if no idle P's does nothing.
// May run with m.p==nil, so write barriers are not allowed.
// If spinning is set, the caller has incremented nmspinning and startm will
// either decrement nmspinning or set m.spinning in the newly started M.
//go:nowritebarrierrec
func startm(_p_ *p, spinning bool) {
   // lock the scheduler
   lock(&sched.lock)
   if _p_ == nil {
       
      _p_ = pidleget()
      if _p_ == nil {
         unlock(&sched.lock)
         if spinning {
            // The caller incremented nmspinning, but there are no idle Ps,
            // so it's okay to just undo the increment and give up.
            if int32(atomic.Xadd(&sched.nmspinning, -1)) < 0 {
               throw("startm: negative nmspinning")
            }
         }
         return
      }
   }
    
   mp := mget()
   unlock(&sched.lock)
   if mp == nil {
      var fn func()
      if spinning {
         // The caller incremented nmspinning, so set m.spinning in the new M.
         fn = mspinning
      }
      newm(fn, _p_)
      return
   }
    
   if mp.spinning {
      throw("startm: m is spinning")
   }
   if mp.nextp != 0 {
      throw("startm: m has p")
   }
   if spinning && !runqempty(_p_) {
      throw("startm: p has runnable gs")
   }
   // The caller incremented nmspinning, so set m.spinning in the new M.
   mp.spinning = spinning
   mp.nextp.set(_p_)
   notewakeup(&mp.park)
}

2.5.7 The sysmon function

sysmon is created when the Go runtime starts. It monitors the state of all goroutines, decides whether a GC is needed, runs netpoll, and so on. Inside sysmon, retake is called to implement preemptive scheduling.

// Always runs without a P, so write barriers are not allowed.
//
//go:nowritebarrierrec
func sysmon() {
   lock(&sched.lock)
   sched.nmsys++
   checkdead()
   unlock(&sched.lock)
 
   // If a heap span goes unused for 5 minutes after a garbage collection,
   // we hand it back to the operating system.
   scavengelimit := int64(5 * 60 * 1e9)
 
   if debug.scavenge > 0 {
      // Scavenge-a-lot for testing.
      forcegcperiod = 10 * 1e6
      scavengelimit = 20 * 1e6
   }
 
   lastscavenge := nanotime()
   nscavenge := 0
 
   lasttrace := int64(0)
   idle := 0 // how many cycles in succession we had not wokeup somebody
   delay := uint32(0)
   for {
      if idle == 0 { // start with 20us sleep...
         delay = 20
      } else if idle > 50 { // start doubling the sleep after 1ms...
         delay *= 2
      }
      if delay > 10*1000 { // up to 10ms
         delay = 10 * 1000
      }
      usleep(delay)
      if debug.schedtrace <= 0 && (sched.gcwaiting != 0 || atomic.Load(&sched.npidle) == uint32(gomaxprocs)) {
         lock(&sched.lock)
         if atomic.Load(&sched.gcwaiting) != 0 || atomic.Load(&sched.npidle) == uint32(gomaxprocs) {
            atomic.Store(&sched.sysmonwait, 1)
            unlock(&sched.lock)
            // Make wake-up period small enough
            // for the sampling to be correct.
            maxsleep := forcegcperiod / 2
            if scavengelimit < forcegcperiod {
               maxsleep = scavengelimit / 2
            }
            shouldRelax := true
            if osRelaxMinNS > 0 {
               next := timeSleepUntil()
               now := nanotime()
               if next-now < osRelaxMinNS {
                  shouldRelax = false
               }
            }
            if shouldRelax {
               osRelax(true)
            }
            notetsleep(&sched.sysmonnote, maxsleep)
            if shouldRelax {
               osRelax(false)
            }
            lock(&sched.lock)
            atomic.Store(&sched.sysmonwait, 0)
            noteclear(&sched.sysmonnote)
            idle = 0
            delay = 20
         }
         unlock(&sched.lock)
      }
      // trigger libc interceptors if needed
      if *cgo_yield != nil {
         asmcgocall(*cgo_yield, nil)
      }
      // poll network if not polled for more than 10ms
      lastpoll := int64(atomic.Load64(&sched.lastpoll))
      now := nanotime()
      if netpollinited() && lastpoll != 0 && lastpoll+10*1000*1000 < now {
         atomic.Cas64(&sched.lastpoll, uint64(lastpoll), uint64(now))
         gp := netpoll(false) // non-blocking - returns list of goroutines
         if gp != nil {
            // Need to decrement number of idle locked M's
            // (pretending that one more is running) before injectglist.
            // Otherwise it can lead to the following situation:
            // injectglist grabs all P's but before it starts M's to run the P's,
            // another M returns from syscall, finishes running its G,
            // observes that there is no work to do and no other running M's
            // and reports deadlock.
            incidlelocked(-1)
            injectglist(gp)
            incidlelocked(1)
         }
      }
      // retake P's blocked in syscalls
      // and preempt long running G's
      if retake(now) != 0 {
         idle = 0
      } else {
         idle++
      }
      // check if we need to force a GC
      if t := (gcTrigger{kind: gcTriggerTime, now: now}); t.test() && atomic.Load(&forcegc.idle) != 0 {
         lock(&forcegc.lock)
         forcegc.idle = 0
         forcegc.g.schedlink = 0
         injectglist(forcegc.g)
         unlock(&forcegc.lock)
      }
      // scavenge heap once in a while
      if lastscavenge+scavengelimit/2 < now {
         mheap_.scavenge(int32(nscavenge), uint64(now), uint64(scavengelimit))
         lastscavenge = now
         nscavenge++
      }
      if debug.schedtrace > 0 && lasttrace+int64(debug.schedtrace)*1000000 <= now {
         lasttrace = now
         schedtrace(debug.scheddetail > 0)
      }
   }
}

2.5.8 The retake function

retake iterates over all Ps. If a P is in a system call (_Psyscall) and has stayed there for more than one sysmon cycle (20us–10ms), the P is retaken and handoffp is called to detach the M from the P. If a P is running (_Prunning), has gone through one sysmon cycle, and its G has been running for longer than forcePreemptNS (10ms), that G is preempted by setting g.preempt = true and g.stackguard0 = stackPreempt.

Why does setting stackguard achieve preemption? Because this value is used to check whether the current stack still has enough space: the prologue of every Go function compares against it to decide whether the stack must grow. When newstack sees g.stackguard0 == stackPreempt, it knows the growth request was actually triggered by preemption and re-checks whether the G should be preempted. This preemption mechanism guarantees that no single G can run so long that other Gs never get a chance to run.
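
A conceptual sketch of that prologue check (my own simplified model; the type, constant, and function names below are hypothetical, not the runtime's actual code):

package main

import "fmt"

const stackPreempt = ^uintptr(0) - 1313 // a sentinel larger than any real stack address

type goroutine struct {
    stackLo     uintptr
    stackguard0 uintptr
}

// prologueCheck mimics what the compiler-inserted prologue does: compare the
// stack pointer against stackguard0 and call into newstack when it looks too low.
func prologueCheck(g *goroutine, sp uintptr) {
    if sp < g.stackguard0 {
        newstack(g)
        return
    }
    fmt.Println("enough stack: run the function body directly")
}

func newstack(g *goroutine) {
    if g.stackguard0 == stackPreempt {
        fmt.Println("stackguard0 is the preempt sentinel: yield instead of growing the stack")
        return
    }
    fmt.Println("stack really is too small: grow and copy it")
}

func main() {
    g := &goroutine{stackLo: 0x1000, stackguard0: 0x1000 + 880}
    prologueCheck(g, 0x2000) // plenty of stack left, nothing special happens

    g.stackguard0 = stackPreempt // what retake/preemptone effectively does
    prologueCheck(g, 0x2000)     // the sentinel makes the check fail, so newstack sees a preemption
}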

func retake(now int64) uint32 {
   n := 0
   // Prevent allp slice changes. This lock will be completely
   // uncontended unless we're already stopping the world.
   lock(&allpLock)
   // We can't use a range loop over allp because we may
   // temporarily drop the allpLock. Hence, we need to re-fetch
   // allp each time around the loop.
   for i := 0; i < len(allp); i++ {
      _p_ := allp[i]
      if _p_ == nil {
         // This can happen if procresize has grown
         // allp but not yet created new Ps.
         continue
      }
      pd := &_p_.sysmontick
      s := _p_.status
      if s == _Psyscall {
         // Retake P from syscall if it's there for more than 1 sysmon tick (at least 20us).
         t := int64(_p_.syscalltick)
         if int64(pd.syscalltick) != t {
            pd.syscalltick = uint32(t)
            pd.syscallwhen = now
            continue
         }
         // On the one hand we don't want to retake Ps if there is no other work to do,
         // but on the other hand we want to retake them eventually
         // because they can prevent the sysmon thread from deep sleep.
         if runqempty(_p_) && atomic.Load(&sched.nmspinning)+atomic.Load(&sched.npidle) > 0 && pd.syscallwhen+10*1000*1000 > now {
            continue
         }
         // Drop allpLock so we can take sched.lock.
         unlock(&allpLock)
         // Need to decrement number of idle locked M's
         // (pretending that one more is running) before the CAS.
         // Otherwise the M from which we retake can exit the syscall,
         // increment nmidle and report deadlock.
         incidlelocked(-1)
         if atomic.Cas(&_p_.status, s, _Pidle) {
            if trace.enabled {
               traceGoSysBlock(_p_)
               traceProcStop(_p_)
            }
            n++
            _p_.syscalltick++
            handoffp(_p_)
         }
         incidlelocked(1)
         lock(&allpLock)
      } else if s == _Prunning {
         // Preempt G if it's running for too long.
         t := int64(_p_.schedtick)
         if int64(pd.schedtick) != t {
            pd.schedtick = uint32(t)
            pd.schedwhen = now
            continue
         }
         if pd.schedwhen+forcePreemptNS > now {
            continue
         }
         preemptone(_p_)
      }
   }
   unlock(&allpLock)
   return uint32(n)
}

3. Scheduler Summary

3.1 The Scheduler's Two Big Ideas

  • Reuse threads: goroutines run on top of a set of threads, so threads are reused rather than constantly created and destroyed. Reuse shows up in two further places in the scheduler: 1) work stealing — when the current thread has no runnable G, it tries to steal Gs from a P bound to another thread instead of destroying itself; 2) handoff — when the current thread blocks because its G enters a system call, the thread releases its bound P and hands the P to another idle thread to run.
  • Exploit parallelism: GOMAXPROCS sets the number of Ps. When GOMAXPROCS is greater than 1, up to GOMAXPROCS threads can be running at once, potentially spread across multiple CPU cores, so concurrency becomes parallelism. GOMAXPROCS also caps the degree of parallelism: with GOMAXPROCS = number of cores / 2, for example, at most half the cores are used in parallel.

3.2 The Scheduler's Two Small Strategies

  • Preemption: with classic coroutines, the next coroutine only runs once the current one voluntarily gives up the CPU. In Go, a goroutine occupies the CPU for at most about 10ms, which keeps other goroutines from starving; this is one way goroutines differ from plain coroutines.
  • The global G queue: the new scheduler still keeps a global G queue, but its role has been weakened; when an M's work stealing fails to get a G from other Ps, it can still take a G from the global G queue.

4. References
