Go Source Code Analysis: Scheduler Overview

Go Source Code Analysis: An Overview of the Scheduling Process

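This article outlines how Go's scheduler works. As is well known, Go achieves multitasking by scheduling coroutines (goroutines) in user space. On Linux, traditional multithreading traps into the kernel to create threads, which the operating system then schedules. Kernel-scheduled threads can make full use of the resources the OS provides: when a thread reaches a blocking or waiting operation, the OS puts it to sleep until the awaited event wakes it up and it resumes. However, both blocking and scheduling of OS-created threads require trapping into the kernel, which makes these operations relatively expensive. Goroutines are scheduled mostly in user space without trapping into the kernel, but this also means the Go scheduler cannot simply rely on the block/wake-up or preemptive scheduling mechanisms that the OS provides for threads. This article explores how Go performs scheduling in user space.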

Go's execution model

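Go is largely based on the CSP model, in which data is exchanged through communication. Because coroutine scheduling is implemented in user space while the actual work is still carried out on operating system threads, Go defines three entities: M, P and G.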

Machine (M): operating system thread

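A Machine corresponds to a real thread created by the operating system. Its creation, scheduling and execution are all controlled by the OS. If Go code performs a blocking operation on that thread, the thread still blocks until the operation completes and the OS wakes it up to continue.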

Processor (P)

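A Processor is the virtual execution context provided to a G. It holds a local queue of Gs, locally cached memory objects and other resources needed to run Go code. An M can execute a G only after it has been bound to a P.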

Goroutine (G)

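A Goroutine is Go's user-space coroutine. Its initial stack is 2KB by default, and it carries the execution context of its task, which is saved whenever the goroutine is switched out. The scheduler's job is to place Gs onto available Ps so that concurrent work is scheduled efficiently.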

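The overall relationship among the three is shown in the figure.

The figure above shows one possible runtime state of a Go program. As it indicates, G scheduling happens entirely in user space. The following sections look at the scheduling scenarios.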

Go scheduling scenarios

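During Go's initialization, the first M, called M0, is set up; once initialization completes, M0 starts scheduling and running Gs. As the startup sequence shows, the main function of a Go program is itself run as a G. When the program starts additional goroutines, the runtime decides whether to start another kernel thread based on how those goroutines are running and on how busy the existing kernel threads are.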

Starting a kernel thread

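When many Gs are waiting to run, when a kernel thread is blocked in a system call, or when some Gs have been running for a long time, the runtime may start a new kernel thread to run the runnable Gs so that they are executed promptly.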

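During startup, Go also starts a sysmon kernel thread. This thread is not concerned with what any particular G does; instead it checks whether non-blocking events have completed, monitors how long each running G has been executing, and sets the flag that requests preemption.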

func newm(fn func(), _p_ *p) {    // create a kernel worker thread
	mp := allocm(_p_, fn)           // allocate the m and set up its stack information
	mp.nextp.set(_p_) 
	mp.sigmask = initSigmask
	if gp := getg(); gp != nil && gp.m != nil && (gp.m.lockedExt != 0 || gp.m.incgo) && GOOS != "plan9" {
		// We're on a locked M or a thread that may have been
		// started by C. The kernel state of this thread may
		// be strange (the user may have locked it for that
		// purpose). We don't want to clone that into another
		// thread. Instead, ask a known-good thread to create
		// the thread for us.
		//
		// This is disabled on Plan 9. See golang.org/issue/22227.
		//
		// TODO: This may be unnecessary on Windows, which
		// doesn't model thread creation off fork.
		lock(&newmHandoff.lock)
		if newmHandoff.haveTemplateThread == 0 {
			throw("on a locked thread with no template thread")
		}
		mp.schedlink = newmHandoff.newm
		newmHandoff.newm.set(mp)
		if newmHandoff.waiting {
			newmHandoff.waiting = false
			notewakeup(&newmHandoff.wake)
		}
		unlock(&newmHandoff.lock)
		return
	}
	newm1(mp)      	// create the worker thread
}

func newm1(mp *m) {
	if iscgo {
		var ts cgothreadstart
		if _cgo_thread_start == nil {
			throw("_cgo_thread_start missing")
		}
		ts.g.set(mp.g0)
		ts.tls = (*uint64)(unsafe.Pointer(&mp.tls[0]))
		ts.fn = unsafe.Pointer(funcPC(mstart))
		if msanenabled {
			msanwrite(unsafe.Pointer(&ts), unsafe.Sizeof(ts))
		}
		execLock.rlock() // Prevent process clone.
		asmcgocall(_cgo_thread_start, unsafe.Pointer(&ts))
		execLock.runlock()
		return
	}
	execLock.rlock() // Prevent process clone.
	newosproc(mp)    	   // create the OS thread; on Linux this is mainly the clone system call
	execLock.runlock()
}

func newosproc(mp *m) {
	stk := unsafe.Pointer(mp.g0.stack.hi)         // top of the g0 stack used by the new thread
	/*
	 * note: strace gets confused if we use CLONE_PTRACE here.
	 */
	if false {
		print("newosproc stk=", stk, " m=", mp, " g=", mp.g0, " clone=", funcPC(clone), " id=", mp.id, " ostk=", &mp, "\n")
	}

	// Disable signals during clone, so that the new thread starts
	// with signals disabled. It will enable them in minit.
	var oset sigset
	sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
	ret := clone(cloneFlags, stk, unsafe.Pointer(mp), unsafe.Pointer(mp.g0), unsafe.Pointer(funcPC(mstart)))               	// the system call creates the thread, sets up the g0 stack and starts running mstart on the new thread
	sigprocmask(_SIG_SETMASK, &oset, nil)

	if ret < 0 {
		print("runtime: failed to create new OS thread (have ", mcount(), " already; errno=", -ret, ")\n")
		if ret == -_EAGAIN {
			println("runtime: may need to increase max user processes (ulimit -u)")
		}
		throw("newosproc")
	}
}

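As the flow shows, a worker thread is created through a system call; once created, it starts executing from the mstart function and then enters the scheduler to run Gs. A new worker kernel thread may be requested by checks made around system calls, or by the monitor thread through the retake function.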

The schedule function

// One round of scheduler: find a runnable goroutine and execute it.
// Never returns.
func schedule() {
	_g_ := getg()                            

	if _g_.m.locks != 0 {
		throw("schedule: holding locks")
	}

	if _g_.m.lockedg != 0 {
		stoplockedm()
		execute(_g_.m.lockedg.ptr(), false) // Never returns.
	}

	// We should not schedule away from a g that is executing a cgo call,
	// since the cgo call is using the m's g0 stack.
	if _g_.m.incgo {
		throw("schedule: in cgo")
	}

top:
	if sched.gcwaiting != 0 {
		gcstopm()
		goto top
	}
	if _g_.m.p.ptr().runSafePointFn != 0 {
		runSafePointFn()
	}

	var gp *g
	var inheritTime bool
	if trace.enabled || trace.shutdown {
		gp = traceReader()
		if gp != nil {
			casgstatus(gp, _Gwaiting, _Grunnable)
			traceGoUnpark(gp, 0)
		}
	}
	if gp == nil && gcBlackenEnabled != 0 {
		gp = gcController.findRunnableGCWorker(_g_.m.p.ptr())   // run a GC worker if GC marking is active
	}
	if gp == nil {                                  
		// Check the global runnable queue once in a while to ensure fairness.
		// Otherwise two goroutines can completely occupy the local runqueue
		// by constantly respawning each other.
		if _g_.m.p.ptr().schedtick%61 == 0 && sched.runqsize > 0 {    // for fairness, every 61st scheduling tick check the global queue for a runnable G
			lock(&sched.lock)
			gp = globrunqget(_g_.m.p.ptr(), 1)             // take one G from the global queue
			unlock(&sched.lock)
		}
	}
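	if gp == nil {                                      // nothing was taken from the global queue, or it was not checked
		gp, inheritTime = runqget(_g_.m.p.ptr())        // take a G from the local queue of this P
		if gp != nil && _g_.m.spinning {
			throw("schedule: spinning with local work")    // a spinning M must not have local work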
		}
	}
	if gp == nil {
		gp, inheritTime = findrunnable() // blocks until work is available: looks elsewhere for a G and waits here until one is found
	}

	// This thread is going to run a goroutine and is not spinning anymore,
	// so if it was marked as spinning we need to reset it now and potentially
	// start a new spinning M.
	if _g_.m.spinning {
		resetspinning()
	}

	if sched.disable.user && !schedEnabled(gp) {
		// Scheduling of this goroutine is disabled. Put it on
		// the list of pending runnable goroutines for when we
		// re-enable user scheduling and look again.
		lock(&sched.lock)
		if schedEnabled(gp) {
			// Something re-enabled scheduling while we
			// were acquiring the lock.
			unlock(&sched.lock)
		} else {
			sched.disable.runnable.pushBack(gp)
			sched.disable.n++
			unlock(&sched.lock)
			goto top
		}
	}

	if gp.lockedm != 0 {
		// Hands off own p to the locked m,
		// then blocks waiting for a new p.
		startlockedm(gp)
		goto top
	}

	execute(gp, inheritTime)      // run the G that was found
}

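The scheduling function mainly works as follows:

On every 61st scheduling tick, look in the global G queue for a runnable G first.
If this is not a 61st tick, or the global queue yielded nothing, take a G from the local queue of the current P.
If the local queue is also empty, call findrunnable, which tries the global queue, the network poller and the queues of other Ps in turn; if nothing is found, the thread goes to sleep.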
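The lookup order above can be sketched as a small toy model. This is not the runtime's code: toyP, toySched, pick and drain are hypothetical names, Gs are represented by strings, and the findrunnable step is reduced to one more look at the global queue (the real function also polls the network and steals from other Ps).

```go
package main

import "fmt"

// toyP and toySched are hypothetical, heavily simplified stand-ins for the
// runtime's p and sched structures. A G is represented by its name.
type toyP struct {
	schedtick int      // incremented once per scheduled G
	local     []string // local run queue
}

type toySched struct {
	global []string // global run queue
}

// pop removes and returns the first element of a non-empty queue.
func pop(q *[]string) string {
	g := (*q)[0]
	*q = (*q)[1:]
	return g
}

// pick mirrors the lookup order of schedule(): on every 61st tick the global
// queue is tried first, then the local queue, then a simplified fallback.
func (s *toySched) pick(p *toyP) (string, bool) {
	if p.schedtick%61 == 0 && len(s.global) > 0 {
		return pop(&s.global), true
	}
	if len(p.local) > 0 {
		return pop(&p.local), true
	}
	if len(s.global) > 0 { // stand-in for findrunnable
		return pop(&s.global), true
	}
	return "", false
}

// drain schedules Gs until no work is left and returns the order in which
// they were picked.
func drain(s *toySched, p *toyP) []string {
	var order []string
	for {
		g, ok := s.pick(p)
		if !ok {
			return order
		}
		order = append(order, g)
		p.schedtick++
	}
}

func main() {
	p := &toyP{schedtick: 60, local: []string{"a", "b"}}
	s := &toySched{global: []string{"x"}}
	fmt.Println(drain(s, p)) // prints [a x b]
}
```

With schedtick starting at 60, the first pick comes from the local queue, the second lands on tick 61 and takes x from the global queue ahead of b, which shows how the periodic check keeps the global queue from starving.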

Executing a G

If the previous step found a runnable G, execute(gp, inheritTime) is called to run it.

Normal execution flow of a G

func execute(gp *g, inheritTime bool) {
	_g_ := getg()

	casgstatus(gp, _Grunnable, _Grunning)         // switch the G from runnable to running
	gp.waitsince = 0
	gp.preempt = false                            // clear the preemption request flag
	gp.stackguard0 = gp.stack.lo + _StackGuard     // reset the stack guard
	if !inheritTime {
		_g_.m.p.ptr().schedtick++
	}
	_g_.m.curg = gp
	gp.m = _g_.m

	// Check whether the profiler needs to be turned on or off.
	hz := sched.profilehz
	if _g_.m.profilehz != hz {
		setThreadCPUProfiler(hz)
	}

	if trace.enabled {
		// GoSysExit has to happen when we have a P, but before GoStart.
		// So we emit it here.
		if gp.syscallsp != 0 && gp.sysblocktraced {
			traceGoSysExit(gp.sysexitticks)
		}
		traceGoStart()
	}

	gogo(&gp.sched)             // switch to the G and run it
}

After these checks and flag updates, execute calls gogo:

TEXT runtime·gogo(SB), NOSPLIT, $16-8
	MOVQ	buf+0(FP), BX		// gobuf
	MOVQ	gobuf_g(BX), DX
	MOVQ	0(DX), CX		// make sure g != nil
	get_tls(CX)
	MOVQ	DX, g(CX)
	MOVQ	gobuf_sp(BX), SP	// restore SP: restore the context saved in gobuf
	MOVQ	gobuf_ret(BX), AX
	MOVQ	gobuf_ctxt(BX), DX
	MOVQ	gobuf_bp(BX), BP
	MOVQ	$0, gobuf_sp(BX)	// clear to help garbage collector
	MOVQ	$0, gobuf_ret(BX)
	MOVQ	$0, gobuf_ctxt(BX)
	MOVQ	$0, gobuf_bp(BX)
	MOVQ	gobuf_pc(BX), BX       // load the address to resume at into BX
	JMP	BX                       // jump to that address

Going back to how newproc1 creates a G: the address at which execution continues after the G's function finishes is set to the goexit function.

	newg.sched.pc = funcPC(goexit) + sys.PCQuantum // +PCQuantum so that previous instruction is in same function
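One way to observe this from user code is to walk a goroutine's stack and look at the outermost frame. The sketch below uses the public runtime.Callers and runtime.CallersFrames APIs; the helper name bottomFrame is hypothetical, and the expectation that the outermost frame is runtime.goexit is an assumption based on the code shown here rather than a documented guarantee.

```go
package main

import (
	"fmt"
	"runtime"
)

// bottomFrame returns the name of the outermost function on the calling
// goroutine's stack, as reported by runtime.Callers.
func bottomFrame() string {
	pcs := make([]uintptr, 64)
	n := runtime.Callers(0, pcs)
	frames := runtime.CallersFrames(pcs[:n])
	last := ""
	for {
		f, more := frames.Next()
		if f.Function != "" {
			last = f.Function
		}
		if !more {
			break
		}
	}
	return last
}

func main() {
	ch := make(chan string, 1)
	go func() { ch <- bottomFrame() }()
	// Expected to name the runtime's goexit function, since every
	// goroutine is set up to return into it.
	fmt.Println(<-ch)
}
```

The same frame usually appears as the last entry of a goroutine's stack trace in a panic report.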

Now look at what goexit does:

// The top-most function running on a goroutine
// returns to goexit+PCQuantum.
TEXT runtime·goexit(SB),NOSPLIT,$0-0
	BYTE	$0x90	// NOP
	CALL	runtime·goexit1(SB)	// does not return; calls goexit1
	// traceback from goexit1 must hit code range of goexit
	BYTE	$0x90	// NOP

func goexit1() {
	if raceenabled {
		racegoend()
	}
	if trace.enabled {
		traceGoEnd()
	}
	mcall(goexit0)  // switch to g0 and release the finished g
}

TEXT runtime·mcall(SB), NOSPLIT, $0-8
	MOVQ	fn+0(FP), DI

	get_tls(CX)
	MOVQ	g(CX), AX	// save state in g->sched
	MOVQ	0(SP), BX	// caller's PC
	MOVQ	BX, (g_sched+gobuf_pc)(AX)
	LEAQ	fn+0(FP), BX	// caller's SP
	MOVQ	BX, (g_sched+gobuf_sp)(AX)
	MOVQ	AX, (g_sched+gobuf_g)(AX)
	MOVQ	BP, (g_sched+gobuf_bp)(AX)

	// switch to m->g0 & its stack, call fn (stack switch)
	MOVQ	g(CX), BX
	MOVQ	g_m(BX), BX
	MOVQ	m_g0(BX), SI
	CMPQ	SI, AX	// if g == m->g0 call badmcall
	JNE	3(PC)
	MOVQ	$runtime·badmcall(SB), AX
	JMP	AX
	MOVQ	SI, g(CX)	// g = m->g0
	MOVQ	(g_sched+gobuf_sp)(SI), SP	// sp = m->g0->sched.sp (switch to the stack pointer saved in g0's sched.sp)
	PUSHQ	AX
	MOVQ	DI, DX
	MOVQ	0(DI), DI 
	CALL	DI                                    // call fn
	POPQ	AX
	MOVQ	$runtime·badmcall2(SB), AX
	JMP	AX
	RET


// goexit continuation on g0.
func goexit0(gp *g) {
	_g_ := getg()

	casgstatus(gp, _Grunning, _Gdead)        // mark the G as dead (finished)
	if isSystemGoroutine(gp, false) {
		atomic.Xadd(&sched.ngsys, -1)
	}
	gp.m = nil                              // detach the G from its m
	locked := gp.lockedm != 0               // remember whether the G was locked to a thread before clearing the fields
	gp.lockedm = 0
	_g_.m.lockedg = 0
	gp.paniconfault = false
	gp._defer = nil // should be true already but just in case.
	gp._panic = nil // non-nil for Goexit during panic. points at stack-allocated data.
	gp.writebuf = nil
	gp.waitreason = 0
	gp.param = nil
	gp.labels = nil
	gp.timer = nil

	if gcBlackenEnabled != 0 && gp.gcAssistBytes > 0 {
		// Flush assist credit to the global pool. This gives
		// better information to pacing if the application is
		// rapidly creating an exiting goroutines.
		scanCredit := int64(gcController.assistWorkPerByte * float64(gp.gcAssistBytes))
		atomic.Xaddint64(&gcController.bgScanCredit, scanCredit)
		gp.gcAssistBytes = 0
	}

	// Note that gp's stack scan is now "valid" because it has no
	// stack.
	gp.gcscanvalid = true
	dropg()                       // remove the association between this G and the M

	if GOARCH == "wasm" { // no threads yet on wasm
		gfput(_g_.m.p.ptr(), gp)
		schedule() // never returns
	}

	if _g_.m.lockedInt != 0 {
		print("invalid m->lockedInt = ", _g_.m.lockedInt, "\n")
		throw("internal lockOSThread error")
	}
	gfput(_g_.m.p.ptr(), gp)      // put the G on the free list for reuse
	if locked {
		// The goroutine may have locked this thread because
		// it put it in an unusual kernel state. Kill it
		// rather than returning it to the thread pool.

		// Return to mstart, which will release the P and exit
		// the thread.
		if GOOS != "plan9" { // See golang.org/issue/22227.
			gogo(&_g_.m.g0.sched)
		} else {
			// Clear lockedExt on plan9 since we may end up re-using
			// this thread.
			_g_.m.lockedExt = 0
		}
	}
	schedule()                    // schedule again
}

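This completes the normal execution of a G. Based on the code above, the function call chain is: schedule -> execute -> gogo -> the G's own function -> goexit -> goexit1 -> mcall(goexit0) -> goexit0 -> schedule.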
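The same exit path can also be entered explicitly from user code. As far as I know, the public runtime.Goexit function runs the goroutine's deferred calls and then ends up in goexit1, so the sketch below (with a hypothetical helper named runAndExit) shows the visible effect: deferred calls run, and the statement after Goexit is never reached.

```go
package main

import (
	"fmt"
	"runtime"
)

// runAndExit starts a goroutine that terminates itself with runtime.Goexit
// and returns the steps that were actually reached.
func runAndExit() []string {
	var steps []string
	done := make(chan struct{})
	go func() {
		defer close(done) // runs last and signals completion
		defer func() { steps = append(steps, "deferred") }()
		steps = append(steps, "before")
		runtime.Goexit()               // terminates this goroutine only
		steps = append(steps, "after") // never reached
	}()
	<-done
	return steps
}

func main() {
	fmt.Println(runAndExit()) // prints [before deferred]
}
```

Only the calling goroutine ends; the M that ran it goes back into schedule and picks the next G, as in the chain above.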

Summary

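This article only gives a brief overview of some basic scenarios in Go and then walks through how a G is scheduled and executed. Many details are left out: it simply traces the creation and normal execution flow of a G, and the concrete scheduling policies and their implementation need further study. My knowledge is limited, so corrections are welcome if anything here is wrong.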
