A deep dive into golang's netpoll wrapper

I/O models

Computers support several I/O models; the most widely used today are non-blocking I/O, select, and epoll.

The right approach is to match the I/O model to the scenario at hand.

For details, see my earlier post on the evolution of I/O models.

Network I/O in golang

golang is a natural fit for concurrency. Why? First, its lightweight goroutines; second, it abstracts away the complexity of I/O and simplifies the whole flow.

For example, hitting an HTTP service takes only a few lines:

client := &http.Client{}
url := "http://dummy.faketld/" // placeholder host from the original example
resp, err := client.Get(url)
if err == nil {
	defer resp.Body.Close()
}

So what has golang done to optimize I/O under the hood to make it all look this simple?

Goroutine scheduling of I/O events

From here on we assume you already have some familiarity with goroutine scheduling.

We know that in Go's scheduler each M (machine, an OS thread) is bound to a P (a virtual processor), and every M carries a special goroutine, g0; g0 looks for a runnable g in the P's local run queue, or falls back to the global queue.

We also know that a running g hands execution back to g0 so the scheduler can pick the next g. So when a g hits an I/O event, how does it hand control back to g0? This is where today's protagonist comes in: netpoll.

netpoll

Go handles I/O in its network poller with an I/O multiplexing model, but it did not pick the most common system call, select. select can also provide I/O multiplexing, yet using it carries several limitations (a sketch of the first two follows this list):

  • Limited capacity — it can watch at most 1024 file descriptors; the limit can be raised by hand, but the costs are considerable;
  • Expensive memory copies — it keeps the watched descriptors in a fairly large structure that must be copied into the kernel on every call;
  • Time complexity — it returns only the number of ready events, so the caller still has to scan all the file descriptors;
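
To make the first two costs concrete, here is a minimal sketch using the golang.org/x/sys/unix bindings on Linux (the fd value is illustrative): unix.FdSet is a fixed 1024-bit bitmap, so descriptors numbered 1024 or higher cannot be watched at all.

package main

import "golang.org/x/sys/unix"

func main() {
	var rset unix.FdSet
	rset.Set(0) // watch stdin; Set would panic (index out of range) for fd >= 1024

	tv := unix.Timeval{Sec: 1}
	// the entire FdSet is copied into the kernel on each call, and select only
	// tells us *how many* fds are ready, so we must rescan with IsSet ourselves
	n, err := unix.Select(1, &rset, nil, nil, &tv)
	if err == nil && n > 0 && rset.IsSet(0) {
		// stdin is readable
	}
}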

The golang runtime wraps network event polling in one platform-independent poller, with dedicated implementations for epoll/kqueue/port/AIX/Windows:

  • src/runtime/netpoll_epoll.go
  • src/runtime/netpoll_kqueue.go
  • src/runtime/netpoll_solaris.go
  • src/runtime/netpoll_windows.go
  • src/runtime/netpoll_aix.go
  • src/runtime/netpoll_fake.go

These files implement the same functionality on different platforms, forming a tree of alternatives; when compiling a Go program, the toolchain picks the branch of the tree that matches the target platform, as sketched below.
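
The selection mechanism is just build constraints. A hypothetical user-level analogue with two files, mypoll_linux.go and mypoll_darwin.go, shows the idea; only the file whose constraint matches the target is compiled:

// mypoll_linux.go
//go:build linux

package mypoll

func implName() string { return "epoll" }

// mypoll_darwin.go
//go:build darwin

package mypoll

func implName() string { return "kqueue" }

In fact, the _linux/_darwin filename suffixes alone already imply the constraint; the //go:build lines are shown for emphasis.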

The functions every implementation must provide are:

netpollinit initializes the network poller; sync.Once together with the netpollInited variable guarantees it is only called once.

netpollopen arms edge-triggered notifications on a file descriptor. It is invoked from poll_runtime_pollOpen, which registers the goroutine's pollDesc with the kernel's epoll instance, tying the user-space wait state to kernel-side readiness events.

netpoll polls the network and returns a batch of Goroutines that are now ready to run; the delta argument decides its behavior (a standalone sketch follows the quoted doc comment below):

  - if delta is negative, block indefinitely until a descriptor becomes ready
  - if delta is zero, poll without blocking
  - if delta is positive, block for at most delta nanoseconds

netpollBreak wakes up the network poller; for example, the timer code calls it to interrupt a blocked poll when a timer is moved earlier.

netpollIsPollDescriptor reports whether a file descriptor is used by the poller itself.

The runtime's own doc comment reads:

// Integrated network poller (platform-independent part).
// A particular implementation (epoll/kqueue/port/AIX/Windows)
// must define the following functions:
//
// func netpollinit()
//     Initialize the poller. Only called once.
//
// func netpollopen(fd uintptr, pd *pollDesc) int32
//     Arm edge-triggered notifications for fd. The pd argument is to pass
//     back to netpollready when fd is ready. Return an errno value.
//
// func netpoll(delta int64) gList
//     Poll the network. If delta < 0, block indefinitely. If delta == 0,
//     poll without blocking. If delta > 0, block for up to delta nanoseconds.
//     Return a list of goroutines built by calling netpollready.
//
// func netpollBreak()
//     Wake up the network poller, assumed to be blocked in netpoll.
//
// func netpollIsPollDescriptor(fd uintptr) bool
//     Reports whether fd is a file descriptor used by the poller.

// Error codes returned by runtime_pollReset and runtime_pollWait.
// These must match the values in internal/poll/fd_poll_runtime.go.
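
To make the delta semantics concrete, here is a standalone sketch (not runtime code; the function name and parameters are made up for illustration) of how a scheduler-like caller would pick the argument:

// pollDelta maps scheduler state to the delta passed to netpoll.
func pollDelta(now, nextTimer int64, haveRunnableG bool) int64 {
	if haveRunnableG {
		return 0 // other goroutines can run: poll without blocking
	}
	if nextTimer == 0 {
		return -1 // no pending timers: block until some fd becomes ready
	}
	if d := nextTimer - now; d > 0 {
		return d // block, but wake up in time for the next timer
	}
	return 0 // the next timer is already due: don't block
}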

There are two important structs in netpoll:

//pollCache  
//pollDesc

type pollDesc struct {
	link *pollDesc // in pollcache, protected by pollcache.lock

	// The lock protects pollOpen, pollSetDeadline, pollUnblock and deadlineimpl operations.
	// This fully covers seq, rt and wt variables. fd is constant throughout the PollDesc lifetime.
	// pollReset, pollWait, pollWaitCanceled and runtime·netpollready (IO readiness notification)
	// proceed w/o taking the lock. So closing, everr, rg, rd, wg and wd are manipulated
	// in a lock-free way by all operations.
	// NOTE(dvyukov): the following code uses uintptr to store *g (rg/wg),
	// that will blow up when GC starts moving objects.
	lock    mutex // protects the following fields
	fd      uintptr
	closing bool
	everr   bool      // marks event scanning error happened
	user    uint32    // user settable cookie
	rseq    uintptr   // protects from stale read timers
	rg      uintptr   // pdReady, pdWait, G waiting for read or nil
	rt      timer     // read deadline timer (set if rt.f != nil)
	rd      int64     // read deadline
	wseq    uintptr   // protects from stale write timers
	wg      uintptr   // pdReady, pdWait, G waiting for write or nil
	wt      timer     // write deadline timer
	wd      int64     // write deadline
	self    *pollDesc // storage for indirect interface. See (*pollDesc).makeArg.
}

type pollCache struct {
	lock  mutex
	first *pollDesc
	// PollDesc objects must be type-stable,
	// because we can get ready notification from epoll/kqueue
	// after the descriptor is closed/reused.
	// Stale notifications are detected using seq variable,
	// seq is incremented when deadlines are changed or descriptor is reused.
}

  • rseq and wseq — detect stale timers after the file descriptor is reused or a timer is reset;
  • rg and wg — binary semaphores; each holds pdReady, pdWait, the Goroutine waiting for the fd to become readable/writable, or nil;
  • rd and wd — the deadlines for the fd becoming readable/writable;
  • rt and wt — the timers enforcing those deadlines;
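
On the user side these fields are driven by ordinary deadline calls. A minimal example (the address is illustrative):

package main

import (
	"log"
	"net"
	"time"
)

func main() {
	conn, err := net.Dial("tcp", "example.com:80")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// flows through runtime_pollSetDeadline into pollDesc.rd and arms rt
	if err := conn.SetReadDeadline(time.Now().Add(2 * time.Second)); err != nil {
		log.Fatal(err)
	}

	buf := make([]byte, 512)
	if _, err := conn.Read(buf); err != nil {
		log.Println("read:", err) // prints an i/o timeout once rt fires
	}
}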

golang funnels its I/O event handling through a uniform layer under runtime/netpoll (reached via the internal/poll package):

One detail about internal: packages under an internal directory cannot be imported from outside their subtree.

func runtime_pollServerInit()  // initialize the poller
func runtime_pollOpen(fd uintptr) (uintptr, int)  // register an fd
func runtime_pollClose(ctx uintptr)   // unregister it
func runtime_pollWait(ctx uintptr, mode int) int // wait for readiness
func runtime_pollWaitCanceled(ctx uintptr, mode int) int  // wait for a canceled I/O to complete
func runtime_pollReset(ctx uintptr, mode int) int  // reset state for reuse
func runtime_pollSetDeadline(ctx uintptr, d int64, mode int) // set the read/write deadline
func runtime_pollUnblock(ctx uintptr)  // unblock all waiters
func runtime_isPollServerDescriptor(fd uintptr) bool
// ctx here is really the poll descriptor of an I/O fd, not a context.Context
// mode is 'r' or 'w'; I/O events come in only those two kinds
// d is a nanosecond deadline, much like a time.Duration
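
On the internal/poll side the wrappers are thin; paraphrased from internal/poll/fd_poll_runtime.go (details vary across Go versions):

// runtimeCtx is the uintptr handed back by runtime_pollOpen.
func (pd *pollDesc) wait(mode int, isFile bool) error {
	if pd.runtimeCtx == 0 {
		return errors.New("waiting for unsupported file type")
	}
	res := runtime_pollWait(pd.runtimeCtx, mode)
	return convertErr(res, isFile)
}

func (pd *pollDesc) waitRead(isFile bool) error  { return pd.wait('r', isFile) }
func (pd *pollDesc) waitWrite(isFile bool) error { return pd.wait('w', isFile) }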

These declarations are all actually implemented down in runtime; let's look at a few of the important ones:

// push the goroutines whose I/O is now ready onto the run list
// netpollready is called by the platform-specific netpoll function.
// It declares that the fd associated with pd is ready for I/O.
// The toRun argument is used to build a list of goroutines to return
// from netpoll. The mode argument is 'r', 'w', or 'r'+'w' to indicate
// whether the fd is ready for reading or writing or both.
//
// This may run while the world is stopped, so write barriers are not allowed.
//go:nowritebarrier
func netpollready(toRun *gList, pd *pollDesc, mode int32) {
	var rg, wg *g
	if mode == 'r' || mode == 'r'+'w' {
		rg = netpollunblock(pd, 'r', true)
	}
	if mode == 'w' || mode == 'r'+'w' {
		wg = netpollunblock(pd, 'w', true)
	}
	if rg != nil {
		toRun.push(rg)
	}
	if wg != nil {
		toRun.push(wg)
	}
}
// called during polling: returns true if the I/O is ready, false if it timed out or the fd was closed
// returns true if IO is ready, or false if timedout or closed
// waitio - wait only for completed IO, ignore errors
func netpollblock(pd *pollDesc, mode int32, waitio bool) bool {
	gpp := &pd.rg
	if mode == 'w' {
		gpp = &pd.wg
	}

	// set the gpp semaphore to pdWait
	for {
		old := *gpp
		if old == pdReady {
			*gpp = 0
			return true
		}
		if old != 0 {
			throw("runtime: double wait")
		}
		if atomic.Casuintptr(gpp, 0, pdWait) {
			break
		}
	}

	// need to recheck error states after setting gpp to pdWait
	// this is necessary because runtime_pollUnblock/runtime_pollSetDeadline/deadlineimpl
	// do the opposite: store to closing/rd/wd, membarrier, load of rg/wg
	if waitio || netpollcheckerr(pd, mode) == 0 {
		// gopark is the key call here: it yields this goroutine's execution right,
		// handing control back to g0 so the scheduler can run another g
		gopark(netpollblockcommit, unsafe.Pointer(gpp), waitReasonIOWait, traceEvGoBlockNet, 5)
	}
	// be careful to not lose concurrent pdReady notification
	old := atomic.Xchguintptr(gpp, 0)
	if old > pdWait {
		throw("runtime: corrupted polldesc")
	}
	return old == pdReady
}

// fetch the goroutine parked on this I/O; returns nil if there is nothing to wake
func netpollunblock(pd *pollDesc, mode int32, ioready bool) *g {
	gpp := &pd.rg
	if mode == 'w' {
		gpp = &pd.wg
	}

	for {
		old := *gpp
		if old == pdReady {
			return nil
		}
		if old == 0 && !ioready {
			// Only set pdReady for ioready. runtime_pollWait
			// will check for timeout/cancel before waiting.
			return nil
		}
		var new uintptr
		if ioready {
			new = pdReady
		}
		if atomic.Casuintptr(gpp, old, new) {
			if old == pdWait {
				old = 0
			}
			return (*g)(unsafe.Pointer(old))
		}
	}
}
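
To see the rg/wg protocol in isolation, here is a hedged user-space analogue (Go 1.19+ for atomic.Uintptr): the real runtime stores the parked *g pointer in rg/wg and uses gopark/goready, while this sketch substitutes a channel, but the CAS state machine over the nil/pdWait/pdReady states has the same shape:

package main

import (
	"fmt"
	"sync/atomic"
)

const (
	pdNil   uintptr = 0
	pdReady uintptr = 1
	pdWait  uintptr = 2
)

// block mimics netpollblock: publish that we are waiting, then park.
func block(sem *atomic.Uintptr, wake chan struct{}) {
	for {
		if sem.CompareAndSwap(pdReady, pdNil) {
			return // a notification was already pending: consume it
		}
		if sem.CompareAndSwap(pdNil, pdWait) {
			break // we are now the registered waiter
		}
	}
	<-wake           // stand-in for gopark
	sem.Store(pdNil) // consume the pdReady left behind by unblock
}

// unblock mimics netpollunblock: mark ready, wake the parked waiter if any.
func unblock(sem *atomic.Uintptr, wake chan struct{}) {
	if sem.Swap(pdReady) == pdWait {
		wake <- struct{}{} // stand-in for goready
	}
}

func main() {
	var sem atomic.Uintptr
	wake := make(chan struct{})
	done := make(chan struct{})
	go func() {
		block(&sem, wake) // "g issues a read, fd not ready yet"
		fmt.Println("woken: fd is ready")
		close(done)
	}()
	unblock(&sem, wake) // "epoll reports readiness"
	<-done
}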

Food for thought:

  1. Goroutines a and b: b blocks on I/O; the I/O completes, but b never gets scheduled again. What happens?
  2. Goroutines a and b: b blocks on I/O with a 2s timeout, but a monopolizes execution and b is only scheduled 5s later. From the caller's point of view b has long since timed out; is this reported as a timeout or not?

The takeaway: a configured timeout does not necessarily measure real I/O waiting; part of it can be time spent waiting for the right to execute.

How are read events triggered?

Writing is something we initiate ourselves, but reading is passive, so how does it get triggered?

Let's first look at a struct: in golang, all network events and file reads/writes are identified by an FD (it lives under the internal/poll package).

// FD is a file descriptor. The net and os packages use this type as a
// field of a larger type representing a network connection or OS file.
type FD struct {
	// Lock sysfd and serialize access to Read and Write methods.
	fdmu fdMutex

	// System file descriptor. Immutable until Close.
	Sysfd int

	// I/O poller.
	pd pollDesc

	// Writev cache.
	iovecs *[]syscall.Iovec

	// Semaphore signaled when file is closed.
	csema uint32

	// Non-zero if this file has been set to blocking mode.
	isBlocking uint32

	// Whether this is a streaming descriptor, as opposed to a
	// packet-based descriptor like a UDP socket. Immutable.
	IsStream bool

	// Whether a zero byte read indicates EOF. This is false for a
	// message based socket connection.
	ZeroReadIsEOF bool

	// Whether this is a file rather than a network socket.
	isFile bool
}

As you can see, the FD embeds a pollDesc, and it is through this pollDesc that the per-platform I/O event implementations inside the runtime package are reached. The read path is sketched below.
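
// net.Conn.Read
//   -> internal/poll.FD.Read        (the loop excerpted below)
//      -> syscall.Read              (returns EAGAIN while not ready)
//      -> pollDesc.waitRead
//         -> runtime_pollWait -> netpollblock -> gopark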

When we perform a read (code excerpt below):

for {
	n, err := ignoringEINTRIO(syscall.Read, fd.Sysfd, p)
	if err != nil {
		n = 0
		if err == syscall.EAGAIN && fd.pd.pollable() {
			if err = fd.pd.waitRead(fd.isFile); err == nil {
				continue
			}
		}
	}
	err = fd.eofError(n, err)
	return n, err
}

the read blocks in the waitRead method, which internally boils down to a call to runtime_pollWait:

func poll_runtime_pollWait(pd *pollDesc, mode int) int {
	errcode := netpollcheckerr(pd, int32(mode))
	if errcode != pollNoError {
		return errcode
	}
	// As for now only Solaris, illumos, and AIX use level-triggered IO.
	if GOOS == "solaris" || GOOS == "illumos" || GOOS == "aix" {
		netpollarm(pd, mode)
	}
	for !netpollblock(pd, int32(mode), false) {
		errcode = netpollcheckerr(pd, int32(mode))
		if errcode != pollNoError {
			return errcode
		}
		// Can happen if timeout has fired and unblocked us,
		// but before we had a chance to run, timeout has been reset.
		// Pretend it has not happened and retry.
	}
	return pollNoError
}

The control sits in netpollblock, which we walked through above: if the I/O is not yet ready, the goroutine simply gives up its execution right; otherwise the fd is already readable/writable and the read proceeds directly.

Summary

Overall flow: listenStream -> bind & listen & init -> pollDesc.Init -> poll_runtime_pollOpen -> runtime.netpollopen -> epollctl(EPOLL_CTL_ADD)
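
That flow is kicked off by nothing more exotic than ordinary listener code (the port number is illustrative):

ln, err := net.Listen("tcp", ":8080") // -> listenStream -> ... -> epollctl(EPOLL_CTL_ADD)
if err != nil {
	log.Fatal(err)
}
defer ln.Close()
for {
	conn, err := ln.Accept() // Accept parks and wakes through the same pollDesc machinery
	if err != nil {
		log.Fatal(err)
	}
	go func(c net.Conn) { defer c.Close() }(conn) // handle the connection
}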

pollDesc objects are maintained by pollCache, in memory that is not under GC control:

// Must be in non-GC memory because can be referenced
// only from epoll/kqueue internals.
mem := persistentalloc(n*pdSize, 0, &memstats.other_sys)
for i := uintptr(0); i < n; i++ {
	pd := (*pollDesc)(add(mem, i*pdSize))
	pd.link = c.first
	c.first = pd
}

When golang encounters an I/O event it handles it uniformly: first register the event with the OS (this article focuses on epoll), then yield the CPU via gopark and let the scheduler run other g's. When scheduling comes back around to the g blocked on I/O, it checks with epoll (the ready list) whether the fd is ready: if so, execution continues; if not, the g yields again.

Notes

epoll is maintained by the operating-system kernel; the Go runtime does not implement it, it only drives it.

FD_CLOEXEC sets the close-on-exec flag on a file descriptor. This one, hmm, can be hard to wrap your head around: it simply means the descriptor is closed automatically when the process execs a new program, so it never leaks into the child.
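
Go already sets this flag on descriptors it creates (on Linux, e.g. via accept4 with SOCK_CLOEXEC), but you can also set it by hand:

f, err := os.Open("/etc/hosts")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
syscall.CloseOnExec(int(f.Fd())) // equivalent to fcntl(fd, F_SETFD, FD_CLOEXEC)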
