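Preface

This article first explains how goroutines work; once the mechanics are clear, goroutines stop looking mysterious. It then shows how to use goroutines, and finally walks through some common Go concurrency patterns.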

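How goroutines are implemented

Go has support for concurrency built directly into the language.

Go's runtime scheduler manages every goroutine that needs processor time. The scheduler binds logical processors to operating system threads in order to run goroutines, and it decides which logical processor each goroutine runs on.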

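Resources contained in an operating system process

A thread is the smallest unit of execution at the operating system level, and every process has at least one thread (the main thread).

The operating system schedules threads to run against processors regardless of the process they belong to. The scheduling algorithm differs between operating systems and changes over time, but this is transparent to the programmer.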

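Threads are scheduled on physical cores, while goroutines are scheduled on logical processors. Each logical processor is bound to its own operating system thread. As of Go 1.5, the default is to allocate one logical processor per available physical processor; before that, the default was a single logical processor. These logical processors execute all goroutines. Even a single logical processor can run thousands upon thousands of goroutines efficiently.

When goroutines are created they are placed in the scheduler's global run queue. Soon afterwards they are assigned to a logical processor and moved to that processor's local run queue, where each goroutine waits for the logical processor to run it.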

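In Figure 1, G4 is running and using processor time, while G5 through G7 wait in the queue. The key point is that whenever a goroutine makes a blocking syscall, the scheduler detaches the thread (M2) together with the goroutine (G4) from the logical processor, creates a new thread (M3) to serve that processor, and picks another goroutine from the local run queue to run. The old thread (M2) waits for the blocking syscall to return. Once it returns, the goroutine is put back into a local run queue and the thread is kept for later reuse.

Network I/O is handled a little differently. In that case the goroutine is detached from the logical processor and moved to the runtime integrated network poller. Once the poller indicates that a read or write operation is ready, the goroutine is assigned back to a logical processor to handle it.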

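There is no strict built-in limit on the number of logical processors, but by default the Go runtime limits each program to at most 10,000 threads. The limit can be changed with SetMaxThreads in the runtime/debug package.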
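
As a small illustration of the thread limit mentioned above, the sketch below changes it through debug.SetMaxThreads, which returns the previous setting. The helper name setThreadLimit and the value 20000 are made up for this example; most programs never need to touch this limit.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// setThreadLimit (a hypothetical helper) changes the maximum number of
// operating system threads the program may use and returns the limit
// that was in effect before the call.
func setThreadLimit(n int) int {
	return debug.SetMaxThreads(n)
}

func main() {
	// 20000 is an arbitrary value chosen for the example.
	prev := setThreadLimit(20000)
	fmt.Println("previous thread limit:", prev)
}
```

If the program ever tries to use more threads than the limit allows, the runtime crashes it, so raising the limit is usually a sign that something else needs fixing.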

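When there are several logical processors, the scheduler spreads goroutines evenly across them, so goroutines end up running on different threads. (This appears to need locking, which may be why performance can drop with multiple logical processors.) Note that this is independent of the number of physical cores you have.

GOMAXPROCS changes the number of logical processors the scheduler uses.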
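
A minimal sketch of reading and changing this setting from code is shown here. runtime.GOMAXPROCS(n) sets the value and returns the previous one, and passing 0 only queries it. The helper names currentProcs and setProcs are invented for the example.

```go
package main

import (
	"fmt"
	"runtime"
)

// currentProcs reports the number of logical processors in use.
// Passing 0 to runtime.GOMAXPROCS queries the value without changing it.
func currentProcs() int {
	return runtime.GOMAXPROCS(0)
}

// setProcs changes the number of logical processors and returns the
// previous setting.
func setProcs(n int) int {
	return runtime.GOMAXPROCS(n)
}

func main() {
	fmt.Println("CPU cores:", runtime.NumCPU())
	fmt.Println("logical processors:", currentProcs())

	// Restrict the scheduler to one logical processor.
	prev := setProcs(1)
	fmt.Println("was", prev, "now", currentProcs())
}
```

The same value can also be supplied from outside the program through the GOMAXPROCS environment variable.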

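Some APIs in the sync package can park and reschedule goroutines; they rely on algorithms inside the scheduler. The scheduler also tries to keep a single goroutine from holding a processor for too long. (On older runtimes, a goroutine doing CPU-intensive work with no channel synchronization could monopolize its logical processor and starve the other goroutines on it; since Go 1.14 the runtime can preempt such goroutines asynchronously.)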
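
To make the role of the sync package more concrete, here is a small sketch in which sync.WaitGroup parks the calling goroutine until the workers finish and sync.Mutex protects a shared total. The function name sumConcurrently is an assumption of this example, not something from the original text.

```go
package main

import (
	"fmt"
	"sync"
)

// sumConcurrently adds the numbers 1..n, using one goroutine per number.
// wg.Wait blocks the caller until every goroutine has called wg.Done,
// and the mutex serializes updates to total.
func sumConcurrently(n int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for i := 1; i <= n; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			mu.Lock()
			total += v
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(sumConcurrently(100)) // prints 5050
}
```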

The figure below shows the difference between concurrency and parallelism.

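Using goroutines

go blocks

Using go is simple, as shown below. The trailing () matters: func() {...} without it is only a function literal, while adding () turns it into a call (here with no arguments). Leave it out and the compiler complains that the expression in a go statement must be a function call.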

go func() {
  for _,n := range nums {
    out <- n
  }
  close(out)
}()
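
The fragment above leaves nums and out undefined. A runnable sketch under a few assumptions (the helper names gen and collect are made up here) could look like this:

```go
package main

import "fmt"

// gen starts a goroutine that sends each number on a new channel and
// closes the channel when it is done.
func gen(nums ...int) <-chan int {
	out := make(chan int)
	go func() {
		for _, n := range nums {
			out <- n
		}
		close(out)
	}()
	return out
}

// collect receives values until the channel is closed.
func collect(in <-chan int) []int {
	var got []int
	for n := range in {
		got = append(got, n)
	}
	return got
}

func main() {
	fmt.Println(collect(gen(2, 3, 5))) // prints [2 3 5]
}
```

Closing the channel in the sending goroutine is what lets the range loop in collect terminate.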

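channel

Channel operations block by default: a send blocks when the channel is full and a receive blocks when the channel is empty. This property can be used to synchronize the sending and receiving sides.

channel <- sends a new value into the channel, and <-channel receives a value from it. The receive has two aspects. It yields a result that can be used in an assignment, as in msg := <-channel. It also waits until something is sent on the channel. So fmt.Print(<-channel), written only to print the value coming from the channel, will also block until a value arrives.

By default, sends and receives block until both the sender and the receiver are ready.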

func main() {
    messages := make(chan string)
    go func() { messages <- "ping" }()
    msg := <-messages
    fmt.Println(msg)
}

Written like this, however, the program never reaches the print; it deadlocks:

func main() {
    messages := make(chan string)
    messages <- "ping"
    msg := <-messages
    fmt.Println(msg)
}

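So be careful whenever a goroutine sends on a channel, as in messages <- "msg", because it is easy to deadlock. (Two ways out: 1. use a buffered channel; 2. send inside a select that has a default case.)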

select {
case messages <- "msg":
    fmt.Println("sent message")
default:
    fmt.Println("no message sent")
}
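
The select above can be wrapped in a small helper to see both outcomes. This is only a sketch; trySend is a name invented for the example.

```go
package main

import "fmt"

// trySend attempts a send without blocking. It reports whether the
// value was accepted by the channel.
func trySend(ch chan<- string, msg string) bool {
	select {
	case ch <- msg:
		return true
	default:
		return false
	}
}

func main() {
	// No receiver is waiting on the unbuffered channel, so the send
	// cannot proceed and the default case runs.
	unbuffered := make(chan string)
	fmt.Println(trySend(unbuffered, "msg")) // prints false

	// A buffered channel accepts values until its buffer is full.
	buffered := make(chan string, 1)
	fmt.Println(trySend(buffered, "msg")) // prints true
	fmt.Println(trySend(buffered, "msg")) // prints false
}
```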

range

Ranging over a channel blocks. The program below reports a deadlock; uncomment the three lines to fix it.

queue := make(chan string, 2)
//queue <- "one"
//queue <- "two"
//close(queue)
for elem := range queue {
  fmt.Println(elem)
}

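Channel buffering

With a buffer, a send such as message <- "ping" completes as soon as "ping" has been placed in the buffer, much like an asynchronous queue. Later, <-message simply takes the value out of the buffer.

That is why the earlier deadlocking program works once the channel is buffered, as below. An unbuffered channel only allows a send (chan <-) when a corresponding receive (<-chan) is ready to take the value; a buffered channel accepts a limited number of values without a corresponding receiver.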

func main() {
  message := make(chan string,1)
  message <- "ping"
  msg := <-message
  fmt.Print(msg)
}

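In the program below, sending one more value (a third send such as messages <- "channel") fails with fatal error: all goroutines are asleep - deadlock!. Receiving one more time (a third fmt.Println(<-messages)) prints buffered and channel and then fails with the same error.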

func main() {
    messages := make(chan string, 2)
    messages <- "buffered"
    messages <- "channel"
    fmt.Println(<-messages)
    fmt.Println(<-messages)
}
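
One way to see how close a buffered channel is to blocking is to inspect it with the built-in len and cap functions. The sketch below assumes a helper named describe, which is not part of the original program.

```go
package main

import "fmt"

// describe returns how many values are currently queued in the channel
// and how many its buffer can hold.
func describe(ch chan string) (queued, capacity int) {
	return len(ch), cap(ch)
}

func main() {
	messages := make(chan string, 2)

	messages <- "buffered"
	fmt.Println(describe(messages)) // prints 1 2

	messages <- "channel"
	fmt.Println(describe(messages)) // prints 2 2

	// A third send here would block, because the buffer is full.
	<-messages
	fmt.Println(describe(messages)) // prints 1 2
}
```

The values from len are only a snapshot when other goroutines use the channel at the same time, so they are better suited to diagnostics than to synchronization.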

Channel synchronization

Channels can synchronize goroutines. If you remove the <-done line, the program may exit before worker has even started.

func worker(done chan bool) {
    fmt.Print("working...")
    time.Sleep(time.Second) // working
    fmt.Println("done")
    done <- true
}
func main() {
    done := make(chan bool, 1)
    go worker(done)
    <-done // blocks here until worker has finished
}

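Channel directions

A channel parameter can be restricted to sending only or receiving only, which improves the type safety of the program. The pong function takes one channel for receiving (pings) and another for sending (pongs).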

func ping(pings chan<- string, msg string) {
    pings <- msg
}

func pong(pings <-chan string, pongs chan<- string) {
    msg := <-pings
    pongs <- msg
}

func main() {
    pings := make(chan string, 1)
    pongs := make(chan string, 1)
    ping(pings, "passed message")
    pong(pings, pongs)
    fmt.Println(<-pongs)
}

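select

Go's select lets you wait on several channel operations at once (comparable in spirit to poll/epoll). A select is normally either placed in an endless loop with a timeout or run a fixed number of times; adding a default case makes it non-blocking.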

go func() {
    time.Sleep(time.Second * 1)
    c1 <- "one"
}()

go func() {
    time.Sleep(time.Second * 2)
    c2 <- "two"
}()

for i := 0; i < 2; i++ {
    select {
    case msg1 := <-c1:
        fmt.Println("received", msg1)
    case msg2 := <-c2:
        fmt.Println("received", msg2)
    }
}

Timeouts

time.After returns a <-chan Time that receives a value once the duration has elapsed, so it can be used directly as a case in select.

select {
case res := <-c1:
    fmt.Println(res)
case <-time.After(time.Second * 1):
    fmt.Println("timeout 1")
}
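
The same select can be packaged as a reusable helper. The following is a sketch under assumptions: recvWithTimeout and the shortWait constant are names chosen for this example only.

```go
package main

import (
	"fmt"
	"time"
)

// shortWait is an arbitrary duration used by the example.
const shortWait = 50 * time.Millisecond

// recvWithTimeout waits for a value on c for at most d. The boolean
// result is false when the timeout fired first.
func recvWithTimeout(c <-chan string, d time.Duration) (string, bool) {
	select {
	case res := <-c:
		return res, true
	case <-time.After(d):
		return "", false
	}
}

func main() {
	// The channel is buffered so the sender can finish even if the
	// receiver has already given up.
	c1 := make(chan string, 1)
	go func() {
		time.Sleep(2 * shortWait)
		c1 <- "result 1"
	}()

	// The result arrives after 2*shortWait, so this attempt times out.
	if _, ok := recvWithTimeout(c1, shortWait); !ok {
		fmt.Println("timeout 1")
	}

	// A longer wait leaves enough time for the value to arrive.
	if res, ok := recvWithTimeout(c1, 4*shortWait); ok {
		fmt.Println(res)
	}
}
```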

Non-blocking channel operations

The default case runs when none of the channels in the select is ready.

select {
case msg := <-messages:
    fmt.Println("received message", msg)
default:
    fmt.Println("no message received")
}

A select statement can also be used to detect whether a channel is full.

ch := make (chan int, 1)
ch <- 1
select {
case ch <- 2:
default:
    fmt.Println("channel is full !")
}

Closing channels

A non-empty channel can be closed, and the values remaining in it can still be received.

queue := make(chan string, 2)
queue <- "one"
queue <- "two"
close(queue)
for elem := range queue {
    fmt.Println(elem)
}

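Timers

Use a timer to run something once at some point in the future.

A timer represents a single event in the future. You tell the timer how long to wait, and it provides a channel that will be notified at that time. A timer can also be stopped explicitly.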

timer1 := time.NewTimer(time.Second * 2)
<-timer1.C

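<-timer1.C blocks until the timer's channel C delivers a value, which signals that the timer has expired (after 2s). If you only want to wait, use time.Sleep. The advantage of a timer is that it can be cancelled before it expires, for example stop2 := timer2.Stop().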
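
A short sketch of cancelling a timer follows. Stop returns true when the call actually prevented the timer from firing and false when the timer had already expired or been stopped. The function names and durations are assumptions made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// stopBeforeExpiry stops a timer long before it would fire.
func stopBeforeExpiry() bool {
	timer2 := time.NewTimer(time.Second)
	return timer2.Stop()
}

// stopAfterExpiry waits for the timer to fire and only then calls Stop,
// so there is nothing left to cancel.
func stopAfterExpiry() bool {
	timer3 := time.NewTimer(10 * time.Millisecond)
	<-timer3.C
	return timer3.Stop()
}

func main() {
	fmt.Println("stopped before expiry:", stopBeforeExpiry()) // prints true
	fmt.Println("stopped after expiry:", stopAfterExpiry())   // prints false
}
```

Note that Stop does not close the channel C, so a goroutine blocked on <-timer.C of a stopped timer keeps waiting.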

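Tickers

Use a ticker when you want to do something repeatedly at a fixed interval until you stop it.

func main() {
    // Tickers work much like timers: a channel delivers the values. Here the built-in range iterates over the values that arrive every 500ms.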
    ticker := time.NewTicker(time.Millisecond * 500)
    go func() {
        for t := range ticker.C {
            fmt.Println("Tick at", t)
        }
    }()
    
    // A ticker can be stopped just like a timer. Once stopped, it delivers no more values on its channel. We stop this one after 1600ms.
    time.Sleep(time.Millisecond * 1600)
    ticker.Stop()
    fmt.Println("Ticker stopped")
}

Generators

A generator works like a small service. It only suits cases where calls are not very frequent.

func rand_generator_2() chan int {
    out := make(chan int)
    go func() {
        for {
            out <- rand.Int()
        }
    }()
    return out
}
 
func main() {
    // Random number generation as a service.
    rand_service_handler := rand_generator_2()
    fmt.Printf("%d\n", <-rand_service_handler)
}

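Multiplexing

Apache traditionally uses one process per connection, so its concurrency performance is not very good. Nginx uses multiplexing to let one process handle many connections, so its concurrency performance is better.

Multiplexing can be used to merge several channels into one, which improves performance and makes them easier to work with.

Here it simply combines several of the generators shown above.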

func rand_generator_3() chan int {
    gen1 := rand_generator_2()
    gen2 := rand_generator_2()
    out := make(chan int)

    go func() {
        for {
            // Read from generator 1 and forward the value.
            out <- <-gen1
        }
    }()
    go func() {
        for {
            // Read from generator 2 and forward the value.
            out <- <-gen2
        }
    }()
    return out
}
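
The multiplexer above runs forever because its inputs never end. For finite inputs, a common variation closes the merged channel once every input has been drained. The sketch below uses sync.WaitGroup for that; the names emit, merge, and sum are assumptions of this example rather than part of the original code.

```go
package main

import (
	"fmt"
	"sync"
)

// emit sends the given values on a new channel and then closes it.
func emit(vals ...int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for _, v := range vals {
			out <- v
		}
	}()
	return out
}

// merge forwards every value from all inputs to a single channel and
// closes that channel after all inputs have been closed.
func merge(inputs ...<-chan int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	wg.Add(len(inputs))
	for _, in := range inputs {
		go func(c <-chan int) {
			defer wg.Done()
			for v := range c {
				out <- v
			}
		}(in)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// sum adds up everything received until the channel is closed.
func sum(in <-chan int) int {
	total := 0
	for v := range in {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sum(merge(emit(1, 2), emit(3, 4)))) // prints 10
}
```

The order in which values arrive on the merged channel is not deterministic, which is why the example only checks their sum.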

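Future technique

A function can be called before its arguments are ready. Making the call and preparing the arguments are fully decoupled, so the caller does not need to care whether the data is ready or whether the return value has been computed yet. Components of the program start running on their own once their data is available. Note that the final receive, <-q.result, has to come after the argument has been sent; placed before execQuery in main it would block forever.

The Future technique combines well with the other techniques. With multiplexing you can watch several result channels and return as soon as a result arrives. With a generator, the generator keeps producing data and futures process the items one by one. Futures can also be chained head to tail to form a concurrent pipe filter, which can be used to read, write, and transform data streams.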

type query struct {
    sql chan string
    result chan string
}
 
func execQuery(q query) {
    go func() {
        sql := <-q.sql
        q.result <- "get " + sql
    }()
 
}
 
func main() {
    q := query{make(chan string, 1), make(chan string, 1)}
    execQuery(q)
 
    // Prepare the argument.
    q.sql <- "select * from table"
    fmt.Println(<-q.result)
}
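
A lighter variation of the same idea, shown here only as a sketch, is a function that starts the computation immediately and returns a receive-only channel on which the result will appear. The name futureSquare and the squaring itself are placeholders for any slow computation.

```go
package main

import "fmt"

// futureSquare starts computing x*x in the background and returns a
// channel that will deliver the result exactly once. The buffer of 1
// lets the goroutine finish even if nobody ever reads the result.
func futureSquare(x int) <-chan int {
	result := make(chan int, 1)
	go func() {
		result <- x * x
	}()
	return result
}

func main() {
	f := futureSquare(7)

	// The caller is free to do other work here before it needs the value.
	fmt.Println("doing other work")

	// Receiving blocks only if the result is not ready yet.
	fmt.Println(<-f) // prints 49
}
```

Unlike the query struct above, this form fixes the argument at call time, so it decouples only the computation from the use of its result.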

Chain Filter technique

The program creates 10 filters, each of which filters out the multiples of one prime, so it prints the first 10 primes.

func Generate(ch chan<- int) {
    for i := 2; ; i++ {
        ch <- i 
    }
}
 
func Filter(in <-chan int, out chan<- int, prime int) {
    for {
        i := <-in // Receive value from 'in'.
        if i%prime != 0 {
            out <- i // Send 'i' to 'out'.
        }
    }
}
 
// The prime sieve: Daisy-chain Filter processes.
func main() {
    ch := make(chan int) // Create a new channel.
    go Generate(ch)      // Launch Generate goroutine.
    for i := 0; i < 10; i++ {
        prime := <-ch
        print(prime, "\n")
        ch1 := make(chan int)
        go Filter(ch, ch1, prime)
        ch = ch1
    }
}

Shared variables

Sometimes a shared variable makes the code more concise.

type sharded_var struct {
    reader chan int
    writer chan int
}
 
func sharded_var_whachdog(v sharded_var) { // goroutine that maintains the shared variable
    go func() {
        var value int = 0
        for { // serve the read and write channels
            select {
            case value = <-v.writer:
            case v.reader <- value:
            }
        }
    }()
}
 
func main() {
    v := sharded_var{make(chan int), make(chan int)} // initialize, then start the maintaining goroutine
    sharded_var_whachdog(v)
 
    fmt.Println(<-v.reader)
    v.writer <- 1
    fmt.Println(<-v.reader)
}

Concurrency patterns

Some commonly used concurrency patterns follow.

Runner

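The Runner pattern is useful when your program runs in the background, for example as a cron job or in a worker-based cloud environment such as Iron.io. The runner can monitor your program and interrupt it if it runs for too long.

Three channels report the state of the tasks.

interrupt: receives termination signals from the operating system (such as ctrl-c); once one arrives, the program exits gracefully
complete: reports that the tasks have finished, or returns an error
timeout: when time runs out, the program exits gracefully

tasks is a slice of functions. You can store any function with the signature func funcName(id int) in it as a task, and task(id) runs the task. (This only simulates tasks for brevity; you could define a task interface instead.) Note that the tasks in tasks run serially, in a single separate goroutine.

In New, the interrupt channel has a buffer of 1, so when the user presses ctrl+c repeatedly the program receives only one signal and the others are dropped.

In run(), before each task is executed (task(id)), the runner first checks whether it has been interrupted (if r.gotInterrupt() {}), using a select with a default case. Once an interrupt has been received, the program stops accepting further signals (signal.Stop(r.interrupt)).

Start() runs run() in a go block, and the calling goroutine then blocks in the select until run()'s result arrives on the complete channel or the timeout fires.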

// Runner runs a set of tasks within a given timeout and can be shut down on an operating system interrupt.
type Runner struct {
	// interrupt channel reports a signal from the operating system.
	interrupt chan os.Signal

	// complete channel reports that processing is done.
	complete chan error

	// timeout reports that time has run out.
	timeout <-chan time.Time

	// tasks holds a set of functions that are executed
	// synchronously in index order.
	tasks []func(int)
}

// ErrTimeout is returned when a value is received on the timeout channel.
var ErrTimeout = errors.New("received timeout")

// ErrInterrupt is returned when an event from the OS is received.
var ErrInterrupt = errors.New("received interrupt")

// New returns a new ready-to-use Runner.
func New(d time.Duration) *Runner {
	return &Runner{
		interrupt: make(chan os.Signal, 1),
		complete:  make(chan error),
		timeout:   time.After(d),
	}
}

// Add attaches tasks to the Runner. A task is a function that takes an int ID. The ... makes the parameter variadic, so any number of tasks can be passed.
func (r *Runner) Add(tasks ...func(int)) { 
	r.tasks = append(r.tasks, tasks...)
}

// Start runs all tasks and monitors channel events.
func (r *Runner) Start() error {
	// We want to receive all interrupt based signals.
	signal.Notify(r.interrupt, os.Interrupt)

	// Run the different tasks on a different goroutine.
	go func() {
		r.complete <- r.run()
	}()

	select {
	// Signaled when processing is done.
	case err := <-r.complete:
		return err

	// Signaled when we run out of time.
	case <-r.timeout:
		return ErrTimeout
	}
}

// run executes each registered task.
func (r *Runner) run() error {
	for id, task := range r.tasks {
		// Check for an interrupt signal from the OS.
		if r.gotInterrupt() {
			return ErrInterrupt
		}

		// Execute the registered task.
		task(id)
	}

	return nil
}

// gotInterrupt verifies if the interrupt signal has been issued.
func (r *Runner) gotInterrupt() bool {
	select {
	// Signaled when an interrupt event is sent.
	case <-r.interrupt:
		// Stop receiving any further signals.
		signal.Stop(r.interrupt)
		return true

	// Continue running as normal.
	default:
		return false
	}
}

The main function

const timeout = 3 * time.Second

// main is the entry point for the program.
func main() {
	log.Println("Starting work.")

	// Create a new timer value for this run.
	r := runner.New(timeout)

	// Add the tasks to be run.
	r.Add(createTask(), createTask(), createTask())

	// Run the tasks and handle the result.
	if err := r.Start(); err != nil {
		switch err {
		case runner.ErrTimeout:
			log.Println("Terminating due to timeout.")
			os.Exit(1)
		case runner.ErrInterrupt:
			log.Println("Terminating due to interrupt.")
			os.Exit(2)
		}
	}

	log.Println("Process ended.")
}

// createTask returns an example task that sleeps for the specified
// number of seconds based on the id.
func createTask() func(int) {
	return func(id int) {
		log.Printf("Processor - Task #%d.", id)
		time.Sleep(time.Duration(id) * time.Second)
	}
}

Pooling

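This pattern is very useful when you have specific resources to share, such as database connections or memory buffers.

A goroutine that needs a resource takes one from the pool and returns it when done.

In this example a resource is anything that implements the io.Closer interface.

m ensures that operations on the Pool are safe when several goroutines use it.
resources is a buffered channel that holds the resources being shared.
factory creates a new resource when the pool needs one.
closed indicates whether the pool has been closed.

New takes a function that creates a new resource (fn func() (io.Closer, error), which returns a resource) and a size parameter.

Acquire first tries to take a resource from the pool and, if none is available, creates a new one with factory.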

func (p *Pool) Acquire() (io.Closer, error) {
	select {
	// Check for a free resource.
	case r, _ := <-p.resources:
		return r, nil

	// Provide a new resource since there are none available.
	default:
		return p.factory()
	}
}

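Release: if the pool is already closed, the resource is closed and the function returns. Otherwise it tries to send the resource being released into the resources buffered channel. The default case handles a full channel: the resource is closed instead of being kept.

Close: call Close to shut the pool down when the program is finished with it. It first closes the resources buffered channel and then closes the resources still in it (through io.Closer). Note the locking here.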

// Pool manages a set of resources that can be shared safely by multiple goroutines.
// The resource being managed must implement  the io.Closer interface.
type Pool struct {
	m         sync.Mutex
	resources chan io.Closer
	factory   func() (io.Closer, error)
	closed    bool
}

// ErrPoolClosed is returned when an Acquire returns on a closed pool.
var ErrPoolClosed = errors.New("Pool has been closed.")

// New creates a pool that manages resources. A pool requires a
// function that can allocate a new resource and the size of the pool.
func New(fn func() (io.Closer, error), size uint) (*Pool, error) {
	if size <= 0 {
		return nil, errors.New("Size value too small.")
	}

	return &Pool{
		factory:   fn,
		resources: make(chan io.Closer, size),
	}, nil
}

// Acquire retrieves a resource	from the pool.
func (p *Pool) Acquire() (io.Closer, error) {
	select {
	// Check for a free resource.
	case r, ok := <-p.resources:
		log.Println("Acquire:", "Shared Resource")
		if !ok {
			return nil, ErrPoolClosed
		}
		return r, nil

	// Provide a new resource since there are none available.
	default:
		log.Println("Acquire:", "New Resource")
		return p.factory()
	}
}

// Release places a new resource onto the pool.
func (p *Pool) Release(r io.Closer) {
	// Secure this operation with the Close operation.
	p.m.Lock()
	defer p.m.Unlock()

	// If the pool is closed, discard the resource.
	if p.closed {
		r.Close()
		return
	}

	select {
	// Attempt to place the new resource on the queue.
	case p.resources <- r:
		log.Println("Release:", "In Queue")

	// If the queue is already at cap we close the resource.
	default:
		log.Println("Release:", "Closing")
		r.Close()
	}
}

// Close will shutdown the pool and close all existing resources.
func (p *Pool) Close() {
	// Secure this operation with the Release operation.
	p.m.Lock()
	defer p.m.Unlock()

	// If the pool is already close, don't do anything.
	if p.closed {
		return
	}

	// Set the pool as closed.
	p.closed = true

	// Close the channel before we drain the channel of its
	// resources. If we don't do this, we will have a deadlock.
	close(p.resources)

	// Close the resources
	for r := range p.resources {
		r.Close()
	}
}

main

const (
	maxGoroutines   = 25 // the number of routines to use.
	pooledResources = 2  // number of resources in the pool
)

// dbConnection simulates a resource to share.
type dbConnection struct {
	ID int32
}

// Close implements the io.Closer interface so dbConnection can be managed by the pool. Close performs any resource release management.
func (dbConn *dbConnection) Close() error {
	log.Println("Close: Connection", dbConn.ID)
	return nil
}

// idCounter provides support for giving each connection a unique id.
var idCounter int32

// createConnection is a factory method that will be called by the pool when a new connection is needed.
func createConnection() (io.Closer, error) {
	id := atomic.AddInt32(&idCounter, 1)
	log.Println("Create: New Connection", id)

	return &dbConnection{id}, nil
}

// main is the entry point for all Go programs.
func main() {
	var wg sync.WaitGroup
	wg.Add(maxGoroutines)

	// Create the pool to manage our connections.
	p, err := pool.New(createConnection, pooledResources)
	if err != nil {
		log.Println(err)
		return
	}

	// Perform queries using connections from the pool.
	for query := 0; query < maxGoroutines; query++ {
		// Each goroutine needs its own copy of the query value else they will all be sharing the same query variable.
		go func(q int) {
			performQueries(q, p)
			wg.Done()
		}(query)
	}

	// Wait for the goroutines to finish.
	wg.Wait()

	// Close the pool.
	log.Println("Shutdown Program.")
	p.Close()
}

// performQueries tests the resource pool of connections.
func performQueries(query int, p *pool.Pool) {
	// Acquire a connection from the pool.
	conn, err := p.Acquire()
	if err != nil {
		log.Println(err)
		return
	}

	// Release the connection back to the pool.
	defer p.Release(conn)

	// Wait to simulate a query response.
	time.Sleep(time.Duration(rand.Intn(1000)) * time.Millisecond)
	log.Printf("Query: QID[%d] CID[%d]\n", query, conn.(*dbConnection).ID)
}

Work

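New starts a fixed number (maxGoroutines) of goroutines. Note that work is an unbuffered channel. The for range in each goroutine blocks until a value can be received from the channel. When the work channel is closed, the for range ends and the goroutine calls wg.Done.

Run submits a task to the pool with p.work <- w. Because work is an unbuffered channel, the send blocks until a goroutine takes the task. This is what we want: it guarantees the caller that the task has started running by the time Run returns.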

type Worker interface {
	Task()
}

// Pool provides a pool of goroutines that can execute any Worker
// tasks that are submitted.
type Pool struct {
	work chan Worker
	wg   sync.WaitGroup
}

// New creates a new work pool.
func New(maxGoroutines int) *Pool {
	p := Pool{
		work: make(chan Worker),
	}

	p.wg.Add(maxGoroutines)
	for i := 0; i < maxGoroutines; i++ {
		go func() {
			for w := range p.work {
				w.Task()
			}
			p.wg.Done()
		}()
	}

	return &p
}

// Run submits work to the pool.
func (p *Pool) Run(w Worker) {
	p.work <- w
}

// Shutdown waits for all the goroutines to shutdown.
func (p *Pool) Shutdown() {
	close(p.work)
	p.wg.Wait()
}
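
The text does not show a caller for this pool, so here is one possible usage sketch. To keep it self-contained it repeats the Pool code in a single package main; the addTask type and the sumWithPool function are hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Worker and Pool repeat the definitions above so the example runs on its own.
type Worker interface {
	Task()
}

type Pool struct {
	work chan Worker
	wg   sync.WaitGroup
}

func New(maxGoroutines int) *Pool {
	p := Pool{work: make(chan Worker)}
	p.wg.Add(maxGoroutines)
	for i := 0; i < maxGoroutines; i++ {
		go func() {
			defer p.wg.Done()
			for w := range p.work {
				w.Task()
			}
		}()
	}
	return &p
}

func (p *Pool) Run(w Worker) { p.work <- w }

func (p *Pool) Shutdown() {
	close(p.work)
	p.wg.Wait()
}

// addTask is a hypothetical Worker that adds n to a shared total.
type addTask struct {
	total *int64
	n     int64
}

// Task implements Worker. The atomic add keeps concurrent updates safe.
func (t addTask) Task() { atomic.AddInt64(t.total, t.n) }

// sumWithPool submits one task per number in 1..upTo and returns the
// total after all goroutines in the pool have shut down.
func sumWithPool(goroutines int, upTo int64) int64 {
	var total int64
	p := New(goroutines)
	for i := int64(1); i <= upTo; i++ {
		p.Run(addTask{total: &total, n: i})
	}
	p.Shutdown()
	return total
}

func main() {
	fmt.Println(sumWithPool(3, 10)) // prints 55
}
```

Because Shutdown waits for every goroutine to finish, the total can be read without further synchronization once it returns.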