Go: The Garbage Collection Mechanism

Reposted from: https://segmentfault.com/a/1190000020086769
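Go's GC has been criticized since its very beginning, but after Go 1.5 introduced concurrent tri-color marking and Go 1.8 introduced the hybrid write barrier, typical GC pauses have dropped to around 10 µs, which is excellent. Let's explore how Go's GC works.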

How tri-color marking works
Let's start with a figure, which gives a rough idea of the tri-color marking algorithm:

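The algorithm:

1. Put all objects into the white set.
2. Starting from the roots, move every white object that a root references from the white set into the grey set.
3. Scan the objects in the grey set: move every white object referenced by a grey object into the grey set, and move each grey object that has been scanned into the black set.
4. Repeat step 3 until the grey set is empty.
5. When step 4 finishes, the objects still in the white set are unreachable, i.e. garbage, and are reclaimed.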
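The five steps above can be sketched as a short Go program. This is a toy model only: the Object type and the mark and sweep functions are hypothetical, and colors are kept in a map, whereas the runtime uses mark bits in spans and a queue of grey objects.

```go
package main

import "fmt"

// Object is a hypothetical heap object with outgoing references.
type Object struct {
	Name string
	Refs []*Object
}

type color int

const (
	white color = iota
	grey
	black
)

// mark runs tri-color marking from the roots and returns every object's color.
func mark(heap []*Object, roots []*Object) map[*Object]color {
	colors := make(map[*Object]color, len(heap))
	for _, o := range heap {
		colors[o] = white // step 1: everything starts white
	}
	var greyQ []*Object
	for _, r := range roots { // step 2: objects referenced by roots become grey
		if colors[r] == white {
			colors[r] = grey
			greyQ = append(greyQ, r)
		}
	}
	for len(greyQ) > 0 { // steps 3-4: drain the grey set
		o := greyQ[len(greyQ)-1]
		greyQ = greyQ[:len(greyQ)-1]
		for _, child := range o.Refs {
			if colors[child] == white {
				colors[child] = grey
				greyQ = append(greyQ, child)
			}
		}
		colors[o] = black // o has been scanned
	}
	return colors
}

// sweep returns the names of objects that are still white (step 5).
func sweep(heap []*Object, colors map[*Object]color) []string {
	var garbage []string
	for _, o := range heap {
		if colors[o] == white {
			garbage = append(garbage, o.Name)
		}
	}
	return garbage
}

func main() {
	a := &Object{Name: "A"}
	b := &Object{Name: "B"}
	c := &Object{Name: "C"}
	d := &Object{Name: "D"}
	a.Refs = []*Object{b}
	b.Refs = []*Object{c}
	d.Refs = []*Object{d} // a self-cycle that no root can reach
	heap := []*Object{a, b, c, d}
	colors := mark(heap, []*Object{a})
	fmt.Println(sweep(heap, colors)) // [D]
}
```

Note that the unreachable cycle (D points to itself) is still collected, which is one reason tracing collectors prefer reachability over reference counting.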
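Write barrier
Go does not stop the world (STW) while it performs tri-color marking, which means the program can still modify objects during marking.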

Now consider the following situation:

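Suppose that while scanning the grey set we reach object A and mark all of A's references. We then start scanning object D's references, and at that moment another goroutine modifies the D->E reference, leaving the graph as shown in the figure below (E is now referenced by A, which has already been scanned, instead of by D):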

Could E then never be scanned, be mistaken for a white object, and be treated as garbage?

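The write barrier exists to solve exactly this problem. With the write barrier in place, E is treated as live after the steps above; even if A later drops its reference to E, E will only be reclaimed in the next GC cycle, not in the current one.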

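Go 1.8 enabled the hybrid write barrier. Its pseudocode is:

writePointer(slot, ptr):
    shade(*slot)
    if current stack is grey:
        shade(ptr)
    *slot = ptr
The hybrid write barrier shades both the "old pointer" (the value currently in the slot being written) and the "new pointer" being stored (the latter while the current goroutine's stack is still grey).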
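A toy model of the pseudocode helps show the order of operations. It assumes a hypothetical Collector that records marks in a map and keeps a stackGrey flag for the current goroutine's stack; in the real runtime the compiler emits the barrier and shading sets mark bits.

```go
package main

import "fmt"

// Object is a hypothetical heap object with one pointer field.
type Object struct {
	Name  string
	Field *Object
}

// Collector is a hypothetical stand-in for the GC's mark state.
type Collector struct {
	marked    map[*Object]bool
	greyQueue []*Object
	stackGrey bool // true while the current goroutine's stack has not been scanned yet
}

// shade greys a white object: mark it and queue it for scanning.
func (c *Collector) shade(o *Object) {
	if o != nil && !c.marked[o] {
		c.marked[o] = true
		c.greyQueue = append(c.greyQueue, o)
	}
}

// writePointer follows the pseudocode: shade the old value, shade the new
// value while the stack is still grey, then perform the store.
func (c *Collector) writePointer(slot **Object, ptr *Object) {
	c.shade(*slot)
	if c.stackGrey {
		c.shade(ptr)
	}
	*slot = ptr
}

func main() {
	c := &Collector{marked: map[*Object]bool{}, stackGrey: true}
	oldTarget := &Object{Name: "old"}
	newTarget := &Object{Name: "new"}
	b := &Object{Name: "b", Field: oldTarget}

	c.writePointer(&b.Field, newTarget)
	fmt.Println(c.marked[oldTarget], c.marked[newTarget]) // true true
}
```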

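The old pointer is shaded because another running thread may, at the same moment, copy this pointer's value into a register or a local variable on the stack.
Copying a pointer into a register or a stack local does not go through the write barrier, so the pointer might never be marked. Consider the following sequence: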

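[go] b = obj
[go] oldx = nil
[gc] scan oldx…
[go] oldx = b.x // copy b.x into a local variable; this does not go through the write barrier
[go] b.x = ptr // the write barrier should shade the old value of b.x
[gc] scan b…
If the write barrier did not shade the old value, the object oldx now points to would never be scanned: oldx's stack slot was scanned while it was still nil, and b no longer references the object.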
The new pointer is shaded because another running thread may move a pointer to a different location. Consider the following sequence:

[go] a = ptr
[go] b = obj
[gc] scan b…
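[go] b.x = a // the write barrier should shade the new value of b.x
[go] a = nil
[gc] scan a…
If the write barrier did not shade the new value, ptr would never be scanned: b was scanned before it pointed to ptr, and a was already nil when it was scanned.
Thanks to the hybrid write barrier, the GC does not have to rescan every G's stack after concurrent marking ends, which shortens the STW pause in Mark Termination.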

Besides the write barrier, every object allocated during GC is immediately colored black; this can be seen in the runtime's mallocgc function.

The collection process
Go's GC is concurrent: most of the GC work runs at the same time as ordinary Go code, which makes the GC flow fairly complex.
The GC has four phases:

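Sweep Termination: sweep any spans that have not been swept yet; a new GC cycle can start only after the previous cycle's sweeping has finished
Mark: scan all root objects and every object reachable from them, marking them so that they are not reclaimed
Mark Termination: finish the marking work and rescan some of the roots (requires STW)
Sweep: sweep the spans according to the mark results
The figure below shows a fairly complete GC flow, with the four phases color-coded: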

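Two kinds of background tasks (Gs) run during GC: one kind for marking and one for sweeping.
Background mark workers are started when needed, and about 25% of the Ps can run them at the same time; this is where Go's "25% of the CPU is spent on GC" figure comes from.
A single background sweep worker is started when the program starts, and it is woken up when the sweep phase begins.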
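The text only says "about 25%". As far as we can tell from the Go 1.10/1.11 runtime (gcController.startCycle), the 25% target is rounded to a whole number of dedicated workers, and fractional workers make up the difference when rounding misses the target by too much. The sketch below reproduces that arithmetic under those assumptions; markWorkers is a hypothetical name and the 0.3 tolerance is based on our reading of the source, not on this article.

```go
package main

import "fmt"

// backgroundUtilization is the 25% CPU target described above.
const backgroundUtilization = 0.25

// markWorkers splits the 25% target into dedicated workers (whole Ps) plus a
// per-P fractional utilization goal for the remainder (assumed rules).
func markWorkers(gomaxprocs int) (dedicated int, fractionalGoal float64) {
	total := float64(gomaxprocs) * backgroundUtilization
	dedicated = int(total + 0.5) // round to the nearest whole P
	utilError := float64(dedicated)/total - 1
	const maxUtilError = 0.3 // assumed tolerance
	if utilError < -maxUtilError || utilError > maxUtilError {
		// Rounding missed the target by more than 30%: never use too many
		// dedicated workers, and cover the rest with fractional workers.
		if float64(dedicated) > total {
			dedicated--
		}
		fractionalGoal = (total - float64(dedicated)) / float64(gomaxprocs)
	}
	return dedicated, fractionalGoal
}

func main() {
	for _, procs := range []int{1, 4, 6, 8} {
		d, f := markWorkers(procs)
		fmt.Printf("GOMAXPROCS=%d dedicated=%d fractional=%.3f\n", procs, d, f)
	}
}
```

With GOMAXPROCS=4 the target is exactly one dedicated worker; with GOMAXPROCS=6 one dedicated worker plus a small fractional goal gets closest to 25%.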

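The current GC flow has two STW (Stop The World) pauses: the first at the start of the Mark phase, the second in the Mark Termination phase.
The first STW prepares the root scan and enables the write barrier and mutator assists.
The second STW rescans some of the roots and disables the write barrier and mutator assists.
Note that not every root scan needs STW; for example, scanning the objects on a stack only requires stopping the G that owns that stack.
The write barrier is implemented as a hybrid write barrier, which greatly reduces the duration of the second STW.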

Source code analysis
gcStart
func gcStart(mode gcMode, trigger gcTrigger) {
// Since this is called from malloc and malloc is called in
// the guts of a number of libraries that might be holding
// locks, don't attempt to start GC in non-preemptible or
// potentially unstable situations.
// Check whether the current G can be preempted; do not trigger GC if it cannot
mp := acquirem()
if gp := getg(); gp == mp.g0 || mp.locks > 1 || mp.preemptoff != "" {
    releasem(mp)
    return
}
releasem(mp)
mp = nil

// Pick up the remaining unswept/not being swept spans concurrently
//
// This shouldn't happen if we're being invoked in background
// mode since proportional sweep should have just finished
// sweeping everything, but rounding errors, etc, may leave a
// few spans unswept. In forced mode, this is necessary since
// GC can be forced at any point in the sweeping cycle.
//
// We check the transition condition continuously here in case
// this G gets delayed in to the next GC cycle.
// Sweep any leftover unswept spans
for trigger.test() && gosweepone() != ^uintptr(0) {
    sweep.nbgsweep++
}

// Perform GC initialization and the sweep termination
// transition.
semacquire(&work.startSema)
// Re-check transition condition under transition lock.
// Check whether the gcTrigger condition still holds
if !trigger.test() {
    semrelease(&work.startSema)
    return
}

// For stats, check if this GC was forced by the user
// Record whether this GC was forced; users can force a GC by calling runtime.GC()
work.userForced = trigger.kind == gcTriggerAlways || trigger.kind == gcTriggerCycle

// In gcstoptheworld debug mode, upgrade the mode accordingly.
// We do this after re-checking the transition condition so
// that multiple goroutines that detect the heap trigger don't
// start multiple STW GCs.
// Set the GC mode
if mode == gcBackgroundMode {
    if debug.gcstoptheworld == 1 {
        mode = gcForceMode
    } else if debug.gcstoptheworld == 2 {
        mode = gcForceBlockMode
    }
}

// Ok, we're doing it! Stop everybody else
semacquire(&worldsema)

if trace.enabled {
    traceGCStart()
}
// Start the background mark workers
if mode == gcBackgroundMode {
    gcBgMarkStartWorkers()
}
// Reset the mark-related state
gcResetMarkState()

work.stwprocs, work.maxprocs = gomaxprocs, gomaxprocs
if work.stwprocs > ncpu {
    // This is used to compute CPU time of the STW phases,
    // so it can't be more than ncpu, even if GOMAXPROCS is.
    work.stwprocs = ncpu
}
work.heap0 = atomic.Load64(&memstats.heap_live)
work.pauseNS = 0
work.mode = mode

now := nanotime()
work.tSweepTerm = now
work.pauseStart = now
if trace.enabled {
    traceGCSTWStart(1)
}
// STW: stop the world
systemstack(stopTheWorldWithSema)
// Finish sweep before we start concurrent scan.
// Finish sweeping the previous cycle's garbage first, to make sure the last GC is complete
systemstack(func() {
    finishsweep_m()
})
// clearpools before we start the GC. If we wait they memory will not be
// reclaimed until the next GC cycle.
// Clear sync.Pool, sched.sudogcache and sched.deferpool. Not covered here: sync.Pool was discussed earlier, and the rest will come up in later articles
clearpools()

// Increment the GC cycle count
work.cycles++
if mode == gcBackgroundMode { // Do as much work concurrently as possible
    gcController.startCycle()
    work.heapGoal = memstats.next_gc

    // Enter concurrent mark phase and enable
    // write barriers.
    //
    // Because the world is stopped, all Ps will
    // observe that write barriers are enabled by
    // the time we start the world and begin
    // scanning.
    //
    // Write barriers must be enabled before assists are
    // enabled because they must be enabled before
    // any non-leaf heap objects are marked. Since
    // allocations are blocked until assists can
    // happen, we want enable assists as early as
    // possible.
    // Set the GC phase to _GCmark
    setGCPhase(_GCmark)

    // Update the background mark state
    gcBgMarkPrepare() // Must happen before assist enable.
    // Compute and queue the root scan jobs, and initialize the related scan state
    gcMarkRootPrepare()

    // Mark all active tinyalloc blocks. Since we're
    // allocating from these, they need to be black like
    // other allocations. The alternative is to blacken
    // the tiny block on every allocation from it, which
    // would slow down the tiny allocator.
    // Mark the active tiny-alloc blocks
    gcMarkTinyAllocs()

    // At this point all Ps have enabled the write
    // barrier, thus maintaining the no white to
    // black invariant. Enable mutator assists to
    // put back-pressure on fast allocating
    // mutators.
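    // Set gcBlackenEnabled to 1 so mutator assists and mark workers may blacken objects (the write barrier was already enabled by setGCPhase above)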
    atomic.Store(&gcBlackenEnabled, 1)

    // Assists and workers can start the moment we start
    // the world.
    gcController.markStartTime = now

    // Concurrent mark.
    systemstack(func() {
        now = startTheWorldWithSema(trace.enabled)
    })
    work.pauseNS += now - work.pauseStart
    work.tMark = now
} else {
    // Non-concurrent (STW) mode
    // Record the start time of the mark termination phase
    if trace.enabled {
        // Switch to mark termination STW.
        traceGCSTWDone()
        traceGCSTWStart(0)
    }
    t := nanotime()
    work.tMark, work.tMarkTerm = t, t
    work.heapGoal = work.heap0

    // Perform mark termination. This will restart the world.
    // STW: mark, sweep, and then start the world again
    gcMarkTermination(memstats.triggerRatio)
}

semrelease(&work.startSema)

}
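gcStart calls trigger.test() both before and after taking work.startSema to decide whether a GC should really start. The sketch below is a simplified, hypothetical model of what such a test checks, loosely based on our reading of gcTrigger.test in the Go 1.10/1.11 runtime; the gcState struct, the field names and the two-minute period are assumptions, not the runtime's API.

```go
package main

import (
	"fmt"
	"time"
)

// triggerKind models the trigger kinds referenced in gcStart.
type triggerKind int

const (
	triggerAlways triggerKind = iota // unconditional trigger
	triggerHeap                      // live heap reached the trigger size
	triggerTime                      // no GC for longer than forcePeriod
	triggerCycle                     // runtime.GC(): start cycle n if not started yet
)

// forcePeriod is an assumed interval for periodic GC.
const forcePeriod = 2 * time.Minute

// gcState is a hypothetical snapshot of the values a trigger test consults.
type gcState struct {
	inGC        bool // a cycle is already running (gcphase != _GCoff)
	heapLive    uint64
	heapTrigger uint64
	lastGC      time.Time
	cycles      uint32
}

type trigger struct {
	kind triggerKind
	now  time.Time // used by triggerTime
	n    uint32    // used by triggerCycle
}

// test reports whether this trigger should start a GC.
func (t trigger) test(s gcState) bool {
	if t.kind == triggerAlways {
		return true
	}
	if s.inGC {
		return false
	}
	switch t.kind {
	case triggerHeap:
		return s.heapLive >= s.heapTrigger
	case triggerTime:
		return !s.lastGC.IsZero() && t.now.Sub(s.lastGC) > forcePeriod
	case triggerCycle:
		// t.n > cycles, tolerating wraparound of the counter.
		return int32(t.n-s.cycles) > 0
	}
	return true
}

func main() {
	now := time.Now()
	s := gcState{heapLive: 8 << 20, heapTrigger: 4 << 20, lastGC: now.Add(-time.Minute), cycles: 3}
	fmt.Println(trigger{kind: triggerHeap}.test(s))           // true
	fmt.Println(trigger{kind: triggerTime, now: now}.test(s)) // false
	fmt.Println(trigger{kind: triggerCycle, n: 4}.test(s))    // true
}
```

Re-checking under the lock matters because several goroutines can notice the heap trigger at once; only the first should start the cycle.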
gcBgMarkStartWorkers
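This function prepares the goroutines that perform the background mark work. They do not start working right away: they wait until the GC phase is set to _GCmark (see setGCPhase(_GCmark) in gcStart above).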

func gcBgMarkStartWorkers() {
    // Background marking is performed by per-P G's. Ensure that
    // each P has a background GC G.
    for _, p := range allp {
        if p.gcBgMarkWorker == 0 {
            go gcBgMarkWorker(p)
            // Wait for the new gcBgMarkWorker goroutine's bgMarkReady signal before continuing
            notetsleepg(&work.bgMarkReady, -1)
            noteclear(&work.bgMarkReady)
        }
    }
}
gcBgMarkWorker
The background mark worker function.

func gcBgMarkWorker(_p_ *p) {
gp := getg()
// Used to re-acquire the M and P after the worker wakes up from sleep
type parkInfo struct {
    m      muintptr // Release this m on park.
    attach puintptr // If non-nil, attach to this p on park.
}
// We pass park to a gopark unlock function, so it can't be on
// the stack (see gopark). Prevent deadlock from recursively
// starting GC by disabling preemption.
gp.m.preemptoff = "GC worker init"
park := new(parkInfo)
gp.m.preemptoff = ""
// Record the M and P in park; it is passed to gopark so they can be recovered when gcController.findRunnable wakes the worker
park.m.set(acquirem())
park.attach.set(_p_)
// Inform gcBgMarkStartWorkers that this worker is ready.
// After this point, the background mark worker is scheduled
// cooperatively by gcController.findRunnable. Hence, it must
// never be preempted, as this would put it into _Grunnable
// and put it on a run queue. Instead, when the preempt flag
// is set, this puts itself into _Gwaiting to be woken up by
// gcController.findRunnable at the appropriate time.
// Let gcBgMarkStartWorkers stop waiting in notetsleepg and continue
notewakeup(&work.bgMarkReady)

for {
    // Go to sleep until woken by gcController.findRunnable.
    // We can't releasem yet since even the call to gopark
    // may be preempted.
    // Put the G to sleep
    gopark(func(g *g, parkp unsafe.Pointer) bool {
        park := (*parkInfo)(parkp)

        // The worker G is no longer running, so it's
        // now safe to allow preemption.
        // Release the M acquired earlier, allowing preemption again
        releasem(park.m.ptr())

        // If the worker isn't attached to its P,
        // attach now. During initialization and after
        // a phase change, the worker may have been
        // running on a different P. As soon as we
        // attach, the owner P may schedule the
        // worker, so this must be done after the G is
        // stopped.
        // Attach the worker to its P (park.attach was set above)
        if park.attach != 0 {
            p := park.attach.ptr()
            park.attach.set(nil)
            // cas the worker because we may be
            // racing with a new worker starting
            // on this P.
            if !p.gcBgMarkWorker.cas(0, guintptr(unsafe.Pointer(g))) {
                // The P got a new worker.
                // Exit this worker.
                return false
            }
        }
        return true
    }, unsafe.Pointer(park), waitReasonGCWorkerIdle, traceEvGoBlock, 0)

    // Loop until the P dies and disassociates this
    // worker (the P may later be reused, in which case
    // it will get a new worker) or we failed to associate.
    // Check whether the P's gcBgMarkWorker is still this G; if not, end this worker
    if _p_.gcBgMarkWorker.ptr() != gp {
        break
    }

    // Disable preemption so we can use the gcw. If the
    // scheduler wants to preempt us, we'll stop draining,
    // dispose the gcw, and then preempt.
    // The gopark unlock function released the M; acquire it again here
    park.m.set(acquirem())

    if gcBlackenEnabled == 0 {
        throw("gcBgMarkWorker: blackening not enabled")
    }

    startTime := nanotime()
    // Record the mark worker's start time
    _p_.gcMarkWorkerStartTime = startTime

    decnwait := atomic.Xadd(&work.nwait, -1)
    if decnwait == work.nproc {
        println("runtime: work.nwait=", decnwait, "work.nproc=", work.nproc)
        throw("work.nwait was > work.nproc")
    }
    // Switch to g0 to do the work
    systemstack(func() {
        // Mark our goroutine preemptible so its stack
        // can be scanned. This lets two mark workers
        // scan each other (otherwise, they would
        // deadlock). We must not modify anything on
        // the G stack. However, stack shrinking is
        // disabled for mark workers, so it is safe to
        // read from the G stack.
        // Set the G's status to _Gwaiting so another G can scan its stack (two mark workers can scan each other's stacks)
        casgstatus(gp, _Grunning, _Gwaiting)
        switch _p_.gcMarkWorkerMode {
        default:
            throw("gcBgMarkWorker: unexpected gcMarkWorkerMode")
        case gcMarkWorkerDedicatedMode:
            // Dedicated mode: do nothing but mark work
            gcDrain(&_p_.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit)
            if gp.preempt {
                // Preempted: move every G in the local run queue to the global run queue
                // We were preempted. This is
                // a useful signal to kick
                // everything out of the run
                // queue so it can run
                // somewhere else.
                lock(&sched.lock)
                for {
                    gp, _ := runqget(_p_)
                    if gp == nil {
                        break
                    }
                    globrunqput(gp)
                }
                unlock(&sched.lock)
            }
            // Go back to draining, this time
            // without preemption.
            // Continue the mark work
            gcDrain(&_p_.gcw, gcDrainNoBlock|gcDrainFlushBgCredit)
        case gcMarkWorkerFractionalMode:
            // Do mark work until preempted
            gcDrain(&_p_.gcw, gcDrainFractional|gcDrainUntilPreempt|gcDrainFlushBgCredit)
        case gcMarkWorkerIdleMode:
            // Do mark work while the P is idle
            gcDrain(&_p_.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit)
        }
        // Switch the G back from _Gwaiting to _Grunning
        casgstatus(gp, _Gwaiting, _Grunning)
    })

    // If we are nearing the end of mark, dispose
    // of the cache promptly. We must do this
    // before signaling that we're no longer
    // working so that other workers can't observe
    // no workers and no work while we have this
    // cached, and before we compute done.
    // Promptly flush the local cache to the global queue
    if gcBlackenPromptly {
        _p_.gcw.dispose()
    }

    // Account for time.
    // Accumulate the elapsed time
    duration := nanotime() - startTime
    switch _p_.gcMarkWorkerMode {
    case gcMarkWorkerDedicatedMode:
        atomic.Xaddint64(&gcController.dedicatedMarkTime, duration)
        atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, 1)
    case gcMarkWorkerFractionalMode:
        atomic.Xaddint64(&gcController.fractionalMarkTime, duration)
        atomic.Xaddint64(&_p_.gcFractionalMarkTime, duration)
    case gcMarkWorkerIdleMode:
        atomic.Xaddint64(&gcController.idleMarkTime, duration)
    }

    // Was this the last worker and did we run out
    // of work?
    incnwait := atomic.Xadd(&work.nwait, +1)
    if incnwait > work.nproc {
        println("runtime: p.gcMarkWorkerMode=", _p_.gcMarkWorkerMode,
            "work.nwait=", incnwait, "work.nproc=", work.nproc)
        throw("work.nwait > work.nproc")
    }

    // If this worker reached a background mark completion
    // point, signal the main GC goroutine.
    if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
        // Make this G preemptible and disassociate it
        // as the worker for this P so
        // findRunnableGCWorker doesn't try to
        // schedule it.
        // Detach this worker from the P and release the M
        _p_.gcBgMarkWorker.set(nil)
        releasem(park.m.ptr())

        gcMarkDone()

        // Disable preemption and prepare to reattach
        // to the P.
        //
        // We may be running on a different P at this
        // point, so we can't reattach until this G is
        // parked.
        park.m.set(acquirem())
        park.attach.set(_p_)
    }
}

}
gcDrain
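The core of the tri-color marking implementation.

gcDrain scans roots and the objects in the work buffers, blackening grey objects, until it can get no more work (or, depending on the flags, until it is preempted).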

func gcDrain(gcw *gcWork, flags gcDrainFlags) {
if !writeBarrier.needed {
    throw("gcDrain phase incorrect")
}

gp := getg().m.curg
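// Whether to return when the preemption flag is set
preemptible := flags&gcDrainUntilPreempt != 0
// Whether to block waiting for work when none is available
blocking := flags&(gcDrainUntilPreempt|gcDrainIdle|gcDrainFractional|gcDrainNoBlock) == 0
// Whether to credit background scan work, reducing mutator assists and waking waiting Gs
flushBgCredit := flags&gcDrainFlushBgCredit != 0
// Whether to do mark work only while idle
idle := flags&gcDrainIdle != 0
// Remember the scan work already done at the start
initScanWork := gcw.scanWork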
initScanWork := gcw.scanWork

// checkWork is the scan work before performing the next
// self-preempt check.
// Set up the work-check function for the corresponding mode
checkWork := int64(1<<63 - 1)
var check func() bool
if flags&(gcDrainIdle|gcDrainFractional) != 0 {
    checkWork = initScanWork + drainCheckThreshold
    if idle {
        check = pollWork
    } else if flags&gcDrainFractional != 0 {
        check = pollFractionalWorkerExit
    }
}

// Drain root marking jobs.
// If not all root jobs have been claimed, scan them
if work.markrootNext < work.markrootJobs {
    for !(preemptible && gp.preempt) {
        job := atomic.Xadd(&work.markrootNext, +1) - 1
        if job >= work.markrootJobs {
            break
        }
        // Run the root scan job
        markroot(gcw, job)
        if check != nil && check() {
            goto done
        }
    }
}

// Drain heap marking jobs.
// Loop until preempted (or until no work can be obtained)
for !(preemptible && gp.preempt) {
    // Try to keep work available on the global queue. We used to
    // check if there were waiting workers, but it's better to
    // just keep work available than to make workers wait. In the
    // worst case, we'll do O(log(_WorkbufSize)) unnecessary
    // balances.
    if work.full == 0 {
        // Balance the work: the global full list is empty, so move part of the local work there
        gcw.balance()
    }

    var b uintptr
    if blocking {
        b = gcw.get()
    } else {
        b = gcw.tryGetFast()
        if b == 0 {
            b = gcw.tryGet()
        }
    }
    // Failed to get work: exit the loop
    if b == 0 {
        // work barrier reached or tryGet failed.
        break
    }
    // Scan the object we got
    scanobject(b, gcw)

    // Flush background scan work credit to the global
    // account if we've accumulated enough locally so
    // mutator assists can draw on it.
    // Once the local scan work exceeds gcCreditSlack, add it to the global count in one batch
    if gcw.scanWork >= gcCreditSlack {
        atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
        if flushBgCredit {
            gcFlushBgCredit(gcw.scanWork - initScanWork)
            initScanWork = 0
        }
        checkWork -= gcw.scanWork
        gcw.scanWork = 0
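        // Once the scan work reaches checkWork, the threshold for the next self-preemption check, call the mode's check function:
        // pollWork in idle mode, pollFractionalWorkerExit in fractional mode (see where check is set above)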
        if checkWork <= 0 {
            checkWork += drainCheckThreshold
            if check != nil && check() {
                break
            }
        }
    }
}

// In blocking mode, write barriers are not allowed after this
// point because we must preserve the condition that the work
// buffers are empty.

done:
// Flush remaining scan work credit.
if gcw.scanWork > 0 {
    // Add the scan work to the global count
    atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
    if flushBgCredit {
        gcFlushBgCredit(gcw.scanWork - initScanWork)
    }
    gcw.scanWork = 0
}
}
markroot
This function scans root objects; each call handles one root job, selected by its index i.

func markroot(gcw *gcWork, i uint32) {
// TODO(austin): This is a bit ridiculous. Compute and store
// the bases in gcMarkRootPrepare instead of the counts.
baseFlushCache := uint32(fixedRootCount)
baseData := baseFlushCache + uint32(work.nFlushCacheRoots)
baseBSS := baseData + uint32(work.nDataRoots)
baseSpans := baseBSS + uint32(work.nBSSRoots)
baseStacks := baseSpans + uint32(work.nSpanRoots)
end := baseStacks + uint32(work.nStackRoots)

// Note: if you add a case here, please also update heapdump.go:dumproots.
switch {
// Flush the spans cached in an mcache
case baseFlushCache <= i && i < baseData:
    flushmcache(int(i - baseFlushCache))
// Scan initialized read-write global variables (the data segment)
case baseData <= i && i < baseBSS:
    for _, datap := range activeModules() {
        markrootBlock(datap.data, datap.edata-datap.data, datap.gcdatamask.bytedata, gcw, int(i-baseData))
    }
// Scan zero-initialized global variables (the BSS segment)
case baseBSS <= i && i < baseSpans:
    for _, datap := range activeModules() {
        markrootBlock(datap.bss, datap.ebss-datap.bss, datap.gcbssmask.bytedata, gcw, int(i-baseBSS))
    }
// Scan the finalizer queue
case i == fixedRootFinalizers:
    // Only do this once per GC cycle since we don't call
    // queuefinalizer during marking.
    if work.markrootDone {
        break
    }
    for fb := allfin; fb != nil; fb = fb.alllink {
        cnt := uintptr(atomic.Load(&fb.cnt))
        scanblock(uintptr(unsafe.Pointer(&fb.fin[0])), cnt*unsafe.Sizeof(fb.fin[0]), &finptrmask[0], gcw)
    }
// Free the stacks of dead Gs
case i == fixedRootFreeGStacks:
    // Only do this once per GC cycle; preferably
    // concurrently.
    if !work.markrootDone {
        // Switch to the system stack so we can call
        // stackfree.
        systemstack(markrootFreeGStacks)
    }
// Scan MSpan.specials
case baseSpans <= i && i < baseStacks:
    // mark MSpan.specials
    markrootSpans(gcw, int(i-baseSpans))

default:
    // the rest is scanning goroutine stacks
    // Get the G whose stack is to be scanned
    var gp *g
    if baseStacks <= i && i < end {
        gp = allgs[i-baseStacks]
    } else {
        throw("markroot: bad index")
    }

    // remember when we've first observed the G blocked
    // needed only to output in traceback
    status := readgstatus(gp) // We are not in a scan state
    if (status == _Gwaiting || status == _Gsyscall) && gp.waitsince == 0 {
        gp.waitsince = work.tstart
    }

    // scang must be done on the system stack in case
    // we're trying to scan our own stack.
    // Hand the scan over to g0
    systemstack(func() {
        // If this is a self-scan, put the user G in
        // _Gwaiting to prevent self-deadlock. It may
        // already be in _Gwaiting if this is a mark
        // worker or we're in mark termination.
        userG := getg().m.curg
        selfScan := gp == userG && readgstatus(userG) == _Grunning
        // If we are scanning our own stack, change our own G's status
        if selfScan {
            casgstatus(userG, _Grunning, _Gwaiting)
            userG.waitreason = waitReasonGarbageCollectionScan
        }

        // TODO: scang blocks until gp's stack has
        // been scanned, which may take a while for
        // running goroutines. Consider doing this in
        // two phases where the first is non-blocking:
        // we scan the stacks we can and ask running
        // goroutines to scan themselves; and the
        // second blocks.
        // Scan the G's stack
        scang(gp, gcw)

        if selfScan {
            casgstatus(userG, _Gwaiting, _Grunning)
        }
    })
}

}
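markroot does not receive a job type; it derives one from the index i by stacking the per-category counts into running "base" offsets. A sketch of that dispatch, assuming hypothetical counts (in the runtime they come from gcMarkRootPrepare, and the two fixed roots are the finalizer and free-G-stack jobs seen above); rootCounts and classify are illustrative names.

```go
package main

import "fmt"

// rootCounts holds hypothetical per-category job counts.
type rootCounts struct {
	fixed, flushCache, data, bss, spans, stacks uint32
}

// classify maps a root job index to its category and the index within that
// category, using the same running base offsets as markroot.
func classify(c rootCounts, i uint32) (string, uint32) {
	baseFlushCache := c.fixed
	baseData := baseFlushCache + c.flushCache
	baseBSS := baseData + c.data
	baseSpans := baseBSS + c.bss
	baseStacks := baseSpans + c.spans
	end := baseStacks + c.stacks

	switch {
	case i < baseFlushCache:
		return "fixed", i // finalizers, free G stacks
	case i < baseData:
		return "flushCache", i - baseFlushCache
	case i < baseBSS:
		return "data", i - baseData
	case i < baseSpans:
		return "bss", i - baseBSS
	case i < baseStacks:
		return "spans", i - baseSpans
	case i < end:
		return "stack", i - baseStacks
	}
	return "bad index", i
}

func main() {
	c := rootCounts{fixed: 2, flushCache: 4, data: 3, bss: 2, spans: 5, stacks: 10}
	for _, i := range []uint32{0, 2, 6, 9, 11, 16, 25, 26} {
		kind, idx := classify(c, i)
		fmt.Println(i, kind, idx)
	}
}
```

Because every job is just an integer, workers can claim them with a single atomic counter, as gcDrain does.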
markrootBlock
Scans one shard (rootBlockBytes long) of the region [b0, b0+n0), using the pointer bitmap ptrmask0 to find the pointers.

func markrootBlock(b0, n0 uintptr, ptrmask0 *uint8, gcw *gcWork, shard int) {
if rootBlockBytes%(8*sys.PtrSize) != 0 {
    // This is necessary to pick byte offsets in ptrmask0.
    throw("rootBlockBytes must be a multiple of 8*ptrSize")
}

b := b0 + uintptr(shard)*rootBlockBytes
// If this shard starts beyond b0+n0, return immediately
if b >= b0+n0 {
    return
}
ptrmask := (*uint8)(add(unsafe.Pointer(ptrmask0), uintptr(shard)*(rootBlockBytes/(8*sys.PtrSize))))
n := uintptr(rootBlockBytes)
if b+n > b0+n0 {
    n = b0 + n0 - b
}

// Scan this shard.
// Scan this shard of the block
scanblock(b, n, ptrmask, gcw)

}
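The shard arithmetic in markrootBlock is easy to check in isolation. A sketch assuming a 64-bit platform and a shard size of 256 KiB (our understanding of rootBlockBytes in the Go 1.10/1.11 runtime); shardRange is a hypothetical helper that returns the shard's start, its length and the byte offset of its slice of the pointer mask.

```go
package main

import "fmt"

const ptrSize = 8 // assume a 64-bit platform

// shardRange follows markrootBlock: shard k covers blockBytes bytes starting
// at b0+k*blockBytes, the last shard may be short, and one ptrmask bit per
// word (8 words per mask byte) gives the mask offset.
func shardRange(b0, n0, blockBytes uintptr, shard int) (start, length, maskOff uintptr, ok bool) {
	if blockBytes%(8*ptrSize) != 0 {
		panic("blockBytes must be a multiple of 8*ptrSize")
	}
	start = b0 + uintptr(shard)*blockBytes
	if start >= b0+n0 {
		return 0, 0, 0, false // this shard lies past the end of the region
	}
	length = blockBytes
	if start+length > b0+n0 {
		length = b0 + n0 - start // clip the final shard
	}
	maskOff = uintptr(shard) * (blockBytes / (8 * ptrSize))
	return start, length, maskOff, true
}

func main() {
	const blockBytes = 256 << 10 // assumed rootBlockBytes
	b0, n0 := uintptr(0x1000), uintptr(600<<10)
	for shard := 0; shard < 4; shard++ {
		s, n, m, ok := shardRange(b0, n0, blockBytes, shard)
		if !ok {
			fmt.Println("shard", shard, "is past the end of the region")
			continue
		}
		fmt.Println("shard", shard, "offset", s-b0, "bytes", n, "mask offset", m)
	}
}
```

Requiring the shard size to be a multiple of 8*ptrSize ensures every shard starts on a whole byte of the pointer mask.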
scanblock
func scanblock(b0, n0 uintptr, ptrmask *uint8, gcw *gcWork) {
// Use local copies of original parameters, so that a stack trace
// due to one of the throws below shows the original block
// base and extent.
b := b0
n := n0

for i := uintptr(0); i < n; {
    // Find bits for the next word.
    // Find the corresponding bits in the bitmap
    bits := uint32(*addb(ptrmask, i/(sys.PtrSize*8)))
    if bits == 0 {
        i += sys.PtrSize * 8
        continue
    }
    for j := 0; j < 8 && i < n; j++ {
        if bits&1 != 0 {
            // This word holds a pointer
            // Same work as in scanobject; see comments there.
            obj := *(*uintptr)(unsafe.Pointer(b + i))
            if obj != 0 {
                // If a heap object is found at this address, grey it
                if obj, span, objIndex := findObject(obj, b, i); obj != 0 {
                    greyobject(obj, b, i, span, gcw, objIndex)
                }
            }
        }
        bits >>= 1
        i += sys.PtrSize
    }
}

}
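scanblock reads one mask bit per pointer-sized word, 8 words per mask byte, and skips 8 words at once when a mask byte is zero. A sketch of that walk over a simulated block, assuming 64-bit words; pointerWords is a hypothetical helper that returns the byte offsets scanblock would hand to findObject.

```go
package main

import "fmt"

const ptrSize = 8 // assume 64-bit words

// pointerWords walks len(words)*ptrSize bytes the way scanblock does and
// returns the offsets of words that the mask marks as pointers and that
// hold a non-zero value.
func pointerWords(words []uintptr, mask []uint8) []uintptr {
	n := uintptr(len(words)) * ptrSize
	var found []uintptr
	for i := uintptr(0); i < n; {
		// One mask byte covers the next 8 words.
		bits := uint32(mask[i/(ptrSize*8)])
		if bits == 0 {
			i += ptrSize * 8 // no pointers in these 8 words: skip them all
			continue
		}
		for j := 0; j < 8 && i < n; j++ {
			if bits&1 != 0 && words[i/ptrSize] != 0 {
				found = append(found, i) // scanblock calls findObject/greyobject here
			}
			bits >>= 1
			i += ptrSize
		}
	}
	return found
}

func main() {
	// Ten words; the mask marks words 1, 3 and 9 as pointer slots, and word 3 is nil.
	words := []uintptr{0, 0x1000, 7, 0, 0, 0, 0, 0, 0, 0x2000}
	mask := []uint8{0x0a, 0x02}
	fmt.Println(pointerWords(words, mask)) // [8 72]
}
```

Word 2 holds the non-zero value 7, but its mask bit is clear, so it is treated as a scalar and ignored.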
greyobject
Greying an object really means finding its mark bit in the bitmap, setting it, and putting the object on the work queue.

func greyobject(obj, base, off uintptr, span *mspan, gcw *gcWork, objIndex uintptr) {
// obj should be start of allocation, and so must be at least pointer-aligned.
if obj&(sys.PtrSize-1) != 0 {
    throw("greyobject: obj not pointer-aligned")
}
mbits := span.markBitsForIndex(objIndex)

if useCheckmark {
    // Debug-only path: make sure every object has been marked correctly
    if !mbits.isMarked() {
        // This object was not marked
        printlock()
        print("runtime:greyobject: checkmarks finds unexpected unmarked object obj=", hex(obj), "\n")
        print("runtime: found obj at *(", hex(base), "+", hex(off), ")\n")

        // Dump the source (base) object
        gcDumpObject("base", base, off)

        // Dump the object
        gcDumpObject("obj", obj, ^uintptr(0))

        getg().m.traceback = 2
        throw("checkmark found unmarked object")
    }
    hbits := heapBitsForAddr(obj)
    if hbits.isCheckmarked(span.elemsize) {
        return
    }
    hbits.setCheckmarked(span.elemsize)
    if !hbits.isCheckmarked(span.elemsize) {
        throw("setCheckmarked and isCheckmarked disagree")
    }
} else {
    if debug.gccheckmark > 0 && span.isFree(objIndex) {
        print("runtime: marking free object ", hex(obj), " found at *(", hex(base), "+", hex(off), ")\n")
        gcDumpObject("base", base, off)
        gcDumpObject("obj", obj, ^uintptr(0))
        getg().m.traceback = 2
        throw("marking free object")
    }

    // If marked we have nothing to do.
    // The object is already marked; nothing else to do
    if mbits.isMarked() {
        return
    }
    // mbits.setMarked() // Avoid extra call overhead with manual inlining.
    // Mark the object
    atomic.Or8(mbits.bytep, mbits.mask)
    // If this is a noscan object, fast-track it to black
    // instead of greying it.
    // If the object contains no pointers (noscan), it only needs marking, not queueing: this effectively blackens it directly
    if span.spanclass.noscan() {
        gcw.bytesMarked += uint64(span.elemsize)
        return
    }
}

// Queue the obj for scanning. The PREFETCH(obj) logic has been removed but
// seems like a nice optimization that can be added back in.
// There needs to be time between the PREFETCH and the use.
// Previously we put the obj in an 8 element buffer that is drained at a rate
// to give the PREFETCH time to do its work.
// Use of PREFETCHNTA might be more appropriate than PREFETCH
// Queue the object, trying the fast path putFast first and falling back to put; once queued, the object is grey
if !gcw.putFast(obj) {
    gcw.put(obj)
}

}
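The essential decisions in greyobject (skip if already marked, set the mark bit atomically, send noscan objects straight to black, queue everything else as grey) can be sketched with a simplified span. This is a sketch under assumptions: span, workQueue and greyObject are hypothetical, mark bits are packed 32 per uint32 word, and a compare-and-swap loop from sync/atomic replaces the runtime's atomic.Or8 on a mark byte.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// span is a hypothetical simplified span: one mark bit per object and a
// noscan flag meaning its objects contain no pointers.
type span struct {
	elemSize uintptr
	noscan   bool
	markBits []uint32 // one bit per object, 32 objects per word
}

type workQueue struct {
	objs        []uintptr
	bytesMarked uint64
}

// greyObject marks the object with index objIndex in s and decides its color.
func greyObject(s *span, objIndex int, obj uintptr, q *workQueue) {
	word, bit := &s.markBits[objIndex/32], uint32(1)<<uint(objIndex%32)
	for {
		old := atomic.LoadUint32(word)
		if old&bit != 0 {
			return // already marked: nothing to do
		}
		if atomic.CompareAndSwapUint32(word, old, old|bit) {
			break // we set the mark bit
		}
	}
	if s.noscan {
		q.bytesMarked += uint64(s.elemSize) // marked but never queued: black
		return
	}
	q.objs = append(q.objs, obj) // marked and queued: grey
}

func main() {
	scan := &span{elemSize: 32, markBits: make([]uint32, 1)}
	noscan := &span{elemSize: 64, noscan: true, markBits: make([]uint32, 1)}
	var q workQueue
	greyObject(scan, 3, 0x1000, &q)
	greyObject(scan, 3, 0x1000, &q) // second call is a no-op
	greyObject(noscan, 0, 0x2000, &q)
	fmt.Println(len(q.objs), q.bytesMarked) // 1 64
}
```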
gcWork.putFast
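gcWork has two buffers, wbuf1 and wbuf2, that hold grey objects. Grey objects are added to wbuf1 first; when wbuf1 is full, wbuf1 and wbuf2 swap roles, so the old wbuf2 becomes wbuf1 and keeps receiving grey objects. When both are full, a full buffer is handed to the global queue and a new empty one is requested.

putFast only tries to put the object into wbuf1.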

func (w *gcWork) putFast(obj uintptr) bool {
    wbuf := w.wbuf1
    if wbuf == nil {
        // No buffer has been allocated yet: return false
        return false
    } else if wbuf.nobj == len(wbuf.obj) {
        // wbuf1 is full: return false
        return false
    }

    // wbuf1 has room: add the object
    wbuf.obj[wbuf.nobj] = obj
    wbuf.nobj++
    return true
}
gcWork.put
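put not only tries to put the object into wbuf1: when wbuf1 is full it swaps the roles of wbuf1 and wbuf2, and when both are full it hands the full buffer to the global queue and requests an empty one.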

func (w *gcWork) put(obj uintptr) {
    flushed := false
    wbuf := w.wbuf1
    if wbuf == nil {
        // wbuf1 does not exist yet: initialize both wbuf1 and wbuf2
        w.init()
        wbuf = w.wbuf1
        // wbuf is empty at this point.
    } else if wbuf.nobj == len(wbuf.obj) {
        // wbuf1 is full: swap the roles of wbuf1 and wbuf2
        w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
        wbuf = w.wbuf1
        if wbuf.nobj == len(wbuf.obj) {
            // After the swap wbuf1 is still full, so both buffers are full:
            // hand wbuf1 to the global full list and get an empty buffer
            putfull(wbuf)
            wbuf = getempty()
            w.wbuf1 = wbuf
            // Record that a buffer was flushed
            flushed = true
        }
    }

    wbuf.obj[wbuf.nobj] = obj
    wbuf.nobj++

    // If we put a buffer on full, let the GC controller know so
    // it can encourage more workers to run. We delay this until
    // the end of put so that w is in a consistent state, since
    // enlistWorker may itself manipulate w.
    // The global list now has a full buffer, so the GC controller may schedule more workers
    if flushed && gcphase == _GCmark {
        gcController.enlistWorker()
    }
}
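The buffer juggling in put is easier to follow with a tiny capacity. A sketch under assumptions: workbuf holds only 4 objects, and a slice on gcWork stands in for the global full list (putfull/getempty); gcWork, workbuf and put here are simplified models, not the runtime types.

```go
package main

import "fmt"

const bufCap = 4 // tiny capacity for illustration; real workbufs hold far more

type workbuf struct {
	nobj int
	obj  [bufCap]uintptr
}

// gcWork models the per-P cache with two buffers; full stands in for the
// global list of full buffers.
type gcWork struct {
	wbuf1, wbuf2 *workbuf
	full         []*workbuf
}

// put follows gcWork.put and reports whether a buffer was flushed to the
// full list (the point where the runtime calls gcController.enlistWorker).
func (w *gcWork) put(obj uintptr) (flushed bool) {
	wbuf := w.wbuf1
	if wbuf == nil {
		// Like w.init(): allocate both buffers.
		w.wbuf1, w.wbuf2 = &workbuf{}, &workbuf{}
		wbuf = w.wbuf1
	} else if wbuf.nobj == len(wbuf.obj) {
		// wbuf1 is full: swap the roles of wbuf1 and wbuf2.
		w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
		wbuf = w.wbuf1
		if wbuf.nobj == len(wbuf.obj) {
			// Both full: publish wbuf1 (putfull) and start a fresh one (getempty).
			w.full = append(w.full, wbuf)
			wbuf = &workbuf{}
			w.wbuf1 = wbuf
			flushed = true
		}
	}
	wbuf.obj[wbuf.nobj] = obj
	wbuf.nobj++
	return flushed
}

func main() {
	w := &gcWork{}
	for obj := uintptr(1); obj <= 9; obj++ {
		if w.put(obj) {
			fmt.Println("flushed a full buffer while putting object", obj)
		}
	}
	fmt.Println(len(w.full), w.wbuf1.nobj, w.wbuf2.nobj) // 1 1 4
}
```

Keeping a second local buffer means a worker that alternates between producing and consuming around a buffer boundary does not hit the global list on every call.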
That covers greying. Next we continue with the functions called inside gcDrain and trace how the objects we greyed become black.

gcw.balance()
Back in gcDrain, let's look at the gcw.balance() call: what does balancing the work do?

func (w *gcWork) balance() {
    if w.wbuf1 == nil {
        // wbuf1 and wbuf2 have not been initialized yet
        return
    }
    // If wbuf2 is not empty, hand it to the global full list and get an empty buffer for wbuf2
    if wbuf := w.wbuf2; wbuf.nobj != 0 {
        putfull(wbuf)
        w.wbuf2 = getempty()
    } else if wbuf := w.wbuf1; wbuf.nobj > 4 {
        // Split wbuf1 in half and hand one half to the global queue
        w.wbuf1 = handoff(wbuf)
    } else {
        return
    }
    // We flushed a buffer to the full list, so wake a worker.
    // The global queue now has a full buffer, so other workers can take work from it
    if gcphase == _GCmark {
        gcController.enlistWorker()
    }
}
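The handoff used by balance keeps half of wbuf1 locally and publishes the other half where idle workers can take it. A sketch under assumptions: workbuf is slice-based, a slice on gcWork stands in for the global full list, and the split follows our reading of the runtime's handoff (the buffer placed on the full list keeps the lower half; a new buffer takes the upper half and becomes wbuf1).

```go
package main

import "fmt"

// workbuf is a simplified work buffer; len(obj) plays the role of nobj.
type workbuf struct {
	obj []uintptr
}

// gcWork models the two local buffers; full stands in for the global full list.
type gcWork struct {
	wbuf1, wbuf2 *workbuf
	full         []*workbuf
}

// handoff moves the upper half of b into a new buffer, publishes b (keeping
// the lower half) on the full list, and returns the new buffer.
func (w *gcWork) handoff(b *workbuf) *workbuf {
	n := len(b.obj) / 2
	keep := len(b.obj) - n
	b1 := &workbuf{obj: append([]uintptr(nil), b.obj[keep:]...)}
	b.obj = b.obj[:keep]
	w.full = append(w.full, b)
	return b1
}

// balance publishes a non-empty wbuf2 whole, otherwise splits wbuf1 when it
// holds more than 4 objects. It reports whether any work was published.
func (w *gcWork) balance() bool {
	if w.wbuf1 == nil {
		return false
	}
	if wbuf := w.wbuf2; len(wbuf.obj) != 0 {
		w.full = append(w.full, wbuf)
		w.wbuf2 = &workbuf{}
	} else if wbuf := w.wbuf1; len(wbuf.obj) > 4 {
		w.wbuf1 = w.handoff(wbuf)
	} else {
		return false
	}
	return true
}

func main() {
	w := &gcWork{
		wbuf1: &workbuf{obj: []uintptr{1, 2, 3, 4, 5, 6, 7}},
		wbuf2: &workbuf{},
	}
	moved := w.balance()
	fmt.Println(moved, w.wbuf1.obj, w.full[0].obj) // true [5 6 7] [1 2 3 4]
}
```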
gcw.get()
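Back in gcDrain, the gcw.get() call first tries to take an object from the local buffers: if wbuf1 is empty it tries wbuf2, and if both are empty it tries to take a full buffer from the global queue and take an object from that.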

func (w *gcWork) get() uintptr {
    wbuf := w.wbuf1
    if wbuf == nil {
        w.init()
        wbuf = w.wbuf1
        // wbuf is empty at this point.
    }
    if wbuf.nobj == 0 {
        // wbuf1 is empty: swap the roles of wbuf1 and wbuf2
        w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
        wbuf = w.wbuf1
        // The old wbuf2 is empty too: try to get a full buffer from the global queue
        if wbuf.nobj == 0 {
            owbuf := wbuf
            wbuf = getfull()
            // None available: return 0
            if wbuf == nil {
                return 0
            }
            // Return the empty buffer to the global empty list and use the full buffer as wbuf1
            putempty(owbuf)
            w.wbuf1 = wbuf
        }
    }

    // TODO: This might be a good place to add prefetch code

    wbuf.nobj--
    return wbuf.obj[wbuf.nobj]
}
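The consuming side mirrors put. A sketch under assumptions: gcWork, workbuf and get are simplified models, slices stand in for the global full and empty lists, and unlike the runtime's getfull this version never waits for other workers; it simply returns 0 when no work is left.

```go
package main

import "fmt"

type workbuf struct {
	nobj int
	obj  [4]uintptr
}

// gcWork models two local buffers plus the global full and empty lists.
type gcWork struct {
	wbuf1, wbuf2 *workbuf
	full, empty  []*workbuf
}

// get pops from wbuf1; if it is empty, swap with wbuf2; if that is empty
// too, return it to the empty list and take a full buffer from the global list.
func (w *gcWork) get() uintptr {
	wbuf := w.wbuf1
	if wbuf == nil {
		w.wbuf1, w.wbuf2 = &workbuf{}, &workbuf{}
		wbuf = w.wbuf1
	}
	if wbuf.nobj == 0 {
		w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
		wbuf = w.wbuf1
		if wbuf.nobj == 0 {
			if len(w.full) == 0 {
				return 0 // no work anywhere
			}
			owbuf := wbuf
			wbuf = w.full[len(w.full)-1]
			w.full = w.full[:len(w.full)-1]
			w.empty = append(w.empty, owbuf) // like putempty(owbuf)
			w.wbuf1 = wbuf
		}
	}
	wbuf.nobj--
	return wbuf.obj[wbuf.nobj]
}

func main() {
	w := &gcWork{
		wbuf1: &workbuf{nobj: 1, obj: [4]uintptr{10}},
		wbuf2: &workbuf{nobj: 2, obj: [4]uintptr{20, 21}},
		full:  []*workbuf{{nobj: 1, obj: [4]uintptr{30}}},
	}
	for obj := w.get(); obj != 0; obj = w.get() {
		fmt.Print(obj, " ")
	}
	fmt.Println() // prints: 10 21 20 30
}
```

Objects come out last-in, first-out within a buffer, which tends to keep recently greyed (and likely cache-warm) objects close together.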
gcw.tryGet() and gcw.tryGetFast() follow much the same logic and are fairly simple, so we won't analyze them further.

scanobject
Continuing with the scanobject call in gcDrain: at this point we have obtained an object b from the queue and start consuming it.

func scanobject(b uintptr, gcw *gcWork) {
// Find the bits for b and the size of the object at b.
//
// b is either the beginning of an object, in which case this
// is the size of the object to scan, or it points to an
// oblet, in which case we compute the size to scan below.
// Get the heap bits for b
hbits := heapBitsForAddr(b)
// Get the span that contains b
s := spanOfUnchecked(b)
n := s.elemsize
if n == 0 {
    throw("scanobject n == 0")
}
// If the object is too large, split it into oblets before scanning; maxObletBytes is 128 KB
if n > maxObletBytes {
    // Large object. Break into oblets for better
    // parallelism and lower latency.
    if b == s.base() {
        // It's possible this is a noscan object (not
        // from greyobject, but from other code
        // paths), in which case we must not enqueue
        // oblets since their bitmaps will be
        // uninitialized.
        // If the object contains no pointers, just account for it and return: it is effectively black
        if s.spanclass.noscan() {
            // Bypass the whole scan.
            gcw.bytesMarked += uint64(n)
            return
        }

        // Enqueue the other oblets to scan later.
        // Some oblets may be in b's scalar tail, but
        // these will be marked as "no more pointers",
        // so we'll drop out immediately when we go to
        // scan those.
        // Split the rest of the object into maxObletBytes pieces and queue them
        for oblet := b + maxObletBytes; oblet < s.base()+s.elemsize; oblet += maxObletBytes {
            if !gcw.putFast(oblet) {
                gcw.put(oblet)
            }
        }
    }

    // Compute the size of the oblet. Since this object
    // must be a large object, s.base() is the beginning
    // of the object.
    n = s.base() + s.elemsize - b
    if n > maxObletBytes {
        n = maxObletBytes
    }
}

var i uintptr
for i = 0; i < n; i += sys.PtrSize {
    // Find bits for this word.
    // Get the bits for this word
    if i != 0 {
        // Avoid needless hbits.next() on last iteration.
        hbits = hbits.next()
    }
    // Load bits once. See CL 22712 and issue 16973 for discussion.
    bits := hbits.bits()
    // During checkmarking, 1-word objects store the checkmark
    // in the type bit for the one word. The only one-word objects
    // are pointers, or else they'd be merged with other non-pointer
    // data into larger allocations.
    if i != 1*sys.PtrSize && bits&bitScan == 0 {
        break // no more pointers in this object
    }
    // Not a pointer: continue
    if bits&bitPointer == 0 {
        continue // not a pointer
    }

    // Work here is duplicated in scanblock and above.
    // If you make changes here, make changes there too.
    obj := *(*uintptr)(unsafe.Pointer(b + i))

    // At this point we have extracted the next potential pointer.
    // Quickly filter out nil and pointers back to the current object.
    if obj != 0 && obj-b >= n {
        // Test if obj points into the Go heap and, if so,
        // mark the object.
        //
        // Note that it's possible for findObject to
        // fail if obj points to a just-allocated heap
        // object because of a race with growing the
        // heap. In this case, we know the object was
        // just allocated and hence will be marked by
        // allocation itself.
        // Find the object the pointer refers to and grey it
        if obj, span, objIndex := findObject(obj, b, i); obj != 0 {
            greyobject(obj, b, i, span, gcw, objIndex)
        }
    }
}
gcw.bytesMarked += uint64(n)
gcw.scanWork += int64(i)

}
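The oblet logic in scanobject only decides two things: which oblets to queue (only when b is the start of the object) and how many bytes to scan from b. A sketch of that arithmetic, assuming maxObletBytes is 128 KiB as stated above and leaving out the noscan shortcut; obletPlan is a hypothetical helper.

```go
package main

import "fmt"

const maxObletBytes = 128 << 10 // 128 KiB

// obletPlan returns the oblets scanobject would queue when scanning b inside
// an object of elemSize bytes starting at base, and the bytes scanned from b.
func obletPlan(base, elemSize, b uintptr) (queued []uintptr, scan uintptr) {
	scan = elemSize
	if elemSize > maxObletBytes {
		if b == base {
			// Queue every later oblet; each is scanned as a separate work item.
			for oblet := b + maxObletBytes; oblet < base+elemSize; oblet += maxObletBytes {
				queued = append(queued, oblet)
			}
		}
		// Scan from b to the end of the object, capped at one oblet.
		scan = base + elemSize - b
		if scan > maxObletBytes {
			scan = maxObletBytes
		}
	}
	return queued, scan
}

func main() {
	base := uintptr(0x100000)
	size := uintptr(300 << 10) // a 300 KiB object
	oblets, n := obletPlan(base, size, base)
	fmt.Println(len(oblets), n) // 2 131072
	for _, o := range oblets {
		_, m := obletPlan(base, size, o)
		fmt.Println(o-base, m)
	}
}
```

Splitting large objects this way lets several workers scan one big array in parallel and bounds how long a single scanobject call can run.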
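In summary, greying an object means marking it and putting it on the queue, while blackening means it is marked and no longer on the queue. So once a grey object has been taken off the queue and scanned, it can be considered black.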

This completes the analysis of gcDrain's marking work; let's return to gcBgMarkWorker.

gcMarkDone
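gcMarkDone moves marking from the mark 1 phase to mark 2, and from mark 2 to mark termination.

mark 1: covers all root marking, plus the global work queue and the per-P local work caches

mark 2: the per-P local work caches are disabled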

func gcMarkDone() {
top:
semacquire(&work.markDoneSema)

// Re-check transition condition under transition lock.
if !(gcphase == _GCmark && work.nwait == work.nproc && !gcMarkWorkAvailable(nil)) {
    semrelease(&work.markDoneSema)
    return
}

// Disallow starting new workers so that any remaining workers
// in the current mark phase will drain out.
//
// TODO(austin): Should dedicated workers keep an eye on this
// and exit gcDrain promptly?
// Disallow new mark workers
atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, -0xffffffff)
prevFractionalGoal := gcController.fractionalUtilizationGoal
gcController.fractionalUtilizationGoal = 0

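// gcBlackenPromptly means all local work caches must be flushed to the global queue immediately and local caching disabled; if it is not set yet, enter mark 2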
if !gcBlackenPromptly {
    // Transition from mark 1 to mark 2.
    //
    // The global work list is empty, but there can still be work
    // sitting in the per-P work caches.
    // Flush and disable work caches.

    // Disallow caching workbufs and indicate that we're in mark 2.
    // Disable the local work caches and enter mark 2
    gcBlackenPromptly = true

    // Prevent completion of mark 2 until we've flushed
    // cached workbufs.
    atomic.Xadd(&work.nwait, -1)

    // GC is set up for mark 2. Let Gs blocked on the
    // transition lock go while we flush caches.
    semrelease(&work.markDoneSema)
    // Switch to g0 to flush the local caches to the global queue
    systemstack(func() {
        // Flush all currently cached workbufs and
        // ensure all Ps see gcBlackenPromptly. This
        // also blocks until any remaining mark 1
        // workers have exited their loop so we can
        // start new mark 2 workers.
        forEachP(func(_p_ *p) {
            wbBufFlush1(_p_)
            _p_.gcw.dispose()
        })
    })

    // Check that roots are marked. We should be able to
    // do this before the forEachP, but based on issue
    // #16083 there may be a (harmless) race where we can
    // enter mark 2 while some workers are still scanning
    // stacks. The forEachP ensures these scans are done.
    //
    // TODO(austin): Figure out the race and fix this
    // properly.
    // Check that all roots have been marked
    gcMarkRootCheck()

    // Now we can start up mark 2 workers.
    atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, 0xffffffff)
    gcController.fractionalUtilizationGoal = prevFractionalGoal

    incnwait := atomic.Xadd(&work.nwait, +1)
    // If there is no more work, go round again to transition from mark 2 to mark termination
    if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
        // This loop will make progress because
        // gcBlackenPromptly is now true, so it won't
        // take this same "if" branch.
        goto top
    }
} else {
    // Transition to mark termination.
    now := nanotime()
    work.tMarkTerm = now
    work.pauseStart = now
    getg().m.preemptoff = "gcing"
    if trace.enabled {
        traceGCSTWStart(0)
    }
    systemstack(stopTheWorldWithSema)
    // The gcphase is _GCmark, it will transition to _GCmarktermination
    // below. The important thing is that the wb remains active until
    // all marking is complete. This includes writes made by the GC.

    // Record that one root marking pass has completed.
    work.markrootDone = true

    // Disable assists and background workers. We must do
    // this before waking blocked assists.
    atomic.Store(&gcBlackenEnabled, 0)

    // Wake all blocked assists. These will run when we
    // start the world again.
    // Wake all blocked mutator assists
    gcWakeAllAssists()

    // Likewise, release the transition lock. Blocked
    // workers and assists will run when we start the
    // world again.
    semrelease(&work.markDoneSema)

    // endCycle depends on all gcWork cache stats being
    // flushed. This is ensured by mark 2.
    // Compute the trigger ratio for the next GC
    nextTriggerRatio := gcController.endCycle()

    // Perform mark termination. This will restart the world.
    // Enter mark termination, which finishes the cycle and starts the world again
    gcMarkTermination(nextTriggerRatio)
}

}
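The check at the top of gcMarkDone is a termination-detection test. Marking can only move on when every worker is idle (`work.nwait == work.nproc`) and no work is left. The check is repeated under `markDoneSema` so that only one caller performs the transition. The minimal sketch below shows that idea. `tracker` and its fields are hypothetical: a mutex stands in for the semaphore, and a counter stands in for `gcMarkWorkAvailable`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// tracker is a hypothetical stand-in for the state gcMarkDone inspects.
type tracker struct {
	nproc  int32      // like work.nproc: number of mark workers
	nwait  int32      // like work.nwait: workers that have gone idle
	work   int32      // pending work; stands in for gcMarkWorkAvailable
	doneMu sync.Mutex // stands in for work.markDoneSema
	done   bool       // the transition has already been made
}

// markDone re-checks the transition condition under the lock, as gcMarkDone
// does, so that only one idle worker performs the transition.
func (tr *tracker) markDone() bool {
	tr.doneMu.Lock()
	defer tr.doneMu.Unlock()
	if tr.done || atomic.LoadInt32(&tr.nwait) != tr.nproc || atomic.LoadInt32(&tr.work) != 0 {
		return false
	}
	tr.done = true // the runtime would move to mark 2 or mark termination here
	return true
}

func main() {
	tr := &tracker{nproc: 4, work: 100}
	var transitions int32
	var wg sync.WaitGroup
	for i := int32(0); i < tr.nproc; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Drain work; this toy never creates new work while draining.
			for {
				n := atomic.LoadInt32(&tr.work)
				if n == 0 {
					break
				}
				atomic.CompareAndSwapInt32(&tr.work, n, n-1)
			}
			atomic.AddInt32(&tr.nwait, 1) // this worker is now idle
			if tr.markDone() {
				atomic.AddInt32(&transitions, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(atomic.LoadInt32(&transitions)) // 1
}
```

Exactly one transition happens. The last worker to go idle always sees `nwait == nproc`, and the `done` flag, checked under the lock, stops any other worker from repeating the transition.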
gcMarkTermination
Finishes marking, then starts sweeping and the related bookkeeping

func gcMarkTermination(nextTriggerRatio float64) {
// World is stopped.
// Start marktermination which includes enabling the write barrier.
atomic.Store(&gcBlackenEnabled, 0)
gcBlackenPromptly = false
// Set the GC phase
setGCPhase(_GCmarktermination)

work.heap1 = memstats.heap_live
startTime := nanotime()

mp := acquirem()
mp.preemptoff = "gcing"
_g_ := getg()
_g_.m.traceback = 2
gp := _g_.m.curg
// Put the current g into the waiting state
casgstatus(gp, _Grunning, _Gwaiting)
gp.waitreason = waitReasonGarbageCollection

// Run gc on the g0 stack. We do this so that the g stack
// we're currently running on will no longer change. Cuts
// the root set down a bit (g0 stacks are not scanned, and
// we don't need to scan gc's internal state).  We also
// need to switch to g0 so we can shrink the stack.
systemstack(func() {
    // Finish marking on g0, which keeps the current g's stack from changing while it is scanned
    gcMark(startTime)
    // Must return immediately.
    // The outer function's stack may have moved
    // during gcMark (it shrinks stacks, including the
    // outer function's stack), so we must not refer
    // to any of its variables. Return back to the
    // non-system stack to pick up the new addresses
    // before continuing.
})

systemstack(func() {
    work.heap2 = work.bytesMarked
    if debug.gccheckmark > 0 {
        // Run a full stop-the-world mark using checkmark bits,
        // to check that we didn't forget to mark anything during
        // the concurrent mark process.
        // If gccheckmark is enabled, verify that every reachable object was marked
        gcResetMarkState()
        initCheckmarks()
        gcMark(startTime)
        clearCheckmarks()
    }

    // marking is complete so we can turn the write barrier off
    // Set the GC phase; in _GCoff the write barrier is disabled
    setGCPhase(_GCoff)
    // Start sweeping
    gcSweep(work.mode)

    if debug.gctrace > 1 {
        startTime = nanotime()
        // The g stacks have been scanned so
        // they have gcscanvalid==true and gcworkdone==true.
        // Reset these so that all stacks will be rescanned.
        gcResetMarkState()
        finishsweep_m()

        // Still in STW but gcphase is _GCoff, reset to _GCmarktermination
        // At this point all objects will be found during the gcMark which
        // does a complete STW mark and object scan.
        setGCPhase(_GCmarktermination)
        gcMark(startTime)
        setGCPhase(_GCoff) // marking is done, turn off wb.
        gcSweep(work.mode)
    }
})

_g_.m.traceback = 0
casgstatus(gp, _Gwaiting, _Grunning)

if trace.enabled {
    traceGCDone()
}

// all done
mp.preemptoff = ""

if gcphase != _GCoff {
    throw("gc done but gcphase != _GCoff")
}

// Update GC trigger and pacing for the next cycle.
// Update the heap-growth ratio that triggers the next GC
gcSetTriggerRatio(nextTriggerRatio)

// Update timing memstats
// Update timing statistics
now := nanotime()
sec, nsec, _ := time_now()
unixNow := sec*1e9 + int64(nsec)
work.pauseNS += now - work.pauseStart
work.tEnd = now
atomic.Store64(&memstats.last_gc_unix, uint64(unixNow)) // must be Unix time to make sense to user
atomic.Store64(&memstats.last_gc_nanotime, uint64(now)) // monotonic time for us
memstats.pause_ns[memstats.numgc%uint32(len(memstats.pause_ns))] = uint64(work.pauseNS)
memstats.pause_end[memstats.numgc%uint32(len(memstats.pause_end))] = uint64(unixNow)
memstats.pause_total_ns += uint64(work.pauseNS)

// Update work.totaltime.
sweepTermCpu := int64(work.stwprocs) * (work.tMark - work.tSweepTerm)
// We report idle marking time below, but omit it from the
// overall utilization here since it's "free".
markCpu := gcController.assistTime + gcController.dedicatedMarkTime + gcController.fractionalMarkTime
markTermCpu := int64(work.stwprocs) * (work.tEnd - work.tMarkTerm)
cycleCpu := sweepTermCpu + markCpu + markTermCpu
work.totaltime += cycleCpu

// Compute overall GC CPU utilization.
totalCpu := sched.totaltime + (now-sched.procresizetime)*int64(gomaxprocs)
memstats.gc_cpu_fraction = float64(work.totaltime) / float64(totalCpu)

// Reset sweep state.
// Reset the sweep state
sweep.nbgsweep = 0
sweep.npausesweep = 0

// If this GC was forced by the user, bump the forced-GC counter
if work.userForced {
    memstats.numforcedgc++
}

// Bump GC cycle count and wake goroutines waiting on sweep.
// Count the completed GC, then wake the Gs waiting for sweep
lock(&work.sweepWaiters.lock)
memstats.numgc++
injectglist(work.sweepWaiters.head.ptr())
work.sweepWaiters.head = 0
unlock(&work.sweepWaiters.lock)

// Finish the current heap profiling cycle and start a new
// heap profiling cycle. We do this before starting the world
// so events don't leak into the wrong cycle.
mProf_NextCycle()
// start the world
systemstack(func() { startTheWorldWithSema(true) })

// Flush the heap profile so we can start a new cycle next GC.
// This is relatively expensive, so we don't do it with the
// world stopped.
mProf_Flush()

// Prepare workbufs for freeing by the sweeper. We do this
// asynchronously because it can take non-trivial time.
prepareFreeWorkbufs()

// Free stack spans. This must be done between GC cycles.
systemstack(freeStackSpans)

// Print gctrace before dropping worldsema. As soon as we drop
// worldsema another cycle could start and smash the stats
// we're trying to print.
if debug.gctrace > 0 {
    util := int(memstats.gc_cpu_fraction * 100)

    var sbuf [24]byte
    printlock()
    print("gc ", memstats.numgc,
        " @", string(itoaDiv(sbuf[:], uint64(work.tSweepTerm-runtimeInitTime)/1e6, 3)), "s ",
        util, "%: ")
    prev := work.tSweepTerm
    for i, ns := range []int64{work.tMark, work.tMarkTerm, work.tEnd} {
        if i != 0 {
            print("+")
        }
        print(string(fmtNSAsMS(sbuf[:], uint64(ns-prev))))
        prev = ns
    }
    print(" ms clock, ")
    for i, ns := range []int64{sweepTermCpu, gcController.assistTime, gcController.dedicatedMarkTime + gcController.fractionalMarkTime, gcController.idleMarkTime, markTermCpu} {
        if i == 2 || i == 3 {
            // Separate mark time components with /.
            print("/")
        } else if i != 0 {
            print("+")
        }
        print(string(fmtNSAsMS(sbuf[:], uint64(ns))))
    }
    print(" ms cpu, ",
        work.heap0>>20, "->", work.heap1>>20, "->", work.heap2>>20, " MB, ",
        work.heapGoal>>20, " MB goal, ",
        work.maxprocs, " P")
    if work.userForced {
        print(" (forced)")
    }
    print("\n")
    printunlock()
}

semrelease(&worldsema)
// Careful: another GC cycle may start now.

releasem(mp)
mp = nil

// now that gc is done, kick off finalizer thread if needed
// If sweeping is not concurrent, call Gosched so the current M reschedules and queued finalizers get a chance to run
if !concurrentSweep {
    // give the queued finalizers, if any, a chance to run
    Gosched()
}

}
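With `GODEBUG=gctrace=1`, the print block near the end of gcMarkTermination emits one line per cycle. The sketch below parses that line into named fields, following the print statements quoted above:

- three wall-clock phase times
- heap0->heap1->heap2, where heap1 is `heap_live` at mark termination and heap2 is `bytesMarked`
- the heap goal and the P count

`gcTrace` and `parseGCTrace` are hypothetical helpers, and the sample line is made up for illustration. Newer Go releases print additional fields, so the pattern would need adjusting for them.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// gcTrace names the fields of one gctrace line, following the print
// statements in gcMarkTermination.
type gcTrace struct {
	Num          int     // memstats.numgc
	AtSec        float64 // seconds since runtimeInitTime
	CPUPercent   int     // gc_cpu_fraction * 100
	SweepTermMS  float64 // wall clock: tMark - tSweepTerm
	MarkMS       float64 // wall clock: tMarkTerm - tMark
	MarkTermMS   float64 // wall clock: tEnd - tMarkTerm
	HeapStartMB  uint64  // work.heap0
	HeapLiveMB   uint64  // work.heap1 (heap_live at mark termination)
	HeapMarkedMB uint64  // work.heap2 (bytesMarked)
	GoalMB       uint64  // work.heapGoal
	Procs        int     // work.maxprocs
	Forced       bool    // work.userForced
}

// The CPU section ("a+b/c/d+e ms cpu") is matched but not split further.
var traceRE = regexp.MustCompile(`^gc (\d+) @([\d.]+)s (\d+)%: ([\d.]+)\+([\d.]+)\+([\d.]+) ms clock, \S+ ms cpu, (\d+)->(\d+)->(\d+) MB, (\d+) MB goal, (\d+) P( \(forced\))?$`)

func parseGCTrace(line string) (gcTrace, error) {
	m := traceRE.FindStringSubmatch(line)
	if m == nil {
		return gcTrace{}, fmt.Errorf("not a gctrace line: %q", line)
	}
	var err error
	num := func(s string) float64 {
		v, e := strconv.ParseFloat(s, 64)
		if e != nil && err == nil {
			err = e // keep the first conversion error
		}
		return v
	}
	res := gcTrace{
		Num: int(num(m[1])), AtSec: num(m[2]), CPUPercent: int(num(m[3])),
		SweepTermMS: num(m[4]), MarkMS: num(m[5]), MarkTermMS: num(m[6]),
		HeapStartMB: uint64(num(m[7])), HeapLiveMB: uint64(num(m[8])),
		HeapMarkedMB: uint64(num(m[9])), GoalMB: uint64(num(m[10])),
		Procs: int(num(m[11])), Forced: m[12] != "",
	}
	return res, err
}

func main() {
	// Made-up sample line in the format printed above.
	line := "gc 3 @0.015s 2%: 0.012+1.2+0.031 ms clock, 0.049+0.35/1.0/2.1+0.12 ms cpu, 4->4->1 MB, 5 MB goal, 4 P"
	tr, err := parseGCTrace(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", tr)
}
```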
gcSweep
Starts the sweep phase: sweeps synchronously, or wakes the background sweeper

func gcSweep(mode gcMode) {
if gcphase != _GCoff {
    throw("gcSweep being done but phase is not GCoff")
}

lock(&mheap_.lock)
// sweepgen grows by 2 after every GC, so the two sweepSpans stacks swap roles every cycle
mheap_.sweepgen += 2
mheap_.sweepdone = 0
if mheap_.sweepSpans[mheap_.sweepgen/2%2].index != 0 {
    // We should have drained this list during the last
    // sweep phase. We certainly need to start this phase
    // with an empty swept list.
    throw("non-empty swept list")
}
mheap_.pagesSwept = 0
unlock(&mheap_.lock)
// If sweeping is not concurrent, or this is a forced blocking GC, sweep synchronously
if !_ConcurrentSweep || mode == gcForceBlockMode {
    // Special case synchronous sweep.
    // Record that no proportional sweeping has to happen.
    lock(&mheap_.lock)
    mheap_.sweepPagesPerByte = 0
    unlock(&mheap_.lock)
    // Sweep all spans eagerly.
    // Sweep all spans
    for sweepone() != ^uintptr(0) {
        sweep.npausesweep++
    }
    // Free workbufs eagerly.
    // Free all workbufs
    prepareFreeWorkbufs()
    for freeSomeWbufs(false) {
    }
    // All "free" events for this mark/sweep cycle have
    // now happened, so we can make this profile cycle
    // available immediately.
    mProf_NextCycle()
    mProf_Flush()
    return
}

// Background sweep.
lock(&sweep.lock)
// Wake the background sweeper (the bgsweep function); its sweep loop is much like the synchronous one above
if sweep.parked {
    sweep.parked = false
    ready(sweep.g, 0, true)
}
unlock(&sweep.lock)

}
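Because sweepgen grows by 2, the index `sweepgen/2%2` flips every cycle. As a result, last cycle's swept list becomes this cycle's unswept list: sweepone pops from `sweepSpans[1-sg/2%2]`, and still-in-use spans are pushed to `sweepSpans[sweepgen/2%2]`. The toy model below shows the swap. It uses a hypothetical `spanLists` type that holds span names and assumes every span stays in use.

```go
package main

import "fmt"

// spanLists models mheap_.sweepSpans: two stacks whose roles swap every
// cycle because sweepgen grows by 2 and the index is sweepgen/2%2.
type spanLists struct {
	sweepgen uint32
	lists    [2][]string // span names, for illustration only
}

func (h *spanLists) swept() *[]string   { return &h.lists[h.sweepgen/2%2] }
func (h *spanLists) unswept() *[]string { return &h.lists[1-h.sweepgen/2%2] }

// startCycle models gcSweep: bump sweepgen by 2, after which last cycle's
// swept list is this cycle's unswept list.
func (h *spanLists) startCycle() {
	h.sweepgen += 2
	if len(*h.swept()) != 0 {
		panic("non-empty swept list")
	}
}

// sweepAll models repeated sweepone calls: pop from unswept, push to swept.
func (h *spanLists) sweepAll() {
	u := h.unswept()
	for len(*u) > 0 {
		s := (*u)[len(*u)-1]
		*u = (*u)[:len(*u)-1]
		*h.swept() = append(*h.swept(), s)
	}
}

func main() {
	h := &spanLists{}
	*h.swept() = []string{"a", "b"} // spans left in use by the previous sweep
	for cycle := 1; cycle <= 2; cycle++ {
		h.startCycle()
		fmt.Println("cycle", cycle, "sweepgen", h.sweepgen, "unswept", *h.unswept())
		h.sweepAll()
	}
}
```

The first cycle reports sweepgen 2 with `[a b]` unswept. The second reports sweepgen 4 with the same spans back on the unswept list, now in the other slot.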
sweepone
Next, let's analyze how sweepone sweeps a span.

func sweepone() uintptr {
_g_ := getg()
sweepRatio := mheap_.sweepPagesPerByte // For debugging

// increment locks to ensure that the goroutine is not preempted
// in the middle of sweep thus leaving the span in an inconsistent state for next GC
_g_.m.locks++
// Check whether sweeping has already finished
if atomic.Load(&mheap_.sweepdone) != 0 {
    _g_.m.locks--
    return ^uintptr(0)
}
// Increment the number of active sweepers
atomic.Xadd(&mheap_.sweepers, +1)

npages := ^uintptr(0)
sg := mheap_.sweepgen
for {
    // Keep popping spans that still need sweeping
    s := mheap_.sweepSpans[1-sg/2%2].pop()
    if s == nil {
        atomic.Store(&mheap_.sweepdone, 1)
        break
    }
    if s.state != mSpanInUse {
        // This can happen if direct sweeping already
        // swept this span, but in that case the sweep
        // generation should always be up-to-date.
        if s.sweepgen != sg {
            print("runtime: bad span s.state=", s.state, " s.sweepgen=", s.sweepgen, " sweepgen=", sg, "\n")
            throw("non in-use span in unswept list")
        }
        continue
    }
    // sweepgen == h->sweepgen - 2: the span needs sweeping
    // sweepgen == h->sweepgen - 1: the span is currently being swept
    // Check the span's state here and try to claim it by moving it from sg-2 to sg-1
    if s.sweepgen != sg-2 || !atomic.Cas(&s.sweepgen, sg-2, sg-1) {
        continue
    }
    npages = s.npages
    // Sweep this single span
    if !s.sweep(false) {
        // Span is still in-use, so this returned no
        // pages to the heap and the span needs to
        // move to the swept in-use list.
        npages = 0
    }
    break
}

// Decrement the number of active sweepers and if this is the
// last one print trace information.
// This sweeper is done; decrement the active sweeper count
if atomic.Xadd(&mheap_.sweepers, -1) == 0 && atomic.Load(&mheap_.sweepdone) != 0 {
    if debug.gcpacertrace > 0 {
        print("pacer: sweep done at heap size ", memstats.heap_live>>20, "MB; allocated ", (memstats.heap_live-mheap_.sweepHeapLiveBasis)>>20, "MB during sweep; swept ", mheap_.pagesSwept, " pages at ", sweepRatio, " pages/byte\n")
    }
}
_g_.m.locks--
return npages

}
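The `sg-2` / `sg-1` comparison in sweepone is an ownership claim. A span whose sweepgen is `sg-2` needs sweeping; the one goroutine whose CAS moves it to `sg-1` owns it; and storing `sg` publishes it as swept. Below is a small sketch of that protocol with `sync/atomic`. The `span` and `trySweep` names are hypothetical, and the actual sweep work is elided.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// span is a toy stand-in for mspan; only its sweepgen matters here.
type span struct{ sweepgen uint32 }

// trySweep models the claim in sweepone: only the goroutine whose CAS moves
// sweepgen from sg-2 ("needs sweeping") to sg-1 ("being swept") may sweep
// the span; when done it publishes sg ("swept").
func trySweep(s *span, sg uint32) bool {
	if atomic.LoadUint32(&s.sweepgen) != sg-2 || !atomic.CompareAndSwapUint32(&s.sweepgen, sg-2, sg-1) {
		return false // someone else owns it, or it is already swept
	}
	// ... the actual sweep of the span would happen here ...
	atomic.StoreUint32(&s.sweepgen, sg)
	return true
}

func main() {
	const sg = 4 // hypothetical mheap_.sweepgen after two GC cycles
	s := &span{sweepgen: sg - 2}
	var wins int32
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if trySweep(s, sg) {
				atomic.AddInt32(&wins, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(wins, s.sweepgen) // 1 4
}
```

Only one of the eight goroutines can win. Once sweepgen leaves `sg-2`, it never returns to that value within the cycle.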
mspan.sweep
func (s *mspan) sweep(preserve bool) bool {
// It's critical that we enter this function with preemption disabled,
// GC must not start while we are in the middle of this function.
_g_ := getg()
if _g_.m.locks == 0 && _g_.m.mallocing == 0 && _g_ != _g_.m.g0 {
    throw("MSpan_Sweep: m is not locked")
}
sweepgen := mheap_.sweepgen
// Only a span that is currently being swept (sweepgen == h.sweepgen-1) may proceed
if s.state != mSpanInUse || s.sweepgen != sweepgen-1 {
    print("MSpan_Sweep: state=", s.state, " sweepgen=", s.sweepgen, " mheap.sweepgen=", sweepgen, "\n")
    throw("MSpan_Sweep: bad span state")
}

if trace.enabled {
    traceGCSweepSpan(s.npages * _PageSize)
}
// First update the count of swept pages
atomic.Xadd64(&mheap_.pagesSwept, int64(s.npages))

spc := s.spanclass
size := s.elemsize
res := false

c := _g_.m.mcache
freeToHeap := false

// The allocBits indicate which unmarked objects don't need to be
// processed since they were free at the end of the last GC cycle
// and were not allocated since then.
// If the allocBits index is >= s.freeindex and the bit
// is not marked then the object remains unallocated
// since the last GC.
// This situation is analogous to being on a freelist.

// Unlink & free special records for any objects we're about to free.
// Two complications here:
// 1. An object can have both finalizer and profile special records.
//    In such case we need to queue finalizer for execution,
//    mark the object as live and preserve the profile special.
// 2. A tiny object can have several finalizers setup for different offsets.
//    If such object is not marked, we need to queue all finalizers at once.
// Both 1 and 2 are possible at the same time.
specialp := &s.specials
special := *specialp
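// For each object with special records, check whether it is live and whether it has at least one finalizer: free the records of dead objects without finalizers, and queue the finalizers of dead objects that have them (keeping those objects alive)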
for special != nil {
    // A finalizer can be set for an inner byte of an object, find object beginning.
    objIndex := uintptr(special.offset) / size
    p := s.base() + objIndex*size
    mbits := s.markBitsForIndex(objIndex)
    if !mbits.isMarked() {
        // This object is not marked and has at least one special record.
        // Pass 1: see if it has at least one finalizer.
        hasFin := false
        endOffset := p - s.base() + size
        for tmp := special; tmp != nil && uintptr(tmp.offset) < endOffset; tmp = tmp.next {
            if tmp.kind == _KindSpecialFinalizer {
                // Stop freeing of object if it has a finalizer.
                mbits.setMarkedNonAtomic()
                hasFin = true
                break
            }
        }
        // Pass 2: queue all finalizers _or_ handle profile record.
        for special != nil && uintptr(special.offset) < endOffset {
            // Find the exact byte for which the special was setup
            // (as opposed to object beginning).
            p := s.base() + uintptr(special.offset)
            if special.kind == _KindSpecialFinalizer || !hasFin {
                // Splice out special record.
                y := special
                special = special.next
                *specialp = special
                freespecial(y, unsafe.Pointer(p), size)
            } else {
                // This is profile record, but the object has finalizers (so kept alive).
                // Keep special record.
                specialp = &special.next
                special = *specialp
            }
        }
    } else {
        // object is still live: keep special record
        specialp = &special.next
        special = *specialp
    }
}

if debug.allocfreetrace != 0 || raceenabled || msanenabled {
    // Find all newly freed objects. This doesn't have to
    // efficient; allocfreetrace has massive overhead.
    mbits := s.markBitsForBase()
    abits := s.allocBitsForIndex(0)
    for i := uintptr(0); i < s.nelems; i++ {
        if !mbits.isMarked() && (abits.index < s.freeindex || abits.isMarked()) {
            x := s.base() + i*s.elemsize
            if debug.allocfreetrace != 0 {
                tracefree(unsafe.Pointer(x), size)
            }
            if raceenabled {
                racefree(unsafe.Pointer(x), size)
            }
            if msanenabled {
                msanfree(unsafe.Pointer(x), size)
            }
        }
        mbits.advance()
        abits.advance()
    }
}

// Count the number of free objects in this span.
// Count the objects still allocated (marked) in this span
nalloc := uint16(s.countAlloc())
// If sizeclass is 0 (a large-object span) and nothing in it survived, free it to the mheap
if spc.sizeclass() == 0 && nalloc == 0 {
    s.needzero = 1
    freeToHeap = true
}
nfreed := s.allocCount - nalloc
if nalloc > s.allocCount {
    print("runtime: nelems=", s.nelems, " nalloc=", nalloc, " previous allocCount=", s.allocCount, " nfreed=", nfreed, "\n")
    throw("sweep increased allocation count")
}

s.allocCount = nalloc
// Check whether the span was "empty" in mcentral's sense, i.e. had no free slots
wasempty := s.nextFreeIndex() == s.nelems
// Reset freeindex
s.freeindex = 0 // reset allocation index to start of span.
if trace.enabled {
    getg().m.p.ptr().traceReclaimed += uintptr(nfreed) * s.elemsize
}

// gcmarkBits becomes the allocBits.
// get a fresh cleared gcmarkBits in preparation for next GC
// Replace allocBits with gcmarkBits
s.allocBits = s.gcmarkBits
// Allocate fresh, cleared gcmarkBits
s.gcmarkBits = newMarkBits(s.nelems)

// Initialize alloc bits cache.
// Refresh allocCache
s.refillAllocCache(0)

// We need to set s.sweepgen = h.sweepgen only when all blocks are swept,
// because of the potential for a concurrent free/SetFinalizer.
// But we need to set it before we make the span available for allocation
// (return it to heap or mcentral), because allocation code assumes that a
// span is already swept if available for allocation.
if freeToHeap || nfreed == 0 {
    // The span must be in our exclusive ownership until we update sweepgen,
    // check for potential races.
    if s.state != mSpanInUse || s.sweepgen != sweepgen-1 {
        print("MSpan_Sweep: state=", s.state, " sweepgen=", s.sweepgen, " mheap.sweepgen=", sweepgen, "\n")
        throw("MSpan_Sweep: bad span state after sweep")
    }
    // Serialization point.
    // At this point the mark bits are cleared and allocation ready
    // to go so release the span.
    atomic.Store(&s.sweepgen, sweepgen)
}

if nfreed > 0 && spc.sizeclass() != 0 {
    c.local_nsmallfree[spc.sizeclass()] += uintptr(nfreed)
    // Return the span to its mcentral
    res = mheap_.central[spc].mcentral.freeSpan(s, preserve, wasempty)
    // MCentral_FreeSpan updates sweepgen
} else if freeToHeap {
    // Free a large-object span; this pairs with the sizeclass() == 0 && nalloc == 0 check above, which set freeToHeap
    // Free large span to heap

    // NOTE(rsc,dvyukov): The original implementation of efence
    // in CL 22060046 used SysFree instead of SysFault, so that
    // the operating system would eventually give the memory
    // back to us again, so that an efence program could run
    // longer without running out of memory. Unfortunately,
    // calling SysFree here without any kind of adjustment of the
    // heap data structures means that when the memory does
    // come back to us, we have the wrong metadata for it, either in
    // the MSpan structures or in the garbage collection bitmap.
    // Using SysFault here means that the program will run out of
    // memory fairly quickly in efence mode, but at least it won't
    // have mysterious crashes due to confused memory reuse.
    // It should be possible to switch back to SysFree if we also
    // implement and then call some kind of MHeap_DeleteSpan.
    if debug.efence > 0 {
        s.limit = 0 // prevent mlookup from finding this span
        sysFault(unsafe.Pointer(s.base()), size)
    } else {
        // Free the span to the mheap
        mheap_.freeSpan(s, 1)
    }
    c.local_nlargefree++
    c.local_largefree += size
    res = true
}
if !res {
    // The span has been swept and is still in-use, so put
    // it on the swept in-use list.
    // If the span was not freed to the mcentral or the mheap, it is still in use
    mheap_.sweepSpans[sweepgen/2%2].push(s)
}
return res

}
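Apart from the special records, the core of mspan.sweep is cheap because there is no free list to rebuild. The mark bits simply become the new alloc bits, and every unmarked slot is free again. The sketch below shows that bookkeeping. It assumes a hypothetical `toySpan` of at most 64 objects, so each bitmap fits in one `uint64`; the runtime uses byte arrays plus a cached 64-bit window, `allocCache`.

```go
package main

import (
	"fmt"
	"math/bits"
)

// toySpan models the fields mspan.sweep updates; bitmaps are single words
// here, which limits the sketch to 64 objects per span.
type toySpan struct {
	nelems     uint
	allocCount uint
	freeindex  uint
	allocBits  uint64 // bit i set: slot i is allocated (as of the last sweep)
	gcmarkBits uint64 // bit i set: slot i was marked live in this GC
}

// sweep mirrors the core of mspan.sweep: the mark bits become the new alloc
// bits, unmarked slots are implicitly freed, and fresh mark bits are cleared.
func (s *toySpan) sweep() (nfreed uint) {
	nalloc := uint(bits.OnesCount64(s.gcmarkBits)) // like s.countAlloc()
	if nalloc > s.allocCount {
		panic("sweep increased allocation count")
	}
	nfreed = s.allocCount - nalloc
	s.allocCount = nalloc
	s.freeindex = 0 // allocation restarts scanning from slot 0
	s.allocBits = s.gcmarkBits
	s.gcmarkBits = 0 // like newMarkBits: cleared for the next GC
	return nfreed
}

// nextFree returns the first slot at or after freeindex whose alloc bit is clear.
func (s *toySpan) nextFree() (uint, bool) {
	for i := s.freeindex; i < s.nelems; i++ {
		if s.allocBits&(uint64(1)<<i) == 0 {
			return i, true
		}
	}
	return 0, false
}

func main() {
	// Hypothetical 8-slot span: slots 0-4 allocated, GC marked only 1 and 3.
	s := &toySpan{nelems: 8, allocCount: 5, allocBits: 0b11111, gcmarkBits: 0b01010}
	fmt.Println(s.sweep()) // 3: slots 0, 2 and 4 are freed
	slot, _ := s.nextFree()
	fmt.Println(slot) // 0: the next allocation reuses slot 0
}
```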
OK, that completes our walk through Go's GC. Reading it alongside the diagram at the start of the article should make it easier to follow.

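References
Exploring the Golang Source, Part 3: How the GC Is Implemented (Golang源碼探索(三) GC的實現原理)
Go Language Study Notes (《Go語言學習筆記》)
Understanding Tri-color Marking in One Diagram (一張圖瞭解三色標記法)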