剖析golang map的實現

[TOC]


本文參考的是golang 1.10源碼實現。

golang中map是一個kv對集合。底層使用hash table,用鏈表來解決衝突,通過編譯器配合runtime,所有的map對象都是共用一份代碼。

對比其他語言
c++使用紅黑樹組織,性能稍低但是穩定性很好。使用模版在編譯期生成代碼,好處是效率高,但是缺點是代碼膨脹、編譯時間也會變長。

java使用的是hash table+鏈表/紅黑樹,當bucket內元素超過某個閾值時,該bucket的鏈表會轉換成紅黑樹。java爲了所有map共用一份代碼,規定了只有Object的子類才能使用作爲map的key,缺點是基礎數據類型必須使用object包裝一下才能使用map。

1. 函數選擇

hash函數,有加密型和非加密型。加密型的一般用於加密數據、數字摘要等,典型代表就是md5、sha1、sha256、aes256這種;非加密型的一般就是查找。在map的應用場景中,用的是查找。選擇hash函數主要考察的是兩點:性能、碰撞概率。

具體hash函數的性能比較可以看:http://aras-p.info/blog/2016/08/09/More-Hash-Function-Tests/

golang使用的hash算法根據硬件選擇,如果cpu支持aes,那麼使用aes hash,否則使用memhash,memhash是參考xxhash、cityhash實現的,性能炸裂。

把hash值映射到buckte時,golang會把bucket的數量規整爲2的次冪,而有m=2b,則n%m=n&(m-1),用位運算規避mod的昂貴代價。

2. 結構組成

首先我們看下map的結構:

// A header for a Go map.
type hmap struct {
    // Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
    // Make sure this stays in sync with the compiler's definition.
    count     int // # live cells == size of map.  Must be first (used by len() builtin)
    flags     uint8
    B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
    noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
    hash0     uint32 // hash seed

    buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
    oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
    nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

    extra *mapextra // optional fields
}

// mapextra holds fields that are not present on all maps.
type mapextra struct {
    // If both key and value do not contain pointers and are inline, then we mark bucket
    // type as containing no pointers. This avoids scanning such maps.
    // However, bmap.overflow is a pointer. In order to keep overflow buckets
    // alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
    // overflow and oldoverflow are only used if key and value do not contain pointers.
    // overflow contains overflow buckets for hmap.buckets.
    // oldoverflow contains overflow buckets for hmap.oldbuckets.
    // The indirection allows to store a pointer to the slice in hiter.
    overflow    *[]*bmap
    oldoverflow *[]*bmap

    // nextOverflow holds a pointer to a free overflow bucket.
    nextOverflow *bmap
}

// A bucket for a Go map.
type bmap struct {
    // tophash generally contains the top byte of the hash value
    // for each key in this bucket. If tophash[0] < minTopHash,
    // tophash[0] is a bucket evacuation state instead.
    tophash [bucketCnt]uint8
    // Followed by bucketCnt keys and then bucketCnt values.
    // NOTE: packing all the keys together and then all the values together makes the
    // code a bit more complicated than alternating key/value/key/value/... but it allows
    // us to eliminate padding which would be needed for, e.g., map[int64]int8.
    // Followed by an overflow pointer.
}

一個map主要是由三個結構構成:

  1. hmap --- map的最外層的數據結構,包括了map的各種基礎信息、如大小、bucket。
  2. mapextra --- 記錄map的額外信息,例如overflow bucket。
  3. bmap --- 代表bucket,每一個bucket最多放8個kv,最後由一個overflow字段指向下一個bmap,注意key、value、overflow字段都不顯示定義,而是通過maptype計算偏移獲取的。
hmap.001.png

其中hmap.extra.nextOverflow指向的是預分配的overflow bucket,預分配的用完了那麼值就變成nil。
hmap.noverflow是overflow bucket的數量,當B小於16時是準確值,大於等於16時是大概的值。
hmap.count是當前map的元素個數,也就是len()返回的值。

2.1 設計原理

介紹完結構,我們就細說一下這麼設計的原因。

2.1.1 bmap細節

在golang map中出現衝突時,不是每一個key都申請一個結構通過鏈表串起來,而是以bmap爲最小粒度掛載,一個bmap可以放8個kv。這樣減少對象數量,減輕管理內存的負擔,利於gc。
如果插入時,bmap中key超過8,那麼就會申請一個新的bmap(overflow bucket)掛在這個bmap的後面形成鏈表,優先用預分配的overflow bucket,如果預分配的用完了,那麼就malloc一個掛上去。注意golang的map不會shrink,內存只會越用越多,overflow bucket中的key全刪了也不會釋放

hash值的高8位存儲在bucket中的tophash字段。每個桶最多放8個kv對,所以tophash類型是數組[8]uint8。把高八位存儲起來,這樣不用完整比較key就能過濾掉不符合的key,加快查詢速度。實際上當hash值的高八位小於常量minTopHash時,會加上minTopHash,區間[0, minTophash)的值用於特殊標記。查找key時,計算hash值,用hash值的高八位在tophash中查找,有tophash相等的,再去比較key值是否相同。

????? 這裏我不太清楚,1.爲啥小於minTopHash才加 2.爲什麼不是位運算而用加。 剛好top在[0,minHash),或着加上minHash之後溢出到這個區間,豈不是可能誤判?

// tophash calculates the tophash value for hash.
func tophash(hash uintptr) uint8 {
    top := uint8(hash >> (sys.PtrSize*8 - 8))
    if top < minTopHash {
        top += minTopHash
    }
    return top
}

bmap中所有key存在一塊,所有value存在一塊,這樣做方便內存對齊。
當key大於128字節時,bucket的key字段存儲的會是指針,指向key的實際內容;value也是一樣。

我們還知道golang中沒有範型,爲了支持map的範型,golang定義了一個maptype類型,定義了這類key用什麼hash函數、bucket的大小、怎麼比較之類的,通過這個變量來實現範型。

2.1.2 擴容設計

bcuket掛接的鏈表越來越長,性能會退化,那麼就要進行擴容,擴大bucket的數量。

當元素個數/bucket個數大於等於6.5時,就會進行擴容,把bucket數量擴成原本的兩倍,當hash表擴容之後,需要將那些老數據遷移到新table上(源代碼中稱之爲evacuate), 數據搬遷不是一次性完成,而是逐步的完成(在insert和remove時進行搬移),這樣就分攤了擴容的耗時。同時爲了避免有個bucket一直訪問不到導致擴容無法完成,還會進行一個順序擴容,每次因爲寫操作搬遷對應bucket後,還會按順序搬遷未搬遷的bucket,所以最差情況下n次寫操作,就保證搬遷完大小爲n的map。

擴容會建立一個大小是原來2倍的新的表,將舊的bucket搬到新的表中之後,並不會將舊的bucket從oldbucket中刪除,而是加上一個已刪除的標記。

只有當所有的bucket都從舊錶移到新表之後,纔會將oldbucket釋放掉。 如果擴容過程中,閾值又超了呢?如果正在擴容,那麼不會再進行擴容。

總體思路描述完,就看源碼創建、查詢、賦值、刪除的具體實現。

3. 源碼實現

3.1 創建

// makemap implements Go map creation for make(map[k]v, hint).
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, bucket pointed to can be used as the first bucket.
func makemap(t *maptype, hint int, h *hmap) *hmap {
    if hint < 0 || hint > int(maxSliceCap(t.bucket.size)) {
        hint = 0
    }

    // initialize Hmap
    if h == nil {
        h = new(hmap)
    }
    h.hash0 = fastrand()

    // find size parameter which will hold the requested # of elements
    B := uint8(0)
    for overLoadFactor(hint, B) {
        B++
    }
    h.B = B

    // allocate initial hash table
    // if B == 0, the buckets field is allocated lazily later (in mapassign)
    // If hint is large zeroing this memory could take a while.
    if h.B != 0 {
        var nextOverflow *bmap
        h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
        if nextOverflow != nil {
            h.extra = new(mapextra)
            h.extra.nextOverflow = nextOverflow
        }
    }

    return h
}

hint是一個啓發值,啓發初建map時創建多少個bucket,如果hint是0那麼就先不分配bucket,lazy分配。大概流程就是設置一下hash seed、bucket數量、實際申請bucket之類的,流程很簡單。

然後我們在看下申請bucket實際幹了啥:

// makeBucketArray initializes a backing array for map buckets.
// 1<<b is the minimum number of buckets to allocate.
// dirtyalloc should either be nil or a bucket array previously
// allocated by makeBucketArray with the same t and b parameters.
// If dirtyalloc is nil a new backing array will be alloced and
// otherwise dirtyalloc will be cleared and reused as backing array.
func makeBucketArray(t *maptype, b uint8, dirtyalloc unsafe.Pointer) (buckets unsafe.Pointer, nextOverflow *bmap) {
    base := bucketShift(b)
    nbuckets := base
    // For small b, overflow buckets are unlikely.
    // Avoid the overhead of the calculation.
    if b >= 4 {
        // Add on the estimated number of overflow buckets
        // required to insert the median number of elements
        // used with this value of b.
        nbuckets += bucketShift(b - 4)
        sz := t.bucket.size * nbuckets
        up := roundupsize(sz)
        if up != sz {
            nbuckets = up / t.bucket.size
        }
    }

    if dirtyalloc == nil {
        buckets = newarray(t.bucket, int(nbuckets))
    } else {
        // dirtyalloc was previously generated by
        // the above newarray(t.bucket, int(nbuckets))
        // but may not be empty.
        buckets = dirtyalloc
        size := t.bucket.size * nbuckets
        if t.bucket.kind&kindNoPointers == 0 {
            memclrHasPointers(buckets, size)
        } else {
            memclrNoHeapPointers(buckets, size)
        }
    }

    if base != nbuckets {
        // We preallocated some overflow buckets.
        // To keep the overhead of tracking these overflow buckets to a minimum,
        // we use the convention that if a preallocated overflow bucket's overflow
        // pointer is nil, then there are more available by bumping the pointer.
        // We need a safe non-nil pointer for the last overflow bucket; just use buckets.
        nextOverflow = (*bmap)(add(buckets, base*uintptr(t.bucketsize)))
        last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.bucketsize)))
        last.setoverflow(t, (*bmap)(buckets))
    }
    return buckets, nextOverflow
}

默認創建2b個bucket,如果b大於等於4,那麼就預先額外創建一些overflow bucket。除了最後一個overflow bucket,其餘overflow bucket的overflow指針都是nil,最後一個overflow bucket的overflow指針指向bucket數組第一個元素,作爲哨兵,說明到了到結尾了.

創建簡單流程

3.2 查詢

// mapaccess1 returns a pointer to h[key].  Never returns nil, instead
// it will return a reference to the zero object for the value type if
// the key is not in the map.
// NOTE: The returned pointer may keep the whole map live, so don't
// hold onto it for very long.
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    if raceenabled && h != nil {
        callerpc := getcallerpc()
        pc := funcPC(mapaccess1)
        racereadpc(unsafe.Pointer(h), callerpc, pc)
        raceReadObjectPC(t.key, key, callerpc, pc)
    }
    if msanenabled && h != nil {
        msanread(key, t.key.size)
    }
    if h == nil || h.count == 0 {
        return unsafe.Pointer(&zeroVal[0])
    }
    if h.flags&hashWriting != 0 {
        throw("concurrent map read and map write")
    }
    alg := t.key.alg
    hash := alg.hash(key, uintptr(h.hash0))
    m := bucketMask(h.B)
    b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
    if c := h.oldbuckets; c != nil {
        if !h.sameSizeGrow() {
            // There used to be half as many buckets; mask down one more power of two.
            m >>= 1
        }
        oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
        if !evacuated(oldb) {
            b = oldb
        }
    }
    top := tophash(hash)
    for ; b != nil; b = b.overflow(t) {
        for i := uintptr(0); i < bucketCnt; i++ {
            if b.tophash[i] != top {
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if t.indirectkey {
                k = *((*unsafe.Pointer)(k))
            }
            if alg.equal(key, k) {
                v := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
                if t.indirectvalue {
                    v = *((*unsafe.Pointer)(v))
                }
                return v
            }
        }
    }
    return unsafe.Pointer(&zeroVal[0])
}
  1. 先定位出bucket,如果正在擴容,並且這個bucket還沒搬到新的hash表中,那麼就從老的hash表中查找。

  2. 在bucket中進行順序查找,使用高八位進行快速過濾,高八位相等,再比較key是否相等,找到就返回value。如果當前bucket找不到,就往下找overflow bucket,都沒有就返回零值。

這裏我們可以看到,訪問的時候,並不進行擴容的數據搬遷。並且併發有寫操作時拋異常

這裏要注意的是,t.bucketsize並不是bmap的size,而是bmap加上存儲key、value、overflow指針,所以查找bucket的時候時候用的不是bmap的szie。


查詢簡單流程

3.3 賦值

// Like mapaccess, but allocates a slot for the key if it is not present in the map.
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    if h == nil {
        panic(plainError("assignment to entry in nil map"))
    }
    if raceenabled {
        callerpc := getcallerpc()
        pc := funcPC(mapassign)
        racewritepc(unsafe.Pointer(h), callerpc, pc)
        raceReadObjectPC(t.key, key, callerpc, pc)
    }
    if msanenabled {
        msanread(key, t.key.size)
    }
    if h.flags&hashWriting != 0 {
        throw("concurrent map writes")
    }
    alg := t.key.alg
    hash := alg.hash(key, uintptr(h.hash0))

    // Set hashWriting after calling alg.hash, since alg.hash may panic,
    // in which case we have not actually done a write.
    h.flags |= hashWriting

    if h.buckets == nil {
        h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
    }

again:
    bucket := hash & bucketMask(h.B)
    if h.growing() {
        growWork(t, h, bucket)
    }
    b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize)))
    top := tophash(hash)

    var inserti *uint8
    var insertk unsafe.Pointer
    var val unsafe.Pointer
    for {
        for i := uintptr(0); i < bucketCnt; i++ {
            if b.tophash[i] != top {
                if b.tophash[i] == empty && inserti == nil {
                    inserti = &b.tophash[i]
                    insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
                    val = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
                }
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if t.indirectkey {
                k = *((*unsafe.Pointer)(k))
            }
            if !alg.equal(key, k) {
                continue
            }
            // already have a mapping for key. Update it.
            if t.needkeyupdate {
                typedmemmove(t.key, k, key)
            }
            val = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
            goto done
        }
        ovf := b.overflow(t)
        if ovf == nil {
            break
        }
        b = ovf
    }

    // Did not find mapping for key. Allocate new cell & add entry.

    // If we hit the max load factor or we have too many overflow buckets,
    // and we're not already in the middle of growing, start growing.
    if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
        hashGrow(t, h)
        goto again // Growing the table invalidates everything, so try again
    }

    if inserti == nil {
        // all current buckets are full, allocate a new one.
        newb := h.newoverflow(t, b)
        inserti = &newb.tophash[0]
        insertk = add(unsafe.Pointer(newb), dataOffset)
        val = add(insertk, bucketCnt*uintptr(t.keysize))
    }

    // store new key/value at insert position
    if t.indirectkey {
        kmem := newobject(t.key)
        *(*unsafe.Pointer)(insertk) = kmem
        insertk = kmem
    }
    if t.indirectvalue {
        vmem := newobject(t.elem)
        *(*unsafe.Pointer)(val) = vmem
    }
    typedmemmove(t.key, insertk, key)
    *inserti = top
    h.count++

done:
    if h.flags&hashWriting == 0 {
        throw("concurrent map writes")
    }
    h.flags &^= hashWriting
    if t.indirectvalue {
        val = *((*unsafe.Pointer)(val))
    }
    return val
}
  1. hash表如果正在擴容,並且這次要操作的bucket還沒搬到新hash表中,那麼先進行搬遷(擴容細節下面細說)。

  2. 在buck中尋找key,同時記錄下第一個空位置,如果找不到,那麼就在空位置中插入數據;如果找到了,那麼就更新對應的value;

  3. 找不到key就看下需不需要擴容,需要擴容並且沒有正在擴容,那麼就進行擴容,然後回到第一步。

  4. 找不到key,不需要擴容,但是沒有空slot,那麼就分配一個overflow bucket掛在鏈表結尾,用新bucket的第一個slot放存放數據。

3.4 刪除

func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
    if raceenabled && h != nil {
        callerpc := getcallerpc()
        pc := funcPC(mapdelete)
        racewritepc(unsafe.Pointer(h), callerpc, pc)
        raceReadObjectPC(t.key, key, callerpc, pc)
    }
    if msanenabled && h != nil {
        msanread(key, t.key.size)
    }
    if h == nil || h.count == 0 {
        return
    }
    if h.flags&hashWriting != 0 {
        throw("concurrent map writes")
    }

    alg := t.key.alg
    hash := alg.hash(key, uintptr(h.hash0))

    // Set hashWriting after calling alg.hash, since alg.hash may panic,
    // in which case we have not actually done a write (delete).
    h.flags |= hashWriting

    bucket := hash & bucketMask(h.B)
    if h.growing() {
        growWork(t, h, bucket)
    }
    b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
    top := tophash(hash)
search:
    for ; b != nil; b = b.overflow(t) {
        for i := uintptr(0); i < bucketCnt; i++ {
            if b.tophash[i] != top {
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            k2 := k
            if t.indirectkey {
                k2 = *((*unsafe.Pointer)(k2))
            }
            if !alg.equal(key, k2) {
                continue
            }
            // Only clear key if there are pointers in it.
            if t.indirectkey {
                *(*unsafe.Pointer)(k) = nil
            } else if t.key.kind&kindNoPointers == 0 {
                memclrHasPointers(k, t.key.size)
            }
            v := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
            if t.indirectvalue {
                *(*unsafe.Pointer)(v) = nil
            } else if t.elem.kind&kindNoPointers == 0 {
                memclrHasPointers(v, t.elem.size)
            } else {
                memclrNoHeapPointers(v, t.elem.size)
            }
            b.tophash[i] = empty
            h.count--
            break search
        }
    }

    if h.flags&hashWriting == 0 {
        throw("concurrent map writes")
    }
    h.flags &^= hashWriting
}
  1. 如果正在擴容,並且操作的bucket還沒搬遷完,那麼搬遷bucket。

  2. 找出對應的key,如果key、value是包含指針的那麼會清理指針指向的內存,否則不會回收內存。

3.5 擴容

首先通過賦值、刪除流程,我們可以知道,觸發擴容的是賦值、刪除操作,具體判斷要不要擴容的代碼片段如下:


// overLoadFactor reports whether count items placed in 1<<B buckets is over loadFactor.
func overLoadFactor(count int, B uint8) bool {
    return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}

// tooManyOverflowBuckets reports whether noverflow buckets is too many for a map with 1<<B buckets.
// Note that most of these overflow buckets must be in sparse use;
// if use was dense, then we'd have already triggered regular map growth.
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
    // If the threshold is too low, we do extraneous work.
    // If the threshold is too high, maps that grow and shrink can hold on to lots of unused memory.
    // "too many" means (approximately) as many overflow buckets as regular buckets.
    // See incrnoverflow for more details.
    if B > 15 {
        B = 15
    }
    // The compiler doesn't see here that B < 16; mask B to generate shorter shift code.
    return noverflow >= uint16(1)<<(B&15)
}

{
    ....
    // If we hit the max load factor or we have too many overflow buckets,
    // and we're not already in the middle of growing, start growing.
    if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
        hashGrow(t, h)
        goto again // Growing the table invalidates everything, so try again
    }
    ....
}

翻譯一下代碼,意思就是:

func overLoadFactor(countint, Buint8) bool {
  // return count>bucketCnt&&uintptr(count) >loadFactorNum*(bucketShift(B)/loadFactorDen)
   return 元素個數>8 && count>bucket數量*6.5
   其中loadFactorNum是常量13,loadFactorDen是常量2,所以是6.5
   bucket數量不算overflow bucket
}
​
func tooManyOverflowBuckets(noverflowuint16, Buint8) bool{
   if B > 15 {
       B=15
   }
   // The compiler doesn't see here that B < 16; mask B to generate shorter shift code.
   return noverflow>=uint16(1)<<(B&15)
}


if (不是正在擴容 && (元素個數/bucket數超過某個值 || 太多overflow bucket)) {
    進行擴容
}

判斷完擴容後,如果需要擴容,那麼第一步需要做的,就是對hash表進行擴容:

//僅對hash表進行擴容,這裏不進行搬遷
func hashGrow(t *maptype, h *hmap) {
    // If we've hit the load factor, get bigger.
    // Otherwise, there are too many overflow buckets,
    // so keep the same number of buckets and "grow" laterally.
    bigger := uint8(1)
    if !overLoadFactor(h.count+1, h.B) {
        bigger = 0
        h.flags |= sameSizeGrow
    }
    oldbuckets := h.buckets
    newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

    flags := h.flags &^ (iterator | oldIterator)
    if h.flags&iterator != 0 {
        flags |= oldIterator
    }
    // commit the grow (atomic wrt gc)
    h.B += bigger
    h.flags = flags
    h.oldbuckets = oldbuckets
    h.buckets = newbuckets
    h.nevacuate = 0
    h.noverflow = 0

    if h.extra != nil && h.extra.overflow != nil {
        // Promote current overflow buckets to the old generation.
        if h.extra.oldoverflow != nil {
            throw("oldoverflow is not nil")
        }
        h.extra.oldoverflow = h.extra.overflow
        h.extra.overflow = nil
    }
    if nextOverflow != nil {
        if h.extra == nil {
            h.extra = new(mapextra)
        }
        h.extra.nextOverflow = nextOverflow
    }

    // the actual copying of the hash table data is done incrementally
    // by growWork() and evacuate().
}

擴容的函數hashGrow其實僅僅是進行一些空間分配,字段的初始化,實際的搬遷操作是在growWork函數中

func growWork(t *maptype, h *hmap, bucket uintptr) {
    // make sure we evacuate the oldbucket corresponding
    // to the bucket we're about to use
    evacuate(t, h, bucket&h.oldbucketmask())

    // evacuate one more oldbucket to make progress on growing
    if h.growing() {
        evacuate(t, h, h.nevacuate)
    }
}

evacuate是進行具體搬遷某個bucket的函數,可以看出growWork會搬遷兩個bucket,一個是入參bucket;另一個是h.nevacuate。這個nevacuate是一個順序累加的值。可以想想如果每次僅僅搬遷進行寫操作(賦值/刪除)的bucket,那麼有可能某些bucket就是一直沒有機會訪問到,那麼擴容就一直沒法完成,總是在擴容中的狀態,因此會額外進行一次順序遷移,理論上,有N個old bucket,最多N次寫操作,那麼必定會搬遷完。

然後我們再看下evacuate具體的實現

func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
    b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
    newbit := h.noldbuckets()
    if !evacuated(b) {
        // TODO: reuse overflow buckets instead of using new ones, if there
        // is no iterator using the old buckets.  (If !oldIterator.)

        // xy contains the x and y (low and high) evacuation destinations.
        var xy [2]evacDst
        x := &xy[0]
        x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
        x.k = add(unsafe.Pointer(x.b), dataOffset)
        x.v = add(x.k, bucketCnt*uintptr(t.keysize))

        if !h.sameSizeGrow() {
            // Only calculate y pointers if we're growing bigger.
            // Otherwise GC can see bad pointers.
            y := &xy[1]
            y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
            y.k = add(unsafe.Pointer(y.b), dataOffset)
            y.v = add(y.k, bucketCnt*uintptr(t.keysize))
        }

        for ; b != nil; b = b.overflow(t) {
            k := add(unsafe.Pointer(b), dataOffset)
            v := add(k, bucketCnt*uintptr(t.keysize))
            for i := 0; i < bucketCnt; i, k, v = i+1, add(k, uintptr(t.keysize)), add(v, uintptr(t.valuesize)) {
                top := b.tophash[I]
                if top == empty {
                    b.tophash[i] = evacuatedEmpty
                    continue
                }
                if top < minTopHash {
                    throw("bad map state")
                }
                k2 := k
                if t.indirectkey {
                    k2 = *((*unsafe.Pointer)(k2))
                }
                var useY uint8
                if !h.sameSizeGrow() {
                    // Compute hash to make our evacuation decision (whether we need
                    // to send this key/value to bucket x or bucket y).
                    hash := t.key.alg.hash(k2, uintptr(h.hash0))
                    if h.flags&iterator != 0 && !t.reflexivekey && !t.key.alg.equal(k2, k2) {
                        // If key != key (NaNs), then the hash could be (and probably
                        // will be) entirely different from the old hash. Moreover,
                        // it isn't reproducible. Reproducibility is required in the
                        // presence of iterators, as our evacuation decision must
                        // match whatever decision the iterator made.
                        // Fortunately, we have the freedom to send these keys either
                        // way. Also, tophash is meaningless for these kinds of keys.
                        // We let the low bit of tophash drive the evacuation decision.
                        // We recompute a new random tophash for the next level so
                        // these keys will get evenly distributed across all buckets
                        // after multiple grows.
                        useY = top & 1
                        top = tophash(hash)
                    } else {
                        if hash&newbit != 0 {
                            useY = 1
                        }
                    }
                }

                if evacuatedX+1 != evacuatedY {
                    throw("bad evacuatedN")
                }

                b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
                dst := &xy[useY]                 // evacuation destination

                if dst.i == bucketCnt {
                    dst.b = h.newoverflow(t, dst.b)
                    dst.i = 0
                    dst.k = add(unsafe.Pointer(dst.b), dataOffset)
                    dst.v = add(dst.k, bucketCnt*uintptr(t.keysize))
                }
                dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
                if t.indirectkey {
                    *(*unsafe.Pointer)(dst.k) = k2 // copy pointer
                } else {
                    typedmemmove(t.key, dst.k, k) // copy value
                }
                if t.indirectvalue {
                    *(*unsafe.Pointer)(dst.v) = *(*unsafe.Pointer)(v)
                } else {
                    typedmemmove(t.elem, dst.v, v)
                }
                dst.i++
                // These updates might push these pointers past the end of the
                // key or value arrays.  That's ok, as we have the overflow pointer
                // at the end of the bucket to protect against pointing past the
                // end of the bucket.
                dst.k = add(dst.k, uintptr(t.keysize))
                dst.v = add(dst.v, uintptr(t.valuesize))
            }
        }
        // Unlink the overflow buckets & clear key/value to help GC.
        if h.flags&oldIterator == 0 && t.bucket.kind&kindNoPointers == 0 {
            b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
            // Preserve b.tophash because the evacuation
            // state is maintained there.
            ptr := add(b, dataOffset)
            n := uintptr(t.bucketsize) - dataOffset
            memclrHasPointers(ptr, n)
        }
    }

    if oldbucket == h.nevacuate {
        advanceEvacuationMark(h, t, newbit)
    }
}

在advanceEvacuationMark中進行nevacuate的累加,遇到已經遷移的bucket會繼續累加,一次最多加1024。

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章