Redis Internals: the dict (Dictionary) Data Structure, Part 2

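This post answers the questions raised in the previous article.

As the rehash procedure shows, both ht[0] and ht[1] hold entries while a rehash is in progress; that is, the dictionary's entries are spread across ht[0] and ht[1].

This is where the trouble starts. The main questions (listed first, answered one by one in the sections that follow) are:

1. How to look up a key.

2. How to insert a new key.

3. How to delete a key.

4. How to make sure that inserting and deleting entries during a rehash does not break the rehash.

5. How to iterate over all entries of a dict, and what the iteration order is.

6. How to keep iterators valid and correct.

1. How to look up a key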

dictEntry *dictFind(dict *d, const void *key)
{
    dictEntry *he;
    unsigned int h, idx, table;
    if (d->ht[0].size == 0) return NULL; /* We don't have a table at all */
    if (dictIsRehashing(d)) _dictRehashStep(d); //if a rehash is in progress, perform one rehash step
    h = dictHashKey(d, key); //compute the hash of the key
    //search ht[0] first
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        while(he) {
            if (dictCompareKeys(d, key, he->key))
                return he;
            he = he->next;
        }
        //not found in ht[0]: if a rehash is in progress the key may be in ht[1], so search ht[1] as well
        if (!dictIsRehashing(d)) return NULL;
    }
    return NULL;
}

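During a rehash both ht[0] and ht[1] hold entries, so a key can be declared absent only after it has been searched for in both tables without success. Which table is searched first does not affect the result.

If a rehash is in progress, the lookup also performs one rehash step. This matches how rehash is implemented: a rehash is not completed in one go but is split into many steps. How is it split, and when is a step executed? The dictRehash function shows how the work is split; the steps themselves are piggybacked on ordinary operations such as lookups. Spreading the steps out this way keeps the extra cost added to any single query small, so query performance stays stable.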
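The two-table lookup can be reduced to a small sketch. Everything below is a hypothetical toy model, not the Redis definitions: the types toy_entry, toy_ht and toy_dict are invented for illustration, keys are unsigned integers that serve as their own hash value, and the rehash step itself is left out so that only the search order is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types, not the Redis definitions. */
typedef struct toy_entry {
    unsigned key;              /* the key doubles as its own hash value */
    struct toy_entry *next;    /* next entry in the same bucket */
} toy_entry;

typedef struct {
    toy_entry **table;         /* bucket array */
    unsigned size;             /* number of buckets, a power of two */
    unsigned sizemask;         /* size - 1 */
} toy_ht;

typedef struct {
    toy_ht ht[2];
    int rehashing;             /* non-zero while entries live in both tables */
} toy_dict;

/* Search ht[0]; search ht[1] as well only while a rehash is in progress. */
static toy_entry *toy_find(const toy_dict *d, unsigned key) {
    assert(d != NULL);
    if (d->ht[0].size == 0) return NULL;          /* no table at all */
    for (int t = 0; t <= 1; t++) {
        const toy_ht *ht = &d->ht[t];
        for (toy_entry *he = ht->table[key & ht->sizemask]; he; he = he->next)
            if (he->key == key) return he;
        if (!d->rehashing) return NULL;           /* ht[1] is not in use */
    }
    return NULL;
}
```

With one key stored only in ht[1], toy_find returns it while rehashing is set and returns NULL once rehashing is cleared, which is why the second table is consulted only during a rehash.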

2. How to insert a new key

//add an entry to the dictionary
/* Add an element to the target hash table */
int dictAdd(dict *d, void *key, void *val)
{
    dictEntry *entry = dictAddRaw(d,key); //insert the key
    if (!entry) return DICT_ERR;
    dictSetVal(d, entry, val); //set the value that belongs to the key
    return DICT_OK;
}
/* Low level add. This function adds the entry but instead of setting
 * a value returns the dictEntry structure to the user, that will make
 * sure to fill the value field as he wishes.
 *
 * This function is also directly exposed to the user API to be called
 * mainly in order to store non-pointers inside the hash value, example:
 *
 * entry = dictAddRaw(dict,mykey);
 * if (entry != NULL) dictSetSignedIntegerVal(entry,1000);
 *
 * Return values:
 *
 * If key already exists NULL is returned.
 * If key was added, the hash entry is returned to be manipulated by the caller.
 */
dictEntry *dictAddRaw(dict *d, void *key)
{
    int index;
    dictEntry *entry;
    dictht *ht;
    if (dictIsRehashing(d)) _dictRehashStep(d);  //perform one rehash step
    //if the key already exists, return NULL
    /* Get the index of the new element, or -1 if
     * the element already exists. */
    if ((index = _dictKeyIndex(d, key)) == -1)
        return NULL;
    //if a rehash is in progress, insert the new entry into ht[1]; otherwise into ht[0]
    /* Allocate the memory and store the new entry */
    ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
    entry = zmalloc(sizeof(*entry));
    entry->next = ht->table[index];
    ht->table[index] = entry;
    ht->used++;
    /* Set the hash entry fields. */
    dictSetKey(d, entry, key);  //store the key in the new entry
    return entry;
}

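When the dict is not rehashing, inserting into ht[0] is straightforward. During a rehash, however, the new entry must go into ht[1]. Why ht[1] and never ht[0]? The reason lies in how rehash works: it moves entries from ht[0] to ht[1], and it is finished once every entry has been moved. For the rehash to be able to finish, a few things must hold:

a. ht[0] must not keep growing; even if it does grow, entries must not be added faster than they are moved to ht[1].

b. The next entry to move must be well defined (that is, whatever rule picks the next entry must eventually cover every entry in ht[0]).

c. It must be possible to tell when all entries have been moved.

Entries cannot be inserted into ht[0] precisely because of b. During a rehash, rehashidx records which buckets have already been processed. Because rehashidx increases linearly, it will eventually pass over every bucket of ht[0]; but for the rehash to cover every entry, a bucket that has already been processed must never receive a new entry. New entries can therefore only go into ht[1]. And since nothing new is inserted into ht[0], a is guaranteed as well.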
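A small simulation makes the argument about b concrete. This is only a sketch under loud assumptions: toy_rehash and its functions are hypothetical, the model tracks per-bucket entry counts instead of real keys, and one bucket is migrated per step. It compares the actual policy (insert into ht[1]) with the hypothetical alternative (insert into ht[0]) and counts how many entries are stranded in the old table once rehashidx has passed every bucket.

```c
#include <assert.h>

#define OLD_SIZE 4

/* Toy model: only per-bucket entry counts are tracked, not real keys. */
typedef struct {
    int old_count[OLD_SIZE]; /* entries per bucket in the old table (ht[0]) */
    int new_used;            /* total entries in the new table (ht[1]) */
    int rehashidx;           /* next old bucket to migrate */
} toy_rehash;

/* Migrate one bucket; returns 1 while unprocessed buckets remain. */
static int toy_step(toy_rehash *r) {
    if (r->rehashidx >= OLD_SIZE) return 0;
    r->new_used += r->old_count[r->rehashidx];
    r->old_count[r->rehashidx] = 0;
    r->rehashidx++;
    return r->rehashidx < OLD_SIZE;
}

/* Insert during the rehash. to_new != 0 is the policy described above;
 * to_new == 0 is the hypothetical alternative of inserting into ht[0]. */
static void toy_insert(toy_rehash *r, int bucket, int to_new) {
    assert(bucket >= 0);
    if (to_new) r->new_used++;
    else r->old_count[bucket % OLD_SIZE]++;
}

/* Entries still sitting in the old table. */
static int toy_left_in_old(const toy_rehash *r) {
    int n = 0;
    for (int i = 0; i < OLD_SIZE; i++) n += r->old_count[i];
    return n;
}

/* Run a full rehash, inserting one entry into bucket 0 after every step.
 * Returns how many entries are stranded in the old table at the end. */
static int toy_run(int to_new) {
    toy_rehash r = { {1, 1, 1, 1}, 0, 0 };
    int more = 1;
    while (more) {
        more = toy_step(&r);
        toy_insert(&r, 0, to_new);
    }
    return toy_left_in_old(&r);
}
```

In this model toy_run(1) leaves nothing behind, while toy_run(0) strands every entry that landed in the already processed bucket 0, because rehashidx never comes back to it.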

3. How to delete a key

//search ht[0] first; if the key is not there and a rehash is in progress, search ht[1]; delete the entry if found
/* Search and remove an element */
static int dictGenericDelete(dict *d, const void *key, int nofree)
{
    unsigned int h, idx;
    dictEntry *he, *prevHe;
    int table;
    if (d->ht[0].size == 0) return DICT_ERR; /* d->ht[0].table is NULL */
    if (dictIsRehashing(d)) _dictRehashStep(d);
    h = dictHashKey(d, key);
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        prevHe = NULL;
        while(he) {
            if (dictCompareKeys(d, key, he->key)) {
                /* Unlink the element from the list */
                if (prevHe)
                    prevHe->next = he->next;
                else
                    d->ht[table].table[idx] = he->next;
                if (!nofree) {
                    dictFreeKey(d, he);
                    dictFreeVal(d, he);
                }
                zfree(he);
                d->ht[table].used--;
                return DICT_OK;
            }
            prevHe = he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) break;
    }
    return DICT_ERR; /* not found */
}
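The core of the delete path is unlinking an entry from a singly linked bucket chain while remembering the previous entry. The sketch below isolates just that step on a plain list; node, chain_push, chain_delete and chain_len are hypothetical names introduced for illustration, and integer keys stand in for the dict's key comparison callback.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node {
    int key;
    struct node *next;
} node;

/* Push a key at the head of the chain, as dictAddRaw does for a bucket. */
static node *chain_push(node *head, int key) {
    node *n = malloc(sizeof(*n));
    assert(n != NULL);
    n->key = key;
    n->next = head;
    return n;
}

/* Unlink and free the first node holding key; returns 1 if found, else 0. */
static int chain_delete(node **head, int key) {
    node *prev = NULL;
    for (node *cur = *head; cur != NULL; prev = cur, cur = cur->next) {
        if (cur->key != key) continue;
        if (prev) prev->next = cur->next;   /* middle or tail of the chain */
        else *head = cur->next;             /* head of the chain */
        free(cur);
        return 1;
    }
    return 0;
}

/* Number of nodes in the chain. */
static int chain_len(const node *head) {
    int n = 0;
    for (; head != NULL; head = head->next) n++;
    return n;
}
```

The two branches correspond to the prevHe test in dictGenericDelete: when there is no previous entry the bucket slot itself has to be updated, otherwise only the predecessor's next pointer changes.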

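4. How to make sure that inserting and deleting entries during a rehash does not break the rehash

The insert and delete paths above show that they cannot break the rehash: during a rehash an insert only adds to ht[1], and a delete only unlinks an entry from its chain, so neither puts a new entry into a bucket of ht[0] that has already been processed.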

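5. How to iterate over all entries of a dict, and what the iteration order is

6. How to keep iterators valid and correct

A dict is traversed with an iterator. There are two kinds: the ordinary iterator and the safe iterator; by comparison, the ordinary iterator is the unsafe one.

Iterators are the tool that most data structures (containers) offer for walking their elements. When using one, there are a few things to watch:

a. The iteration order.

b. Whether the container may be modified during iteration and, if it is, what the consequences are, for example a changed iteration order or an invalidated iterator.

Now let's look at the dict iterators.

The iteration order is unspecified; for practical purposes it can be treated as unordered.

An ordinary iterator does not allow the dict to be modified during iteration. A safe iterator does.

Here is the code: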

//create an ordinary iterator
dictIterator *dictGetIterator(dict *d)
{
    dictIterator *iter = zmalloc(sizeof(*iter));
    iter->d = d;  //remember the dict
    iter->table = 0;
    iter->index = -1;
    iter->safe = 0; //ordinary iterator
    iter->entry = NULL;
    iter->nextEntry = NULL;
    return iter;
}

//create a safe iterator
dictIterator *dictGetSafeIterator(dict *d) {
    dictIterator *i = dictGetIterator(d);
    i->safe = 1;  //safe iterator
    return i;
}
//advance the iterator
dictEntry *dictNext(dictIterator *iter)
{
    while (1) {
        if (iter->entry == NULL) {
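            //entry is NULL: the iterator was just created, the current bucket is empty, the end of a bucket's chain was reached, or every bucket has been visited
            dictht *ht = &iter->d->ht[iter->table];
            if (iter->index == -1 && iter->table == 0) {
                //first call on a freshly created iterator
                if (iter->safe)
                    iter->d->iterators++; //safe iterator: register it on the dict
                else
                    iter->fingerprint = dictFingerprint(iter->d); //ordinary iterator: record the current fingerprint
            }
            iter->index++; //move on to the next bucket
            if (iter->index >= (long) ht->size) {
                //end of this table: if a rehash is in progress and ht[0] was just finished, continue with ht[1]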
                if (dictIsRehashing(iter->d) && iter->table == 0) {
                    iter->table++;
                    iter->index = 0;
                    ht = &iter->d->ht[1];
                } else {
                    break; //iteration finished
                }
            }
            //take the first entry of the current bucket
            iter->entry = ht->table[iter->index];
        } else {
            //advance to the next entry in the chain
            iter->entry = iter->nextEntry;
        }
        if (iter->entry) {
            //found an entry: remember its successor
            /* We need to save the 'next' here, the iterator user
             * may delete the entry we are returning. */
            iter->nextEntry = iter->entry->next;
            return iter->entry; //return the entry found
        }
    }
    //no entries left: the whole dict has been traversed
    return NULL;
}

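The traversal code above shows three ordering rules:

a. ht[0] is traversed first; if a rehash is in progress, ht[1] is traversed after all buckets of ht[0].

b. Within one table, buckets are visited in increasing index order.

c. Entries in the same bucket are visited from the head of the chain to its tail, but an entry's position within the chain is itself unspecified.

Taken together, these three rules mean the iteration is effectively unordered.

Next, whether an iterator visits every entry. Ordinary and safe iterators have to be discussed separately.

Ordinary iterator: the code shows that an ordinary iterator computes the dict's fingerprint when iteration starts. Nothing in the iteration code stops the dict from inserting or deleting entries, or from rehashing, while the iterator is live. When the iterator is released, however, the fingerprint of the dict after iteration is compared with the one taken before it, and if they differ the assertion fails and the program aborts. So an ordinary iterator does not really allow the dict to be modified during iteration: the code does not prevent it at the time, but the program ends up aborting. Note that an equal fingerprint does not prove the dict is unchanged; all that can be said is that a different fingerprint means the dict has definitely changed.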

void dictReleaseIterator(dictIterator *iter)
{
    if (!(iter->index == -1 && iter->table == 0)) {
        if (iter->safe)
            iter->d->iterators--;
        else
            assert(iter->fingerprint == dictFingerprint(iter->d));
    }
    zfree(iter);
}
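To see why an equal fingerprint proves nothing while a different one proves a change, consider a toy fingerprint. This is not dictFingerprint: toy_table_state, toy_fingerprint and toy_release_check are hypothetical, the state is reduced to a bucket count and an entry count, and the mixing steps are an arbitrary invertible scramble chosen for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical summary of a table: bucket count and entry count only. */
typedef struct {
    uint64_t size;
    uint64_t used;
} toy_table_state;

/* Fold the fields into one 64-bit value. Each round is invertible, so two
 * states with the same size but different used always hash differently. */
static uint64_t toy_fingerprint(const toy_table_state *s) {
    uint64_t fields[2] = { s->size, s->used };
    uint64_t h = 0;
    for (int i = 0; i < 2; i++) {
        h += fields[i];
        h ^= h >> 33;
        h *= 0xff51afd7ed558ccdULL;   /* odd multiplier, invertible mod 2^64 */
        h ^= h >> 33;
    }
    return h;
}

/* Same check as on iterator release: abort if the summary changed. */
static void toy_release_check(uint64_t before, const toy_table_state *s) {
    assert(before == toy_fingerprint(s));
}
```

Adding one entry changes used and therefore the fingerprint. Adding one entry and deleting a different one restores used, so the fingerprint matches again even though the contents differ, and toy_release_check passes.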

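Safe iterator: when iteration starts, the iterator is registered on the dict; the traversal itself is no different from an ordinary iterator's. What is the registration for? Searching the code shows that the dict's safe-iterator counter is used in the _dictRehashStep function.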

/* This function performs just a step of rehashing, and only if there are
 * no safe iterators bound to our hash table. When we have iterators in the
 * middle of a rehashing we can't mess with the two hash tables otherwise
 * some element can be missed or duplicated.
 *
 * This function is called by common lookup or update operations in the
 * dictionary so that the hash table automatically migrates from H1 to H2
 * while it is actively used. */
static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1); //a rehash step is allowed only when the safe-iterator counter is 0
}

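And dictReleaseIterator shows that no fingerprint check is made for a safe iterator. So what a "safe" iterator really means is:

a. Entries may be inserted and deleted during iteration.

b. No rehash step is performed during iteration. If a rehash had already started before the iteration began, it is paused once the iteration starts and resumes after the iteration finishes and the iterator is released.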
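The pause-and-resume behaviour in b follows from a single counter check. Below is a minimal sketch of that gate, assuming a hypothetical toy_dict that keeps only the counter and the rehash position; the function names are invented for illustration and one step migrates exactly one bucket.

```c
#include <assert.h>

typedef struct {
    int iterators;   /* number of live safe iterators */
    int rehashidx;   /* next bucket to migrate, or -1 when not rehashing */
    int old_size;    /* number of buckets in the old table */
} toy_dict;

/* Perform one rehash step unless a safe iterator is live.
 * Returns 1 if a bucket was migrated, 0 if nothing was done. */
static int toy_rehash_step(toy_dict *d) {
    if (d->rehashidx == -1 || d->iterators != 0) return 0;
    if (++d->rehashidx == d->old_size) d->rehashidx = -1;  /* finished */
    return 1;
}

/* Called on the first step of a safe iterator. */
static void toy_safe_iter_begin(toy_dict *d) {
    d->iterators++;
}

/* Called when a safe iterator is released. */
static void toy_safe_iter_release(toy_dict *d) {
    assert(d->iterators > 0);
    d->iterators--;
}
```

While the counter is non-zero, toy_rehash_step leaves rehashidx untouched; as soon as the last safe iterator is released, the next call continues from where the rehash stopped.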

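Since insertion and deletion are allowed during iteration, how do they affect the traversal?

Inserting an entry has no major effect on the traversal, but whether the newly inserted entry will be visited is unspecified.

Deleting an entry splits into four cases: deleting an entry that has already been visited, deleting the current entry, deleting the next entry to be visited, and deleting an unvisited entry that is not the next one.

Deleting an already visited entry does not affect the traversal.

Deleting the current entry does not affect the traversal either: the current entry has already been handed out, and the iterator no longer depends on it to reach the next one.

Deleting the next entry to be visited has two sub-cases: the entry is already recorded in the iterator's nextEntry, or it is not. If it is not recorded in nextEntry, the traversal is unaffected. If it is, the iterator is now invalid, and trying to move to the next entry has unpredictable results.

Deleting an unvisited entry that is not the next one does not affect the traversal either; the deleted entry simply will not be visited.

So the safe iterator is not truly safe after all: deleting an entry can invalidate it.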
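The "delete the current entry" case can be sketched on a single chain. The code is a hypothetical model, not the dict iterator: toy_iter mirrors only the entry and nextEntry pair, and node, toy_push, toy_next, toy_delete, toy_remove_even and toy_length are names made up for this example.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node {
    int key;
    struct node *next;
} node;

/* Hypothetical cursor mirroring the entry/nextEntry pair of dictIterator. */
typedef struct {
    node **head;
    node *entry;
    node *next_entry;
    int started;
} toy_iter;

static node *toy_push(node *head, int key) {
    node *n = malloc(sizeof(*n));
    assert(n != NULL);
    n->key = key;
    n->next = head;
    return n;
}

/* Hand out the next node, saving its successor before the caller sees it. */
static node *toy_next(toy_iter *it) {
    it->entry = it->started ? it->next_entry : *it->head;
    it->started = 1;
    if (it->entry) it->next_entry = it->entry->next;
    return it->entry;
}

/* Unlink and free one node of the chain. */
static void toy_delete(node **head, node *victim) {
    for (node **pp = head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == victim) {
            *pp = victim->next;
            free(victim);
            return;
        }
    }
}

/* Delete every even key while iterating; returns how many nodes were visited. */
static int toy_remove_even(node **head) {
    toy_iter it = { head, NULL, NULL, 0 };
    int visited = 0;
    for (node *n = toy_next(&it); n != NULL; n = toy_next(&it)) {
        visited++;
        if (n->key % 2 == 0) toy_delete(head, n);  /* current entry only */
    }
    return visited;
}

static int toy_length(const node *head) {
    int n = 0;
    for (; head != NULL; head = head->next) n++;
    return n;
}
```

Every node is still visited exactly once because the successor was saved before the current node was freed. If the loop body freed the node stored in next_entry instead, the following toy_next call would read freed memory, which is the invalidation case described above.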

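Now, why must rehashing be disallowed while a safe iterator is traversing? If it were allowed, the traversal could no longer be guaranteed: some entries might be visited more than once and others not at all. Some scenarios:

a. The iterator is at some entry X of ht[0], located in bucket 2. Because rehashing is allowed, a step happens to move entry Y from bucket 1 of ht[0] into ht[1]. After finishing ht[0] the iterator goes on to ht[1] and visits Y a second time.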

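b. The iterator is at bucket 4 of ht[1], with the later buckets still unvisited. Rehashing proceeds and happens to move the last remaining entries of ht[0] into ht[1]; the rehash completes and the table that was ht[1] becomes ht[0]. The iterator still records that it is traversing ht[1], so when it moves past bucket 4 of ht[1] (which after the switch no longer holds any entries) it finds nothing more and the traversal ends, although some entries have not been visited.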

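As these scenarios show, rehashing cannot be allowed during iteration.

To sum up: with a safe iterator, as long as no entries are deleted, the traversal is essentially sound, and every entry that existed when the iteration started will be visited. Using a safe iterator does have a cost for the dict, though: first, it pauses the rehash; second, if a safe iterator is held and never released, the rehash cannot make progress at all.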
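Scenario a can be replayed with a deliberately tiny model. All of it is assumed for illustration: toy_pair holds at most one integer key per bucket (0 means empty), a key's bucket in the new table is key % NEW_SIZE, and count_visits walks the old table and then the new one the way the iterator would, optionally migrating one old bucket in the middle of the walk.

```c
#include <assert.h>

#define OLD_SIZE 4
#define NEW_SIZE 8

/* Toy pair of tables: at most one key per bucket, 0 means empty. */
typedef struct {
    int old_tab[OLD_SIZE];
    int new_tab[NEW_SIZE];
} toy_pair;

/* Move the key in old bucket idx (if any) to its bucket in the new table. */
static void move_bucket(toy_pair *p, int idx) {
    int key = p->old_tab[idx];
    if (key == 0) return;
    p->new_tab[key % NEW_SIZE] = key;
    p->old_tab[idx] = 0;
}

/* Walk the old table, then the new one, counting visits to key `watch`.
 * Just before visiting old bucket `trigger_at`, migrate old bucket
 * `migrate_idx`; pass migrate_idx = -1 to model a paused rehash.
 * The pair is passed by value, so the caller's copy is untouched. */
static int count_visits(toy_pair p, int migrate_idx, int trigger_at, int watch) {
    assert(watch != 0);
    int visits = 0;
    for (int i = 0; i < OLD_SIZE; i++) {
        if (i == trigger_at && migrate_idx >= 0) move_bucket(&p, migrate_idx);
        if (p.old_tab[i] == watch) visits++;
    }
    for (int i = 0; i < NEW_SIZE; i++)
        if (p.new_tab[i] == watch) visits++;
    return visits;
}
```

With keys 1, 2 and 3 in old buckets 1, 2 and 3, migrating bucket 1 when the walk reaches bucket 2 makes key 1 show up twice, while the same walk with the rehash paused visits it once.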
