Redis: Skip Lists

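A skip list (called skiplist in the Redis source) is an ordered data structure. Each node keeps several pointers to other nodes, which makes it possible to reach a given node quickly.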

Before getting to the skip list itself, let's look at the sorted set in Redis.

zadd class 87.5 alice 87.5 fred 65.5 charles 94.5 emily

This inserts four members into the sorted set class. Now check the underlying encoding.

127.0.0.1:6379> object encoding class
"ziplist"

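The encoding is a ziplist (compressed list), covered in detail in the previous post: a sequential data structure made up of a series of specially encoded, contiguous memory blocks.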

// redis.conf

# Similarly to hashes and lists, sorted sets are also specially encoded in
# order to save a lot of space. This encoding is only used when the length and
# elements of a sorted set are below the following limits:
zset-max-ziplist-entries 128
zset-max-ziplist-value 64

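According to the Redis configuration file, a sorted set is stored as a ziplist only while it has no more than 128 elements and every member is no longer than 64 bytes; once either limit is exceeded, Redis switches to a skiplist. Inside the ziplist each member is stored first and its score right after it, in one-to-one pairs, so 128 elements occupy at most 128*2 = 256 ziplist entries. The layout is roughly: member, score, member, score, and so on.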
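
The two limits can be read as a single predicate. The following is a minimal sketch, not Redis code: the helper name fits_in_ziplist and the two macros are made up for illustration, and the values are the defaults from the redis.conf excerpt above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Defaults copied from the redis.conf excerpt (assumed unchanged). */
#define DEMO_MAX_ZIPLIST_ENTRIES 128
#define DEMO_MAX_ZIPLIST_VALUE   64

/* Hypothetical helper: true if a sorted set with `entries` members, whose
 * longest member is `longest_member_len` bytes, stays within both limits. */
bool fits_in_ziplist(size_t entries, size_t longest_member_len) {
    return entries <= DEMO_MAX_ZIPLIST_ENTRIES &&
           longest_member_len <= DEMO_MAX_ZIPLIST_VALUE;
}
```

With these defaults, fits_in_ziplist(4, 7) holds for the class example, while 129 entries or a 65-byte member would not fit.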

To get to today's topic, set zset-max-ziplist-entries to 1 in the configuration file, so that any sorted set with more than one element is implemented with a skiplist.

127.0.0.1:6379> zadd class 87.5 alice 87.5 fred 65.5 charles 94.5 emily
(integer) 4
127.0.0.1:6379> object encoding class
"skiplist"

The encoding is now reported as skiplist.

// server.h

typedef struct zset {
    dict *dict;
    zskiplist *zsl;
} zset;

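Redis stores a sorted set in the zset struct, which holds both a dictionary and a skip list. The dictionary maps each member (key) to its score (value), so a member's score can be fetched in O(1); the skip list handles range queries and other order-based operations.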

The dictionary was covered in detail earlier, so next let's look at the skiplist data structures.

// server.h

#define ZSKIPLIST_MAXLEVEL 64 /* Should be enough for 2^64 elements */

/* ZSETs use a specialized version of Skiplists */
typedef struct zskiplistNode {
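    sds ele; /* member */
    double score; /* score, used for ordering */
    struct zskiplistNode *backward; /* backward pointer (previous node on the lowest level) */
    struct zskiplistLevel {
        struct zskiplistNode *forward; /* next node on this level */
        unsigned long span; /* distance, in nodes, from this node to forward on this level (1 if they are adjacent) */
    } level[]; /* flexible array holding per-level data; at most 64 levels, enough for 2^64 nodes */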
} zskiplistNode;

typedef struct zskiplist {
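    struct zskiplistNode *header, *tail; /* head and tail of the skiplist; the header is a special 64-level node created at initialization */
    unsigned long length; /* number of nodes, not counting the header */
    int level; /* height of the skiplist: the largest level of any node other than the header */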
} zskiplist;

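As with the dictionary, there is a node struct, zskiplistNode, which stores a member and its score. Nodes are ordered by ascending score, and nodes with equal scores are ordered lexicographically by member. There is also a zskiplist struct that makes the list convenient to work with: it gives the node count in O(1) and provides header and tail pointers for traversing forward from the head or backward from the tail.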
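
The ordering rule (score first, then member) can be written as a small comparator. This is only a sketch: zset_entry_cmp is a hypothetical name, and strcmp() stands in for sdscmp(), which compares binary-safe strings and so can differ for members containing NUL bytes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: compares two (score, member) pairs in skiplist
 * order. Returns a negative value, 0, or a positive value. */
int zset_entry_cmp(double score_a, const char *ele_a,
                   double score_b, const char *ele_b) {
    if (score_a < score_b) return -1;
    if (score_a > score_b) return 1;
    /* Equal scores: fall back to the lexicographic order of the members. */
    return strcmp(ele_a, ele_b);
}
```

For the class example this places charles (65.5) first, then alice and fred (both 87.5, alice before fred), then emily (94.5).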

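zskiplist summarizes and coordinates the zskiplistNode nodes: it records how many nodes there are, the highest level in use, where the header node is (for forward traversal) and where the tail node is (for reverse traversal).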

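The backward pointer of the header node, and of the first node that holds a member and score, is NULL. The header always has 64 levels so that it never has to be reallocated when the list grows taller; its other fields keep their default values. Each forward pointer carries a span, which exists to compute rank. Ranks in Redis start at 0, and scores are sorted in ascending order. To compute the rank of charles, add up the spans along the search path and subtract one: 1 - 1 = 0 (the minus one is because ranks start at 0). To compute the everyday kind of rank (highest score first), subtract the accumulated span from the total length: for emily this is 4 - 4 = 0.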
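
The rank arithmetic can be tried on a tiny hand-built list. Everything here is an assumption made for illustration: the struct rank_node, the helpers build_demo and rank_of, and the levels given to each member (alice and emily get two levels, the others one). strcmp() stands in for sdscmp(). rank_of returns the 1-based accumulated span, or 0 if the member is not found.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RANK_MAXLEVEL 2

typedef struct rank_node {
    const char *ele;   /* NULL for the header */
    double score;
    struct { struct rank_node *forward; unsigned long span; } level[RANK_MAXLEVEL];
} rank_node;

/* Builds charles, alice, fred, emily in sorted order; n[0] is the header. */
void build_demo(rank_node n[5]) {
    memset(n, 0, 5 * sizeof(rank_node));
    n[1].ele = "charles"; n[1].score = 65.5;
    n[2].ele = "alice";   n[2].score = 87.5;
    n[3].ele = "fred";    n[3].score = 87.5;
    n[4].ele = "emily";   n[4].score = 94.5;
    for (int i = 0; i < 4; i++) {          /* level 0 links every node */
        n[i].level[0].forward = &n[i + 1];
        n[i].level[0].span = 1;
    }
    /* level 1: header -> alice -> emily, each hop covering two nodes */
    n[0].level[1].forward = &n[2]; n[0].level[1].span = 2;
    n[2].level[1].forward = &n[4]; n[2].level[1].span = 2;
}

/* Walks from the top level down, adding up the spans that are crossed. */
unsigned long rank_of(rank_node *header, int list_level,
                      double score, const char *ele) {
    unsigned long rank = 0;
    rank_node *x = header;
    for (int i = list_level - 1; i >= 0; i--) {
        while (x->level[i].forward &&
               (x->level[i].forward->score < score ||
                (x->level[i].forward->score == score &&
                 strcmp(x->level[i].forward->ele, ele) <= 0))) {
            rank += x->level[i].span;
            x = x->level[i].forward;
        }
        if (x->ele && strcmp(x->ele, ele) == 0) return rank;
    }
    return 0;
}
```

With this list, rank_of gives 1 for charles and 4 for emily, so the ZRANK value for charles is 1 - 1 = 0 and the ZREVRANK value for emily is 4 - 4 = 0.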

127.0.0.1:6379> zrank class charles
(integer) 0
127.0.0.1:6379> zrevrank class emily
(integer) 0

That covers the basics of the Redis skip list. Next come initialization and the commonly used API.

// t_zset.c

/* Create a new skiplist. */
zskiplist *zslCreate(void) {
    int j;
    zskiplist *zsl;

    zsl = zmalloc(sizeof(*zsl));
    zsl->level = 1;
    zsl->length = 0;
    zsl->header = zslCreateNode(ZSKIPLIST_MAXLEVEL,0,NULL);
    for (j = 0; j < ZSKIPLIST_MAXLEVEL; j++) {
        zsl->header->level[j].forward = NULL;
        zsl->header->level[j].span = 0;
    }
    zsl->header->backward = NULL;
    zsl->tail = NULL;
    return zsl;
}

/* Create a skiplist node with the specified number of levels.
 * The SDS string 'ele' is referenced by the node after the call. */
zskiplistNode *zslCreateNode(int level, double score, sds ele) {
    zskiplistNode *zn =
        zmalloc(sizeof(*zn)+level*sizeof(struct zskiplistLevel));
    zn->score = score;
    zn->ele = ele;
    return zn;
}

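The logic of the two initialization functions is easy to follow. zslCreate allocates memory for the zskiplist struct, sets the level to 1 and the node count to 0 (the header node allocated next is not counted), and sets the tail pointer to NULL. The header pointer refers to a dynamically allocated zskiplistNode whose backward pointer is NULL, member is NULL and score is 0; it is created with 64 levels, each of which has a NULL forward pointer and a span of 0.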
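
The size expression in zslCreateNode relies on a C flexible array member: the struct itself has no room for levels, and the allocation adds space for exactly as many as the node needs. A minimal sketch with made-up names (demo_node, demo_node_size, demo_create_node) and plain malloc instead of zmalloc; unlike zslCreateNode, it also clears the levels, which Redis does separately in zslCreate for the header.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct demo_node {
    double score;
    struct demo_node *backward;
    struct demo_level {
        struct demo_node *forward;
        unsigned long span;
    } level[];                 /* flexible array member: size chosen at allocation */
} demo_node;

/* Bytes needed for a node with `level` levels. */
size_t demo_node_size(int level) {
    return sizeof(demo_node) + (size_t)level * sizeof(struct demo_level);
}

/* Allocates a node and clears every level. Returns NULL on allocation failure. */
demo_node *demo_create_node(int level, double score) {
    demo_node *n = malloc(demo_node_size(level));
    if (n == NULL) return NULL;
    n->score = score;
    n->backward = NULL;
    for (int j = 0; j < level; j++) {
        n->level[j].forward = NULL;
        n->level[j].span = 0;
    }
    return n;
}
```

A header-like node would be demo_create_node(64, 0); each extra level costs sizeof(struct demo_level) bytes.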

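To make this concrete, the following walks through adding nodes to the skip list in detail. Nodes are added in the same order as in the zadd command at the start, and the node levels are assumed values, since the real ones are random.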

// t_zset.c

/* Insert a new node in the skiplist. Assumes the element does not already
 * exist (up to the caller to enforce that). The skiplist takes ownership
 * of the passed SDS string 'ele'. */
zskiplistNode *zslInsert(zskiplist *zsl, double score, sds ele) {
    zskiplistNode *update[ZSKIPLIST_MAXLEVEL], *x; /* update stores the search path */
    unsigned int rank[ZSKIPLIST_MAXLEVEL]; /* rank[i] is the rank of update[i]: the spans accumulated to reach it */
    int i, level;

    serverAssert(!isnan(score));
    x = zsl->header;
    for (i = zsl->level-1; i >= 0; i--) {
        /* store rank that is crossed to reach the insert position */
        rank[i] = i == (zsl->level-1) ? 0 : rank[i+1];
        while (x->level[i].forward &&
                (x->level[i].forward->score < score ||
                    (x->level[i].forward->score == score &&
                    sdscmp(x->level[i].forward->ele,ele) < 0)))
        {
            rank[i] += x->level[i].span;
            x = x->level[i].forward;
        }
        update[i] = x;
    }
    /* we assume the element is not already inside, since we allow duplicated
     * scores, reinserting the same element should never happen since the
     * caller of zslInsert() should test in the hash table if the element is
     * already inside or not. */
    level = zslRandomLevel();
    if (level > zsl->level) {
        for (i = zsl->level; i < level; i++) {
            rank[i] = 0;
            update[i] = zsl->header;
            update[i]->level[i].span = zsl->length;
        }
        zsl->level = level;
    }
    x = zslCreateNode(level,score,ele);
    for (i = 0; i < level; i++) {
        x->level[i].forward = update[i]->level[i].forward;
        update[i]->level[i].forward = x;

        /* update span covered by update[i] as x is inserted here */
        x->level[i].span = update[i]->level[i].span - (rank[0] - rank[i]);
        update[i]->level[i].span = (rank[0] - rank[i]) + 1;
    }

    /* increment span for untouched levels */
    for (i = level; i < zsl->level; i++) {
        update[i]->level[i].span++;
    }

    x->backward = (update[0] == zsl->header) ? NULL : update[0];
    if (x->level[0].forward)
        x->level[0].forward->backward = x;
    else
        zsl->tail = x;
    zsl->length++;
    return x;
}

/* Returns a random level for the new skiplist node we are going to create.
 * The return value of this function is between 1 and ZSKIPLIST_MAXLEVEL
 * (both inclusive), with a powerlaw-alike distribution where higher
 * levels are less likely to be returned. */
int zslRandomLevel(void) { /* returns a random level from 1 to 64 */
    int level = 1;
    while ((random()&0xFFFF) < (ZSKIPLIST_P * 0xFFFF))
        level += 1;
    return (level<ZSKIPLIST_MAXLEVEL) ? level : ZSKIPLIST_MAXLEVEL;
}
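
To see what the loop in zslRandomLevel produces, the sketch below repeats it with the random source passed in as a function pointer, so the outcome can be scripted. The names are made up for illustration, and DEMO_P is assumed to match ZSKIPLIST_P, which is 0.25 in server.h. With p = 0.25, the chance of getting exactly level k (below the cap) is (1 - p) * p^(k - 1).

```c
#include <assert.h>

#define DEMO_MAXLEVEL 64
#define DEMO_P 0.25   /* assumed to match ZSKIPLIST_P */

/* Same loop as zslRandomLevel(), with an injectable random source. */
int demo_random_level(long (*rng)(void)) {
    int level = 1;
    while ((rng() & 0xFFFF) < (DEMO_P * 0xFFFF))
        level += 1;
    return (level < DEMO_MAXLEVEL) ? level : DEMO_MAXLEVEL;
}

/* Theoretical probability of exactly `level`, ignoring the cap. */
double demo_level_probability(int level) {
    double prob = 1.0 - DEMO_P;
    for (int i = 1; i < level; i++) prob *= DEMO_P;
    return prob;
}

/* Scripted source: "promotes" the node this many times, then stops. */
int demo_promotions_left = 0;
long demo_scripted_rng(void) {
    return demo_promotions_left-- > 0 ? 0 : 0xFFFF;
}
```

Setting demo_promotions_left to 2 yields level 3, and any value of 63 or more hits the cap of 64. About 75% of nodes end up with one level and about 19% with two, which is why high levels are rare.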

First, add 87.5 alice.

x = zsl->header; /* start from the header */
for (i = zsl->level-1; i >= 0; i--) { /* walk the levels from the highest in use (index zsl->level-1) down to 0. This is the first insert, so zsl->level is 1 and i starts at 1-1=0 */
    /* store rank that is crossed to reach the insert position */
    rank[i] = i == (zsl->level-1) ? 0 : rank[i+1]; /* i is the top level, so rank[0] = 0 */
    while (x->level[i].forward && /* first insert: every forward pointer of the header is NULL, so the loop body never runs */
            (x->level[i].forward->score < score ||
                (x->level[i].forward->score == score &&
                sdscmp(x->level[i].forward->ele,ele) < 0)))
    {
        rank[i] += x->level[i].span;
        x = x->level[i].forward;
    }
    update[i] = x; /* update[0] = header */
}

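Step one: find the insert position. Since this is the first node, update[0] points to the header (the update array holds the search path) and rank[0] is 0 (rank[i] holds the rank of update[i], that is, the total span crossed from the header to reach it).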

/* we assume the element is not already inside, since we allow duplicated
 * scores, reinserting the same element should never happen since the
 * caller of zslInsert() should test in the hash table if the element is
 * already inside or not. */
level = zslRandomLevel(); /* pick a level in 1~64; assume it returns 2 */
if (level > zsl->level) { /* the new node is taller than the skiplist: set up rank and update for the new levels */
    for (i = zsl->level; i < level; i++) { /* i = 1, and 1 < 2, so the loop runs once */
        rank[i] = 0; /* rank[1] = 0 */
        update[i] = zsl->header; /* update[1] is the header */
        update[i]->level[i].span = zsl->length; /* set the header's span on this level to the current node count (header excluded) */
    }
    zsl->level = level; /* the skiplist level becomes 2 */
}

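Step two: adjust the height of the skip list. If the new node is taller than the skip list, rank and update are filled in for the extra levels, with the header as the predecessor on each of them.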

x = zslCreateNode(level,score,ele); /* create the 87.5 alice node */
for (i = 0; i < level; i++) { /* level is 2 here */
    x->level[i].forward = update[i]->level[i].forward; /* NULL on both levels: update[i] is the header and this is the first insert */
    update[i]->level[i].forward = x; /* the header's forward pointer on this level now points to x */

    /* update span covered by update[i] as x is inserted here */
    x->level[i].span = update[i]->level[i].span - (rank[0] - rank[i]); /* 0 here */
    update[i]->level[i].span = (rank[0] - rank[i]) + 1; /* 1 here */
}

/* increment span for untouched levels */
for (i = level; i < zsl->level; i++) { /* when the new node is shorter than the skiplist, the higher levels now cross one more node */
    update[i]->level[i].span++;
}

Step three: insert the node.

x->backward = (update[0] == zsl->header) ? NULL : update[0]; // NULL here
if (x->level[0].forward)
    x->level[0].forward->backward = x;
else
    zsl->tail = x; // this branch is taken: the tail now points to x
zsl->length++;  // node count goes up by 1

Step four: fix up backward, the skiplist's tail pointer and the node count.

Next, insert the 87.5 fred node.

x = zsl->header; /* start from the header */
for (i = zsl->level-1; i >= 0; i--) { /* walk the levels from the highest in use down to 0; i starts at 2-1=1 */
    /* store rank that is crossed to reach the insert position */
    rank[i] = i == (zsl->level-1) ? 0 : rank[i+1]; /* rank[1] starts at 0; rank[0] starts from the final rank[1], which is 1 */
    while (x->level[i].forward && /* the forward node is not NULL */
            (x->level[i].forward->score < score || /* the forward node's score is lower than the new score */
                (x->level[i].forward->score == score && /* equal scores: compare the members lexicographically */
                sdscmp(x->level[i].forward->ele,ele) < 0))) /* alice sorts before fred, so the loop body runs */
    {
        rank[i] += x->level[i].span; /* runs once, on level 1: rank[1] = 0 + 1 = 1. On level 0, x is already alice and its forward is NULL, so rank[0] stays 1 */
        x = x->level[i].forward; /* x now points to the alice node */
    }
    update[i] = x; /* update[i] is the last node on level i before the insert position: update[1] = update[0] = alice */
}

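Step one: find the insert position. Both scores are 87.5, so the members are compared lexicographically; fred is greater than alice, so the new node goes right after alice.

Step two: adjust the height of the skip list. Assuming fred gets level 1, this step is skipped.

Step three: insert the node.

Step four: fix up backward, the skiplist's tail pointer and the node count.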

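That is all for insertion. The main point is that the cost of inserting into a skip list is dominated by the search. Two variables deserve emphasis: every node in the update array is the one whose forward pointer on that level will point to the new node, and rank[i] is the rank of update[i], so rank[0] - rank[i] is the distance from update[i] to the node just before the insert position.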
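
The walk-through can be replayed with a cut-down version of the insert routine. This is a sketch under stated assumptions, not the Redis implementation: the names (dnode, dlist, demo_create, demo_insert, demo_free) are made up, nodes have a fixed array of 8 levels instead of a flexible array, strcmp() stands in for sdscmp(), and the caller chooses the level instead of zslRandomLevel() so the result is reproducible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_LEVELS 8   /* small cap, enough for a demo */

typedef struct dnode {
    const char *ele;
    double score;
    struct dnode *backward;
    struct { struct dnode *forward; unsigned long span; } level[DEMO_LEVELS];
} dnode;

typedef struct dlist {
    dnode *header, *tail;
    unsigned long length;
    int level;
} dlist;

dlist *demo_create(void) {
    dlist *l = calloc(1, sizeof(*l));
    assert(l != NULL);
    l->header = calloc(1, sizeof(dnode));   /* forwards NULL, spans 0 */
    assert(l->header != NULL);
    l->level = 1;
    return l;
}

dnode *demo_insert(dlist *l, double score, const char *ele, int level) {
    dnode *update[DEMO_LEVELS], *x = l->header;
    unsigned long rank[DEMO_LEVELS];
    int i;

    /* Step 1: find the predecessor on every level and record its rank. */
    for (i = l->level - 1; i >= 0; i--) {
        rank[i] = (i == l->level - 1) ? 0 : rank[i + 1];
        while (x->level[i].forward &&
               (x->level[i].forward->score < score ||
                (x->level[i].forward->score == score &&
                 strcmp(x->level[i].forward->ele, ele) < 0))) {
            rank[i] += x->level[i].span;
            x = x->level[i].forward;
        }
        update[i] = x;
    }
    /* Step 2: levels above the current height start at the header. */
    for (i = l->level; i < level; i++) {
        rank[i] = 0;
        update[i] = l->header;
        update[i]->level[i].span = l->length;
    }
    if (level > l->level) l->level = level;

    /* Step 3: splice in the node and split the spans. */
    x = calloc(1, sizeof(*x));
    assert(x != NULL);
    x->ele = ele;
    x->score = score;
    for (i = 0; i < level; i++) {
        x->level[i].forward = update[i]->level[i].forward;
        update[i]->level[i].forward = x;
        x->level[i].span = update[i]->level[i].span - (rank[0] - rank[i]);
        update[i]->level[i].span = (rank[0] - rank[i]) + 1;
    }
    for (i = level; i < l->level; i++) update[i]->level[i].span++;

    /* Step 4: backward pointer, tail and length. */
    x->backward = (update[0] == l->header) ? NULL : update[0];
    if (x->level[0].forward) x->level[0].forward->backward = x;
    else l->tail = x;
    l->length++;
    return x;
}

void demo_free(dlist *l) {
    dnode *x = l->header, *next;
    while (x) { next = x->level[0].forward; free(x); x = next; }
    free(l);
}
```

Inserting alice with level 2 and then fred with level 1 reproduces the states described above: after fred, alice's level-0 forward pointer is fred with span 1, and alice's level-1 span has been bumped to 1 by the "untouched levels" loop even though its forward pointer on that level is still NULL.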

Finally, deleting a node.

// t_zset.c

/* Internal function used by zslDelete, zslDeleteByScore and zslDeleteByRank */
void zslDeleteNode(zskiplist *zsl, zskiplistNode *x, zskiplistNode **update) {
    int i;
    for (i = 0; i < zsl->level; i++) {
        if (update[i]->level[i].forward == x) {
            update[i]->level[i].span += x->level[i].span - 1;
            update[i]->level[i].forward = x->level[i].forward;
        } else {
            update[i]->level[i].span -= 1; /* update[i] does not point directly at the deleted node on this level, but its forward pointer crosses over it, so the span shrinks by one */
        }
    }
    if (x->level[0].forward) {
        x->level[0].forward->backward = x->backward;
    } else {
        zsl->tail = x->backward;
    }
    while(zsl->level > 1 && zsl->header->level[zsl->level-1].forward == NULL)
        zsl->level--;
    zsl->length--;
}

/* Delete an element with matching score/element from the skiplist.
 * The function returns 1 if the node was found and deleted, otherwise
 * 0 is returned.
 *
 * If 'node' is NULL the deleted node is freed by zslFreeNode(), otherwise
 * it is not freed (but just unlinked) and *node is set to the node pointer,
 * so that it is possible for the caller to reuse the node (including the
 * referenced SDS string at node->ele). */
int zslDelete(zskiplist *zsl, double score, sds ele, zskiplistNode **node) {
    zskiplistNode *update[ZSKIPLIST_MAXLEVEL], *x;
    int i;

    x = zsl->header;
    for (i = zsl->level-1; i >= 0; i--) {
        while (x->level[i].forward &&
                (x->level[i].forward->score < score ||
                    (x->level[i].forward->score == score &&
                     sdscmp(x->level[i].forward->ele,ele) < 0)))
        {
            x = x->level[i].forward;
        }
        update[i] = x;
    }
    /* We may have multiple elements with the same score, what we need
     * is to find the element with both the right score and object. */
    x = x->level[0].forward;
    if (x && score == x->score && sdscmp(x->ele,ele) == 0) {
        zslDeleteNode(zsl, x, update);
        if (!node)
            zslFreeNode(x);
        else
            *node = x;
        return 1;
    }
    return 0; /* not found */
}

/* Free the specified skiplist node. The referenced SDS string representation
 * of the element is freed too, unless node->ele is set to NULL before calling
 * this function. */
void zslFreeNode(zskiplistNode *node) {
    sdsfree(node->ele);
    zfree(node);
}

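There are three steps. First, starting from the header and walking down from the top level, record in the update array the node on each level that precedes the node to delete. Second, go through the update array level by level and update the corresponding forward and span values (the deleted node's memory is released in the next step); if removing the node leaves the top levels empty, lower the skip list's level. Third, free the node's memory.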
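
The first two steps can be checked on a small hand-built list. The sketch below is an illustration only: the names (delnode, dellist, del_build, del_remove) and the three members a, b, c are made up, strcmp() stands in for sdscmp(), and nothing is freed because the nodes live in an array owned by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEL_LEVELS 2

typedef struct delnode {
    const char *ele;
    double score;
    struct delnode *backward;
    struct { struct delnode *forward; unsigned long span; } level[DEL_LEVELS];
} delnode;

typedef struct dellist {
    delnode *header, *tail;
    unsigned long length;
    int level;
} dellist;

/* Level 0: header -> a -> b -> c. Level 1: header -> a -> c. n[0] is the header. */
void del_build(dellist *l, delnode n[4]) {
    memset(n, 0, 4 * sizeof(delnode));
    n[1].ele = "a"; n[1].score = 1;
    n[2].ele = "b"; n[2].score = 2;
    n[3].ele = "c"; n[3].score = 3;
    for (int i = 0; i < 3; i++) {
        n[i].level[0].forward = &n[i + 1];
        n[i].level[0].span = 1;
    }
    n[2].backward = &n[1];
    n[3].backward = &n[2];
    n[0].level[1].forward = &n[1]; n[0].level[1].span = 1;
    n[1].level[1].forward = &n[3]; n[1].level[1].span = 2;
    l->header = &n[0]; l->tail = &n[3]; l->length = 3; l->level = 2;
}

/* Search, then unlink. Returns 1 if the node was found and removed. */
int del_remove(dellist *l, double score, const char *ele) {
    delnode *update[DEL_LEVELS], *x = l->header;
    int i;
    /* Step 1: collect the predecessor on each level. */
    for (i = l->level - 1; i >= 0; i--) {
        while (x->level[i].forward &&
               (x->level[i].forward->score < score ||
                (x->level[i].forward->score == score &&
                 strcmp(x->level[i].forward->ele, ele) < 0)))
            x = x->level[i].forward;
        update[i] = x;
    }
    x = x->level[0].forward;
    if (!x || x->score != score || strcmp(x->ele, ele) != 0) return 0;

    /* Step 2: repair forward pointers and spans, then the list metadata. */
    for (i = 0; i < l->level; i++) {
        if (update[i]->level[i].forward == x) {
            update[i]->level[i].span += x->level[i].span - 1;
            update[i]->level[i].forward = x->level[i].forward;
        } else {
            update[i]->level[i].span -= 1;   /* this hop passed over x */
        }
    }
    if (x->level[0].forward) x->level[0].forward->backward = x->backward;
    else l->tail = x->backward;
    while (l->level > 1 && l->header->level[l->level - 1].forward == NULL)
        l->level--;
    l->length--;
    return 1;
}
```

Removing b exercises both branches: on level 0, a pointed directly at b, so a takes over b's forward pointer; on level 1, a's hop to c merely passed over b, so its span drops from 2 to 1. Removing the remaining nodes afterwards empties level 1 and brings the list level back to 1.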

Note: the Redis version used in this post is 5.0.

