Redis: The Compressed List (ziplist)

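Each of the five Redis data types can be backed by more than one underlying representation; the encoding field in redisObject selects whichever suits best. The strings covered in the previous post, for example, have three encodings (int, embstr, and raw), and the encoding can change at runtime: running APPEND on an embstr value switches its encoding to raw.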

127.0.0.1:6379> hmset person name molaifeng age 18 sex female
OK
127.0.0.1:6379> object encoding person
"ziplist"

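Today's topic, the ziplist (compressed list), follows the same pattern: lists, hashes, and sorted sets all use it, directly or indirectly, in their underlying storage. After reading through the ziplist source, I found that the essence lies in the word "compressed"; the "list" part plays a supporting role, and together they form a memory-efficient linear data structure.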

The ziplist is unusual in its storage layout: unlike dict or sds, it has no dedicated struct and is represented simply as a byte array, unsigned char *zl.

// ziplist.c

/* Create a new empty ziplist. */
unsigned char *ziplistNew(void) {
    unsigned int bytes = ZIPLIST_HEADER_SIZE+ZIPLIST_END_SIZE;
    unsigned char *zl = zmalloc(bytes);
    ZIPLIST_BYTES(zl) = intrev32ifbe(bytes);
    ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(ZIPLIST_HEADER_SIZE);
    ZIPLIST_LENGTH(zl) = 0;
    zl[bytes-1] = ZIP_END;
    return zl;
}
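
To make the result of ziplistNew concrete, here is a minimal sketch that writes the same 11 bytes into a caller-supplied buffer. The names sketch_put_u32le and sketch_empty_ziplist are hypothetical helpers for illustration, not Redis functions, and the fields are written byte by byte in little-endian order instead of going through the Redis macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HEADER_SIZE 10u  /* 4 (zlbytes) + 4 (zltail) + 2 (zllen) */
#define SKETCH_END_SIZE     1u  /* the single 0xFF end marker */

/* Hypothetical helper: store a 32-bit value in little-endian order. */
static void sketch_put_u32le(uint8_t *dst, uint32_t v) {
    dst[0] = (uint8_t)(v & 0xff);
    dst[1] = (uint8_t)((v >> 8) & 0xff);
    dst[2] = (uint8_t)((v >> 16) & 0xff);
    dst[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Fill buf (at least 11 bytes) with an empty ziplist; returns bytes written. */
static size_t sketch_empty_ziplist(uint8_t *buf) {
    uint32_t bytes = SKETCH_HEADER_SIZE + SKETCH_END_SIZE;
    assert(buf != NULL);
    sketch_put_u32le(buf, bytes);                   /* zlbytes = 11 */
    sketch_put_u32le(buf + 4, SKETCH_HEADER_SIZE);  /* zltail  = 10 */
    buf[8] = 0;                                     /* zllen   = 0  */
    buf[9] = 0;
    buf[bytes - 1] = 0xFF;                          /* zlend        */
    return bytes;
}
```

An empty ziplist is therefore exactly 11 bytes, and its tail offset points at the end marker because there is no entry yet.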

The mock struct below is used only to introduce the fields; it does not exist in the source.

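struct ziplist {
	uint32_t zlbytes; /* 4 bytes: total number of bytes the whole ziplist occupies */
	uint32_t zltail;  /* 4 bytes: offset from the start of the ziplist to the last entry */
	uint16_t zllen;   /* 2 bytes: number of entries in the ziplist */
	/* ... the entries themselves sit here, between zllen and zlend ... */
	uint8_t zlend;    /* 1 byte: hard-coded 0xFF marking the end of the ziplist */
} ziplist;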

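The memory layout is as follows: zlbytes (4 bytes), zltail (4 bytes), zllen (2 bytes), zero or more entries, and finally zlend (1 byte).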

Next, the macros commonly used to work with the ziplist header.

// ziplist.c

/* Return total bytes a ziplist is composed of. */
#define ZIPLIST_BYTES(zl)       (*((uint32_t*)(zl)))

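zl is the starting address of the ziplist byte array and zlbytes is a uint32_t, so (*((uint32_t*)(zl))) refers to the zlbytes field of the ziplist. This macro therefore yields the total number of bytes the whole ziplist occupies.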

// ziplist.c

/* Return the offset of the last item inside the ziplist. */
#define ZIPLIST_TAIL_OFFSET(zl) (*((uint32_t*)((zl)+sizeof(uint32_t))))

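(zl) is the address that ZIPLIST_BYTES dereferences, that is, the zlbytes field. Because zlbytes itself takes 4 bytes, adding sizeof(uint32_t) moves the pointer to zltail. The value read there is the offset of the last entry; adding that offset to the starting address zl gives the last zlentry.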

// ziplist.c

/* Return the length of a ziplist, or UINT16_MAX if the length cannot be
 * determined without scanning the whole ziplist. */
#define ZIPLIST_LENGTH(zl)      (*((uint16_t*)((zl)+sizeof(uint32_t)*2)))

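Following the mock struct, offsetting the starting address by 2*4 bytes reaches zllen, the number of entries in the ziplist. As the source comment notes, when the field holds UINT16_MAX the real count can only be obtained by scanning the whole list.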

// ziplist.c

/* The size of a ziplist header: two 32 bit integers for the total
 * bytes count and last item offset. One 16 bit integer for the number
 * of items field. */
#define ZIPLIST_HEADER_SIZE     (sizeof(uint32_t)*2+sizeof(uint16_t))

This gives the number of bytes taken by the whole ziplist header, 2*4+2 = 10, again derived from the mock struct.

// ziplist.c

/* Size of the "end of ziplist" entry. Just one byte. */
#define ZIPLIST_END_SIZE        (sizeof(uint8_t))

The memory taken by the ziplist end marker: 1 byte.

// ziplist.c

/* Return the pointer to the first entry of a ziplist. */
#define ZIPLIST_ENTRY_HEAD(zl)  ((zl)+ZIPLIST_HEADER_SIZE)

This returns the address of the first zlentry. Since ZIPLIST_HEADER_SIZE gives the number of bytes in the header, zl+ZIPLIST_HEADER_SIZE is the address of the first entry.

// ziplist.c

/* Return the pointer to the last entry of a ziplist, using the
 * last entry offset inside the ziplist header. */
#define ZIPLIST_ENTRY_TAIL(zl)  ((zl)+intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl)))

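This returns the address of the last entry. ZIPLIST_TAIL_OFFSET yields the value stored in zltail (converted by intrev32ifbe on big-endian hosts), and zl + zltail points to the last entry.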

// ziplist.c

/* Return the pointer to the last byte of a ziplist, which is, the
 * end of ziplist FF entry. */
#define ZIPLIST_ENTRY_END(zl)   ((zl)+intrev32ifbe(ZIPLIST_BYTES(zl))-1)

This returns the address of zlend. ZIPLIST_BYTES is the total number of bytes in the ziplist, so stepping back 1 byte from zl + zlbytes lands on zlend.
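
The header macros can be tried out on a hand-built buffer. The sketch below is an illustration under stated assumptions: the sketch_ functions are hypothetical function equivalents of the macros, they read the fields byte by byte in little-endian order rather than casting the pointer, and the sample bytes describe a ziplist holding the single string "hi".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read little-endian integers byte by byte, so the sketch does not depend
 * on host byte order or alignment (the real macros cast the pointer). */
static uint32_t sketch_get_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t sketch_get_u16le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical function equivalents of the header macros. */
static uint32_t sketch_zl_bytes(const uint8_t *zl)  { return sketch_get_u32le(zl); }
static uint32_t sketch_zl_tail(const uint8_t *zl)   { return sketch_get_u32le(zl + 4); }
static uint16_t sketch_zl_length(const uint8_t *zl) { return sketch_get_u16le(zl + 8); }

static const uint8_t *sketch_entry_head(const uint8_t *zl) { return zl + 10; }
static const uint8_t *sketch_entry_tail(const uint8_t *zl) { return zl + sketch_zl_tail(zl); }
static const uint8_t *sketch_entry_end(const uint8_t *zl) {
    assert(sketch_zl_bytes(zl) >= 11);  /* header (10) + zlend (1) */
    return zl + sketch_zl_bytes(zl) - 1;
}

/* A hand-built ziplist holding the single string "hi". */
static const uint8_t sketch_one_entry[] = {
    0x0F, 0x00, 0x00, 0x00,  /* zlbytes = 15 */
    0x0A, 0x00, 0x00, 0x00,  /* zltail  = 10 */
    0x01, 0x00,              /* zllen   = 1  */
    0x00, 0x02, 'h', 'i',    /* entry: prevlen 0, 6-bit string of length 2 */
    0xFF                     /* zlend */
};
```

With a single entry, the head and tail pointers coincide at offset 10, and the end pointer lands on the 0xFF byte at offset 14.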

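With these macros covered, the code shown at the beginning, where the ziplist was introduced as a byte array, should now be clear at a glance.

That wraps up the ziplist header. Next comes the main event, the ziplist entry, which has quite a few details; let me walk through them.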

// ziplist.c

/* We use this function to receive information about a ziplist entry.
 * Note that this is not how the data is actually encoded, is just what we
 * get filled by a function in order to operate more easily. */
typedef struct zlentry {
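    unsigned int prevrawlensize; /* bytes needed to store prevrawlen */
    unsigned int prevrawlen;     /* length of the previous entry */
    unsigned int lensize;        /* bytes needed to store len */
    unsigned int len;            /* length of the current entry's data */
    unsigned int headersize;     /* header size of the current entry (prevrawlensize + lensize), i.e. the non-data part */
    unsigned char encoding;      /* encoding type: whether the entry holds an integer or a string */
    unsigned char *p;            /* pointer to the entry, i.e. the starting address of the current element */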
} zlentry;

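The struct has seven fields, but some exist only to speed up calculations. headersize is one example: once an entry is located, skipping headersize bytes reaches the first byte of the stored value. A simplified zlentry consists of three parts laid out in order: prevrawlen, len/encoding, and the data.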

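prevrawlen is the length in bytes of the previous entry and takes either 1 or 5 bytes.

When the previous entry is shorter than 254 bytes, it takes 1 byte.
When the previous entry is 254 bytes or longer, it takes 5 bytes. The first of the 5 bytes is 0xFE (254 in decimal) and the remaining 4 bytes hold the previous entry's length. Why not 255? As mentioned for the ziplist byte array, zlend is the end marker with the value 0xFF, which is 255 in decimal; using it here would be ambiguous, so 0xFE is used instead.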
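
The 1-byte versus 5-byte rule can be sketched as a small encoder. This is an illustration, not the Redis implementation: sketch_store_prevlen is a hypothetical name, and the 4-byte length is written explicitly in little-endian order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BIG_PREVLEN 254u  /* 0xFE: marker for the 5-byte form */

/* Write the previous entry's length at p; returns the bytes used (1 or 5). */
static size_t sketch_store_prevlen(uint8_t *p, uint32_t len) {
    assert(p != NULL);
    if (len < SKETCH_BIG_PREVLEN) {
        p[0] = (uint8_t)len;              /* fits in one byte: 0..253 */
        return 1;
    }
    p[0] = 0xFE;                          /* marker byte */
    p[1] = (uint8_t)(len & 0xff);         /* 4-byte little-endian length */
    p[2] = (uint8_t)((len >> 8) & 0xff);
    p[3] = (uint8_t)((len >> 16) & 0xff);
    p[4] = (uint8_t)((len >> 24) & 0xff);
    return 5;
}
```

A length of 253 still fits in one byte, while 254 already needs the marker byte plus four length bytes.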

If the current entry starts at address p, then p-prevrawlen is the starting address of the previous entry. Iterating this way walks the ziplist from tail to head.
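
As a rough sketch of that tail-to-head walk, the code below starts at the tail offset and repeatedly subtracts the recorded previous-entry length. The function names and the sample buffer (holding "a" and "bc") are assumptions made for illustration, and the code assumes a well-formed ziplist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t sketch_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Length of the previous entry as recorded at the start of entry p. */
static uint32_t sketch_prevlen(const uint8_t *p) {
    return p[0] < 0xFE ? p[0] : sketch_u32le(p + 1);
}

/* Walk from the tail entry to the head entry, writing each entry's offset
 * from zl into offsets[]. Returns the number of entries visited (at most max).
 * Assumes zl is a well-formed ziplist. */
static size_t sketch_walk_backward(const uint8_t *zl, size_t *offsets, size_t max) {
    const uint8_t *head = zl + 10;                 /* first byte after the header */
    const uint8_t *p = zl + sketch_u32le(zl + 4);  /* zl + zltail */
    size_t n = 0;
    assert(offsets != NULL);
    if (*p == 0xFF) return 0;                      /* empty list: tail is zlend */
    while (n < max) {
        offsets[n++] = (size_t)(p - zl);
        if (p == head) break;                      /* reached the first entry */
        p -= sketch_prevlen(p);                    /* step back one entry */
    }
    return n;
}

/* Hand-built ziplist holding "a" and "bc". */
static const uint8_t sketch_two_entries[] = {
    0x12, 0x00, 0x00, 0x00,   /* zlbytes = 18 */
    0x0D, 0x00, 0x00, 0x00,   /* zltail  = 13 */
    0x02, 0x00,               /* zllen   = 2  */
    0x00, 0x01, 'a',          /* entry at offset 10: prevlen 0, length 1 */
    0x03, 0x02, 'b', 'c',     /* entry at offset 13: prevlen 3, length 2 */
    0xFF                      /* zlend */
};
```

On the sample buffer the walk visits offset 13 first and then offset 10, which is the reverse of insertion order.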

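len/encoding: len is the length of the element's data and encoding is the encoding type, that is, whether the stored value is a string or an integer. The scheme used here is where the "compressed" in compressed list really shows.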

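Strings
- 00 xxxxxx: 00 is the encoding; the length field takes 1 byte, and the remaining 6 bits hold the length in bytes;
- 01 xxxxxx xxxxxxxx: 01 is the encoding; the length field takes 2 bytes, and the remaining 14 bits hold the length in bytes;
- 10______ xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx: 10 is the encoding; the length field takes 5 bytes. The 6 bits after 10 are unused, and the following 4 bytes hold the length in bytes.
Integers: the encoding is always 1 byte and always starts with 11; the 3rd and 4th bits identify the integer type.
- 11 00 0000: int16_t;
- 11 01 0000: int32_t;
- 11 10 0000: int64_t;
- 11 11 0000: 3-byte signed integer;
- 11 11 1110: int8_t;
- 11 11 xxxx: this case is special because the encoding and the value share one byte, with xxxx holding the data itself. 0xFF collides with zlend, 0xFE with the int8_t encoding, and 0xF0 with the 3-byte signed integer, so none of the three is used. xxxx therefore ranges from 0001 to 1101, and the stored value is xxxx minus 1, giving a minimum of 0 and a maximum of 12.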

The implementation is in the macros below, where ZIP_STR_ marks the string macros and ZIP_INT_ the integer macros.

// ziplist.c

#define ZIP_STR_MASK 0xc0 /* 1100 0000 */
#define ZIP_INT_MASK 0x30 /* 0011 0000 */
#define ZIP_STR_06B (0 << 6) /* 0000 0000, string encoding */
#define ZIP_STR_14B (1 << 6) /* 0100 0000, string encoding */
#define ZIP_STR_32B (2 << 6) /* 1000 0000, string encoding */
#define ZIP_INT_16B (0xc0 | 0<<4)/* 1100 0000, integer encoding */
#define ZIP_INT_32B (0xc0 | 1<<4)/* 1101 0000, integer encoding */
#define ZIP_INT_64B (0xc0 | 2<<4)/* 1110 0000, integer encoding */
#define ZIP_INT_24B (0xc0 | 3<<4)/* 1111 0000, integer encoding */
#define ZIP_INT_8B 0xfe /* 1111 1110, integer encoding */

/* 4 bit integer immediate encoding */
#define ZIP_INT_IMM_MASK 0x0f /* 0000 1111 */
#define ZIP_INT_IMM_MIN 0xf1    /* 1111 0001 */
#define ZIP_INT_IMM_MAX 0xfd    /* 1111 1101 */
#define ZIP_INT_IMM_VAL(v) (v & ZIP_INT_IMM_MASK)
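
To see how these masks separate strings from integers and how the 4-bit immediate value is recovered, here is a small sketch. The SK_ constants mirror the macro values above; the function names are hypothetical, and the "minus 1" step reflects the rule that the low nibble runs from 0001 to 1101 for values 0 to 12.

```c
#include <assert.h>
#include <stdint.h>

#define SK_STR_MASK     0xc0  /* same value as ZIP_STR_MASK */
#define SK_INT_IMM_MASK 0x0f  /* same value as ZIP_INT_IMM_MASK */
#define SK_INT_IMM_MIN  0xf1  /* same value as ZIP_INT_IMM_MIN */
#define SK_INT_IMM_MAX  0xfd  /* same value as ZIP_INT_IMM_MAX */

/* A byte whose top two bits are 00, 01, or 10 is a string encoding. */
static int sketch_is_string(uint8_t enc) {
    return (enc & SK_STR_MASK) < SK_STR_MASK;
}

/* True when the value itself is packed into the low 4 bits of the byte. */
static int sketch_is_immediate(uint8_t enc) {
    return enc >= SK_INT_IMM_MIN && enc <= SK_INT_IMM_MAX;
}

/* Recover the stored integer (0..12) from an immediate encoding byte. */
static int sketch_immediate_value(uint8_t enc) {
    assert(sketch_is_immediate(enc));
    return (enc & SK_INT_IMM_MASK) - 1;
}
```

For instance, 0xF1 decodes to 0 and 0xFD decodes to 12, while 0xF0 and 0xFE are not immediates because they denote the 24-bit and 8-bit integer encodings.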

That covers the rules. Next, how Redis decodes a ziplist element into the zlentry struct.

// ziplist.c

/* Return a struct with all information about an entry. */
void zipEntry(unsigned char *p, zlentry *e) {

    ZIP_DECODE_PREVLEN(p, e->prevrawlensize, e->prevrawlen);
    ZIP_DECODE_LENGTH(p + e->prevrawlensize, e->encoding, e->lensize, e->len);
    e->headersize = e->prevrawlensize + e->lensize;
    e->p = p;
}

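The function body consists of two macros and two assignments. It extracts the entry's attributes from pointer p and stores them in the zlentry struct passed in by the caller.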

// ziplist.c

/* Return the length of the previous element, and the number of bytes that
 * are used in order to encode the previous element length.
 * 'ptr' must point to the prevlen prefix of an entry (that encodes the
 * length of the previous entry in order to navigate the elements backward).
 * The length of the previous entry is stored in 'prevlen', the number of
 * bytes needed to encode the previous entry length are stored in
 * 'prevlensize'. */
#define ZIP_DECODE_PREVLEN(ptr, prevlensize, prevlen) do {                     \
    ZIP_DECODE_PREVLENSIZE(ptr, prevlensize);                                  \
    if ((prevlensize) == 1) {                                                  \
        (prevlen) = (ptr)[0];                                                  \
    } else if ((prevlensize) == 5) {                                           \
        assert(sizeof((prevlen)) == 4);                                        \
        memcpy(&(prevlen), ((char*)(ptr)) + 1, 4);                             \
        memrev32ifbe(&prevlen);                                                \
    }                                                                          \
} while(0);

/* Return the number of bytes used to encode the length of the previous
 * entry. The length is returned by setting the var 'prevlensize'. */
#define ZIP_DECODE_PREVLENSIZE(ptr, prevlensize) do {                          \
    if ((ptr)[0] < ZIP_BIG_PREVLEN) {                                          \
        (prevlensize) = 1;                                                     \
    } else {                                                                   \
        (prevlensize) = 5;                                                     \
    }                                                                          \
} while(0);

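The ZIP_DECODE_PREVLEN macro stores the length of the entry preceding ptr in prevrawlen, and the number of bytes used to encode that length in prevrawlensize. For example, if the previous entry is 255 bytes long, prevrawlen is 255; since that is not less than 254 it is stored in 5 bytes, so prevrawlensize is 5. The first byte is 0xFE and the following four bytes hold the actual length, which is why the macro reads it with memcpy(&(prevlen), ((char*)(ptr)) + 1, 4).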
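
The 255-byte example can be checked with a function-style sketch of the same decoding. sketch_decode_prevlen is a hypothetical name; unlike the macro, it assembles the 4-byte length byte by byte in little-endian order instead of using memcpy plus memrev32ifbe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the prevlen field at p. Stores the previous entry's length in
 * *prevlen and returns the number of bytes the field occupies (1 or 5). */
static size_t sketch_decode_prevlen(const uint8_t *p, uint32_t *prevlen) {
    assert(p != NULL && prevlen != NULL);
    if (p[0] < 0xFE) {          /* below ZIP_BIG_PREVLEN: 1-byte form */
        *prevlen = p[0];
        return 1;
    }
    /* 5-byte form: skip the 0xFE marker, then read 4 little-endian bytes. */
    *prevlen = (uint32_t)p[1] | ((uint32_t)p[2] << 8) |
               ((uint32_t)p[3] << 16) | ((uint32_t)p[4] << 24);
    return 5;
}
```

Feeding it the bytes FE FF 00 00 00 gives a field size of 5 and a previous length of 255, matching the walkthrough above.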

// ziplist.c

/* Decode the entry encoding type and data length (string length for strings,
 * number of bytes used for the integer for integer entries) encoded in 'ptr'.
 * The 'encoding' variable will hold the entry encoding, the 'lensize'
 * variable will hold the number of bytes required to encode the entry
 * length, and the 'len' variable will hold the entry length. */
#define ZIP_DECODE_LENGTH(ptr, encoding, lensize, len) do {                    \
    ZIP_ENTRY_ENCODING((ptr), (encoding));                                     \
    if ((encoding) < ZIP_STR_MASK) {                                           \
        if ((encoding) == ZIP_STR_06B) {                                       \
            (lensize) = 1;                                                     \
            (len) = (ptr)[0] & 0x3f;                                           \
        } else if ((encoding) == ZIP_STR_14B) {                                \
            (lensize) = 2;                                                     \
            (len) = (((ptr)[0] & 0x3f) << 8) | (ptr)[1];                       \
        } else if ((encoding) == ZIP_STR_32B) {                                \
            (lensize) = 5;                                                     \
            (len) = ((ptr)[1] << 24) |                                         \
                    ((ptr)[2] << 16) |                                         \
                    ((ptr)[3] <<  8) |                                         \
                    ((ptr)[4]);                                                \
        } else {                                                               \
            panic("Invalid string encoding 0x%02X", (encoding));               \
        }                                                                      \
    } else {                                                                   \
        (lensize) = 1;                                                         \
        (len) = zipIntSize(encoding);                                          \
    }                                                                          \
} while(0);

/* Extract the encoding from the byte pointed by 'ptr' and set it into
 * 'encoding' field of the zlentry structure. */
#define ZIP_ENTRY_ENCODING(ptr, encoding) do {  \
    (encoding) = (ptr[0]); \
    if ((encoding) < ZIP_STR_MASK) (encoding) &= ZIP_STR_MASK; \
} while(0)

/* Return bytes needed to store integer encoded by 'encoding'. */
unsigned int zipIntSize(unsigned char encoding) {
    switch(encoding) {
	    case ZIP_INT_8B:  return 1;
	    case ZIP_INT_16B: return 2;
	    case ZIP_INT_24B: return 3;
	    case ZIP_INT_32B: return 4;
	    case ZIP_INT_64B: return 8;
    }
    if (encoding >= ZIP_INT_IMM_MIN && encoding <= ZIP_INT_IMM_MAX)
        return 0; /* 4 bit immediate */
    panic("Invalid integer encoding 0x%02X", encoding);
    return 0;
}

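This step is the crucial one: the ZIP_DECODE_LENGTH macro decodes everything related to encoding. As described earlier, encodings starting with 00, 01, and 10 are strings, with length fields of 1, 2, and 5 bytes respectively; encodings starting with 11 are integers, and the field is always 1 byte. In the code, encoding holds the encoding type, lensize holds the size of the length field, and len holds the length of the entry's data. To stress the len field once more: if the encoding is ZIP_STR_14B, the entry holds a string, lensize is 2 bytes, and the string's length is stored in len. If the encoding is an integer, note that when (encoding >= ZIP_INT_IMM_MIN && encoding <= ZIP_INT_IMM_MAX) holds, len is 0, because the value lives in the low four bits of encoding.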
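
The same decoding can be written as an ordinary function, which makes the lensize and len values easy to inspect. This is a sketch under assumptions: sketch_len_info and sketch_decode_length are hypothetical names, and invalid encodings are reported with a return value of -1 where the Redis code would call panic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  encoding;  /* masked to the top two bits for strings */
    uint32_t lensize;   /* bytes taken by the encoding/length field */
    uint32_t len;       /* bytes taken by the entry's data */
} sketch_len_info;

/* Decode the encoding/length field at p. Returns 0 on success, -1 if invalid. */
static int sketch_decode_length(const uint8_t *p, sketch_len_info *out) {
    uint8_t enc = p[0];
    assert(out != NULL);
    if (enc < 0xC0) enc &= 0xC0;          /* strings: keep only the 2-bit tag */
    out->encoding = enc;
    if (enc == 0x00) {                    /* 6-bit string length */
        out->lensize = 1;
        out->len = p[0] & 0x3f;
    } else if (enc == 0x40) {             /* 14-bit string length */
        out->lensize = 2;
        out->len = ((uint32_t)(p[0] & 0x3f) << 8) | p[1];
    } else if (enc == 0x80) {             /* 32-bit length, most significant byte first */
        out->lensize = 5;
        out->len = ((uint32_t)p[1] << 24) | ((uint32_t)p[2] << 16) |
                   ((uint32_t)p[3] << 8) | (uint32_t)p[4];
    } else {                              /* integer encodings: 1-byte field */
        out->lensize = 1;
        switch (enc) {
        case 0xFE: out->len = 1; break;   /* int8_t */
        case 0xC0: out->len = 2; break;   /* int16_t */
        case 0xF0: out->len = 3; break;   /* 24-bit integer */
        case 0xD0: out->len = 4; break;   /* int32_t */
        case 0xE0: out->len = 8; break;   /* int64_t */
        default:
            if (enc >= 0xF1 && enc <= 0xFD) out->len = 0;  /* 4-bit immediate */
            else return -1;
        }
    }
    return 0;
}
```

As a usage example, the two bytes 0x41 0x2C decode to a 14-bit string with lensize 2 and len 300, and the single byte 0xF5 decodes to an immediate integer with len 0.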

Finally, a word on cascade updates.

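Consider deleting the entry zlentry1 at position P. Every entry after zlentry1 is 253 bytes long, so each of them records its predecessor's length in a 1-byte prevrawlen field (prevrawlensize is 1). Once zlentry1 is deleted, zlentry2's predecessor becomes zlentry0, whose length is 512 bytes; recording that takes 5 bytes, 4 more than before (the prevrawlen it stored previously was 128 bytes, which needed a prevrawlensize of only 1). zlentry2 therefore grows to 253+4 = 257 bytes. zlentry2 is in turn the predecessor of zlentry3, so once zlentry2 has grown to 257 bytes, the prevrawlen field in zlentry3 also needs 4 more bytes, and the same applies to every entry after it. Each expansion reallocates memory, which makes the whole operation inefficient.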

Inserting an entry zlentryX of 512 bytes at position P is analyzed the same way as deletion.
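
The chain reaction can be simulated on a plain array of entry lengths, without building a real ziplist. The sketch below is a simplified model, not the Redis routine: sketch_cascade is a hypothetical function, and it assumes every entry after the change point currently uses a 1-byte prevrawlen field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* entry_len[i] is the total size of the i-th entry after the change point.
 * Every entry is assumed to currently use a 1-byte prevlen field.
 * new_prev_len is the length of the entry that now precedes entry 0.
 * Grows entries in place and returns how many had to grow. */
static size_t sketch_cascade(uint32_t new_prev_len, uint32_t *entry_len, size_t n) {
    size_t grown = 0;
    uint32_t prev = new_prev_len;
    assert(entry_len != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (prev < 254) break;   /* 1 byte still suffices: the cascade stops */
        entry_len[i] += 4;       /* prevlen field widens from 1 to 5 bytes */
        grown++;
        prev = entry_len[i];     /* the next entry must record this new length */
    }
    return grown;
}
```

With three 253-byte entries and a new 512-byte predecessor, all three entries grow to 257 bytes. If the second entry were only 100 bytes, it would still grow to 104 bytes, but the cascade would stop there because 104 fits in one byte.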

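Cascade updates do affect Redis performance, but only when certain conditions are met:

First, the ziplist must happen to contain several consecutive entries whose lengths are between 250 and 253 bytes (see the deletion example above for why 250 to 253) before a cascade update can be triggered; in practice this is uncommon.
Second, even when a cascade update occurs, the cost is negligible as long as few entries are updated: cascading through three to five entries, for instance, has no noticeable effect on performance.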

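Everything above concerns cascade updates caused by a predecessor growing. What about shrinking? Suppose the predecessor was 512 bytes long and later becomes 125: does the current entry's prevrawlen field need to shrink from 5 bytes to 1? It does not. To avoid flapping caused by repeated shrinking and growing, Redis handles only growth and never shrinks the field.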
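
One way to picture the no-shrink rule: when the field is already 5 bytes wide, a small length is simply written in the 5-byte form. The sketch below illustrates that idea with hypothetical helper names; it is a simplified model, not the Redis update routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Always write the 5-byte form: 0xFE followed by a 4-byte little-endian length. */
static void sketch_store_prevlen_large(uint8_t *p, uint32_t len) {
    p[0] = 0xFE;
    p[1] = (uint8_t)(len & 0xff);
    p[2] = (uint8_t)((len >> 8) & 0xff);
    p[3] = (uint8_t)((len >> 16) & 0xff);
    p[4] = (uint8_t)((len >> 24) & 0xff);
}

/* Update the prevlen field at p, which currently occupies cur_size bytes
 * (1 or 5). Returns the number of extra bytes the entry would need; 0 means
 * the field was rewritten in place. The field is never shrunk. */
static size_t sketch_update_prevlen(uint8_t *p, size_t cur_size, uint32_t new_len) {
    assert(cur_size == 1 || cur_size == 5);
    if (cur_size == 5) {
        sketch_store_prevlen_large(p, new_len);  /* keep 5 bytes even if small */
        return 0;
    }
    if (new_len < 254) {
        p[0] = (uint8_t)new_len;                 /* still fits in 1 byte */
        return 0;
    }
    return 4;  /* growth needed; the caller would have to resize the entry */
}
```

Shrinking the predecessor from 512 to 125 bytes leaves the field as FE 7D 00 00 00, so the current entry keeps its size and nothing after it has to move.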

Note: the Redis version used in this post is 5.0.

