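Redis Data Structures: SDS

Overview:

SDS (Simple Dynamic String)

Redis stores an SDS-backed string value in one of two forms, selected by the encoding field of the redisObject: embstr, whose constant is OBJ_ENCODING_EMBSTR, and raw, whose constant is OBJ_ENCODING_RAW: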

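Short strings (up to 44 bytes) are stored in the embstr form; once the length exceeds 44 bytes, the raw form is used. Why 44 bytes? On a 64-bit build the redisObject header takes 16 bytes and the sdshdr8 header takes 3 bytes, so 16 + 3 + 44 + 1 (the terminating NULL) = 64 bytes, which exactly fills a 64-byte allocation. embstr stores the redisObject header and the SDS contiguously, and both are obtained with a single malloc call.
For strings longer than 44 bytes the raw form is used. It needs two malloc calls, and the redisObject and the SDS are generally not contiguous in memory.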
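
The 64-byte arithmetic above can be checked with a small sketch. The two structs below are stand-ins written for this illustration (mock_robj and mock_sdshdr8 are hypothetical names, not the Redis definitions), and the result assumes a 64-bit platform with 8-byte pointers and a GCC or Clang compiler.

```c
#include <stddef.h>

/* Stand-in for the redisObject header: 4 + 4 + 24 bits of bit-fields,
 * an int reference count, and a pointer (assumed layout, 64-bit build). */
struct mock_robj {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:24;
    int refcount;
    void *ptr;
};

/* Stand-in for sdshdr8: two 1-byte counters and the flags byte, packed. */
struct __attribute__ ((__packed__)) mock_sdshdr8 {
    unsigned char len;
    unsigned char alloc;
    unsigned char flags;
    char buf[];
};

/* Bytes requested for an embstr object that holds `len` payload bytes:
 * object header + SDS header + payload + terminating '\0'. */
static size_t embstr_alloc_size(size_t len) {
    return sizeof(struct mock_robj) + sizeof(struct mock_sdshdr8) + len + 1;
}
```

With these assumptions, embstr_alloc_size(44) comes to 64, while a 45-byte payload would already need 65 bytes and no longer fit in a 64-byte allocation.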

Below is the source code of the functions that create a string object:

/* Create a string object with EMBSTR encoding if it is smaller than
 * OBJ_ENCODING_EMBSTR_SIZE_LIMIT, otherwise the RAW encoding is
 * used.
 *
 * The current limit of 44 is chosen so that the biggest string object
 * we allocate as EMBSTR will still fit into the 64 byte arena of jemalloc. */
#define OBJ_ENCODING_EMBSTR_SIZE_LIMIT 44
robj *createStringObject(const char *ptr, size_t len) {
    if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT)
        return createEmbeddedStringObject(ptr,len);
    else
        return createRawStringObject(ptr,len);
}




/* Create a string object with encoding OBJ_ENCODING_RAW, that is a plain
 * string object where o->ptr points to a proper sds string. */
robj *createRawStringObject(const char *ptr, size_t len) {
    return createObject(OBJ_STRING, sdsnewlen(ptr,len));
}




/* Create a string object with encoding OBJ_ENCODING_EMBSTR, that is
 * an object where the sds string is actually an unmodifiable string
 * allocated in the same chunk as the object itself. */
robj *createEmbeddedStringObject(const char *ptr, size_t len) {
    robj *o = zmalloc(sizeof(robj)+sizeof(struct sdshdr8)+len+1);
    struct sdshdr8 *sh = (void*)(o+1);

    o->type = OBJ_STRING;
    o->encoding = OBJ_ENCODING_EMBSTR;
    o->ptr = sh+1;
    o->refcount = 1;
    if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
        o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL;
    } else {
        o->lru = LRU_CLOCK();
    }

    sh->len = len;
    sh->alloc = len;
    sh->flags = SDS_TYPE_8;
    if (ptr == SDS_NOINIT)
        sh->buf[len] = '\0';
    else if (ptr) {
        memcpy(sh->buf,ptr,len);
        sh->buf[len] = '\0';
    } else {
        memset(sh->buf,0,len+1);
    }
    return o;
}


/* ===================== Creation and parsing of objects ==================== */
robj *createObject(int type, void *ptr) {
    robj *o = zmalloc(sizeof(*o));
    o->type = type;
    o->encoding = OBJ_ENCODING_RAW;
    o->ptr = ptr;
    o->refcount = 1;

    /* Set the LRU to the current lruclock (minutes resolution), or
     * alternatively the LFU counter. */
    if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
        o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL;
    } else {
        o->lru = LRU_CLOCK();
    }
    return o;
}

Definition of SDS

/* Note: sdshdr5 is never used, we just access the flags byte directly.
 * However is here to document the layout of type 5 SDS strings. */
struct __attribute__ ((__packed__)) sdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

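The purpose of each struct member:

len: the length of the string currently in use
alloc: the allocated size, excluding the header and the terminating \0
flags: flag byte; the 3 least significant bits store the SDS type, and the remaining 5 bits are unused (in the 8, 16, 32, and 64 variants)
buf: byte array that holds the string; it ends with a terminating \0, following the C string convention, so that some C string functions can be used on it

As we can see, Redis defines a different sdshdr struct for each range of string lengths. The goal is to minimize the memory taken by the len and alloc fields, which are unsigned integers of the smallest width that fits.

The struct __attribute__ ((__packed__)) syntax disables alignment padding, which also reduces memory usage.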
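
To make the effect of packing concrete, here is a minimal sketch that declares the same fields twice, once packed and once with default alignment. The struct names are made up for this example, and the sizes mentioned afterwards assume a typical 64-bit ABI with GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Same fields as the 64-bit header, with padding disabled. */
struct __attribute__ ((__packed__)) packed_hdr64 {
    uint64_t len;
    uint64_t alloc;
    unsigned char flags;
    char buf[];
};

/* Same fields with the compiler's default alignment rules. */
struct padded_hdr64 {
    uint64_t len;
    uint64_t alloc;
    unsigned char flags;
    char buf[];
};

/* Number of bytes the packed header saves over the padded one. */
static size_t hdr64_padding_saved(void) {
    return sizeof(struct padded_hdr64) - sizeof(struct packed_hdr64);
}
```

On such a platform the packed header is 17 bytes and the padded one 24, so packing saves 7 bytes per string. Packing also keeps sizeof of the header equal to the offset of buf, which matters for code that steps back from the buffer pointer by sizeof of the header to find it.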

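The flags byte

The SDS type of a string (8, 16, 32, 64) is determined by ANDing the flags field with 7 (SDS_TYPE_MASK). For example, here is the function that returns the length of a string: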

#define SDS_TYPE_5  0
#define SDS_TYPE_8  1
#define SDS_TYPE_16 2
#define SDS_TYPE_32 3
#define SDS_TYPE_64 4
#define SDS_TYPE_MASK 7
#define SDS_TYPE_BITS 3
#define SDS_HDR_VAR(T,s) struct sdshdr##T *sh = (void*)((s)-(sizeof(struct sdshdr##T)));
#define SDS_HDR(T,s) ((struct sdshdr##T *)((s)-(sizeof(struct sdshdr##T))))
#define SDS_TYPE_5_LEN(f) ((f)>>SDS_TYPE_BITS)

static inline size_t sdslen(const sds s) {
    unsigned char flags = s[-1];
    switch(flags&SDS_TYPE_MASK) {
        case SDS_TYPE_5:
            return SDS_TYPE_5_LEN(flags);
        case SDS_TYPE_8:
            return SDS_HDR(8,s)->len;
        case SDS_TYPE_16:
            return SDS_HDR(16,s)->len;
        case SDS_TYPE_32:
            return SDS_HDR(32,s)->len;
        case SDS_TYPE_64:
            return SDS_HDR(64,s)->len;
    }
    return 0;
}
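
The lookup through s[-1] works because the caller only ever holds a pointer to buf, and the flags byte sits immediately before it. The following is a simplified sketch of that idea with a single header type; demo_hdr8, demo_new, demo_len, and demo_free are hypothetical names for this example and are not part of the Redis API.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_TYPE_8    1
#define DEMO_TYPE_MASK 7

struct __attribute__ ((__packed__)) demo_hdr8 {
    uint8_t len;
    uint8_t alloc;
    unsigned char flags;
    char buf[];
};

/* Allocate header and payload in one block and return a pointer to the
 * payload. Returns NULL if len does not fit in 8 bits or malloc fails. */
static char *demo_new(const char *init, size_t len) {
    if (len > UINT8_MAX) return NULL;
    struct demo_hdr8 *sh = malloc(sizeof(*sh) + len + 1);
    if (sh == NULL) return NULL;
    sh->len = (uint8_t)len;
    sh->alloc = (uint8_t)len;
    sh->flags = DEMO_TYPE_8;
    memcpy(sh->buf, init, len);
    sh->buf[len] = '\0';
    return sh->buf;
}

/* Recover the length from the payload pointer alone: read the flags byte
 * just before the buffer, then step back over the whole header. */
static size_t demo_len(const char *s) {
    unsigned char flags = (unsigned char)s[-1];
    if ((flags & DEMO_TYPE_MASK) != DEMO_TYPE_8) return 0;
    const struct demo_hdr8 *sh =
        (const void *)(s - sizeof(struct demo_hdr8));
    return sh->len;
}

/* Free the block by stepping back to the start of the header. */
static void demo_free(char *s) {
    if (s != NULL) free(s - sizeof(struct demo_hdr8));
}
```

Because the returned pointer addresses a \0-terminated buffer, it can still be passed to functions such as strcmp or printf, while demo_len reads the stored length without scanning.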

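Differences between SDS and C strings

1. Constant-time string length

In C, getting the length of a string means walking the whole string and counting each character until \0 is reached, which is an O(n) operation.
SDS keeps the length in an extra len field, so getting the length is O(1). This is a classic trade of space for time.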

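2. Preventing buffer overflow

In C, if we call strcat on a string whose existing buffer is too small to hold the result, the data overflows into the adjacent memory. For example, suppose string s1 holds redis and string s2 holds golang and sits right after s1 in memory. If we call strcat(s1, "abc") without first reallocating s1, the appended bytes spill past the end of s1 and overwrite the beginning of s2, corrupting it.
SDS avoids buffer overflow through its allocation strategy: before an SDS is modified, it first checks whether the available space covers the required length, grows the buffer automatically if it does not, and only then performs the modification.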
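
The check-then-grow step can be sketched as follows. This is a simplified illustration with a hypothetical dynstr type that grows to the exact size needed; it is not the SDS implementation, which keeps its header in front of the buffer and applies a preallocation policy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-tracking string used only for this example. */
struct dynstr {
    size_t len;    /* bytes in use, excluding '\0' */
    size_t alloc;  /* capacity, excluding '\0' */
    char *buf;
};

/* Initialize from a C string. Returns 0 on success, -1 on failure. */
static int dynstr_init(struct dynstr *d, const char *init) {
    size_t n = strlen(init);
    d->buf = malloc(n + 1);
    if (d->buf == NULL) return -1;
    memcpy(d->buf, init, n + 1);
    d->len = n;
    d->alloc = n;
    return 0;
}

/* Append t, growing the buffer first when the free space is too small.
 * Returns 0 on success, -1 if the reallocation fails (d is unchanged). */
static int dynstr_cat(struct dynstr *d, const char *t) {
    size_t add = strlen(t);
    if (d->alloc - d->len < add) {
        size_t newalloc = d->len + add;
        char *p = realloc(d->buf, newalloc + 1);
        if (p == NULL) return -1;
        d->buf = p;
        d->alloc = newalloc;
    }
    memcpy(d->buf + d->len, t, add + 1);  /* copies the '\0' as well */
    d->len += add;
    return 0;
}
```

The caller never has to remember to reallocate: the capacity check lives inside the append function, so a write past the end of the buffer cannot happen through this interface.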

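3. Space preallocation

In C, appending to a string means computing the space needed, allocating memory, and then modifying the data. Memory allocation can involve a system call: a switch from user mode to kernel mode, a complex allocation algorithm, and a switch back to user mode. Doing this on every modification is expensive.
When an SDS is modified, it first computes the length the SDS will have after the change. If that length is less than 1 MB, say 13 bytes, it allocates 13 bytes + 13 bytes + 1 byte (\0). If that length is 1 MB or more, say 3 MB, it allocates 3 MB + 1 MB + 1 byte (\0). The extra free space lets later appends complete without another allocation.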
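
The sizing rule described above fits in a few lines. The sketch below uses hypothetical names (DEMO_MAX_PREALLOC, demo_prealloc) and returns only the payload capacity; the header and the terminating \0 would be added on top of it.

```c
#include <stddef.h>

/* 1 MB threshold, as described in the text (hypothetical constant name). */
#define DEMO_MAX_PREALLOC (1024 * 1024)

/* Given the length the string will have after a modification, return the
 * payload capacity to allocate: double it below the threshold, otherwise
 * add a fixed 1 MB of headroom. */
static size_t demo_prealloc(size_t newlen) {
    if (newlen < DEMO_MAX_PREALLOC)
        return newlen * 2;
    return newlen + DEMO_MAX_PREALLOC;
}
```

For a 13-byte result this yields a capacity of 26 bytes, and for a 3 MB result it yields 4 MB, matching the two cases in the text. Capping the headroom at 1 MB keeps large strings from wasting memory proportional to their size.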

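4. Binary safety

C treats \0 as the end of a string by default, so binary data that contains a \0 byte is truncated at that point and the data becomes incomplete.
SDS uses len to determine the string length, so it is binary safe.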
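
A short sketch shows the difference between the two ways of measuring the same bytes. The counted type and its helper functions are hypothetical names for this example; the only standard calls used are strlen and memcmp.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of a byte sequence with an explicit length. */
struct counted {
    size_t len;
    const char *data;
};

/* Wrap a buffer together with its real length. */
static struct counted counted_view(const char *data, size_t len) {
    struct counted v = { len, data };
    return v;
}

/* Compare two views byte for byte, embedded '\0' bytes included. */
static int counted_equal(struct counted a, struct counted b) {
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}
```

For the five bytes 'a', 'b', '\0', 'c', 'd', strlen stops at the embedded \0 and reports 2, while the stored len still says 5 and counted_equal compares all five bytes.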
