Redis Source Code Analysis: robj (redisObject)

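Earlier posts already covered some of Redis's data structures. The dict post in particular noted that Redis can be viewed as one big hash table holding key-value pairs. Today we look at redisObject (robj from here on), the main structure used to store the value side of those pairs.
See object.c for the full robj code (the struct itself is declared in server.h).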

Field details

Compared with the other data structures, robj is fairly simple: it has only a handful of fields, and each has a clear meaning.

typedef struct redisObject {
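    unsigned type:4;       // data type: string, list, set, zset, hash, ...
    unsigned encoding:4;   // how the value is physically encoded
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time).
                            * Redis keeps the LRU/LFU info in these 24 bits: under
                            * LRU it holds the last access time (second resolution);
                            * under LFU it holds a 16-bit minute-resolution timestamp
                            * plus an 8-bit approximate access counter. */
    int refcount;          // reference count
    void *ptr;             // points to the stored value; interpreted via type and encoding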
} robj;

There are just five fields. Let's go through them one by one.

type (4 bits)

type identifies the kind of data stored in the current robj. Redis currently defines the following types.

Identifier	Value	Meaning
OBJ_STRING	0	String
OBJ_LIST	1	List
OBJ_SET	2	Set
OBJ_ZSET	3	Sorted set (zset)
OBJ_HASH	4	Hash
OBJ_MODULE	5	Module
OBJ_STREAM	6	Stream

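encoding (4 bits)

The encoding of the value. If every type had only one representation, type and encoding would be redundant and one field would do. But to save as much memory as possible, Redis uses different encodings for the same type depending on the situation, so an extra field is needed to record which one is in use. The encodings currently defined (Redis 6.2) are:

Identifier	Value	Meaning
OBJ_ENCODING_RAW	0	Raw representation (plain sds); used only by strings
OBJ_ENCODING_INT	1	Integer
OBJ_ENCODING_HT	2	Hash table (dict)
OBJ_ENCODING_ZIPMAP	3	zipmap; no longer used
OBJ_ENCODING_LINKEDLIST	4	Old linked list; no longer used
OBJ_ENCODING_ZIPLIST	5	ziplist
OBJ_ENCODING_INTSET	6	intset
OBJ_ENCODING_SKIPLIST	7	Skip list (skiplist)
OBJ_ENCODING_EMBSTR	8	Embedded sds string
OBJ_ENCODING_QUICKLIST	9	quicklist
OBJ_ENCODING_STREAM	10	Stream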

OBJ_ENCODING_EMBSTR deserves a closer look.

robj *createEmbeddedStringObject(const char *ptr, size_t len) {
    robj *o = zmalloc(sizeof(robj)+sizeof(struct sdshdr8)+len+1);
    struct sdshdr8 *sh = (void*)(o+1);

    o->type = OBJ_STRING;
    o->encoding = OBJ_ENCODING_EMBSTR;
    o->ptr = sh+1;
    o->refcount = 1;
    if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
        o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL;
    } else {
        o->lru = LRU_CLOCK();
    }

    sh->len = len;
    sh->alloc = len;
    sh->flags = SDS_TYPE_8;
    if (ptr == SDS_NOINIT)
        sh->buf[len] = '\0';
    else if (ptr) {
        memcpy(sh->buf,ptr,len);
        sh->buf[len] = '\0';
    } else {
        memset(sh->buf,0,len+1);
    }
    return o;
}

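As the code shows, this encoding fuses robj and sds: the sds is placed directly after the robj inside a single allocation. The string is limited to 44 bytes. The robj takes 16 bytes, the sdshdr8 header 3 bytes and the trailing '\0' 1 byte, so capping the string at 44 bytes guarantees that everything fits in 64 bytes (16 + 3 + 1 + 44 == 64).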
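The 64-byte arithmetic can be checked with a small sketch. The toy_robj and toy_sdshdr8 types below are hypothetical stand-ins that mirror the layouts quoted in this post; they are not the real Redis definitions. The sketch assumes a 64-bit platform (8-byte pointers) and a GCC or Clang compiler for the packed attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the robj layout: 4 + 4 + 24 bits share one
 * unsigned, followed by an int and a pointer. */
typedef struct toy_robj {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:24;
    int refcount;
    void *ptr;
} toy_robj;

/* Hypothetical mirror of the smallest sds header. Packed so that the
 * header is exactly 3 bytes with no padding before buf. */
struct __attribute__((__packed__)) toy_sdshdr8 {
    uint8_t len;          /* bytes in use */
    uint8_t alloc;        /* bytes allocated, excluding header and '\0' */
    unsigned char flags;  /* sds type tag */
    char buf[];           /* string bytes follow the header */
};

/* Bytes requested for an embedded string with a payload of len bytes:
 * object header + sds header + payload + trailing '\0'. */
size_t toy_embstr_alloc_size(size_t len) {
    return sizeof(toy_robj) + sizeof(struct toy_sdshdr8) + len + 1;
}

/* Largest payload that still fits in a chunk of chunk_size bytes. */
size_t toy_embstr_max_len(size_t chunk_size) {
    size_t overhead = sizeof(toy_robj) + sizeof(struct toy_sdshdr8) + 1;
    assert(chunk_size >= overhead);
    return chunk_size - overhead;
}
```

Under those assumptions, toy_embstr_max_len(64) reproduces the 44-byte limit. On a 32-bit build the pointer shrinks and the numbers change, which is why the sketch states its platform assumption up front.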

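lru (24 bits)

Redis can evict data automatically once memory usage reaches the configured maxmemory limit. How does it tell which objects are cold, and by what policy does it pick victims? Both answers involve the lru field. Redis gives this field 24 bits, but don't assume from the name that only the LRU policies use it: LFU uses the same field. My guess is that the author implemented LRU first and named the field accordingly, then simply reused it when LFU was added later.
The field means different things under different eviction policies. Under LRU it holds the low 24 bits of a second-resolution Unix timestamp, recording when the object was last accessed. Under LFU the 24 bits are split in two: a 16-bit minute-resolution timestamp and an 8-bit counter that approximates access frequency. I won't go into more detail here; a later post will cover it.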
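The two layouts of the 24-bit field can be sketched as plain bit operations. Every toy_ name below is hypothetical and chosen for illustration; the packing follows the `(minutes << 8) | counter` expression in createEmbeddedStringObject above, and the wrap-around handling in the idle-time helper is an assumption about how a 24-bit clock would reasonably be compared.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_LRU_BITS 24
#define TOY_LRU_MAX ((1u << TOY_LRU_BITS) - 1)

/* LRU mode: keep the low 24 bits of a second-resolution clock. */
uint32_t toy_lru_from_seconds(uint64_t unix_seconds) {
    return (uint32_t)(unix_seconds & TOY_LRU_MAX);
}

/* Seconds elapsed between two 24-bit clock values, allowing the clock
 * to have wrapped around once. */
uint32_t toy_lru_idle_seconds(uint32_t now, uint32_t last) {
    return (now >= last) ? now - last : (TOY_LRU_MAX - last) + now;
}

/* LFU mode: high 16 bits hold a minute-resolution time,
 * low 8 bits hold the access counter. */
uint32_t toy_lfu_pack(uint64_t unix_seconds, uint8_t counter) {
    uint32_t minutes = (uint32_t)((unix_seconds / 60) & 0xFFFF);
    uint32_t packed = (minutes << 8) | counter;
    assert(packed <= TOY_LRU_MAX);  /* must fit in the 24-bit field */
    return packed;
}

uint32_t toy_lfu_minutes(uint32_t lru) { return (lru >> 8) & 0xFFFF; }
uint8_t toy_lfu_counter(uint32_t lru) { return (uint8_t)(lru & 0xFF); }
```

Because only 24 bits (or 16 bits of minutes) are kept, both clocks wrap around, so any code comparing them has to tolerate a stored value that is numerically larger than the current one.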

refcount

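The reference count records how many places currently reference this robj, and it is what makes object sharing possible. Readers familiar with garbage collection will know that counting references is one reclamation strategy: when the count drops to 0, the object is no longer in use and can be freed. Redis implements this strategy.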

*ptr

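This one is simple: the previous fields carry metadata about the current robj, and this field holds the address where the data actually lives. (As the encoding code below shows, for OBJ_ENCODING_INT the integer itself is stored in the pointer.)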

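Encoding and decoding robj

Redis goes to great lengths to save memory. Here it applies special encodings to string robj values for that purpose. The encoding code, with comments, is as follows:

/* Apply a special encoding to a string robj to save space */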
robj *tryObjectEncoding(robj *o) {
    long value;
    sds s = o->ptr;
    size_t len;

    /* Make sure this is a string object, the only type we encode
     * in this function. Other types use encoded memory efficient
     * representations but are handled by the commands implementing
     * the type. */
    serverAssertWithInfo(NULL,o,o->type == OBJ_STRING);

    /* We try some specialized encoding only for objects that are
     * RAW or EMBSTR encoded, in other words objects that are still
     * represented by an actual array of chars.
     * Anything that is not an sds-backed string is returned unchanged. */
    if (!sdsEncodedObject(o)) return o;

    /* It's not safe to encode shared objects: shared objects can be shared
     * everywhere in the "object space" of Redis and may end in places where
     * they are not handled. We handle them only as values in the keyspace. */
     if (o->refcount > 1) return o;

    /* Check if we can represent this string as a long integer.
     * Note that we are sure that a string larger than 20 chars is not
     * representable as a 32 nor 64 bit integer. */
    len = sdslen(s);
    if (len <= 20 && string2l(s,len,&value)) {
        /* This object is encodable as a long. Try to use a shared object.
         * Note that we avoid using shared integers when maxmemory is used
         * because every object needs to have a private LRU field for the LRU
         * algorithm to work well. 
         * In short: when the value is in [0, OBJ_SHARED_INTEGERS) (10000) and
         * no such maxmemory policy is in effect, return the preallocated
         * shared object for that integer instead of keeping a private one. */
        if ((server.maxmemory == 0 ||
            !(server.maxmemory_policy & MAXMEMORY_FLAG_NO_SHARED_INTEGERS)) &&
            value >= 0 &&
            value < OBJ_SHARED_INTEGERS)
        {
            decrRefCount(o);
            incrRefCount(shared.integers[value]);
            return shared.integers[value];
        } else {
            /* Otherwise a RAW object is converted in place to OBJ_ENCODING_INT
             * and the long is stored directly in ptr */
            if (o->encoding == OBJ_ENCODING_RAW) {
                sdsfree(o->ptr);
                o->encoding = OBJ_ENCODING_INT;
                o->ptr = (void*) value;
                return o;
            /* An EMBSTR object is replaced by a new integer-encoded object
             * that holds the long */
            } else if (o->encoding == OBJ_ENCODING_EMBSTR) {
                decrRefCount(o);
                return createStringObjectFromLongLongForValue(value);
            }
        }
    }
    // Strings that cannot be converted to a long are handled below

    /* If the string is small and is still RAW encoded,
     * try the EMBSTR encoding which is more efficient.
     * In this representation the object and the SDS string are allocated
     * in the same chunk of memory to save space and cache misses. 
     * A string of at most 44 bytes is converted to OBJ_ENCODING_EMBSTR. */
    if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT) {
        robj *emb;

        if (o->encoding == OBJ_ENCODING_EMBSTR) return o;
        emb = createEmbeddedStringObject(s,sdslen(s));
        decrRefCount(o);
        return emb;
    }

    /* We can't encode the object...
     *
     * Do the last try, and at least optimize the SDS string inside
     * the string object to require little space, in case there
     * is more than 10% of free space at the end of the SDS string.
     *
     * We do that only for relatively large strings as this branch
     * is only entered if the length of the string is greater than
     * OBJ_ENCODING_EMBSTR_SIZE_LIMIT. */
    trimStringObjectIfNeeded(o);

    /* Return the original object. */
    return o;
}

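In summary, the function proceeds as follows:
Assert that the object is a string; if it is not sds-backed (RAW or EMBSTR), return it unchanged.
If the object is shared (refcount > 1), leave it unencoded.
If the string is at most 20 characters long and parses as a long, encode it as an integer. Values in [0, 10000) use preallocated shared objects, unless the maxmemory policy requires every object to have its own lru field.
Otherwise, if the string is at most 44 bytes long, store it as OBJ_ENCODING_EMBSTR.
Otherwise (longer than 44 bytes), release the unused tail of the sds when more than 10% of it is free, to save memory.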
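The decision order above can be condensed into a toy classifier. This is only a sketch: toy_parse_long is a hypothetical strict parser standing in for string2l (assumed here to reject leading zeros, spaces and a leading '+'), and toy_is_shareable collapses the maxmemory check into a single hypothetical flag. It does not touch real Redis objects.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef enum { TOY_ENC_INT, TOY_ENC_EMBSTR, TOY_ENC_RAW } toy_encoding;

#define TOY_EMBSTR_LIMIT 44
#define TOY_SHARED_INTEGERS 10000

/* Strict decimal parse: optional '-', digits only, no leading zeros,
 * and the value must fit in a long. Returns 1 on success. */
int toy_parse_long(const char *s, size_t len, long *out) {
    size_t i = 0;
    int negative = 0;
    unsigned long v = 0, limit;

    if (len == 0) return 0;
    if (s[0] == '-') {
        if (len == 1) return 0;
        negative = 1;
        i = 1;
    }
    if (s[i] == '0') {            /* only the exact string "0" is allowed */
        if (len != 1) return 0;
        *out = 0;
        return 1;
    }
    limit = negative ? (unsigned long)LONG_MAX + 1 : (unsigned long)LONG_MAX;
    for (; i < len; i++) {
        unsigned long d;
        if (s[i] < '0' || s[i] > '9') return 0;
        d = (unsigned long)(s[i] - '0');
        if (v > (limit - d) / 10) return 0;   /* next step would overflow */
        v = v * 10 + d;
    }
    if (negative)
        *out = (v == (unsigned long)LONG_MAX + 1) ? LONG_MIN : -(long)v;
    else
        *out = (long)v;
    return 1;
}

/* Which encoding the steps above would pick for an unshared string. */
toy_encoding toy_choose_encoding(const char *s, size_t len) {
    long value;
    assert(s != NULL);
    if (len <= 20 && toy_parse_long(s, len, &value)) return TOY_ENC_INT;
    if (len <= TOY_EMBSTR_LIMIT) return TOY_ENC_EMBSTR;
    return TOY_ENC_RAW;
}

/* Whether an integer may use a preallocated shared object.
 * needs_private_lru is a hypothetical flag for "the maxmemory policy
 * requires a per-object lru field". */
int toy_is_shareable(long value, int needs_private_lru) {
    return !needs_private_lru && value >= 0 && value < TOY_SHARED_INTEGERS;
}
```

Note that a numeric-looking string such as "007" is not an integer under the strict parse, so in this sketch it falls through to the embedded-string case; that is what keeps the decoded string identical to what was stored.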

Where there is encoding there is decoding. The code is below and is fairly simple:

/* Get a decoded version of an encoded object (returned as a new object).
 * If the object is already raw-encoded just increment the ref count. */
robj *getDecodedObject(robj *o) {
    robj *dec;

    if (sdsEncodedObject(o)) {
        incrRefCount(o);
        return o;
    }
    if (o->type == OBJ_STRING && o->encoding == OBJ_ENCODING_INT) {
        char buf[32];

        ll2string(buf,32,(long)o->ptr);
        dec = createStringObject(buf,strlen(buf));
        return dec;
    } else {
        serverPanic("Unknown encoding type");
    }
}

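Reference counting and automatic cleanup

As mentioned above, Redis shares some objects to save space, and objects with no remaining references are cleaned up automatically. The author implements this with reference counting. The code is again fairly simple: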

void incrRefCount(robj *o) {
    if (o->refcount < OBJ_FIRST_SPECIAL_REFCOUNT) {
        o->refcount++;
    } else {
        if (o->refcount == OBJ_SHARED_REFCOUNT) {
            /* Nothing to do: this refcount is immutable. */
        } else if (o->refcount == OBJ_STATIC_REFCOUNT) {
            serverPanic("You tried to retain an object allocated in the stack");
        }
    }
}
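/* Decrement the reference count; free the object when no references remain */
void decrRefCount(robj *o) {
    // last reference: free the value, then the object itself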
    if (o->refcount == 1) {
        switch(o->type) {
        case OBJ_STRING: freeStringObject(o); break;
        case OBJ_LIST: freeListObject(o); break;
        case OBJ_SET: freeSetObject(o); break;
        case OBJ_ZSET: freeZsetObject(o); break;
        case OBJ_HASH: freeHashObject(o); break;
        case OBJ_MODULE: freeModuleObject(o); break;
        case OBJ_STREAM: freeStreamObject(o); break;
        default: serverPanic("Unknown object type"); break;
        }
        zfree(o);
    } else {
        if (o->refcount <= 0) serverPanic("decrRefCount against refcount <= 0");
        if (o->refcount != OBJ_SHARED_REFCOUNT) o->refcount--;
    }
}
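The retain/release pattern above, including the "shared" sentinel that is never counted or freed, can be tried out in isolation. The sketch below is a hypothetical toy: toy_obj, its freed_flag hook and TOY_SHARED_REFCOUNT are illustration names, and using INT_MAX as the sentinel is an assumption made for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define TOY_SHARED_REFCOUNT INT_MAX  /* sentinel: never counted, never freed */

typedef struct toy_obj {
    int refcount;
    int *freed_flag;  /* hypothetical hook so a caller can observe the free */
} toy_obj;

/* New objects start with one reference, owned by the creator. */
toy_obj *toy_create(int *freed_flag) {
    toy_obj *o = malloc(sizeof(*o));
    if (o == NULL) return NULL;
    o->refcount = 1;
    o->freed_flag = freed_flag;
    return o;
}

/* Turn a freshly created object into an immortal shared one. */
void toy_make_shared(toy_obj *o) {
    assert(o->refcount == 1);
    o->refcount = TOY_SHARED_REFCOUNT;
}

/* Retain: shared objects are left untouched. */
void toy_incr(toy_obj *o) {
    if (o->refcount != TOY_SHARED_REFCOUNT) o->refcount++;
}

/* Release: the last reference frees the object; shared objects are
 * left untouched. */
void toy_decr(toy_obj *o) {
    assert(o->refcount > 0);
    if (o->refcount == TOY_SHARED_REFCOUNT) return;
    if (o->refcount == 1) {
        if (o->freed_flag) *o->freed_flag = 1;
        free(o);
    } else {
        o->refcount--;
    }
}
```

Skipping the counter update for shared objects means any number of holders can retain and release them without the object ever being freed, which is the property the shared integers rely on.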

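Summary

To summarize, robj serves the following purposes.

It provides a uniform wrapper for values of every type.
It stores the information needed for data eviction.
It enables object sharing and automatic reclamation through reference counting.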

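This post is part of my Redis source code analysis series. There is also a companion copy of the Redis source annotated in Chinese; if you want to study Redis in depth, stars and follows are welcome.
Chinese-annotated Redis repository: https://github.com/xindoo/Redis
Redis source code analysis series: https://zxs.io/s/1h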
This post comes from https://blog.csdn.net/xindoo
