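The Redis RDB Persistence Mechanism

1. Introduction to RDB

Redis is an in-memory database: all data lives in memory, so the database state is lost as soon as the server process exits. To address this, Redis provides two persistence mechanisms, RDB and AOF. This article analyzes how RDB persistence works.

RDB persistence saves a point-in-time snapshot of the current process's data to disk, so that data is not lost unexpectedly.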

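1.1 How RDB is triggered

RDB persistence can be triggered manually or automatically.

Two commands trigger it manually:
- SAVE: blocks the Redis server until the RDB process has completed.
- BGSAVE: the Redis process calls fork() to create a child process, which performs the RDB persistence in the background. (This is the mainstream approach.)

Automatic triggering is configured with save points:

# at least 1 modification to the database within 900 seconds
save 900 1
# at least 10 modifications to the database within 300 seconds
save 300 10
# at least 1000 modifications to the database within 60 seconds
save 60 1000
# If any one of the three conditions above is met, BGSAVE is triggered automatically.
# The save points can also be set at runtime with the CONFIG SET command.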
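The save point check can be sketched as a small pure function. This is only an illustration modeled on the periodic check the server performs, not the Redis code itself: the struct save_point type and the function name should_trigger_bgsave are hypothetical, and the real check takes additional conditions into account.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical mirror of one "save <seconds> <changes>" line. */
struct save_point {
    time_t seconds;     /* length of the time window */
    long long changes;  /* minimum number of modifications */
};

/* Returns 1 if at least one save point is satisfied, 0 otherwise.
 * dirty:    modifications since the last successful save
 * lastsave: Unix time of the last successful save
 * now:      current Unix time */
int should_trigger_bgsave(const struct save_point *sp, size_t n,
                          long long dirty, time_t lastsave, time_t now) {
    assert(sp != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Enough changes, and the time window has fully elapsed. */
        if (dirty >= sp[i].changes && now - lastsave > sp[i].seconds)
            return 1;
    }
    return 0;
}
```

With the three save points above, a single modification triggers a save only once more than 900 seconds have passed, while 1000 modifications trigger one after just over a minute.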

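1.2 The RDB persistence flow

The flow triggered by the BGSAVE command is shown in the figure below:

The source code of the command is as follows (see Redis 3.2 RDB source annotations):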

/* BGSAVE [SCHEDULE] */
// Implementation of the BGSAVE command
void bgsaveCommand(client *c) {
    int schedule = 0;   // SCHEDULE defers the BGSAVE so it does not collide with an AOF rewrite child

    /* The SCHEDULE option changes the behavior of BGSAVE when an AOF rewrite
     * is in progress. Instead of returning an error a BGSAVE gets scheduled. */
    if (c->argc > 1) {
        // Set the schedule flag
        if (c->argc == 2 && !strcasecmp(c->argv[1]->ptr,"schedule")) {
            schedule = 1;
        } else {
            addReply(c,shared.syntaxerr);
            return;
        }
    }

    // An RDB child is already running: reply with an error
    if (server.rdb_child_pid != -1) {
        addReplyError(c,"Background save already in progress");

    // An AOF rewrite is in progress: the BGSAVE can only be scheduled for later
    } else if (server.aof_child_pid != -1) {
        // With SCHEDULE, set rdb_bgsave_scheduled to 1 so the BGSAVE runs when possible
        if (schedule) {
            server.rdb_bgsave_scheduled = 1;
            addReplyStatus(c,"Background saving scheduled");
        } else {    // Without SCHEDULE the BGSAVE cannot run right now
            addReplyError(c,
                "An AOF log rewriting in progress: can't BGSAVE right now. "
                "Use BGSAVE SCHEDULE in order to schedule a BGSAVE whenver "
                "possible.");
        }

    // Start the BGSAVE
    } else if (rdbSaveBackground(server.rdb_filename) == C_OK) {
        addReplyStatus(c,"Background saving started");
    } else {
        addReply(c,shared.err);
    }
}

We will look closely at how rdbSaveBackground() works later on.
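The decision logic of bgsaveCommand() can be condensed into a side-effect-free function, which may help when reading the branches above. This is a sketch for illustration only: the enum bgsave_action and the function decide_bgsave are hypothetical names, and the replies and the actual fork are left out.

```c
#include <assert.h>

/* Hypothetical outcome codes, one per branch of bgsaveCommand(). */
enum bgsave_action {
    BGSAVE_REJECT_ALREADY_RUNNING, /* an RDB child already exists */
    BGSAVE_SCHEDULED,              /* AOF rewrite running, SCHEDULE given */
    BGSAVE_REJECT_AOF_REWRITE,     /* AOF rewrite running, no SCHEDULE */
    BGSAVE_START                   /* nothing in the way: start now */
};

/* A pid of -1 means "no such child", as in the Redis source. */
enum bgsave_action decide_bgsave(int rdb_child_pid, int aof_child_pid,
                                 int schedule) {
    assert(schedule == 0 || schedule == 1);
    if (rdb_child_pid != -1) return BGSAVE_REJECT_ALREADY_RUNNING;
    if (aof_child_pid != -1)
        return schedule ? BGSAVE_SCHEDULED : BGSAVE_REJECT_AOF_REWRITE;
    return BGSAVE_START;
}
```

Note that a running RDB child takes precedence over everything else: even BGSAVE SCHEDULE is rejected in that case.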

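1.3 Advantages and disadvantages of RDB

Advantages of RDB:

An RDB file is a compact, compressed binary file that represents a snapshot of the Redis data at a point in time. It is well suited to backups, full replication, and similar scenarios.
Redis restores data from an RDB file much faster than from an AOF.

Disadvantages of RDB:

RDB cannot provide real-time or per-second persistence. Every BGSAVE has to call fork() to create a child process, which is a heavyweight operation that is too costly to run frequently: although Linux shares pages for reading and copies them only on write (copy-on-write), the fork still has to copy a large amount of the parent's state, such as its page tables, signal handling tables, and register state.
RDB files use a specific binary format, and several RDB versions have appeared as Redis has evolved, which leads to version compatibility problems.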
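The snapshot semantics of fork() that BGSAVE relies on can be observed with a few lines of POSIX C. The sketch below is not Redis code; child_snapshot_value is a hypothetical helper that forks, lets the parent overwrite a variable, and reports which value the child still sees. It passes the result through the exit status, so it assumes values between 0 and 254.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child, then sets *value to parent_new in the parent.
 * Returns the value the child observed after the parent's write,
 * or -1 on error. */
int child_snapshot_value(int *value, int parent_new) {
    assert(value != NULL && *value >= 0 && *value <= 254);
    int fds[2];
    if (pipe(fds) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) { close(fds[0]); close(fds[1]); return -1; }

    if (pid == 0) {
        /* Child: wait until the parent has modified its own copy. */
        char c;
        close(fds[1]);
        if (read(fds[0], &c, 1) != 1) _exit(255);
        _exit(*value);  /* report the child's view of the variable */
    }

    /* Parent: this write only affects the parent's copy of the page. */
    close(fds[0]);
    *value = parent_new;
    int signalled = (write(fds[1], "x", 1) == 1);
    close(fds[1]);

    int status;
    if (waitpid(pid, &status, 0) == -1 || !signalled) return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) == 255) return -1;
    return WEXITSTATUS(status);
}
```

The child keeps seeing the value from the moment of the fork, which is exactly why the RDB child can write a consistent snapshot while the parent keeps serving writes.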

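2. RDB source code analysis

You can skip the source code in this part and read only the text. Every step is backed by source code, so the section is long, but each step is also explained in prose, which is enough to understand how RDB works. The fully annotated code is also available on GitHub: Redis 3.2 source annotations

We have already shown the source of the BGSAVE command, so we now focus on how rdbSaveBackground() works, peeling away the layers one at a time.

Before RDB persistence starts, the server sets several fields that describe its current state. They are defined in struct redisServer in server.h. Only the fields used here are listed; the rest can be found in the annotated source: Redis 3.2 source annotations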

struct redisServer {
    // Array of databases (16 by default)
    redisDb *db;
    // List of replicas and list of MONITOR clients
    list *slaves, *monitors;    /* List of slaves and MONITORs */

    /* RDB / AOF loading information */
    // True while data is being loaded from disk
    int loading;                /* We are loading data from disk if true */

    // Total number of bytes to load
    off_t loading_total_bytes;

    // Number of bytes loaded so far
    off_t loading_loaded_bytes;

    // Time at which loading started
    time_t loading_start_time;

    // While loading, the maximum number of bytes read or written per chunk (max_processing_chunk)
    off_t loading_process_events_interval_bytes;

    // Peak memory usage of the server
    size_t stat_peak_memory;        /* Max used memory record */

    // Time taken by the latest fork()
    long long stat_fork_time;       /* Time needed to perform latest fork() */

    // Fork rate in GB per second
    double stat_fork_rate;          /* Fork rate in GB/sec. */

    /* RDB persistence */
    // Dirty counter: number of modifications to the database since the last save
    long long dirty;                /* Changes to DB from the last save */

    // Backup of dirty taken before a BGSAVE, restored if the BGSAVE fails
    long long dirty_before_bgsave;  /* Used to restore dirty on failed BGSAVE */

    // PID of the child process running the BGSAVE
    pid_t rdb_child_pid;            /* PID of RDB saving child */

    // Array holding the save parameters (save points)
    struct saveparam *saveparams;   /* Save points array for RDB */

    // Length of that array
    int saveparamslen;              /* Number of saving points */

    // Name of the RDB file, dump.rdb by default
    char *rdb_filename;             /* Name of RDB file */

    // Whether to compress the RDB file with LZF, yes by default
    int rdb_compression;            /* Use compression in RDB? */

    // Whether the RDB file carries a checksum, yes by default
    int rdb_checksum;               /* Use RDB checksum? */

    // Time of the last successful save
    time_t lastsave;                /* Unix time of last successful save */

    // Time of the most recent BGSAVE attempt
    time_t lastbgsave_try;          /* Unix time of last attempted bgsave */

    // Time taken by the most recent RDB save
    time_t rdb_save_time_last;      /* Time used by last RDB save run. */

    // Start time of the current BGSAVE
    time_t rdb_save_time_start;     /* Current RDB save start time. */

    // If true, a BGSAVE is pending and starts as soon as possible
    int rdb_bgsave_scheduled;       /* BGSAVE when possible if true. */

    // Type of RDB save: written to disk, or written to replica sockets
    int rdb_child_type;             /* Type of save by active child. */

    // Status of the last BGSAVE
    int lastbgsave_status;          /* C_OK or C_ERR */

    // Refuse writes if the BGSAVE cannot be performed
    int stop_writes_on_bgsave_err;  /* Don't allow writes if can't BGSAVE */

    // Diskless replication: write end of the pipe
    int rdb_pipe_write_result_to_parent; /* RDB pipes used to return the state */
    // Diskless replication: read end of the pipe
    int rdb_pipe_read_result_from_child; /* of each slave in diskless SYNC. */

    /* time cache */
    // Cached Unix timestamp in seconds
    time_t unixtime;        /* Unix time sampled every cron cycle. */

    // Cached Unix timestamp in milliseconds
    long long mstime;       /* Like 'unixtime' but with milliseconds resolution. */

    /* Latency monitor */
    // Latency threshold
    long long latency_monitor_threshold;
    // Dictionary mapping event names to their latency samples
    dict *latency_events;
};

Next comes the source of rdbSaveBackground().

Here we can see the call to fork(): the child process runs rdbSave(), while the parent process updates some state.

// Perform RDB persistence (BGSAVE) in the background
int rdbSaveBackground(char *filename) {
    pid_t childpid;
    long long start;

    // Return C_ERR if an AOF or RDB child is already running
    if (server.aof_child_pid != -1 || server.rdb_child_pid != -1) return C_ERR;

    // Back up the current dirty counter
    server.dirty_before_bgsave = server.dirty;
    // Record the time of this BGSAVE attempt
    server.lastbgsave_try = time(NULL);
    // Start time of fork(), used to measure how long it takes
    start = ustime();
    // Create the child process
    if ((childpid = fork()) == 0) {
        int retval;
        // Code executed by the child
        /* Child */

        // Close the listening sockets
        closeListeningSockets(0);
        // Set the process title so the child is easy to identify
        redisSetProcTitle("redis-rdb-bgsave");
        // Do the save: write the database to the file named filename
        retval = rdbSave(filename);

        if (retval == C_OK) {
            // Size of the child's private dirty pages. If the parent writes data while the RDB is being produced, the affected pages are copied (copy-on-write) instead of staying shared between parent and child.
            size_t private_dirty = zmalloc_get_private_dirty();
            // Log the amount of memory used by copy-on-write
            if (private_dirty) {
                serverLog(LL_NOTICE,
                    "RDB: %zu MB of memory used by copy-on-write",
                    private_dirty/(1024*1024));
            }
        }
        // The child exits; its exit code tells the parent the result: 0 means the BGSAVE succeeded, 1 means it failed
        exitFromChild((retval == C_OK) ? 0 : 1);
    } else {
        // Code executed by the parent
        /* Parent */
        // Compute how long fork() took
        server.stat_fork_time = ustime()-start;
        // Compute the fork rate in GB per second
        server.stat_fork_rate = (double) zmalloc_used_memory() * 1000000 / server.stat_fork_time / (1024*1024*1024); /* GB per second. */
        // If fork() took longer than the configured threshold, record a latency sample under the "fork" event for latency diagnostics
        latencyAddSampleIfNeeded("fork",server.stat_fork_time/1000);

        // fork() failed
        if (childpid == -1) {
            server.lastbgsave_status = C_ERR;   // mark the BGSAVE as failed
            // Log the error
            serverLog(LL_WARNING,"Can't save in background: fork: %s",
                strerror(errno));
            return C_ERR;
        }
        // Log the start of the background save
        serverLog(LL_NOTICE,"Background saving started by pid %d",childpid);
        server.rdb_save_time_start = time(NULL);    // start time of the BGSAVE
        server.rdb_child_pid = childpid;            // PID of the child running the BGSAVE
        server.rdb_child_type = RDB_CHILD_TYPE_DISK;// type of BGSAVE: writing to disk
        // Disable hash table resizing while the child exists, because resizing moves memory around and would trigger copy-on-write
        updateDictResizePolicy();
        return C_OK;
    }
    return C_OK; /* unreached */
}

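Next, the source of rdbSave().

This function shows how the RDB file is first created. It opens a temporary RDB file for writing, calls rdbSaveRio() to write the database contents into it, then flushes the buffers and syncs the file to disk. Only after all of that has succeeded does it close the file, rename it to the final name, and update the server state.

A word about rio: rio is Redis's abstraction of the I/O layer. It can work on three kinds of targets: memory buffers, file I/O, and socket I/O. Here rioInitWithFile() initializes a file-backed rio object named rdb. Saving and loading are in effect wrappers around writes to and reads from this rdb object, so the rdbSave* family of functions can be called on it directly to write data. For details, see the separate article on the Redis I/O abstraction (rio), with source analysis and annotations. For replication Redis also implements diskless replication: the generated RDB data is not stored on disk but written straight to a network socket. In that case only the rio initialization changes, to the socket I/O initializer, while the read and write function interfaces stay the same.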
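The pattern of writing to a temporary file and renaming it only on success is useful well beyond Redis. Here is a minimal standalone sketch of it, assuming a POSIX system; save_atomically is a hypothetical function name, and the temporary file name simply follows the temp-<pid>.rdb convention that rdbSave() uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Writes len bytes of data to filename so that readers never see a
 * half-written file. Returns 0 on success, -1 on error. */
int save_atomically(const char *filename, const void *data, size_t len) {
    char tmpfile[256];
    assert(filename != NULL && (data != NULL || len == 0));

    snprintf(tmpfile, sizeof(tmpfile), "temp-%d.rdb", (int)getpid());
    FILE *fp = fopen(tmpfile, "w");
    if (!fp) return -1;

    if (len && fwrite(data, len, 1, fp) != 1) goto werr;
    if (fflush(fp) == EOF) goto werr;           /* stdio buffer -> kernel */
    if (fsync(fileno(fp)) == -1) goto werr;     /* kernel cache -> disk */
    if (fclose(fp) == EOF) { fp = NULL; goto werr; }

    /* rename() replaces the destination atomically on POSIX systems. */
    if (rename(tmpfile, filename) == -1) {
        unlink(tmpfile);
        return -1;
    }
    return 0;

werr:
    if (fp) fclose(fp);
    unlink(tmpfile);
    return -1;
}
```

If the process dies midway, the previous file under filename is still intact, and at worst a stale temporary file is left behind.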

/* Save the DB on disk. Return C_ERR on error, C_OK on success. */
// Save the database to disk. Returns C_OK on success, C_ERR otherwise
int rdbSave(char *filename) {
    char tmpfile[256];
    char cwd[MAXPATHLEN]; /* Current working dir path for error messages. */
    FILE *fp;
    rio rdb;
    int error = 0;

    // Build the name of the temporary file
    snprintf(tmpfile,256,"temp-%d.rdb", (int) getpid());
    // Open it for writing
    fp = fopen(tmpfile,"w");
    // If opening fails, get the current directory and log the error
    if (!fp) {
        char *cwdp = getcwd(cwd,MAXPATHLEN);
        // Write the message to the log file
        serverLog(LL_WARNING,
            "Failed opening the RDB file %s (in server root dir %s) "
            "for saving: %s",
            filename,
            cwdp ? cwdp : "unknown",
            strerror(errno));
        return C_ERR;
    }

    // Initialize a rio object backed by the file
    rioInitWithFile(&rdb,fp);
    // Write the database contents to the rio
    if (rdbSaveRio(&rdb,&error) == C_ERR) {
        errno = error;
        goto werr;
    }

    /* Make sure data will not remain on the OS's output buffers */
    // Flush the stdio buffer so that all data reaches the operating system
    if (fflush(fp) == EOF) goto werr;
    // Sync the file behind fp to disk
    if (fsync(fileno(fp)) == -1) goto werr;
    // Close the file
    if (fclose(fp) == EOF) goto werr;

    /* Use RENAME to make sure the DB file is changed atomically only
     * if the generate DB file is ok. */
    // Atomically rename the temporary file to the final RDB file name
    if (rename(tmpfile,filename) == -1) {
        // The rename failed: get the current directory, log the error, delete the temporary file
        char *cwdp = getcwd(cwd,MAXPATHLEN);
        serverLog(LL_WARNING,
            "Error moving temp DB file %s on the final "
            "destination %s (in server root dir %s): %s",
            tmpfile,
            filename,
            cwdp ? cwdp : "unknown",
            strerror(errno));
        unlink(tmpfile);
        return C_ERR;
    }

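    // Log the success
    serverLog(LL_NOTICE,"DB saved on disk");
    // Reset the dirty counter
    server.dirty = 0;
    // Update the time of the last save
    server.lastsave = time(NULL);
    // Update the status of the last save
    server.lastbgsave_status = C_OK;
    return C_OK;

// Error handling for write failures: log the error, close the file, delete the temporary file, return C_ERR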
werr:
    serverLog(LL_WARNING,"Write error saving DB on disk: %s", strerror(errno));
    fclose(fp);
    unlink(tmpfile);
    return C_ERR;
}

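Let us dig one level deeper and see what rdbSaveRio() does.

rdbSaveRio() shows exactly what is written into the RDB file.

In order: the "REDIS" magic string with the RDB version number, the default auxiliary fields of the RDB file, the contents of the databases, an EOF opcode, and finally the checksum. A complete RDB file therefore looks as shown in the figure: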

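// Write the contents of an RDB file to a rio. Returns C_OK on success; otherwise returns C_ERR, and part or all of the output may be missing
// When the function returns C_ERR and error is not NULL, *error is set to the errno value
int rdbSaveRio(rio *rdb, int *error) {
    dictIterator *di = NULL;
    dictEntry *de;
    char magic[10];
    int j;
    long long now = mstime();
    uint64_t cksum;

    // If the checksum option is enabled
    if (server.rdb_checksum)
        // install the function that updates the checksum
        rdb->update_cksum = rioGenericUpdateChecksum;
    // Build the magic string: "REDIS" followed by the RDB version
    snprintf(magic,sizeof(magic),"REDIS%04d",RDB_VERSION);
    // Write the magic string to the rio
    if (rdbWriteRaw(rdb,magic,9) == -1) goto werr;
    // Write the default auxiliary fields of the RDB file to the rio
    if (rdbSaveInfoAuxFields(rdb) == -1) goto werr;

    // Iterate over all databases of the server
    for (j = 0; j < server.dbnum; j++) {
        redisDb *db = server.db+j;      // current database
        dict *d = db->dict;             // key-value dictionary of the current database
        // Skip empty databases
        if (dictSize(d) == 0) continue;
        // Create a safe iterator over the dictionary
        di = dictGetSafeIterator(d);
        if (!di) return C_ERR;

        /* Write the SELECT DB opcode */
        // Write the select-database opcode, RDB_OPCODE_SELECTDB = 254
        if (rdbSaveType(rdb,RDB_OPCODE_SELECTDB) == -1) goto werr;
        // Write the database id as an encoded length (a single byte for ids below 64)
        if (rdbSaveLen(rdb,j) == -1) goto werr;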

        /* Write the RESIZE DB opcode. We trim the size to UINT32_MAX, which
         * is currently the largest type we are able to represent in RDB sizes.
         * However this does not limit the actual size of the DB to load since
         * these sizes are just hints to resize the hash tables. */
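        // Write the resize-DB opcode. The sizes are capped at UINT32_MAX; they do not limit the real size of the database and only serve as hints for sizing the hash tables
        uint32_t db_size, expires_size;
        // If the dictionary holds more than UINT32_MAX entries, cap db_size at UINT32_MAX
        db_size = (dictSize(db->dict) <= UINT32_MAX) ?
                                dictSize(db->dict) :
                                UINT32_MAX;
        // Likewise, cap expires_size at UINT32_MAX if more keys than that have an expire time
        expires_size = (dictSize(db->expires) <= UINT32_MAX) ?
                                dictSize(db->expires) :
                                UINT32_MAX;
        // Write the resize-hash-table opcode, RDB_OPCODE_RESIZEDB = 251
        if (rdbSaveType(rdb,RDB_OPCODE_RESIZEDB) == -1) goto werr;
        // Write the two size hints: the number of keys and the number of keys with an expire time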
        if (rdbSaveLen(rdb,db_size) == -1) goto werr;
        if (rdbSaveLen(rdb,expires_size) == -1) goto werr;

        /* Iterate this DB writing every entry */
        // Iterate over every key-value pair in the database
        while((de = dictNext(di)) != NULL) {
            sds keystr = dictGetKey(de);        // current key
            robj key, *o = dictGetVal(de);      // value of the current key
            long long expire;

            // Create and initialize a key object on the stack
            initStaticStringObject(key,keystr);
            // Expire time of the current key
            expire = getExpire(db,&key);
            // Write the key object, value object and expire time to the rio
            if (rdbSaveKeyValuePair(rdb,&key,o,expire,now) == -1) goto werr;
        }
        dictReleaseIterator(di);    // release the iterator
    }
    di = NULL; /* So that we don't release it again on error. */

    /* EOF opcode */
    // Write the EOF opcode, RDB_OPCODE_EOF = 255
    if (rdbSaveType(rdb,RDB_OPCODE_EOF) == -1) goto werr;

    /* CRC64 checksum. It will be zero if checksum computation is disabled, the
     * loading code skips the check in this case. */
    // CRC64 checksum. It is 0 when checksum computation is disabled, in which case the check is skipped when the RDB file is loaded
    cksum = rdb->cksum;
    memrev64ifbe(&cksum);
    if (rioWrite(rdb,&cksum,8) == 0) goto werr;
    return C_OK;

// Write error
werr:
    if (error) *error = errno;  // save the error code
    if (di) dictReleaseIterator(di);    // release the iterator if it is still held
    return C_ERR;
}

rdbSaveInfoAuxFields() is called to write a few default auxiliary fields, as follows:

/* Save a few default AUX fields with information about the RDB generated. */
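// Write the default auxiliary fields of an RDB file to the rio
int rdbSaveInfoAuxFields(rio *rdb) {
    // Determine from the pointer size whether this is a 64-bit or a 32-bit build
    int redis_bits = (sizeof(void*) == 8) ? 64 : 32;

    /* Add a few fields about the state when the RDB was created. */
    // Add state information to the RDB file: the Redis version, the number of bits, the current time, and the memory Redis currently uses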
    if (rdbSaveAuxFieldStrStr(rdb,"redis-ver",REDIS_VERSION) == -1) return -1;
    if (rdbSaveAuxFieldStrInt(rdb,"redis-bits",redis_bits) == -1) return -1;
    if (rdbSaveAuxFieldStrInt(rdb,"ctime",time(NULL)) == -1) return -1;
    if (rdbSaveAuxFieldStrInt(rdb,"used-mem",zmalloc_used_memory()) == -1) return -1;
    return 1;
}
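To make the on-disk shape of an AUX field concrete, the sketch below encodes one field into a memory buffer: the opcode byte, then the key and the value, each prefixed by a one-byte length. It is a simplification and not the Redis implementation: encode_aux_field and SKETCH_OPCODE_AUX are hypothetical names, only strings shorter than 64 bytes are handled, and the integer and LZF encodings that Redis may apply to strings are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_OPCODE_AUX 250  /* same value as RDB_OPCODE_AUX (octal 372) */

/* Encodes one AUX field into out. Returns the number of bytes written,
 * or 0 if the field does not fit or a string is 64 bytes or longer. */
size_t encode_aux_field(unsigned char *out, size_t cap,
                        const char *key, const char *val) {
    assert(out != NULL && key != NULL && val != NULL);
    size_t klen = strlen(key), vlen = strlen(val);
    if (klen > 63 || vlen > 63) return 0;   /* only 6-bit lengths here */
    size_t need = 1 + 1 + klen + 1 + vlen;
    if (need > cap) return 0;

    size_t i = 0;
    out[i++] = SKETCH_OPCODE_AUX;
    out[i++] = (unsigned char)klen;   /* top two bits 00: 6-bit length */
    memcpy(out + i, key, klen); i += klen;
    out[i++] = (unsigned char)vlen;
    memcpy(out + i, val, vlen); i += vlen;
    return i;
}
```

Encoding the pair "redis-ver" and "3.2.8" this way yields 17 bytes: the opcode, the length byte 9, the key, the length byte 5, and the value.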

So for an empty database, the dump.rdb file produced by persistence looks like this when viewed with od -cx dump.rdb:

0000000   R   E   D   I   S   0   0   0   7 372  \t   r   e   d   i   s
           4552    4944    3053    3030    fa37    7209    6465    7369
0000020   -   v   e   r 005   3   .   2   .   8 372  \n   r   e   d   i
           762d    7265    3305    322e    382e    0afa    6572    6964
0000040   s   -   b   i   t   s 300   @ 372 005   c   t   i   m   e 302
           2d73    6962    7374    40c0    05fa    7463    6d69    c265
0000060   u   7  \f   Y 372  \b   u   s   e   d   -   m   e   m 302   0
           3775    590c    08fa    7375    6465    6d2d    6d65    30c2
0000100 211  \f  \0 377   8 341   Y 220 225 346   L 245
           0c89    ff00    e138    9059    e695    a54c
0000114

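Grouping the bytes by field gives:

REDIS0007                           // magic string "REDIS" plus the RDB version, 0007
372 \t redis-ver 005 3.2.8          // AUX field, Redis version: redis-ver = 3.2.8
372 \n redis-bits 300 @             // AUX field, number of bits of the build: redis-bits
372 005 ctime 302 u 7 \f Y          // AUX field, time at which the RDB file was created
372 \b used-mem 302 0 211 \f \0     // AUX field, memory used by Redis
377                                 // octal 377 = decimal 255 = the EOF opcode
8 341 Y 220 225 346 L 245           // checksum: 8 bytes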

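Most of this is readable, but some of the octal values still make no sense. This is the property of RDB files described earlier: they are compact and compressed, and these bytes are encoded data or opcodes. To find the encoding rules we go back to the source. On the save side Redis wraps the encoding of each type in many small functions, which makes the rules hard to see, so we look at rdbLoad() instead. This function runs when the server starts and loads the contents of the RDB file into the database.

The source of rdbLoad() is as follows: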

// Load the given RDB file into the database
int rdbLoad(char *filename) {
    uint32_t dbid;
    int type, rdbver;
    redisDb *db = server.db+0;
    char buf[1024];
    long long expiretime, now = mstime();   // time at which this load starts
    FILE *fp;
    rio rdb;

    // Open the file read-only
    if ((fp = fopen(filename,"r")) == NULL) return C_ERR;

    // Initialize a file-backed rio object on the file pointer
    rioInitWithFile(&rdb,fp);
    // Install the callback that updates the checksum and reports loading progress
    rdb.update_cksum = rdbLoadProgressCallback;
    // Maximum number of bytes processed per read or write chunk while loading (2 MB)
    rdb.max_processing_chunk = server.loading_process_events_interval_bytes;
    // Read 9 bytes into buf: the magic string and RDB version, "REDIS0007"
    if (rioRead(&rdb,buf,9) == 0) goto eoferr;
    buf[9] = '\0';  // "REDIS0007\0"
    // Check the signature that was read
    if (memcmp(buf,"REDIS",5) != 0) {
        fclose(fp);
        serverLog(LL_WARNING,"Wrong signature trying to load DB from file");
        errno = EINVAL; // the value read is invalid
        return C_ERR;
    }
    // Convert the version to an integer and check its range
    rdbver = atoi(buf+5);
    if (rdbver < 1 || rdbver > RDB_VERSION) {
        fclose(fp);
        serverLog(LL_WARNING,"Can't handle RDB format version %d",rdbver);
        errno = EINVAL;
        return C_ERR;
    }

    // Set the server's loading state
    startLoading(fp);
    // Start reading the RDB file into the database
    while(1) {
        robj *key, *val;
        expiretime = -1;

        /* Read type. */
        // First read the type
        if ((type = rdbLoadType(&rdb)) == -1) goto eoferr;

        /* Handle special types. */
        // Handle the special opcodes
        // An expire time in seconds
        if (type == RDB_OPCODE_EXPIRETIME) {
            /* EXPIRETIME: load an expire associated with the next key
             * to load. Note that after loading an expire we need to
             * load the actual type, and continue. */
            // Read the expire time from the rio
            if ((expiretime = rdbLoadTime(&rdb)) == -1) goto eoferr;
            /* We read the time so we need to read the object type again. */
            // After the expire time, read the type of the key-value pair
            if ((type = rdbLoadType(&rdb)) == -1) goto eoferr;
            /* the EXPIRETIME opcode specifies time in seconds, so convert
             * into milliseconds. */
            expiretime *= 1000; // convert to milliseconds

        // An expire time in milliseconds
        } else if (type == RDB_OPCODE_EXPIRETIME_MS) {
            /* EXPIRETIME_MS: milliseconds precision expire times introduced
             * with RDB v3. Like EXPIRETIME but no with more precision. */
            // Read the expire time from the rio
            if ((expiretime = rdbLoadMillisecondTime(&rdb)) == -1) goto eoferr;
            /* We read the time so we need to read the object type again. */
            // After the expire time, read the type of the key-value pair
            if ((type = rdbLoadType(&rdb)) == -1) goto eoferr;

        // On EOF, leave the loop
        } else if (type == RDB_OPCODE_EOF) {
            /* EOF: End of file, exit the main loop. */
            break;

        // A select-database opcode
        } else if (type == RDB_OPCODE_SELECTDB) {
            /* SELECTDB: Select the specified database. */
            // Read a length, which holds the database id
            if ((dbid = rdbLoadLen(&rdb,NULL)) == RDB_LENERR)
                goto eoferr;
            // Check that the id is valid
            if (dbid >= (unsigned)server.dbnum) {
                serverLog(LL_WARNING,
                    "FATAL: Data file was created with a Redis "
                    "server configured to handle more than %d "
                    "databases. Exiting\n", server.dbnum);
                exit(1);
            }
            // Switch to that database
            db = server.db+dbid;
            // Skip the rest of this iteration and read the next type
            continue; /* Read type again. */

        // A resize-hash-table opcode
        } else if (type == RDB_OPCODE_RESIZEDB) {
            /* RESIZEDB: Hint about the size of the keys in the currently
             * selected data base, in order to avoid useless rehashing. */
            uint32_t db_size, expires_size;
            // Read the size of the database's key-value dictionary
            if ((db_size = rdbLoadLen(&rdb,NULL)) == RDB_LENERR)
                goto eoferr;
            // Read the size of the database's expires dictionary
            if ((expires_size = rdbLoadLen(&rdb,NULL)) == RDB_LENERR)
                goto eoferr;
            // Expand both dictionaries
            dictExpand(db->dict,db_size);
            dictExpand(db->expires,expires_size);
            // Read the next type
            continue; /* Read type again. */

        // An auxiliary field
        } else if (type == RDB_OPCODE_AUX) {
            /* AUX: generic string-string fields. Use to add state to RDB
             * which is backward compatible. Implementations of RDB loading
             * are requierd to skip AUX fields they don't understand.
             *
             * An AUX field is composed of two strings: key and value. */
            robj *auxkey, *auxval;
            // Read the key object and the value object of the auxiliary field
            if ((auxkey = rdbLoadStringObject(&rdb)) == NULL) goto eoferr;
            if ((auxval = rdbLoadStringObject(&rdb)) == NULL) goto eoferr;

            // The first character of the key is '%'
            if (((char*)auxkey->ptr)[0] == '%') {
                /* All the fields with a name staring with '%' are considered
                 * information fields and are logged at startup with a log
                 * level of NOTICE. */
                // Log the field
                serverLog(LL_NOTICE,"RDB '%s': %s",
                    (char*)auxkey->ptr,
                    (char*)auxval->ptr);
            } else {
                /* We ignore fields we don't understand, as by AUX field
                 * contract. */
                serverLog(LL_DEBUG,"Unrecognized RDB AUX field: '%s'",
                    (char*)auxkey->ptr);
            }

            decrRefCount(auxkey);
            decrRefCount(auxval);
            // Read the next type
            continue; /* Read type again. */
        }

        /* Read key */
        // Read a key object
        if ((key = rdbLoadStringObject(&rdb)) == NULL) goto eoferr;
        /* Read value */
        // Read a value object
        if ((val = rdbLoadObject(type,&rdb)) == NULL) goto eoferr;
        /* Check if the key already expired. This function is used when loading
         * an RDB file from disk, either at startup, or when an RDB was
         * received from the master. In the latter case, the master is
         * responsible for key expiry. If we would expire keys here, the
         * snapshot taken by the master may not be reflected on the slave. */
        // If this server is not a replica and the key has an expire time that has already passed
        if (server.masterhost == NULL && expiretime != -1 && expiretime < now) {
            // release the key and the value
            decrRefCount(key);
            decrRefCount(val);
            continue;
        }
        /* Add the new object in the hash table */
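        // Add the unexpired key-value pair to the database's dictionary
        dbAdd(db,key,val);

        /* Set the expire time if needed */
        // Set the expire time if needed
        if (expiretime != -1) setExpire(db,key,expiretime);

        decrRefCount(key);  // release the temporary object
    }

    // All key-value pairs have been read and the EOF opcode was reached, but EOF is not the end of the RDB file: the checksum still has to be verified
    /* Verify the checksum if RDB version is >= 5 */
    // If the RDB version is 5 or later and checksums are enabled, verify the checksum
    if (rdbver >= 5 && server.rdb_checksum) {
        uint64_t cksum, expected = rdb.cksum;

        // Read the 8-byte checksum and compare it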
        if (rioRead(&rdb,&cksum,8) == 0) goto eoferr;
        memrev64ifbe(&cksum);
        if (cksum == 0) {
            serverLog(LL_WARNING,"RDB file was saved with checksum disabled: no check performed.");
        } else if (cksum != expected) {
            serverLog(LL_WARNING,"Wrong RDB checksum. Aborting now.");
            rdbExitReportCorruptRDB("RDB CRC error");
        }
    }

    fclose(fp); // close the RDB file
    stopLoading();  // mark loading as finished
    return C_OK;

// Fatal error exit
eoferr: /* unexpected end of file is handled here with a fatal exit */
    serverLog(LL_WARNING,"Short read or OOM loading DB. Unrecoverable error, aborting now.");
    // Report the corrupt RDB file and exit
    rdbExitReportCorruptRDB("Unexpected EOF reading RDB file");
    return C_ERR; /* Just to avoid warning */
}
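The header check at the top of rdbLoad() can be isolated into a small function that works on a plain byte buffer. This is a sketch, not the Redis code: parse_rdb_header is a hypothetical name, and SKETCH_RDB_VERSION is assumed to be 7 only because the dump shown in this article starts with REDIS0007.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_RDB_VERSION 7  /* assumption: highest version this sketch accepts */

/* Validates a 9-byte RDB header such as "REDIS0007".
 * Returns the RDB version, or -1 if the signature is wrong or the
 * version is outside the range 1..SKETCH_RDB_VERSION. */
int parse_rdb_header(const unsigned char *hdr) {
    char buf[10];
    assert(hdr != NULL);
    memcpy(buf, hdr, 9);
    buf[9] = '\0';                       /* make the version digits a C string */
    if (memcmp(buf, "REDIS", 5) != 0) return -1;
    int ver = atoi(buf + 5);             /* "0007" -> 7 */
    if (ver < 1 || ver > SKETCH_RDB_VERSION) return -1;
    return ver;
}
```

Rejecting versions newer than the one the server was built for is what makes an RDB file written by a newer Redis unloadable on an older one, the compatibility problem mentioned in section 1.3.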

rdbLoad() refers to many RDB_OPCODE_* constants and to object types of the RDB_TYPE_* family. They are defined in rdb.h.

/* Dup object types to RDB object types. Only reason is readability (are we
 * dealing with RDB types or with in-memory object types?). */
#define RDB_TYPE_STRING 0           // string
#define RDB_TYPE_LIST   1           // list
#define RDB_TYPE_SET    2           // set
#define RDB_TYPE_ZSET   3           // sorted set
#define RDB_TYPE_HASH   4           // hash
/* NOTE: WHEN ADDING NEW RDB TYPE, UPDATE rdbIsObjectType() BELOW */

/* Object types for encoded objects. */
#define RDB_TYPE_HASH_ZIPMAP    9
#define RDB_TYPE_LIST_ZIPLIST  10   // list encoded as a ziplist
#define RDB_TYPE_SET_INTSET    11   // set encoded as an intset
#define RDB_TYPE_ZSET_ZIPLIST  12   // sorted set encoded as a ziplist
#define RDB_TYPE_HASH_ZIPLIST  13   // hash encoded as a ziplist
#define RDB_TYPE_LIST_QUICKLIST 14  // list encoded as a quicklist
/* NOTE: WHEN ADDING NEW RDB TYPE, UPDATE rdbIsObjectType() BELOW */

/* Test if a type is an object type. */
// Test whether t is an object type
#define rdbIsObjectType(t) ((t >= 0 && t <= 4) || (t >= 9 && t <= 14))

/* Special RDB opcodes (saved/loaded with rdbSaveType/rdbLoadType). */
#define RDB_OPCODE_AUX        250       // auxiliary field
#define RDB_OPCODE_RESIZEDB   251       // hint for sizing the hash tables
#define RDB_OPCODE_EXPIRETIME_MS 252    // expire time in milliseconds
#define RDB_OPCODE_EXPIRETIME 253       // expire time in seconds
#define RDB_OPCODE_SELECTDB   254       // select a database
#define RDB_OPCODE_EOF        255       // EOF opcode

With these definitions in hand, we can now analyze the dump.rdb file.

0000000   R   E   D   I   S   0   0   0   7 372  \t   r   e   d   i   s
           4552    4944    3053    3030    fa37    7209    6465    7369
0000020   -   v   e   r 005   3   .   2   .   8 372  \n   r   e   d   i
           762d    7265    3305    322e    382e    0afa    6572    6964
0000040   s   -   b   i   t   s 300   @ 372 005   c   t   i   m   e 302
           2d73    6962    7374    40c0    05fa    7463    6d69    c265
0000060   u   7  \f   Y 372  \b   u   s   e   d   -   m   e   m 302   0
           3775    590c    08fa    7375    6465    6d2d    6d65    30c2
0000100 211  \f  \0 377   8 341   Y 220 225 346   L 245
           0c89    ff00    e138    9059    e695    a54c
0000114

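Octal 372 is decimal 250, which is RDB_OPCODE_AUX. In rdbLoad(), the branch for type == RDB_OPCODE_AUX reads a key object and then a value object:

- To read the key object, one length byte is read first. The byte '\t' is decimal 9, so the key is 9 bytes long, and indeed redis-ver is 9 bytes long.
- Then the value object is read, again starting with one length byte. Octal 005 is decimal 5, so the value is 5 bytes long, and indeed 3.2.8 is 5 bytes long.

After the RDB_OPCODE_AUX branch, the code continues with the next iteration of the loop and reads another 1-byte type. The type is 372 again, so once more a key object and a value object are read:

- To read the key object, one length byte is read first. The byte '\n' is decimal 10, so the key is 10 bytes long, and indeed redis-bits is 10 bytes long.
- Then the value object is read, again starting with one length byte. Octal 300 is decimal 192, which is clearly not a plausible length. The reason is that RDB is an encoded, compressed format. The encoding rules are as follows: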

/* When a length of a string object stored on disk has the first two bits
 * set, the remaining two bits specify a special encoding for the object
 * accordingly to the following defines: */
#define RDB_ENC_INT8 0        /* 8 bit signed integer */
#define RDB_ENC_INT16 1       /* 16 bit signed integer */
#define RDB_ENC_INT32 2       /* 32 bit signed integer */
#define RDB_ENC_LZF 3         /* string compressed with FASTLZ */

#define RDB_6BITLEN 0           // 6-bit length
#define RDB_14BITLEN 1          // 14-bit length
#define RDB_32BITLEN 2          // 32-bit length
#define RDB_ENCVAL 3            // specially encoded value
#define RDB_LENERR UINT_MAX     // error value

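A stored string can use any of the four special encodings above, and a length prefix comes in the four forms listed after them. How they are read can be seen in the source of rdbLoadLen(), which shows how many bytes each form uses to store the length.

In our case, reading the value object starts with one length byte: octal 300 is binary 1100 0000. Its two most significant bits are 11, decimal 3, which corresponds to RDB_ENCVAL. The remaining six bits are 0, so rdbLoadLen() sets isencoded and returns 0.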

// Read a length from the rio and return it. If the value is not a plain length but a special encoding marker, *isencoded is set to 1
uint32_t rdbLoadLen(rio *rdb, int *isencoded) {
    unsigned char buf[2];
    uint32_t len;
    int type;

    // Not encoded by default
    if (isencoded) *isencoded = 0;
    // Read one byte from the rio into buf
    if (rioRead(rdb,buf,1) == 0) return RDB_LENERR;

    // (buf[0]&0xC0)>>6 = (1100 0000 & buf[0]) >> 6 = the two most significant bits of buf[0]
    type = (buf[0]&0xC0)>>6;

    // A specially encoded value: set the encoded flag and return the encoding type
    if (type == RDB_ENCVAL) {
        /* Read a 6 bit encoding type. */
        if (isencoded) *isencoded = 1;
        return buf[0]&0x3F; // the remaining six bits hold the encoding type

    // A 6-bit length
    } else if (type == RDB_6BITLEN) {
        /* Read a 6 bit len. */
        return buf[0]&0x3F; // the remaining six bits hold the length

    // A 14-bit length
    } else if (type == RDB_14BITLEN) {
        /* Read a 14 bit len. */
        // Read 1 more byte into buf+1
        if (rioRead(rdb,buf+1,1) == 0) return RDB_LENERR;
        return ((buf[0]&0x3F)<<8)|buf[1];   // the length is everything except the two most significant bits

    // A 32-bit length
    } else if (type == RDB_32BITLEN) {
        /* Read a 32 bit len. */
        // Read 4 bytes
        if (rioRead(rdb,&len,4) == 0) return RDB_LENERR;
        return ntohl(len);  // convert to host byte order
    } else {
        rdbExitReportCorruptRDB(
            "Unknown length encoding %d in rdbLoadLen()",type);
        return -1; /* Never reached. */
    }
}
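The same decoding rules can be tried out on a plain byte array, without a rio object. The function below is a sketch written for this purpose rather than Redis code: decode_rdb_len is a hypothetical name, and it reports how many bytes the prefix occupied so that a caller could continue parsing after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes one RDB length prefix from p, where n bytes are available.
 * Returns the number of bytes consumed (1, 2 or 5), or 0 if the buffer
 * is too short. *len receives the length, or the encoding type when
 * *isencoded is set to 1. */
size_t decode_rdb_len(const unsigned char *p, size_t n,
                      uint32_t *len, int *isencoded) {
    assert(p != NULL && len != NULL && isencoded != NULL);
    *isencoded = 0;
    if (n < 1) return 0;

    switch ((p[0] & 0xC0) >> 6) {        /* two most significant bits */
    case 0:                              /* 00xxxxxx: 6-bit length */
        *len = p[0] & 0x3F;
        return 1;
    case 1:                              /* 01xxxxxx xxxxxxxx: 14-bit length */
        if (n < 2) return 0;
        *len = ((uint32_t)(p[0] & 0x3F) << 8) | p[1];
        return 2;
    case 2:                              /* 10......: 4 bytes, big endian */
        if (n < 5) return 0;
        *len = ((uint32_t)p[1] << 24) | ((uint32_t)p[2] << 16) |
               ((uint32_t)p[3] << 8) | (uint32_t)p[4];
        return 5;
    default:                             /* 11xxxxxx: special encoding */
        *isencoded = 1;
        *len = p[0] & 0x3F;
        return 1;
    }
}
```

Feeding it the byte 0xC0 (octal 300) from the dump gives an encoded marker with value 0, which is RDB_ENC_INT8, while the byte 0x09 in front of redis-ver decodes to the plain length 9.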

Back in rdbGenericLoadStringObject(), the function that creates string objects: the value 0 returned by rdbLoadLen() corresponds to RDB_ENC_INT8, so rdbLoadIntegerObject() is called.


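// Read a string object from the rio and build the result according to flags
void *rdbGenericLoadStringObject(rio *rdb, int flags) {
    int encode = flags & RDB_LOAD_ENC;  // try to return an encoded object
    int plain = flags & RDB_LOAD_PLAIN; // return a plain buffer instead of an object
    int isencoded;
    uint32_t len;

    // Read the length prefix from the rio: isencoded tells whether it is a special encoding, len holds the length or the encoding type
    len = rdbLoadLen(rdb,&isencoded);
    // If the value is specially encoded (isencoded is 1), len selects the integer or LZF encoding
    if (isencoded) {
        switch(len) {
        case RDB_ENC_INT8:
        case RDB_ENC_INT16:
        case RDB_ENC_INT32:
            // The three integer encodings: the kind of value returned depends on flags
            return rdbLoadIntegerObject(rdb,len,flags);
        case RDB_ENC_LZF:
            // An LZF-compressed string: decompress it and build the string object
            return rdbLoadLzfStringObject(rdb,flags);
        default:
            rdbExitReportCorruptRDB("Unknown RDB string encoding type %d",len);
        }
    }

    // If len is the error value, return NULL
    if (len == RDB_LENERR) return NULL;

    // An object was requested rather than a plain buffer
    if (!plain) {
        // Create an encoded or a raw string object depending on encode
        robj *o = encode ? createStringObject(NULL,len) :
                           createRawStringObject(NULL,len);
        // Fill the object with the bytes read from the rio; on failure release the object and return NULL
        if (len && rioRead(rdb,o->ptr,len) == 0) {
            decrRefCount(o);
            return NULL;
        }
        return o;
    // A plain buffer was requested
    } else {
        // Allocate the buffer
        void *buf = zmalloc(len);
        // Read the bytes from the rio
        if (len && rioRead(rdb,buf,len) == 0) {
            zfree(buf);
            return NULL;
        }
        return buf; // return the buffer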
    }
}

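When the encoding passed in is RDB_ENC_INT8, the function reads 1 more byte. The next byte in the dump is '@' (octal 100), which is decimal 64, so the value of redis-bits is 64: this is a 64-bit Redis server.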

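// Read an integer from the rio according to its encoding and return it in the form selected by flags
void *rdbLoadIntegerObject(rio *rdb, int enctype, int flags) {
    int plain = flags & RDB_LOAD_PLAIN; // plain buffer
    int encode = flags & RDB_LOAD_ENC;  // encoded string object
    unsigned char enc[4];
    long long val;

    // Depending on the integer encoding, read the integer bytes from the rio into enc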
    if (enctype == RDB_ENC_INT8) {
        if (rioRead(rdb,enc,1) == 0) return NULL;
        val = (signed char)enc[0];
    } else if (enctype == RDB_ENC_INT16) {
        uint16_t v;
        if (rioRead(rdb,enc,2) == 0) return NULL;
        v = enc[0]|(enc[1]<<8);
        val = (int16_t)v;
    } else if (enctype == RDB_ENC_INT32) {
        uint32_t v;
        if (rioRead(rdb,enc,4) == 0) return NULL;
        v = enc[0]|(enc[1]<<8)|(enc[2]<<16)|(enc[3]<<24);
        val = (int32_t)v;
    } else {
        val = 0; /* anti-warning */
        rdbExitReportCorruptRDB("Unknown RDB integer encoding type %d",enctype);
    }

    // Plain: convert the integer to its string form and return the raw buffer
    if (plain) {
        char buf[LONG_STR_SIZE], *p;
        int len = ll2string(buf,sizeof(buf),val);
        p = zmalloc(len);
        memcpy(p,buf,len);
        return p;
    // Encoded: return a string object that may use the integer encoding
    } else if (encode) {
        return createStringObjectFromLongLong(val);
    } else {
    // Otherwise return an ordinary string object
        return createObject(OBJ_STRING,sdsfromlonglong(val));
    }
}
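The integer encodings can also be checked against the bytes of the dump with a buffer-based sketch. decode_rdb_int below is a hypothetical helper, not part of Redis; it follows the same little-endian layout as rdbLoadIntegerObject() and takes the encoding type as 0, 1 or 2, matching RDB_ENC_INT8, RDB_ENC_INT16 and RDB_ENC_INT32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes an integer stored with encoding enctype (0 = INT8, 1 = INT16,
 * 2 = INT32) from p, where n bytes are available.
 * Returns 0 and stores the value in *val, or -1 on error. */
int decode_rdb_int(const unsigned char *p, size_t n, int enctype,
                   long long *val) {
    assert(p != NULL && val != NULL);
    if (enctype == 0) {
        if (n < 1) return -1;
        *val = (signed char)p[0];
    } else if (enctype == 1) {
        if (n < 2) return -1;
        uint16_t v = (uint16_t)(p[0] | (p[1] << 8));   /* little endian */
        *val = (int16_t)v;
    } else if (enctype == 2) {
        if (n < 4) return -1;
        uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                     ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        *val = (int32_t)v;
    } else {
        return -1;   /* unknown encoding type */
    }
    return 0;
}
```

Applied to the dump, the byte '@' after redis-bits decodes to 64, and the four bytes after ctime 302 (0x75 0x37 0x0c 0x59) decode to the Unix timestamp 1493972853.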

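That covers all the rules. The rest of the file is analyzed in exactly the same way, so we stop here. Saving and loading are inverse processes, so each can be understood by reversing the other.

All of the annotated RDB persistence source code is on GitHub, and you are welcome to read it: Redis 3.2 source annotations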

Redis Source Code Analysis and Annotations (17): The RDB Persistence Mechanism
