Redis split-brain

Split-brain

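A network problem partitions the cluster and the nodes lose contact with one another. Replication between the master and its replicas stops, and a failover election promotes another node, so there are now two masters. Both keep running and accepting writes at the same time, which leaves the data inconsistent.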

See "Analysis of Redis split-brain and other extreme cases" (redis 腦裂等極端情況分析).


Solution

A relatively simple approach is to set the following in the Redis configuration:

# The master accepts writes only while at least 3 replicas are connected
min-slaves-to-write 3
# and the replication lag of those replicas is at most 10 seconds
min-slaves-max-lag 10

Relevant explanation from redis.conf

# It is possible for a master to stop accepting writes if there are less than
# N slaves connected, having a lag less or equal than M seconds.
#
# The N slaves need to be in "online" state.
#
# The lag in seconds, that must be <= the specified value, is calculated from
# the last ping received from the slave, that is usually sent every second.
#
# This option does not GUARANTEE that N replicas will accept the write, but
# will limit the window of exposure for lost writes in case not enough slaves
# are available, to the specified number of seconds.
#
# For example to require at least 3 slaves with a lag <= 10 seconds use:
#
# min-slaves-to-write 3
# min-slaves-max-lag 10
#
# Setting one or the other to 0 disables the feature.
#
# By default min-slaves-to-write is set to 0 (feature disabled) and
# min-slaves-max-lag is set to 10.
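
The rule described in these comments can be sketched as a single predicate. This is a minimal illustration, not the Redis implementation: the struct min_slaves_config and the function master_accepts_write are hypothetical names, and the number of "good" replicas is assumed to be computed elsewhere.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the two redis.conf options. */
struct min_slaves_config {
    int min_slaves_to_write; /* N: required good replicas, 0 disables */
    int min_slaves_max_lag;  /* M: maximum lag in seconds, 0 disables */
};

/* Returns true if a master with `good_slaves` online replicas whose lag
 * is <= M seconds should accept a write under the given configuration. */
bool master_accepts_write(struct min_slaves_config cfg, int good_slaves) {
    assert(good_slaves >= 0);
    /* Setting either option to 0 disables the feature. */
    if (cfg.min_slaves_to_write == 0 || cfg.min_slaves_max_lag == 0)
        return true;
    return good_slaves >= cfg.min_slaves_to_write;
}
```

With min-slaves-to-write 3 and min-slaves-max-lag 10, a master cut off from its replicas sees its good-replica count fall below 3 and the predicate turns false. That is what limits the window in which an isolated old master can keep accepting writes that would later be lost.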

Implementation flow in the Redis 3.2.8 source (abridged)

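/* Run the guarded statement roughly once every _ms_ milliseconds.
 * serverCron runs server.hz times per second and counts its calls in
 * server.cronloops. */
#define run_with_period(_ms_) if ((_ms_ <= 1000/server.hz) || !(server.cronloops%((_ms_)/(1000/server.hz))))

int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    /* ... other periodic work omitted ... */
    run_with_period(1000) replicationCron();
    server.cronloops++;
    return 1000/server.hz;
}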
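
To check how often run_with_period(1000) actually fires, here is a small standalone sketch. It copies the macro unchanged but replaces Redis's global server struct with a tiny stand-in. The struct fake_server and the helper count_fires are assumptions made for illustration and do not exist in the Redis source.

```c
#include <assert.h>

/* Stand-in for the two fields of Redis's global server struct that the
 * macro reads. This is an illustration, not the real redisServer. */
struct fake_server {
    int hz;        /* serverCron calls per second */
    int cronloops; /* number of serverCron calls so far */
};
static struct fake_server server;

/* Same macro as in the excerpt above. */
#define run_with_period(_ms_) \
    if ((_ms_ <= 1000/server.hz) || !(server.cronloops%((_ms_)/(1000/server.hz))))

/* Hypothetical helper: simulate `loops` serverCron calls at frequency `hz`
 * and return how many times a task guarded by run_with_period(ms) runs. */
int count_fires(int hz, int ms, int loops) {
    assert(hz > 0 && hz <= 1000 && ms > 0 && loops >= 0);
    int fires = 0;
    server.hz = hz;
    for (server.cronloops = 0; server.cronloops < loops; server.cronloops++) {
        run_with_period(ms) fires++;
    }
    return fires;
}
```

With hz set to 10, each serverCron call covers 100 ms, so the guard passes on every 10th call. In this model, count_fires(10, 1000, 100) simulates ten seconds and runs the task ten times, matching the "called 1 time per second" comment on replicationCron.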

/* Replication cron function, called 1 time per second. */
void replicationCron(void) {
    // Refresh the number of slaves whose lag is <= min-slaves-max-lag
    refreshGoodSlavesCount();
}

/* This function counts the number of slaves with lag <= min-slaves-max-lag.
 * If the option is active, the server will prevent writes if there are not
 * enough connected slaves with the specified lag (or less). */
void refreshGoodSlavesCount(void) {
    listIter li;
    listNode *ln;
    int good = 0;

    // Feature disabled (either option is 0): nothing to do
    if (!server.repl_min_slaves_to_write ||
        !server.repl_min_slaves_max_lag) return;

    listRewind(server.slaves,&li);
    // Iterate over all slave clients
    while((ln = listNext(&li))) {
        client *slave = ln->value;
        // Lag: seconds since the last ack received from this slave
        time_t lag = server.unixtime - slave->repl_ack_time;

        // Count online slaves whose lag is within the limit
        if (slave->replstate == SLAVE_STATE_ONLINE &&
            lag <= server.repl_min_slaves_max_lag) good++;
    }
    server.repl_good_slaves_count = good;
}
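
The counting logic can be tried outside Redis with a simplified model. The sketch below assumes a plain array instead of Redis's linked list of clients. The names struct replica, enum replica_state and count_good_replicas are made up for illustration, and only the comparison itself follows the excerpt above.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Simplified stand-in for the replication state kept per slave client. */
enum replica_state { REPLICA_WAIT_BGSAVE, REPLICA_ONLINE };

struct replica {
    enum replica_state state;
    time_t last_ack_time; /* unix time of the last ack from this replica */
};

/* Count replicas that are online and whose lag (now - last_ack_time)
 * is <= max_lag seconds, like the loop in refreshGoodSlavesCount. */
int count_good_replicas(const struct replica *replicas, size_t n,
                        time_t now, time_t max_lag) {
    assert(replicas != NULL || n == 0);
    int good = 0;
    for (size_t i = 0; i < n; i++) {
        time_t lag = now - replicas[i].last_ack_time;
        if (replicas[i].state == REPLICA_ONLINE && lag <= max_lag) good++;
    }
    return good;
}
```

A replica that is still waiting for its initial sync is never counted, however recent its last ack is. An online replica whose lag equals the limit exactly still counts, because the comparison is <=.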

References

  1. Analysis of Redis split-brain and other extreme cases (redis 腦裂等極端情況分析)
  2. Source code comments in Redis 3.2.8