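Split-Brain
When a network problem cuts cluster nodes off from one another, the master and its replicas stop synchronizing. A re-election then takes place and produces a second master. With two masters running and accepting writes at the same time, the data become inconsistent.
Solution
A relatively simple approach is to configure Redis: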
# The master must have at least 3 connected replicas
min-slaves-to-write 3
# Replication and synchronization lag must not exceed 10 seconds
min-slaves-max-lag 10
Relevant explanation from redis.conf
# It is possible for a master to stop accepting writes if there are less than
# N slaves connected, having a lag less or equal than M seconds.
#
# The N slaves need to be in "online" state.
#
# The lag in seconds, that must be <= the specified value, is calculated from
# the last ping received from the slave, that is usually sent every second.
#
# This option does not GUARANTEE that N replicas will accept the write, but
# will limit the window of exposure for lost writes in case not enough slaves
# are available, to the specified number of seconds.
#
# For example to require at least 3 slaves with a lag <= 10 seconds use:
#
# min-slaves-to-write 3
# min-slaves-max-lag 10
#
# Setting one or the other to 0 disables the feature.
#
# By default min-slaves-to-write is set to 0 (feature disabled) and
# min-slaves-max-lag is set to 10.
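The rule described in these comments can be sketched as a small predicate. This is only an illustration of the documented semantics, not Redis code: the `replica_view` type and the `master_accepts_writes` function are hypothetical names introduced here, and the lag is assumed to be already known for each replica.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one replica as seen by the master. */
typedef struct {
    int online;        /* 1 if the replica is in the "online" state */
    long lag_seconds;  /* seconds since the master last heard from it */
} replica_view;

/* Returns 1 if the master may accept writes, 0 otherwise.
 * min_to_write and max_lag mirror min-slaves-to-write and
 * min-slaves-max-lag; setting either one to 0 disables the check. */
int master_accepts_writes(const replica_view *replicas, size_t n,
                          int min_to_write, long max_lag) {
    assert(replicas != NULL || n == 0);
    if (min_to_write == 0 || max_lag == 0) return 1;  /* feature disabled */

    int good = 0;
    for (size_t i = 0; i < n; i++) {
        /* A replica counts only if it is online AND recent enough. */
        if (replicas[i].online && replicas[i].lag_seconds <= max_lag) good++;
    }
    return good >= min_to_write;
}
```

For example, with four replicas of which one is offline and one lags by 11 seconds, only two count as good, so a setting of min-slaves-to-write 3 would make the master refuse writes while a setting of 2 would not.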
Redis (3.2.8) implementation flow (abridged)
#define run_with_period(_ms_) if ((_ms_ <= 1000/server.hz) || !(server.cronloops%((_ms_)/(1000/server.hz))))

int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    run_with_period(1000) replicationCron();
}

/* Replication cron function, called 1 time per second. */
void replicationCron(void) {
    // Refresh the count of slaves whose lag is <= min-slaves-max-lag
    refreshGoodSlavesCount();
}

/* This function counts the number of slaves with lag <= min-slaves-max-lag.
 * If the option is active, the server will prevent writes if there are not
 * enough connected slaves with the specified lag (or less). */
void refreshGoodSlavesCount(void) {
    listIter li;
    listNode *ln;
    int good = 0;

    // Nothing to do if either limit is unset (feature disabled)
    if (!server.repl_min_slaves_to_write ||
        !server.repl_min_slaves_max_lag) return;

    listRewind(server.slaves,&li);
    // Iterate over all slave clients
    while((ln = listNext(&li))) {
        client *slave = ln->value;
        // Compute the lag: seconds since the last ack from this slave
        time_t lag = server.unixtime - slave->repl_ack_time;

        // Count slaves that are online and within the lag limit
        if (slave->replstate == SLAVE_STATE_ONLINE &&
            lag <= server.repl_min_slaves_max_lag) good++;
    }
    server.repl_good_slaves_count = good;
}
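
To see how the listing above bounds the window of lost writes, here is a minimal simulation sketch. It is not Redis code: `sim_server`, `period_due`, `refresh_good` and `seconds_until_writes_blocked` are hypothetical names, all slaves are assumed to be online, and the write check is an assumption modeled on the comment that the server prevents writes when too few good slaves remain. All slaves send their last ack at second 0 and then go silent, as they would after a partition.

```c
#include <assert.h>

#define SIM_SLAVES 3

/* Hypothetical stand-in for the few server fields used above;
 * this is not the real Redis server struct. */
typedef struct {
    int hz;                   /* cron frequency, ticks per second */
    long cronloops;           /* cron ticks executed so far */
    long unixtime;            /* current time in seconds */
    int min_slaves_to_write;
    long min_slaves_max_lag;
    int good_slaves_count;
} sim_server;

/* Same arithmetic as run_with_period: nonzero on ticks where a task
 * with the given period (in milliseconds) is due. */
static int period_due(const sim_server *s, int ms) {
    int tick_ms = 1000 / s->hz;
    return ms <= tick_ms || s->cronloops % (ms / tick_ms) == 0;
}

/* Mirrors refreshGoodSlavesCount for slaves that are all online. */
static void refresh_good(sim_server *s, const long last_ack[SIM_SLAVES]) {
    if (!s->min_slaves_to_write || !s->min_slaves_max_lag) return;
    int good = 0;
    for (int i = 0; i < SIM_SLAVES; i++) {
        long lag = s->unixtime - last_ack[i];
        if (lag <= s->min_slaves_max_lag) good++;
    }
    s->good_slaves_count = good;
}

/* Returns the second at which the master first refuses writes,
 * or -1 if that never happens within horizon_seconds. */
long seconds_until_writes_blocked(int hz, int min_to_write, long max_lag,
                                  long horizon_seconds) {
    assert(hz >= 1 && hz <= 1000);
    sim_server s = {hz, 0, 0, min_to_write, max_lag, 0};
    long last_ack[SIM_SLAVES] = {0, 0, 0};  /* last acks at t=0, then silence */

    for (long tick = 0; tick <= horizon_seconds * hz; tick++) {
        s.unixtime = tick / hz;
        if (period_due(&s, 1000)) refresh_good(&s, last_ack);
        s.cronloops++;
        /* Assumed write gate: refuse writes when too few good slaves. */
        if (s.min_slaves_to_write && s.min_slaves_max_lag &&
            s.good_slaves_count < s.min_slaves_to_write)
            return s.unixtime;
    }
    return -1;
}
```

Under these assumptions, with hz set to 10, min-slaves-to-write 3 and min-slaves-max-lag 10, the simulated master keeps accepting writes while the lag is at most 10 seconds and refuses them from second 11 onward. An isolated old master can therefore lose only the writes it accepted during that bounded window.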