Version: redis-5.0.5

Reference: http://redis.io/topics/cluster-tutorial

Interactive command-line tool for cluster deployment: https://github.com/eyjian/redis-tools/tree/master/deploy

Command-line tools for cluster operations: https://github.com/eyjian/redis-tools/tree/master

Batch operation tools: https://github.com/eyjian/libmooon/releases

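1. Preface

Redis 5.0.5 was released on May 15, 2019 and fixes a number of bugs found in 5.0.4. This document builds on two earlier write-ups, "Redis-3.0.5 Cluster Configuration" and "Redis-4.0.11 Cluster Configuration".

Cluster support arrived in redis-3.0.0, module support in redis-4.0.0, and redis-5.0.0 added Streams, a message queue data type similar to Kafka.

This document follows the official tutorial at http://redis.io/topics/cluster-tutorial. It does not apply to versions below redis-5.0.0, because from redis-5.0.0 onward the functionality of redis-trib.rb has been taken over by redis-cli.

For installing and deploying versions below redis-5.0.0, see https://blog.csdn.net/Aquester/article/details/50150163.

Redis operations and deployment tools: https://github.com/eyjian/redis-tools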

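2. Glossary

Term	Explanation
ASAP	As Soon As Possible
RESP	Redis Serialization Protocol, the wire protocol used by Redis
replica	Since 5.0 the former "slave" is called "replica", and the related configuration parameters were renamed accordingly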

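3. Deployment plan

A Redis cluster created with one replica per master needs at least three masters and three replicas, six nodes in total. In a test environment all six nodes can run on a single physical machine, but a production environment should use at least three physical machines.

Port	IP address	Configuration file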
6381	192.168.0.251	redis-6381.conf
6382	192.168.0.251	redis-6382.conf
6383	192.168.0.251	redis-6383.conf
6384	192.168.0.251	redis-6384.conf
6385	192.168.0.251	redis-6385.conf
6386	192.168.0.251	redis-6386.conf

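Open question: with three physical machines, can a master and its replica end up on the same machine? When it creates the cluster, redis-cli tries to spread them out (see the "Trying to optimize slaves allocation for anti-affinity" line in the creation output in section 9.1) and prints a warning when it cannot, so check the proposed layout before accepting it.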

4. Adjusting system parameters

4.1. Maximum number of open files

Edit /etc/security/limits.conf and add the following two lines:

* soft nofile 102400

* hard nofile 102400

# End of file

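Here 102400 is the maximum number of files a single process may open. When the Redis server handles many connections, set it to a suitably large value.

On some systems the change only takes effect for root after a reboot, while ordinary users pick it up at the next login. Processes started from crontab only see the new limit after the cron service is restarted, for example with "service crond restart" (on some platforms "service cron restart"), much like restarting the system log service with "service rsyslog restart" or "systemctl restart rsyslog".

On some systems the following settings let root pick up the change at the next login, without a reboot: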

root soft nofile 102400

root hard nofile 102400

# End of file

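Be careful, though: on some systems this can make ssh logins fail, so keep a second terminal session open while making the change so that you can recover if a new login is refused.

To confirm that the change applies to a given process, inspect its limits ($PID is the ID of the process being checked):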

$ cat /proc/$PID/limits
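
The check can be wrapped in a small helper. The following is a minimal sketch that assumes a Linux /proc filesystem; the function name soft_nofile is made up for illustration and is not part of any tool mentioned here.

```shell
#!/bin/sh
# soft_nofile PID: print the soft "Max open files" limit of a running process.
# Hypothetical helper; it relies on the Linux-specific /proc/PID/limits file.
soft_nofile() {
    # Line layout: "Max open files" <soft> <hard> <units>, so the soft limit is field 4.
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Example: the limit of the current shell.
soft_nofile $$
```

For a redis-server process, pass its PID instead of $$ and compare the result with the value configured in limits.conf.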

The comments at the top of /etc/security/limits.conf explain its scope:

#This file sets the resource limits for the users logged in via PAM.

#It does not affect resource limits of the system services.

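PAM stands for "Pluggable Authentication Modules". /etc/security/limits.conf is the configuration file of pam_limits.so (located at /lib/security/pam_limits.so) and applies per session. For limits.conf to take effect, pam_limits.so must be enabled in the PAM configuration of the service that opens the session.

As the comments say, the limits only apply to users who log in through PAM. The related PAM files, all under /etc/pam.d, are: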

/etc/pam.d/login

/etc/pam.d/sshd

/etc/pam.d/crond

To set the password policy for Linux users, edit /etc/login.defs. It only affects users created afterwards; to change the policy for existing users, use the chage command.

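4.2. TCP listen queue size

This is the backlog of TCP listen(). The default in /proc/sys/net/core/somaxconn is usually small, for example 128, and should be raised, for example to 32767. To apply the change immediately, run: sysctl -w net.core.somaxconn=32767.

To make it permanent, add the line "net.core.somaxconn = 32767" to /etc/sysctl.conf and run "sysctl -p" to load it.

The Redis setting tcp-backlog must not exceed somaxconn; the kernel silently caps a larger backlog at the somaxconn value.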
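
To illustrate the relationship between the two values, here is a minimal sketch. The function name effective_backlog is hypothetical; it only models the rule that a listen backlog larger than somaxconn is truncated to somaxconn.

```shell
#!/bin/sh
# effective_backlog TCP_BACKLOG SOMAXCONN: print the backlog the kernel will actually use.
# Hypothetical helper for illustration only.
effective_backlog() {
    if [ "$1" -gt "$2" ]; then
        echo "$2"    # requested backlog is capped at somaxconn
    else
        echo "$1"
    fi
}

effective_backlog 32767 128    # prints 128
```

On a real host the second argument would come from "$(cat /proc/sys/net/core/somaxconn)"; if the result is smaller than the configured tcp-backlog, raise somaxconn first.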

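4.3. Memory overcommit: vm.overcommit_memory

When /proc/sys/vm/overcommit_memory is 0, the kernel uses heuristic overcommit and may refuse a memory allocation it considers too large. For Redis this can make the fork() used for background saving (RDB snapshots and AOF rewrites) fail when memory is tight. Setting the value to 1 tells the kernel to always allow the allocation, which is the setting Redis recommends. Set it the same way as net.core.somaxconn above, using the key vm.overcommit_memory.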

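4.4. /sys/kernel/mm/transparent_hugepage/enabled

The default is "[always] madvise never". Setting it to never is recommended, which disables the kernel's "Transparent Huge Pages (THP)" feature; the redis processes must be restarted afterwards. To make the setting permanent, add "echo never > /sys/kernel/mm/transparent_hugepage/enabled" to /etc/rc.local.

What are Transparent Huge Pages? To improve performance, the kernel backs memory with large pages instead of the traditional 4 KB pages, so there are fewer virtual pages to manage, the translation from virtual to physical addresses gets faster, and paging overhead drops. With THP the kernel uses 2 MB pages, and the kernel thread that assembles them is khugepaged. For Redis, however, THP tends to add latency and memory overhead when the process forks for persistence, which is why it is turned off here.

Linux offers two ways to use huge pages: HugeTLBFS, introduced in the 2.6 kernel series, and THP, introduced in kernel 2.6.38. HugeTLBFS is used mainly by databases, while THP applies transparently to ordinary applications.

Huge pages are usually configured in rc.local or in /etc/default/grub.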

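5. Directory layout

redis.conf is the configuration file downloaded from https://raw.githubusercontent.com/antirez/redis/5.0/redis.conf, and the per-port configuration files are derived from it. In practice you only need to prepare the shared redis.conf and one per-port file such as redis-6381.conf; the files for the other ports can then be generated from that finished per-port file.

This document installs redis under /data/redis. Adding its bin directory to the PATH environment variable is recommended to simplify later commands.

If you start from the redis source code, the following layout is recommended for the program files once make has succeeded: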

/data/redis

|-- bin

| |-- redis-benchmark

| |-- redis-check-aof

| |-- redis-check-rdb

| |-- mkreleasehdr.sh

| |-- redis-cli

| |-- redis-sentinel -> redis-server

| `-- redis-server

|-- conf

| |-- redis-6381.conf

| |-- redis-6382.conf

| |-- redis-6383.conf

| |-- redis-6384.conf

| |-- redis-6385.conf

| |-- redis-6386.conf

| `-- redis.conf

`-- log

3 directories, 14 files

Note that redis-check-dump and redis-check-rdb are the same program: it was called redis-check-dump in the redis-3.0 series and was renamed redis-check-rdb starting with redis-3.2.

6. Build and install

Open the redis Makefile (src/Makefile in the source tree) and you will find the following:

PREFIX?=/usr/local

INSTALL_BIN=$(PREFIX)/bin

INSTALL=install

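In a Makefile, "?=" assigns the value only if the variable has not been defined yet; here PREFIX becomes /usr/local unless it was already set, in which case nothing happens.

Unless PREFIX is set in the environment or on the make command line, or the Makefile is edited, the binaries are installed into /usr/local/bin. Rather than using the default, specify an installation directory such as /data/redis-5.0.5: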

$ make

$ make install PREFIX=/data/redis-5.0.5

$ ln -s /data/redis-5.0.5 /data/redis

$ mkdir /data/redis/conf

$ mkdir /data/redis/log

$ mkdir /data/redis/data

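7. Configuring redis

Splitting the configuration into two parts is recommended: a shared part and a per-port part. The shared file can be named redis.conf and the per-port file redis-PORT.conf or redis_PORT.conf; for port 6381 the per-port file is redis-6381.conf. redis-PORT.conf pulls in redis.conf with an include directive, for example: include /data/redis/conf/redis.conf.

Download the configuration file from https://raw.githubusercontent.com/antirez/redis/5.0/redis.conf (or copy the redis.conf shipped in the source package) and apply the changes listed in the table below. In the file name redis-PORT.conf, replace PORT with the actual port number, such as 6381.

An efficient way to prepare several ports is to finish the configuration file for one port and then generate the others by substituting the port number. For example, to derive redis-6382.conf from redis-6381.conf: sed 's/6381/6382/g' redis-6381.conf > redis-6382.conf.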
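
The sed substitution can be applied to all remaining ports in one loop. The following is a minimal sketch; the function name gen_port_configs is made up for illustration. It blindly replaces every occurrence of the template port number, so it assumes that number does not appear in the file with any other meaning.

```shell
#!/bin/sh
# gen_port_configs CONF_DIR TEMPLATE_PORT PORT...
# Derive redis-PORT.conf for each listed port from redis-TEMPLATE_PORT.conf.
# Hypothetical helper; every occurrence of TEMPLATE_PORT in the file is replaced.
gen_port_configs() {
    conf_dir=$1
    template_port=$2
    shift 2
    for port in "$@"; do
        sed "s/$template_port/$port/g" "$conf_dir/redis-$template_port.conf" \
            > "$conf_dir/redis-$port.conf"
    done
}

# Usage with the layout of this document:
#   gen_port_configs /data/redis/conf 6381 6382 6383 6384 6385 6386
```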

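In the table below, items shown in bold in the original formatting are required or recommended changes; adjust the others as needed:

Setting (bold: required or recommended)	Value	Configuration file	Description
include	redis.conf	Per-port file redis-PORT.conf (holds every port-specific setting; replace PORT with the actual port, such as 6381)	Includes the shared configuration file; a full path is recommended
port	PORT		Client port. Each node also listens on a second port, the client port plus 10000, which carries the cluster bus (node-to-node communication).
cluster-config-file	nodes-PORT.conf		Stored in the directory given by dir; must be a plain file name without a directory. The file is maintained by the redis-server process and must not be edited by hand.
pidfile	/var/run/redis-PORT.pid		Only meaningful, and only created, when daemonize is yes. It requires write permission on /var/run; otherwise consider /tmp/redis-PORT.pid or a path under the bin or log directory, such as /data/redis/log/redis-PORT.pid.
dir	/data/redis/data/PORT
dbfilename	dump-PORT.rdb		Plain file name inside the dir directory; a path is rejected with the error "dbfilename can't be a path, just a filename". When AOF is enabled, Redis loads the AOF at startup and does not read the RDB file, so RDB can be turned off, which may improve stability.
appendfilename	"appendonly-PORT.aof"		Plain file name inside the dir directory; a path is rejected with the error "appendfilename can't be a path, just a filename".
logfile	/data/redis/log/redis-PORT.log		Log file, including directory and file name. Redis does not rotate its log file.
cluster-enabled	yes	redis.conf (shared file, holds every port-independent setting)	yes runs the node in cluster mode, no in standalone mode
loglevel	verbose		Log level; notice is recommended for production. Redis does not rotate the log; every write opens the log file, appends, and closes it.
maxclients	10000		Maximum number of client connections
timeout	0		Close a client connection after this many idle seconds; 0 never closes it
cluster-node-timeout	15000		Maximum time in milliseconds a node may be unreachable before it is considered failed. When a master has been unreachable for longer than this, its replica starts an election after a delay of at least 0.5 seconds and fails over to become master. Many other cluster timings are derived from cluster-node-timeout.
cluster-slave-validity-factor (since 5.0: cluster-replica-validity-factor)	0		With 0, a replica always tries to take over, no matter how long its link to the master has been down. With a value above 0, the maximum tolerated disconnection is roughly (cluster-slave-validity-factor * cluster-node-timeout), plus the replica ping period. For example, with cluster-node-timeout at 5 seconds and a factor of 10, a replica that has been cut off from its master for longer than about 50 seconds will not try to become master.
repl-timeout	10		Must not be smaller than repl-ping-replica-period; three times that value or more is a reasonable choice. Replication is considered timed out when no ping gets through for this many seconds. In a cluster, reaching this timeout does not trigger a failover: failover timing is governed by cluster-node-timeout, and replicas can only start an election once their master is in the fail state.
repl-ping-slave-period (since 5.0: repl-ping-replica-period)	1		Interval in seconds of the replication ping between replica and master; if no response arrives for longer than repl-timeout, the master is considered down
slave-read-only (since 5.0: replica-read-only)	yes		Whether replicas are read-only
slave-serve-stale-data (since 5.0: replica-serve-stale-data)	yes		Whether a replica keeps serving requests after losing its connection to the master
slave-priority (since 5.0: replica-priority)	100		Replica priority, used when choosing which replica to promote after the master fails. According to the comments in redis.conf (written with Sentinel in mind), a lower number is preferred and 0 means the replica is never promoted.
aof-use-rdb-preamble			Added in 4.0; controls the mixed RDB-AOF format. no turns it off.
appendonly	yes		When both AOF and RDB are written, redis loads only the AOF at startup, since it contains the full data set. When redis is used as a queue under heavy enqueue load, no is recommended.
appendfsync	no		Possible values are always, everysec and no; no leaves flushing to the operating system. Under heavy write load, no is recommended, otherwise the whole cluster can easily become unavailable.
daemonize	yes		See also pidfile
protected-mode	no		Added in 3.2.0, default yes. When no bind address and no password are configured, it restricts access to 127.0.0.1, so other machines cannot log in to the Redis server.
tcp-backlog	32767		Must not exceed /proc/sys/net/core/somaxconn
auto-aof-rewrite-percentage	100		Growth percentage that triggers an automatic AOF rewrite (a manual rewrite only needs the BGREWRITEAOF command)
auto-aof-rewrite-min-size	64mb		Minimum AOF size for an automatic rewrite; smaller files are never rewritten automatically
no-appendfsync-on-rewrite	yes		While a child process is rewriting, the main process does not call fsync (flushing is left to the kernel)
stop-writes-on-bgsave-error	yes		Stop accepting writes when saving the RDB fails, for example because of a disk fault; can be set to no.
cluster-require-full-coverage	no		no keeps the remaining slots in service when some slots are unavailable; no is recommended for the highest availability
maxmemory	26843545600		Maximum memory in bytes (25 GiB here)
maxmemory-policy	volatile-lru		Eviction policy once maxmemory is reached
client-output-buffer-limit			Output buffer limits for clients on the master side, in three classes: normal, slave and pubsub
cluster-migration-barrier	1		Minimum number of replicas a master keeps before one of its replicas may migrate away. It protects the cluster from masters without any replica: when a master loses its last replica, a replica migrates over from a master that has spare ones, so that every master keeps at least one replica and the failure of a master does not leave its slots without a successor.
repl-backlog-size	1mb		Size of the circular replication backlog used while a replica is disconnected; the larger it is, the longer a disconnection can be bridged
repl-backlog-ttl			Once the replicas have been disconnected for this long, the backlog buffer is freed
save	save 900 1 save 300 10 save 60 10000		Policy for writing RDB snapshots to disk; tune to your workload. "save 900 1" triggers a save after 900 seconds if at least 1 key changed, and so on. Note that FLUSHALL also produces an RDB file, albeit an empty one. To avoid RDB files altogether, comment out all save lines.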
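
As a worked example for the validity factor row, the sketch below computes the longest disconnection after which a replica is still allowed to start a failover. It assumes the formula given in the comments of the stock redis.conf, (node-timeout * validity-factor) + repl-ping-replica-period; the function name max_replica_lag is hypothetical.

```shell
#!/bin/sh
# max_replica_lag NODE_TIMEOUT_MS FACTOR PING_PERIOD_S
# Print, in seconds, how long a replica may have been cut off from its master
# and still be allowed to start a failover. A factor of 0 disables the check.
# Hypothetical helper; the formula is taken from the redis.conf comments.
max_replica_lag() {
    if [ "$2" -eq 0 ]; then
        echo "unlimited"
    else
        echo $(( $1 / 1000 * $2 + $3 ))
    fi
}

max_replica_lag 30000 10 10    # prints 310
```

With the values recommended in the table (factor 0) the function prints "unlimited", meaning a replica always attempts the failover.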

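8. Starting the redis instances

Before starting, create every directory referenced in the configuration. Then start all redis instances. For the six nodes defined in this document (using full paths is a good, consistent habit):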

/data/redis/bin/redis-server /data/redis/conf/redis-6381.conf

/data/redis/bin/redis-server /data/redis/conf/redis-6382.conf

/data/redis/bin/redis-server /data/redis/conf/redis-6383.conf

/data/redis/bin/redis-server /data/redis/conf/redis-6384.conf

/data/redis/bin/redis-server /data/redis/conf/redis-6385.conf

/data/redis/bin/redis-server /data/redis/conf/redis-6386.conf

You can wrap this in a start script, start-redis-cluster.sh:

#!/bin/sh

# Start one redis-server per port used in this document.

REDIS_HOME=/data/redis

for port in 6381 6382 6383 6384 6385 6386; do

    $REDIS_HOME/bin/redis-server $REDIS_HOME/conf/redis-$port.conf

done

Process monitoring is normally added as well. process_monitor.sh can be used directly (download: https://github.com/eyjian/libmooon/blob/master/shell/process_monitor.sh); an example for crontab:

REDIS_HOME=/data/redis

* * * * * /usr/local/bin/process_monitor.sh "$REDIS_HOME/bin/redis-server 6381" "$REDIS_HOME/bin/redis-server $REDIS_HOME/conf/redis-6381.conf"

* * * * * log=$REDIS_HOME/log/redis-6381.log;if test `wc -c <$log` -gt 104857600; then mv $log $log.old; fi

Note: redis does not rotate its log file. Each time redis-server writes a log entry it opens the file in append mode with fopen, so it keeps working after the file has been moved away. Alternatively, use the logrotate tool that ships with Linux, usually found in /usr/sbin.

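9. Creating and starting the redis cluster

If you only want a cluster up quickly and do not care about the details, the create-cluster script that ships with redis does it in two steps: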

create-cluster start

create-cluster create

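The second step, "create-cluster create", is interactive: type "yes" and press Enter when prompted. The first node listens on port 30001, and six redis nodes are started in total.

Where is create-cluster? It is a bash script in the utils/create-cluster directory of the redis source tree. To stop the cluster: create-cluster stop.

For learning and for running production clusters, however, working through the steps below is recommended, to build a deeper understanding of redis cluster and better control over it:

9.1. Creating the redis cluster

The command below creates a cluster with three masters and three replicas, one replica per master. The "--cluster" option is only supported from redis-5.0.0; earlier versions fail with "Unrecognized option or bad number of args for: '--cluster'":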

redis-cli --cluster create 192.168.0.251:6381 192.168.0.251:6382 192.168.0.251:6383 192.168.0.251:6384 192.168.0.251:6385 192.168.0.251:6386 --cluster-replicas 1

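If cluster-enabled is not set to yes, the command fails with "[ERR] Node 192.168.0.251:6381 is not configured as a cluster node.". In that case set cluster-enabled to yes, restart the redis-server process, and run redis-cli again to create the cluster.

Parameters of redis-cli:

1) create

Creates a redis cluster.

2) --cluster-replicas 1

Assigns one replica to every master in the cluster, a one-to-one replication ratio.

While it runs, the command asks for confirmation; type yes and press Enter. The output makes it easy to see which nodes are masters and which are replicas (slaves):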

$ ./redis-cli --cluster create 192.168.0.251:6381 192.168.0.251:6382 192.168.0.251:6383 192.168.0.251:6384 192.168.0.251:6385 192.168.0.251:6386 --cluster-replicas 1

>>> Performing hash slots allocation on 6 nodes...

Master[0] -> Slots 0 - 5460

Master[1] -> Slots 5461 - 10922

Master[2] -> Slots 10923 - 16383

Adding replica 192.168.0.251:6384 to 192.168.0.251:6381

Adding replica 192.168.0.251:6385 to 192.168.0.251:6382

Adding replica 192.168.0.251:6386 to 192.168.0.251:6383

>>> Trying to optimize slaves allocation for anti-affinity

[WARNING] Some slaves are in the same host as their master

M: f805e652ff8abe151393430cb3bcbf514b8a7399 192.168.0.251:6381

slots:[0-5460] (5461 slots) master

M: bfad383775421b1090eaa7e0b2dcfb3b38455079 192.168.0.251:6382

slots:[5461-10922] (5462 slots) master

M: 44eb43e50c101c5f44f48295c42dda878b6cb3e9 192.168.0.251:6383

slots:[10923-16383] (5461 slots) master

S: 29fcce29837d3e5266b6178a15aecfa938ff241a 192.168.0.251:6384

replicates bfad383775421b1090eaa7e0b2dcfb3b38455079

S: 0ae8b5400d566907a3d8b425d983ac3b7cbd8412 192.168.0.251:6385

replicates 44eb43e50c101c5f44f48295c42dda878b6cb3e9

S: c67dc9e02e25f2e6321df8ac2eb4d99789917783 192.168.0.251:6386

replicates f805e652ff8abe151393430cb3bcbf514b8a7399

Can I set the above configuration? (type 'yes' to accept): yes

>>> Nodes configuration updated

>>> Assign a different config epoch to each node

>>> Sending CLUSTER MEET messages to join the cluster

Waiting for the cluster to join

...

>>> Performing Cluster Check (using node 192.168.0.251:6381)

M: f805e652ff8abe151393430cb3bcbf514b8a7399 192.168.0.251:6381

slots:[0-5460] (5461 slots) master

1 additional replica(s)

S: c67dc9e02e25f2e6321df8ac2eb4d99789917783 192.168.0.251:6386

slots: (0 slots) slave

replicates f805e652ff8abe151393430cb3bcbf514b8a7399

S: 29fcce29837d3e5266b6178a15aecfa938ff241a 192.168.0.251:6384

slots: (0 slots) slave

replicates bfad383775421b1090eaa7e0b2dcfb3b38455079

M: bfad383775421b1090eaa7e0b2dcfb3b38455079 192.168.0.251:6382

slots:[5461-10922] (5462 slots) master

1 additional replica(s)

S: 0ae8b5400d566907a3d8b425d983ac3b7cbd8412 192.168.0.251:6385

slots: (0 slots) slave

replicates 44eb43e50c101c5f44f48295c42dda878b6cb3e9

M: 44eb43e50c101c5f44f48295c42dda878b6cb3e9 192.168.0.251:6383

slots:[10923-16383] (5461 slots) master

1 additional replica(s)

[OK] All nodes agree about slots configuration.

>>> Check for open slots...

>>> Check slots coverage...

[OK] All 16384 slots covered.

9.2. ps aux|grep redis

Check that the redis processes are now running in cluster mode ([cluster]):

[test@test-168-251 ~]$ ps aux|grep redis-server

redis 3824 0.7 5.9 6742404 3885144 ? Ssl 2018 1639:13 /data/redis/bin/redis-server *:6381 [cluster]

redis 3825 0.5 3.9 6709636 2618536 ? Ssl 2018 1235:43 /data/redis/bin/redis-server *:6382 [cluster]

redis 3826 0.5 3.9 6709636 2618536 ? Ssl 2018 1235:43 /data/redis/bin/redis-server *:6383 [cluster]

redis 3827 0.5 3.9 6709636 2618536 ? Ssl 2018 1235:43 /data/redis/bin/redis-server *:6384 [cluster]

redis 3828 0.5 3.9 6709636 2618536 ? Ssl 2018 1235:43 /data/redis/bin/redis-server *:6385 [cluster]

redis 3829 0.5 3.9 6709636 2618536 ? Ssl 2018 1235:43 /data/redis/bin/redis-server *:6386 [cluster]

To stop a redis instance, simply use kill, for example: kill 3825. Restarting works the same way as for a standalone instance.

10. redis cluster client

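10.1. The redis-cli command-line tool

The official command-line client works with a cluster once the "-c" option is added to the usual standalone invocation. The session below was recorded while running redis-cli on 192.168.0.251: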

$ ./redis-cli -c -p 6379

127.0.0.1:6379> set foo bar

-> Redirected to slot [12182] located at 192.168.0.251:6379

192.168.0.251:6379> set hello world

-> Redirected to slot [866] located at 192.168.0.251:6379

192.168.0.251:6379> get foo

-> Redirected to slot [12182] located at 192.168.0.251:6379

"bar"

192.168.0.251:6379> get hello

-> Redirected to slot [866] located at 192.168.0.251:6379

"world"

List the nodes in the cluster:

192.168.0.251:6379> cluster nodes
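
The slot numbers in the "Redirected to slot" lines come from hashing the key: Redis Cluster maps a key to CRC16(key) mod 16384, using the XMODEM variant of CRC16. The following is a minimal sketch of that calculation in plain POSIX shell. The function name keyslot is made up, the helper only handles single-byte (ASCII) keys, and it ignores the {hash tag} rule, so it is an illustration rather than a full reimplementation.

```shell
#!/bin/sh
# keyslot KEY: print the cluster hash slot of an ASCII key (CRC16-XMODEM mod 16384).
# Hypothetical helper; hash tags ({...}) and multi-byte characters are not handled.
keyslot() {
    rest=$1
    crc=0
    while [ -n "$rest" ]; do
        tail=${rest#?}            # key without its first character
        ch=${rest%"$tail"}        # the first character itself
        rest=$tail
        code=$(printf '%d' "'$ch")
        crc=$(( crc ^ (code << 8) ))
        i=0
        while [ "$i" -lt 8 ]; do
            if [ $(( crc & 0x8000 )) -ne 0 ]; then
                crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
            else
                crc=$(( (crc << 1) & 0xFFFF ))
            fi
            i=$(( i + 1 ))
        done
    done
    echo $(( crc % 16384 ))
}

keyslot foo      # prints 12182
```

The results match the session above: foo lands in slot 12182 and hello in slot 866. On a live cluster the same answer is available from the server with the CLUSTER KEYSLOT command.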

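10.2. Reading from replicas

By default a cluster replica does not serve reads. After connecting to a replica, send the READONLY command once and reads are then served from it. To switch the connection back to the default behaviour, send READWRITE.

10.3. jedis (Java cluster client)

Project page: https://github.com/xetorthio/jedis. Programming example: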

Set<HostAndPort> jedisClusterNodes = new HashSet<HostAndPort>();

//Jedis Cluster will attempt to discover cluster nodes automatically

jedisClusterNodes.add(new HostAndPort("127.0.0.1", 7379));

JedisCluster jc = new JedisCluster(jedisClusterNodes);

jc.set("foo", "bar");

String value = jc.get("foo");

10.4. r3c (C++ cluster client)

Project page: https://github.com/eyjian/r3c

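11. Adding nodes

11.1. Adding a new master node

Suppose the new node is "192.168.0.251:6390". First configure and start it on port 6390 as an empty node with cluster-enabled set to yes, then run the following command ("192.168.0.251:6381" can be any reachable node of the cluster):

redis-cli --cluster add-node 192.168.0.251:6390 192.168.0.251:6381

If all goes well, the output looks like this: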

$ ./redis-cli --cluster add-node 192.168.0.251:6390 192.168.0.251:6381

>>> Adding node 192.168.0.251:6390 to cluster 192.168.0.251:6381

>>> Performing Cluster Check (using node 192.168.0.251:6381)

M: f805e652ff8abe151393430cb3bcbf514b8a7399 192.168.0.251:6381

slots:[0-5460] (5461 slots) master

1 additional replica(s)

S: c67dc9e02e25f2e6321df8ac2eb4d99789917783 192.168.0.251:6386

slots: (0 slots) slave

replicates f805e652ff8abe151393430cb3bcbf514b8a7399

S: 29fcce29837d3e5266b6178a15aecfa938ff241a 192.168.0.251:6384

slots: (0 slots) slave

replicates bfad383775421b1090eaa7e0b2dcfb3b38455079

M: bfad383775421b1090eaa7e0b2dcfb3b38455079 192.168.0.251:6382

slots:[5461-10922] (5462 slots) master

1 additional replica(s)

S: 0ae8b5400d566907a3d8b425d983ac3b7cbd8412 192.168.0.251:6385

slots: (0 slots) slave

replicates 44eb43e50c101c5f44f48295c42dda878b6cb3e9

M: 44eb43e50c101c5f44f48295c42dda878b6cb3e9 192.168.0.251:6383

slots:[10923-16383] (5461 slots) master

1 additional replica(s)

[OK] All nodes agree about slots configuration.

>>> Check for open slots...

>>> Check slots coverage...

[OK] All 16384 slots covered.

>>> Send CLUSTER MEET to node 192.168.0.251:6390 to make it join the cluster.

[OK] New node added correctly.

The cluster before "add-node":

$ redis-cli -c -p 6381 cluster nodes|grep master

bfad383775421b1090eaa7e0b2dcfb3b38455079 192.168.0.251:6382@16382 master - 0 1540549992591 2 connected 5461-10922

f805e652ff8abe151393430cb3bcbf514b8a7399 192.168.0.251:6381@16381 myself,master - 0 1540549993000 1 connected 0-5460

44eb43e50c101c5f44f48295c42dda878b6cb3e9 192.168.0.251:6383@16383 master - 0 1540549994593 3 connected 10923-16383

$ redis-cli -c -p 6381 cluster nodes|grep slave

c67dc9e02e25f2e6321df8ac2eb4d99789917783 192.168.0.251:6386@16386 slave f805e652ff8abe151393430cb3bcbf514b8a7399 0 1540549996595 6 connected

29fcce29837d3e5266b6178a15aecfa938ff241a 192.168.0.251:6384@16384 slave bfad383775421b1090eaa7e0b2dcfb3b38455079 0 1540549995595 4 connected

0ae8b5400d566907a3d8b425d983ac3b7cbd8412 192.168.0.251:6385@16385 slave 44eb43e50c101c5f44f48295c42dda878b6cb3e9 0 1540549996000 5 connected

The cluster after "add-node" (the new master 192.168.0.251:6390 does not own any slots yet):

$ redis-cli -c -p 6381 cluster nodes|grep master

082c079149a9915612d21cca8e08c831a4edeade 192.168.0.251:6390@16390 master - 0 1540550992379 0 connected

bfad383775421b1090eaa7e0b2dcfb3b38455079 192.168.0.251:6382@16382 master - 0 1540550991000 2 connected 5461-10922

f805e652ff8abe151393430cb3bcbf514b8a7399 192.168.0.251:6381@16381 myself,master - 0 1540550990000 1 connected 0-5460

44eb43e50c101c5f44f48295c42dda878b6cb3e9 192.168.0.251:6383@16383 master - 0 1540550991377 3 connected 10923-16383

$ redis-cli -c -p 6381 cluster nodes|grep slave

c67dc9e02e25f2e6321df8ac2eb4d99789917783 192.168.0.251:6386@16386 slave f805e652ff8abe151393430cb3bcbf514b8a7399 0 1540550996000 6 connected

29fcce29837d3e5266b6178a15aecfa938ff241a 192.168.0.251:6384@16384 slave bfad383775421b1090eaa7e0b2dcfb3b38455079 0 1540550994383 4 connected

0ae8b5400d566907a3d8b425d983ac3b7cbd8412 192.168.0.251:6385@16385 slave 44eb43e50c101c5f44f48295c42dda878b6cb3e9 0 1540550998388 5 connected

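If you get the error "[ERR] Node 192.168.0.251:4077 is not configured as a cluster node.", the new node's cluster-enabled setting is not "yes". Set cluster-enabled to "yes", restart that node, and run "add-node" again.

You may also run into "[ERR] Sorry, can't connect to node 127.0.0.1:6390". Redis 3.2.0 introduced protected mode, which refuses connections from other hosts and only accepts clients that connect through 127.0.0.1.

To finish adding the master, protected mode can be switched off temporarily. Start redis-cli without -h (or with -h 127.0.0.1; -p may still be given) and run: CONFIG SET protected-mode no.

Note: 6390 is the new node and 6381 an existing one (master or replica). To turn 6390 into a replica of some master (say 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e), run the following redis command against 6390. The precondition is that 6390 owns no slots, that is, it must be an empty master:

redis-cli -h 192.168.0.251 -p 6390 cluster replicate 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e

A newly added master holds no data because it owns no slots, as the output of cluster nodes shows. Since it manages no slots, it also does not take part in the vote when a replica asks to be promoted to master. Use the reshard subcommand of redis-cli to assign slots to the new master, for example: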

redis-cli --cluster reshard 192.168.0.251:6390
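
To spot masters that still own no slots, the output of cluster nodes can be filtered. The following is a minimal sketch; the function name slotless_masters is hypothetical, and it relies on the line format shown in the listings above, where a master line without slots has exactly eight fields.

```shell
#!/bin/sh
# slotless_masters: read "cluster nodes" output on stdin and print the
# address of every master that owns no slots. Hypothetical helper.
slotless_masters() {
    # Fields: id, addr, flags, master-id, ping, pong, epoch, link-state, [slots...]
    awk '$3 ~ /master/ && NF == 8 { print $2 }'
}

# Usage against a live node:
#   redis-cli -p 6381 cluster nodes | slotless_masters
```

Right after add-node this should list the new node; once reshard has moved slots to it, the output should be empty.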

11.2. Adding a new replica node

Taking "192.168.0.251:6390" as the example again:

redis-cli --cluster add-node 192.168.0.251:6390 192.168.0.251:6381 --cluster-slave

"192.168.0.251:6390" is the new replica and "192.168.0.251:6381" can be any existing node of the cluster. With this form redis-cli picks the master for 6390 itself. To name the master explicitly, assuming its ID is "3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e":

redis-cli --cluster add-node 192.168.0.251:6390 192.168.0.251:6381 --cluster-slave --cluster-master-id 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e

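12. Removing nodes

Command format for removing a node from the cluster:

redis-cli --cluster del-node 127.0.0.1:7000 <node-id>

"127.0.0.1:7000" is any node of the cluster other than the one being removed, and "node-id" is the ID of the node to remove. If the node is a master, all of its slots must first be moved to other masters. For example: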

$ ./redis-cli --cluster del-node 192.168.0.251:6381 082c079149a9915612d21cca8e08c831a4edeade

>>> Removing node 082c079149a9915612d21cca8e08c831a4edeade from cluster 192.168.0.251:6381

>>> Sending CLUSTER FORGET messages to the cluster...

>>> SHUTDOWN the node.

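If other nodes still list the removed node afterwards, use the FORGET command on every node that still sees it:

CLUSTER FORGET <node-id>

FORGET does two things:

1) It removes the node from the node table;

2) For 60 seconds it prevents a node with the same ID from being added again.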

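13. Hardware failure of a master machine

In this case the machine may not come back up, so the nodes on it stay unreachable: a master remains in the "master,fail" state and a replica in the "slave,fail" state.

If the failed node was a master, its replica has already been promoted to master, so it is enough to add a new replica for the promoted node. Once that is done, remove the failed master or replica from the cluster with CLUSTER FORGET.

Important: run "CLUSTER FORGET" once on every node, otherwise the removed node may keep showing up in the handshake state.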
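
Running the command on every node by hand is error-prone, so a loop helps. The following is a minimal sketch; the function name forget_node and the REDIS_CLI override variable are made up for illustration. Because FORGET only bans the node ID for 60 seconds, the loop should reach all remaining nodes within that window.

```shell
#!/bin/sh
# REDIS_CLI can be overridden, for example to point at /data/redis/bin/redis-cli.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# forget_node NODE_ID HOST:PORT...
# Send CLUSTER FORGET NODE_ID to every listed node. Hypothetical helper.
forget_node() {
    node_id=$1
    shift
    for addr in "$@"; do
        host=${addr%:*}
        port=${addr##*:}
        "$REDIS_CLI" -h "$host" -p "$port" cluster forget "$node_id"
    done
}

# Usage: list every node that is still part of the cluster.
#   forget_node <node-id> 192.168.0.251:6381 192.168.0.251:6382 192.168.0.251:6383
```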

14. Checking node status

For example, to check node "192.168.0.251:6381":

redis-cli --cluster check 192.168.0.251:6381

If it reports an error like this:

[WARNING] Node 192.168.0.251:6381 has slots in migrating state (5461).

[WARNING] The following slots are open: 5461

the slot migration can be cancelled with the following redis command (5461 is the slot number):

cluster setslot 5461 stable

Note that the setslot subcommand must be run while connected to 192.168.0.251:6381.

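15. Changing the master of a replica

Run on the replica that is to be moved. Command format:

cluster replicate <master-node-id>

For example, to make "192.168.0.251:6381" replicate the master "3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e":

redis-cli -h 192.168.0.251 -p 6381 cluster replicate 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e

The argument of cluster replicate is the node ID of the master, not its IP and port, and the command is run on the replica being moved.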

16. Slot-related commands

CLUSTER ADDSLOTS slot1 [slot2] ... [slotN]

CLUSTER DELSLOTS slot1 [slot2] ... [slotN]

CLUSTER SETSLOT slot NODE node

CLUSTER SETSLOT slot MIGRATING node

CLUSTER SETSLOT slot IMPORTING node

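17. Migrating slots

Official reference: https://redis.io/commands/cluster-setslot.

Example: moving slot 8 from source node A to target node B. Both nodes have to be prepared:

On target node B, run: CLUSTER SETSLOT 8 IMPORTING src-A-node-id

and

On source node A, run: CLUSTER SETSLOT 8 MIGRATING dst-B-node-id

These commands only mark the slot as being migrated. The keys in the slot then have to be moved from A to B (with MIGRATE), and the migration is completed by running, on the target node and then on the source node:

CLUSTER SETSLOT <slot> NODE <dst-node-id>

Here dst-node-id is the node ID of the target. To cancel a migration, use "CLUSTER SETSLOT <slot> STABLE". Example: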

# Move slot 11677 to 192.168.31.3:6379

$ redis-cli -c -h 192.168.31.3 -p 6379 CLUSTER SETSLOT 11677 IMPORTING 216e0069af11eca91465394b2ad7bf1c27f5f7fe

$ redis-cli -c -h 192.168.31.3 -p 6379 CLUSTER SETSLOT 11677 NODE 4e149c72aff2b6651370ead476dd70c8cf9e3e3c
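
For orientation, the complete sequence for moving one slot can be written out as a dry run. The following is a minimal sketch that only prints the commands and does not contact any server. The function name print_slot_migration, the batch size of 100 and the MIGRATE timeout of 5000 ms are assumptions chosen for illustration, and the "<key>..." placeholder stands for the keys returned by CLUSTER GETKEYSINSLOT.

```shell
#!/bin/sh
# print_slot_migration SLOT SRC_HOST:PORT SRC_ID DST_HOST:PORT DST_ID
# Print the command sequence for moving one slot; nothing is executed.
# Hypothetical helper; batch size and timeout are illustrative values.
print_slot_migration() {
    slot=$1; src=$2; src_id=$3; dst=$4; dst_id=$5
    src_cli="redis-cli -h ${src%:*} -p ${src##*:}"
    dst_cli="redis-cli -h ${dst%:*} -p ${dst##*:}"
    # 1. Mark the slot on both sides.
    printf '%s\n' "$dst_cli cluster setslot $slot importing $src_id"
    printf '%s\n' "$src_cli cluster setslot $slot migrating $dst_id"
    # 2. Repeat until the slot is empty: list a batch of keys, then move them.
    printf '%s\n' "$src_cli cluster getkeysinslot $slot 100"
    printf '%s\n' "$src_cli migrate ${dst%:*} ${dst##*:} \"\" 0 5000 keys <key>..."
    # 3. Assign the slot to the target, on the target first and then on the source.
    printf '%s\n' "$dst_cli cluster setslot $slot node $dst_id"
    printf '%s\n' "$src_cli cluster setslot $slot node $dst_id"
}

print_slot_migration 8 192.168.0.251:6381 src-A-node-id 192.168.0.251:6382 dst-B-node-id
```

In day-to-day operation "redis-cli --cluster reshard" performs these steps automatically; the manual sequence is mainly useful for understanding or repairing a stuck migration.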

18. Manual failover

Run the following command on the replica that should take over:

CLUSTER FAILOVER

A manually started failover makes the other masters log "Failover auth granted to 4291f18b5e9729e832ed15ceb6324ce5dfc2ffbe for epoch 31"; the epoch increases by one each time.

23038:M 06 Sep 20:31:24.815 # Failover auth granted to 4291f18b5e9729e832ed15ceb6324ce5dfc2ffbe for epoch 31

The failover is complete when the following two log lines appear:

23038:M 06 Sep 20:32:44.019 * FAIL message received from ea28f68438e5bb79c26a9cb2135241f11d7a50ba about 5e6ffacb2c5d5761e39aba5270fbf48f296cb5ee

23038:M 06 Sep 20:32:58.487 * Clear FAIL state for node 5e6ffacb2c5d5761e39aba5270fbf48f296cb5ee: slave is reachable again.

Log of the replica that becomes the new master:

Manual failover user request accepted.

Received replication offset for paused master manual failover: 347540

All master replication stream processed, manual failover can start.

Start of election delayed for 0 milliseconds (rank #0, offset 347540).

Starting a failover election for epoch 7545.

Failover election won: I'm the new master.

Log of the former master after it receives the failover:

35475:M 06 Sep 20:35:43.396 - DB 0: 16870482 keys (7931571 volatile) in 50331648 slots HT.

35475:M 06 Sep 20:35:43.396 - 1954 clients connected (1 slaves), 5756515544 bytes in use

35475:M 06 Sep 20:35:48.083 # Manual failover requested by slave 58a40dbe01e1563773724803854406df04c62724.

35475:M 06 Sep 20:35:48.261 # Failover auth granted to 58a40dbe01e1563773724803854406df04c62724 for epoch 32

35475:M 06 Sep 20:35:48.261 - Client closed connection

10.51.147.216:7388 was the replica before the failover,

and its node ID is 58a40dbe01e1563773724803854406df04c62724:

35475:M 06 Sep 20:35:48.261 # Connection with slave 10.51.147.216:7388 lost.

35475:M 06 Sep 20:35:48.278 # Configuration change detected. Reconfiguring myself as a replica of 58a40dbe01e1563773724803854406df04c62724

35475:S 06 Sep 20:35:48.280 - Client closed connection

35475:S 06 Sep 20:35:48.408 - DB 0: 16870296 keys (7931385 volatile) in 50331648 slots HT.

35475:S 06 Sep 20:35:48.408 - 1953 clients connected (0 slaves), 5722753736 bytes in use

35475:S 06 Sep 20:35:48.408 * Connecting to MASTER 10.51.147.216:7388

35475:S 06 Sep 20:35:48.408 * MASTER <-> SLAVE sync started

35475:S 06 Sep 20:35:48.408 * Non blocking connect for SYNC fired the event.

35475:S 06 Sep 20:35:48.408 * Master replied to PING, replication can continue...

35475:S 06 Sep 20:35:48.408 * Partial resynchronization not possible (no cached master)

35475:S 06 Sep 20:35:48.459 * Full resync from master: 36beb63d32b3809039518bf4f3e4e10de227f3ee:16454238619

35475:S 06 Sep 20:35:48.493 - Client closed connection

35475:S 06 Sep 20:35:48.880 - Client closed connection

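19. Viewing cluster information

The corresponding redis command is cluster info. Example, with an explanation in parentheses after each field:

127.0.0.1:6381> cluster info

cluster_state:ok (ok when all slots are being served, otherwise fail)

cluster_slots_assigned:16384 (number of slots assigned to, that is managed by, some master; 16384 means all of them)

cluster_slots_ok:16384 (number of slots in a healthy state)

cluster_slots_pfail:0 (number of slots that are possibly failing; this state alone does not mean trouble and the slots are still served)

cluster_slots_fail:0 (number of slots in a failed state; these need repair before they can be served)

cluster_known_nodes:10 (number of nodes in the cluster)

cluster_size:3 (number of masters in the cluster)

cluster_current_epoch:11 (the node's current epoch, a counter used during failover to produce unique, increasing version numbers)

cluster_my_epoch:0

cluster_stats_messages_sent:4049 (total number of messages sent over the cluster bus)

cluster_stats_messages_received:4051 (total number of messages received over the cluster bus)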
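
For monitoring scripts it is often enough to extract a single field. The following is a minimal sketch; the function name cluster_state is hypothetical. It strips carriage returns first, on the assumption that redis-cli passes the server's CRLF line endings through.

```shell
#!/bin/sh
# cluster_state: read "cluster info" output on stdin and print the value
# of the cluster_state field. Hypothetical helper.
cluster_state() {
    tr -d '\r' | awk -F: '$1 == "cluster_state" { print $2 }'
}

# Usage against a live node:
#   redis-cli -p 6381 cluster info | cluster_state
```

A monitoring job could alert whenever the printed value is anything other than ok.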

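20. Disabling specific commands

KEYS is expensive, and FLUSHDB and FLUSHALL can wipe data by mistake, so they are best disabled in production. Add the following to the Redis configuration file: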

rename-command KEYS ""

rename-command FLUSHDB ""

rename-command FLUSHALL ""

21. Data migration

The command "redis-cli --cluster import" can be used to move data from an existing redis instance into a redis cluster.

22. Configuration files of each version

https://raw.githubusercontent.com/antirez/redis/5.0.5/redis.conf

https://raw.githubusercontent.com/antirez/redis/4.0.11/redis.conf

https://raw.githubusercontent.com/antirez/redis/4.0.9/redis.conf

https://raw.githubusercontent.com/antirez/redis/4.0.5/redis.conf

https://raw.githubusercontent.com/antirez/redis/4.0.3/redis.conf

https://raw.githubusercontent.com/antirez/redis/4.0.1/redis.conf

https://raw.githubusercontent.com/antirez/redis/4.0/redis.conf

https://raw.githubusercontent.com/antirez/redis/3.2.9/redis.conf

https://raw.githubusercontent.com/antirez/redis/3.0/redis.conf

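23. Tuning Redis parameters under heavy load

Parameter	Suggested minimum	Notes
repl-ping-slave-period	10	Ping every 10 seconds
repl-timeout	60	Time out after 60 seconds, that is, after about six missed pings
cluster-node-timeout	15000
repl-backlog-size	1GB	Size of the master's replication backlog for replicas
appendfsync	no	Let the operating system flush
save		Under heavy load, raise the thresholds to reduce the pressure of writing RDB: "900 20 300 200 60 200000"
appendonly		For queues, build a dedicated cluster and set this to no

Why tune this way under heavy load?

One of the main reasons is Redis master-replica replication: it shares a single thread with command processing. Replication is asynchronous, but because everything runs in one thread, its capacity is quite limited. If the network latency between master and replica is not around 0.05 ms but reaches, say, 0.6 ms or even 1.2 ms, the situation becomes very bad, so all nodes of a Redis cluster should be deployed in the same data center.

The concrete values depend on the actual load and on message size. A 200~500 KB record is fairly large and increases replication pressure accordingly, while messages of around 10 bytes put much less pressure on it. Forcing fsync through appendfsync (always or everysec) in a high-load environment is a bad idea: it can easily make the whole cluster unavailable, and the typical symptom beforehand is pronounced QPS spikes.

The goal of these settings is to keep the cluster from triggering failovers while the masters are actually healthy. With a large data set and heavy load combined, such failovers can make the cluster collapse like an avalanche.

When the Redis log fills up with messages like the following, the related parameters probably need tuning: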

22135:M 06 Sep 14:17:05.388 * FAIL message received from 1d07e208db56cfd7395950ca66e03589278b8e12 about e438a338e9d9834a6745c12931950da87e360ca2

22135:M 06 Sep 14:17:07.551 * FAIL message received from ae8f6e7e0ab16b04414c8f3d08b58c0aa268b467 about d6eb06e9d118c120d3961a659972a1d0191a8652

22135:M 06 Sep 14:17:08.438 # Failover auth granted to f7d6b2c72fa3b801e7dcfe0219e73383d143dd0f for epoch 285 （We can vote for this slave）

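A node grants its vote only when all of the following hold (the text in parentheses is the reason logged when a check fails):

1) The voting node is a master

2) The voting node owns at least one slot

3) The requesting node's epoch is not lower than the voter's current epoch (reqEpoch < curEpoch)

4) The voter has not already voted in this epoch (already voted for epoch)

5) The requesting node is not a master (it is a master node)

6) The requesting node has a known master (I don't know its master)

7) The requesting node's master is in the fail state (its master is up)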

22135:M 06 Sep 14:17:19.844 # Failover auth denied to 534b93af6ba45a7033dbf38c8f47cd688514125a: already voted for epoch 285

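When a failed node becomes reachable again and it is a replica or a master without slots, its FAIL flag is cleared immediately. If it is a master with slots, the flag is cleared once "(now - node->fail_time) > (server.cluster_node_timeout * CLUSTER_FAIL_UNDO_TIME_MULT)" holds; the multiplier is defined in cluster.h (cluster.h:#define CLUSTER_FAIL_UNDO_TIME_MULT 2 /* Undo fail if master is back. */).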

22135:M 06 Sep 14:17:29.243 * Clear FAIL state for node d6eb06e9d118c120d3961a659972a1d0191a8652: master without slots is reachable again.

Log lines for received messages of type FAIL:

22135:M 06 Sep 14:17:31.995 * FAIL message received from f7d6b2c72fa3b801e7dcfe0219e73383d143dd0f about 1ba437fa1683a8caafd38ff977e5fbabdaf84fd6

22135:M 06 Sep 14:17:32.496 * FAIL message received from 1d07e208db56cfd7395950ca66e03589278b8e12 about d7942cfe636b25219c6d56aa72828fcfde2ee261

22135:M 06 Sep 14:17:32.968 # Failover auth granted to 938d9ae2de278938beda1d39185608b02d3b31ec for epoch 286

22135:M 06 Sep 14:17:33.177 # Failover auth granted to d9dadf3342006e2c92def3071ca0a76390be62b0 for epoch 287

22135:M 06 Sep 14:17:36.336 * Clear FAIL state for node 1ba437fa1683a8caafd38ff977e5fbabdaf84fd6: master without slots is reachable again.

22135:M 06 Sep 14:17:36.855 * Clear FAIL state for node d7942cfe636b25219c6d56aa72828fcfde2ee261: master without slots is reachable again.

22135:M 06 Sep 14:17:38.419 * Clear FAIL state for node e438a338e9d9834a6745c12931950da87e360ca2: is reachable again and nobody is serving its slots after some time.

22135:M 06 Sep 14:17:54.954 * FAIL message received from ae8f6e7e0ab16b04414c8f3d08b58c0aa268b467 about 7990d146cece7dc83eaf08b3e12cbebb2223f5f8

22135:M 06 Sep 14:17:56.697 * FAIL message received from 1d07e208db56cfd7395950ca66e03589278b8e12 about fbe774cdbd2acd24f9f5ea90d61c607bdf800eb5

22135:M 06 Sep 14:17:57.705 # Failover auth granted to e1c202d89ffe1c61b682e28071627635974c84a7 for epoch 288

22135:M 06 Sep 14:17:57.890 * Clear FAIL state for node 7990d146cece7dc83eaf08b3e12cbebb2223f5f8: slave is reachable again.

22135:M 06 Sep 14:17:57.892 * Clear FAIL state for node fbe774cdbd2acd24f9f5ea90d61c607bdf800eb5: master without slots is reachable again.

24. Troubleshooting

1) If the last log line is "16367:M 08 Jun 14:48:15.560 # Server started, Redis version 3.2.0" and the node stays in the fail state, the AOF file may be corrupted. Repair it with redis-check-aof --fix, for example:

../../bin/redis-check-aof --fix appendonly-6380.aof

0x a1492b9b: Expected prefix '

AOF analyzed: size=2705928192, ok_up_to=2705927067, diff=1125

This will shrink the AOF from 2705928192 bytes, with 1125 bytes, to 2705927067 bytes

Continue? [y/N]: y

2) in `call': ERR Slot 16011 is already busy (Redis::CommandError)

Delete the file named by cluster-config-file on every node and restart the nodes, or run FLUSHALL on every node.

The same error can also appear when host names are used instead of IP addresses. For example, "redis-cli create --replicas 1 redis1:6379 redis2:6379 redis3:6379 redis4:6379 redis5:6379 redis6:6379" may also fail with "ERR Slot 16011 is already busy (Redis::CommandError)".

3) for lack of backlog (Slave request was: 51875158284)

Default values:

# redis-cli config get repl-timeout

A) "repl-timeout"

B) "10"

# redis-cli config get client-output-buffer-limit

A) "client-output-buffer-limit"

B) "normal 0 0 0 slave 268435456 67108864 60 pubsub 33554432 8388608 60"

Increase the limits:

redis-cli config set "client-output-buffer-limit" "normal 0 0 0 slave 2684354560 671088640 60 pubsub 33554432 8388608 60"

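4) Scenarios in which replication breaks off

A) The master's output buffer for the replica reaches its hard or soft limit; see client-output-buffer-limit;

B) Replication takes longer than repl-timeout allows; see repl-timeout.

If a replica keeps restarting replication from the master in a loop and adjusting these parameters does not help, try deleting the AOF and RDB files on the replica and restarting the process; replication may then complete normally.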

5) The log shows: Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.

Consider tuning the following settings:

Set no-appendfsync-on-rewrite to yes

Increase repl-backlog-size and client-output-buffer-limit

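6) The following error appears: MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.

Consider setting stop-writes-on-bgsave-error to "no".

7) Failover auth granted to

When lines like the following appear over and over in the logs, synchronization and communication between masters and replicas is probably not smooth, causing spurious failovers and state changes. Increase the related parameters to tolerate longer delays. This is also why the network latency between all nodes of a cluster should be as low as possible, ideally around 0.02 ms. The price of larger values is that real failovers react more slowly.

Replica log: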

31019:S 06 Sep 11:07:24.169 * Connecting to MASTER 10.5.14.8:6379

31019:S 06 Sep 11:07:24.169 * MASTER <-> SLAVE sync started

31019:S 06 Sep 11:07:24.169 # Start of election delayed for 854 milliseconds (rank #0, offset 5127277817).

31019:S 06 Sep 11:07:24.169 * Non blocking connect for SYNC fired the event.

31019:S 06 Sep 11:07:25.069 # Starting a failover election for epoch 266.

31019:S 06 Sep 11:07:29.190 * Clear FAIL state for node ae8f6e7e0ab16b04414c8f3d08b58c0aa268b467: is reachable again and nobody is serving its slots after some time.

31019:S 06 Sep 11:07:29.191 * Master replied to PING, replication can continue...

31019:S 06 Sep 11:07:29.191 * Clear FAIL state for node f7d6b2c72fa3b801e7dcfe0219e73383d143dd0f: is reachable again and nobody is serving its slots after some time.

31019:S 06 Sep 11:07:29.192 * Trying a partial resynchronization (request ea2261c827fbc54135a95f707046581a55dff133:5127277818).

31019:S 06 Sep 11:07:29.192 * Successful partial resynchronization with master.

31019:S 06 Sep 11:07:29.192 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.

31019:S 06 Sep 11:07:29.811 * Clear FAIL state for node e438a338e9d9834a6745c12931950da87e360ca2: is reachable again and nobody is serving its slots after some time.

31019:S 06 Sep 11:07:37.680 * FAIL message received from 5b41f7860cc800e65932e92d1d97c6c188138e56 about 3114cec541c5bcd36d712cd6c9f4c5055510e386

31019:S 06 Sep 11:07:43.710 * Clear FAIL state for node 3114cec541c5bcd36d712cd6c9f4c5055510e386: slave is reachable again.

31019:S 06 Sep 11:07:48.119 * FAIL message received from 7d61af127c17d9c19dbf9af0ac8f7307f1c96c4b about e1c202d89ffe1c61b682e28071627635974c84a7

31019:S 06 Sep 11:07:49.410 * FAIL message received from 5b41f7860cc800e65932e92d1d97c6c188138e56 about d9dadf3342006e2c92def3071ca0a76390be62b0

31019:S 06 Sep 11:07:53.352 * Clear FAIL state for node d9dadf3342006e2c92def3071ca0a76390be62b0: slave is reachable again.

31019:S 06 Sep 11:07:57.147 * Clear FAIL state for node e1c202d89ffe1c61b682e28071627635974c84a7: slave is reachable again.

31019:S 06 Sep 11:08:36.516 * FAIL message received from ae8f6e7e0ab16b04414c8f3d08b58c0aa268b467 about 938d9ae2de278938beda1d39185608b02d3b31ec

31019:S 06 Sep 11:08:41.900 * Clear FAIL state for node 938d9ae2de278938beda1d39185608b02d3b31ec: slave is reachable again.

31019:S 06 Sep 11:08:46.380 * FAIL message received from d7942cfe636b25219c6d56aa72828fcfde2ee261 about fbe774cdbd2acd24f9f5ea90d61c607bdf800eb5

31019:S 06 Sep 11:08:46.531 * Marking node 7990d146cece7dc83eaf08b3e12cbebb2223f5f8 as failing (quorum reached).

31019:S 06 Sep 11:09:01.882 * Clear FAIL state for node 7990d146cece7dc83eaf08b3e12cbebb2223f5f8: master without slots is reachable again.

31019:S 06 Sep 11:09:01.883 * Clear FAIL state for node fbe774cdbd2acd24f9f5ea90d61c607bdf800eb5: master without slots is reachable again.

31019:S 06 Sep 11:09:06.538 * FAIL message received from e438a338e9d9834a6745c12931950da87e360ca2 about d7942cfe636b25219c6d56aa72828fcfde2ee261

31019:S 06 Sep 11:09:06.538 * FAIL message received from e438a338e9d9834a6745c12931950da87e360ca2 about 1ba437fa1683a8caafd38ff977e5fbabdaf84fd6

31019:S 06 Sep 11:09:12.555 * Clear FAIL state for node 1ba437fa1683a8caafd38ff977e5fbabdaf84fd6: is reachable again and nobody is serving its slots after some time.

31019:S 06 Sep 11:09:12.555 * Clear FAIL state for node d7942cfe636b25219c6d56aa72828fcfde2ee261: master without slots is reachable again.

31019:S 06 Sep 11:09:15.565 * Marking node 534b93af6ba45a7033dbf38c8f47cd688514125a as failing (quorum reached).

31019:S 06 Sep 11:09:16.599 * FAIL message received from 0a92bd7472c9af3e52f9185eac1bd1bbf36146e6 about e1c202d89ffe1c61b682e28071627635974c84a7

31019:S 06 Sep 11:09:22.262 * Clear FAIL state for node 534b93af6ba45a7033dbf38c8f47cd688514125a: slave is reachable again.

31019:S 06 Sep 11:09:27.906 * Clear FAIL state for node e1c202d89ffe1c61b682e28071627635974c84a7: is reachable again and nobody is serving its slots after some time.

31019:S 06 Sep 11:09:50.744 * FAIL message received from ae8f6e7e0ab16b04414c8f3d08b58c0aa268b467 about e1c202d89ffe1c61b682e28071627635974c84a7

31019:S 06 Sep 11:09:55.141 * FAIL message received from 5b41f7860cc800e65932e92d1d97c6c188138e56 about d9dadf3342006e2c92def3071ca0a76390be62b0

31019:S 06 Sep 11:09:55.362 * FAIL message received from 7d61af127c17d9c19dbf9af0ac8f7307f1c96c4b about 938d9ae2de278938beda1d39185608b02d3b31ec

31019:S 06 Sep 11:09:55.557 * FAIL message received from ae8f6e7e0ab16b04414c8f3d08b58c0aa268b467 about 1d07e208db56cfd7395950ca66e03589278b8e12

31019:S 06 Sep 11:09:55.578 * FAIL message received from ae8f6e7e0ab16b04414c8f3d08b58c0aa268b467 about 144347d5a51acf047887fe81f22e8f7705c911ec

31019:S 06 Sep 11:09:56.521 * Marking node 534b93af6ba45a7033dbf38c8f47cd688514125a as failing (quorum reached).

31019:S 06 Sep 11:09:57.996 * Clear FAIL state for node 1d07e208db56cfd7395950ca66e03589278b8e12: slave is reachable again.

31019:S 06 Sep 11:09:58.329 * FAIL message received from 5b41f7860cc800e65932e92d1d97c6c188138e56 about 0a92bd7472c9af3e52f9185eac1bd1bbf36146e6

31019:S 06 Sep 11:10:09.239 * Clear FAIL state for node 144347d5a51acf047887fe81f22e8f7705c911ec: slave is reachable again.

31019:S 06 Sep 11:10:09.812 * Clear FAIL state for node d9dadf3342006e2c92def3071ca0a76390be62b0: is reachable again and nobody is serving its slots after some time.

31019:S 06 Sep 11:10:13.549 * Clear FAIL state for node 534b93af6ba45a7033dbf38c8f47cd688514125a: slave is reachable again.

31019:S 06 Sep 11:10:13.590 * FAIL message received from 716f2e2dd9792eaf4ee486794c9797fa6e1c9650 about 1ba437fa1683a8caafd38ff977e5fbabdaf84fd6

31019:S 06 Sep 11:10:13.591 * FAIL message received from f7d6b2c72fa3b801e7dcfe0219e73383d143dd0f about d7942cfe636b25219c6d56aa72828fcfde2ee261

31019:S 06 Sep 11:10:14.316 * Clear FAIL state for node e1c202d89ffe1c61b682e28071627635974c84a7: is reachable again and nobody is serving its slots after some time.

31019:S 06 Sep 11:10:15.108 * Clear FAIL state for node d7942cfe636b25219c6d56aa72828fcfde2ee261: slave is reachable again.

31019:S 06 Sep 11:10:17.588 * Clear FAIL state for node 938d9ae2de278938beda1d39185608b02d3b31ec: slave is reachable again.

31019:S 06 Sep 11:10:32.622 * Clear FAIL state for node 0a92bd7472c9af3e52f9185eac1bd1bbf36146e6: slave is reachable again.

31019:S 06 Sep 11:10:32.623 * FAIL message received from 5b41f7860cc800e65932e92d1d97c6c188138e56 about 3114cec541c5bcd36d712cd6c9f4c5055510e386

31019:S 06 Sep 11:10:32.623 * Clear FAIL state for node 3114cec541c5bcd36d712cd6c9f4c5055510e386: slave is reachable again.

Master log:

31014:M 06 Sep 14:08:54.083 * Background saving terminated with success

31014:M 06 Sep 14:09:55.093 * 10000 changes in 60 seconds. Saving...

31014:M 06 Sep 14:09:55.185 * Background saving started by pid 41395

31014:M 06 Sep 14:11:00.269 # Disconnecting timedout slave: 10.15.40.9:6018

31014:M 06 Sep 14:11:00.269 # Connection with slave 10.15.40.9:6018 lost.

41395:C 06 Sep 14:11:01.141 * DB saved on disk

41395:C 06 Sep 14:11:01.259 * RDB: 5 MB of memory used by copy-on-write

31014:M 06 Sep 14:11:01.472 * Background saving terminated with success

31014:M 06 Sep 14:11:11.525 * FAIL message received from 1d07e208db56cfd7395950ca66e03589278b8e12 about 534b93af6ba45a7033dbf38c8f47cd688514125a

31014:M 06 Sep 14:11:23.039 * FAIL message received from 1ba437fa1683a8caafd38ff977e5fbabdaf84fd6 about d78845370c98b3ce4cfc02e8d3e233a9a1d84a83

31014:M 06 Sep 14:11:23.541 * Clear FAIL state for node 534b93af6ba45a7033dbf38c8f47cd688514125a: slave is reachable again.

31014:M 06 Sep 14:11:23.813 * Slave 10.15.40.9:6018 asks for synchronization

31014:M 06 Sep 14:11:23.813 * Partial resynchronization request from 10.15.40.9:6018 accepted. Sending 46668 bytes of backlog starting from offset 5502672944.

31014:M 06 Sep 14:11:23.888 # Failover auth granted to 7d61af127c17d9c19dbf9af0ac8f7307f1c96c4b for epoch 283

31014:M 06 Sep 14:11:32.464 * FAIL message received from d6eb06e9d118c120d3961a659972a1d0191a8652 about 3114cec541c5bcd36d712cd6c9f4c5055510e386

31014:M 06 Sep 14:11:47.616 * Clear FAIL state for node d78845370c98b3ce4cfc02e8d3e233a9a1d84a83: master without slots is reachable again.

31014:M 06 Sep 14:11:55.515 * FAIL message received from d6eb06e9d118c120d3961a659972a1d0191a8652 about ae8f6e7e0ab16b04414c8f3d08b58c0aa268b467

31014:M 06 Sep 14:11:57.135 # Failover auth granted to ae8f6e7e0ab16b04414c8f3d08b58c0aa268b467 for epoch 284

31014:M 06 Sep 14:12:01.766 * Clear FAIL state for node ae8f6e7e0ab16b04414c8f3d08b58c0aa268b467: slave is reachable again.

31014:M 06 Sep 14:12:08.753 * Clear FAIL state for node 3114cec541c5bcd36d712cd6c9f4c5055510e386: master without slots is reachable again.

31014:M 06 Sep 14:16:02.070 * 10 changes in 300 seconds. Saving...

31014:M 06 Sep 14:16:02.163 * Background saving started by pid 13832

31014:M 06 Sep 14:17:18.443 * FAIL message received from ae8f6e7e0ab16b04414c8f3d08b58c0aa268b467 about d6eb06e9d118c120d3961a659972a1d0191a8652

31014:M 06 Sep 14:17:18.443 # Failover auth granted to f7d6b2c72fa3b801e7dcfe0219e73383d143dd0f for epoch 285

31014:M 06 Sep 14:17:29.272 # Connection with slave client id #40662 lost.

31014:M 06 Sep 14:17:29.273 # Failover auth denied to 534b93af6ba45a7033dbf38c8f47cd688514125a: already voted for epoch 285

31014:M 06 Sep 14:17:29.278 * Slave 10.15.40.9:6018 asks for synchronization

31014:M 06 Sep 14:17:29.278 * Partial resynchronization request from 10.15.40.9:6018 accepted. Sending 117106 bytes of backlog starting from offset 5502756264.

13832:C 06 Sep 14:17:29.850 * DB saved on disk

13832:C 06 Sep 14:17:29.970 * RDB: 7 MB of memory used by copy-on-write

31014:M 06 Sep 14:17:38.449 * FAIL message received from f7d6b2c72fa3b801e7dcfe0219e73383d143dd0f about 1ba437fa1683a8caafd38ff977e5fbabdaf84fd6

31014:M 06 Sep 14:17:38.449 * FAIL message received from 1d07e208db56cfd7395950ca66e03589278b8e12 about d7942cfe636b25219c6d56aa72828fcfde2ee261

31014:M 06 Sep 14:17:38.449 # Failover auth denied to 938d9ae2de278938beda1d39185608b02d3b31ec: reqEpoch (286) < curEpoch(287)

31014:M 06 Sep 14:17:38.449 # Failover auth granted to d9dadf3342006e2c92def3071ca0a76390be62b0 for epoch 287

31014:M 06 Sep 14:17:38.449 * Background saving terminated with success

31014:M 06 Sep 14:17:38.450 * Clear FAIL state for node d7942cfe636b25219c6d56aa72828fcfde2ee261: master without slots is reachable again.

31014:M 06 Sep 14:17:38.450 * Clear FAIL state for node 1ba437fa1683a8caafd38ff977e5fbabdaf84fd6: master without slots is reachable again.

31014:M 06 Sep 14:17:38.452 * Clear FAIL state for node d6eb06e9d118c120d3961a659972a1d0191a8652: slave is reachable again.

31014:M 06 Sep 14:17:54.985 * FAIL message received from ae8f6e7e0ab16b04414c8f3d08b58c0aa268b467 about 7990d146cece7dc83eaf08b3e12cbebb2223f5f8

31014:M 06 Sep 14:17:56.729 * FAIL message received from 1d07e208db56cfd7395950ca66e03589278b8e12 about fbe774cdbd2acd24f9f5ea90d61c607bdf800eb5

31014:M 06 Sep 14:17:57.737 # Failover auth granted to e1c202d89ffe1c61b682e28071627635974c84a7 for epoch 288

31014:M 06 Sep 14:17:57.922 * Clear FAIL state for node fbe774cdbd2acd24f9f5ea90d61c607bdf800eb5: master without slots is reachable again.

31014:M 06 Sep 14:17:57.923 * Clear FAIL state for node 7990d146cece7dc83eaf08b3e12cbebb2223f5f8: slave is reachable again.
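To judge how often such votes happen, the "Failover auth granted" lines in a log can be counted per minute. This is a minimal sketch, assuming the log line layout shown above (process id and role, day, month, time, level mark, message). The helper name failover_grants_per_minute is hypothetical.

```shell
# Hypothetical helper: count "Failover auth granted" lines per minute.
# $1: path to a Redis log file in the layout shown above.
failover_grants_per_minute() {
    awk '/Failover auth granted/ {
        # $2 = day, $3 = month, $4 = HH:MM:SS.mmm; keep only HH:MM.
        minute = $2 " " $3 " " substr($4, 1, 5)
        if (!(minute in count)) order[++n] = minute
        count[minute]++
    }
    END {
        # Print minutes in order of first appearance with their counts.
        for (i = 1; i <= n; i++) print order[i], count[order[i]]
    }' "$1"
}

# Example call (the log path is an assumption):
#   failover_grants_per_minute /path/to/redis-6381.log
```

Applied to the master log above, this would report 2 grants for "06 Sep 14:11" and 3 for "06 Sep 14:17". The source does not name the parameters to increase; cluster-node-timeout is a likely candidate, but that is an assumption.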

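Redis-5.0.5 Cluster Configuration

Table of contents

1. Preface

2. Glossary

3. Deployment plan

4. Modifying system parameters

4.1. Raising the maximum number of open files

4.2. TCP listen queue size

4.3. OOM-related: vm.overcommit_memory

4.4. /sys/kernel/mm/transparent_hugepage/enabled

5. Directory structure

6. Building and installing

7. Configuring redis

8. Starting the redis instances

9. Creating and starting the redis cluster

9.1. Creating the redis cluster

9.2. ps aux|grep redis

10. redis cluster client

10.1. The command-line tool redis-cli

10.2. Reading data from slaves

10.3. jedis (java cluster client)

10.4. r3c (C++ cluster client)

11. Adding nodes

11.1. Adding a new master node

11.2. Adding a new slave node

12. Removing nodes

13. Master machine hardware failure

14. Checking node status

15. Changing master-slave relationships

16. Slots-related commands

17. Migrating slots

18. Manual master-slave failover

19. Viewing cluster information

20. Disabling specific commands

21. Data migration

22. Configuration files for each version

23. Key points for tuning Redis parameters under heavy load

24. Troubleshooting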