Redis Cluster

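Early approaches to deploying Redis as a distributed cluster:

Client-side partitioning: the client program decides which Redis node each key is written to. The client has to handle write distribution, high-availability management, and failover by itself.
Proxy-based approach: a third-party Redis proxy sits between clients and servers. Clients connect to the proxy layer, and the proxy decides where each key is written. This is simple for clients, but adding or removing cluster nodes is relatively cumbersome, and the proxy itself is a single point of failure and a performance bottleneck.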

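Sentinel solves Redis high availability: when the master fails, a slave is promoted to master automatically so the service stays usable. It does not solve the single-node write bottleneck, because the write performance of one Redis instance is limited by that machine's memory size, concurrency, and NIC throughput. For this reason Redis introduced Redis Cluster, a decentralized architecture, starting with version 3.0. In such a cluster every node stores its own data plus the state of the whole cluster, and every node is connected to all other nodes. Its characteristics are: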

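1: All Redis nodes are interconnected and probe each other with PING messages.
2: A node is only treated as truly failed when more than half of the nodes in the cluster detect it as failed.
3: Clients connect directly to Redis without a proxy; the application needs to be configured with the IP addresses of all Redis servers.
4: Redis Cluster maps all Redis nodes onto slots 0-16383. Reads and writes must go to the node that owns the slot, so the number of Redis nodes roughly determines how far Redis concurrency scales.
5: Redis Cluster pre-allocates 16384 slots. When a key-value pair is written to the cluster, the value of CRC16(key) mod 16384 decides which slot, and therefore which Redis node, receives the key. This removes the single-node bottleneck.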
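The slot rule in item 5 can be sketched in a few lines of Python. This is an illustration, not Redis source code: it assumes the CRC16 variant is CRC16-CCITT/XMODEM (polynomial 0x1021, initial value 0), the function names are made up for this example, and it ignores hash tags (the `{...}` part of a key that real Redis Cluster hashes on its own). The `SLOT_RANGES` table reuses the three-master layout described in this article.

```python
SLOT_COUNT = 16384

# Slot ranges of the three example masters A, B, C used in this article.
SLOT_RANGES = {"A": (0, 5460), "B": (5461, 10922), "C": (10923, 16383)}


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Slot = CRC16(key) mod 16384 (hash tags are not handled here)."""
    return crc16_xmodem(key.encode("utf-8")) % SLOT_COUNT


def owner_of(slot: int) -> str:
    """Return the example master whose range contains the slot."""
    for node, (first, last) in SLOT_RANGES.items():
        if first <= slot <= last:
            return node
    raise ValueError(f"slot {slot} is outside 0-{SLOT_COUNT - 1}")


if __name__ == "__main__":
    slot = key_slot("123456789")
    print(slot, owner_of(slot))  # 12739 C
```

The string "123456789" is the standard check input for CRC16/XMODEM (checksum 0x31C3), which makes it a convenient way to confirm the checksum routine before trusting any slot numbers it produces.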

Redis Cluster basic architecture

Suppose there are three master nodes A, B, and C. With the 16384 hash slots divided among them, each node covers the following slot range:

Node A covers 0-5460
Node B covers 5461-10922
Node C covers 10923-16383
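A short sketch of how such an even split can be computed. The rounding rule below (advance a fractional cursor, round the end of each range to the nearest integer, give the last node whatever remains) is an assumption chosen because it reproduces the three ranges listed above; `split_slots` is a hypothetical helper, not part of any Redis tool.

```python
import math


def split_slots(master_count: int, slot_count: int = 16384):
    """Divide slot_count slots into master_count contiguous ranges."""
    per_node = slot_count / master_count
    ranges = []
    first = 0
    cursor = 0.0
    for index in range(master_count):
        # Round the end of the range to the nearest integer.
        last = math.floor(cursor + per_node - 1 + 0.5)
        # The final master always ends at the last slot.
        if index == master_count - 1 or last > slot_count - 1:
            last = slot_count - 1
        ranges.append((first, last))
        first = last + 1
        cursor += per_node
    return ranges


if __name__ == "__main__":
    for name, (first, last) in zip("ABC", split_slots(3)):
        print(f"Node {name} covers {first}-{last} ({last - first + 1} slots)")
```

Because 16384 is not divisible by 3, one master ends up with 5462 slots and the other two with 5461, which is why node B's range is one slot longer.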

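Redis Cluster master/slave architecture

The cluster architecture solves the concurrency problem but introduces a new one: how is each Redis master made highly available? The approach used in the rest of this article is to give every master at least one slave that can be promoted when the master fails.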

Deploying the cluster

Environment: six servers are recommended for production.

Role	Host 1	Host 2	Host 3
master	172.222.2.107:6379/6380	172.222.2.117:6379/6380	172.222.2.127:6379/6380
slave	172.222.2.10:6379/6380	172.222.2.11:6379/6380	172.222.2.12:6379/6380
reserved	172.222.2.13:6379/6380	172.222.2.14:6379/6380

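Prerequisites for creating a Redis cluster

1. Every Redis node uses the same hardware configuration, the same password, and the same Redis version.

2. Every node must enable the following parameters:
cluster-enabled yes #required; once enabled, the redis-server process is shown with [cluster]
cluster-config-file nodes-6380.conf #created and maintained automatically by Redis Cluster; never edit it by hand

3. No Redis server may contain any data.

4. Start each node first as a standalone Redis instance without any keys.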

Cluster configuration on each server

bind 0.0.0.0
requirepass 123456
logfile "/apps/redis/log/redis.log"
dir "/apps/redis/data"
cluster-enabled yes #enable cluster mode
cluster-node-timeout 15000  #milliseconds a node may be unreachable before it is considered failed
cluster-config-file nodes-6379.conf #cluster state file

Verify the current Redis service state:

#ps -ef |grep redis
redis      3820      1  0 11:07 ?        00:00:00 /apps/redis/bin/redis-server 0.0.0.0:6379 [cluster]

#ss -tnlp |grep redis
LISTEN     0      511          *:6379                  *:*                   users:(("redis-server",pid=3820,fd=6))   #client port
LISTEN     0      511          *:16379                 *:*                   users:(("redis-server",pid=3820,fd=8))   #cluster bus port (client port + 10000)

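Creating the cluster

Redis 3 and 4:
These versions need the cluster management tool redis-trib.rb. It is the official tool for managing Redis clusters, shipped in the src directory of the Redis source tree, and it wraps the cluster commands that Redis provides into a simple, convenient utility. redis-trib.rb was written in Ruby by the Redis author, and the Ruby that yum installs on CentOS is too old for it, as shown below: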

# yum install ruby rubygems -y
# find / -name redis-trib.rb
/usr/local/src/redis-4.0.14/src/redis-trib.rb
[root@s1 ~]# cp /usr/local/src/redis-4.0.14/src/redis-trib.rb /usr/bin/
[root@s1 src]# gem install redis
Fetching: redis-4.1.2.gem (100%)
ERROR: Error installing redis:
redis requires Ruby version >= 2.3.0.

Fixing the outdated Ruby version:

# Remove the yum-installed Ruby and build a newer one from source:
# yum remove ruby rubygems -y
# wget https://cache.ruby-lang.org/pub/ruby/2.5/ruby-2.5.5.tar.gz
# tar xf ruby-2.5.5.tar.gz -C /apps/redis
# cd /apps/redis/ruby-2.5.5
# ./configure
# make -j 2
# make install
# ./bin/gem install redis
Fetching: redis-4.1.3.gem (100%)
Successfully installed redis-4.1.3
Parsing documentation for redis-4.1.3
Installing ri documentation for redis-4.1.3
Done installing documentation for redis after 1 seconds
1 gem installed

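#If online installation is not possible, download the redis gem and install it offline
#https://rubygems.org/gems/redis #download the redis gem package first
#gem install -l redis-3.3.0.gem #install the redis gem

If gem install redis fails at this step with: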

ERROR:  Loading command: install (LoadError)
    cannot load such file -- zlib
ERROR:  While executing gem ... (NoMethodError)
    undefined method `invoke_with_build_args' for nil:NilClass

Solution: install the missing libraries

yum -y install zlib-devel openssl-devel
cd /apps/redis/ruby-2.5.5/ext/zlib
ruby extconf.rb

Solution: edit the Makefile

vim Makefile #around line 290

...(omitted)...
zlib.o: $(RUBY_EXTCONF_H)
zlib.o: $(arch_hdrdir)/ruby/config.h
zlib.o: $(hdrdir)/ruby/backward.h
zlib.o: $(hdrdir)/ruby/defines.h
zlib.o: $(hdrdir)/ruby/encoding.h
zlib.o: $(hdrdir)/ruby/intern.h
zlib.o: $(hdrdir)/ruby/io.h
zlib.o: $(hdrdir)/ruby/missing.h
zlib.o: $(hdrdir)/ruby/onigmo.h
zlib.o: $(hdrdir)/ruby/oniguruma.h
zlib.o: $(hdrdir)/ruby/ruby.h
zlib.o: $(hdrdir)/ruby/st.h
zlib.o: $(hdrdir)/ruby/subst.h
zlib.o: $(hdrdir)/ruby/thread.h
zlib.o: $(top_srcdir)/include/ruby.h   #change this line to: zlib.o: ../../include/ruby.h
zlib.o: zlib.c
...(omitted)...

Recompile

#cd ../../
#make clean
#make && make install


# ./bin/gem install redis #reinstall the redis gem
Fetching: redis-4.1.3.gem (100%)
Successfully installed redis-4.1.3
Parsing documentation for redis-4.1.3
Installing ri documentation for redis-4.1.3
Done installing documentation for redis after 0 seconds
1 gem installed

Verify that the redis-trib.rb command is executable:

The script is part of the Redis source tree, at /root/redis-4.0.14/src/redis-trib.rb

ln -s /root/redis-4.0.14/src/redis-trib.rb /usr/bin/ #create a symbolic link

/apps/redis]#redis-trib.rb
Usage: redis-trib <command> <options> <arguments ...>
create host1:port1 ... hostN:portN #create a cluster
--replicas <arg> #number of replicas per master
check host:port #check the cluster
info host:port #show cluster host information
fix host:port #repair the cluster
--timeout <arg>
reshard host:port #migrate slots of a given host online
--from <arg>
--to <arg>
--slots <arg>
--yes
--timeout <arg>
--pipeline <arg>
rebalance host:port #balance the number of slots across hosts
--weight <arg>
--auto-weights
--use-empty-masters
--timeout <arg>
--simulate
--pipeline <arg>
--threshold <arg>
add-node new_host:new_port existing_host:existing_port #add a host to the cluster
--slave
--master-id <arg>
del-node host:port node_id #remove a host
set-timeout host:port milliseconds #set the node timeout
call host:port command arg arg .. arg #run a command on every node in the cluster
import host:port #import data from an external Redis server into the cluster
--from <arg>
--copy
--replace
help (show this help)

Set the Redis login password used by redis-trib.rb (the :password default in the gem's client.rb)

# vim /usr/local/lib/ruby/gems/2.5.0/gems/redis-4.1.3/lib/redis/client.rb


Make sure every cluster node is running normally, then create the Redis cluster

#redis-trib.rb create --replicas 1 172.222.2.107:6379 172.222.2.117:6379 172.222.2.127:6379 172.222.2.10:6379 172.222.2.11:6379 172.222.2.12:6379

#Redis 3/4:
>>> Creating cluster
>>> Performing hash slots allocation on 6 nodes...
Using 3 masters:
172.222.2.107:6379
172.222.2.117:6379
172.222.2.127:6379
Adding replica 172.222.2.11:6379 to 172.222.2.107:6379
Adding replica 172.222.2.12:6379 to 172.222.2.117:6379
Adding replica 172.222.2.10:6379 to 172.222.2.127:6379
M: f3c7690f8855d568ddb4800bf3ff9e3add81320f 172.222.2.107:6379
   slots:0-5460 (5461 slots) master
M: 2db718e78013cae9d9742751d57a746c76981695 172.222.2.117:6379
   slots:5461-10922 (5462 slots) master
M: 37b3d5c303aba40acb819d452f155dd633b01377 172.222.2.127:6379
   slots:10923-16383 (5461 slots) master
S: 007f65ee64fe06543fa434ecd42b88a9d14e9247 172.222.2.10:6379
   replicates 37b3d5c303aba40acb819d452f155dd633b01377
S: 8fb2a00ad5cc8384f672cc29e5b99906cdd4afab 172.222.2.11:6379
   replicates f3c7690f8855d568ddb4800bf3ff9e3add81320f
S: 9ed92a3e95dd28039bad64a6a6a437b27028dbb2 172.222.2.12:6379
   replicates 2db718e78013cae9d9742751d57a746c76981695
Can I set the above configuration? (type 'yes' to accept): yes #confirm the proposed allocation
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join..
>>> Performing Cluster Check (using node 172.222.2.107:6379)
M: f3c7690f8855d568ddb4800bf3ff9e3add81320f 172.222.2.107:6379
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
S: 007f65ee64fe06543fa434ecd42b88a9d14e9247 172.222.2.10:6379
   slots: (0 slots) slave
   replicates 37b3d5c303aba40acb819d452f155dd633b01377
S: 8fb2a00ad5cc8384f672cc29e5b99906cdd4afab 172.222.2.11:6379
   slots: (0 slots) slave
   replicates f3c7690f8855d568ddb4800bf3ff9e3add81320f
M: 2db718e78013cae9d9742751d57a746c76981695 172.222.2.117:6379
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
M: 37b3d5c303aba40acb819d452f155dd633b01377 172.222.2.127:6379
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: 9ed92a3e95dd28039bad64a6a6a437b27028dbb2 172.222.2.12:6379
   slots: (0 slots) slave
   replicates 2db718e78013cae9d9742751d57a746c76981695
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

If leftovers from earlier operations cause cluster creation to fail, flush the data and reset the cluster state

# 127.0.0.1:6379> FLUSHALL
OK
# 127.0.0.1:6379> cluster reset
OK

# If that still does not work, stop Redis on every host and delete the node data:
# systemctl stop redis
# rm -rf /apps/redis/data/*
# systemctl start redis

Redis 5:

#redis-cli -a 123456 --cluster create 172.222.2.107:6379 172.222.2.117:6379 172.222.2.127:6379 172.222.2.10:6379 172.222.2.11:6379 172.222.2.12:6379 --cluster-replicas 1

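Verify the Redis cluster state

Because masterauth has not been set, replication between masters and slaves is not established even though the cluster is running. Set masterauth on every slave with config set, or write it into each Redis configuration file. Ideally, set it from the console first and then persist it in the configuration file.

172.222.2.11:6379> info replication  #run on any slave node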
# Replication
role:slave
master_host:172.222.2.107
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:1
master_link_down_since_seconds:1581575827
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:7a80851b2c66738f3abba509b2598bcd0e3f07f4
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

Verify the master state

172.222.2.117:6379> info replication
# Replication
role:master
connected_slaves:0
master_replid:5ed4cbd89785d4635e35f8bcf632285677cf8246
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

Verify the cluster state

172.222.2.117:6379> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:2
cluster_stats_messages_ping_sent:5495
cluster_stats_messages_pong_sent:4806
cluster_stats_messages_meet_sent:2
cluster_stats_messages_sent:10303
cluster_stats_messages_ping_received:4803
cluster_stats_messages_pong_received:5497
cluster_stats_messages_meet_received:3
cluster_stats_messages_received:10303
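For monitoring, the `cluster info` text can be turned into a dictionary and checked automatically. The sketch below works on the plain text only, so it runs without a Redis connection; `parse_cluster_info` and `is_healthy` are hypothetical helper names, and which fields count as "healthy" is an assumption based on the output above.

```python
def parse_cluster_info(text: str) -> dict:
    """Turn 'key:value' lines from CLUSTER INFO into a dict of strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        info[key] = value
    return info


def is_healthy(info: dict, expected_nodes: int) -> bool:
    """Assumed health rule: state ok, all 16384 slots fine, node count matches."""
    return (
        info.get("cluster_state") == "ok"
        and info.get("cluster_slots_assigned") == "16384"
        and info.get("cluster_slots_ok") == "16384"
        and info.get("cluster_slots_pfail") == "0"
        and info.get("cluster_slots_fail") == "0"
        and int(info.get("cluster_known_nodes", "0")) == expected_nodes
    )


SAMPLE = """cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
"""

if __name__ == "__main__":
    print(is_healthy(parse_cluster_info(SAMPLE), expected_nodes=6))
```

In practice the text would come from running `cluster info` through redis-cli or a client library; the parsing step stays the same.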

View the node relationships in the cluster

172.222.2.107:6379> cluster  nodes
b1ed14b586e20cde9e08da6bef39e0107c705ffa 172.222.2.127:6379@16379 master - 0 1581585431269 3 connected 10923-16383
34152ec02a70206be59f41133d23f2e6076dc056 172.222.2.117:6379@16379 master - 0 1581585432000 2 connected 5461-10922
01f29f0c1eb9d203c3affad66885a5e9b53f5cf6 172.222.2.10:6379@16379 slave b1ed14b586e20cde9e08da6bef39e0107c705ffa 0 1581585432275 4 connected
ad8963b59fa501819fd70b45776a0106ef6b2ec1 172.222.2.107:6379@16379 myself,master - 0 1581585430000 1 connected 0-5460
c43cdbc5afacb9ff21c1b48f03e1f9cbad9cfdb0 172.222.2.11:6379@16379 slave ad8963b59fa501819fd70b45776a0106ef6b2ec1 0 1581585431268 5 connected
41ddaa8d7903bd8c9e18640d991486baac31e74b 172.222.2.12:6379@16379 slave 34152ec02a70206be59f41133d23f2e6076dc056 0 1581585431570 6 connected
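Each `cluster nodes` line has a fixed field order: node ID, address, flags, master ID (or `-`), two timestamps, config epoch, link state, and then the slot ranges for masters. A minimal parser, assuming exactly the layout shown above (`parse_cluster_nodes` is a hypothetical name for this example):

```python
def parse_cluster_nodes(text: str) -> list:
    """Parse CLUSTER NODES output into a list of dicts."""
    nodes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # skip blank or malformed lines
        flags = parts[2].split(",")
        # Address looks like ip:port@cluster_bus_port
        host, port = parts[1].split("@")[0].rsplit(":", 1)
        nodes.append({
            "id": parts[0],
            "host": host,
            "port": int(port),
            "role": "master" if "master" in flags else "slave",
            "myself": "myself" in flags,
            "master_id": None if parts[3] == "-" else parts[3],
            "link_state": parts[7],
            "slots": parts[8:],
        })
    return nodes


SAMPLE = (
    "b1ed14b586e20cde9e08da6bef39e0107c705ffa 172.222.2.127:6379@16379 "
    "master - 0 1581585431269 3 connected 10923-16383\n"
    "01f29f0c1eb9d203c3affad66885a5e9b53f5cf6 172.222.2.10:6379@16379 "
    "slave b1ed14b586e20cde9e08da6bef39e0107c705ffa 0 1581585432275 4 connected\n"
)

if __name__ == "__main__":
    for node in parse_cluster_nodes(SAMPLE):
        print(node["host"], node["role"], node["master_id"], node["slots"])
```

Matching each slave's `master_id` against the master IDs gives the master/slave pairing that the rest of this article checks by eye.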

Make sure replication works for every master/slave pair. Example log from a slave that cannot sync:

# tail -f /apps/redis/logs/redis.log 
7625:S 15 Feb 00:02:11.807 # Error condition on socket for SYNC: Connection refused
7625:S 15 Feb 00:02:12.809 * Connecting to MASTER 172.222.2.12:6379
7625:S 15 Feb 00:02:12.809 * MASTER <-> SLAVE sync started
7625:S 15 Feb 00:02:12.810 # Error condition on socket for SYNC: Connection refused
7625:S 15 Feb 00:02:13.811 * Connecting to MASTER 172.222.2.12:6379
7625:S 15 Feb 00:02:13.811 * MASTER <-> SLAVE sync started
7625:S 15 Feb 00:02:13.812 # Error condition on socket for SYNC: Connection refused

Solution: set masterauth temporarily from the console session

# /apps/redis/bin/redis-cli 
172.222.2.12:6379> auth 123456
OK
172.222.2.12:6379> CONFIG set masterauth 123456
OK

Check the slave log again

# tail -f /apps/redis/logs/redis.log   #confirm that data now syncs normally
22536:S 15 Feb 00:01:54.368 * Connecting to MASTER 172.222.2.117:6379
22536:S 15 Feb 00:01:54.368 * MASTER <-> SLAVE sync started
22536:S 15 Feb 00:01:54.369 * Non blocking connect for SYNC fired the event.
22536:S 15 Feb 00:01:54.369 * Master replied to PING, replication can continue...
22536:S 15 Feb 00:01:54.369 * Partial resynchronization not possible (no cached master)
22536:S 15 Feb 00:01:54.369 * Full resync from master: dc8f1d6513d4247a2edb0f6cb62c0a2037a1536b:938
22536:S 15 Feb 00:01:54.460 * MASTER <-> SLAVE sync: receiving 177 bytes from master
22536:S 15 Feb 00:01:54.460 * MASTER <-> SLAVE sync: Flushing old data
22536:S 15 Feb 00:01:54.460 * MASTER <-> SLAVE sync: Loading DB in memory
22536:S 15 Feb 00:01:54.460 * MASTER <-> SLAVE sync: Finished with success

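Verify writing keys to the cluster

172.222.2.127:6379> set key2 value2 #the slot is computed from the key, and the write must go to the node that owns it
(error) MOVED 4998 172.222.2.107:6379  #the slot is not on the current node, so the write is rejected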

172.222.2.117:6379> set key2 value2
(error) MOVED 4998 172.222.2.107:6379

172.222.2.107:6379> set key2 value2  #the node that owns the slot accepts the write
OK


172.222.2.107:6379> KEYS *
1) "key2"
172.222.2.117:6379> KEYS *
(empty list or set)
172.222.2.127:6379> KEYS *
(empty list or set)
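A `MOVED` reply carries everything a client needs to retry: the slot number and the address of the node that owns it. Cluster-aware clients (and `redis-cli -c`) follow this redirect automatically. A minimal sketch of the parsing step, with `parse_moved` as a hypothetical helper name:

```python
def parse_moved(error: str):
    """Return (slot, host, port) for a 'MOVED slot host:port' reply, else None."""
    parts = error.split()
    if len(parts) != 3 or parts[0] != "MOVED":
        return None
    host, port = parts[2].rsplit(":", 1)
    return int(parts[1]), host, int(port)


if __name__ == "__main__":
    reply = "MOVED 4998 172.222.2.107:6379"
    redirect = parse_moved(reply)
    if redirect is not None:
        slot, host, port = redirect
        print(f"retry on {host}:{port} (slot {slot})")
```

A client would reconnect to the returned address, reissue the command, and usually cache the slot-to-node mapping so later keys in that slot go to the right node on the first try.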

Cluster state verification and monitoring

#redis-trib.rb check 172.222.2.107:6379
>>> Performing Cluster Check (using node 172.222.2.107:6379)
M: ad8963b59fa501819fd70b45776a0106ef6b2ec1 172.222.2.107:6379
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
M: b1ed14b586e20cde9e08da6bef39e0107c705ffa 172.222.2.127:6379
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
M: 34152ec02a70206be59f41133d23f2e6076dc056 172.222.2.117:6379
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: 01f29f0c1eb9d203c3affad66885a5e9b53f5cf6 172.222.2.10:6379
   slots: (0 slots) slave
   replicates b1ed14b586e20cde9e08da6bef39e0107c705ffa
S: c43cdbc5afacb9ff21c1b48f03e1f9cbad9cfdb0 172.222.2.11:6379
   slots: (0 slots) slave
   replicates ad8963b59fa501819fd70b45776a0106ef6b2ec1
S: 41ddaa8d7903bd8c9e18640d991486baac31e74b 172.222.2.12:6379
   slots: (0 slots) slave
   replicates 34152ec02a70206be59f41133d23f2e6076dc056
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
#redis-trib.rb info 172.222.2.107:6379
172.222.2.107:6379 (ad8963b5...) -> 0 keys | 5461 slots | 1 slaves.
172.222.2.127:6379 (b1ed14b5...) -> 0 keys | 5461 slots | 1 slaves.
172.222.2.117:6379 (34152ec0...) -> 0 keys | 5462 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.

Redis 5:

redis-cli -a 123456 --cluster check 172.222.2.107:6379

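Redis Cluster node maintenance

After a cluster has been running for a long time, hardware failures, network changes, or business growth will sooner or later require adjustments to it, such as adding Redis nodes, removing nodes, migrating nodes, or replacing servers.

Adding and removing nodes both involve reassigning existing slots and migrating data.

Cluster maintenance: adding nodes dynamically

Add 172.222.2.13 as a master and 172.222.2.14 as its slave. Both must run the same Redis version and configuration as the existing nodes. Start the two Redis nodes; the procedure must not disrupt the service or lose data.

add-node new_host:new_port existing_host:existing_port
The first argument is the IP and port of the new Redis node; the second is the IP and port of a master already in the cluster. A newly added node is a master by default but owns no slots, so slots have to be reassigned to it.

Add the master node

Redis 4: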
#redis-trib.rb add-node 172.222.2.13:6379 172.222.2.107:6379

Redis 5:

#redis-cli -a 123456 --cluster add-node 172.222.2.13:6379 172.222.2.107:6379

#redis-trib.rb add-node 172.222.2.13:6379 172.222.2.107:6379
>>> Adding node 172.222.2.13:6379 to cluster 172.222.2.107:6379
>>> Performing Cluster Check (using node 172.222.2.107:6379)
M: ad8963b59fa501819fd70b45776a0106ef6b2ec1 172.222.2.107:6379
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
M: b1ed14b586e20cde9e08da6bef39e0107c705ffa 172.222.2.127:6379
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
M: 34152ec02a70206be59f41133d23f2e6076dc056 172.222.2.117:6379
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: 01f29f0c1eb9d203c3affad66885a5e9b53f5cf6 172.222.2.10:6379
   slots: (0 slots) slave
   replicates b1ed14b586e20cde9e08da6bef39e0107c705ffa
S: c43cdbc5afacb9ff21c1b48f03e1f9cbad9cfdb0 172.222.2.11:6379
   slots: (0 slots) slave
   replicates ad8963b59fa501819fd70b45776a0106ef6b2ec1
S: 41ddaa8d7903bd8c9e18640d991486baac31e74b 172.222.2.12:6379
   slots: (0 slots) slave
   replicates 34152ec02a70206be59f41133d23f2e6076dc056
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 172.222.2.13:6379 to make it join the cluster.
[OK] New node added correctly.

Reassigning slots

After a host has been added to the cluster it must be resharded; without slots it cannot accept any writes.

Check the current state

#redis-trib.rb check 172.222.2.107:6379  #current state: the newly added master has no slots
#redis-trib.rb reshard 172.222.2.107:6379 #reshard

Redis 5:
#redis-cli -a 123456 --cluster check 172.222.2.107:6379
# redis-cli -a 123456 --cluster reshard  172.222.2.107:6379


#Assign slots to the newly added host:
How many slots do you want to move (from 1 to 16384)? 4096 #number of slots to move

What is the receiving node ID? 111b4b3d85558858e00966f1cd51af5d04523ddf #ID of the receiving server; enter the node ID of 172.222.2.13:6379

Source node #1: all #which source hosts give slots to 172.222.2.13:6379; all picks from every Redis master automatically. When removing a host from the cluster, this prompt can instead name that host so that all of its slots move to other Redis hosts


Do you want to proceed with the proposed reshard plan (yes/no)? yes #confirm the plan

Slot migration in progress

Moving slot 11493 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11494 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11495 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11496 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11497 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11498 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11499 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11500 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11501 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11502 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11503 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11504 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11505 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11506 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11507 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11508 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11509 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11510 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11511 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11512 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11513 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11514 from 172.222.2.127:6379 to 172.222.2.13:6379:
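With `all` as the source, the 4096 requested slots are taken from the existing masters in proportion to how many each one owns. The sketch below is an assumption about that rule (largest master rounds up, the rest round down), not the tool's actual code; it is included because it reproduces the 1365/1366/1365 split that is visible in the `cluster nodes` output later in this article. `reshard_plan` is a hypothetical name.

```python
import math


def reshard_plan(source_slots: dict, slots_to_move: int) -> dict:
    """Split slots_to_move across source masters proportionally to slots owned."""
    total = sum(source_slots.values())
    # Largest owner first; it is the only one whose share is rounded up.
    ordered = sorted(source_slots.items(), key=lambda item: item[1], reverse=True)
    plan = {}
    for index, (node, owned) in enumerate(ordered):
        share = slots_to_move / total * owned
        plan[node] = math.ceil(share) if index == 0 else math.floor(share)
    return plan


if __name__ == "__main__":
    before = {
        "172.222.2.107:6379": 5461,
        "172.222.2.117:6379": 5462,
        "172.222.2.127:6379": 5461,
    }
    plan = reshard_plan(before, 4096)
    for node, count in plan.items():
        print(f"{node}: gives {count}, keeps {before[node] - count}")
```

With four masters sharing 16384 slots, moving 16384 / 4 = 4096 slots to the new node leaves every master with exactly 4096.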

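Verify the cluster state after resharding

Add a slave for the new master

One more standalone Redis server has to be added to the cluster so that 172.222.2.13, which currently has no replica, is protected against failure.

Redis 3/4: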
# redis-trib.rb add-node 172.222.2.14:6379 172.222.2.107:6379

Redis 5:
# redis-cli -a 123456 --cluster add-node 172.222.2.14:6379 172.222.2.107:6379

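Change the new node's role to slave

The new node must be assigned manually as the slave of a specific master; otherwise its role stays master by default.

# /apps/redis/bin/redis-cli  -h 172.222.2.14 -a 123456 #log in to the newly added node
172.222.2.14:6379> auth 123456
OK
172.222.2.14:6379> cluster nodes #list the cluster nodes and find the ID of the target master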
c43cdbc5afacb9ff21c1b48f03e1f9cbad9cfdb0 172.222.2.11:6379@16379 slave ad8963b59fa501819fd70b45776a0106ef6b2ec1 0 1581589009656 1 connected
111b4b3d85558858e00966f1cd51af5d04523ddf 172.222.2.13:6379@16379 master - 0 1581589009154 7 connected 0-1364 5461-6826 10923-12287  #ID of the target master
ad8963b59fa501819fd70b45776a0106ef6b2ec1 172.222.2.107:6379@16379 master - 0 1581589009556 1 connected 1365-5460
d160f06bc3ca9eea0c5e8ef7872b9a61b9983912 172.222.2.14:6379@16379 myself,master - 0 1581589009000 0 connected
b1ed14b586e20cde9e08da6bef39e0107c705ffa 172.222.2.127:6379@16379 master - 0 1581589010160 3 connected 12288-16383
41ddaa8d7903bd8c9e18640d991486baac31e74b 172.222.2.12:6379@16379 slave 34152ec02a70206be59f41133d23f2e6076dc056 0 1581589009000 2 connected
01f29f0c1eb9d203c3affad66885a5e9b53f5cf6 172.222.2.10:6379@16379 slave b1ed14b586e20cde9e08da6bef39e0107c705ffa 0 1581589009053 3 connected
34152ec02a70206be59f41133d23f2e6076dc056 172.222.2.117:6379@16379 master - 0 1581589009556 2 connected 6827-10922
172.222.2.14:6379> CLUSTER REPLICATE 111b4b3d85558858e00966f1cd51af5d04523ddf  #make this node a slave of that master
OK
The command format is: cluster replicate MASTERID

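Verify the cluster state and confirm that the node is now a slave of the chosen master

Confirm that every master has a slave

Cluster maintenance: removing nodes dynamically

Adding a node means joining it to the cluster first and assigning slots afterwards. Removing a node is the reverse: first migrate the slots on the node being removed to other Redis nodes in the cluster, then delete it. If the slots on a node have not been fully migrated, deleting it fails with a message that the node still holds data.

Migrate the master's slots to another master

The source master being drained must hold no data; otherwise the migration reports an error and is aborted.

Redis 4:
# redis-trib.rb reshard 172.222.2.13:6379
# redis-trib.rb fix 172.222.2.14:6379 #use this to repair the cluster if the migration fails

Redis 5:
#redis-cli -a 123456 --cluster reshard 172.222.2.13:6379

Migrate the slots of 172.222.2.13 to 172.222.2.127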

#redis-trib.rb reshard 172.222.2.13:6379
>>> Performing Cluster Check (using node 172.222.2.13:6379)
M: 111b4b3d85558858e00966f1cd51af5d04523ddf 172.222.2.13:6379
   slots:0-1364,5461-6826,10923-12287 (4096 slots) master
   1 additional replica(s)
S: 01f29f0c1eb9d203c3affad66885a5e9b53f5cf6 172.222.2.10:6379
   slots: (0 slots) slave
   replicates b1ed14b586e20cde9e08da6bef39e0107c705ffa
S: d160f06bc3ca9eea0c5e8ef7872b9a61b9983912 172.222.2.14:6379
   slots: (0 slots) slave
   replicates 111b4b3d85558858e00966f1cd51af5d04523ddf
M: ad8963b59fa501819fd70b45776a0106ef6b2ec1 172.222.2.107:6379
   slots:1365-5460 (4096 slots) master
   1 additional replica(s)
M: b1ed14b586e20cde9e08da6bef39e0107c705ffa 172.222.2.127:6379
   slots:12288-16383 (4096 slots) master
   1 additional replica(s)
M: 34152ec02a70206be59f41133d23f2e6076dc056 172.222.2.117:6379
   slots:6827-10922 (4096 slots) master
   1 additional replica(s)
S: 41ddaa8d7903bd8c9e18640d991486baac31e74b 172.222.2.12:6379
   slots: (0 slots) slave
   replicates 34152ec02a70206be59f41133d23f2e6076dc056
S: c43cdbc5afacb9ff21c1b48f03e1f9cbad9cfdb0 172.222.2.11:6379
   slots: (0 slots) slave
   replicates ad8963b59fa501819fd70b45776a0106ef6b2ec1
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 4096 #number of slots to move off this master
What is the receiving node ID? b1ed14b586e20cde9e08da6bef39e0107c705ffa #ID of the receiving server 172.222.2.127
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:111b4b3d85558858e00966f1cd51af5d04523ddf #the server to take the 4096 slots from: the ID of 172.222.2.13
Source node #2:done #done means there are no more source masters
    Moving slot 12286 from 111b4b3d85558858e00966f1cd51af5d04523ddf
    Moving slot 12287 from 111b4b3d85558858e00966f1cd51af5d04523ddf
Do you want to proceed with the proposed reshard plan (yes/no)? yes #confirm to continue

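Migration complete

Verify that the slots have been migrated

Removing the server from the cluster

Although the slots have been migrated, the server's address is still registered in the cluster, so it also has to be removed from the cluster.

Remove the master

Redis 3/4: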
#redis-trib.rb del-node 172.222.2.13:6379 111b4b3d85558858e00966f1cd51af5d04523ddf
>>> Removing node 111b4b3d85558858e00966f1cd51af5d04523ddf from cluster 172.222.2.13:6379
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.

Redis 5:
# redis-cli -a 123456 --cluster del-node 172.222.2.13:6379 ID

Verify that the master node has been removed

#redis-trib.rb info 172.222.2.107:6379
172.222.2.107:6379 (ad8963b5...) -> 1 keys | 4096 slots | 1 slaves.
172.222.2.127:6379 (b1ed14b5...) -> 0 keys | 8192 slots | 2 slaves.
172.222.2.117:6379 (34152ec0...) -> 1 keys | 4096 slots | 1 slaves.
[OK] 2 keys in 3 masters.
0.00 keys per slot on average.

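#After the master is removed, its former slave automatically becomes a slave of another master in the cluster. If that node is no longer needed, it can be removed as well.
#Confirm that every master in the cluster has at least one slave. More are allowed, but one is the minimum needed for data backup and service high availability.

Verify the master/slave pairing in the cluster:
A Redis slave must never run on the same server as its master. Replicas have to be placed across hosts so that a single host failure cannot take down both a master and its slave. If a slave and its master do end up on the same Redis host, reassign the slave with the steps above until every pair spans two hosts.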
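Both rules (every master has at least one slave, and no slave shares a host with its master) are easy to check mechanically once the node list is available. The sketch below assumes each node is described by a small dict with `id`, `host`, `role`, and `master_id` keys; that structure and the name `replica_problems` are choices made for this example only.

```python
def replica_problems(nodes: list) -> list:
    """Return human-readable problems with the master/slave layout."""
    masters = {n["id"]: n for n in nodes if n["role"] == "master"}
    slaves_by_master = {master_id: [] for master_id in masters}
    for node in nodes:
        if node["role"] == "slave" and node["master_id"] in slaves_by_master:
            slaves_by_master[node["master_id"]].append(node)

    problems = []
    for master_id, master in masters.items():
        slaves = slaves_by_master[master_id]
        if not slaves:
            problems.append(f"master {master['host']} has no slave")
        for slave in slaves:
            if slave["host"] == master["host"]:
                problems.append(
                    f"slave of {master['host']} runs on the same host as its master"
                )
    return problems


if __name__ == "__main__":
    layout = [
        {"id": "m1", "host": "172.222.2.107", "role": "master", "master_id": None},
        {"id": "m2", "host": "172.222.2.117", "role": "master", "master_id": None},
        {"id": "s1", "host": "172.222.2.11", "role": "slave", "master_id": "m1"},
    ]
    for problem in replica_problems(layout):
        print(problem)
```

An empty result means the layout satisfies both rules; anything else names the master that still needs attention.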

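Cluster maintenance: simulating a master failure

The current architecture is three masters and three slaves, paired across hosts. The test checks whether a slave takes over automatically after its master goes down.

Write test data

Write test data to a master and verify it on the corresponding slave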

//Master 172.222.2.117
[root@centos7 ~]# redis-cli -h 172.222.2.117 -p 6379 -a 123456
Warning: Using a password with '-a' option on the command line interface may not be safe.
172.222.2.117:6379> SET key1 value1
OK
172.222.2.117:6379> KEYS *
1) "key1"

Verify the data on the slave

//Slave 172.222.2.12
[root@centos7 ~]# redis-cli -h 172.222.2.12 -p 6379 -a 123456
Warning: Using a password with '-a' option on the command line interface may not be safe.
172.222.2.12:6379> KEYS *
1) "key1"
172.222.2.12:6379> GET key1
(error) MOVED 9189 172.222.2.117:6379

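Stop the master and verify failover

After the Redis master service stops, its slave is elected master and continues to handle reads and writes.

//Master 172.222.2.117
systemctl stop redis.service

Check the slave log

tail -f /apps/redis/logs/redis_6379.log #failover takes a few seconds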

7625:S 15 Feb 00:17:14.624 * MASTER <-> SLAVE sync started
7625:S 15 Feb 00:17:14.624 # Error condition on socket for SYNC: Connection refused
7625:S 15 Feb 00:17:15.325 # Start of election delayed for 530 milliseconds (rank #0, offset 1066).
7625:S 15 Feb 00:17:15.426 # Currently unable to failover: Waiting the delay before I can start a new failover.
7625:S 15 Feb 00:17:15.626 * Connecting to MASTER 172.222.2.117:6379
7625:S 15 Feb 00:17:15.626 * MASTER <-> SLAVE sync started
7625:S 15 Feb 00:17:15.626 # Error condition on socket for SYNC: Connection refused
7625:S 15 Feb 00:17:15.927 # Starting a failover election for epoch 11.
7625:S 15 Feb 00:17:15.935 # Currently unable to failover: Waiting for votes, but majority still not reached.
7625:S 15 Feb 00:17:15.940 # Failover election won: I'm the new master. #became the new master
7625:S 15 Feb 00:17:15.940 # configEpoch set to 11 after successful failover
7625:M 15 Feb 00:17:15.940 # Setting secondary replication ID to 7f64150decace43211d248dd093cadc7e9a1ec49, valid up to offset: 1067. New replication ID is c063d71c97da69434dc56472af7a7d389d5dd00e
7625:M 15 Feb 00:17:15.941 * Discarding previously cached master state.
7625:M 15 Feb 00:17:15.941 # Cluster state changed: ok  #failover succeeded

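Note

A Redis cluster needs at least 3 masters and 3 slaves. When a master goes down, the cluster state is fail for the duration of the failover (while the slave is being promoted to master), and data cannot be read or written during that window.

30659:S 14 Feb 19:18:22.242 * FAIL message received from  6a07dfc8eeecd8b1b378b4beec3f17536bba77d3 about 774217998cb20e0e2092a2db4279faef49bdacbc
30659:S 14 Feb 19:18:22.242 # Cluster state changed: fail #the cluster is down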

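Verify reads and writes

Confirm that 172.222.2.12:6379, once promoted from slave to master, keeps serving reads and writes and that no data was lost.

Note: after the service recovers, check again that replication is healthy for every master. When the original master node is restarted, it rejoins the cluster automatically with the slave role.

//Master 172.222.2.12
[root@centos7 ~]# redis-cli -h 172.222.2.12 -p 6379 -a 123456
Warning: Using a password with '-a' option on the command line interface may not be safe.
172.222.2.12:6379> KEYS *
1) "key1"
172.222.2.12:6379> GET key1 #a slave redirects GET by default, so a successful GET shows the promotion worked
"value1"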

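Cluster maintenance: importing existing Redis data

The Redis cluster must not contain keys with the same names as the data being imported; otherwise the import fails or is interrupted.

Preparing the environment

Before importing, disable password authentication on every Redis server, including each cluster node and the source Redis server, so that mismatched authentication does not block the import. The --cluster-replace option can be added to force overwriting keys that already exist in the Redis cluster.

Disable password authentication on each Redis server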

# redis-cli -h master1 -p 6379 -a 123456
Warning: Using a password with '-a' option on the command line interface may not be safe.
172.222.2.117:6379> CONFIG SET requirepass ""
OK
----------------------------------------------
# redis-cli -h master2 -p 6379 -a 123456
Warning: Using a password with '-a' option on the command line interface may not be safe.
172.222.2.127:6379> CONFIG SET requirepass ""
OK
----------------------------------------------
# redis-cli -h master3 -p 6379 -a 123456
Warning: Using a password with '-a' option on the command line interface may not be safe.
172.222.2.107:6379> CONFIG SET requirepass ""
OK
----------------------------------------------
# redis-cli -h slave1 -p 6379 -a 123456
Warning: Using a password with '-a' option on the command line interface may not be safe.
172.222.2.10:6379> CONFIG SET requirepass ""
OK
----------------------------------------------
# redis-cli -h slave2 -p 6379 -a 123456
Warning: Using a password with '-a' option on the command line interface may not be safe.
172.222.2.11:6379> CONFIG SET requirepass ""
OK
----------------------------------------------
# redis-cli -h slave3 -p 6379 -a 123456
Warning: Using a password with '-a' option on the command line interface may not be safe.
172.222.2.12:6379> CONFIG SET requirepass ""
OK
----------------------------------------------

Run the import

Import the data from the source Redis server directly into the Redis cluster.

Redis 4:
#redis-trib.rb import --from <external-host-IP>:6379 --replace <any-cluster-node-IP>:6379

Error 1:
# redis-trib.rb import --from 172.222.2.99:6379 --replace 172.222.2.117:6379
>>> Importing data from 172.222.2.99:6379 to cluster  #cannot connect to the cluster node
[ERR] Sorry, can't connect to node 172.222.2.117:6379

#Fix:
# vim /usr/local/lib/ruby/gems/2.5.0/gems/redis-4.1.3/lib/redis/client.rb
:password => nil, #set to nil

Error 2:
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Connecting to the source Redis instance
*** Importing 4 keys from DB 0
Migrating mem3 to 172.17.0.3:6379: ERR Syntax error, try CLIENT (LIST | KILL | GETNAME | SETNAME | PAUSE | REPLY)
Migrating mem1 to 172.17.0.6:6379: ERR Syntax error, try CLIENT (LIST | KILL | GETNAME | SETNAME | PAUSE | REPLY)
Migrating mem2 to 172.17.0.3:6379: ERR Syntax error, try CLIENT (LIST | KILL | GETNAME | SETNAME | PAUSE | REPLY)
Migrating mem to 172.17.0.6:6379: ERR Syntax error, try CLIENT (LIST | KILL | GETNAME | SETNAME | PAUSE | REPLY)

#This is caused by a version mismatch in the Ruby redis gem. Redis 5.0 drops redis-trib.rb and manages the cluster directly through redis-cli. Until then, the workaround described at the link below can be used.

See https://blog.csdn.net/m0_37128231/article/details/80755478

Redis 5:
#redis-cli --cluster import <cluster-node-IP>:PORT --cluster-from <external-Redis-node-IP>:PORT --cluster-copy --cluster-replace

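Other Redis clustering solutions

Besides the official Redis Cluster, several open-source clustering solutions are available.

codis

Codis is a distributed Redis solution. For upper-layer applications, connecting to Codis Proxy is not noticeably different from connecting to a native Redis server (apart from a list of unsupported commands). Applications can use it like a standalone Redis, while Codis handles request forwarding, data migration without downtime, and similar work underneath. Everything behind the proxy is transparent to the client, which can simply treat the backend as one Redis service with effectively unlimited memory.

codis-proxy behaves like Redis: connecting to codis-proxy is the same as connecting to Redis. codis-proxy is stateless and does not record where data is stored. That mapping is kept in ZooKeeper, so the proxy looks up a key's location there and forwards the request to a group for processing. A group consists of one master and one or more slaves. Codis has 1024 slots by default, whereas Redis Cluster has 16384, and it places the contents of different slots in different groups.

GitHub: https://github.com/CodisLabs/codis/

twemproxy

Twemproxy sits between clients and backends and implements sharding on the client's behalf: it distributes reads and writes across the backend servers. It also supports memcached, and the sharding algorithm can be configured on the proxy. Its drawbacks are that twemproxy itself becomes the bottleneck and that it does not support data migration.

GitHub: https://github.com/twitter/twemproxy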
