Preface
As we all know, a service deployed on a single node is prone to single points of failure, which can make the whole service unavailable. Making a service highly available is therefore essential, and Redis is no exception. This article covers highly available Redis clusters from both a theoretical and a practical angle, walking you step by step through building a highly available Redis master-slave replication cluster.
The article interleaves theory and practice; if you only care about building the cluster, feel free to skip the theory sections.
Prerequisite Reading
- Redis persistence: Redis: Persistence
Lab Environment
- VMware Workstation 15
- CentOS Linux release 7.7.1908
- Redis-5.0.8
Notes
- The three nodes' IP addresses are 192.168.1.101, 192.168.1.102, and 192.168.1.103
- Make sure all three nodes can reach the internet and can communicate with one another
- Make sure basic Linux commands and tools such as `yum`, `wget`, `tar`, and the `gcc` compiler are available
- It is recommended to disable the firewall first; on CentOS 7:
firewall-cmd --state ## check firewall status; "not running" means it is already off
systemctl stop firewalld.service ## stop the firewall
systemctl disable firewalld.service ## keep the firewall from starting on boot
Single-Node Redis Installation
- Download
wget http://download.redis.io/releases/redis-5.0.8.tar.gz
- Extract
tar -zxvf redis-5.0.8.tar.gz
- Compile
cd redis-5.0.8
make
- Install
make install ## or specify a target directory with make install PREFIX=<path>; the default is /usr/local/bin
./utils/install_server.sh ## install Redis as a service; if PREFIX was set above, add that path to PATH in /etc/profile
`install_server.sh` is a script shipped with Redis. When run, it asks you for several settings: port, config file path, log file path, and data directory.
If you accept the defaults, Redis tells the instances on one host apart by port number: `install_server.sh` can be run multiple times, and each run installs one more instance.
If you accept all the defaults, the installation ends up with the following configuration:
- Port: `6379`
- Config file: `/etc/redis/6379.conf`
- Log file: `/var/log/redis_6379.log`
- Data directory: `/var/lib/redis/6379/`
- `redis-server` path: `/usr/local/bin/`
- `redis-cli` path: `/usr/local/bin/`
On success, the script prints the following:
Copied /tmp/6379.conf => /etc/init.d/redis_6379
Installing service...
Successfully added to chkconfig!
Successfully added to runlevels 345!
Starting Redis server...
Installation successful!
As the log shows, the Redis service has been started automatically.
Master-Slave Replication
Master-slave replication is a cluster mode built into Redis itself; no additional middleware is required. It is based on asynchronous replication, so it cannot guarantee strong consistency: under certain conditions the cluster may lose writes.
Cluster Topology
Let's build a cluster with one master and two slaves; the topology is as follows.
The `master` node is readable and writable and normally handles write requests, while `slave` nodes are read-only by default and therefore handle read requests. Both slaves copy their data from the `master`, so this setup is also known as read-write splitting.
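As an aside, read-write splitting is usually implemented on the client side. Below is a minimal, hypothetical Python sketch of such routing; the `Router` class, its command list, and the addresses are all illustrative, not part of Redis:

```python
from itertools import cycle

# Hypothetical node addresses matching the topology in this article.
MASTER = ("192.168.1.101", 6379)
SLAVES = [("192.168.1.102", 6379), ("192.168.1.103", 6379)]

class Router:
    """Route write commands to the master, read commands to slaves round-robin."""
    WRITE_COMMANDS = {"SET", "DEL", "EXPIRE", "LPUSH", "HSET"}

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = cycle(slaves)  # round-robin over the read replicas

    def pick(self, command):
        if command.upper() in self.WRITE_COMMANDS:
            return self.master       # writes always go to the master
        return next(self.slaves)     # reads alternate between the slaves

router = Router(MASTER, SLAVES)
print(router.pick("SET"))
print(router.pick("GET"))
print(router.pick("GET"))
```

A real client library would also handle connection pooling and failover; this sketch only shows the routing decision itself.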
Configuration
The default path of the Redis config file is `/etc/redis/6379.conf`. Open it with vi/vim and apply the following settings on all three nodes:
## Addresses to bind: the loopback address plus this node's own IP
## (on 192.168.1.102 and 192.168.1.103, substitute their own addresses)
bind 127.0.0.1 192.168.1.101
## Run in the foreground so the output is easy to observe
daemonize no
## Comment out the log file path so logs go straight to the console
# logfile /var/log/redis_6379.log
## Disable AOF persistence
appendonly no
Startup
Once configured, start the three nodes:
cd /usr/local/bin
redis-server /etc/redis/6379.conf
Setting Up the Master-Slave Relationship
Connect to each of the two `slave` nodes with the redis-cli client and run the following command on both, making them slaves of the `master`:
replicaof 192.168.1.101 6379
`replicaof` can also be written directly in the config file (this article runs it as a command for demonstration purposes):
################################# REPLICATION #################################
# Master-Replica replication. Use replicaof to make a Redis instance a copy of
# another Redis server. A few things to understand ASAP about Redis replication.
#
# +------------------+ +---------------+
# | Master | ---> | Replica |
# | (receive writes) | | (exact copy) |
# +------------------+ +---------------+
#
# 1) Redis replication is asynchronous, but you can configure a master to
# stop accepting writes if it appears to be not connected with at least
# a given number of replicas.
# 2) Redis replicas are able to perform a partial resynchronization with the
# master if the replication link is lost for a relatively small amount of
# time. You may want to configure the replication backlog size (see the next
# sections of this file) with a sensible value depending on your needs.
# 3) Replication is automatic and does not need user intervention. After a
# network partition replicas automatically try to reconnect to masters
# and resynchronize with them.
#
# replicaof <masterip> <masterport>
Before Redis 5.0, `replicaof` was called `slaveof`; the two commands are described as follows:
127.0.0.1:6379> help slaveof
SLAVEOF host port
summary: Make the server a replica of another instance, or promote it as master. Deprecated starting with Redis 5. Use REPLICAOF instead.
since: 1.0.0
group: server
127.0.0.1:6379> help replicaof
REPLICAOF host port
summary: Make the server a replica of another instance, or promote it as master.
since: 5.0.0
group: server
Once the command succeeds, `192.168.1.101` (master) prints the following log:
1817:M 16 Apr 2020 22:33:36.802 * Replica 192.168.1.102:6379 asks for synchronization
1817:M 16 Apr 2020 22:33:36.802 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'e801c600a0a2381a65e1aec22daba7db82cb02f8', my replication IDs are 'be75572b8e6624da4971aa16448600c9822fd42a' and '0000000000000000000000000000000000000000')
1817:M 16 Apr 2020 22:33:36.803 * Starting BGSAVE for SYNC with target: disk
1817:M 16 Apr 2020 22:33:36.837 * Background saving started by pid 1822
1822:C 16 Apr 2020 22:33:36.944 * DB saved on disk
1822:C 16 Apr 2020 22:33:36.944 * RDB: 6 MB of memory used by copy-on-write
1817:M 16 Apr 2020 22:33:37.038 * Background saving terminated with success
1817:M 16 Apr 2020 22:33:37.038 * Synchronization with replica 192.168.1.102:6379 succeeded
Let's walk through what `192.168.1.101` (master) did, line by line:
- Line 1: a `slave` node, `192.168.1.102:6379`, asks for synchronization
- Line 2: a full resync will be performed, because this is the first sync request (the replication IDs do not match)
- Line 3: a `BGSAVE` starts, persisting the data to disk
- Line 4: the child process with pid 1822 begins the save
- Line 5: the save completes
- Line 6: the `copy-on-write` mechanism used 6 MB of memory
The last two lines show the synchronization completed. The `master` persists its data to disk as an RDB file, then sends it to the `slave` over the network. If the `repl-diskless-sync` parameter is set to `yes`, the data skips the disk and is streamed to the `slave` directly.
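If you want to try diskless replication, the change lives in the master's config file. A minimal sketch of the relevant section of `/etc/redis/6379.conf` (the second value shown is the default):

```
# Stream the RDB over the socket instead of writing it to disk first
repl-diskless-sync yes
# Seconds to wait before starting the transfer, so more slaves can join the same sync
repl-diskless-sync-delay 5
```

Diskless sync mainly helps when disks are slow and the network is fast; the delay exists because once the transfer starts, late-arriving slaves must wait for the next one.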
Having seen the log of `192.168.1.101` (master), let's look at a `slave` log; either slave will do:
2013:S 16 Apr 2020 22:33:36.233 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
2013:S 16 Apr 2020 22:33:36.233 * REPLICAOF 192.168.1.101:6379 enabled (user request from 'id=3 addr=127.0.0.1:33550 fd=8 name= age=4 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=49 qbuf-free=32719 obl=0 oll=0 omem=0 events=r cmd=replicaof')
2013:S 16 Apr 2020 22:33:36.808 * Connecting to MASTER 192.168.1.101:6379
2013:S 16 Apr 2020 22:33:36.808 * MASTER <-> REPLICA sync started
2013:S 16 Apr 2020 22:33:36.809 * Non blocking connect for SYNC fired the event.
2013:S 16 Apr 2020 22:33:36.810 * Master replied to PING, replication can continue...
2013:S 16 Apr 2020 22:33:36.811 * Trying a partial resynchronization (request e801c600a0a2381a65e1aec22daba7db82cb02f8:1).
2013:S 16 Apr 2020 22:33:36.946 * Full resync from master: a9861cdcfdb3358ea0a3bb5a4df2895938c1c2d0:0
2013:S 16 Apr 2020 22:33:36.946 * Discarding previously cached master state.
2013:S 16 Apr 2020 22:33:37.048 * MASTER <-> REPLICA sync: receiving 175 bytes from master
2013:S 16 Apr 2020 22:33:37.048 * MASTER <-> REPLICA sync: Flushing old data
2013:S 16 Apr 2020 22:33:37.048 * MASTER <-> REPLICA sync: Loading DB in memory
2013:S 16 Apr 2020 22:33:37.048 * MASTER <-> REPLICA sync: Finished with success
The `slave` log is more verbose; it tells us the slave did the following:
- asked `192.168.1.101:6379` (master) for synchronization
- sent the sync command (`PSYNC`, first trying a partial resynchronization)
- received the `master`'s reply
- performed a full resync, receiving 175 bytes
- flushed its own old data (`Flushing old data`)
- loaded the data sent by the `master` into memory (`Loading DB in memory`)
Putting the `master` and `slave` logs together gives a rough picture of the replication process.
The complete replication process goes like this: when the `master` receives a slave's first sync request, it performs a full synchronization; during that sync it writes any data-modifying commands it executes into a buffer, and once the sync finishes it sends them to the `slave` to execute. After the first full sync, the `master` keeps streaming write commands to the `slave` to keep master and slave consistent.
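The ordering above (snapshot first, buffered writes replayed afterwards) can be sketched with a toy Python model. The `Master` and `Slave` classes here are invented purely for illustration; they are in no way Redis internals:

```python
class Slave:
    def __init__(self):
        self.data = {}

class Master:
    """Toy model of a full sync: RDB snapshot first, buffered writes after."""
    def __init__(self):
        self.data = {}
        self.sync_buffer = None      # non-None while a full sync is in progress

    def write(self, key, value):
        self.data[key] = value
        if self.sync_buffer is not None:
            self.sync_buffer.append((key, value))  # command executed during the sync

    def full_sync(self, slave):
        self.sync_buffer = []
        snapshot = dict(self.data)            # "BGSAVE": point-in-time copy
        self.write("k2", "during-sync")       # a client write arriving mid-sync
        slave.data = snapshot                 # the slave loads the RDB snapshot
        for key, value in self.sync_buffer:   # then replays the buffered commands
            slave.data[key] = value
        self.sync_buffer = None

master, slave = Master(), Slave()
master.write("k1", "before-sync")
master.full_sync(slave)
print(sorted(slave.data))  # the slave ends up with both k1 and k2
```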
Here is something to think about: while a `slave` is in the middle of a full sync, can clients query the data it holds? The `replica-serve-stale-data` parameter decides: `yes` means queries are allowed, `no` means queries fail until the sync completes.
Hands-On
First, write some data on the `master`:
192.168.1.101:6379> set key1 hello
OK
Then read it from a `slave` (note the IP address in the prompt); unsurprisingly, it is there:
192.168.1.102:6379> get key1
"hello"
What happens if we write to a `slave`? By default, slaves reject writes, so it errors out:
192.168.1.102:6379> set key2 world
(error) READONLY You can't write against a read only replica.
The `replica-read-only` parameter controls whether a `slave` accepts writes:
# You can configure a replica instance to accept writes or not. Writing against
# a replica instance may be useful to store some ephemeral data (because data
# written on a replica will be easily deleted after resync with the master) but
# may also cause problems if clients are writing to it because of a
# misconfiguration.
#
# Since Redis 2.6 by default replicas are read-only.
#
# Note: read only replicas are not designed to be exposed to untrusted clients
# on the internet. It's just a protection layer against misuse of the instance.
# Still a read only replica exports by default all the administrative commands
# such as CONFIG, DEBUG, and so forth. To a limited extent you can improve
# security of read only replicas using 'rename-command' to shadow all the
# administrative / dangerous commands.
replica-read-only yes
With that, the simplest possible master-slave cluster is up and running.
Failures
You're a grown-up programmer now; it's time you learned failure-oriented programming.
This cluster has three nodes in two roles: a `slave` can go down, and so can the `master`. Let's first look at what happens when a `slave` fails.
Slave Failure
First, take one `slave` down. With two `slave` nodes configured, a single failure does not make the whole service unavailable; we just need to repair the fault and restore the `slave` promptly. The experiment steps are:
| Time | 192.168.1.101 (`master`) | 192.168.1.102 (`slave`) |
| --- | --- | --- |
| T1 | - | goes down |
| T2 | write data | - |
| T3 | - | restarts |
Now restart the failed `slave` node:
/usr/local/bin/redis-server /etc/redis/6379.conf --replicaof 192.168.1.101 6379
Watch the `master`; it prints the following log:
2168:M 17 Apr 2020 13:38:16.282 * Replica 192.168.1.102:6379 asks for synchronization
2168:M 17 Apr 2020 13:38:16.282 * Partial resynchronization request from 192.168.1.102:6379 accepted. Sending 143 bytes of backlog starting from offset 1473.
Just two lines this time: the master received a sync request from `192.168.1.102:6379` (slave) and accepted it as a partial resynchronization, sending 143 bytes of backlog starting from offset (`offset`) 1473. In other words, the slave's reconnection did not trigger a full resync but an incremental one, transferring only the data written to the `master` during the outage.
The operation above was done with `AOF` disabled; with `AOF` enabled, the outcome is different. Let's redo it with `AOF` on. The steps are the same, the only difference being that the `slave` restarts with `AOF` enabled:
/usr/local/bin/redis-server /etc/redis/6379.conf --replicaof 192.168.1.101 6379 --appendonly yes
Watching the `master` node, we see this log:
2168:M 17 Apr 2020 13:45:21.977 * Replica 192.168.1.102:6379 asks for synchronization
2168:M 17 Apr 2020 13:45:21.977 * Full resync requested by replica 192.168.1.102:6379
2168:M 17 Apr 2020 13:45:21.977 * Starting BGSAVE for SYNC with target: disk
2168:M 17 Apr 2020 13:45:21.978 * Background saving started by pid 2306
2306:C 17 Apr 2020 13:45:22.009 * DB saved on disk
2306:C 17 Apr 2020 13:45:22.010 * RDB: 8 MB of memory used by copy-on-write
2168:M 17 Apr 2020 13:45:22.111 * Background saving terminated with success
2168:M 17 Apr 2020 13:45:22.111 * Synchronization with replica 192.168.1.102:6379 succeeded
The log shows that if a `slave` restarts with `AOF` enabled, a full resync is triggered. Even if every node had `AOF` enabled from the very start of the experiment, a full resync would still happen here.
The `slave` log below also confirms that a full resync was triggered:
2598:S 17 Apr 2020 13:45:21.967 * Ready to accept connections
2598:S 17 Apr 2020 13:45:21.968 * Connecting to MASTER 192.168.1.101:6379
2598:S 17 Apr 2020 13:45:21.968 * MASTER <-> REPLICA sync started
2598:S 17 Apr 2020 13:45:21.969 * Non blocking connect for SYNC fired the event.
2598:S 17 Apr 2020 13:45:21.971 * Master replied to PING, replication can continue...
2598:S 17 Apr 2020 13:45:21.973 * Partial resynchronization not possible (no cached master)
2598:S 17 Apr 2020 13:45:21.977 * Full resync from master: 8b57ea32e3bada6e91d3f371123cb693df2eec8b:2235
2598:S 17 Apr 2020 13:45:22.107 * MASTER <-> REPLICA sync: receiving 271 bytes from master
2598:S 17 Apr 2020 13:45:22.108 * MASTER <-> REPLICA sync: Flushing old data
2598:S 17 Apr 2020 13:45:22.122 * MASTER <-> REPLICA sync: Loading DB in memory
2598:S 17 Apr 2020 13:45:22.122 * MASTER <-> REPLICA sync: Finished with success
2598:S 17 Apr 2020 13:45:22.125 * Background append only file rewriting started by pid 2602
2598:S 17 Apr 2020 13:45:22.178 * AOF rewrite child asks to stop sending diffs.
2602:C 17 Apr 2020 13:45:22.179 * Parent agreed to stop sending diffs. Finalizing AOF...
2602:C 17 Apr 2020 13:45:22.179 * Concatenating 0.00 MB of AOF diff received from parent.
2602:C 17 Apr 2020 13:45:22.179 * SYNC append only file rewrite performed
2602:C 17 Apr 2020 13:45:22.180 * AOF rewrite: 4 MB of memory used by copy-on-write
2598:S 17 Apr 2020 13:45:22.274 * Background AOF rewrite terminated with success
2598:S 17 Apr 2020 13:45:22.274 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
2598:S 17 Apr 2020 13:45:22.275 * Background AOF rewrite finished successfully
Master Failure
Since this cluster has only one `master`, if it goes down the whole service can no longer accept writes, which effectively makes it unavailable. This is when the savior arrives. Oh wait, no: the Sentinel arrives.
Redis Sentinel is the official Redis high-availability solution for managing multiple Redis servers (instances). A Sentinel has three main jobs:
- Monitoring: Sentinel constantly checks whether your master and slave servers are working as expected.
- Notification: when a monitored Redis server runs into trouble, Sentinel can notify an administrator or another program through an API.
- Automatic failover: when a master stops working properly, Sentinel starts an automatic failover. It promotes one of the failed master's slaves (`slave`) to be the new `master` and makes the failed master's other slaves replicate the new one; when clients try to connect to the failed master, the cluster hands them the new master's address, so the cluster can carry on with the new master in place of the failed server.
If a single Sentinel instance monitored the cluster on its own, that Sentinel would itself be a single point of failure, so multiple Sentinel instances are needed. With the Sentinels added, the cluster looks like this.
26379 is `sentinel`'s default port; the three Sentinels run one per node.
Sentinel
The directory where the Redis tarball was extracted contains a `sentinel.conf` file: the Sentinel configuration. For convenience, copy it into the same directory as the Redis config file:
## copy the sentinel config file
cp sentinel.conf /etc/redis/
## edit the sentinel config file
vim /etc/redis/sentinel.conf
Only one thing needs to change: which `master` the Sentinel monitors. Since the `master` knows which `slave` nodes are connected to it, monitoring the `master` is enough. Note that all three `sentinel` nodes are configured with the `master`'s IP and port:
# sentinel monitor <master-name> <ip> <redis-port> <quorum>
#
# Tells Sentinel to monitor this master, and to consider it in O_DOWN
# (Objectively Down) state only if at least <quorum> sentinels agree.
#
# Note that whatever is the ODOWN quorum, a Sentinel will require to
# be elected by the majority of the known Sentinels in order to
# start a failover, so no failover can be performed in minority.
#
# Replicas are auto-discovered, so you don't need to specify replicas in
# any way. Sentinel itself will rewrite this configuration file adding
# the replicas using additional configuration options.
# Also note that the configuration file is rewritten when a
# replica is promoted to master.
#
# Note: master name should not include special characters or spaces.
# The valid charset is A-z 0-9 and the three characters ".-_".
sentinel monitor mymaster 192.168.1.101 6379 2
This line tells Sentinel to monitor a master named `mymaster` at IP 192.168.1.101, port 6379. The trailing 2 means at least 2 Sentinels must agree before the master is judged failed (as long as too few Sentinels agree, the automatic failover will not run).
Note, however, that no matter how many Sentinels must agree to judge a server failed, a Sentinel still needs the backing of a majority of the known Sentinels before it can initiate an automatic failover.
Precisely to keep majority and minority unambiguous, an odd number of Sentinel instances is generally used to monitor a cluster.
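The difference between the quorum (needed to declare the master down) and the majority (needed to authorize a failover) can be captured in two small functions. This is illustrative logic, not Sentinel's actual code:

```python
def master_is_odown(sdown_reports: int, quorum: int) -> bool:
    """The master is objectively down once at least `quorum` sentinels report SDOWN."""
    return sdown_reports >= quorum

def can_start_failover(votes: int, total_sentinels: int) -> bool:
    """A sentinel may lead a failover only with a strict majority of all sentinels."""
    return votes >= total_sentinels // 2 + 1

# Three sentinels with quorum 2, as configured in this article:
print(master_is_odown(sdown_reports=2, quorum=2))      # ODOWN reached
print(can_start_failover(votes=2, total_sentinels=3))  # 2 of 3 is a majority
print(can_start_failover(votes=1, total_sentinels=3))  # no failover in minority
```

Note that with an even number of sentinels (say 4), a 2-2 split satisfies neither side's majority, which is one reason odd counts are preferred.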
With the config files updated, start the three Sentinels. There are two ways to start one: run `redis-sentinel` directly, or run `redis-server --sentinel`:
redis-server /etc/redis/sentinel.conf --sentinel
The first Sentinel prints this startup log:
2873:X 17 Apr 2020 20:56:54.495 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
2873:X 17 Apr 2020 20:56:54.498 # Sentinel ID is 643817dcf5ba6d53a737782a75706a62df869e33
2873:X 17 Apr 2020 20:56:54.498 # +monitor master mymaster 192.168.1.101 6379 quorum 2
2873:X 17 Apr 2020 20:56:54.500 * +slave slave 192.168.1.102:6379 192.168.1.102 6379 @ mymaster 192.168.1.101 6379
2873:X 17 Apr 2020 20:56:54.503 * +slave slave 192.168.1.103:6379 192.168.1.103 6379 @ mymaster 192.168.1.101 6379
As you can see, the Sentinel prints its own ID and monitors `192.168.1.101 6379` (master) along with the two `slave` nodes. Now start the second one:
3031:X 17 Apr 2020 20:59:59.153 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
3031:X 17 Apr 2020 20:59:59.158 # Sentinel ID is e784d728f7a813de688ea800a88bda6aca0512ff
3031:X 17 Apr 2020 20:59:59.158 # +monitor master mymaster 192.168.1.101 6379 quorum 2
3031:X 17 Apr 2020 20:59:59.164 * +slave slave 192.168.1.102:6379 192.168.1.102 6379 @ mymaster 192.168.1.101 6379
3031:X 17 Apr 2020 20:59:59.166 * +slave slave 192.168.1.103:6379 192.168.1.103 6379 @ mymaster 192.168.1.101 6379
3031:X 17 Apr 2020 21:00:00.115 * +sentinel sentinel 643817dcf5ba6d53a737782a75706a62df869e33 192.168.1.101 26379 @ mymaster 192.168.1.101 6379
When the second Sentinel starts, it prints the same log, plus one extra line about a sentinel: the sentinel ID it prints is the first Sentinel's. In other words, while monitoring the `master`, a Sentinel discovers not only the `slave` nodes but also the other Sentinels monitoring the same `master`. Looking back at the first Sentinel's log, it now has an extra line too: the second Sentinel's ID.
With all three Sentinels in place, let's take the `master` down again.
Thirty seconds after the `master` goes down, Sentinel considers the server down; the window is set by the `sentinel down-after-milliseconds` parameter:
# sentinel down-after-milliseconds <master-name> <milliseconds>
#
# Number of milliseconds the master (or any attached replica or sentinel) should
# be unreachable (as in, not acceptable reply to PING, continuously, for the
# specified period) in order to consider it in S_DOWN state (Subjectively
# Down).
#
# Default is 30 seconds.
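Subjective down detection boils down to a timeout check, which could be sketched like this (a simplification; the constant mirrors the default `down-after-milliseconds`):

```python
DOWN_AFTER_MS = 30_000  # mirrors the default sentinel down-after-milliseconds

def is_sdown(last_ok_reply_ms: int, now_ms: int) -> bool:
    """A server is subjectively down (S_DOWN) once it has gone the whole
    window without a valid reply to PING."""
    return now_ms - last_ok_reply_ms > DOWN_AFTER_MS

print(is_sdown(last_ok_reply_ms=0, now_ms=29_000))  # False: still within the window
print(is_sdown(last_ok_reply_ms=0, now_ms=31_000))  # True: silent for over 30 s
```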
Once more than half of the Sentinels notice the `master` is down, they hold an election and choose a new `master` from the two remaining `slave` nodes. The three Sentinels' logs are as follows:
2873:X 17 Apr 2020 21:02:57.687 # +sdown master mymaster 192.168.1.101 6379
2873:X 17 Apr 2020 21:02:57.765 # +new-epoch 1
2873:X 17 Apr 2020 21:02:57.766 # +vote-for-leader a32bc56146695d9ebcbceaff2b0b8a5339c61a5b 1
2873:X 17 Apr 2020 21:02:58.326 # +config-update-from sentinel a32bc56146695d9ebcbceaff2b0b8a5339c61a5b 192.168.1.103 26379 @ mymaster 192.168.1.101 6379
2873:X 17 Apr 2020 21:02:58.326 # +switch-master mymaster 192.168.1.101 6379 192.168.1.103 6379
2873:X 17 Apr 2020 21:02:58.327 * +slave slave 192.168.1.102:6379 192.168.1.102 6379 @ mymaster 192.168.1.103 6379
2873:X 17 Apr 2020 21:02:58.327 * +slave slave 192.168.1.101:6379 192.168.1.101 6379 @ mymaster 192.168.1.103 6379
2873:X 17 Apr 2020 21:03:28.343 # +sdown slave 192.168.1.101:6379 192.168.1.101 6379 @ mymaster 192.168.1.103 6379
3031:X 17 Apr 2020 21:02:57.686 # +sdown master mymaster 192.168.1.101 6379
3031:X 17 Apr 2020 21:02:57.743 # +new-epoch 1
3031:X 17 Apr 2020 21:02:57.745 # +vote-for-leader a32bc56146695d9ebcbceaff2b0b8a5339c61a5b 1
3031:X 17 Apr 2020 21:02:57.776 # +odown master mymaster 192.168.1.101 6379 #quorum 3/2
3031:X 17 Apr 2020 21:02:57.776 # Next failover delay: I will not start a failover before Fri Apr 17 21:08:57 2020
3031:X 17 Apr 2020 21:02:58.308 # +config-update-from sentinel a32bc56146695d9ebcbceaff2b0b8a5339c61a5b 192.168.1.103 26379 @ mymaster 192.168.1.101 6379
3031:X 17 Apr 2020 21:02:58.308 # +switch-master mymaster 192.168.1.101 6379 192.168.1.103 6379
3031:X 17 Apr 2020 21:02:58.309 * +slave slave 192.168.1.102:6379 192.168.1.102 6379 @ mymaster 192.168.1.103 6379
3031:X 17 Apr 2020 21:02:58.309 * +slave slave 192.168.1.101:6379 192.168.1.101 6379 @ mymaster 192.168.1.103 6379
3031:X 17 Apr 2020 21:03:28.352 # +sdown slave 192.168.1.101:6379 192.168.1.101 6379 @ mymaster 192.168.1.103 6379
2833:X 17 Apr 2020 21:02:57.690 # +sdown master mymaster 192.168.1.101 6379
2833:X 17 Apr 2020 21:02:57.749 # +odown master mymaster 192.168.1.101 6379 #quorum 2/2
2833:X 17 Apr 2020 21:02:57.749 # +new-epoch 1
2833:X 17 Apr 2020 21:02:57.749 # +try-failover master mymaster 192.168.1.101 6379
2833:X 17 Apr 2020 21:02:57.750 # +vote-for-leader a32bc56146695d9ebcbceaff2b0b8a5339c61a5b 1
2833:X 17 Apr 2020 21:02:57.759 # 643817dcf5ba6d53a737782a75706a62df869e33 voted for a32bc56146695d9ebcbceaff2b0b8a5339c61a5b 1
2833:X 17 Apr 2020 21:02:57.759 # e784d728f7a813de688ea800a88bda6aca0512ff voted for a32bc56146695d9ebcbceaff2b0b8a5339c61a5b 1
2833:X 17 Apr 2020 21:02:57.841 # +elected-leader master mymaster 192.168.1.101 6379
2833:X 17 Apr 2020 21:02:57.841 # +failover-state-select-slave master mymaster 192.168.1.101 6379
2833:X 17 Apr 2020 21:02:57.924 # +selected-slave slave 192.168.1.103:6379 192.168.1.103 6379 @ mymaster 192.168.1.101 6379
2833:X 17 Apr 2020 21:02:57.925 * +failover-state-send-slaveof-noone slave 192.168.1.103:6379 192.168.1.103 6379 @ mymaster 192.168.1.101 6379
2833:X 17 Apr 2020 21:02:58.001 * +failover-state-wait-promotion slave 192.168.1.103:6379 192.168.1.103 6379 @ mymaster 192.168.1.101 6379
2833:X 17 Apr 2020 21:02:58.266 # +promoted-slave slave 192.168.1.103:6379 192.168.1.103 6379 @ mymaster 192.168.1.101 6379
2833:X 17 Apr 2020 21:02:58.266 # +failover-state-reconf-slaves master mymaster 192.168.1.101 6379
2833:X 17 Apr 2020 21:02:58.317 * +slave-reconf-sent slave 192.168.1.102:6379 192.168.1.102 6379 @ mymaster 192.168.1.101 6379
2833:X 17 Apr 2020 21:02:58.817 # -odown master mymaster 192.168.1.101 6379
2833:X 17 Apr 2020 21:02:59.292 * +slave-reconf-inprog slave 192.168.1.102:6379 192.168.1.102 6379 @ mymaster 192.168.1.101 6379
2833:X 17 Apr 2020 21:02:59.292 * +slave-reconf-done slave 192.168.1.102:6379 192.168.1.102 6379 @ mymaster 192.168.1.101 6379
2833:X 17 Apr 2020 21:02:59.347 # +failover-end master mymaster 192.168.1.101 6379
2833:X 17 Apr 2020 21:02:59.347 # +switch-master mymaster 192.168.1.101 6379 192.168.1.103 6379
2833:X 17 Apr 2020 21:02:59.347 * +slave slave 192.168.1.102:6379 192.168.1.102 6379 @ mymaster 192.168.1.103 6379
2833:X 17 Apr 2020 21:02:59.347 * +slave slave 192.168.1.101:6379 192.168.1.101 6379 @ mymaster 192.168.1.103 6379
2833:X 17 Apr 2020 21:03:29.355 # +sdown slave 192.168.1.101:6379 192.168.1.101 6379 @ mymaster 192.168.1.103 6379
The logs show the rough process:
- all three Sentinels notice the `master` is down and mark it `sdown`, then `odown` once the quorum agrees
- a round of voting elects `192.168.1.103:6379` as the new `master`
- the Sentinels update their config files; `192.168.1.103:6379` becomes the new `master` and the failover completes
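The promotion step can be modeled minimally in Python. This toy model is not Sentinel's algorithm (real slave selection also weighs `slave-priority` and run IDs, not just offsets); it only shows the reconfiguration, and the dictionaries are invented for the example:

```python
def failover(old_master, slaves):
    """Toy failover: promote the most up-to-date slave and repoint the rest."""
    new_master = max(slaves, key=lambda s: s["repl_offset"])
    slaves.remove(new_master)
    new_master["role"] = "master"
    for s in slaves:                              # remaining slaves follow the new master
        s["replicaof"] = new_master["addr"]
    old_master["role"] = "slave"                  # if the old master returns, it is demoted
    old_master["replicaof"] = new_master["addr"]
    return new_master

# State mirroring this article's cluster just before the failover:
slaves = [{"addr": "192.168.1.102:6379", "repl_offset": 1400, "role": "slave"},
          {"addr": "192.168.1.103:6379", "repl_offset": 1473, "role": "slave"}]
old_master = {"addr": "192.168.1.101:6379", "role": "master"}
print(failover(old_master, slaves)["addr"])  # the slave with the highest offset wins
```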
The last few log lines show that the `master` is now `192.168.1.103 6379`, the `slave` nodes are `192.168.1.102:6379` and `192.168.1.101:6379`, and `192.168.1.101:6379` is a `slave` in the `sdown` state.
Redis Sentinel distinguishes two notions of being down (`down`):
- Subjectively Down (`SDOWN`): the down judgment a single Sentinel instance makes about a server.
- Objectively Down (`ODOWN`): the judgment reached when multiple Sentinel instances each make an `SDOWN` judgment about the same server and confirm it with one another via the `SENTINEL is-master-down-by-addr` command. (One Sentinel can ask another whether it considers a given server down by sending it `SENTINEL is-master-down-by-addr`.)
Checking the Sentinel config file, the monitored node has indeed been rewritten to the new `master`:
[root@localhost redis-5.0.8]> cat /etc/redis/sentinel.conf |grep "sentinel monitor mymaster"
sentinel monitor mymaster 192.168.1.103 6379 2
The old `master` failed, but suppose the fault is now fixed and it is about to restart. When the original `192.168.1.101` (master) comes back up, will it contentedly serve as a `slave`, or fight to win back the `master` role?
To find out what happens next, tune in to the next episode. Or just check the Sentinel log, which prints:
3031:X 17 Apr 2020 21:05:32.297 * +convert-to-slave slave 192.168.1.101:6379 192.168.1.101 6379 @ mymaster 192.168.1.103 6379
It converts `192.168.1.101:6379` into a serviceable `slave`; so even though the old `master` has restarted, it does not try to reclaim the `master` role.
With that, the Sentinel-based highly available Redis cluster is finally complete.
Addendum
Let's wrap up the replication theory. The first time a `slave` starts following a `master`, it sends a `sync` request. Since Redis 2.8, synchronization is requested with the `psync [runId] [offset]` command, which supports both full and partial resynchronization. In Redis 4.0, `psync` was optimized once more.
- `runId`: a unique identifier generated whenever a Redis node starts; it changes on every restart
- `offset`: the replication offset; both `master` and `slave` record their own and each other's offsets, and a mismatch means synchronization must continue
On top of that, the `master` maintains a buffer queue, the replication backlog (`replication backlog buffer`, 1 MB by default, set with the `repl-backlog-size` parameter). While a `slave` is replicating from the `master`, if a network glitch causes commands to be lost, the `slave` asks the `master` to resend the missing command data; if the `master`'s replication backlog still holds that data, it sends it straight to the `slave`, keeping master and slave consistent.
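The backlog mechanics can be sketched as a byte buffer indexed by a global offset. The class below is a toy illustration, not Redis internals; it shows why an offset still inside the backlog allows a partial resync while an older one forces a full resync:

```python
class ReplicationBacklog:
    """Toy fixed-size replication backlog indexed by a global byte offset."""
    def __init__(self, size):
        self.size = size
        self.buffer = b""
        self.start_offset = 0   # global offset of buffer[0]

    def append(self, data):
        self.buffer += data
        if len(self.buffer) > self.size:          # evict the oldest bytes
            dropped = len(self.buffer) - self.size
            self.buffer = self.buffer[dropped:]
            self.start_offset += dropped

    def psync(self, slave_offset):
        """('partial', missing bytes) if the slave's offset is still inside
        the backlog, otherwise ('full', None) to force a full resync."""
        end_offset = self.start_offset + len(self.buffer)
        if self.start_offset <= slave_offset <= end_offset:
            return "partial", self.buffer[slave_offset - self.start_offset:]
        return "full", None

backlog = ReplicationBacklog(size=8)
backlog.append(b"ABCDEFGH")   # global offsets 0..8
print(backlog.psync(5))       # partial: only bytes from offset 5 are resent
backlog.append(b"IJKL")       # the oldest 4 bytes fall out of the backlog
print(backlog.psync(2))       # full: offset 2 is no longer buffered
```

This is also why `repl-backlog-size` matters: the longer a slave may be disconnected, the larger the backlog must be to avoid falling back to full resyncs.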
Still, the Redis 2.8 `psync` left two problems unsolved: a Redis restart triggers a full resync, and after a failover, a `slave` following the new `master` triggers a full resync.
Both problems were fixed by the Redis 4.0 `psync`, mainly by means of two replication IDs (`master_replid` and `master_replid2`).
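How the two IDs make post-failover partial resyncs possible can be sketched as follows. The functions and state dictionaries here are invented for illustration, though the field names mirror the `info replication` output:

```python
import secrets

def promote(replica_state):
    """On promotion, a slave keeps the old master's replication ID as
    master_replid2 and generates a fresh master_replid (toy model)."""
    return {
        "master_replid": secrets.token_hex(20),            # the new replication history
        "master_replid2": replica_state["master_replid"],  # the old history, still served
        "second_repl_offset": replica_state["offset"] + 1, # where the old history ends
    }

def accepts_partial(master_state, slave_replid, slave_offset):
    """A PSYNC can be partial if the slave cites the current replication ID,
    or the previous one with an offset from before the switchover."""
    if slave_replid == master_state["master_replid"]:
        return True
    return (slave_replid == master_state["master_replid2"]
            and slave_offset < master_state["second_repl_offset"])

old = {"master_replid": "8b1d6db7" * 5, "offset": 98}
new_master = promote(old)
# A slave that was replicating the failed master can still partial-sync:
print(accepts_partial(new_master, old["master_replid"], 98))
```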
All of this information can be queried with the `info replication` command.
Here is the `master` node's info:
192.168.1.103:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.1.101,port=6379,state=online,offset=98,lag=0
slave1:ip=192.168.1.102,port=6379,state=online,offset=98,lag=0
master_replid:8b1d6db7a9e63c0360ffed0ec6d3a51199f08f2b
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:98
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:98
And here is a `slave` node's:
192.168.1.101:6379> info replication
# Replication
role:slave
master_host:192.168.1.103
master_port:6379
master_link_status:up
master_last_io_seconds_ago:3
master_sync_in_progress:0
slave_repl_offset:5334
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:8b1d6db7a9e63c0360ffed0ec6d3a51199f08f2b
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:5334
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:5334
Summary
This article interleaved hands-on practice with bits of Redis replication theory, which may make it feel a little scattered. The interleaving is deliberate: it lets you connect theory with practice and build a complete body of knowledge rather than a pile of isolated facts.
Readers who only care about the experiments can skip the theory sections; it will not affect the results.
References
- http://redis.cn/topics/sentinel.html