Redis troubleshooting notes: MISCONF Redis is configured to save RDB snapshots

MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.

 

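Redis was deployed in cluster mode. After it had been running for a long time, every write operation started failing with the error above, which made the system unusable.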

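Emergency fix: on the server, connect to the cluster with redis-cli and run the following command (CONFIG SET only affects the node you are connected to, so repeat it on every affected node):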

config set stop-writes-on-bgsave-error no

Alternatively, change the setting in redis.conf (a restart is required after editing the file):

stop-writes-on-bgsave-error no

The default is yes. Changing it to no works around the error temporarily and gets the business back online quickly.

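How the fix works:

1. Cause of the error:
Redis failed while persisting data to disk as an RDB snapshot (the reason for that failure is analyzed later).
With stop-writes-on-bgsave-error yes, Redis protects the data it already holds
by refusing further writes, so every write request returns an error and no new data is accepted.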

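2. Workaround:
Setting stop-writes-on-bgsave-error no turns this safety mechanism off,
which temporarily makes Redis writable again so the business service can be restored quickly. The snapshot itself is still failing in the background; only the write block is lifted.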
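The condition described above can be sketched in a few lines of Python. This is a simplified illustration, not Redis's actual check: the function name `writes_blocked` and the sample text are hypothetical, and only the `rdb_last_bgsave_status` field from the output of INFO persistence is taken from Redis itself.

```python
def writes_blocked(info_text: str, stop_writes_on_bgsave_error: bool = True) -> bool:
    """Simplified model: writes are refused when the safety switch is on
    and the last background save failed."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Persistence".
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    last_save_failed = fields.get("rdb_last_bgsave_status") == "err"
    return stop_writes_on_bgsave_error and last_save_failed


# Hypothetical excerpt of INFO persistence output after a failed snapshot.
SAMPLE_FAILED = "# Persistence\nrdb_changes_since_last_save:42\nrdb_last_bgsave_status:err\n"
SAMPLE_OK = "# Persistence\nrdb_changes_since_last_save:0\nrdb_last_bgsave_status:ok\n"

if __name__ == "__main__":
    print(writes_blocked(SAMPLE_FAILED))         # safety switch on
    print(writes_blocked(SAMPLE_FAILED, False))  # safety switch off
```

Passing `False` as the second argument models the workaround: the snapshot status is unchanged, but writes are no longer refused.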

 

The real fix:

Now that we know the root cause is that RDB persistence is failing, how do we fix the failure itself?


Redis background saving schema relies on the copy-on-write semantic of fork in modern operating systems: Redis forks (creates a child process) that is an exact copy of the parent. The child process dumps the DB on disk and finally exits. In theory the child should use as much memory as the parent being a copy, but actually thanks to the copy-on-write semantic implemented by most modern operating systems the parent and child process will share the common memory pages. A page will be duplicated only when it changes in the child or in the parent. Since in theory all the pages may change while the child process is saving, Linux can't tell in advance how much memory the child will take, so if the overcommit_memory setting is set to zero fork will fail unless there is as much free RAM as required to really duplicate all the parent memory pages, with the result that if you have a Redis dataset of 3 GB and just 2 GB of free memory it will fail.

Setting overcommit_memory to 1 says Linux to relax and perform the fork in a more optimistic allocation fashion, and this is indeed what you want for Redis.


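The above is the official explanation. In concrete terms:

The writes fail because BGSAVE (RDB persistence) failed. For a BGSAVE, Redis forks a child process that writes the data to disk. The exact reason for a BGSAVE failure can be found in the Redis log (by default Redis does not write a log file, and when it runs in the background the console log output is discarded; enabling the log file is covered below). In most cases, though, BGSAVE fails because the fork cannot be granted enough memory. Often this comes from a conflict with the operating system's memory overcommit handling: fork cannot allocate memory even though the machine has enough RAM available.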
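The official explanation can be turned into a toy model. The sketch below is a deliberate simplification that follows the quoted text only (the real kernel heuristic for overcommit_memory=0 is more nuanced); the function name `fork_can_succeed` and its parameters are hypothetical.

```python
GIB = 1024 ** 3


def fork_can_succeed(parent_bytes: int, free_bytes: int, overcommit_memory: int) -> bool:
    """Toy model of the quoted explanation, covering modes 0 and 1 only.

    Mode 1: the kernel allocates optimistically, so fork is allowed.
    Mode 0: simplified here to "fork needs enough free memory to duplicate
    every page of the parent", as the explanation above describes.
    """
    if overcommit_memory == 1:
        return True
    if overcommit_memory == 0:
        return free_bytes >= parent_bytes
    raise ValueError("this sketch only models overcommit_memory 0 and 1")


if __name__ == "__main__":
    # The example from the quoted text: a 3 GB dataset with 2 GB of free memory.
    print(fork_can_succeed(3 * GIB, 2 * GIB, overcommit_memory=0))
    print(fork_can_succeed(3 * GIB, 2 * GIB, overcommit_memory=1))
```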

Because Redis cannot get enough memory for the fork when it persists, the save fails. Adjusting the following kernel parameter resolves this:

Edit /etc/sysctl.conf and add:

vm.overcommit_memory=1

Then run the following command to apply the setting:

sudo sysctl -p /etc/sysctl.conf
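To double-check what a sysctl.conf-style file actually sets, a small parser is enough. This is a minimal sketch: the function name `overcommit_setting` is hypothetical, and it only inspects the file text. The value currently in effect can be read from /proc/sys/vm/overcommit_memory.

```python
from typing import Optional


def overcommit_setting(sysctl_text: str) -> Optional[int]:
    """Return the last vm.overcommit_memory value found in the text, or None."""
    value = None
    for raw in sysctl_text.splitlines():
        line = raw.strip()
        # Ignore blank lines and comment lines.
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "vm.overcommit_memory":
            value = int(val.strip())  # a later entry overrides an earlier one
    return value


# Hypothetical file content, used only for illustration.
SAMPLE_SYSCTL = "# kernel tuning\nvm.swappiness = 10\nvm.overcommit_memory=1\n"

if __name__ == "__main__":
    print(overcommit_setting(SAMPLE_SYSCTL))
```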

At this point the problem looks solved, but unfortunately only for now. We still need to tune the Redis configuration itself to keep Redis healthy in the long run.

Optimization 1: configure the log file and log level so that future problems can be analyzed from the logs:

# notice (moderately verbose, what you want in production probably)
loglevel notice
# Point this at the log location required by your project team
logfile "/data/log/redis-6379.log"
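Once the log file is enabled, failed background saves can be pulled out of it with a simple filter. The sketch below assumes the log contains messages such as "Can't save in background" or "Background saving error". The sample lines are illustrative rather than copied from a real log, and the name `bgsave_failures` is hypothetical.

```python
from typing import Iterable, List

# Substrings that indicate a failed background save (an assumption of this sketch).
FAILURE_MARKERS = ("Can't save in background", "Background saving error")


def bgsave_failures(log_lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention a failed background save."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if any(marker in line for marker in FAILURE_MARKERS)
    ]


# Illustrative log lines only.
SAMPLE_LOG = [
    "1234:M 01 Jan 2024 10:00:00.000 * 10 changes in 300 seconds. Saving...\n",
    "1234:M 01 Jan 2024 10:00:00.001 # Can't save in background: fork: Cannot allocate memory\n",
    "1234:M 01 Jan 2024 10:05:00.000 # Background saving error\n",
]

if __name__ == "__main__":
    # With a real file: bgsave_failures(open("/data/log/redis-6379.log"))
    for line in bgsave_failures(SAMPLE_LOG):
        print(line)
```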

Optimization 2: tune the memory limit and choose a suitable eviction policy.

As shown below, the INFO command in the client reports Redis memory usage:

[root@localhost redis-5.0.3]# redis-cli -h 192.168.50.222 -p 6371 -c
192.168.50.222:6371> info
# Memory
used_memory:1073485064
used_memory_human:1023.76M
used_memory_rss:1127706624
used_memory_rss_human:1.05G
used_memory_peak:1077872280
used_memory_peak_human:1.00G
used_memory_peak_perc:99.59%
used_memory_overhead:2072046
used_memory_startup:1454768
used_memory_dataset:1071413018
used_memory_dataset_perc:99.94%
allocator_allocated:1073897952
allocator_active:1118851072
allocator_resident:1134903296
total_system_memory:8186183680
total_system_memory_human:7.62G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:1073741824
maxmemory_human:1.00G
maxmemory_policy:allkeys-lru
allocator_frag_ratio:1.04
allocator_frag_bytes:44953120
allocator_rss_ratio:1.01
allocator_rss_bytes:16052224
rss_overhead_ratio:0.99
rss_overhead_bytes:-7196672
mem_fragmentation_ratio:1.05
mem_fragmentation_bytes:54263816
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:49694
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0

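In production you should always configure maxmemory, and it should generally not exceed the machine's physical memory. Because Redis keeps its data in memory, memory capacity has to be planned ahead of time to avoid running into the maxmemory limit. As a rule of thumb, once actual usage reaches 3/4 of the maximum, consider adding memory or splitting the data.

Note: the memory settings shown above come from my test environment and should not be used as reference values.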
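The 3/4 rule of thumb is easy to check automatically from the INFO output. The following is a minimal sketch: the names `parse_info` and `memory_pressure` are hypothetical, and the sample values are the used_memory and maxmemory figures from the test-environment output shown earlier.

```python
from typing import Dict, Optional, Tuple


def parse_info(info_text: str) -> Dict[str, str]:
    """Parse "key:value" lines from INFO output, skipping section headers."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_pressure(info_text: str, threshold: float = 0.75) -> Optional[Tuple[float, bool]]:
    """Return (used/max ratio, whether it reached the threshold).

    Returns None when maxmemory is 0, which means no limit is configured.
    """
    fields = parse_info(info_text)
    used = int(fields["used_memory"])
    limit = int(fields["maxmemory"])
    if limit == 0:
        return None
    ratio = used / limit
    return ratio, ratio >= threshold


SAMPLE_INFO = "# Memory\nused_memory:1073485064\nmaxmemory:1073741824\nmaxmemory_policy:allkeys-lru\n"

if __name__ == "__main__":
    ratio, over = memory_pressure(SAMPLE_INFO)
    print(f"{ratio:.2%} of maxmemory in use, over threshold: {over}")
```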

# maxmemory <bytes>  The unit is bytes; size it as a proportion of your machine's memory
# 1M = 1024 * 1024 bytes
# 1G = 1024 * 1024 * 1024 bytes
maxmemory 1073741824

# volatile-lru -> Evict using approximated LRU among the keys with an expire set.
# allkeys-lru -> Evict any key using approximated LRU.
# volatile-lfu -> Evict using approximated LFU among the keys with an expire set.
# allkeys-lfu -> Evict any key using approximated LFU.
# volatile-random -> Remove a random key among the ones with an expire set.
# allkeys-random -> Remove a random key, any key.
# volatile-ttl -> Remove the key with the nearest expire time (minor TTL)
# noeviction -> Don't evict anything, just return an error on write operations.

maxmemory-policy volatile-lru

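Choose the eviction policy that fits how your project actually uses Redis. Redis offers 8 policies, and noeviction is the default.

I use volatile-lru here because our project can tolerate data that has been inactive for a long time being dropped from Redis. Keep in mind that volatile-lru only evicts keys that have an expire set.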
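The difference between the volatile-* and allkeys-* policies is easiest to see on a toy example. The sketch below is an exact, simplified model (Redis itself uses approximated LRU based on sampling), and the `Key` class and `eviction_candidate` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Key:
    name: str
    idle_seconds: int            # time since the key was last accessed
    ttl_seconds: Optional[int]   # None means the key has no expire set


def eviction_candidate(keys: List[Key], policy: str) -> Optional[str]:
    """Pick the key a policy would evict first in this simplified model."""
    if policy == "noeviction":
        return None
    # volatile-* policies only consider keys that have an expire set.
    if policy.startswith("volatile-"):
        pool = [k for k in keys if k.ttl_seconds is not None]
    else:
        pool = list(keys)
    if not pool:
        return None
    if policy.endswith("-lru"):
        return max(pool, key=lambda k: k.idle_seconds).name
    if policy == "volatile-ttl":
        return min(pool, key=lambda k: k.ttl_seconds).name
    raise ValueError(f"policy not covered by this sketch: {policy}")


SAMPLE_KEYS = [
    Key("config", idle_seconds=9000, ttl_seconds=None),
    Key("session", idle_seconds=300, ttl_seconds=3600),
    Key("token", idle_seconds=20, ttl_seconds=60),
]

if __name__ == "__main__":
    for policy in ("allkeys-lru", "volatile-lru", "volatile-ttl", "noeviction"):
        print(policy, "->", eviction_candidate(SAMPLE_KEYS, policy))
```

In this sample, the key that has been idle the longest has no expire, so allkeys-lru and volatile-lru choose different victims.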

 

Optimization 3: tune data persistence

################################ SNAPSHOTTING  ################################
#
# Save the DB on disk:
#
#   save <seconds> <changes>
#
#   Will save the DB if both the given number of seconds and the given
#   number of write operations against the DB occurred.
#
#   In the example below the behaviour will be to save:
#   after 900 sec (15 min) if at least 1 key changed
#   after 300 sec (5 min) if at least 10 keys changed
#   after 60 sec if at least 10000 keys changed
#
#   Note: you can disable saving completely by commenting out all "save" lines.
#
#   It is also possible to remove all the previously configured save
#   points by adding a save directive with a single empty string argument
#   like in the following example:
#
#   save ""
# Trigger an RDB save if at least [1] key changed within [900] seconds
save 900 1
# Trigger an RDB save if at least [10] keys changed within [300] seconds
save 300 10
save 60 10000

Adjust these save points to what your project actually needs.
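How the save points combine can be sketched as a small predicate. This is an illustration under the assumption that a snapshot is triggered as soon as any single save point is satisfied; the names `SAVE_POINTS` and `should_bgsave` are hypothetical.

```python
from typing import Iterable, Tuple

# (seconds, changes) pairs matching the save lines above.
SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]


def should_bgsave(
    seconds_since_last_save: int,
    changes: int,
    save_points: Iterable[Tuple[int, int]] = SAVE_POINTS,
) -> bool:
    """Return True if any save point has both enough elapsed time and enough changes."""
    return any(
        changes >= min_changes and seconds_since_last_save > seconds
        for seconds, min_changes in save_points
    )


if __name__ == "__main__":
    print(should_bgsave(901, 1))     # one change, more than 15 minutes ago
    print(should_bgsave(301, 5))     # too few changes for the 5-minute point
    print(should_bgsave(30, 50000))  # many changes, but not enough time elapsed
```

Passing an empty list of save points models `save ""`: the predicate never fires, so no automatic snapshot is taken.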

############################## APPEND ONLY MODE ###############################

# By default Redis asynchronously dumps the dataset on disk. This mode is
# good enough in many applications, but an issue with the Redis process or
# a power outage may result into a few minutes of writes lost (depending on
# the configured save points).
#
# The Append Only File is an alternative persistence mode that provides
# much better durability. For instance using the default data fsync policy
# (see later in the config file) Redis can lose just one second of writes in a
# dramatic event like a server power outage, or a single write if something
# wrong with the Redis process itself happens, but the operating system is
# still running correctly.
#
# AOF and RDB persistence can be enabled at the same time without problems.
# If the AOF is enabled on startup Redis will load the AOF, that is the file
# with the better durability guarantees.
#
# Please check http://redis.io/topics/persistence for more information.

appendonly no

# The name of the append only file (default: "appendonly.aof")

appendfilename "appendonly.aof"

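The settings above are the switch that enables AOF (append-only file) persistence and the name of the persistence file. Enable or disable AOF depending on how your project actually uses Redis.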

 

Optimization 4: Important! Do not restart Redis by killing the process.

[root@localhost redis-5.0.3]# redis-cli -h 192.168.50.222 -p 6371 -c
192.168.50.222:6371> SHUTDOWN

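Always shut down through the SHUTDOWN command to keep the data intact: on SHUTDOWN, Redis finishes persisting (when save points are configured) before it exits, whereas a forced kill such as kill -9 gives it no chance to save.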

 

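Finally, once the configuration and kernel parameters have been tuned, remember to revert the temporary workaround from the beginning. We still want this setting to act as an alarm. If your team already has other Redis monitoring and alerting in place, it can stay disabled.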

stop-writes-on-bgsave-error yes

 
