Redis Cluster Setup and Maintenance

1 Installation and setup

1.1 Installing Redis

1.1.1 Download and build Redis

yum -y update
yum install -y gcc gcc-c++ make wget
cd /opt
wget http://download.redis.io/releases/redis-4.0.9.tar.gz
# the tarball unpacks into /opt/redis-4.0.9, the path used in the rest of this guide
tar -zxvf redis-4.0.9.tar.gz
cd /opt/redis-4.0.9
make

1.1.2 Install and upgrade Ruby

yum -y install ruby ruby-devel rubygems rpm-build
yum -y install openssl-devel
curl -L get.rvm.io | bash -s stable
source /etc/profile.d/rvm.sh
rvm install 2.4.1
rvm use 2.4.1

1.1.3 Install redis.rb

gem install redis -v 3.3.3

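1.2 Cluster configuration

1.2.1 Deployment plan

Redis is installed once on each machine. Each machine gets two configuration files and runs two processes, giving a cluster of 3 masters and 3 slaves: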
10.110.211.191:7000/7001
10.110.211.192:7002/7003
10.110.174.25:7004/7005

ssh root@10.110.211.191
mkdir -p /opt/redis-4.0.9/redis-cluster/7000

1.2.2 Instance configuration

Create the file /opt/redis-4.0.9/redis-cluster/7000/redis.conf with the following content:

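# Port: 7000, 7001, 7002, ... (one per instance)
port 7000
# The default bind address is 127.0.0.1. Change it to an IP that the other nodes can reach; otherwise they cannot connect to this port and the cluster cannot be created.
bind 10.110.211.191
# Run Redis in the background
daemonize yes
# PID file, one per instance: redis_7000.pid, redis_7001.pid, ...
pidfile /var/run/redis_7000.pid
# Enable cluster mode
cluster-enabled yes
# Cluster state file; Redis generates it automatically on first start
cluster-config-file nodes_7000.conf
# After node timeout has elapsed, a master node is considered to be failing, and can be replaced by one of its replicas.
# Similarly after node timeout has elapsed without a master node to be able to sense the majority of the other master nodes, it enters an error state and stops accepting writes.
cluster-node-timeout 5000
# Enable the AOF log if you need it; every write operation is appended to it
appendonly yes
# Log file path (the /var/log/redis directory must already exist)
logfile /var/log/redis/redis_7000.log
# RDB dump file name
dbfilename dump_7000.rdb
# Working directory; the AOF, dump and cluster-config files are stored here
dir /opt/redis-4.0.9/redis-cluster/7000/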

The other instances use the same configuration, with the port and paths changed accordingly.
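As a minimal sketch of that step, the loop below writes one redis.conf per port by substituting the port into the 7000 template above. The helper name gen_conf and the BASE_DIR variable are hypothetical, introduced only for this illustration; the directives are the ones from the template with the comments left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write <base_dir>/<port>/redis.conf for one instance.
# The directives mirror the 7000 template above (comments omitted).
gen_conf() {
  local base_dir=$1 bind_ip=$2 port=$3
  mkdir -p "$base_dir/$port"
  cat > "$base_dir/$port/redis.conf" <<EOF
port $port
bind $bind_ip
daemonize yes
pidfile /var/run/redis_$port.pid
cluster-enabled yes
cluster-config-file nodes_$port.conf
cluster-node-timeout 5000
appendonly yes
logfile /var/log/redis/redis_$port.log
dbfilename dump_$port.rdb
dir $base_dir/$port/
EOF
}

# Example: the two instances planned for 10.110.211.191.
# BASE_DIR defaults to a scratch directory so the sketch is safe to try;
# set it to /opt/redis-4.0.9/redis-cluster on a real host.
BASE_DIR=${BASE_DIR:-$(mktemp -d)}
for port in 7000 7001; do
  gen_conf "$BASE_DIR" 10.110.211.191 "$port"
done
```

Run it once per machine with that machine's IP and its two ports from the plan in 1.2.1.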

1.2.3 Starting the cluster

First start each Redis instance in turn:

ssh root@10.110.211.191
/opt/redis-4.0.9/src/redis-server /opt/redis-4.0.9/redis-cluster/7000/redis.conf
...
ssh root@10.110.174.25
/opt/redis-4.0.9/src/redis-server /opt/redis-4.0.9/redis-cluster/7005/redis.conf

Then create the cluster with redis-trib.rb:

/opt/redis-4.0.9/src/redis-trib.rb create --replicas 1 10.110.211.191:7000 10.110.211.191:7001 10.110.211.192:7002 10.110.211.192:7003 10.110.174.25:7004 10.110.174.25:7005

--replicas 1 means that every master gets one replica.

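2 Using the cluster

2.1 Command-line tools

2.1.1 Cluster information

  • cluster info: prints information about the cluster.
  • cluster nodes: lists all nodes currently known to the cluster, together with information about each node.
  • redis-trib.rb check 192.168.252.101:7000: checks the state of the cluster through the given node.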
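The output of cluster info is a list of field:value lines, so a quick health check can be scripted. The sketch below is one possible way to do it; the function name cluster_healthy is hypothetical, and it only looks at the cluster_state and cluster_slots_assigned fields.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the CLUSTER INFO text on stdin reports
# cluster_state:ok and all 16384 slots assigned.
cluster_healthy() {
  local info state assigned
  info=$(tr -d '\r')   # strip carriage returns, the reply may use CRLF line endings
  state=$(printf '%s\n' "$info" | awk -F: '$1 == "cluster_state" {print $2}')
  assigned=$(printf '%s\n' "$info" | awk -F: '$1 == "cluster_slots_assigned" {print $2}')
  [[ $state == ok && $assigned == 16384 ]]
}

# Usage against a live node (needs a running cluster):
#   redis-cli -h 10.110.211.191 -p 7000 cluster info | cluster_healthy && echo healthy
```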

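2.1.2 Node operations

  • cluster meet <ip> <port>: adds the node at ip and port to the cluster, making it a member.
  • cluster forget <node_id>: removes the node identified by node_id from the cluster.
  • cluster replicate <node_id>: makes the current node a slave of the node identified by node_id.
  • cluster saveconfig: saves the node's cluster configuration file to disk.
  • cluster failover: run on a slave of the master you want to fail over; that slave is promoted to master.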

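2.1.3 Slots

  • cluster addslots <slot> [slot ...]: assigns one or more slots to the current node.
  • cluster delslots <slot> [slot ...]: removes the assignment of one or more slots from the current node.
  • cluster flushslots: removes all slots assigned to the current node, leaving it with no slots at all.
  • cluster setslot <slot> node <node_id>: assigns slot to the node identified by node_id. If the slot is already assigned to another node, that node gives it up first and the assignment is made afterwards.
  • cluster setslot <slot> migrating <node_id>: marks slot on this node as migrating to the node identified by node_id.
  • cluster setslot <slot> importing <node_id>: marks slot as being imported into this node from the node identified by node_id.
  • cluster setslot <slot> stable: cancels an import or migration of slot.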

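2.1.4 Keys

  • cluster keyslot <key>: computes which slot key belongs to.
  • cluster countkeysinslot <slot>: returns the number of key-value pairs currently stored in slot.
  • cluster getkeysinslot <slot> <count>: returns up to count keys from slot.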
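To illustrate what cluster keyslot computes, here is a small sketch in pure bash: the CRC16 of the key (XMODEM variant) modulo 16384, hashing only the hash tag when the key contains a non-empty {...} section. The function names crc16 and hash_slot are hypothetical, and the sketch assumes ASCII keys; on a real cluster, cluster keyslot is the authoritative answer.

```shell
#!/usr/bin/env bash
# CRC16, XMODEM variant (polynomial 0x1021, initial value 0), of an ASCII string.
crc16() {
  local s=$1 crc=0 i j byte
  for ((i = 0; i < ${#s}; i++)); do
    printf -v byte '%d' "'${s:i:1}"   # numeric code of the i-th character
    crc=$(( crc ^ (byte << 8) ))
    for ((j = 0; j < 8; j++)); do
      if (( crc & 0x8000 )); then
        crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
      else
        crc=$(( (crc << 1) & 0xFFFF ))
      fi
    done
  done
  echo "$crc"
}

# Hash slot of a key: hash only the text between the first '{' and the first
# '}' after it, if that text is non-empty; otherwise hash the whole key.
hash_slot() {
  local key=$1 open='{' close='}' rest tag
  if [[ $key == *"$open"* ]]; then
    rest=${key#*"$open"}              # text after the first '{'
    if [[ $rest == *"$close"* ]]; then
      tag=${rest%%"$close"*}          # text up to the first '}' after it
      [[ -n $tag ]] && key=$tag       # an empty tag "{}" is ignored
    fi
  fi
  echo $(( $(crc16 "$key") % 16384 ))
}

hash_slot foo                # prints 12182
hash_slot '{user:1000}.foo'  # same slot as '{user:1000}.bar'
```

Keys that share a hash tag land in the same slot, which is what makes multi-key commands on them possible.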

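2.2 Routine operations

2.2.1 Adding a node

  • add master
    Start the new Redis instance, then join it to the cluster with redis-trib. The first address is the new node, the second is any existing node:
redis-trib.rb add-node 127.0.0.1:7006 127.0.0.1:7000

    Then use the reshard command to assign slots to the new node:

redis-trib.rb reshard 10.110.211.191:7000
  • add slave
    The first address is the new node, the second is an existing node:
redis-trib.rb add-node --slave --master-id 0de0233df887e024575f73f57e74a9ddaeed009d 10.110.211.192:7002 10.110.211.191:7000

    Alternatively, connect with redis-cli to any empty node that has already joined the cluster and use the replicate command to make it a slave of a given master:

cluster replicate 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e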
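For the reshard step after adding a master, redis-trib.rb asks how many slots to move. Assuming the goal is an even split of the 16384 slots, the number for the new master is simple arithmetic, sketched below. The function name slots_per_master is hypothetical.

```shell
#!/usr/bin/env bash
TOTAL_SLOTS=16384

# Slots an evenly balanced master owns when the cluster has $1 masters
# (integer division; any remainder stays with the existing masters).
slots_per_master() {
  echo $(( TOTAL_SLOTS / $1 ))
}

# Growing from 3 to 4 masters: slots to hand to the new, empty master.
slots_per_master 4   # prints 4096
```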

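2.2.2 Removing a node

  • Removing a slave node
    The address is any node in the cluster; <node-id> is the ID of the node to remove:
redis-trib.rb del-node 127.0.0.1:7000 <node-id>
  • Removing a master node
    First reshard all of the node's slots to other nodes, then remove it with del-node.
    Alternatively, use the failover command to promote one of its slaves to master and then remove it (the number of masters does not decrease):
redis-cli -h 10.110.211.191 -p 7000 cluster failover   # note: run this on the slave, not on the master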

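2.2.3 Restarting or upgrading a node

  • slave
    Stop the node, upgrade it, then start it again:
redis-cli -h 10.110.211.191 -p 7001 shutdown
redis-server /opt/redis-4.0.9/redis-cluster/7001/redis.conf
  • master
    Use cluster failover (run on one of its slaves) to turn the master into a slave, then upgrade and restart it.
    To make it a master again, run cluster failover once more, this time on the upgraded node itself, which is now a slave of the new master.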

3 Troubleshooting

3.1 Ruby version too old to install the redis gem

Upgrade Ruby with rvm as described in 1.1.2.

3.2 migrate fails

  • Symptom

    [ERR] Calling MIGRATE: ERR Syntax error, try CLIENT (LIST | KILL | GETNAME | SETNAME | PAUSE | REPLY)

  • Cause
    The installed redis.rb version is too new and is not backward compatible.

  • Fix
    Downgrade redis.rb:

gem uninstall redis --version 4.0.x
gem install redis -v 3.3.3

Reference: https://stackoverflow.com/questions/47774093/redis-cluster-reshard-err-calling-migrate-err-syntax-error

3.3 reshard fails

  • Symptom

    Check for open slots...
    [WARNING] Node 192.168.44.189:6631 has slots in importing state (12927).
    [WARNING] The following slots are open: 12927

  • Cause
    An earlier command failed and left a slot in an inconsistent state.

  • Fix
    Try redis-trib.rb fix first. If that does not help, use setslot to set the slot state to stable:

redis-cli -p 6631 CLUSTER SETSLOT 12927 STABLE

Reference: https://github.com/antirez/redis/issues/2776

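3.4 add-node fails

  • Symptom

    [ERR] Node 10.110.211.192:7002 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.

  • Cause
    Either several Redis processes on one machine share the same AOF and dump locations, or a node that used to be online reconnects after being disconnected for too long. In both cases the node's database is not empty, so it cannot join the cluster.

  • Fix
    Delete the cluster-config-file named in the node's configuration (for example nodes_7000.conf), clear the data in Redis, then add the node again:

redis-cli -h 10.110.211.192 -p 7002 flushdb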

3.5 A cluster node is in the failed state

  • Symptom

redis-cli -h 10.110.211.192 -p 7002 cluster nodes
...
141c68cc373c29fef2b33ee93b64a3425a475275 :0@0 slave,fail,noaddr 1c80515b06f44df77151fcb5dac1c0f3eb499874 1528451188725 1528451188000 13 disconnected
...

  • Cause
    After a restart the node restored its data from the dump while the cluster had not forgotten it, or the data became inconsistent for some other reason and can no longer be synchronized.

  • Fix
    On the healthy cluster nodes, forget the failed node; clear its data with flushdb; then add it again with add-node:

redis-cli -h 10.110.211.191 -p 7000 cluster forget 141c68cc373c29fef2b33ee93b64a3425a475275
redis-cli -h 10.110.211.191 -p 7001 flushdb
redis-trib.rb add-node --slave --master-id 1c80515b06f44df77151fcb5dac1c0f3eb499874 10.110.211.191:7001 10.110.211.191:7000
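To find such nodes without reading the whole listing, the cluster nodes output can be filtered on its third field, the comma-separated flags. The sketch below prints the IDs of nodes flagged fail and deliberately skips the weaker "fail?" flag. The function name failed_nodes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read CLUSTER NODES output on stdin and print the ID
# (field 1) of every node whose flags (field 3) contain "fail".
# The pattern does not match "fail?", which only marks a suspected failure.
failed_nodes() {
  awk '$3 ~ /(^|,)fail(,|$)/ {print $1}'
}

# Usage against a live node (needs a running cluster):
#   redis-cli -h 10.110.211.192 -p 7002 cluster nodes | failed_nodes
```

Each printed ID can then be passed to cluster forget on the healthy nodes.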

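3.6 Cluster creation fails

  • Symptom
    redis-trib.rb create throws an error while creating the cluster:

    ERR Slot 12730 is already busy (Redis::CommandError)

  • Cause
    An earlier failed creation attempt left the slot in the busy state.

  • Fix
    Log in to the affected node, run flushall and cluster reset soft, delete the nodes.conf file, then run the cluster creation again.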
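When several nodes are affected, the same two commands have to be repeated per node. Because flushall deletes data, the sketch below only prints the commands for review instead of running them. The function name reset_cmds is hypothetical, and deleting the nodes.conf file is left as a manual step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each host:port argument, print (not run) the
# flushall and cluster reset soft commands described above.
reset_cmds() {
  local node host port
  for node in "$@"; do
    host=${node%:*}    # part before the last ':'
    port=${node##*:}   # part after the last ':'
    echo "redis-cli -h $host -p $port flushall"
    echo "redis-cli -h $host -p $port cluster reset soft"
  done
}

reset_cmds 10.110.211.191:7000 10.110.211.191:7001
```

After checking the printed lines, they can be run by hand or piped into sh.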

4 Excerpts from the official documentation

  • Every Redis Cluster node requires two TCP connections open. The normal Redis TCP port used to serve clients, for example 6379, plus the port obtained by adding 10000 to the data port.
  • Redis Cluster does not support NATted environments. In order to make Docker compatible with Redis Cluster you need to use the host networking mode of Docker.
  • There are 16384(4k * 4) hash slots in Redis Cluster, and to compute what is the hash slot of a given key, we simply take the CRC16 of the key modulo 16384. Every node in a Redis Cluster is responsible for a subset of the hash slots.
  • Redis Cluster supports multiple key operations as long as all the keys involved into a single command execution (or whole transaction, or Lua script execution) all belong to the same hash slot. The user can force multiple keys to be part of the same hash slot by using a concept called hash tags. If there is a substring between {} brackets in a key, only what is inside the string is hashed.
  • Redis Cluster uses a master-slave model where every hash slot has from 1 (the master itself) to N replicas (N-1 additional slaves nodes).
  • Redis Cluster is not able to guarantee strong consistency. In practical terms this means that under certain conditions it is possible that Redis Cluster will lose writes that were acknowledged by the system to the client.
  • The first reason why Redis Cluster can lose writes is because it uses asynchronous replication. Redis Cluster has support for synchronous writes when absolutely needed, implemented via the WAIT command, but this usually results into prohibitively low performance. Redis Cluster does not implement strong consistency even when synchronous replication is used: it is always possible under more complex failure scenarios that a slave that was not able to receive the write is elected as master.
  • After node timeout has elapsed, a master node is considered to be failing, and can be replaced by one of its replicas. Similarly after node timeout has elapsed without a master node to be able to sense the majority of the other master nodes, it enters an error state and stops accepting writes.
  • Multiple keys operations, transactions, or Lua scripts involving multiple keys are used but only with keys having the same hash tag, which means that the keys used together all have a {…} sub-string that happens to be identical. For example the following multiple keys operation is defined in the context of the same hash tag: SUNION {user:1000}.foo {user:1000}.bar.
  • Redis Cluster configuration parameters:
    cluster-enabled <yes/no>
    cluster-config-file <filename>
    cluster-node-timeout <milliseconds>
    cluster-slave-validity-factor <factor>
    cluster-migration-barrier <count>
    cluster-require-full-coverage <yes/no>
  • A serious client is able to do better than that, and cache the map between hash slots and nodes addresses, to directly use the right connection to the right node.
  • A cluster where every master has a single replica can’t continue operations if the master and its replica fail at the same time, simply because there is no other instance to have a copy of the hash slots the master was serving.
  • The cluster will try to migrate a replica from the master that has the greatest number of replicas in a given moment to an orphaned master. To benefit from replica migration you have just to add a few more replicas to a single master in your cluster, it does not matter what master.
  • Application with multiple keys operations, transactions, or Lua scripts involving multiple keys are used with key names not having an explicit, or the same, hash tag: requires to be modified in order to don’t use multi keys operations or only use them in the context of the same hash tag.
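The first excerpt above means every instance needs two reachable TCP ports: its data port and the cluster bus port at data port plus 10000. A small sketch of that arithmetic for the ports planned in 1.2.1 follows. The function name bus_port is hypothetical.

```shell
#!/usr/bin/env bash
# Cluster bus port for a given data port: data port + 10000.
bus_port() {
  echo $(( $1 + 10000 ))
}

# Ports that must be reachable between the machines of the 3-master,
# 3-slave plan, printed as "data-port bus-port" pairs.
for port in 7000 7001 7002 7003 7004 7005; do
  echo "$port $(bus_port "$port")"
done
```

For port 7000 this gives 17000, which is also the number shown after the @ sign in cluster nodes output.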

