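Setting Up a Redis Sharded Cluster
Sentinel provides high availability and scales concurrent reads, but it does not address storing very large datasets or handling highly concurrent writes. A sharded cluster (Redis Cluster) addresses both, and has the following characteristics:
- The cluster has multiple masters, and each master stores a different subset of the data
- Each master can have multiple slave nodes
- Masters monitor each other's health by exchanging ping messages
- A client can send a request to any node in the cluster; if that node does not own the key, the client is redirected to the node that does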
[TOC]
Setting up the Redis sharded cluster
- Machine plan
IP address | Role | Redis port | Operating system | Configuration file |
---|---|---|---|---|
10.30.3.231 | Master shard A1 | 7001 | CentOS 7.9 x64 | /data/redis_7001/redis_7001.conf |
10.30.3.232 | Master shard B1 | 7001 | CentOS 7.9 x64 | /data/redis_7001/redis_7001.conf |
10.30.3.233 | Master shard C1 | 7001 | CentOS 7.9 x64 | /data/redis_7001/redis_7001.conf |
10.30.3.231 | Replica shard C2 | 7002 | CentOS 7.9 x64 | /data/redis_7002/redis_7002.conf |
10.30.3.232 | Replica shard A2 | 7002 | CentOS 7.9 x64 | /data/redis_7002/redis_7002.conf |
10.30.3.233 | Replica shard B2 | 7002 | CentOS 7.9 x64 | /data/redis_7002/redis_7002.conf |
1. Install standalone Redis instances
- All nodes: install two Redis instances; the 7001 instance is planned as the master node and the 7002 instance as the slave node
cd /opt/
wget -c http://iso.sqlfans.cn/redis/redis-7.0.11.tar.gz
wget -c http://iso.sqlfans.cn/redis/install_redis_7011.sh
sh install_redis_7011.sh /data 7001
sh install_redis_7011.sh /data 7002
2. Configure the Redis sharded cluster
- 2.1. All nodes: edit the Redis configuration files to enable cluster mode
sed -i '/^cluster/d' /data/redis_7001/redis_7001.conf
echo "cluster-enabled yes" >> /data/redis_7001/redis_7001.conf
echo "cluster-config-file nodes.conf" >> /data/redis_7001/redis_7001.conf
echo "cluster-node-timeout 15000" >> /data/redis_7001/redis_7001.conf
echo "cluster-require-full-coverage no" >> /data/redis_7001/redis_7001.conf
echo "cluster-slave-validity-factor 6" >> /data/redis_7001/redis_7001.conf
cat /data/redis_7001/redis_7001.conf | grep "^cluster"
sed -i '/^cluster/d' /data/redis_7002/redis_7002.conf
echo "cluster-enabled yes" >> /data/redis_7002/redis_7002.conf
echo "cluster-config-file nodes.conf" >> /data/redis_7002/redis_7002.conf
echo "cluster-node-timeout 15000" >> /data/redis_7002/redis_7002.conf
echo "cluster-require-full-coverage no" >> /data/redis_7002/redis_7002.conf
echo "cluster-slave-validity-factor 6" >> /data/redis_7002/redis_7002.conf
cat /data/redis_7002/redis_7002.conf | grep "^cluster"
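The two command groups in step 2.1 differ only in the port, so the same settings can be applied in a loop. The following is a minimal sketch rather than the author's script: the helper name enable_cluster is hypothetical, and the paths assume the /data/redis_PORT/redis_PORT.conf layout from the machine plan.

```shell
# Hypothetical helper: rewrite the cluster-* settings of one redis.conf.
# Existing lines starting with "cluster" are dropped first, so running it
# twice leaves exactly one copy of each setting.
enable_cluster() {
    conf=$1
    tmp="$conf.tmp.$$"
    # grep -v exits non-zero when nothing is left to print; that is fine here
    grep -v '^cluster' "$conf" > "$tmp" || true
    cat >> "$tmp" <<'EOF'
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 15000
cluster-require-full-coverage no
cluster-slave-validity-factor 6
EOF
    # Write back in place so the file keeps its owner and permissions
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Apply to every instance that exists on this host (paths assumed from the plan)
for port in 7001 7002; do
    conf="/data/redis_${port}/redis_${port}.conf"
    if [ -f "$conf" ]; then
        enable_cluster "$conf"
        grep '^cluster' "$conf"
    fi
done
```

Keeping the settings in one place makes it harder for the 7001 and 7002 files to drift apart when a value such as cluster-node-timeout is tuned later.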
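- 2.2. All nodes: as with Redis Sentinel, enabling password authentication requires setting both requirepass and masterauth, preferably to the same password. In addition, do not disable the CONFIG command; otherwise the cluster cannot fail over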
cat /data/redis_7001/redis_7001.conf | grep "^masterauth" || echo "masterauth RbY9k2_NBf1QWy8I" >> /data/redis_7001/redis_7001.conf
cat /data/redis_7002/redis_7002.conf | grep "^masterauth" || echo "masterauth RbY9k2_NBf1QWy8I" >> /data/redis_7002/redis_7002.conf
sed -i "s/^rename-command CONFIG/#rename-command CONFIG/" /data/redis_7001/redis_7001.conf
sed -i "s/^rename-command CONFIG/#rename-command CONFIG/" /data/redis_7002/redis_7002.conf
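Both requirements in step 2.2 can be checked per configuration file before the services are restarted. This is a minimal sketch under stated assumptions: check_failover_prereqs is a hypothetical helper, passwords are assumed to be written unquoted, and a later line in the file is assumed to override an earlier one.

```shell
# Hypothetical pre-flight check for one redis.conf. Returns 0 when
# requirepass and masterauth are both set to the same value and the
# CONFIG command has not been renamed; prints a reason otherwise.
check_failover_prereqs() {
    conf=$1
    status=0
    # Keep the last occurrence of each directive
    req=$(awk '$1 == "requirepass" { v = $2 } END { print v }' "$conf")
    mas=$(awk '$1 == "masterauth" { v = $2 } END { print v }' "$conf")
    if [ -z "$req" ] || [ "$req" != "$mas" ]; then
        echo "$conf: requirepass and masterauth are unset or differ"
        status=1
    fi
    if grep -q '^rename-command[[:space:]][[:space:]]*CONFIG' "$conf"; then
        echo "$conf: CONFIG is renamed or disabled"
        status=1
    fi
    return "$status"
}

# Paths assumed from the machine plan
for port in 7001 7002; do
    conf="/data/redis_${port}/redis_${port}.conf"
    if [ -f "$conf" ]; then
        check_failover_prereqs "$conf" && echo "$conf: ok"
    fi
done
```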
- 2.3. All nodes: restart the Redis services
/usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 127.0.0.1 -p 7001 shutdown
/usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 127.0.0.1 -p 7002 shutdown
sudo -u redis /usr/local/bin/redis-server /data/redis_7001/redis_7001.conf
sudo -u redis /usr/local/bin/redis-server /data/redis_7002/redis_7002.conf
sleep 2 && netstat -lnpt | grep redis
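- 2.4. Any node: create the cluster (6 nodes allocated as 3 masters and 3 slaves). In testing, the first port on each IP became a master node, but the master-slave pairing cannot be specified at creation time (it can be adjusted manually afterwards if needed)
Note: before Redis 5.0, creating a cluster required installing Ruby and using redis-trib.rb; from 5.0 onward, redis-cli can be used directly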
/usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster create 10.30.3.231:7001 10.30.3.232:7001 10.30.3.233:7001 10.30.3.231:7002 10.30.3.232:7002 10.30.3.233:7002 --cluster-replicas 1
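The order of the node arguments matters here, since the first port on each IP was observed to become a master. The sketch below builds the argument list in that order from the machine plan. build_node_list is a hypothetical helper, and the command is only printed for review, with the password left as a placeholder.

```shell
# Hypothetical helper: print host:port pairs with every host's first port
# before any host's second port, matching the order used in step 2.4.
build_node_list() {
    hosts=$1
    ports=$2
    list=""
    for port in $ports; do
        for host in $hosts; do
            list="$list $host:$port"
        done
    done
    # Strip the leading space
    echo "${list# }"
}

NODES=$(build_node_list "10.30.3.231 10.30.3.232 10.30.3.233" "7001 7002")
# Print the command instead of running it
echo "/usr/local/bin/redis-cli -a <password> --cluster create $NODES --cluster-replicas 1"
```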
3. Verify the cluster state
- 3.1. Any node: run cluster info to verify the cluster state
[root@redis01 ~]# echo "cluster info" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | egrep "(cluster_state)"
cluster_state:ok
- 3.2. Any node: run cluster nodes to confirm the master-slave pairing and the distribution of hash slots (held by the master nodes)
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
117d2a0bcc6795ad64ca037219f6832d5cdf96a8 10.30.3.231:7001@17001 myself,master - 0 1690362415000 1 connected 0-5460
22f2a7ce9e12a80d8a542514723cf3a4ed815bba 10.30.3.232:7001@17001 master - 0 1690362414134 2 connected 5461-10922
a9d8bc51d2433a2fa8bfd29894b11c918ad70382 10.30.3.233:7001@17001 master - 0 1690362413130 3 connected 10923-16383
efb5846fa366a330bfa2106d9e6d4014d4e91e42 10.30.3.231:7002@17002 slave a9d8bc51d2433a2fa8bfd29894b11c918ad70382 0 1690362416141 3 connected
66f8d3ff88e633f92e4be4a85aafad84e30fc38e 10.30.3.232:7002@17002 slave 117d2a0bcc6795ad64ca037219f6832d5cdf96a8 0 1690362415137 1 connected
df3c9677abbbd33f96d5bebc59f7f09bb3d891d9 10.30.3.233:7002@17002 slave 22f2a7ce9e12a80d8a542514723cf3a4ed815bba 0 1690362417144 2 connected
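In the cluster nodes output, each slave line carries the node ID of its master in the fourth field, which is hard to read at a glance. The following minimal sketch rewrites that as address pairs; replica_map is a hypothetical helper that reads the output on stdin and assumes the field layout shown in step 3.2.

```shell
# Hypothetical helper: read `cluster nodes` output on stdin and print one
# "master <- slave" line per slave, using ip:port instead of node IDs.
replica_map() {
    awk '
        { split($2, a, "@"); addr[$1] = a[1] }     # node ID -> ip:port
        $3 ~ /slave/ { master_of[a[1]] = $4 }       # slave ip:port -> master ID
        END {
            for (s in master_of) print addr[master_of[s]] " <- " s
        }
    ' | sort
}

# Usage against a live node (connection options as in step 3.2):
#   /usr/local/bin/redis-cli -a <password> -p 7001 cluster nodes 2>/dev/null | replica_map
```

Applied to the sample output in step 3.2, it pairs 10.30.3.231:7001 with 10.30.3.232:7002, 10.30.3.232:7001 with 10.30.3.233:7002, and 10.30.3.233:7001 with 10.30.3.231:7002, so every slave sits on a different host from its master, as in the machine plan.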
- 3.3. Any node: run --cluster check against any node of the cluster to check the cluster state (in this example, the 7001 nodes are all masters)
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 | egrep "(keys|covered)"
10.30.3.231:7001 (52ab7e31...) -> 0 keys | 5461 slots | 1 slaves.
10.30.3.232:7001 (8353531f...) -> 0 keys | 5462 slots | 1 slaves.
10.30.3.233:7001 (9a878709...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.
[OK] All 16384 slots covered.
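The per-master slot counts reported by --cluster check can be reproduced from cluster nodes by adding up the slot ranges: 0-5460 is 5461 slots, 5461-10922 is 5462, and 10923-16383 is 5461, for 16384 in total. The sketch below does that sum; count_slots is a hypothetical helper that reads raw cluster nodes output on stdin.

```shell
# Hypothetical helper: read `cluster nodes` output on stdin and print the
# number of hash slots assigned to masters. Slot entries start at field 9
# and are either a single slot ("42") or a range ("0-5460").
count_slots() {
    awk '
        $3 ~ /master/ {
            for (i = 9; i <= NF; i++) {
                if ($i ~ /^\[/) continue          # skip migrating/importing markers
                n = split($i, r, "-")
                total += (n == 2) ? r[2] - r[1] + 1 : 1
            }
        }
        END { print total + 0 }
    '
}

# Usage against a live node:
#   /usr/local/bin/redis-cli -a <password> -p 7001 cluster nodes 2>/dev/null | count_slots
```

A result below 16384 means some slots have no owner, which is the situation the "All 16384 slots covered" line rules out.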
4. Disaster drill - simulated master failure
- 4.1. Any node: write data through a master (port 7001) and read it back (in this example the key is stored on 10.30.3.232:7001)
[root@redis01 ~]# echo "set dba kevin" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [5732] located at 10.30.3.232:7001
OK
[root@redis01 ~]# echo "get dba" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [5732] located at 10.30.3.232:7001
"kevin"
- 4.2. Failed node: shut down the master that holds the test key (10.30.3.232:7001 in this example) to simulate a master outage
[root@redis02 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 10.30.3.232 -p 7001 shutdown
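- 4.3. Healthy node: run cluster info and confirm the state is cluster_state:ok, then run cluster nodes and confirm that the master taken down is marked disconnected. Four master entries are listed at this point: the newly promoted master (10.30.3.233:7002 in this example) was the slave of the failed node, and the failed master's hash slots (5461-10922 in this example) have moved to it. Summary: when any master is stopped, the cluster promotes that node's slave to master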
[root@redis01 ~]# echo "cluster info" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | egrep "(cluster_state)"
cluster_state:ok
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | grep master
117d2a0bcc6795ad64ca037219f6832d5cdf96a8 10.30.3.231:7001@17001 myself,master - 0 1690362859000 1 connected 0-5460
22f2a7ce9e12a80d8a542514723cf3a4ed815bba 10.30.3.232:7001@17001 master,fail - 1690362473321 1690362467295 2 disconnected #.failed node in the fail state; its hash slots have moved to the new master
df3c9677abbbd33f96d5bebc59f7f09bb3d891d9 10.30.3.233:7002@17002 master - 0 1690362860000 7 connected 5461-10922 #.new master, which took over the failed node's hash slots (5461-10922)
a9d8bc51d2433a2fa8bfd29894b11c918ad70382 10.30.3.233:7001@17001 master - 0 1690362860946 3 connected 10923-16383
[root@redis01 ~]# echo "get dba" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [5732] located at 10.30.3.233:7002
"kevin"
[root@redis01 ~]# echo "set dba kevin2" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [5732] located at 10.30.3.233:7002
OK
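The slot number in the redirect messages comes from the key itself: Redis Cluster maps a key to CRC16(key) mod 16384, and whichever node currently owns that slot serves the key. The sketch below recomputes the slot for a key and looks up its owner in cluster nodes output. Both function names are hypothetical, the CRC16 is the XMODEM variant (polynomial 0x1021, initial value 0), and hash tags ({...}) are deliberately not handled.

```shell
# Hypothetical helper: hash slot of a key = CRC16-XMODEM(key) mod 16384.
# Hash tags ("{...}") are not handled in this sketch.
key_slot() {
    crc=0
    # od prints each byte of the key as an unsigned decimal number
    for byte in $(printf '%s' "$1" | od -An -v -tu1); do
        crc=$(( crc ^ (byte << 8) ))
        bit=0
        while [ "$bit" -lt 8 ]; do
            if [ $(( crc & 0x8000 )) -ne 0 ]; then
                crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
            else
                crc=$(( (crc << 1) & 0xFFFF ))
            fi
            bit=$(( bit + 1 ))
        done
    done
    echo $(( crc % 16384 ))
}

# Hypothetical helper: read `cluster nodes` output on stdin and print the
# ip:port of the node whose slot list contains the given slot.
slot_owner() {
    awk -v slot="$1" '
        {
            for (i = 9; i <= NF; i++) {
                if ($i ~ /^\[/) continue           # skip migrating/importing markers
                n = split($i, r, "-")
                lo = r[1] + 0
                hi = (n == 2) ? r[2] + 0 : lo
                if (slot + 0 >= lo && slot + 0 <= hi) {
                    split($2, a, "@")
                    print a[1]
                }
            }
        }
    '
}

key_slot dba
```

For the key dba this yields 5732, the slot shown in the redirect messages. The key's slot never changes; only the owner of the range 5461-10922 does, which is why the same key is served by 10.30.3.232:7001 before the failover and by 10.30.3.233:7002 after it.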
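- 4.4. Failed node: once testing is complete, start the stopped process again; it rejoins the cluster as a slave of the new master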
[root@redis02 ~]# sudo -u redis /usr/local/bin/redis-server /data/redis_7001/redis_7001.conf
5. Disaster drill - simulated slave failure
- 5.1. Any node: write data through a master (port 7001) and read it back (in this example the key is stored on 10.30.3.233:7001)
[root@redis01 ~]# echo "set dev sam" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [11557] located at 10.30.3.233:7001
OK
[root@redis01 ~]# echo "get dev" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [11557] located at 10.30.3.233:7001
"sam"
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
117d2a0bcc6795ad64ca037219f6832d5cdf96a8 10.30.3.231:7001@17001 myself,master - 0 1690363755000 1 connected 0-5460
df3c9677abbbd33f96d5bebc59f7f09bb3d891d9 10.30.3.233:7002@17002 master - 0 1690363754926 7 connected 5461-10922
a9d8bc51d2433a2fa8bfd29894b11c918ad70382 10.30.3.233:7001@17001 master - 0 1690363755930 3 connected 10923-16383 #.master that holds the test key
efb5846fa366a330bfa2106d9e6d4014d4e91e42 10.30.3.231:7002@17002 slave a9d8bc51d2433a2fa8bfd29894b11c918ad70382 0 1690363756933 3 connected
22f2a7ce9e12a80d8a542514723cf3a4ed815bba 10.30.3.232:7001@17001 slave df3c9677abbbd33f96d5bebc59f7f09bb3d891d9 0 1690363756000 7 connected
66f8d3ff88e633f92e4be4a85aafad84e30fc38e 10.30.3.232:7002@17002 slave 117d2a0bcc6795ad64ca037219f6832d5cdf96a8 0 1690363756000 1 connected
- 5.2. Failed node: shut down the slave that replicates the test key (10.30.3.231:7002 in this example, the slave of 10.30.3.233:7001) to simulate a slave outage
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | grep "10.30.3.233:7001"
a9d8bc51d2433a2fa8bfd29894b11c918ad70382 10.30.3.233:7001@17001 master - 0 1690363808099 3 connected 10923-16383
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | grep a9d8bc51d2433a2fa8bfd29894b11c918ad70382
a9d8bc51d2433a2fa8bfd29894b11c918ad70382 10.30.3.233:7001@17001 master - 0 1690363821000 3 connected 10923-16383
efb5846fa366a330bfa2106d9e6d4014d4e91e42 10.30.3.231:7002@17002 slave a9d8bc51d2433a2fa8bfd29894b11c918ad70382 0 1690363823150 3 connected
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 10.30.3.231 -p 7002 shutdown
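- 5.3. Healthy node: run cluster info and confirm the state is cluster_state:ok, then run cluster nodes and confirm that the slave taken down is marked disconnected; reads and writes still work normally. Summary: stopping any single slave does not affect the cluster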
[root@redis02 ~]# echo "cluster info" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | egrep "(cluster_state)"
cluster_state:ok
[root@redis02 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | grep disconnected
efb5846fa366a330bfa2106d9e6d4014d4e91e42 10.30.3.231:7002@17002 slave,fail a9d8bc51d2433a2fa8bfd29894b11c918ad70382 1690363847982 1690363843000 3 disconnected
[root@redis02 ~]# echo "get dev" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [11557] located at 10.30.3.233:7001
"sam"
[root@redis02 ~]# echo "set dev sam2" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [11557] located at 10.30.3.233:7001
OK
[root@redis02 ~]# echo "get dev" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [11557] located at 10.30.3.233:7001
"sam2"
- 5.4. Failed node: once testing is complete, start the stopped process again
[root@redis01 ~]# sudo -u redis /usr/local/bin/redis-server /data/redis_7002/redis_7002.conf
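Problems encountered
Scenario 1: how to completely uninstall Redis
- Based on the Redis installation procedure, Redis can be removed completely with the following steps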
cd /opt/
ps -ef | grep redis | grep -v grep | awk '{print $2}' | xargs kill -9 2> /dev/null
userdel -r redis 2> /dev/null
rm -f /opt/install_redis_*.sh
rm -rf /data/redis*
rm -rf /usr/local/redis*
rm -f /usr/local/bin/redis*
sed -i '/redis/d' /etc/rc.local
netstat -lnpt | grep redis
Scenario 2: how to configure a connection pool for the Redis cluster
<connectionStrings>
<add name="Connection_Redis" connectionString="127.0.0.1:7001,127.0.0.1:7002,127.0.0.1:7003,password=123456,abortConnect=false" />
</connectionStrings>
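Scenario 3: a master failure puts the cluster into the fail state
- Symptom: after the master on node 1 is stopped to simulate a master outage, running cluster info on node 2 shows cluster_state:fail, and queries return CLUSTERDOWN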
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 10.30.3.231 -p 7001 shutdown
[root@redis02 ~]# echo "cluster info" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.232 -p 7001 2>/dev/null | egrep "(cluster_state)"
cluster_state:fail
[root@redis02 ~]# echo "get dba" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.232 -p 7001 2>/dev/null
(error) CLUSTERDOWN The cluster is down
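- Fix: set cluster-require-full-coverage to no, so that even when some masters are unavailable the remaining shards can still serve queries. The parameter defaults to yes, meaning the cluster serves requests only while all 16384 slots are covered; if any slot is uncovered, the whole cluster stops serving. Production environments therefore generally set it to no.
#.Apply on all nodes and restart the services for the change to take effect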
sed -i '/^cluster-require-full-coverage/d' /data/redis_7001/redis_7001.conf
echo "cluster-require-full-coverage no" >> /data/redis_7001/redis_7001.conf
sed -i '/^cluster-require-full-coverage/d' /data/redis_7002/redis_7002.conf
echo "cluster-require-full-coverage no" >> /data/redis_7002/redis_7002.conf
/usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 127.0.0.1 -p 7001 shutdown
/usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 127.0.0.1 -p 7002 shutdown
sudo -u redis /usr/local/bin/redis-server /data/redis_7001/redis_7001.conf
sudo -u redis /usr/local/bin/redis-server /data/redis_7002/redis_7002.conf
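Because an existing cluster-require-full-coverage yes line would keep the old behavior, it can help to confirm the value each file actually ends up with. This is a minimal sketch: effective_option is a hypothetical helper, and it assumes that a later line in redis.conf overrides an earlier one for this directive.

```shell
# Hypothetical helper: print the value a redis.conf ends up with for one
# directive, assuming a later line overrides an earlier one.
effective_option() {
    awk -v key="$2" '$1 == key { v = $2 } END { print v }' "$1"
}

# Paths assumed from the machine plan
for port in 7001 7002; do
    conf="/data/redis_${port}/redis_${port}.conf"
    if [ -f "$conf" ]; then
        echo "$conf: $(effective_option "$conf" cluster-require-full-coverage)"
    fi
done
```

On a running instance, CONFIG GET cluster-require-full-coverage reports the value currently in effect, which is a useful cross-check after the restart.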
Scenario 4: no failover occurs when a master fails
- Reproducing the fault: shut down the master that holds the test key (10.30.3.232:7001 in this example) to simulate a master outage.
[root@redis01 ~]# echo "set dba kevin" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [5732] located at 10.30.3.232:7001
OK
[root@redis02 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 10.30.3.232 -p 7001 shutdown
- Symptom: on a healthy node, cluster info shows cluster_state:ok and cluster nodes shows the failed node as disconnected, but no failover has taken place, and reads and writes of the test key return nothing
[root@redis01 ~]# echo "cluster info" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | egrep "(cluster_state)"
cluster_state:ok
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | grep master
d191f0e520e7ed3ebe7554c73681935ab396f887 10.30.3.231:7001@17001 myself,master - 0 1690186384000 1 connected 0-5460
327761af0479d913a1c8d4dfb139fb837661e6fd 10.30.3.232:7001@17001 master,fail - 1690186968071 1690186963046 2 disconnected 5461-10922
823620cbb05d4335ac3319e5e4f9b58b8f46f887 10.30.3.233:7001@17001 master - 0 1690186388977 3 connected 10923-16383
[root@redis01 ~]# echo "get dba" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [5732] located at 10.30.3.232:7001
[root@redis01 ~]# echo "set dba kevin2" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [5732] located at 10.30.3.232:7001
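- Fix: as with Sentinel, enabling password authentication in a Redis cluster requires setting both requirepass and masterauth, preferably to the same password. In addition, do not disable the CONFIG command; otherwise the cluster cannot fail over
#.Apply on all nodes and restart the services for the change to take effect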
cat /data/redis_7001/redis_7001.conf | grep "^masterauth" || echo "masterauth RbY9k2_NBf1QWy8I" >> /data/redis_7001/redis_7001.conf
cat /data/redis_7002/redis_7002.conf | grep "^masterauth" || echo "masterauth RbY9k2_NBf1QWy8I" >> /data/redis_7002/redis_7002.conf
sed -i "s/^rename-command CONFIG/#rename-command CONFIG/" /data/redis_7001/redis_7001.conf
sed -i "s/^rename-command CONFIG/#rename-command CONFIG/" /data/redis_7002/redis_7002.conf
/usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 127.0.0.1 -p 7001 shutdown
/usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 127.0.0.1 -p 7002 shutdown
sudo -u redis /usr/local/bin/redis-server /data/redis_7001/redis_7001.conf
sudo -u redis /usr/local/bin/redis-server /data/redis_7002/redis_7002.conf