Redis sharded cluster setup

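Sentinel solves high availability and high-concurrency reads, but massive data storage and high-concurrency writes remain unsolved. A sharded cluster has the following characteristics, which address both problems.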

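- The cluster has multiple masters, and each master stores a different subset of the data
- Each master can have multiple replica (slave) nodes
- Nodes check each other's health by exchanging PING/PONG messages over the cluster bus (the client port + 10000, e.g. 17001 for 7001)
- A client can send a request to any node; if the key belongs to another master, the node replies with a MOVED redirect and a cluster-aware client (such as redis-cli -c) retries on the correct node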
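How a key finds its node: Redis Cluster maps every key to one of 16384 hash slots, computed as CRC16(key) mod 16384 (the XMODEM variant of CRC16), and each master owns a subset of those slots. If the key contains a non-empty {...} hash tag, only the text inside the tag is hashed, which keeps related keys in the same slot. The bash function below is a minimal sketch of that rule for illustration only; the name keyslot is hypothetical, and it assumes ASCII keys. On a live cluster, the CLUSTER KEYSLOT command gives the authoritative answer.

```shell
# keyslot KEY: hypothetical helper reproducing Redis Cluster's key -> slot mapping.
# Assumption: ASCII keys (characters are hashed by code point, not by UTF-8 bytes).
keyslot() {
  local key=$1 rest tag crc=0 i j c
  # Hash tag rule: if there is a "{" followed later by a "}", and the text between
  # the first "{" and the next "}" is non-empty, hash only that text.
  if [[ $key == *"{"* ]]; then
    rest=${key#*\{}
    if [[ $rest == *"}"* ]]; then
      tag=${rest%%\}*}
      [[ -n $tag ]] && key=$tag
    fi
  fi
  # CRC16-XMODEM: polynomial 0x1021, initial value 0, no bit reflection.
  for ((i = 0; i < ${#key}; i++)); do
    printf -v c '%d' "'${key:i:1}"
    crc=$(( crc ^ (c << 8) ))
    for ((j = 0; j < 8; j++)); do
      if (( crc & 0x8000 )); then
        crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
      else
        crc=$(( (crc << 1) & 0xFFFF ))
      fi
    done
  done
  echo $(( crc % 16384 ))
}
```

For example, keyslot dba prints 5732, the same slot reported in the redirect in step 4.1 below.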


Setting up the Redis sharded cluster

  • Machine plan
IP address   Role              Redis port  OS              Config file
10.30.3.231  Master shard A1   7001        CentOS 7.9 x64  /data/redis_7001/redis_7001.conf
10.30.3.232  Master shard B1   7001        CentOS 7.9 x64  /data/redis_7001/redis_7001.conf
10.30.3.233  Master shard C1   7001        CentOS 7.9 x64  /data/redis_7001/redis_7001.conf
10.30.3.231  Replica shard C2  7002        CentOS 7.9 x64  /data/redis_7002/redis_7002.conf
10.30.3.232  Replica shard A2  7002        CentOS 7.9 x64  /data/redis_7002/redis_7002.conf
10.30.3.233  Replica shard B2  7002        CentOS 7.9 x64  /data/redis_7002/redis_7002.conf

1. Install the single-node Redis instances

  • All nodes: install two Redis instances; instance 7001 is planned as the master and instance 7002 as the replica
cd /opt/
wget -c http://iso.sqlfans.cn/redis/redis-7.0.11.tar.gz
wget -c http://iso.sqlfans.cn/redis/install_redis_7011.sh
sh install_redis_7011.sh /data 7001
sh install_redis_7011.sh /data 7002

2. Configure the Redis sharded cluster

  • 2.1. All nodes: edit the Redis configuration files to enable cluster mode
sed -i '/^cluster/d' /data/redis_7001/redis_7001.conf
echo "cluster-enabled yes" >> /data/redis_7001/redis_7001.conf
echo "cluster-config-file nodes.conf" >> /data/redis_7001/redis_7001.conf
echo "cluster-node-timeout 15000" >> /data/redis_7001/redis_7001.conf
echo "cluster-require-full-coverage no" >> /data/redis_7001/redis_7001.conf
echo "cluster-slave-validity-factor 6" >> /data/redis_7001/redis_7001.conf
cat /data/redis_7001/redis_7001.conf | grep "^cluster"

sed -i '/^cluster/d' /data/redis_7002/redis_7002.conf
echo "cluster-enabled yes" >> /data/redis_7002/redis_7002.conf
echo "cluster-config-file nodes.conf" >> /data/redis_7002/redis_7002.conf
echo "cluster-node-timeout 15000" >> /data/redis_7002/redis_7002.conf
echo "cluster-require-full-coverage no" >> /data/redis_7002/redis_7002.conf
echo "cluster-slave-validity-factor 6" >> /data/redis_7002/redis_7002.conf
cat /data/redis_7002/redis_7002.conf | grep "^cluster"
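The delete-then-append pattern used above can be wrapped in a small idempotent helper, so running it twice never leaves a duplicate directive. A minimal sketch in bash; set_conf is a hypothetical name (it is not part of Redis or of the install script), and it assumes each directive starts at the beginning of its own line, as in the files above.

```shell
# set_conf FILE KEY VALUE: hypothetical helper. Deletes every existing "KEY ..." line,
# then appends "KEY VALUE", so repeated runs leave exactly one entry for KEY.
set_conf() {
  local file=$1 key=$2 value=$3
  sed -i "/^${key}[[:space:]]/d" "$file"   # GNU sed in-place edit (as on CentOS)
  printf '%s %s\n' "$key" "$value" >> "$file"
}
```

For example: set_conf /data/redis_7001/redis_7001.conf cluster-node-timeout 15000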
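  • 2.2. All nodes: as with Redis Sentinel, enabling password authentication requires setting both requirepass and masterauth, preferably to the same password. In addition, the CONFIG command must not be disabled, otherwise the cluster cannot fail over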
cat /data/redis_7001/redis_7001.conf | grep "^masterauth" || echo "masterauth RbY9k2_NBf1QWy8I" >> /data/redis_7001/redis_7001.conf
cat /data/redis_7002/redis_7002.conf | grep "^masterauth" || echo "masterauth RbY9k2_NBf1QWy8I" >> /data/redis_7002/redis_7002.conf

sed -i "s/^rename-command CONFIG/#rename-command CONFIG/" /data/redis_7001/redis_7001.conf
sed -i "s/^rename-command CONFIG/#rename-command CONFIG/" /data/redis_7002/redis_7002.conf
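Before restarting, each configuration file can be checked for the conditions described above: masterauth present and equal to requirepass, and CONFIG not renamed. A minimal bash sketch; check_cluster_auth is a hypothetical name, and it assumes unquoted password values. When a directive appears more than once, the last occurrence is used, mirroring how a later line in redis.conf overrides an earlier one.

```shell
# check_cluster_auth FILE: hypothetical pre-flight check for one instance's config.
# Prints the first problem found and returns 1, or prints "FILE: ok" and returns 0.
check_cluster_auth() {
  local file=$1 pass auth
  pass=$(awk '$1 == "requirepass" {v = $2} END {print v}' "$file")
  auth=$(awk '$1 == "masterauth" {v = $2} END {print v}' "$file")
  if [ -z "$auth" ]; then
    echo "$file: masterauth not set"; return 1
  fi
  if [ "$pass" != "$auth" ]; then
    echo "$file: masterauth differs from requirepass"; return 1
  fi
  # A commented-out "#rename-command CONFIG" line does not match this pattern.
  if grep -Eq '^rename-command[[:space:]]+CONFIG([[:space:]]|$)' "$file"; then
    echo "$file: CONFIG is renamed"; return 1
  fi
  echo "$file: ok"
}
```

For example: for p in 7001 7002; do check_cluster_auth /data/redis_$p/redis_$p.conf; done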
  • 2.3. All nodes: restart the Redis services
/usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 127.0.0.1 -p 7001 shutdown
/usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 127.0.0.1 -p 7002 shutdown
sudo -u redis /usr/local/bin/redis-server /data/redis_7001/redis_7001.conf
sudo -u redis /usr/local/bin/redis-server /data/redis_7002/redis_7002.conf
sleep 2 && netstat -lnpt | grep redis
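  • 2.4. Any node: create the cluster (6 nodes assigned as 3 masters and 3 replicas). In practice, the first port listed for each IP became a master, but the master/replica pairing cannot be specified (it can be adjusted manually after creation if needed)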

Note: before Redis 5.0, creating a cluster required installing Ruby and using redis-trib.rb; since 5.0, redis-cli can create the cluster directly.

/usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster create 10.30.3.231:7001 10.30.3.232:7001 10.30.3.233:7001 10.30.3.231:7002 10.30.3.232:7002 10.30.3.233:7002 --cluster-replicas 1
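When the host list grows, the node list for --cluster create can be generated instead of typed by hand. A minimal bash sketch; build_create_args is a hypothetical helper that emits every host on the first port, then every host on the next port, which reproduces the order of the command above (in the author's test, the first port on each IP became a master).

```shell
# build_create_args "HOST1 HOST2 ..." PORT...: hypothetical helper.
# Prints host:port pairs grouped by port, in the order the ports are given.
build_create_args() {
  local hosts=$1 port host
  local -a out=()
  shift
  for port in "$@"; do
    for host in $hosts; do   # unquoted on purpose: split the space-separated host list
      out+=("$host:$port")
    done
  done
  echo "${out[*]}"
}
```

For example: /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster create $(build_create_args "10.30.3.231 10.30.3.232 10.30.3.233" 7001 7002) --cluster-replicas 1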

3. Verify the cluster status

  • 3.1. Any node: run cluster info to verify the cluster state
[root@redis01 ~]# echo "cluster info" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | egrep "(cluster_state)"
cluster_state:ok
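In scripts, comparing the raw cluster_state line can be unreliable: the CLUSTER INFO reply uses CRLF line endings, so the grep output may end with a carriage return. A minimal bash sketch that extracts just the value; cluster_state is a hypothetical helper name.

```shell
# cluster_state: hypothetical helper. Reads CLUSTER INFO text on stdin and prints
# the cluster_state value (ok / fail) with the trailing carriage return removed.
cluster_state() {
  tr -d '\r' | awk -F: '$1 == "cluster_state" {print $2}'
}
```

For example: echo "cluster info" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | cluster_state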
  • 3.2. Any node: run cluster nodes to confirm the master/replica pairs and the hash-slot distribution (slots are held by the masters)
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
117d2a0bcc6795ad64ca037219f6832d5cdf96a8 10.30.3.231:7001@17001 myself,master - 0 1690362415000 1 connected 0-5460
22f2a7ce9e12a80d8a542514723cf3a4ed815bba 10.30.3.232:7001@17001 master - 0 1690362414134 2 connected 5461-10922
a9d8bc51d2433a2fa8bfd29894b11c918ad70382 10.30.3.233:7001@17001 master - 0 1690362413130 3 connected 10923-16383
efb5846fa366a330bfa2106d9e6d4014d4e91e42 10.30.3.231:7002@17002 slave a9d8bc51d2433a2fa8bfd29894b11c918ad70382 0 1690362416141 3 connected
66f8d3ff88e633f92e4be4a85aafad84e30fc38e 10.30.3.232:7002@17002 slave 117d2a0bcc6795ad64ca037219f6832d5cdf96a8 0 1690362415137 1 connected
df3c9677abbbd33f96d5bebc59f7f09bb3d891d9 10.30.3.233:7002@17002 slave 22f2a7ce9e12a80d8a542514723cf3a4ed815bba 0 1690362417144 2 connected
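The plan assumes no replica runs on the same host as its master; otherwise losing one machine would lose the whole shard. The pairing shown by cluster nodes can be checked automatically: in each line, field 2 is ip:port@cluster-bus-port, field 3 holds the flags, and for a replica field 4 is its master's node ID. A minimal bash/awk sketch; check_replica_hosts is a hypothetical name.

```shell
# check_replica_hosts: hypothetical helper. Reads CLUSTER NODES output on stdin and
# prints "replica -> master" per replica, marked SAME-HOST if both share an IP.
check_replica_hosts() {
  awk '
    { split($2, a, "@"); addr[$1] = a[1]; flags[$1] = $3; master[$1] = $4 }
    END {
      for (id in addr) {
        if (flags[id] !~ /(^|,)slave(,|$)/) continue
        r = addr[id]; m = addr[master[id]]
        split(r, rh, ":"); split(m, mh, ":")
        printf "%s -> %s%s\n", r, m, (rh[1] == mh[1] ? " SAME-HOST" : "")
      }
    }' | LC_ALL=C sort
}
```

For example: echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | check_replica_hosts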
  • 3.3. Any node: run --cluster check against any cluster node to check the cluster (in this example all 7001 instances are masters)
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 | egrep "(keys|covered)"

10.30.3.231:7001 (52ab7e31...) -> 0 keys | 5461 slots | 1 slaves.
10.30.3.232:7001 (8353531f...) -> 0 keys | 5462 slots | 1 slaves.
10.30.3.233:7001 (9a878709...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.
[OK] All 16384 slots covered.
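The "[OK] All 16384 slots covered." check can also be reproduced from cluster nodes output, which is handy in monitoring scripts. A minimal bash/awk sketch; covered_slots is a hypothetical name. It sums the slot ranges (fields 9 onward) of masters not flagged fail, and skips the bracketed entries that mark slots being migrated or imported.

```shell
# covered_slots: hypothetical helper. Reads CLUSTER NODES output on stdin and prints
# how many slots are owned by masters that are not flagged "fail".
covered_slots() {
  awk '
    $3 ~ /(^|,)master(,|$)/ && $3 !~ /(^|,)fail(,|$)/ {
      for (i = 9; i <= NF; i++) {
        if ($i ~ /^\[/) continue            # migrating/importing marker, not an owned range
        n = split($i, r, "-")
        total += (n == 2) ? (r[2] - r[1] + 1) : 1
      }
    }
    END { print total + 0 }'
}
```

A result below 16384 means some slots are unserved, which is what triggers CLUSTERDOWN when cluster-require-full-coverage is yes.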

4. Disaster drill: simulate a master failure

  • 4.1. Any node: write a key through a master (port 7001) and read it back (in this example the key is stored on 10.30.3.232:7001)
[root@redis01 ~]# echo "set dba kevin" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [5732] located at 10.30.3.232:7001
OK

[root@redis01 ~]# echo "get dba" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [5732] located at 10.30.3.232:7001
"kevin"
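The redirect reports the key's slot (5732). Given cluster nodes output such as the one in step 3.2, the master that owns a slot can be looked up offline; on a live cluster, CLUSTER KEYSLOT and CLUSTER SLOTS answer the same question. A minimal bash/awk sketch; slot_owner is a hypothetical name.

```shell
# slot_owner SLOT: hypothetical helper. Reads CLUSTER NODES output on stdin and prints
# the ip:port of the master whose slot ranges contain SLOT.
slot_owner() {
  awk -v s="$1" '
    $3 ~ /(^|,)master(,|$)/ {
      for (i = 9; i <= NF; i++) {
        if ($i ~ /^\[/) continue            # skip migrating/importing markers
        n = split($i, r, "-")
        lo = r[1] + 0
        hi = (n == 2) ? r[2] + 0 : lo
        if (s + 0 >= lo && s + 0 <= hi) { split($2, a, "@"); print a[1]; exit }
      }
    }'
}
```

For example, feeding it the step 3.2 output with slot 5732 prints 10.30.3.232:7001.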
  • 4.2. Crashed node: shut down the master that holds the test key (10.30.3.232:7001 in this example) to simulate a master crash
[root@redis02 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 10.30.3.232 -p 7001 shutdown
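  • 4.3. Healthy node: cluster info still reports cluster_state:ok. cluster nodes shows the crashed master as master,fail and disconnected, so four nodes now carry the master flag: the newly promoted master (10.30.3.233:7002 in this example) is the crashed node's former replica, and the crashed master's hash slots (5461-10922) have moved to it. The failure is only detected after cluster-node-timeout (15000 ms in this setup) has elapsed. Conclusion: when any master is killed, the cluster promotes that master's replica to master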
[root@redis01 ~]# echo "cluster info" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | egrep "(cluster_state)"
cluster_state:ok

[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | grep master
117d2a0bcc6795ad64ca037219f6832d5cdf96a8 10.30.3.231:7001@17001 myself,master - 0 1690362859000 1 connected 0-5460
22f2a7ce9e12a80d8a542514723cf3a4ed815bba 10.30.3.232:7001@17001 master,fail - 1690362473321 1690362467295 2 disconnected    #.crashed node: state fail, its hash slots have moved to the new master
df3c9677abbbd33f96d5bebc59f7f09bb3d891d9 10.30.3.233:7002@17002 master - 0 1690362860000 7 connected 5461-10922             #.new master: inherited the crashed node's hash slots (5461-10922)
a9d8bc51d2433a2fa8bfd29894b11c918ad70382 10.30.3.233:7001@17001 master - 0 1690362860946 3 connected 10923-16383

[root@redis01 ~]# echo "get dba" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [5732] located at 10.30.3.233:7002
"kevin"

[root@redis01 ~]# echo "set dba kevin2" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [5732] located at 10.30.3.233:7002
OK
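To list the nodes the cluster currently considers down (as in the output above), filter cluster nodes on the fail flag: fail means a majority of masters has agreed on the failure, while fail? means only the queried node suspects it. A minimal bash/awk sketch; failed_nodes is a hypothetical name.

```shell
# failed_nodes: hypothetical helper. Reads CLUSTER NODES output on stdin and prints
# "ip:port flags" for every node flagged fail (confirmed) or fail? (suspected).
failed_nodes() {
  awk '$3 ~ /(^|,)fail\??(,|$)/ { split($2, a, "@"); print a[1], $3 }'
}
```

For example: echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | failed_nodes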
  • 4.4. Crashed node: after testing, start the stopped instance again; it rejoins the cluster as a replica of the new master
[root@redis02 ~]# sudo -u redis /usr/local/bin/redis-server /data/redis_7001/redis_7001.conf

5. Disaster drill: simulate a replica failure

  • 5.1. Any node: write a key through a master (port 7001) and read it back (in this example the key is stored on 10.30.3.233:7001)
[root@redis01 ~]# echo "set dev sam" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [11557] located at 10.30.3.233:7001
OK

[root@redis01 ~]# echo "get dev" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [11557] located at 10.30.3.233:7001
"sam"

[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
117d2a0bcc6795ad64ca037219f6832d5cdf96a8 10.30.3.231:7001@17001 myself,master - 0 1690363755000 1 connected 0-5460
df3c9677abbbd33f96d5bebc59f7f09bb3d891d9 10.30.3.233:7002@17002 master - 0 1690363754926 7 connected 5461-10922
a9d8bc51d2433a2fa8bfd29894b11c918ad70382 10.30.3.233:7001@17001 master - 0 1690363755930 3 connected 10923-16383    #.master holding the test key
efb5846fa366a330bfa2106d9e6d4014d4e91e42 10.30.3.231:7002@17002 slave a9d8bc51d2433a2fa8bfd29894b11c918ad70382 0 1690363756933 3 connected
22f2a7ce9e12a80d8a542514723cf3a4ed815bba 10.30.3.232:7001@17001 slave df3c9677abbbd33f96d5bebc59f7f09bb3d891d9 0 1690363756000 7 connected
66f8d3ff88e633f92e4be4a85aafad84e30fc38e 10.30.3.232:7002@17002 slave 117d2a0bcc6795ad64ca037219f6832d5cdf96a8 0 1690363756000 1 connected
  • 5.2. Crashed node: shut down the replica of the master that holds the test key (here 10.30.3.231:7002, the replica of 10.30.3.233:7001) to simulate a replica crash
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | grep "10.30.3.233:7001"
a9d8bc51d2433a2fa8bfd29894b11c918ad70382 10.30.3.233:7001@17001 master - 0 1690363808099 3 connected 10923-16383

[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | grep a9d8bc51d2433a2fa8bfd29894b11c918ad70382
a9d8bc51d2433a2fa8bfd29894b11c918ad70382 10.30.3.233:7001@17001 master - 0 1690363821000 3 connected 10923-16383
efb5846fa366a330bfa2106d9e6d4014d4e91e42 10.30.3.231:7002@17002 slave a9d8bc51d2433a2fa8bfd29894b11c918ad70382 0 1690363823150 3 connected

[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 10.30.3.231 -p 7002 shutdown
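  • 5.3. Healthy node: cluster info still reports cluster_state:ok, cluster nodes shows the crashed replica as slave,fail and disconnected, and reads and writes still work. Conclusion: killing any single replica does not affect the cluster's availability, although its master has no failover candidate until the replica returns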
[root@redis02 ~]# echo "cluster info" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | egrep "(cluster_state)"
cluster_state:ok

[root@redis02 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | grep disconnected
efb5846fa366a330bfa2106d9e6d4014d4e91e42 10.30.3.231:7002@17002 slave,fail a9d8bc51d2433a2fa8bfd29894b11c918ad70382 1690363847982 1690363843000 3 disconnected

[root@redis02 ~]# echo "get dev" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [11557] located at 10.30.3.233:7001
"sam"

[root@redis02 ~]# echo "set dev sam2" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [11557] located at 10.30.3.233:7001
OK

[root@redis02 ~]# echo "get dev" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [11557] located at 10.30.3.233:7001
"sam2"
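After a replica goes down, its master keeps serving but has no failover candidate until the replica returns. A minimal bash/awk sketch to spot such masters from cluster nodes output; replica_health is a hypothetical name, and it counts only replicas that are connected and not flagged fail or fail?.

```shell
# replica_health: hypothetical helper. Reads CLUSTER NODES output on stdin and prints
# "master_ip:port N", where N is the number of healthy replicas of that master.
replica_health() {
  awk '
    { split($2, a, "@"); addr[$1] = a[1]; flags[$1] = $3; parent[$1] = $4; link[$1] = $8 }
    END {
      # every master not confirmed as failed starts with 0 healthy replicas
      for (id in addr)
        if (flags[id] ~ /(^|,)master(,|$)/ && flags[id] !~ /(^|,)fail(,|$)/) healthy[id] += 0
      # count replicas that are connected and not flagged fail or fail?
      for (id in addr)
        if (flags[id] ~ /(^|,)slave(,|$)/ && flags[id] !~ /fail/ && link[id] == "connected" && (parent[id] in healthy))
          healthy[parent[id]]++
      for (id in healthy) print addr[id], healthy[id]
    }' | LC_ALL=C sort
}
```

With the replica of 10.30.3.233:7001 stopped as in step 5.2, that master is reported with 0 healthy replicas.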
  • 5.4. Crashed node: after testing, start the stopped instance again
[root@redis01 ~]# sudo -u redis /usr/local/bin/redis-server /data/redis_7002/redis_7002.conf

Problems encountered

Scenario 1: How to completely uninstall Redis

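  • Reversing the installation steps, Redis can be removed completely as follows (note: this kills every process whose command line contains "redis" and deletes all Redis data under /data)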
cd /opt/
ps -ef | grep redis | grep -v grep | awk '{print $2}' | xargs kill -9 2> /dev/null
userdel -r redis 2> /dev/null
rm -f /opt/install_redis_*.sh
rm -rf /data/redis*
rm -rf /usr/local/redis*
rm -f /usr/local/bin/redis*
sed -i '/redis/d' /etc/rc.local
netstat -lnpt | grep redis

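Scenario 2: How to configure a Redis cluster connection pool

For example, in a .NET configuration file (the connection string uses StackExchange.Redis syntax; hosts, ports and password are example values):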

<connectionStrings>
    <add name="Connection_Redis" connectionString="127.0.0.1:7001,127.0.0.1:7002,127.0.0.1:7003,password=123456,abortConnect=false" />
</connectionStrings>

Scenario 3: A master crash makes the whole cluster fail

  • Symptom: on node 1, shut down the master to simulate a crash; on node 2, cluster info reports cluster_state:fail and queries fail with CLUSTERDOWN
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 10.30.3.231 -p 7001 shutdown

[root@redis02 ~]# echo "cluster info" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.232 -p 7001 2>/dev/null | egrep "(cluster_state)"
cluster_state:fail

[root@redis02 ~]# echo "get dba" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.232 -p 7001 2>/dev/null
(error) CLUSTERDOWN The cluster is down
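  • Fix: set cluster-require-full-coverage to no. Then, even if some masters are unavailable, the remaining shards keep serving requests for their own slots, and only keys in the uncovered slots fail. The default is yes, which means the cluster serves requests only while all 16384 slots are covered; if any slot is uncovered, the whole cluster stops serving. Production deployments therefore commonly set it to no.
#.Apply on all nodes, then restart the services for the change to take effect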
cat /data/redis_7001/redis_7001.conf | grep "^cluster-require-full-coverage" || echo "cluster-require-full-coverage no" >> /data/redis_7001/redis_7001.conf
cat /data/redis_7002/redis_7002.conf | grep "^cluster-require-full-coverage" || echo "cluster-require-full-coverage no" >> /data/redis_7002/redis_7002.conf

/usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 127.0.0.1 -p 7001 shutdown
/usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 127.0.0.1 -p 7002 shutdown
sudo -u redis /usr/local/bin/redis-server /data/redis_7001/redis_7001.conf
sudo -u redis /usr/local/bin/redis-server /data/redis_7002/redis_7002.conf

Scenario 4: No failover after a master crash

  • Simulated failure: shut down the master that holds the test key (10.30.3.232:7001 in this example) to simulate a master crash.
[root@redis01 ~]# echo "set dba kevin" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [5732] located at 10.30.3.232:7001
OK

[root@redis02 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 10.30.3.232 -p 7001 shutdown
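  • Symptom: on a healthy node, cluster info reports cluster_state:ok and cluster nodes shows the crashed node as disconnected, but no failover happened: the crashed master's slots (5461-10922) were not taken over, and reads and writes of the test key returned nothing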
[root@redis01 ~]# echo "cluster info" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | egrep "(cluster_state)"
cluster_state:ok

[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null | grep master
d191f0e520e7ed3ebe7554c73681935ab396f887 10.30.3.231:7001@17001 myself,master - 0 1690186384000 1 connected 0-5460
327761af0479d913a1c8d4dfb139fb837661e6fd 10.30.3.232:7001@17001 master,fail - 1690186968071 1690186963046 2 disconnected 5461-10922
823620cbb05d4335ac3319e5e4f9b58b8f46f887 10.30.3.233:7001@17001 master - 0 1690186388977 3 connected 10923-16383

[root@redis01 ~]# echo "get dba" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [5732] located at 10.30.3.232:7001

[root@redis01 ~]# echo "set dba kevin2" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 2>/dev/null
-> Redirected to slot [5732] located at 10.30.3.232:7001
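  • Fix: as with Sentinel, enabling password authentication on a Redis cluster requires setting both requirepass and masterauth, preferably to the same password. In addition, the CONFIG command must not be disabled, otherwise the cluster cannot fail over
#.Apply on all nodes, then restart the services for the change to take effect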
cat /data/redis_7001/redis_7001.conf | grep "^masterauth" || echo "masterauth RbY9k2_NBf1QWy8I" >> /data/redis_7001/redis_7001.conf
cat /data/redis_7002/redis_7002.conf | grep "^masterauth" || echo "masterauth RbY9k2_NBf1QWy8I" >> /data/redis_7002/redis_7002.conf

sed -i "s/^rename-command CONFIG/#rename-command CONFIG/" /data/redis_7001/redis_7001.conf
sed -i "s/^rename-command CONFIG/#rename-command CONFIG/" /data/redis_7002/redis_7002.conf

/usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 127.0.0.1 -p 7001 shutdown
/usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 127.0.0.1 -p 7002 shutdown
sudo -u redis /usr/local/bin/redis-server /data/redis_7001/redis_7001.conf
sudo -u redis /usr/local/bin/redis-server /data/redis_7002/redis_7002.conf
Copyright © www.sqlfans.cn 2024 All Rights Reserved. Last updated: 2024-06-16 12:01:53
