Setting up Redis Sentinel
[TOC]
Installing a single Redis node
- [All nodes] Quick install of Redis 7.0.11. In this example the data directory is /data/redis_${port} and the port is 7100
cd /opt/
wget -c http://iso.sqlfans.cn/redis/redis-7.0.11.tar.gz
wget -c http://iso.sqlfans.cn/redis/install_redis_7011.sh
sh install_redis_7011.sh /data 7100
- Verify reads, writes, and the version information
echo "set dba kevin" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7100 2>/dev/null
echo "get dba" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7100 2>/dev/null
echo "save" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7100 2>/dev/null
echo "info server" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7100 2>/dev/null | grep redis_version
Setting up Redis Sentinel
- Machine plan
IP address | Role | Redis port | Sentinel port | OS | Data directory |
---|---|---|---|---|---|
10.30.3.231 | Master 1 | 7100 | 7200 | CentOS 7.9 x64 | /data/redis_7100 |
10.30.3.232 | Replica 2 | 7100 | 7200 | CentOS 7.9 x64 | /data/redis_7100 |
10.30.3.233 | Replica 3 | 7100 | 7200 | CentOS 7.9 x64 | /data/redis_7100 |
1. Install a single Redis node
- All nodes: for the single-node Redis installation, see the section above
- All nodes: for the Redis configuration, see
cat /data/redis_7100/redis_7100.conf
2. Set up Redis master-replica replication
- 2.1. Master node: add the masterauth password (it must match requirepass). The master does not need replicaof
cat /data/redis_7100/redis_7100.conf | grep "^masterauth" || echo "masterauth RbY9k2_NBf1QWy8I" >> /data/redis_7100/redis_7100.conf
- 2.2. Replica nodes: add the masterauth and replicaof parameters, pointing replicaof at the master's IP (10.30.3.231 in this example) and port (7100 in this example)
cat /data/redis_7100/redis_7100.conf | grep "^masterauth" || echo "masterauth RbY9k2_NBf1QWy8I" >> /data/redis_7100/redis_7100.conf
cat /data/redis_7100/redis_7100.conf | grep "^replicaof" || echo "replicaof 10.30.3.231 7100" >> /data/redis_7100/redis_7100.conf
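The grep-or-append commands in 2.1 and 2.2 keep the config edits idempotent: a directive is only appended when it is not already there. A minimal sketch of the same idea as a reusable function follows. ensure_directive is a hypothetical helper name, not something shipped with Redis or with the install script.

```shell
# ensure_directive FILE KEY VALUE   (hypothetical helper, not part of Redis)
# Append "KEY VALUE" to FILE only when no uncommented line already starts with KEY.
ensure_directive() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}[[:space:]]" "$file"; then
        return 0    # directive already present, leave the file untouched
    fi
    printf '%s %s\n' "$key" "$value" >> "$file"
}

# Example call on a replica (same file and values as in 2.2):
#   ensure_directive /data/redis_7100/redis_7100.conf replicaof "10.30.3.231 7100"
```

Running the call twice leaves a single replicaof line in the file, which is the property the original one-liners rely on.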
- 2.3. All nodes: restart the Redis service and confirm the replication status
# Do not disable the CONFIG command, otherwise Sentinel cannot fail over
sed -i "s/^rename-command CONFIG/#rename-command CONFIG/" /data/redis_7100/redis_7100.conf
/usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h 127.0.0.1 -p 7100 shutdown
sudo -u redis /usr/local/bin/redis-server /data/redis_7100/redis_7100.conf
echo "info replication" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7100 2>/dev/null | egrep "(role|slave0|slave1)"
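To script the role check instead of reading the output by eye, the role field can be pulled out of the INFO replication text. A small sketch, assuming the output is piped in on stdin. replication_role and the REDIS_PASSWORD variable are hypothetical names used only for illustration.

```shell
# replication_role: read "INFO replication" output on stdin and print the role.
# Redis ends INFO lines with CRLF, so the carriage returns are stripped first.
replication_role() {
    tr -d '\r' | awk -F: '$1 == "role" { print $2 }'
}

# Example call (REDIS_PASSWORD is an assumed variable holding the requirepass value):
#   /usr/local/bin/redis-cli -a "$REDIS_PASSWORD" -p 7100 info replication 2>/dev/null | replication_role
```

On the planned master this should print master, and on the two replicas slave.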
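3. Set up Redis Sentinel
- 3.1. All nodes: configure Sentinel. The sentinel monitor directive names the master to watch: mymaster is the (user-chosen) name of the master, 10.30.3.231 7100 is the master's IP and port, and 2 is the quorum, i.e. the number of Sentinels that must agree the master is unreachable before a failover is started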
cat > /data/redis_7100/sentinel_7200.conf <<EOF
bind 0.0.0.0
port 7200
daemonize yes
supervised systemd
logfile /data/redis_7100/log/sentinel.log
pidfile /data/redis_7100/pid/sentinel.pid
dir /data/redis_7100/dump
sentinel monitor mymaster 10.30.3.231 7100 2
sentinel auth-pass mymaster RbY9k2_NBf1QWy8I
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 10000
EOF
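The quorum only controls how many Sentinels must agree that the master is down. The failover itself still has to be authorized by a majority of all Sentinels, so it is worth comparing the quorum with the majority for the planned number of Sentinels. A minimal sketch (sentinel_majority is a hypothetical helper name):

```shell
# sentinel_majority N: print the smallest number of Sentinels that forms
# a majority out of N Sentinels (integer division, then plus one).
sentinel_majority() {
    echo $(( $1 / 2 + 1 ))
}
```

With the three Sentinels planned here, sentinel_majority 3 prints 2, which matches the quorum of 2 used in the config above.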
- 3.2. All nodes: start the Sentinel process
chown -R redis:redis /data/redis_7100
chmod 600 /data/redis_7100/*.conf
sudo -u redis /usr/local/bin/redis-sentinel /data/redis_7100/sentinel_7200.conf
netstat -lntp | egrep "(7100|7200)"
- 3.3. All nodes: add Sentinel to the boot-time startup
cat /etc/rc.local | grep sentinel_7200 || echo "sudo -u redis /usr/local/bin/redis-sentinel /data/redis_7100/sentinel_7200.conf" >> /etc/rc.local
4. Verify the cluster
- 4.1. Master node: write data
echo "info replication" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7100 2>/dev/null | egrep "(role|slave0|slave1)"
echo "set dev sam" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7100 2>/dev/null
echo "get dev" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7100 2>/dev/null
- 4.2. Replica nodes: confirm the cluster status, the read-only state, and data synchronization
echo "info replication" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7100 2>/dev/null | egrep "(role|slave0|slave1)"
echo "get dev" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7100 2>/dev/null
echo "set dev sam" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7100 2>/dev/null
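The last command in 4.2 is expected to fail: a replica is read-only by default and rejects writes with an error that starts with READONLY. A small sketch of how that reply could be checked in a script, assuming the reply text has been captured in a variable. is_readonly_reply is a hypothetical helper name.

```shell
# is_readonly_reply REPLY: succeed when REPLY is the error a read-only
# replica returns for a write command (the error text contains READONLY).
is_readonly_reply() {
    case $1 in
        *READONLY*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Example use (REDIS_PASSWORD is an assumed variable holding the requirepass value):
#   reply=$(echo "set dev sam" | /usr/local/bin/redis-cli -a "$REDIS_PASSWORD" -p 7100 2>/dev/null)
#   is_readonly_reply "$reply" && echo "replica is read-only"
```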
- 4.3. Any node: check the Sentinel status
/usr/local/bin/redis-cli -c -h 10.30.3.231 -p 7200 info sentinel | grep master
/usr/local/bin/redis-cli -c -h 10.30.3.232 -p 7200 info sentinel | grep master
/usr/local/bin/redis-cli -c -h 10.30.3.233 -p 7200 info sentinel | grep master
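The master line printed by info sentinel carries the address each Sentinel currently records for the master, so comparing that field across the three Sentinels is a quick consistency check. A sketch, assuming the info sentinel output is piped in on stdin. sentinel_master_addr is a hypothetical helper name.

```shell
# sentinel_master_addr NAME: read "INFO sentinel" output on stdin and print
# the address=<ip>:<port> value recorded for the master called NAME.
sentinel_master_addr() {
    tr -d '\r' | grep "name=$1," | sed -n 's/.*address=\([^,]*\).*/\1/p'
}

# Example call:
#   /usr/local/bin/redis-cli -h 10.30.3.231 -p 7200 info sentinel | sentinel_master_addr mymaster
```

All three Sentinels should print the same address. Sentinel also answers the command SENTINEL get-master-addr-by-name mymaster, which returns the same information without any parsing.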
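5. Disaster drill: simulate a master failure
- 5.1. Master node: kill the Redis processes to simulate a master outage (the grep pattern also matches the local redis-sentinel process, which is why step 5.4 restarts both)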
ps -ef | grep redis | grep -v grep | awk '{print $2}' | xargs kill -9 2> /dev/null
- 5.2. Any node: check which node is the current master
echo "info replication" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7100 2>/dev/null | egrep "(role|slave0|slave1)"
echo "info replication" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.232 -p 7100 2>/dev/null | egrep "(role|slave0|slave1)"
echo "info replication" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.233 -p 7100 2>/dev/null | egrep "(role|slave0|slave1)"
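- 5.3. Any replica node: review the failover process in the Sentinel log. Summary: a master outage triggers an election and an automatic failover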
grep -E '\+(monitor|sentinel|sdown|vote-for-leader|odown|config-update-from|switch-master|slave)' /data/redis_7100/log/sentinel.log | tail -n20
Note: a complete sample failover log is available here:
curl http://iso.sqlfans.cn/redis/switch_master_log.txt
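The +switch-master event is the line that records the actual promotion. Its payload is the master name followed by the old IP and port and then the new IP and port. A sketch that prints only the most recent such event from a Sentinel log, assuming the log is piped in on stdin. last_switch_master is a hypothetical helper name.

```shell
# last_switch_master: read a Sentinel log on stdin and print the most recent
# +switch-master event as "<name> <old-ip>:<old-port> -> <new-ip>:<new-port>".
last_switch_master() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i == "+switch-master")
                last = $(i+1) " " $(i+2) ":" $(i+3) " -> " $(i+4) ":" $(i+5)
    }
    END { if (last != "") print last }'
}

# Example call:
#   last_switch_master < /data/redis_7100/log/sentinel.log
```

If no failover has happened yet, the function prints nothing.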
- 5.4. Failed node: once testing is complete, start Redis and Sentinel again
sudo -u redis /usr/local/bin/redis-server /data/redis_7100/redis_7100.conf
sudo -u redis /usr/local/bin/redis-sentinel /data/redis_7100/sentinel_7200.conf
netstat -lntp | egrep "(7100|7200)"
6. Disaster drill: simulate a replica failure
- 6.1. Any replica node (redis03 here): kill the Redis processes to simulate a replica outage
ps -ef | grep redis | grep -v grep | awk '{print $2}' | xargs kill -9 2> /dev/null
- 6.2. Any node: check which node is the current master
echo "info replication" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7100 2>/dev/null | egrep "(role|slave0|slave1)"
echo "info replication" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.232 -p 7100 2>/dev/null | egrep "(role|slave0|slave1)"
echo "info replication" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.233 -p 7100 2>/dev/null | egrep "(role|slave0|slave1)"
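- 6.3. Master node (redis01 here): write data on the master and confirm that the other replica (redis02 here) still synchronizes. Summary: losing a single replica does not affect replication in the Sentinel setup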
echo "set dev2 sam2" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7100 2>/dev/null
echo "get dev2" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.232 -p 7100 2>/dev/null
- 6.4. Failed node: once testing is complete, start Redis and Sentinel again
sudo -u redis /usr/local/bin/redis-server /data/redis_7100/redis_7100.conf
sudo -u redis /usr/local/bin/redis-sentinel /data/redis_7100/sentinel_7200.conf
netstat -lntp | egrep "(7100|7200)"
Problems encountered
Scenario 1: how to completely uninstall Redis
- Redis can be removed completely by undoing the installation steps, as follows
cd /opt/
ps -ef | grep redis | grep -v grep | awk '{print $2}' | xargs kill -9 2> /dev/null
userdel -r redis 2> /dev/null
rm -f /opt/install_redis_*.sh
rm -rf /data/redis*
rm -rf /usr/local/redis*
rm -f /usr/local/bin/redis*
sed -i '/redis/d' /etc/rc.local
netstat -lnpt | grep redis
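Scenario 2: how to configure the Redis connection string in Nacos
- If Redis is a standalone, master-standby, read/write-splitting, or Proxy cluster instance, use the IP and port of the Redis service in the Nacos configuration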
spring.redis.host=10.30.3.231
spring.redis.port=7100
spring.redis.password=<password>
- If Redis runs in Sentinel mode, use the IPs and ports of the Sentinel services in the Nacos configuration
spring.redis.sentinel.master=mymaster
spring.redis.sentinel.nodes=10.30.3.231:7200,10.30.3.232:7200,10.30.3.233:7200
spring.redis.password=<password>
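- If Redis is a Cluster instance, for example with 3 masters and 3 replicas, use the IPs and ports of all nodes. For more, see "Connecting to Redis with the Jedis client"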
spring.redis.cluster.nodes=<ip1:port1>,<ip2:port2>,<ip3:port3>,<ip4:port4>,<ip5:port5>,<ip6:port6>
spring.redis.password=<password>
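The node lists in these properties are plain comma-separated host:port pairs, so they can be generated from the machine plan rather than typed by hand. A minimal sketch. sentinel_nodes is a hypothetical helper name.

```shell
# sentinel_nodes PORT HOST...: print the HOST:PORT pairs joined by commas,
# in the form expected by spring.redis.sentinel.nodes.
sentinel_nodes() {
    port=$1; shift
    out=""
    for host in "$@"; do
        out="${out:+$out,}$host:$port"   # add a comma only between entries
    done
    printf '%s\n' "$out"
}

# Example call for the three Sentinels in this document:
#   echo "spring.redis.sentinel.nodes=$(sentinel_nodes 7200 10.30.3.231 10.30.3.232 10.30.3.233)"
```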
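Scenario 3: no failover occurs when the master goes down
- Symptom: on 2023-07-21 the Redis process on the master was killed to simulate a master outage, but Sentinel did not fail over. Running info replication showed that both replicas were still in the slave role
- Cause: the hardened installation had set rename-command CONFIG "" to forbid configuration changes, which prevented the failover when the master went down
- Fix: comment out the line that disables CONFIG and restart the Redis service, i.e. #rename-command CONFIG ""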
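Since this setting silently blocks failover, it may be worth checking every node's config file before running a drill. A sketch of such a check. config_command_disabled is a hypothetical helper name, and the file path in the example is the one used throughout this document.

```shell
# config_command_disabled FILE: succeed when FILE contains an active
# (uncommented) rename-command line for CONFIG.
config_command_disabled() {
    grep -Eiq '^[[:space:]]*rename-command[[:space:]]+CONFIG([[:space:]]|$)' "$1"
}

# Example call:
#   if config_command_disabled /data/redis_7100/redis_7100.conf; then
#       echo "CONFIG is renamed or disabled; Sentinel failover may not work"
#   fi
```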