Redis Sharded Cluster Management
[TOC]
Adding Nodes
- Baseline environment: the tests below run against the following existing Redis cluster:
Group | Master node | Slave node |
---|---|---|
group_0 | 10.30.3.231:7001 | 10.30.3.232:7002 |
group_1 | 10.30.3.232:7001 | 10.30.3.233:7002 |
group_2 | 10.30.3.233:7001 | 10.30.3.231:7002 |
1. Add as a master node
- Before adding: 3 masters, 3 slaves
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690427612000 1 connected 0-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690427610000 2 connected 5461-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690427613195 3 connected 10923-16383
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690427611187 3 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690427610000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690427612191 2 connected
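- Use `--cluster add-node ip1:port1 ip2:port2` to add `ip1:port1` to the existing cluster as a master node (`ip2:port2` is any master or slave node already in the cluster). This example adds `10.30.3.234:7001` as a master node of the cluster.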
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster add-node 10.30.3.234:7001 10.30.3.231:7001
>>> Adding node 10.30.3.234:7001 to cluster 10.30.3.231:7001
>>> Performing Cluster Check (using node 10.30.3.231:7001)
......
>>> Getting functions from cluster
>>> Send FUNCTION LIST to 10.30.3.234:7001 to verify there is no functions in it
>>> Send FUNCTION RESTORE to 10.30.3.234:7001
>>> Send CLUSTER MEET to node 10.30.3.234:7001 to make it join the cluster.
[OK] New node added correctly.
- After adding: 4 masters, 3 slaves
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690427612000 1 connected 0-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690427610000 2 connected 5461-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690427613195 3 connected 10923-16383
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690427611187 3 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690427610000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690427612191 2 connected
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690427611000 0 connected # newly added master node
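The master and slave counts quoted before and after each step can be tallied from the `cluster nodes` output rather than by eye. The following is a minimal sketch, assuming the node flags sit in the third field as in the listings above; `count_roles` is a hypothetical helper name, not part of redis-cli.

```shell
# count_roles: read `cluster nodes` output on stdin and print
# "<masters> <slaves>". Field 3 holds comma-separated flags such as
# "myself,master" or "slave".
count_roles() {
  awk '
    {
      n = split($3, f, ",")
      for (i = 1; i <= n; i++) {
        if (f[i] == "master") m++
        if (f[i] == "slave")  s++
      }
    }
    END { printf "%d %d\n", m, s }
  '
}

# Demo on two lines taken from the listing above; prints "1 1".
count_roles <<'EOF'
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690427612000 1 connected 0-5460
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690427610000 1 connected
EOF
```

Against a live cluster, the same function can be fed by piping the `cluster nodes` command used above into `count_roles`.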
2. Add as a slave node (master chosen automatically)
- Before adding: 4 masters, 3 slaves
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690442195000 1 connected 0-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690442199000 2 connected 5461-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690442198769 3 connected 10923-16383
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690442200776 0 connected # this master node has no slave
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690442098000 3 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690442098000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690442098445 2 connected
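- Use `--cluster add-node ip1:port1 ip2:port2 --cluster-slave` to add `ip1:port1` to the existing cluster as a slave node without naming its master (`ip2:port2` is any master or slave node already in the cluster). By default the new node is assigned to the master with the fewest slaves. This example adds `10.30.3.234:7002`, which becomes a slave of `10.30.3.234:7001` because that master has no slave.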
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster add-node 10.30.3.234:7002 10.30.3.231:7001 --cluster-slave
>>> Adding node 10.30.3.234:7002 to cluster 10.30.3.231:7001
>>> Performing Cluster Check (using node 10.30.3.231:7001)
......
>>> Configure node as replica of 10.30.3.234:7001.
[OK] New node added correctly.
- After adding: 4 masters, 4 slaves
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690442195000 1 connected 0-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690442199000 2 connected 5461-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690442198769 3 connected 10923-16383
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690442200776 0 connected # the new slave was assigned to this master node
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690442098000 3 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690442098000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690442098445 2 connected
1d49af7153442e8ecd0138d9225ce57984ef923c 10.30.3.234:7002@17002 slave a373409411ac0e11f37bae7c2b97c78c9f6e9897 0 1690427775000 0 connected # newly added slave node
3. Delete the newly added slave node
- Before deleting: 4 masters, 4 slaves
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690442195000 1 connected 0-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690442199000 2 connected 5461-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690442198769 3 connected 10923-16383
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690442200776 0 connected
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690442098000 3 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690442098000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690442098445 2 connected
1d49af7153442e8ecd0138d9225ce57984ef923c 10.30.3.234:7002@17002 slave a373409411ac0e11f37bae7c2b97c78c9f6e9897 0 1690427775000 0 connected # slave node to delete
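- Use `--cluster del-node ip:port <node-id>` to remove the node `<node-id>`, which must hold no slots, from the cluster (`ip:port` is any master or slave node in the cluster). This example removes the newly added slave `10.30.3.234:7002`; the instance holds no slots, so it can be removed directly.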
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster del-node 10.30.3.231:7001 1d49af7153442e8ecd0138d9225ce57984ef923c
>>> Removing node 1d49af7153442e8ecd0138d9225ce57984ef923c from cluster 10.30.3.231:7001
>>> Sending CLUSTER FORGET messages to the cluster...
>>> Sending CLUSTER RESET SOFT to the deleted node.
- After deleting: 4 masters, 3 slaves
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690442195000 1 connected 0-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690442199000 2 connected 5461-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690442198769 3 connected 10923-16383
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690442200776 0 connected
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690442098000 3 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690442098000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690442098445 2 connected
4. Re-add as a slave of a specified master node
- Before adding: 4 masters, 3 slaves
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690442195000 1 connected 0-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690442199000 2 connected 5461-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690442198769 3 connected 10923-16383 # the new node will be made a slave of this node
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690442200776 0 connected # this master node has no slave
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690442098000 3 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690442098000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690442098445 2 connected
- First find the ID of the target master node, for example `10.30.3.233:7001`
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null | grep "10.30.3.233:7001"
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690443114539 3 connected 10923-16383
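The `grep` lookup above treats each `.` as a wildcard and also matches longer addresses that merely contain the pattern. A minimal sketch of an exact-match lookup follows, assuming the address is the part of the second field before `@`; `node_id_of` is a hypothetical helper name.

```shell
# node_id_of ADDR: read `cluster nodes` output on stdin and print the ID
# of the node whose address (field 2, before "@") equals ADDR exactly.
node_id_of() {
  awk -v addr="$1" '{ split($2, a, "@"); if (a[1] == addr) print $1 }'
}

# Demo on the line shown above; prints c274c1ef585ad0f9801e701d0b7fdc66956ae4c8.
node_id_of 10.30.3.233:7001 <<'EOF'
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690443114539 3 connected 10923-16383
EOF
```

The printed ID can then be passed to `--cluster-master-id` or `del-node`.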
- Use `--cluster add-node ip1:port1 ip2:port2 --cluster-slave --cluster-master-id <node-id>` to add `ip1:port1` to the existing cluster as a slave of `<node-id>` (`ip2:port2` is any master or slave node already in the cluster). This example adds `10.30.3.234:7002` to the cluster as a slave of `10.30.3.233:7001`.
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster add-node 10.30.3.234:7002 10.30.3.231:7001 --cluster-slave --cluster-master-id c274c1ef585ad0f9801e701d0b7fdc66956ae4c8
>>> Adding node 10.30.3.234:7002 to cluster 10.30.3.231:7001
>>> Performing Cluster Check (using node 10.30.3.231:7001)
......
>>> Configure node as replica of 10.30.3.233:7001.
[OK] New node added correctly.
- After adding: 4 masters, 4 slaves. Confirm that `10.30.3.233:7001` has 2 slaves while `10.30.3.234:7001` has none
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690443420000 1 connected 0-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690443424000 2 connected 5461-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690443424474 3 connected 10923-16383 # this master has 2 slaves
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690443422000 0 connected
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690443422000 3 connected # first slave
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690443423471 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690443425476 2 connected
1d49af7153442e8ecd0138d9225ce57984ef923c 10.30.3.234:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690443423000 3 connected # second slave
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
10.30.3.231:7001 (6c15dd7b...) -> 2812 keys | 5461 slots | 1 slaves.
10.30.3.232:7001 (eef25676...) -> 2924 keys | 5462 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 2869 keys | 5461 slots | 2 slaves. # this master has 2 slaves
10.30.3.234:7001 (a3734094...) -> 0 keys | 0 slots | 0 slaves. # this master has no slave
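The slave count per master can also be derived from `cluster nodes` alone, by counting how many slave lines name each master ID in the fourth field. This is a sketch under that assumption; `replicas_per_master` is a hypothetical helper name.

```shell
# replicas_per_master: read `cluster nodes` output on stdin and print
# "<master address> <number of slaves>" in the order masters appear.
replicas_per_master() {
  awk '
    {
      split($2, a, "@")
      n = split($3, f, ",")
      for (i = 1; i <= n; i++) {
        if (f[i] == "master") { order[++cnt] = $1; addr[$1] = a[1] }
        if (f[i] == "slave")  { reps[$4]++ }   # field 4 is the master ID
      }
    }
    END {
      for (i = 1; i <= cnt; i++) printf "%s %d\n", addr[order[i]], reps[order[i]]
    }
  '
}

# Demo on four lines from the listing above.
replicas_per_master <<'EOF'
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690443424474 3 connected 10923-16383
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690443422000 0 connected
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690443422000 3 connected
1d49af7153442e8ecd0138d9225ce57984ef923c 10.30.3.234:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690443423000 3 connected
EOF
```

For the four demo lines this reports 2 slaves for `10.30.3.233:7001` and 0 for `10.30.3.234:7001`, matching the `--cluster check` summary.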
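Balancing Slot Counts
- Run the following to issue 15,000 `SET` commands with random keys (because `$RANDOM` only yields values from 0 to 32767, some keys collide, so fewer than 15,000 distinct keys are created)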
for n in $(seq -w 1 15000); do echo set hello_$RANDOM $RANDOM; done | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 > /dev/null
/usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep keys | grep slaves
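reshard: move slots manually
- Before moving slots: use `--cluster check ip:port` (`ip:port` is any master or slave node in the cluster) to confirm the number of keys and slots on each master. In this example `10.30.3.234:7001` has 0 keys, 0 slots, and 0 slaves.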
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
10.30.3.231:7001 (6c15dd7b...) -> 4912 keys | 5461 slots | 1 slaves.
10.30.3.232:7001 (eef25676...) -> 5046 keys | 5462 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 4984 keys | 5461 slots | 2 slaves.
10.30.3.234:7001 (a3734094...) -> 0 keys | 0 slots | 0 slaves. # note: 0 keys, 0 slots, 0 slaves
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690443420000 1 connected 0-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690443424000 2 connected 5461-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690443424474 3 connected 10923-16383 # the example moves 100 slots from this node to the new master
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690443422000 0 connected # note: no slot range assigned
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690443422000 3 connected # first slave of 10.30.3.233:7001
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690443423471 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690443425476 2 connected
1d49af7153442e8ecd0138d9225ce57984ef923c 10.30.3.234:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690443423000 3 connected # second slave of 10.30.3.233:7001
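- Use `--cluster reshard ip:port` (`ip:port` is any master or slave node in the cluster) to move a given number of slots from one or more source nodes to a target node. This example moves 100 slots from `10.30.3.233:7001` to the newly added master `10.30.3.234:7001`.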
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster reshard 10.30.3.231:7001
>>> Performing Cluster Check (using node 10.30.3.231:7001)
......
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 100 # number of slots to move, 100 in this example
What is the receiving node ID? a373409411ac0e11f37bae7c2b97c78c9f6e9897 # ID of the node that receives the slots, i.e. the new master
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1: c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 # ID of the first source master; enter all to take slots from every master
Source node #2: done # ID of the second source master; enter done if there is only one
Ready to move 100 slots.
......
Do you want to proceed with the proposed reshard plan (yes/no)? yes # enter yes
......
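- After moving slots: 100 slots of the source master moved to the target master, together with the keys stored in those slots. A slave also changed masters: `10.30.3.234:7002` switched from being a slave of `10.30.3.233:7001` to being a slave of `10.30.3.234:7001`. This is consistent with Redis Cluster's replica migration, in which a slave of a master that has more than one slave moves to a slot-holding master that has none.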
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
10.30.3.231:7001 (6c15dd7b...) -> 4912 keys | 5461 slots | 1 slaves.
10.30.3.232:7001 (eef25676...) -> 5046 keys | 5462 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 4896 keys | 5361 slots | 1 slaves.
10.30.3.234:7001 (a3734094...) -> 88 keys | 100 slots | 1 slaves. # note: 88 keys, 100 slots, 1 slave
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690445122000 1 connected 0-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690445123000 2 connected 5461-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690445124000 3 connected 11023-16383 # source master lost 100 slots
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690445125521 8 connected 10923-11022 # new master gained 100 slots
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690445125000 3 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690445123000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690445126525 2 connected
1d49af7153442e8ecd0138d9225ce57984ef923c 10.30.3.234:7002@17002 slave a373409411ac0e11f37bae7c2b97c78c9f6e9897 0 1690445124000 8 connected # changed from a slave of 10.30.3.233:7001 to a slave of 10.30.3.234:7001
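The slot counts reported by `--cluster check` can be cross-checked against the slot ranges in `cluster nodes`, since an inclusive range `a-b` covers `b - a + 1` slots. The following is a minimal sketch, assuming slot ranges start at the ninth field as in the listings above; `slots_per_master` is a hypothetical helper name.

```shell
# slots_per_master: read `cluster nodes` output on stdin and print
# "<address> <slot count>" for every master. Only fields that look like
# "N" or "N-M" are counted, so bracketed migration markers and trailing
# annotations are ignored.
slots_per_master() {
  awk '
    {
      is_master = 0
      n = split($3, flags, ",")
      for (i = 1; i <= n; i++) if (flags[i] == "master") is_master = 1
      if (!is_master) next
      split($2, a, "@")
      total = 0
      for (i = 9; i <= NF; i++) {
        if ($i ~ /^[0-9]+-[0-9]+$/) { split($i, r, "-"); total += r[2] - r[1] + 1 }
        else if ($i ~ /^[0-9]+$/)   { total += 1 }
      }
      print a[1], total
    }
  '
}

# Demo on the two masters involved in the reshard above.
slots_per_master <<'EOF'
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690445124000 3 connected 11023-16383
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690445125521 8 connected 10923-11022
EOF
```

For the demo input this yields 5361 slots for `10.30.3.233:7001` and 100 for `10.30.3.234:7001`, the same figures as the `--cluster check` output.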
rebalance: balance slots automatically
- Moving slots with `reshard` can simulate an uneven slot distribution. Run `--cluster check ip:port` to confirm the number of slots on each master.
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
10.30.3.231:7001 (6c15dd7b...) -> 4912 keys | 5461 slots | 1 slaves.
10.30.3.232:7001 (eef25676...) -> 5046 keys | 5462 slots | 1 slaves. # too many slots on this node
10.30.3.233:7001 (c274c1ef...) -> 4896 keys | 5361 slots | 1 slaves.
10.30.3.234:7001 (a3734094...) -> 88 keys | 100 slots | 1 slaves. # too few slots on this node
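- When slots are unevenly distributed and assigning them with `reshard` is too tedious, use `--cluster rebalance ip:port` (`ip:port` is any master or slave node in the cluster) to balance the slots across the masters automatically.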
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster rebalance --cluster-threshold 1 10.30.3.231:7001
>>> Performing Cluster Check (using node 10.30.3.231:7001)
......
>>> Rebalancing across 4 nodes. Total weight = 4.00
Moving 1366 slots from 10.30.3.232:7001 to 10.30.3.234:7001
......
Moving 1365 slots from 10.30.3.231:7001 to 10.30.3.234:7001
......
Moving 1265 slots from 10.30.3.233:7001 to 10.30.3.234:7001
......
- Check the result of the rebalance
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
10.30.3.231:7001 (6c15dd7b...) -> 3666 keys | 4096 slots | 1 slaves.
10.30.3.232:7001 (eef25676...) -> 3781 keys | 4096 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 3716 keys | 4096 slots | 1 slaves.
10.30.3.234:7001 (a3734094...) -> 3779 keys | 4096 slots | 1 slaves.
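The slot counts in the `Moving ... slots` lines follow from simple arithmetic: with 4 equally weighted masters, an even share is 16384 / 4 = 4096 slots, so each master gives up or receives the difference between its current count and 4096. The sketch below reproduces that arithmetic from `--cluster check` summary lines. It assumes equal weights and is not redis-cli's own algorithm; `even_share_delta` is a hypothetical helper name.

```shell
# even_share_delta: read `--cluster check` summary lines on stdin and
# print "<address> <current slots minus even share>". A positive value
# is a surplus to give away, a negative value a deficit to fill.
even_share_delta() {
  awk '
    index($0, " keys | ") > 0 {
      for (i = 2; i <= NF; i++)
        if ($i == "slots") { addr[++n] = $1; slots[n] = $(i - 1) }
    }
    END {
      if (n == 0) exit
      share = int(16384 / n)   # even share, assuming equal weights
      for (i = 1; i <= n; i++) printf "%s %d\n", addr[i], slots[i] - share
    }
  '
}

# Demo on the pre-rebalance summary shown above.
even_share_delta <<'EOF'
10.30.3.231:7001 (6c15dd7b...) -> 4912 keys | 5461 slots | 1 slaves.
10.30.3.232:7001 (eef25676...) -> 5046 keys | 5462 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 4896 keys | 5361 slots | 1 slaves.
10.30.3.234:7001 (a3734094...) -> 88 keys | 100 slots | 1 slaves.
EOF
```

The surpluses 1365, 1366, and 1265 match the three `Moving` lines, and their sum, 3996, equals the deficit of `10.30.3.234:7001`.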
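Removing Nodes
1. Remove a slave node
- Use `--cluster del-node ip:port <node-id>` (`ip:port` is any master or slave node in the cluster) to remove any node that holds no slots. Slave nodes hold no slots, so they can be removed directly.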
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null | grep slave
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690447370000 3 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690447373000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690447374360 2 connected
1d49af7153442e8ecd0138d9225ce57984ef923c 10.30.3.234:7002@17002 slave a373409411ac0e11f37bae7c2b97c78c9f6e9897 0 1690447373000 8 connected # the example removes this node
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster del-node 10.30.3.231:7001 1d49af7153442e8ecd0138d9225ce57984ef923c
>>> Removing node 1d49af7153442e8ecd0138d9225ce57984ef923c from cluster 10.30.3.231:7001
>>> Sending CLUSTER FORGET messages to the cluster...
>>> Sending CLUSTER RESET SOFT to the deleted node.
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null | grep slave
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690447554000 3 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690447556000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690447555000 2 connected
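2. Remove a master node that still holds slots
- Using `--cluster del-node ip:port <node-id>` on a master that still holds slots fails with an error saying the node is not empty. This example tries to remove `10.30.3.234:7001`.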
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690447746000 1 connected 1365-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690447750000 2 connected 6827-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690447749000 3 connected 12288-16383
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690447749000 8 connected 0-1364 5461-6826 10923-12287 # the example removes this node
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690449224999 9 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690449227006 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690449224598 2 connected
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster del-node 10.30.3.231:7001 a373409411ac0e11f37bae7c2b97c78c9f6e9897
>>> Removing node a373409411ac0e11f37bae7c2b97c78c9f6e9897 from cluster 10.30.3.231:7001
[ERR] Node 10.30.3.234:7001 is not empty! Reshard data away and try again.
- First, confirm how many slots the master to be removed holds. In this example `10.30.3.234:7001` holds 4096 slots.
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
10.30.3.231:7001 (6c15dd7b...) -> 3666 keys | 4096 slots | 1 slaves.
10.30.3.232:7001 (eef25676...) -> 3781 keys | 4096 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 3716 keys | 4096 slots | 1 slaves.
10.30.3.234:7001 (a3734094...) -> 3779 keys | 4096 slots | 1 slaves.
- Then run `reshard` to move the slots off this master
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster reshard 10.30.3.231:7001
>>> Performing Cluster Check (using node 10.30.3.231:7001)
......
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 4096 # enter the number of slots held by the node being removed, 4096 in this example
What is the receiving node ID? c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 # ID of the node that receives the slots, 10.30.3.233:7001 in this example
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1: a373409411ac0e11f37bae7c2b97c78c9f6e9897 # the source master is the node being removed, 10.30.3.234:7001 in this example
Source node #2: done # there is no second source master, so enter done
Ready to move 4096 slots.
......
Do you want to proceed with the proposed reshard plan (yes/no)? yes # enter yes
......
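- Check the cluster state. The master being removed no longer holds any slots, and it has been turned into a slave of the node that received its slots (`10.30.3.233:7001`)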
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
10.30.3.231:7001 (6c15dd7b...) -> 3666 keys | 4096 slots | 1 slaves.
10.30.3.232:7001 (eef25676...) -> 3781 keys | 4096 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 7495 keys | 8192 slots | 2 slaves.
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690449226000 1 connected 1365-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690449223995 2 connected 6827-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690449226002 9 connected 0-1364 5461-6826 10923-16383
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690449224999 9 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690449227006 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690449224598 2 connected
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690449225000 9 connected # demoted to slave
- Remove the node again (`10.30.3.234:7001` in this example); this time it succeeds
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster del-node 10.30.3.231:7001 a373409411ac0e11f37bae7c2b97c78c9f6e9897
>>> Removing node a373409411ac0e11f37bae7c2b97c78c9f6e9897 from cluster 10.30.3.231:7001
>>> Sending CLUSTER FORGET messages to the cluster...
>>> Sending CLUSTER RESET SOFT to the deleted node.
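The interactive `reshard` prompts used in this section can also be answered on the command line with redis-cli's `--cluster-from`, `--cluster-to`, `--cluster-slots`, and `--cluster-yes` options, which is convenient for scripting. The sketch below only prints such a command so it can be reviewed before it is run; `reshard_cmd` is a hypothetical helper name, and authentication (`-a`, or the `REDISCLI_AUTH` environment variable) is deliberately left out and has to be supplied separately.

```shell
# reshard_cmd ENTRY FROM_ID TO_ID SLOTS: print (do not run) a
# non-interactive reshard command. ENTRY is any node of the cluster.
reshard_cmd() {
  printf '/usr/local/bin/redis-cli --cluster reshard %s --cluster-from %s --cluster-to %s --cluster-slots %s --cluster-yes\n' \
    "$1" "$2" "$3" "$4"
}

# The move performed interactively above: all 4096 slots of
# 10.30.3.234:7001 go to 10.30.3.233:7001.
reshard_cmd 10.30.3.231:7001 \
  a373409411ac0e11f37bae7c2b97c78c9f6e9897 \
  c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 \
  4096
```

Printing instead of executing keeps the sketch free of side effects; the printed line can be pasted into a shell once the node IDs and slot count have been checked.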
3. Re-add and remove a master node that holds no slots
- Add `10.30.3.234:7001` to the cluster as a master again and confirm that the new master holds 0 slots
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster add-node 10.30.3.234:7001 10.30.3.231:7001
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null | grep master
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690449839000 1 connected 1365-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690449835000 2 connected 6827-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690449839971 9 connected 0-1364 5461-6826 10923-16383
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690449837000 8 connected
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
10.30.3.231:7001 (6c15dd7b...) -> 3666 keys | 4096 slots | 1 slaves.
10.30.3.232:7001 (eef25676...) -> 3781 keys | 4096 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 7495 keys | 8192 slots | 1 slaves.
10.30.3.234:7001 (a3734094...) -> 0 keys | 0 slots | 0 slaves. # note: 0 keys, 0 slots, 0 slaves
- Use `--cluster del-node ip:port <node-id>` to remove the node that holds no slots (`10.30.3.234:7001` in this example)
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster del-node 10.30.3.231:7001 a373409411ac0e11f37bae7c2b97c78c9f6e9897
>>> Removing node a373409411ac0e11f37bae7c2b97c78c9f6e9897 from cluster 10.30.3.231:7001
>>> Sending CLUSTER FORGET messages to the cluster...
>>> Sending CLUSTER RESET SOFT to the deleted node.
- Check the cluster state: the master that held no slots (`10.30.3.234:7001` in this example) has been removed
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690450007485 2 connected 6827-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690450005000 9 connected 0-1364 5461-6826 10923-16383
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690450006483 1 connected
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690450006000 1 connected 1365-5460
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690450004479 9 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690450008488 2 connected
Failover
Manual failover
- Before failover: `10.30.3.233:7002` is a slave and `10.30.3.232:7001` is its master
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690459307000 1 connected 1365-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690459306100 2 connected 6827-10922 # master of the node that will run the failover
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690459306000 9 connected 0-1364 5461-6826 10923-16383
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690459309108 9 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690459308106 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690459307103 2 connected # run the failover on this slave node
- Connect to a slave node and run `cluster failover` to promote that node from slave to master. This example promotes `10.30.3.233:7002`.
[root@redis01 ~]# echo "cluster failover" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.233 -p 7002 2>/dev/null
OK
- After failover: `10.30.3.233:7002` changed from slave to master, and `10.30.3.232:7001` changed from master to slave
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690463969000 1 connected 1365-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 slave 0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 0 1690463972447 10 connected # master became slave
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690463969000 9 connected 0-1364 5461-6826 10923-16383
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690463968436 9 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690463970000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 master - 0 1690463971444 10 connected 6827-10922 # slave became master
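Whether the role swap took effect can be checked per node by reading the flags field of `cluster nodes`. This is a minimal sketch; `role_of` is a hypothetical helper name.

```shell
# role_of ADDR: read `cluster nodes` output on stdin and print "master"
# or "slave" for the node whose address (field 2, before "@") is ADDR.
role_of() {
  awk -v addr="$1" '
    { split($2, a, "@") }
    a[1] == addr {
      n = split($3, f, ",")
      for (i = 1; i <= n; i++)
        if (f[i] == "master" || f[i] == "slave") print f[i]
    }
  '
}

# Demo on the two nodes that swapped roles above; prints "master".
role_of 10.30.3.233:7002 <<'EOF'
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 slave 0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 0 1690463972447 10 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 master - 0 1690463971444 10 connected 6827-10922
EOF
```

A script can compare the value before and after `cluster failover` to confirm that the promotion happened.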
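Problems Encountered
Scenario 1: rebalance has no effect right after adding a new master node
- Symptom: after a new master node is added to the cluster (`10.30.3.234:7001` in this example, holding 0 slots), running `rebalance` immediately afterwards reports `No rebalancing needed! All nodes are within the 1.00% threshold`, and the master that holds no slots is not assigned any slots or keys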
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster add-node 10.30.3.234:7001 10.30.3.232:7001
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
10.30.3.231:7001 (6c15dd7b...) -> 4930 keys | 5461 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 4984 keys | 5461 slots | 1 slaves.
10.30.3.233:7002 (0cf02550...) -> 5028 keys | 5462 slots | 1 slaves.
10.30.3.234:7001 (a3734094...) -> 0 keys | 0 slots | 0 slaves. # newly added master node: 0 keys, 0 slots
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster rebalance --cluster-threshold 1 10.30.3.231:7001
>>> Performing Cluster Check (using node 10.30.3.231:7001)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
*** No rebalancing needed! All nodes are within the 1.00% threshold.
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves | grep "10.30.3.234:7001"
10.30.3.234:7001 (a3734094...) -> 0 keys | 0 slots | 0 slaves. # rebalance had no effect here
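- Fix: first use `reshard` to move 100 slots to the newly added node, then use `rebalance` to balance the slots across the whole cluster automatically. The cause is that `rebalance` by default leaves masters that hold no slots out of its calculation; redis-cli also provides the `--cluster-use-empty-masters` option to include them.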
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster reshard 10.30.3.231:7001
... (interactive steps omitted) ...
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
10.30.3.231:7001 (6c15dd7b...) -> 4930 keys | 5461 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 4896 keys | 5361 slots | 1 slaves.
10.30.3.233:7002 (0cf02550...) -> 5028 keys | 5462 slots | 1 slaves.
10.30.3.234:7001 (a3734094...) -> 88 keys | 100 slots | 0 slaves. # after reshard: 88 keys, 100 slots
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster rebalance --cluster-threshold 1 10.30.3.231:7001
>>> Performing Cluster Check (using node 10.30.3.231:7001)
......
[OK] All 16384 slots covered.
>>> Rebalancing across 4 nodes. Total weight = 4.00
Moving 1366 slots from 10.30.3.233:7002 to 10.30.3.234:7001
Moving 1365 slots from 10.30.3.231:7001 to 10.30.3.234:7001
Moving 1265 slots from 10.30.3.233:7001 to 10.30.3.234:7001
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
10.30.3.231:7001 (6c15dd7b...) -> 3708 keys | 4096 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 3716 keys | 4096 slots | 1 slaves.
10.30.3.233:7002 (0cf02550...) -> 3781 keys | 4096 slots | 1 slaves.
10.30.3.234:7001 (a3734094...) -> 3737 keys | 4096 slots | 0 slaves. # after rebalance: 3737 keys, 4096 slots
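Before rebalancing, it can help to detect masters that hold no slots, since those are the nodes this scenario is about. The sketch below scans `--cluster check` summary lines for them; `empty_masters` is a hypothetical helper name, and the `rebalance` invocation in the trailing comment has not been run against the cluster described here.

```shell
# empty_masters: read `--cluster check` output on stdin and print the
# address of every master whose summary line reports 0 slots.
empty_masters() {
  awk '
    index($0, " keys | ") > 0 {
      for (i = 2; i <= NF; i++)
        if ($i == "slots" && $(i - 1) == 0) print $1
    }
  '
}

# Demo on the summary from the symptom above; prints 10.30.3.234:7001.
empty_masters <<'EOF'
10.30.3.231:7001 (6c15dd7b...) -> 4930 keys | 5461 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 4984 keys | 5461 slots | 1 slaves.
10.30.3.233:7002 (0cf02550...) -> 5028 keys | 5462 slots | 1 slaves.
10.30.3.234:7001 (a3734094...) -> 0 keys | 0 slots | 0 slaves.
EOF

# If the list is non-empty, one option is a rebalance that includes
# masters without slots, for example:
#   /usr/local/bin/redis-cli -a <password> --cluster rebalance 10.30.3.231:7001 --cluster-use-empty-masters
```

Compared with the reshard-then-rebalance workaround, this avoids the manual 100-slot move, at the cost of relying on an option that should first be tried with `--cluster-simulate` on a test cluster.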