Redis Sharded Cluster Management

[TOC]

Adding Nodes

  • Baseline environment: the tests below are run against the current Redis cluster:
Group     Master node        Slave node
group_0   10.30.3.231:7001   10.30.3.232:7002
group_1   10.30.3.232:7001   10.30.3.233:7002
group_2   10.30.3.233:7001   10.30.3.231:7002

1. Add as a master node

  • Before: 3 masters, 3 slaves
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null

6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690427612000 1 connected 0-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690427610000 2 connected 5461-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690427613195 3 connected 10923-16383
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690427611187 3 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690427610000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690427612191 2 connected
  • Use --cluster add-node ip1:port1 ip2:port2 to add ip1:port1 to the existing cluster as a master node (ip2:port2 can be any master or slave already in the cluster). The example adds 10.30.3.234:7001 to the cluster as a master:
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster add-node 10.30.3.234:7001 10.30.3.231:7001

>>> Adding node 10.30.3.234:7001 to cluster 10.30.3.231:7001
>>> Performing Cluster Check (using node 10.30.3.231:7001)
......
>>> Getting functions from cluster
>>> Send FUNCTION LIST to 10.30.3.234:7001 to verify there is no functions in it
>>> Send FUNCTION RESTORE to 10.30.3.234:7001
>>> Send CLUSTER MEET to node 10.30.3.234:7001 to make it join the cluster.
[OK] New node added correctly.
  • After: 4 masters, 3 slaves
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null

6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690427612000 1 connected 0-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690427610000 2 connected 5461-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690427613195 3 connected 10923-16383
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690427611187 3 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690427610000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690427612191 2 connected
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690427611000 0 connected        #.newly added master node

2. Add as a slave of an auto-selected master

  • Before: 4 masters, 3 slaves
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null

6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690442195000 1 connected 0-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690442199000 2 connected 5461-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690442198769 3 connected 10923-16383
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690442200776 0 connected               #.this master has no slave
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690442098000 3 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690442098000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690442098445 2 connected
  • Use --cluster add-node ip1:port1 ip2:port2 --cluster-slave to add ip1:port1 to the existing cluster as a slave node (ip2:port2 can be any master or slave already in the cluster). By default it is attached to the master with the fewest slaves, which in this example is 10.30.3.234:7001 since it has no slave yet:
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster add-node 10.30.3.234:7002 10.30.3.231:7001 --cluster-slave

>>> Adding node 10.30.3.234:7002 to cluster 10.30.3.231:7001
>>> Performing Cluster Check (using node 10.30.3.231:7001)
......
>>> Configure node as replica of 10.30.3.234:7001.
[OK] New node added correctly.
  • After: 4 masters, 4 slaves
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null

6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690442195000 1 connected 0-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690442199000 2 connected 5461-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690442198769 3 connected 10923-16383
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690442200776 0 connected                   #.the new node was attached to this master as a slave
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690442098000 3 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690442098000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690442098445 2 connected
1d49af7153442e8ecd0138d9225ce57984ef923c 10.30.3.234:7002@17002 slave a373409411ac0e11f37bae7c2b97c78c9f6e9897 0 1690427775000 0 connected    #.newly added slave node

3. Remove the newly added slave

  • Before removal: 4 masters, 4 slaves
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null

6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690442195000 1 connected 0-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690442199000 2 connected 5461-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690442198769 3 connected 10923-16383
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690442200776 0 connected
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690442098000 3 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690442098000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690442098445 2 connected
1d49af7153442e8ecd0138d9225ce57984ef923c 10.30.3.234:7002@17002 slave a373409411ac0e11f37bae7c2b97c78c9f6e9897 0 1690427775000 0 connected    #.the slave node to be removed
  • Use --cluster del-node ip:port <node-id> to remove node <node-id> from the cluster if it owns no slots (ip:port can be any master or slave in the cluster). The example removes the newly added slave 10.30.3.234:7002; since the instance holds no slots, it can be removed directly:
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster del-node 10.30.3.231:7001 1d49af7153442e8ecd0138d9225ce57984ef923c

>>> Removing node 1d49af7153442e8ecd0138d9225ce57984ef923c from cluster 10.30.3.231:7001
>>> Sending CLUSTER FORGET messages to the cluster...
>>> Sending CLUSTER RESET SOFT to the deleted node.
  • After removal: 4 masters, 3 slaves
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null

6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690442195000 1 connected 0-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690442199000 2 connected 5461-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690442198769 3 connected 10923-16383
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690442200776 0 connected
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690442098000 3 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690442098000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690442098445 2 connected

4. Re-add as a slave of a specified master

  • Before: 4 masters, 3 slaves
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null

6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690442195000 1 connected 0-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690442199000 2 connected 5461-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690442198769 3 connected 10923-16383       #.the new node will be made a slave of this master
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690442200776 0 connected                   #.this master has no slave
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690442098000 3 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690442098000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690442098445 2 connected
  • First, look up the node ID of the intended master, 10.30.3.233:7001 in this example:
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null | grep "10.30.3.233:7001"

c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690443114539 3 connected 10923-16383
  • Use --cluster add-node ip1:port1 ip2:port2 --cluster-slave --cluster-master-id <node-id> to add ip1:port1 to the existing cluster as a slave of <node-id> (ip2:port2 can be any master or slave already in the cluster). The example adds 10.30.3.234:7002 to the cluster as a slave of 10.30.3.233:7001:
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster add-node 10.30.3.234:7002 10.30.3.231:7001 --cluster-slave --cluster-master-id c274c1ef585ad0f9801e701d0b7fdc66956ae4c8

>>> Adding node 10.30.3.234:7002 to cluster 10.30.3.231:7001
>>> Performing Cluster Check (using node 10.30.3.231:7001)
......
>>> Configure node as replica of 10.30.3.233:7001.
[OK] New node added correctly.
  • After: 4 masters, 4 slaves; confirm that 10.30.3.233:7001 now has 2 slaves while 10.30.3.234:7001 still has none
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690443420000 1 connected 0-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690443424000 2 connected 5461-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690443424474 3 connected 10923-16383    #.this master has 2 slaves
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690443422000 0 connected
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690443422000 3 connected      #.slave no. 1
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690443423471 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690443425476 2 connected
1d49af7153442e8ecd0138d9225ce57984ef923c 10.30.3.234:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690443423000 3 connected      #.slave no. 2

[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
10.30.3.231:7001 (6c15dd7b...) -> 2812 keys | 5461 slots | 1 slaves.
10.30.3.232:7001 (eef25676...) -> 2924 keys | 5462 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 2869 keys | 5461 slots | 2 slaves.        #.this master has 2 slaves
10.30.3.234:7001 (a3734094...) -> 0 keys | 0 slots | 0 slaves.              #.this master has no slave

Balancing Slot Counts

  • Run the following to bulk-insert 15,000 random records (keys built from $RANDOM collide, so the final key count ends up slightly below 15,000), then check how the keys are spread across the masters:
for n in $(seq -w 1 15000); do echo set hello_$RANDOM $RANDOM; done | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -p 7001 > /dev/null

/usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep keys | grep slaves
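  • As a cross-check, DBSIZE can also be queried on each master directly; on a cluster node it only counts the keys owned by that node. A small sketch, assuming the four masters above (10.30.3.234:7001 still holds no slots at this point and will report 0):
for h in 10.30.3.231 10.30.3.232 10.30.3.233 10.30.3.234; do
  echo -n "$h:7001 -> "                                                      #.label the node being queried
  echo dbsize | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -h $h -p 7001 2>/dev/null
done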

reshard: moving slots manually

  • Before moving slots: use --cluster check ip:port (ip:port can be any master or slave in the cluster) to confirm each master's key and slot counts. In the example, 10.30.3.234:7001 has 0 keys, 0 slots and 0 slaves:
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
10.30.3.231:7001 (6c15dd7b...) -> 4912 keys | 5461 slots | 1 slaves.
10.30.3.232:7001 (eef25676...) -> 5046 keys | 5462 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 4984 keys | 5461 slots | 2 slaves.
10.30.3.234:7001 (a3734094...) -> 0 keys | 0 slots | 0 slaves.              #.note: 0 keys, 0 slots and 0 slaves here

[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690443420000 1 connected 0-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690443424000 2 connected 5461-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690443424474 3 connected 10923-16383        #.the example moves 100 slots from this node to the new master
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690443422000 0 connected                    #.note: no slot range assigned yet
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690443422000 3 connected      #.first slave of 10.30.3.233:7001
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690443423471 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690443425476 2 connected
1d49af7153442e8ecd0138d9225ce57984ef923c 10.30.3.234:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690443423000 3 connected      #.second slave of 10.30.3.233:7001
  • Use --cluster reshard ip:port (ip:port can be any master or slave in the cluster) to move a given number of slots from one or more source nodes to a target node. The example moves 100 slots from 10.30.3.233:7001 to the newly added master 10.30.3.234:7001 (a non-interactive variant is sketched after the session below):
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster reshard 10.30.3.231:7001

>>> Performing Cluster Check (using node 10.30.3.231:7001)
......
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 100                   #.how many slots to move; 100 in this example
What is the receiving node ID? a373409411ac0e11f37bae7c2b97c78c9f6e9897     #.node ID that will receive the slots, i.e. the new master's ID
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1: c274c1ef585ad0f9801e701d0b7fdc66956ae4c8                    #.ID of the first source master; type all to pull from every master
Source node #2: done                                                        #.ID of the second source master; type done if there is only one

Ready to move 100 slots.
......
Do you want to proceed with the proposed reshard plan (yes/no)? yes         #.type yes
......
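  • The same migration can also be scripted without the interactive prompts by putting the plan on the command line; a sketch reusing the node IDs from this example (--cluster-from takes one or more source node IDs, --cluster-yes skips the confirmation):
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster reshard 10.30.3.231:7001 \
    --cluster-from c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 \
    --cluster-to a373409411ac0e11f37bae7c2b97c78c9f6e9897 \
    --cluster-slots 100 --cluster-yes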
  • After the reshard: 100 slots were moved from the source master to the target master, and the keys stored in those slots moved with them. More interestingly, a slave moved as well: 10.30.3.234:7002 switched from being a slave of 10.30.3.233:7001 to being a slave of 10.30.3.234:7001
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
10.30.3.231:7001 (6c15dd7b...) -> 4912 keys | 5461 slots | 1 slaves.
10.30.3.232:7001 (eef25676...) -> 5046 keys | 5462 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 4896 keys | 5361 slots | 1 slaves.
10.30.3.234:7001 (a3734094...) -> 88 keys | 100 slots | 1 slaves.               #.note: 88 keys, 100 slots and 1 slave here

[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690445122000 1 connected 0-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690445123000 2 connected 5461-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690445124000 3 connected 11023-16383        #.the source master lost 100 slots
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690445125521 8 connected 10923-11022        #.the new master gained 100 slots
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690445125000 3 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690445123000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690445126525 2 connected
1d49af7153442e8ecd0138d9225ce57984ef923c 10.30.3.234:7002@17002 slave a373409411ac0e11f37bae7c2b97c78c9f6e9897 0 1690445124000 8 connected      #.switched from slave of 10.30.3.233:7001 to slave of 10.30.3.234:7001

rebalance: balancing slots automatically

  • Moving slots with reshard is an easy way to produce an uneven slot distribution; run --cluster check ip:port to confirm each master's slot count:
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
10.30.3.231:7001 (6c15dd7b...) -> 4912 keys | 5461 slots | 1 slaves.
10.30.3.232:7001 (eef25676...) -> 5046 keys | 5462 slots | 1 slaves.        #.this node has too many slots
10.30.3.233:7001 (c274c1ef...) -> 4896 keys | 5361 slots | 1 slaves.
10.30.3.234:7001 (a3734094...) -> 88 keys | 100 slots | 1 slaves.           #.this node has too few slots
  • When the slots are unevenly distributed and fixing them one reshard at a time is too tedious, use --cluster rebalance ip:port (ip:port can be any master or slave in the cluster) to even out the slots across all nodes automatically (a weighted variant is sketched after the output below):
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster rebalance --cluster-threshold 1 10.30.3.231:7001

>>> Performing Cluster Check (using node 10.30.3.231:7001)
......
>>> Rebalancing across 4 nodes. Total weight = 4.00
Moving 1366 slots from 10.30.3.232:7001 to 10.30.3.234:7001
......
Moving 1365 slots from 10.30.3.231:7001 to 10.30.3.234:7001
......
Moving 1265 slots from 10.30.3.233:7001 to 10.30.3.234:7001
......
  • The effect of the rebalance:
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
10.30.3.231:7001 (6c15dd7b...) -> 3666 keys | 4096 slots | 1 slaves.
10.30.3.232:7001 (eef25676...) -> 3781 keys | 4096 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 3716 keys | 4096 slots | 1 slaves.
10.30.3.234:7001 (a3734094...) -> 3779 keys | 4096 slots | 1 slaves.
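  • By default rebalance gives every master the same weight (1.00). If some masters should end up with more or fewer slots, per-node weights can be passed with --cluster-weight node-id=weight; the sketch below is illustrative only (the weights are made up, and --cluster-simulate prints the plan without moving anything):
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster rebalance 10.30.3.231:7001 \
    --cluster-weight 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0=2 a373409411ac0e11f37bae7c2b97c78c9f6e9897=1 \
    --cluster-simulate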

Removing Nodes

1. Remove a slave node

  • Use --cluster del-node ip:port <node-id> to remove any node that owns no slots from the cluster (ip:port can be any master or slave in the cluster); slaves own no slots, so they can be removed directly:
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null | grep slave
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690447370000 3 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690447373000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690447374360 2 connected
1d49af7153442e8ecd0138d9225ce57984ef923c 10.30.3.234:7002@17002 slave a373409411ac0e11f37bae7c2b97c78c9f6e9897 0 1690447373000 8 connected      #.the example removes this one

[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster del-node 10.30.3.231:7001 1d49af7153442e8ecd0138d9225ce57984ef923c
>>> Removing node 1d49af7153442e8ecd0138d9225ce57984ef923c from cluster 10.30.3.231:7001
>>> Sending CLUSTER FORGET messages to the cluster...
>>> Sending CLUSTER RESET SOFT to the deleted node.

[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null | grep slave
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690447554000 3 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690447556000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690447555000 2 connected

2. Remove a master node that still owns slots

  • Running --cluster del-node ip:port <node-id> against a master that still owns slots fails with an error saying the node is not empty. The example tries to remove 10.30.3.234:7001:
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690447746000 1 connected 1365-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690447750000 2 connected 6827-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690447749000 3 connected 12288-16383
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690447749000 8 connected 0-1364 5461-6826 10923-12287      #.the example removes this one
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690449224999 9 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690449227006 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690449224598 2 connected

[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster del-node 10.30.3.231:7001 a373409411ac0e11f37bae7c2b97c78c9f6e9897
>>> Removing node a373409411ac0e11f37bae7c2b97c78c9f6e9897 from cluster 10.30.3.231:7001
[ERR] Node 10.30.3.234:7001 is not empty! Reshard data away and try again.
  • First, confirm how many slots the master to be removed owns; in the example 10.30.3.234:7001 owns 4096 slots:
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves

10.30.3.231:7001 (6c15dd7b...) -> 3666 keys | 4096 slots | 1 slaves.
10.30.3.232:7001 (eef25676...) -> 3781 keys | 4096 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 3716 keys | 4096 slots | 1 slaves.
10.30.3.234:7001 (a3734094...) -> 3779 keys | 4096 slots | 1 slaves.
  • Then run reshard to move all of this master's slots away:
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster reshard 10.30.3.231:7001

>>> Performing Cluster Check (using node 10.30.3.231:7001)
......
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 4096                  #.enter as many slots as the node to be removed owns; 4096 in this example
What is the receiving node ID? c274c1ef585ad0f9801e701d0b7fdc66956ae4c8     #.node ID that will receive the slots; here they go to 10.30.3.233:7001
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1: a373409411ac0e11f37bae7c2b97c78c9f6e9897                    #.the source master ID is the node to be removed, 10.30.3.234:7001 in this example
Source node #2: done                                                        #.there is no second source master, type done

Ready to move 4096 slots.
......
Do you want to proceed with the proposed reshard plan (yes/no)? yes         #.type yes
......
  • Confirm the cluster state: since the master to be removed no longer owns any slots, it has been demoted to a slave:
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
    10.30.3.231:7001 (6c15dd7b...) -> 3666 keys | 4096 slots | 1 slaves.
    10.30.3.232:7001 (eef25676...) -> 3781 keys | 4096 slots | 1 slaves.
    10.30.3.233:7001 (c274c1ef...) -> 7495 keys | 8192 slots | 2 slaves.

[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690449226000 1 connected 1365-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690449223995 2 connected 6827-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690449226002 9 connected 0-1364 5461-6826 10923-16383
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690449224999 9 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690449227006 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690449224598 2 connected
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690449225000 9 connected    #.demoted to a slave
  • Remove the node (10.30.3.234:7001 in the example) again; this time it completes successfully:
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster del-node 10.30.3.231:7001 a373409411ac0e11f37bae7c2b97c78c9f6e9897

>>> Removing node a373409411ac0e11f37bae7c2b97c78c9f6e9897 from cluster 10.30.3.231:7001
>>> Sending CLUSTER FORGET messages to the cluster...
>>> Sending CLUSTER RESET SOFT to the deleted node.

3. Re-add and then remove a master with no slots

  • Add 10.30.3.234:7001 back to the cluster as a master and confirm that the newly added master owns 0 slots:
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster add-node 10.30.3.234:7001 10.30.3.231:7001

[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null | grep master
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690449839000 1 connected 1365-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690449835000 2 connected 6827-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690449839971 9 connected 0-1364 5461-6826 10923-16383
a373409411ac0e11f37bae7c2b97c78c9f6e9897 10.30.3.234:7001@17001 master - 0 1690449837000 8 connected

[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
10.30.3.231:7001 (6c15dd7b...) -> 3666 keys | 4096 slots | 1 slaves.
10.30.3.232:7001 (eef25676...) -> 3781 keys | 4096 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 7495 keys | 8192 slots | 1 slaves.
10.30.3.234:7001 (a3734094...) -> 0 keys | 0 slots | 0 slaves.          #.note: 0 keys, 0 slots and 0 slaves here
  • Use --cluster del-node ip:port <node-id> to remove the node that owns no slots (10.30.3.234:7001 in the example):
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster del-node 10.30.3.231:7001 a373409411ac0e11f37bae7c2b97c78c9f6e9897

>>> Removing node a373409411ac0e11f37bae7c2b97c78c9f6e9897 from cluster 10.30.3.231:7001
>>> Sending CLUSTER FORGET messages to the cluster...
>>> Sending CLUSTER RESET SOFT to the deleted node.
  • Confirm the cluster state: the master with no slots (10.30.3.234:7001 in the example) has been removed:
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null

eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690450007485 2 connected 6827-10922
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690450005000 9 connected 0-1364 5461-6826 10923-16383
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690450006483 1 connected
6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690450006000 1 connected 1365-5460
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690450004479 9 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690450008488 2 connected

Failover

Manual failover

  • Before the failover: 10.30.3.233:7002 is a slave and 10.30.3.232:7001 is its master
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null

6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690459307000 1 connected 1365-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 master - 0 1690459306100 2 connected 6827-10922        #.this node is the master of the node to be failed over
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690459306000 9 connected 0-1364 5461-6826 10923-16383
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690459309108 9 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690459308106 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 slave eef25676cf77b01eb01e45a67139c25317b0443a 0 1690459307103 2 connected        #.failover will be run on this slave
  • Log in to a slave node and run cluster failover to switch it from slave to master. The example promotes 10.30.3.233:7002 from slave to master:
[root@redis01 ~]# echo "cluster failover" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.233 -p 7002 2>/dev/null
OK
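  • Run this way, cluster failover performs a coordinated switchover and requires the old master to be reachable. When the master is down, the command also accepts FORCE (skip coordination with the master) or TAKEOVER (additionally skip the cluster-wide agreement); use these with care. A sketch, reusing the same slave as above:
[root@redis01 ~]# echo "cluster failover force" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.233 -p 7002 2>/dev/null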
  • After the failover: 10.30.3.233:7002 changed from slave to master, and 10.30.3.232:7001 changed from master to slave
[root@redis01 ~]# echo "cluster nodes" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.231 -p 7001 2>/dev/null

6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 10.30.3.231:7001@17001 myself,master - 0 1690463969000 1 connected 1365-5460
eef25676cf77b01eb01e45a67139c25317b0443a 10.30.3.232:7001@17001 slave 0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 0 1690463972447 10 connected   #.master became a slave
c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 10.30.3.233:7001@17001 master - 0 1690463969000 9 connected 0-1364 5461-6826 10923-16383
a88289bd260ba913ef53a9cccf0f91d1f73f2083 10.30.3.231:7002@17002 slave c274c1ef585ad0f9801e701d0b7fdc66956ae4c8 0 1690463968436 9 connected
b013107a1e70fbb2dd50a25f97afe2421e066a80 10.30.3.232:7002@17002 slave 6c15dd7b8cf928b36671a10ae54bc50bd5ed01b0 0 1690463970000 1 connected
0cf02550cfd8093ce6569c1ec6558311ec2b7ae3 10.30.3.233:7002@17002 master - 0 1690463971444 10 connected 6827-10922   #.slave became a master
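  • To restore the original roles, the same cluster failover command can be run on 10.30.3.232:7001, which is now the slave; a sketch (not re-run here):
[root@redis01 ~]# echo "cluster failover" | /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I -c -h 10.30.3.232 -p 7001 2>/dev/null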

Problems Encountered

Scenario 1: rebalance has no effect right after adding a new master

  • Symptom: after adding a new master to the cluster (10.30.3.234:7001 in the example, with 0 slots), an immediate rebalance reports No rebalancing needed! All nodes are within the 1.00% threshold, and the empty master is not assigned any slots or keys
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster add-node 10.30.3.234:7001 10.30.3.232:7001

[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
10.30.3.231:7001 (6c15dd7b...) -> 4930 keys | 5461 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 4984 keys | 5461 slots | 1 slaves.
10.30.3.233:7002 (0cf02550...) -> 5028 keys | 5462 slots | 1 slaves.
10.30.3.234:7001 (a3734094...) -> 0 keys | 0 slots | 0 slaves.      #.the newly added master: 0 keys, 0 slots

[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster rebalance --cluster-threshold 1 10.30.3.231:7001
>>> Performing Cluster Check (using node 10.30.3.231:7001)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
*** No rebalancing needed! All nodes are within the 1.00% threshold.

[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves | grep "10.30.3.234:7001"
10.30.3.234:7001 (a3734094...) -> 0 keys | 0 slots | 0 slaves.      #.note: the rebalance had no effect
  • Fix: first use reshard to move 100 slots onto the new node, then use rebalance to even out the slots across the whole cluster (an alternative using --cluster-use-empty-masters is sketched at the end of this section)
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster reshard 10.30.3.231:7001
...(interactive session omitted)...

[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
10.30.3.231:7001 (6c15dd7b...) -> 4930 keys | 5461 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 4896 keys | 5361 slots | 1 slaves.
10.30.3.233:7002 (0cf02550...) -> 5028 keys | 5462 slots | 1 slaves.
10.30.3.234:7001 (a3734094...) -> 88 keys | 100 slots | 0 slaves.       #.after the reshard this node has 88 keys, 100 slots

[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster rebalance --cluster-threshold 1 10.30.3.231:7001
>>> Performing Cluster Check (using node 10.30.3.231:7001)
......
[OK] All 16384 slots covered.
>>> Rebalancing across 4 nodes. Total weight = 4.00
Moving 1366 slots from 10.30.3.233:7002 to 10.30.3.234:7001
Moving 1365 slots from 10.30.3.231:7001 to 10.30.3.234:7001
Moving 1265 slots from 10.30.3.233:7001 to 10.30.3.234:7001

[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster check 10.30.3.231:7001 2>/dev/null | grep slaves
10.30.3.231:7001 (6c15dd7b...) -> 3708 keys | 4096 slots | 1 slaves.
10.30.3.233:7001 (c274c1ef...) -> 3716 keys | 4096 slots | 1 slaves.
10.30.3.233:7002 (0cf02550...) -> 3781 keys | 4096 slots | 1 slaves.
10.30.3.234:7001 (a3734094...) -> 3737 keys | 4096 slots | 0 slaves.    #.after the rebalance this node has 3737 keys, 4096 slots
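  • Alternatively, rebalance can be told to include masters that own no slots by adding --cluster-use-empty-masters, which should avoid the preliminary reshard; a sketch (not re-tested in this environment):
[root@redis01 ~]# /usr/local/bin/redis-cli -a RbY9k2_NBf1QWy8I --cluster rebalance --cluster-use-empty-masters --cluster-threshold 1 10.30.3.231:7001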