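I. Overview

Redis 3.0 cluster support has been out for a while, and the latest stable release is currently 3.0.5. As far as I know, quite a few internet companies, such as Vipshop and Meituan, already run it in production. Our company happens to have a new project whose expected load is more than a single Redis instance can handle, and the developers did not want to shard the data in application code, so I recommended they try Redis Cluster. Below are the notes I took, kept for future reference.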

II. Environment

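1. Redis nodes
Three servers, 10.10.2.70, 10.10.2.71 and 10.10.2.85, each running two Redis instances on ports 6300 and 6301.

2. Redis version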

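III. Installation and configuration

1. Install Redis

2. Install Ruby and the redis gem for Ruby

3. Kernel tuning

4. Create directories

5. Write the Redis configuration files (when copying a config file for another instance, remember to change the port)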
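The article's own config file is not shown. The following is a minimal sketch of step 5 that generates the per-instance file from one template, so the copy-and-edit step cannot forget the port. It makes two assumptions. First, it assumes a hypothetical layout with one directory per port under `/data/redis`. Second, it uses stock default values (`cluster-node-timeout 15000`, `cluster-migration-barrier 1`), which may not match the author's settings. The directive names themselves are standard redis.conf options.

```python
from pathlib import Path

# Hypothetical base directory; replace with the directories created in step 4.
BASE_DIR = "/data/redis"

# Cluster-related directives only; merge with your usual redis.conf settings.
TEMPLATE = """\
port {port}
daemonize yes
dir {base}/{port}
pidfile {base}/{port}/redis.pid
logfile {base}/{port}/redis.log
appendonly yes
cluster-enabled yes
cluster-config-file nodes-{port}.conf
cluster-node-timeout {node_timeout}
cluster-migration-barrier {barrier}
"""


def render_config(port: int, base: str = BASE_DIR,
                  node_timeout: int = 15000, barrier: int = 1) -> str:
    """Return redis.conf text for one cluster instance; every path embeds the port."""
    return TEMPLATE.format(port=port, base=base,
                           node_timeout=node_timeout, barrier=barrier)


def write_configs(ports, base: str = BASE_DIR) -> list:
    """Write <base>/<port>/redis.conf for each port and return the written paths."""
    paths = []
    for port in ports:
        target = Path(base) / str(port) / "redis.conf"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(render_config(port, base))
        paths.append(target)
    return paths


if __name__ == "__main__":
    # Each server in this article runs 6300 (master) and 6301 (slave).
    print(render_config(6300))
```

Giving every instance its own `dir` and `cluster-config-file` matters because each node rewrites its nodes file on its own. Two instances sharing one file would corrupt each other's cluster state.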

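6. Start the services

7. Initialize the cluster

Node roles are determined by order: masters are listed first, then slaves. In this article the 6300 instances are masters and the 6301 instances are slaves. --replicas 1 gives each master one slave.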

redis-trib.rb create --replicas 1 10.10.2.70:6300 10.10.2.71:6300 10.10.2.85:6300 10.10.2.70:6301 10.10.2.71:6301 10.10.2.85:6301
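The check output later in this article shows the three original masters holding 0-5460, 5461-10922 and 10923-16383. Below is a sketch of the arithmetic behind that, under the ordering rule stated above:
- the first len(nodes) / (replicas + 1) addresses become masters;
- the 16384 slots are split into nearly equal contiguous ranges.

The rounding scheme is an assumption that happens to reproduce those ranges. Two things are left to redis-trib and not modelled here: which master receives which range, and how slaves are paired with masters.

```python
import math

CLUSTER_HASH_SLOTS = 16384


def split_roles(nodes, replicas):
    """First len(nodes) // (replicas + 1) addresses become masters, the rest slaves."""
    num_masters = len(nodes) // (replicas + 1)
    return nodes[:num_masters], nodes[num_masters:]


def split_slots(num_masters):
    """Divide the 16384 hash slots into contiguous, nearly equal ranges."""
    per_node = CLUSTER_HASH_SLOTS / num_masters
    ranges, first, cursor = [], 0, 0.0
    for i in range(num_masters):
        last = math.floor(cursor + per_node - 1 + 0.5)  # round half up
        if i == num_masters - 1:
            last = CLUSTER_HASH_SLOTS - 1  # the last range absorbs any remainder
        ranges.append((first, last))
        first = last + 1
        cursor += per_node
    return ranges


# Same argument order as the create command above.
nodes = ["10.10.2.70:6300", "10.10.2.71:6300", "10.10.2.85:6300",
         "10.10.2.70:6301", "10.10.2.71:6301", "10.10.2.85:6301"]
masters, slaves = split_roles(nodes, replicas=1)
print("masters:", masters)
print("slaves: ", slaves)
print("slot ranges:", split_slots(len(masters)))
```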

8. Check the cluster status
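The commands for this step are not shown. Two ways to read the cluster status are `redis-trib.rb check 10.10.2.70:6300` and `redis-cli -h 10.10.2.70 -p 6300 cluster info`. The sketch below turns the `cluster info` text into a pass/fail check. It assumes the field names CLUSTER INFO reports (`cluster_state`, `cluster_slots_assigned`, `cluster_slots_fail`, ...). The "healthy" rule is our own illustrative choice, not something Redis defines.

```python
CLUSTER_HASH_SLOTS = 16384


def parse_cluster_info(text: str) -> dict:
    """Parse CLUSTER INFO's 'field:value' lines; numeric values become ints."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if ":" not in line:
            continue  # skip blank or unexpected lines
        key, value = line.split(":", 1)
        info[key] = int(value) if value.isdigit() else value
    return info


def cluster_is_healthy(info: dict) -> bool:
    """Illustrative rule: state ok, every slot assigned, no slot in FAIL state."""
    return (info.get("cluster_state") == "ok"
            and info.get("cluster_slots_assigned") == CLUSTER_HASH_SLOTS
            and info.get("cluster_slots_fail", 0) == 0)
```

To use it, feed the text printed by `redis-cli ... cluster info` to `parse_cluster_info` and pass the result to `cluster_is_healthy`.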

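PS:
redis-trib.rb is a Ruby tool that wraps a set of Redis cluster commands and makes operating the cluster very convenient. Besides initializing the cluster and checking its status as above, it can add and remove nodes, migrate slots, and more.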
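redis-trib's check step ends with `Check slots coverage...` and `[OK] All 16384 slots covered.` in the logs that follow. The idea behind that check can be sketched as follows: every slot must have exactly one owning master. This is only an illustration of the rule, not redis-trib's code. The sample ranges are the three masters from the Scenario 1 check output.

```python
CLUSTER_HASH_SLOTS = 16384


def check_coverage(assignments):
    """assignments maps a master's address to its inclusive (first, last) slot ranges.
    Returns (slots with no owner, slots claimed by more than one master)."""
    owner = [None] * CLUSTER_HASH_SLOTS
    conflicts = set()
    for node, ranges in assignments.items():
        for first, last in ranges:
            for slot in range(first, last + 1):
                if owner[slot] is not None and owner[slot] != node:
                    conflicts.add(slot)
                owner[slot] = node
    missing = [slot for slot, who in enumerate(owner) if who is None]
    return missing, sorted(conflicts)


# Master slot ranges as printed by the check output in Scenario 1.
masters = {
    "10.10.2.85:6300": [(0, 5460)],
    "10.10.2.71:6300": [(5461, 10922)],
    "10.10.2.70:6301": [(10923, 16383)],
}
missing, conflicts = check_coverage(masters)
print(f"unassigned: {len(missing)}, conflicting: {len(conflicts)}")
```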

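IV. Redis cluster maintenance

A. Scenario 1
The production cluster has hit a bottleneck and needs to be scaled out. Suppose we have already prepared one master and one slave (10.10.2.85:6302 and 10.10.2.85:6303). The steps are as follows: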

1. Add a master node
redis-trib.rb add-node 10.10.2.85:6302 10.10.2.70:6300

10.10.2.85:6302 is the new node to add, and 10.10.2.70:6300 is any node already in the cluster.

2. Add a slave to the new master
[root@yw_0_0 ~]# redis-trib.rb add-node --slave --master-id 5ef18f95f75756891aa948ea1f200044f1d3947c 10.10.2.85:6303 10.10.2.70:6300

Adding node 10.10.2.85:6303 to cluster 10.10.2.70:6300

Connecting to node 10.10.2.70:6300: OK
Connecting to node 10.10.2.85:6300: OK
Connecting to node 10.10.2.85:6302: OK
Connecting to node 10.10.2.85:6301: OK
Connecting to node 10.10.2.71:6300: OK
Connecting to node 10.10.2.70:6301: OK
Connecting to node 10.10.2.71:6301: OK

Performing Cluster Check (using node 10.10.2.70:6300)

S: cd1f2c1f348bb4359337e7462c1e21dc82f1551b 10.10.2.70:6300
slots: (0 slots) slave
replicates 85412cf3d8e69354115fc0991f470b32b9213cd7
M: 6bea6afa2ee8dfb0cc3c96f804eb3fa77ce98013 10.10.2.85:6300
slots:0-5460 (5461 slots) master
1 additional replica(s)
M: 5ef18f95f75756891aa948ea1f200044f1d3947c 10.10.2.85:6302
slots: (0 slots) master
0 additional replica(s)
S: a74642c0fbc98f921be477eabcdd22eccd89891f 10.10.2.85:6301
slots: (0 slots) slave
replicates 2568dbd91fffa16ff93ea8db19275fd7ec8af41a
M: 2568dbd91fffa16ff93ea8db19275fd7ec8af41a 10.10.2.71:6300
slots:5461-10922 (5462 slots) master
1 additional replica(s)
M: 85412cf3d8e69354115fc0991f470b32b9213cd7 10.10.2.70:6301
slots:10923-16383 (5461 slots) master
1 additional replica(s)
S: 22d2dec483824b84571a60e8c037fff957615552 10.10.2.71:6301
slots: (0 slots) slave
replicates 6bea6afa2ee8dfb0cc3c96f804eb3fa77ce98013
[OK] All nodes agree about slots configuration.

Check for open slots...
Check slots coverage...

[OK] All 16384 slots covered.
Connecting to node 10.10.2.85:6303: OK

Send CLUSTER MEET to node 10.10.2.85:6303 to make it join the cluster.

Waiting for the cluster to join.

Configure node as replica of 10.10.2.85:6302.

[OK] New node added correctly.

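--slave specifies that the node being added is a slave. --master-id gives the ID of its master; 5ef18f95f75756891aa948ea1f200044f1d3947c is the ID of 10.10.2.85:6302, as the check output shows. 10.10.2.85:6303 is the new slave, and 10.10.2.70:6300 is any existing node in the cluster.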

3. Migrate some slots to the new node
[root@yw_0_0 ~]# redis-trib.rb reshard 10.10.2.70:6300
Connecting to node 10.10.2.70:6300: OK
Connecting to node 10.10.2.85:6300: OK
Connecting to node 10.10.2.85:6303: OK
Connecting to node 10.10.2.85:6302: OK
Connecting to node 10.10.2.85:6301: OK
Connecting to node 10.10.2.71:6300: OK
Connecting to node 10.10.2.70:6301: OK
Connecting to node 10.10.2.71:6301: OK

Performing Cluster Check (using node 10.10.2.70:6300)

S: cd1f2c1f348bb4359337e7462c1e21dc82f1551b 10.10.2.70:6300
slots: (0 slots) slave
replicates 85412cf3d8e69354115fc0991f470b32b9213cd7
M: 6bea6afa2ee8dfb0cc3c96f804eb3fa77ce98013 10.10.2.85:6300
slots:0-5460 (5461 slots) master
1 additional replica(s)
S: fc90d090fae909fd4f962752941c039d081d3854 10.10.2.85:6303
slots: (0 slots) slave
replicates 5ef18f95f75756891aa948ea1f200044f1d3947c
M: 5ef18f95f75756891aa948ea1f200044f1d3947c 10.10.2.85:6302
slots: (0 slots) master
1 additional replica(s)
S: a74642c0fbc98f921be477eabcdd22eccd89891f 10.10.2.85:6301
slots: (0 slots) slave
replicates 2568dbd91fffa16ff93ea8db19275fd7ec8af41a
M: 2568dbd91fffa16ff93ea8db19275fd7ec8af41a 10.10.2.71:6300
slots:5461-10922 (5462 slots) master
1 additional replica(s)
M: 85412cf3d8e69354115fc0991f470b32b9213cd7 10.10.2.70:6301
slots:10923-16383 (5461 slots) master
1 additional replica(s)
S: 22d2dec483824b84571a60e8c037fff957615552 10.10.2.71:6301
slots: (0 slots) slave
replicates 6bea6afa2ee8dfb0cc3c96f804eb3fa77ce98013
[OK] All nodes agree about slots configuration.

Check for open slots...
Check slots coverage...

[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 3000 # move 3000 slots
What is the receiving node ID? 5ef18f95f75756891aa948ea1f200044f1d3947c # ID of the node receiving these 3000 slots, i.e. the newly added 10.10.2.85:6302
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1:85412cf3d8e69354115fc0991f470b32b9213cd7 # source node IDs for the 3000 slots; here I take a share from each of the cluster's 3 original masters
Source node #2:6bea6afa2ee8dfb0cc3c96f804eb3fa77ce98013
Source node #3:2568dbd91fffa16ff93ea8db19275fd7ec8af41a
Source node #4:done # enter done to start the preparation steps
(output omitted)
Do you want to proceed with the proposed reshard plan (yes)? yes # enter yes to confirm and start migrating the slots
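The Scenario 2 check output shows 10.10.2.85:6302 ending up with 0-998, 5461-6461 and 10923-11921, which is 2999 slots rather than 3000. The sketch below explains this under an assumed splitting rule:
- each source gives up a share proportional to the slots it owns;
- the largest source rounds its share up and the others round down;
- the lowest-numbered slots go first.

This assumption reproduces those exact numbers. It is a reconstruction, not a documented redis-trib behaviour.

```python
import math


def reshard_plan(sources, numslots):
    """Assumed model: split numslots among source masters in proportion to their slots.

    sources maps a source master to the sorted list of slots it owns.
    """
    ordered = sorted(sources.items(), key=lambda kv: len(kv[1]), reverse=True)
    total = sum(len(slots) for _, slots in ordered)
    plan, moved = {}, 0
    for i, (node, slots) in enumerate(ordered):
        share = numslots / total * len(slots)
        n = math.ceil(share) if i == 0 else math.floor(share)  # largest source rounds up
        taken = slots[:min(n, numslots - moved)]  # lowest-numbered slots first
        plan[node] = taken
        moved += len(taken)
    return plan


# Masters and their slots before the reshard (from the check output above).
sources = {
    "10.10.2.85:6300": list(range(0, 5461)),
    "10.10.2.71:6300": list(range(5461, 10923)),
    "10.10.2.70:6301": list(range(10923, 16384)),
}
plan = reshard_plan(sources, 3000)
for node, slots in plan.items():
    print(f"{node}: {len(slots)} slots, {slots[0]}-{slots[-1]}")
print("total:", sum(len(s) for s in plan.values()))
```

Here is how the rounding plays out:
- 3000 / 16384 × 5462 ≈ 1000.12 rounds up to 1001.
- 3000 / 16384 × 5461 ≈ 999.94 rounds down to 999, for each of the other two sources.

That gives 1001 + 999 + 999 = 2999.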

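B. Scenario 2

The example above scaled the cluster out. Conversely, the cluster may also need to be scaled in for various reasons. The following example takes the nodes added above offline; the steps are:

1. Migrate this node's slots to other nodes (a node that still holds slots cannot be taken offline directly)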
[root@yw_0_0 ~]# redis-trib.rb reshard 10.10.2.70:6300
Connecting to node 10.10.2.70:6300: OK
Connecting to node 10.10.2.85:6300: OK
Connecting to node 10.10.2.85:6303: OK
Connecting to node 10.10.2.85:6302: OK
Connecting to node 10.10.2.85:6301: OK
Connecting to node 10.10.2.71:6300: OK
Connecting to node 10.10.2.70:6301: OK
Connecting to node 10.10.2.71:6301: OK

Performing Cluster Check (using node 10.10.2.70:6300)

S: cd1f2c1f348bb4359337e7462c1e21dc82f1551b 10.10.2.70:6300
slots: (0 slots) slave
replicates 85412cf3d8e69354115fc0991f470b32b9213cd7
M: 6bea6afa2ee8dfb0cc3c96f804eb3fa77ce98013 10.10.2.85:6300
slots:999-5460 (4462 slots) master
1 additional replica(s)
S: fc90d090fae909fd4f962752941c039d081d3854 10.10.2.85:6303
slots: (0 slots) slave
replicates 5ef18f95f75756891aa948ea1f200044f1d3947c
M: 5ef18f95f75756891aa948ea1f200044f1d3947c 10.10.2.85:6302
slots:0-998,5461-6461,10923-11921 (2999 slots) master
1 additional replica(s)
S: a74642c0fbc98f921be477eabcdd22eccd89891f 10.10.2.85:6301
slots: (0 slots) slave
replicates 2568dbd91fffa16ff93ea8db19275fd7ec8af41a
M: 2568dbd91fffa16ff93ea8db19275fd7ec8af41a 10.10.2.71:6300
slots:6462-10922 (4461 slots) master
1 additional replica(s)
M: 85412cf3d8e69354115fc0991f470b32b9213cd7 10.10.2.70:6301
slots:11922-16383 (4462 slots) master
1 additional replica(s)
S: 22d2dec483824b84571a60e8c037fff957615552 10.10.2.71:6301
slots: (0 slots) slave
replicates 6bea6afa2ee8dfb0cc3c96f804eb3fa77ce98013
[OK] All nodes agree about slots configuration.

Check for open slots...
Check slots coverage...

[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 3000 # 3000 slots were moved into this node above, so move 3000 out again (the check above shows it actually holds 2999)
What is the receiving node ID? 85412cf3d8e69354115fc0991f470b32b9213cd7 # ID of the master that receives these slots
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1:5ef18f95f75756891aa948ea1f200044f1d3947c # ID of the master being taken offline
Source node #4:done
(output omitted)
Do you want to proceed with the proposed reshard plan (yes)?yes

2. Then check that the master 10.10.2.85:6302 no longer holds any slots
10.10.2.71:6300> cluster nodes
85412cf3d8e69354115fc0991f470b32b9213cd7 10.10.2.70:6301 master - 0 1445853133399 12 connected 0-999 6462-7460 10923-16383
22d2dec483824b84571a60e8c037fff957615552 10.10.2.71:6301 slave 6bea6afa2ee8dfb0cc3c96f804eb3fa77ce98013 0 1445853132898 10 connected
6bea6afa2ee8dfb0cc3c96f804eb3fa77ce98013 10.10.2.85:6300 master - 0 1445853134400 10 connected 1000-5461
2568dbd91fffa16ff93ea8db19275fd7ec8af41a 10.10.2.71:6300 myself,master - 0 0 11 connected 5462-6461 7461-10922
cd1f2c1f348bb4359337e7462c1e21dc82f1551b 10.10.2.70:6300 slave 85412cf3d8e69354115fc0991f470b32b9213cd7 0 1445853131395 12 connected
fc90d090fae909fd4f962752941c039d081d3854 10.10.2.85:6303 slave 5ef18f95f75756891aa948ea1f200044f1d3947c 0 1445853133899 8 connected
a74642c0fbc98f921be477eabcdd22eccd89891f 10.10.2.85:6301 slave 2568dbd91fffa16ff93ea8db19275fd7ec8af41a 0 1445853129394 11 connected
5ef18f95f75756891aa948ea1f200044f1d3947c 10.10.2.85:6302 master - 0 1445853132397 8 connected
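In this step, the cluster nodes output is checked by eye to confirm the master owns no slots before running del-node. A small sketch can do the same check. It assumes the field layout documented for CLUSTER NODES: node ID, address, flags, master ID, ping-sent, pong-recv, config epoch, link state, then slots. The sample is copied from the output above. "Empty" here refers only to slot ownership.

```python
def parse_cluster_nodes(text):
    """Parse CLUSTER NODES output into one dict per node."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue
        slots = 0
        for item in fields[8:]:
            if item.startswith("["):
                continue  # [slot->-id] / [slot-<-id]: migration in progress, not owned
            first, _, last = item.partition("-")  # "0-999" or a single slot "42"
            slots += int(last or first) - int(first) + 1
        nodes.append({
            "id": fields[0],
            "addr": fields[1].split("@")[0],  # newer Redis versions append @cport
            "flags": fields[2].split(","),
            "master": None if fields[3] == "-" else fields[3],
            "slots": slots,
        })
    return nodes


def empty_masters(nodes):
    """Addresses of masters that own no slots."""
    return [n["addr"] for n in nodes if "master" in n["flags"] and n["slots"] == 0]


sample = """\
85412cf3d8e69354115fc0991f470b32b9213cd7 10.10.2.70:6301 master - 0 1445853133399 12 connected 0-999 6462-7460 10923-16383
6bea6afa2ee8dfb0cc3c96f804eb3fa77ce98013 10.10.2.85:6300 master - 0 1445853134400 10 connected 1000-5461
2568dbd91fffa16ff93ea8db19275fd7ec8af41a 10.10.2.71:6300 myself,master - 0 0 11 connected 5462-6461 7461-10922
5ef18f95f75756891aa948ea1f200044f1d3947c 10.10.2.85:6302 master - 0 1445853132397 8 connected
fc90d090fae909fd4f962752941c039d081d3854 10.10.2.85:6303 slave 5ef18f95f75756891aa948ea1f200044f1d3947c 0 1445853133899 8 connected
"""
print(empty_masters(parse_cluster_nodes(sample)))
```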

3. Take the slave node offline
[root@yw_0_0 ~]# redis-trib.rb del-node 10.10.2.85:6303 fc90d090fae909fd4f962752941c039d081d3854

Removing node fc90d090fae909fd4f962752941c039d081d3854 from cluster 10.10.2.85:6303

Connecting to node 10.10.2.85:6303: OK
Connecting to node 10.10.2.85:6301: OK
Connecting to node 10.10.2.85:6302: OK
Connecting to node 10.10.2.85:6300: OK
Connecting to node 10.10.2.70:6300: OK
Connecting to node 10.10.2.71:6301: OK
Connecting to node 10.10.2.70:6301: OK
Connecting to node 10.10.2.71:6300: OK

Sending CLUSTER FORGET messages to the cluster...
SHUTDOWN the node.

4. Take the master node offline
redis-trib.rb del-node 10.10.2.85:6302 5ef18f95f75756891aa948ea1f200044f1d3947c

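C. Scenario 3
Suppose a master in the cluster fails and its slave is promoted to master. If the new master also fails before we have had time to add a slave for it, that shard of the cluster becomes completely unavailable. To guard against this, we could make sure every master has at least two slaves, but that doubles the memory or server resources required. Is there a middle ground? Yes. Remember the cluster-migration-barrier parameter in the configuration file above? We only need to attach several slaves to one of the masters. When another master is left without a working slave, the master with several slaves hands one of them over, which keeps the whole cluster available.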
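The hand-over rule can be modelled in a simplified way, following the replica-migration behaviour described in the Redis Cluster specification:
- a master with no working replica left is an "orphan";
- a replica may move to an orphan only if its own master has more than cluster-migration-barrier working replicas;
- among the replicas of the master(s) with the most replicas, the one with the smallest node ID moves.

Failure detection, delays and epochs are ignored. Node IDs are truncated from the outputs below. With them, the model predicts the result shown in step 3: 10.10.2.85:6302 (ID 2b34532c…) moves to 10.10.2.71:6300.

```python
def pick_migration(masters, barrier=1):
    """Simplified replica-migration model.

    masters maps each slot-owning master's node ID to its working replicas' IDs.
    Returns (replica_id, orphan_master_id), or None if nothing should move.
    """
    orphans = sorted(m for m, replicas in masters.items() if not replicas)
    if not orphans:
        return None  # every master still has a working replica
    most = max(len(replicas) for replicas in masters.values())
    if most <= barrier:
        return None  # donating would leave the donor below the migration barrier
    # Candidates are replicas of the master(s) with the most working replicas;
    # the one with the smallest node ID migrates.
    candidates = [r for replicas in masters.values() if len(replicas) == most
                  for r in replicas]
    return min(candidates), orphans[0]


# State in this scenario after 10.10.2.71:6301 is shut down (IDs truncated).
masters = {
    "cd1f2c1f": ["85412cf3", "2b34532c"],  # 10.10.2.70:6300 <- 10.10.2.70:6301, 10.10.2.85:6302
    "e36cdef7": ["89fcc499"],              # 10.10.2.85:6300 <- 10.10.2.85:6301
    "2568dbd9": [],                        # 10.10.2.71:6300, its slave is down
}
print(pick_migration(masters))  # ('2b34532c', '2568dbd9')
```

With `cluster-migration-barrier 2`, the same state would not trigger a migration, because 10.10.2.70:6300 would be left with fewer than two other replicas.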

1. Add a slave, 10.10.2.85:6302, to the 10.10.2.70:6300 / 10.10.2.70:6301 group (the --master-id is that of 10.10.2.70:6300, currently the group's master)
[root@yw_0_0 ~]# redis-trib.rb add-node --slave --master-id cd1f2c1f348bb4359337e7462c1e21dc82f1551b 10.10.2.85:6302 10.10.2.70:6300

Adding node 10.10.2.85:6302 to cluster 10.10.2.70:6300

Connecting to node 10.10.2.70:6300: OK
Connecting to node 10.10.2.85:6300: OK
Connecting to node 10.10.2.71:6300: OK
Connecting to node 10.10.2.70:6301: OK
Connecting to node 10.10.2.85:6301: OK
Connecting to node 10.10.2.71:6301: OK

Performing Cluster Check (using node 10.10.2.70:6300)

M: cd1f2c1f348bb4359337e7462c1e21dc82f1551b 10.10.2.70:6300
slots:3000-5461,6462-7460,10923-16383 (8922 slots) master
1 additional replica(s)
M: e36cdef7a26ed59e8d9db2cf1dbc1997bfc9dfde 10.10.2.85:6300
slots:0-2999 (3000 slots) master
1 additional replica(s)
M: 2568dbd91fffa16ff93ea8db19275fd7ec8af41a 10.10.2.71:6300
slots:5462-6461,7461-10922 (4462 slots) master
1 additional replica(s)
S: 85412cf3d8e69354115fc0991f470b32b9213cd7 10.10.2.70:6301
slots: (0 slots) slave
replicates cd1f2c1f348bb4359337e7462c1e21dc82f1551b
S: 89fcc4994a99ed2fe9bbb908c58dfda2cf31e7d2 10.10.2.85:6301
slots: (0 slots) slave
replicates e36cdef7a26ed59e8d9db2cf1dbc1997bfc9dfde
S: 1f3ea36eacbe005a4b9ac52aeef6d83337dac051 10.10.2.71:6301
slots: (0 slots) slave
replicates 2568dbd91fffa16ff93ea8db19275fd7ec8af41a
[OK] All nodes agree about slots configuration.

Check for open slots...
Check slots coverage...

[OK] All 16384 slots covered.
Connecting to node 10.10.2.85:6302: OK

Send CLUSTER MEET to node 10.10.2.85:6302 to make it join the cluster.

Waiting for the cluster to join.

Configure node as replica of 10.10.2.70:6300.

[OK] New node added correctly.

2. Stop the slave of the 10.10.2.71:6300 / 10.10.2.71:6301 group
redis-cli -h 10.10.2.71 -p 6301 shutdown

3. Check whether 10.10.2.85:6302 has become a slave of 10.10.2.71:6300
10.10.2.71:6300> CLUSTER nodes
85412cf3d8e69354115fc0991f470b32b9213cd7 10.10.2.70:6301 slave cd1f2c1f348bb4359337e7462c1e21dc82f1551b 0 1445911596844 17 connected
89fcc4994a99ed2fe9bbb908c58dfda2cf31e7d2 10.10.2.85:6301 slave e36cdef7a26ed59e8d9db2cf1dbc1997bfc9dfde 0 1445911594841 20 connected
2568dbd91fffa16ff93ea8db19275fd7ec8af41a 10.10.2.71:6300 myself,master - 0 0 11 connected 5462-6461 7461-10922
cd1f2c1f348bb4359337e7462c1e21dc82f1551b 10.10.2.70:6300 master - 0 1445911593839 17 connected 3000-5461 6462-7460 10923-16383
2b34532cd6937063d1da26cd4652881b73d97a06 10.10.2.85:6302 slave 2568dbd91fffa16ff93ea8db19275fd7ec8af41a 0 1445911592838 17 connected # now attached under 10.10.2.71:6300
1f3ea36eacbe005a4b9ac52aeef6d83337dac051 10.10.2.71:6301 slave,fail 2568dbd91fffa16ff93ea8db19275fd7ec8af41a 1445911561982 1445911559778 11 disconnected
e36cdef7a26ed59e8d9db2cf1dbc1997bfc9dfde 10.10.2.85:6300 master - 0 1445911595843 20 connected 0-2999

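V. Cluster-related commands

Cluster
CLUSTER INFO: print information about the cluster.
CLUSTER NODES: list all nodes currently known to the cluster, together with information about each node.
Nodes
CLUSTER MEET <ip> <port>: add the node at ip and port to the cluster, making it a member of the cluster.
CLUSTER FORGET <node_id>: remove the node specified by node_id from the cluster.
CLUSTER REPLICATE <node_id>: make the current node a slave of the node specified by node_id.
CLUSTER SAVECONFIG: save the node's cluster configuration file to disk.
Slots
CLUSTER ADDSLOTS <slot> [slot ...]: assign one or more slots to the current node.
CLUSTER DELSLOTS <slot> [slot ...]: remove the assignment of one or more slots from the current node.
CLUSTER FLUSHSLOTS: remove all slots assigned to the current node, leaving it with no slots assigned.
CLUSTER SETSLOT <slot> NODE <node_id>: assign slot to the node specified by node_id; if the slot is already assigned to another node, that node drops the slot first and then the assignment is made.
CLUSTER SETSLOT <slot> MIGRATING <node_id>: mark slot on this node as migrating to the node specified by node_id.
CLUSTER SETSLOT <slot> IMPORTING <node_id>: mark slot on this node as being imported from the node specified by node_id.
CLUSTER SETSLOT <slot> STABLE: clear the importing or migrating state of slot.
Keys
CLUSTER KEYSLOT <key>: compute which slot key belongs in.
CLUSTER COUNTKEYSINSLOT <slot>: return the number of key-value pairs currently in slot.
CLUSTER GETKEYSINSLOT <slot> <count>: return up to count keys from slot.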
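CLUSTER KEYSLOT maps a key to a slot. Per the Redis Cluster specification, the mapping takes the CRC16 (XMODEM variant) of the key modulo 16384. If the key contains a non-empty {hash tag}, only the tag is hashed, so related keys can be forced into the same slot. The Python sketch below computes this offline, which helps predict which master a key will land on without querying the cluster.

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Slot for a key: CRC16 of the key (or of its {hash tag}) modulo 16384."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag: hash only the part inside the braces
            raw = raw[start + 1:end]
    return crc16(raw) % 16384


print(key_hash_slot("foo"))  # 12182
print(key_hash_slot("user:{1000}:name") == key_hash_slot("user:{1000}:email"))  # True
```

An empty tag such as `foo{}bar` does not count as a tag, so the whole key is hashed.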
