V 5 RHCS

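I. Concepts:

RHCS (Red Hat Cluster Suite) is primarily a high-availability cluster suite, but it also bundles a load-balancing solution, LVS, together with piranha, a GUI configuration tool for the LVS front end that lets LVS run in a highly available setup. RHCS comprises LVS, HA, CFS (cluster file systems such as gfs2 and OCFS2), and cLVM (cluster logical volume manager).

How a high-availability cluster works:

The cluster has to know how many nodes exist and whether they can reach each other (heartbeat messages), maintain membership, and check whether the current partition holds quorum. All of this depends on the cluster infrastructure layer (also called the message layer).

The CRM (cluster resource manager) handles resource configuration (adding, deleting, migrating, and so on) and starts and stops services through RAs (classes: LSB, legacy, OCF, stonith). Implementations: heartbeatV1 (haresources), heartbeatV2 (haresources, crm), heartbeatV3 (pacemaker), RHCS (rgmanager).

Note: OCF stands for Open Cluster Framework.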

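Note: for pacemaker, RHEL 6.4 provides pcs, a powerful cluster resource management tool with both a full CLI and a GUI. The CLI tools are crm and pcs; the GUI tools are pygui, hawk, lcmc, and pcs.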

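Note: rgmanager is the resource group manager. The layers and components RHCS uses to provide high availability are conceptually much the same as in corosync and heartbeat.

In the early RHEL 4 releases, the RHCS infrastructure layer was called cman (cluster manager).

RHCS resource agents (RA) come in two kinds: internal, and script (similar to LSB).

RHEL 4: cman is a standalone component and a complete infrastructure layer; rgmanager is a complete CRM with its own RA mechanism.

RHEL 5: cman builds on openais. openais handles heartbeat delivery and membership management, while cman runs as an openais plugin (module) that keeps only its voting system (quorum). cman is therefore no longer a complete infrastructure layer, and most of its work is really done by openais. Even so, cman remains in control: it drives the whole openais mechanism through /etc/cluster/cluster.conf. The resource manager is still rgmanager.

Note: configuration files: cman uses /etc/cluster/cluster.conf, corosync uses /etc/corosync/corosync.conf, and openais uses /etc/openais/openais.conf.

RHEL 6: the cman plugin still exists, but corosync is now the infrastructure layer. corosync has its own voting system, though it is weaker than cman's. With cman, things work as in RHEL 5: cman is in control, the configuration file is /etc/cluster/cluster.conf, and the resource manager is rgmanager. Without cman, corosync is configured through /etc/corosync/corosync.conf and the resource manager is pacemaker.

corosync 2.3.0 (code name needle) includes votequorum, a dedicated voting subsystem that grew out of cman and improves on it. From that point cman is retired and no longer in control; configuration is driven by corosync.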

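gfs (Global File System) depends on a high-availability cluster and lets several nodes use the same file system at once. A lock held by one node is announced to the other nodes, which relies on the cluster infrastructure layer for DLM and membership management. File systems such as ext{2,3,4} and vfat can be used by only one node at a time. Versions: gfs (RHEL 4), gfs and gfs2 (RHEL 5), gfs2 (RHEL 6).

Note: to let N nodes mount one file system at the same time, create N journals. (The main difference between ext2 and ext3 is journaling; a file system journal allows data to be repaired quickly.)

Note: the other GFS (Google File System) is a distributed file system for processing massive data sets and is a different concept from the cluster file system discussed here.

Note: a distributed file system such as HDFS (Hadoop) has two parts. The namenode is a front-end server that stores metadata (file sizes, storage paths, and so on) and must itself be made highly available; the datanodes are a pool of back-end servers that store the actual data. To store data, a client first contacts the namenode, which splits the request into blocks (for example 512 MB per block) and directs the client to the datanodes; the client then writes the blocks one after another across the back-end nodes. To read the data later, the client asks the namenode, which reports which back-end nodes hold it.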
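The block splitting described in the HDFS note comes down to a ceiling division. A minimal sketch, assuming sizes are given in whole megabytes and using the note's 512 MB example block size as the default; `block_count` is a hypothetical helper name, not an HDFS command:

```shell
# Number of blocks needed for a file, rounding the last partial block up.
# file_mb and block_mb are sizes in MB; 512 is the example value from the note.
block_count() {
    local file_mb=$1 block_mb=${2:-512}
    echo $(( (file_mb + block_mb - 1) / block_mb ))
}

block_count 1300    # 1300 MB does not fit in two 512 MB blocks, so this prints 3
```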

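Note: DLM (distributed lock manager) is a daemon that runs on every node (a clone-type resource). The DLM instances on the nodes communicate over TCP/IP; as soon as one node is found to have failed, the high-availability cluster fences it.

Note: the dual-master mode of drbd relies on the DLM of a CFS.

OCFS2 (Oracle Cluster File System) performs very poorly under heavy I/O and is less popular than gfs2; reportedly even Oracle recommends gfs2.

cLVM (cluster LVM) makes it easy to grow storage. It uses the high-availability layer to notify the other nodes of LVM operations performed on one node. All the usual LVM commands work; enabling clustering on top of LVM turns cLVM into a distributed LVM inside the high-availability cluster.

In /etc/lvm/lvm.conf set locking_type = 3 (3 is built-in cluster locking; 1 is local file-based locking, the default; 2 uses an external shared locking library).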
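The locking_type change can be scripted instead of edited by hand. The sketch below works on a scratch copy so that it never touches the real /etc/lvm/lvm.conf; `set_locking_type` is a hypothetical helper, and `sed -i -E` assumes GNU sed.

```shell
# Scratch file standing in for /etc/lvm/lvm.conf (assumption: one locking_type line).
conf=$(mktemp)
printf '    locking_type = 1\n' > "$conf"

# Rewrite the value of locking_type in file $1 to $2, keeping the indentation.
set_locking_type() {
    sed -i -E "s/^([[:space:]]*locking_type[[:space:]]*=).*/\1 $2/" "$1"
}

set_locking_type "$conf" 3    # 3 = built-in cluster locking
grep locking_type "$conf"
```

On a real node the same function would be pointed at /etc/lvm/lvm.conf after taking a backup.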

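The ccs service in RHCS (cluster configuration system) is dedicated to managing the cluster configuration file. Once cman is started on every node (cluster.conf must state which nodes exist), ccsd runs automatically. After that, edit the configuration file on one node only; ccsd notices the change and syncs it to the other nodes over the message layer, so there is no need to copy it with scp as with corosync.

In RHCS on RHEL 4, cman runs in the kernel, which is complex and hard to manage.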

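RHCS in RHEL 5:

cman relies on openais, which works in user space. The core openais process is aisexec (a corosync configuration file can also define that it be started); it handles heartbeat delivery, membership management, and so on, while cman provides the voting system

groupd (rgmanager, the resource manager daemon)

dlm_controld (the controller: manages the locks themselves, acquiring and releasing them)

lock_dlmd (the manager: tells the other nodes which locks a node holds)

app<-->libdlm (applications lock the file system through the DLM by way of this library)

gfs and dlm sit in kernel space; most of the functionality is implemented in user space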

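RHCS in RHEL 6:

cman relies on corosync; the resource manager is rgmanager

dlm_controld (OCFS also uses dlm_controld; the work of lock_dlmd has moved into dlm_controld)

gfs_controld (a control tool built and optimized specifically for gfs)

Note: dlm_controld and gfs_controld look like two independent subsystems, but the cluster works without gfs_controld, which exists only for gfs. If no CFS is used, neither process needs to be started; cman and rgmanager are enough.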

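Failover domains in RHCS (a failover domain limits which nodes a service may move to):

A failover domain is tied to a service. The nodes in a failover domain can be ordered to set the failover priority.

Defining a service includes defining the action to take on failure (if a restart does not succeed, relocate the service). If a node fails and there is no eligible target node, or you do not want the service to move, the service can be left stopped.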
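The ordering idea can be illustrated with a few lines of shell. This is only a sketch of the selection logic, not how rgmanager is implemented; the `domain` list, the `pick_target` name, and the convention that a lower number means a higher priority are assumptions made for the example.

```shell
# Each line: "<priority> <node>"; a lower number is assumed to mean higher priority.
domain='1 node1.magedu.com
2 node2.magedu.com
3 node3.magedu.com'

# Print the best-ranked member of the domain that is in the list of online nodes.
pick_target() {
    local online=" $* "
    printf '%s\n' "$domain" | sort -n | while read -r prio node; do
        case $online in
            *" $node "*) echo "$node"; break ;;
        esac
    done
}

# node1 has failed, so only node2 and node3 are passed in as online.
pick_target node2.magedu.com node3.magedu.com
```

If no member of the domain is online the function prints nothing, which corresponds to the case where the service stays stopped.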

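Note: heartbeat and corosync have the concepts of left symmetry and right symmetry (see part 2 of this cluster series, heartbeatV2).

Cluster management:

Software installation (application software or operating system software): cobbler (a Fedora project for large-scale network deployment and an important automation tool; earlier setups combined PXE+dhcp+tftp with kickstart), openQRM, spacewalk

Running commands: a for loop over ssh (for follows shell semantics and runs sequentially, not in parallel, so it is not efficient); fabric (ssh-based, written in Python, http://www.fabfile.org/)

Configuration file management: puppet (a heavyweight tool for software distribution, configuration file management, and more)

luci/ricci in RHCS: luci is the server side, a web interface for central management and distribution; ricci is the client side, an agent that receives instructions from the server side and executes them on the client. The RHEL 5 and RHEL 6 interfaces differ a lot, but the concepts are the same: installing software, starting and stopping services, querying cluster status, editing configuration files, and so on

Command line administration tools in RHCS: clustat, clusvcadm, ccs_tool, cman_tool, fence_tool

RHEL 5 also provides the tool system-config-cluster (it only needs to be installed on one node; after configuring on that node, click send to cluster to sync the configuration to the other nodes)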
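On the point that a plain for loop over ssh runs hosts one after another: a minimal sketch of the parallel alternative, using background jobs and `wait`. `run_parallel` and the `REMOTE_SHELL` override are hypothetical names introduced for illustration; the override exists only so the function can be dry-run with `echo` instead of `ssh`.

```shell
# Command used to reach a host; defaults to ssh, can be overridden for a dry run.
REMOTE_SHELL=${REMOTE_SHELL:-ssh}

# run_parallel CMD HOST... : run CMD on every HOST at the same time.
run_parallel() {
    local cmd=$1; shift
    local host
    for host in "$@"; do
        # Each remote call is sent to the background; its output is prefixed
        # with the host name so interleaved lines stay attributable.
        "$REMOTE_SHELL" "$host" "$cmd" 2>&1 | sed "s/^/[$host] /" &
    done
    wait    # return only after every background job has finished
}

# Dry run without touching any host:
#   REMOTE_SHELL=echo run_parallel 'uptime' node1 node2 node3
```

Sequential loops remain the right choice where ordering matters; this walkthrough itself notes that cman has to be started node by node.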

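Web architecture:

N-m (N nodes running m services, m<=N)

The director easily becomes a single point of failure and should be made highly available; the back-end real servers should not be (a health check on the director is enough)

All servers share one MySQL database

For uploaded files (attachments):

Option 1: designate one RS so that files can only be uploaded to it. The front-end director has to support this (use a layer-7 proxy such as haproxy or nginx; LVS cannot do it). Then sync to the other RSs with rsync+inotify. The designated RS will be very busy.

Option 2: shared storage (NAS, file level)

Option 3: shared storage (DAS, iscsi, block level). The back-end RSs must be built into a high-availability cluster, but only for the file system, not for httpd. The key is to use a CFS (cluster FS); gfs works best with up to 16 nodes and performs badly beyond that. Where reads and writes are very frequent a CFS is not ideal, and a distributed FS should be used instead.

Note: LB+HA: combine software as the business requires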

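II. Hands-on:

Environment: RHEL 5.8, 32-bit, kernel 2.6.18

Preparation: time synchronization; name resolution that matches the output of #uname -n or #hostname; a working yum repository on each node (set up as in part 2 of the cluster series, heartbeatV2)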
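The name-resolution requirement is easy to check before going further. A minimal sketch, assuming names are resolved through a hosts-format file; `name_in_hosts` is a hypothetical helper, and the sample line (including the short alias node1) is made up for the demonstration.

```shell
# Succeed if NAME appears as a host name or alias in a hosts-format file.
# Usage: name_in_hosts NAME [FILE]   (FILE defaults to /etc/hosts)
name_in_hosts() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }                       # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "${2:-/etc/hosts}"
}

# Demonstration on a scratch file rather than the real /etc/hosts.
hosts=$(mktemp)
printf '192.168.41.131 node1.magedu.com node1\n' > "$hosts"
name_in_hosts node1.magedu.com "$hosts" && echo "resolvable"
```

On a cluster node the check would be `name_in_hosts "$(uname -n)"`, repeated for the names of the other nodes.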

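Layout: four nodes in total. node{1,2,3} are the high-availability nodes; node4 does two jobs (jump host and NFS shared storage)

Note: an HA setup needs at least 3 nodes, preferably an odd number; with two nodes, use a quorum disk

Note: cman has no concept of a DC

In RHCS:

Every cluster must have a unique cluster ID (cluster name); cman can only start once a cluster name is set;

At least one fence device is required, to power off or reboot nodes in specific situations (#fence_manual -n NODE_NAME, #fence_vmware -h (Fence agent for VMWare));

There should be at least 3 nodes. A two-node setup needs a qdisk (quorum disk). With an even number of nodes above 3, the chance of an even split is small; configuring a qdisk is still preferable, but the cluster works without one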
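The node-count advice follows from how quorum is counted: a partition needs a strict majority of the expected votes. A minimal sketch of that arithmetic, assuming one vote per node and no quorum disk; `quorum` and `has_quorum` are hypothetical helper names.

```shell
# Quorum is a strict majority of the expected votes: floor(votes / 2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Succeed if a partition holding $1 votes is quorate out of $2 expected votes.
has_quorum() {
    local partition_votes=$1 expected_votes=$2
    [ "$partition_votes" -ge "$(quorum "$expected_votes")" ]
}

quorum 3    # three nodes: prints 2, so the cluster survives one node failure
quorum 2    # two nodes: prints 2, so a 1-1 split leaves neither side quorate
```

The second case is why a two-node cluster needs the extra vote of a quorum disk.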

Packages:

cman-2.0.115-96.el5.i386.rpm (under /mnt/cdrom/Server)

rgmanager-2.0.52-28.el5.i386.rpm (under /mnt/cdrom/Cluster)

system-config-cluster-1.0.57-12.noarch.rpm (under /mnt/cdrom/Cluster)

node4-side:

[root@node4 ~]# alias ha='for I in {1..3};do ssh node$I'

[root@node4 ~]# ha 'yum -y install cman httpd';done

[root@node1 ~]# sed -i 's@Server@Cluster@g' /etc/yum.repos.d/rhel-debuginfo.repo (change this on node1, 2, and 3, or change it on one node and scp the file to the others)

Note: baseurl=file:///mnt/cdrom/Cluster

[root@node4 ~]# ha 'yum clean all';done

[root@node4 ~]# ha 'yum -y install rgmanager system-config-cluster';done

Prepare a different web page on each of node{1,2,3} (#vim /var/www/html/index.html) and test that each one can be reached. After testing, stop the service and disable it at boot

node1-side:

[root@node1 ~]# rpm -ql cman

/etc/cluster

/etc/rc.d/init.d/{cman,qdiskd,scsi_reserve} (qdiskd is the quorum disk daemon)

/sbin/ccs_tool

/sbin/dlm_controld

/sbin/dlm_tool

/sbin/fence_manual (simulates manual fencing, meatware)

[root@node1 ~]# man ccs_tool (The tool used to make online updates of CCS config files)

[root@node1 ~]# man cman_tool (Cluster Management Tool)

[root@node1 ~]# rpm -ql rgmanager

/usr/sbin/clustat (Cluster Status Utility)

/usr/sbin/clusvcadm (Cluster User Service Administration Utility)

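(1) In the GUI: add the cluster name, nodes, and fence device; the configuration file is created automatically

[root@node1 ~]# system-config-cluster & (opens the graphical configuration tool, similar to the hb_gui interface. Make sure Xmanager is installed and running on Windows; if the GUI does not open, check in Xshell: File-->Properties-->Tunneling-->Forward X11 connections to Xmanager)

The tool reports that there is no configuration file, so one has to be created first. The tool operates on the local machine, node1; after saving, ccs syncs the file to every node. Click create new configuration-->choose a name for the cluster (cluster name: tcluster)-->leave custom configure multicast and use a quorum disk empty-->OK

Note: if no multicast address is given, one is generated automatically; this example already has three nodes and needs no quorum disk

Click cluster nodes-->click add a cluster node at the bottom right-->add the three nodes node{1,2,3}.magedu.com, each with quorum votes set to 1-->OK

Click fence devices-->click add a fence device at the bottom right-->choose manual fencing-->name:meatware-->OK

Click file-->save-->OK

[root@node1 ~]# ls /etc/cluster (the other nodes do not have this configuration file until cman has been started)

cluster.conf

[root@node1 ~]# service cman start (cman syncs the configuration file automatically on startup. The first node hangs at starting fencing and only finishes once cman has been started on the other nodes, so cman cannot be started with the for loop; start it node by node)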

Starting cluster:

Loading modules... done

Mounting configfs... done

Starting ccsd... done

Starting cman... done

Starting daemons... done

Starting fencing...

node2-side:

[root@node2 ~]# service cman start

node3-side:

[root@node3 ~]# service cman start

[root@node3 ~]# ls /etc/cluster/ (after cman has started successfully, check on node2 and node3 that the configuration file exists)

cluster.conf

[root@node3 ~]# vim /etc/cluster/cluster.conf

[root@node3 ~]# cman_tool status (show cluster status information)

Version: 6.2.0

Config Version: 2

Cluster Name: tcluster

……

Nodes: 3

Expected votes: 3

Total votes: 3

Node votes: 1

Quorum: 2

Active subsystems: 8

Flags: Dirty

Ports Bound: 0 177

Node name: node3.magedu.com

Node ID: 3

Multicast addresses: 239.192.110.162

Node addresses: 192.168.41.133

node2-side:

[root@node2 ~]# clustat (show cluster status information)

Cluster Status for tcluster @ Thu Dec 10 17:58:33 2015

Member Status: Quorate

Member Name ID Status

------ ---- ---- ------

node1.magedu.com 1 Online

node2.magedu.com 2 Online, Local

node3.magedu.com 3 Online

node4-side:

[root@node4 ~]# ha 'service rgmanager start';done

Starting Cluster Service Manager: [ OK ]

node1-side:

[root@node1 ~]# netstat -tunlp | grep aisexec (multicast address 239.192.110.162, port 5405)

udp 0 0 192.168.41.131:5405 0.0.0.0:* 1165/aisexec

udp 0 0 192.168.41.131:5149 0.0.0.0:* 1165/aisexec

udp 0 0 239.192.110.162:5405 0.0.0.0:* 1165/aisexec

(2) Adding resources (in RHCS, resources are added either in the GUI or by editing the configuration file directly)

node1-side:

[root@node1 ~]# system-config-cluster &

Click resources-->create a resource-->resource type: IP Address-->enter 192.168.41.222/24-->OK

Click resources-->create a resource-->resource type: script-->name:webserver, file:/etc/rc.d/init.d/httpd-->OK

Note: in this tool, a service daemon is added by choosing script

Click services-->create a service-->name:webservice-->OK-->keep the defaults (autostart this service, recovery policy: restart)-->click add a shared resource to this service-->add both the IP address and the script entries to the service-->close-->click send to cluster on the right (pushes the configuration to the whole cluster)-->Yes-->OK

Note: a service here is similar to the concept of a resource group
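For the other route, editing the configuration file directly, the resource section that corresponds to the GUI steps above might look roughly like the fragment below. This is a sketch under assumptions: the element and attribute names (ip, script, service, ref, monitor_link, autostart, recovery) are written from memory of the rgmanager schema and should be compared with what system-config-cluster actually writes. The heredoc only puts the fragment into a scratch file for review; nothing is installed.

```shell
# Write a hand-edited <rm> section to a scratch file; /etc/cluster is not touched.
fragment=$(mktemp)
cat > "$fragment" <<'EOF'
<rm>
  <resources>
    <ip address="192.168.41.222/24" monitor_link="1"/>
    <script name="webserver" file="/etc/rc.d/init.d/httpd"/>
  </resources>
  <service name="webservice" autostart="1" recovery="restart">
    <ip ref="192.168.41.222/24"/>
    <script ref="webserver"/>
  </service>
</rm>
EOF

# The service should reference both shared resources, so this prints 2.
grep -c ' ref=' "$fragment"
```

A hand-edited cluster.conf also needs its config_version raised before it is pushed to the other nodes with #ccs_tool update.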

node2-side:

[root@node2 ~]# clustat

……

Member Name ID Status

------ ---- ---- ------

node1.magedu.com 1 Online, rgmanager

node2.magedu.com 2 Online, Local, rgmanager

node3.magedu.com 3 Online, rgmanager

Service Name Owner (Last) State

------- ---- ----- ------ -----

service:webservice node3.magedu.com started

node4-side:

[root@node4 ~]# elinks -dump http://192.168.41.222

node3 server

node3-side:

[root@node3 ~]# ip addr show (#ifconfig does not show this address, because it is added with #ip addr rather than as an interface alias)

……

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000

link/ether 00:0c:29:89:2d:74 brd ff:ff:ff:ff:ff:ff

inet 192.168.41.133/24 brd 192.168.41.255 scope global eth0

inet 192.168.41.222/24 scope global secondary eth0

inet6 fe80::20c:29ff:fe89:2d74/64 scope link

……

[root@node3 ~]# clusvcadm -h

[root@node3 ~]# clusvcadm -r webservice -m node1.magedu.com (relocate the service to the given node; if no node is given, one is chosen at random)

Trying to relocate service:webservice to node1.magedu.com...Success

service:webservice is now running on node1.magedu.com

[root@node3 ~]# clustat

Cluster Status for tcluster @ Thu Dec 10 18:42:25 2015

……

service:webservice node1.magedu.com started

[root@node3 ~]# clusvcadm -r webservice

Trying to relocate service:webservice...Success

service:webservice is now running on node3.magedu.com

[root@node3 ~]# clusvcadm -l (Lock local resource group manager; once locked, the service cannot be relocated to other nodes)

Resource groups locked

[root@node3 ~]# clusvcadm -r webservice -m node1.magedu.com

Trying to relocate service:webservice to node1.magedu.com...Failure

[root@node3 ~]# clusvcadm -r webservice -m node2.magedu.com

Trying to relocate service:webservice to node2.magedu.com...Failure

[root@node3 ~]# clusvcadm -u (unlock)

Resource groups unlocked

[root@node3 ~]# clusvcadm -r webservice -m node2.magedu.com

Trying to relocate service:webservice to node2.magedu.com...Success

service:webservice is now running on node2.magedu.com

Note:

#clusvcadm -r GROUP [-m NODE] (relocate, member)

#clusvcadm -l|-u (lock|unlock)

#clusvcadm -M GROUP -m NODE (migrate: live-migrate a virtual machine resource)

#cman_tool nodes (lists the nodes in the cluster)

#clustat (shows the nodes and the status of each, and also which node each resource runs on)

#ccs_tool lsnode|lsfence

#ccs_tool addnode|delnode NODE

#ccs_tool addfence|delfence FENCE_NAME FENCE_DEV

#ccs_tool create (Create a skeleton config file; generates the configuration file automatically)

#ccs_tool update XML_FILE (Tells ccsd to upgrade to new config file)
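Since clustat shows which node owns each service, its output can feed a script. A minimal sketch that extracts the owner with awk, assuming the column layout of the clustat output shown in this walkthrough; `service_owner` is a hypothetical helper, and the sample text stands in for a live cluster.

```shell
# Print the owner of service NAME from clustat-style output read on stdin.
service_owner() {
    awk -v svc="service:$1" '$1 == svc { print $2 }'
}

# Sample lines in the format printed by clustat in this walkthrough.
sample=' Service Name          Owner (Last)          State
 ------- ----          ----- ------          -----
 service:webservice    node3.magedu.com      started'

printf '%s\n' "$sample" | service_owner webservice
```

On a cluster node the same function would be used as `clustat | service_owner webservice`.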

node4-side (prepare the shared directory and file on node4):

[root@node4 ~]# mkdir -pv /web/htdocs

[root@node4 ~]# echo "nfs server" >> /web/htdocs/index.html

[root@node4 ~]# vim /etc/exports

/web/htdocs 192.168.41.131(ro) 192.168.41.132(ro) 192.168.41.133(ro)

[root@node4 ~]# service nfs start

[root@node4 ~]# showmount -e 192.168.41.134

Export list for 192.168.41.134:

/web/htdocs 192.168.41.133,192.168.41.132,192.168.41.131

[root@node4 ~]# chkconfig nfs on

[root@node4 ~]# chkconfig --list nfs

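nfs 0:off 1:off 2:on 3:on 4:on 5:on 6:off

node1-side (on node{1,2,3}, test that the export mounts correctly and remember to unmount it; then add the shared directory mount to the service as a resource):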

[root@node1 ~]# mount -t nfs 192.168.41.134:/web/htdocs /var/www/html

[root@node1 ~]# cat /var/www/html/index.html

nfs server

[root@node1 ~]# umount /var/www/html

[root@node1 ~]# system-config-cluster &

Click service webservice-->edit service properties-->create new resource for this service-->choose NFS Mount-->name:webstore, mountpoint:/var/www/html, host:192.168.41.134, export path:/web/htdocs-->OK-->close-->send to cluster

[root@node1 ~]# clustat

Cluster Status for tcluster @ Thu Dec 10 20:05:08 2015

……

Service Name Owner (Last) State

------- ---- ----- ------ -----

service:webservice node2.magedu.com started

node2-side (on the currently active node the share is already mounted automatically; relocate the service to node3 and check the result):

[root@node2 ~]# mount | grep /web/htdocs

192.168.41.134:/web/htdocs on /var/www/html type nfs (rw,sync,soft,noac,addr=192.168.41.134)

[root@node2 ~]# clusvcadm -r webservice -m node3.magedu.com

Trying to relocate service:webservice to node3.magedu.com...Success

service:webservice is now running on node3.magedu.com

[root@node2 ~]# elinks -dump http://192.168.41.222

nfs server

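(3) Supplement (generating the configuration file, adding nodes, and adding a fence device entirely from the CLI; resources still have to be added either in the GUI or by editing the configuration file directly):

node4-side:

[root@node4 ~]# ha 'service rgmanager stop';done

node3-side:

[root@node3 ~]# service cman stop

node2-side:

[root@node2 ~]# service cman stop

node1-side:

[root@node1 ~]# service cman stop

[root@node1 ~]# ls /etc/cluster/ (each send to cluster sync to the other nodes automatically backs up the previous version)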

cluster.conf cluster.conf.bak.1 cluster.conf.bak.2 cluster.conf.bak.3 cluster.conf.old

node4-side:

node1-side:

[root@node1 ~]# ls /etc/cluster

[root@node1 ~]# ccs_tool create tcluster (generates the configuration file: Create a new, skeleton, configuration file)

[root@node1 ~]# cat /etc/cluster/cluster.conf

<?xml version="1.0"?>

<rm>

</rm>

</cluster>

#ccs_tool addfence NAME AGENT

[root@node1 ~]# ccs_tool addfence meatware fence_manual (adds a fence device; delfence removes one)

running ccs_tool update...

[root@node1 ~]# cat /etc/cluster/cluster.conf

<?xml version="1.0"?>

<rm>

</rm>

</cluster>

[root@node1 ~]# ccs_tool lsfence

Name Agent

meatware fence_manual

#ccs_tool addnode -v VOTES [-n NODEID] -f FENCE_NAME NODE_NAME (-v sets the node's votes; -n sets the node ID, which is optional and must be unique; -f makes the node use this fence device)

[root@node1 ~]# man ccs_tool

[root@node1 ~]# ccs_tool addnode -v 1 -n 1 -f meatware node1.magedu.com

running ccs_tool update...

Segmentation fault

[root@node1 ~]# ccs_tool addnode -v 1 -n 2 -f meatware node2.magedu.com

running ccs_tool update...

Segmentation fault

[root@node1 ~]# ccs_tool addnode -v 1 -n 3 -f meatware node3.magedu.com

running ccs_tool update...

Segmentation fault

[root@node1 ~]# ccs_tool lsnode

Cluster name: tcluster, config_version: 5

Nodename Votes Nodeid Fencetype

node1.magedu.com 1 1 meatware

node2.magedu.com 1 2 meatware

node3.magedu.com 1 3 meatware

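(4) Extension (virtual machines):

#clusvcadm -M GROUP -m NODE (migrate: live-migrate a virtual machine resource)

Virtual machines are spread across N physical machines (N nodes), and services run inside the virtual machines, which are themselves the highly available resources. If the physical machine currently hosting a virtual machine fails, the virtual machine does not stop; it keeps running and moves to another physical node. The data must be on shared storage (a running virtual machine depends on a lot of data)

In a web cluster, HTTP is stateless. Take the shopping cart of an e-commerce site: when a back-end RS dies, the VIP and httpd move to another node. To keep session data available, store sessions in memcache, so that users can reach their session no matter which RS dies. If the RS currently serving a user dies, the user's next request fails (for example with a 404), and the request after that works normally. An ssh connection to an RS is different (ssh is an always-connected service): if that RS fails, the connection drops and has to be established again. If the ssh connection goes to a virtual machine on the RS, the virtual machine migrates to another RS when its host fails and the connection is not interrupted (this is the benefit of virtual machines)

The benefit of virtualization: virtual machines run on highly available physical nodes; when any node fails, the virtual machine only needs to be live-migrated, and services do not have to stop during the migration

Cloud: thousands of nodes, with many virtual machines spread evenly across them. An IaaS cloud provides a highly available environment for virtual machines, which is why the virtual machines we use do not go offline

Files created on a virtual disk must be reachable no matter which node runs the virtual machine, so shared storage is needed to hold the image files (virtual disk files) of the virtual machines, and each virtual machine process is attached to its image file. Virtual machines from a provider come with a system preinstalled in the image file, which holds the Linux root file system; it may contain no kernel and rely on an external kernel to boot

Reposted from chaijowin's 51CTO blog. Original link: http://blog.51cto.com/jowin/1722241. To republish, please contact the original author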
