Lab environment:
Set up MFS first; we use MooseFS as the service being clustered in this experiment.
server1 172.25.254.1 master
server2 172.25.254.2 chunk server
server3 172.25.254.3 chunk server
server4 172.25.254.4 backup master
server1 and server4 are the high-availability nodes; server2 and server3 store the data.
On server1 and server4:
yum install moosefs-master-3.0.113-1.rhsystemd.x86_64.rpm -y
On server2 and server3:
yum install moosefs-chunkserver-3.0.113-1.rhsystemd.x86_64.rpm -y
Configure the startup script on server1 and server4:
vim /usr/lib/systemd/system/moosefs-master.service
systemctl daemon-reload    # reload systemd unit files
Add the -a option so the master can still start after an unclean exit (it recovers the metadata automatically).
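A sketch of the relevant part of the unit file, assuming the stock unit shipped by the moosefs-master package; the only change is the trailing `-a`:

```ini
# /usr/lib/systemd/system/moosefs-master.service (excerpt, assumed stock unit)
[Service]
Type=forking
# -a tells mfsmaster to auto-recover metadata after an unclean exit
# instead of refusing to start
ExecStart=/usr/sbin/mfsmaster start -a
ExecStop=/usr/sbin/mfsmaster stop
```

After editing, `systemctl daemon-reload` makes systemd pick up the change.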
Configure the yum repositories (the HighAvailability and ResilientStorage channels) on all four hosts.
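A sketch of the repo file; the baseurl is an assumption (point it at wherever your RHEL 7 ISO's addons directory is served in your lab):

```ini
# /etc/yum.repos.d/ha.repo -- hypothetical paths, adjust to your mirror
[HighAvailability]
name=HighAvailability
baseurl=http://172.25.254.250/rhel7/addons/HighAvailability
gpgcheck=0

[ResilientStorage]
name=ResilientStorage
baseurl=http://172.25.254.250/rhel7/addons/ResilientStorage
gpgcheck=0
```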
Configuring the cluster
Install the cluster software on server1 and server4:
yum install pacemaker corosync pcs -y
pacemaker: the cluster resource manager
corosync: cluster messaging, membership, and heartbeat
pcs: the command-line management tool
[root@server1 3.0.113]# id hacluster
uid=189(hacluster) gid=189(haclient) groups=189(haclient)
# The hacluster user is created automatically during installation
Set up passwordless SSH between server1 and server4.
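A minimal sketch of the passwordless-SSH setup, run on server1 (repeat in the other direction from server4):

```shell
# Generate a key pair with no passphrase and push the public key to the peer.
# Assumes server1/server4 resolve via /etc/hosts.
ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ''
ssh-copy-id root@server4
```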
Next, configure the cluster:
[root@server4 ~]# systemctl start pcsd.service
[root@server4 ~]# systemctl enable pcsd.service    # enable at boot
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
[root@server4 ~]# passwd hacluster    # give the hacluster user a password
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Authenticate the cluster hosts against each other:
[root@server1 3.0.113]# pcs cluster auth server1 server4    # only server1 and server4 for now
Username: hacluster
Password:
server4: Authorized
server1: Authorized
Create the cluster:
[root@server1 3.0.113]# pcs cluster setup --name mycluster server1 server4
Destroying cluster on nodes: server1, server4...
server1: Stopping Cluster (pacemaker)...
server4: Stopping Cluster (pacemaker)...
server1: Successfully destroyed cluster
server4: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'server1', 'server4'
server1: successful distribution of the file 'pacemaker_remote authkey'
server4: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
server1: Succeeded
server4: Succeeded
Synchronizing pcsd certificates on nodes server1, server4...
server4: Success
server1: Success
Restarting pcsd on the nodes in order to reload the certificates...
server4: Success
server1: Success
[root@server1 3.0.113]# pcs cluster start --all    # start the cluster
server1: Starting Cluster (corosync)...
server4: Starting Cluster (corosync)...
server1: Starting Cluster (pacemaker)...
server4: Starting Cluster (pacemaker)...
This automatically starts both underlying services (corosync and pacemaker) for us.
Enable the cluster at boot:
[root@server1 3.0.113]# pcs cluster enable --all
server1: Cluster Enabled
server4: Cluster Enabled
Check the status:
[root@server1 3.0.113]# pcs status
Cluster name: mycluster
WARNINGS:
No stonith devices and stonith-enabled is not false
## A stonith (fence) device forcibly reboots a host that has hung, so that its resources are released.
We don't have such a device yet, so disable stonith for now.
Stack: corosync
Current DC: server1 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Wed May 20 16:25:45 2020
Last change: Wed May 20 16:22:11 2020 by hacluster via crmd on server1
2 nodes configured
0 resources configured
Online: [ server1 server4 ]
No resources
## no resources configured yet
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@server1 3.0.113]# pcs status corosync
Membership information
----------------------
Nodeid Votes Name
1 1 server1 (local)
2 1 server4
[root@server1 3.0.113]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 172.25.254.1
status = ring 0 active with no faults
[root@server1 ~]# pcs property set stonith-enabled=false    # disable stonith for now
[root@server1 ~]# crm_verify -L -V    # verify the configuration; no errors
[root@server1 ~]# pcs property set no-quorum-policy=ignore    # keep running even when quorum is lost (needed for a two-node cluster)
Configuring the VIP
[root@server1 ~]# pcs resource list    ## list available resource agents (there are many)
...
[root@server1 ~]# pcs resource standards    ## the four resource standards
lsb
ocf
service
systemd
Create a VIP so that clients have a single entry point, since the cluster contains multiple hosts.
## add the VIP as an ocf resource:
[root@server1 ~]# pcs resource create vip ocf:heartbeat:IPaddr2 ip=172.25.254.100 cidr_netmask=32 op monitor interval=30s
[root@server1 ~]# pcs resource show    ## list configured resources
vip (ocf::heartbeat:IPaddr2): Started server1
[root@server1 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:e4:9b:44 brd ff:ff:ff:ff:ff:ff
inet 172.25.254.1/24 brd 172.25.254.255 scope global ens3
valid_lft forever preferred_lft forever
inet 172.25.254.100/32 brd 172.25.254.255 scope global ens3
valid_lft forever preferred_lft forever    ## the VIP has been added
View it in the monitoring console:
[root@server1 ~]# crm_mon
Now test VIP failover.
On server1:
[root@server1 ~]# pcs cluster stop server1
server1: Stopping Cluster (pacemaker)...
server1: Stopping Cluster (corosync)...
On server4:
[root@server4 ~]# ip a
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:46:90:50 brd ff:ff:ff:ff:ff:ff
inet 172.25.254.4/24 brd 172.25.254.255 scope global ens3
valid_lft forever preferred_lft forever
inet 172.25.254.100/32 brd 172.25.254.255 scope global ens3
valid_lft forever preferred_lft forever    ## the VIP has migrated
Now start server1 again:
[root@server1 ~]# pcs cluster start server1
server1: Starting Cluster (corosync)...
server1: Starting Cluster (pacemaker)...
[root@server1 ~]# crm_mon
# view in the monitoring console
The VIP stays on server4: in production, failing resources back and forth would hurt service stability.
Configuring Apache
Install Apache on server1 and server4:
yum install httpd -y
echo server1 > /var/www/html/index.html    # default page, on server1
echo server4 > /var/www/html/index.html    # default page, on server4
Create the apache resource:
[root@server1 ~]# pcs resource create apache systemd:httpd op monitor interval=1min
[root@server1 ~]# pcs resource show    ## httpd is managed through systemd
vip (ocf::heartbeat:IPaddr2): Started server4
apache (systemd:httpd): Started server1
[root@server1 ~]# systemctl status httpd
● httpd.service - Cluster Controlled httpd
Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled; vendor preset: disabled)
Drop-In: /run/systemd/system/httpd.service.d
└─50-pacemaker.conf
Active: active (running) since Thu 2020-05-21 10:08:18 CST; 58s ago
The apache service was started automatically.
The log header ("Cluster Controlled httpd") confirms that the cluster started httpd.
But at this point:
[root@server1 ~]# pcs resource show
vip (ocf::heartbeat:IPaddr2): Started server4
apache (systemd:httpd): Started server1
## vip and apache are not on the same node
[root@server1 ~]# curl 172.25.254.100
curl: (7) Failed connect to 172.25.254.100:80; Connection refused
So the service cannot be reached through the VIP. That is wrong, so we create a resource group
to keep the two resources on the same node:
[root@server1 ~]# pcs resource group add apache_group vip apache
[root@server1 ~]# pcs resource show
Resource Group: apache_group
vip (ocf::heartbeat:IPaddr2): Started server4
apache (systemd:httpd): Started server4    ## apache automatically followed the group to server4
[root@server1 ~]# systemctl status httpd.service
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled; vendor preset: disabled)
Active: inactive (dead)    ## httpd is stopped on server1, so it must be running on server4
[root@server1 ~]# curl 172.25.254.100
server4    ## reachable through the VIP now
Adding a shared disk and creating the MFS resources
Add a disk to server3 to share out:
/dev/sdb
Install the target software:
[root@server3 ~]# yum install targetcli.noarch -y
Configure it:
[root@server3 ~]# targetcli
/> backstores/block create my_disk1 /dev/sdb
Created block storage object my_disk1 using /dev/sdb.
/> iscsi/ create iqn.2020-05.com.example:server3    ## create the iqn
Created target iqn.2020-05.com.example:server3.
Created TPG 1.
Global pref auto_add_default_portal=true
Created default portal listening on all IPs (0.0.0.0), port 3260.
/> iscsi/iqn.2020-05.com.example:server3/tpg1/luns create /backstores/block/my_disk1
Created LUN 0.
/> iscsi/iqn.2020-05.com.example:server3/tpg1/acls create iqn.2020-05.com.example:client
Created Node ACL for iqn.2020-05.com.example:client    # only initiators with this name may connect
Created mapped LUN 0.
Port 3260 is now open.
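To double-check on server3 that the target is actually listening (this verification command is my addition, not from the original transcript):

```shell
# Port 3260 should appear in LISTEN state once targetcli has saved the config.
ss -tnl | grep 3260
```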
server1 and server4 act as iSCSI initiators and use server3's shared disk.
Configure the initiator name on both hosts:
[root@server1 ~]# yum install iscsi-* -y
[root@server1 ~]# vim /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2020-05.com.example:client    # set to the name allowed by the ACL above
[root@server1 ~]# systemctl restart iscsid    # restart the service
[root@server1 ~]# iscsiadm -m discovery -t st -p 172.25.254.3
172.25.254.3:3260,1 iqn.2020-05.com.example:server3    # target discovered
[root@server1 ~]# iscsiadm -m node -l    # log in
Logging in to [iface: default, target: iqn.2020-05.com.example:server3, portal: 172.25.254.3,3260] (multiple)
Login to [iface: default, target: iqn.2020-05.com.example:server3, portal: 172.25.254.3,3260] successful.
[root@server1 ~]# fdisk -l
The device is now visible.
Partition the disk first:
create a single partition, then format it:
mkfs.xfs /dev/sdb1
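The partitioning step is compressed in the text; a sketch of one non-interactive way to do it (parted here instead of the interactive fdisk the author likely used):

```shell
# Create an msdos label with a single partition spanning the disk, then format it.
parted -s /dev/sdb mklabel msdos
parted -s /dev/sdb mkpart primary xfs 0% 100%
mkfs.xfs /dev/sdb1
```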
Then populate the device with the master's metadata:
[root@server1 ~]# mount /dev/sdb1 /mnt/    # mount the device
[root@server1 ~]# cp -p /var/lib/mfs/* /mnt/    # copy in the data
[root@server1 ~]# chown mfs.mfs /mnt/    # change ownership to the mfs user
[root@server1 ~]# ls /mnt
changelog.1.mfs changelog.2.mfs changelog.4.mfs changelog.5.mfs metadata.crc metadata.mfs metadata.mfs.back.1 metadata.mfs.empty stats.mfs
[root@server1 ~]# umount /mnt/
[root@server1 ~]# mount /dev/sdb1 /var/lib/mfs/    # mount on the mfs working directory
[root@server1 ~]# systemctl start moosefs-master    # mfs starts normally, so the setup is correct
[root@server1 ~]# systemctl stop moosefs-master.service
[root@server1 ~]# umount /var/lib/mfs    # then unmount
On server4:
[root@server4 ~]# yum install -y iscsi-*
[root@server4 ~]# vim /etc/iscsi/initiatorname.iscsi
[root@server4 ~]# iscsiadm -m discovery -t st -p 172.25.254.3
172.25.254.3:3260,1 iqn.2020-05.com.example:server3
[root@server4 ~]# iscsiadm -m node -l
Logging in to [iface: default, target: iqn.2020-05.com.example:server3, portal: 172.25.254.3,3260] (multiple)
Login to [iface: default, target: iqn.2020-05.com.example:server3, portal: 172.25.254.3,3260] successful.
[root@server4 ~]# mount /dev/sdb1 /var/lib/mfs/
[root@server4 ~]# systemctl start moosefs-master
[root@server4 ~]# systemctl stop moosefs-master
[root@server4 ~]# umount /var/lib/mfs/    # after verifying a clean start, stop and unmount; the cluster will manage all of this from now on
Now create the filesystem resource:
[root@server1 ~]# pcs resource create mfsdata ocf:heartbeat:Filesystem device=/dev/sdb1 directory=/var/lib/mfs fstype=xfs op monitor interval=30s
## named mfsdata, a Filesystem resource: device /dev/sdb1, mount point /var/lib/mfs, filesystem xfs
[root@server1 ~]# pcs resource show
Resource Group: apache_group
vip (ocf::heartbeat:IPaddr2): Started server4
apache (systemd:httpd): Started server4
mfsdata (ocf::heartbeat:Filesystem): Started server1
It is mounted automatically.
Create the service resource:
[root@server1 ~]# pcs resource create mfsd systemd:moosefs-master op monitor interval=1min
[root@server1 ~]# pcs resource show
Resource Group: apache_group
vip (ocf::heartbeat:IPaddr2): Started server4
apache (systemd:httpd): Started server4
mfsdata (ocf::heartbeat:Filesystem): Started server1
mfsd (systemd:moosefs-master): Started server1
The service starts automatically.
Now rebuild the resource group, putting vip, mfsdata, and mfsd together.
[root@server1 ~]# pcs resource delete apache
Attempting to stop: apache... Stopped    # delete the apache resource; it was only a test
[root@server1 ~]# pcs resource group add mfsgroup vip mfsdata mfsd    # create the group
[root@server1 ~]# pcs resource show    # wait a few seconds and check again; all three resources have moved to server4
Resource Group: mfsgroup
vip (ocf::heartbeat:IPaddr2): Started server4
mfsdata (ocf::heartbeat:Filesystem): Started server4
mfsd (systemd:moosefs-master): Started server4
Check on server4.
Now test the cluster's high availability: stop server4 while monitoring from server1.
All resources are currently on server4.
Stop the cluster on server4:
[root@server4 ~]# pcs cluster stop server4
server4: Stopping Cluster (pacemaker)...
server4: Stopping Cluster (corosync)...
Everything has switched over; server4 is offline.
Check the service, VIP, and mount on server1:
all have migrated, and everything on server4 has been stopped.
Adding fence to the cluster
Start the cluster on server4 again:
[root@server4 ~]# pcs cluster start server4
server4: Starting Cluster (corosync)...
server4: Starting Cluster (pacemaker)...
All resources are now on server1. Next install the fence agent. Fencing controls server1 and server4: it can power a host on and off, so a host that crashes abnormally is rebooted automatically.
[root@server4 ~]# yum install -y fence-virt
[root@server4 ~]# mkdir /etc/cluster    # directory for the fence key
[root@server1 ~]# yum install -y fence-virt
[root@server1 ~]# mkdir /etc/cluster
The physical host (the hypervisor running the VMs) serves the fence requests.
Install:
[root@rhel7host ~]# yum install -y fence-virtd.x86_64 fence-virtd-libvirt.x86_64 fence-virtd-multicast.x86_64
Configure it after installation:
[root@rhel7host ~]# fence_virtd -c
Module search path [/usr/lib64/fence-virt]:
Listener module [multicast]:
Multicast IP Address [225.0.0.12]:
Multicast IP Port [1229]:
Interface [virbr0]: br0    ## accept the defaults everywhere else; only the interface changes
Key File [/etc/cluster/fence_xvm.key]:
Backend module [libvirt]:
Replace /etc/fence_virt.conf with the above [y/N]? y
[root@rhel7host ~]# mkdir /etc/cluster/    # create the key directory
[root@rhel7host ~]# cd /etc/cluster/
[root@rhel7host cluster]# dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=128 count=1    # generate the key file
1+0 records in
1+0 records out
128 bytes (128 B) copied, 0.000221106 s, 579 kB/s
[root@rhel7host cluster]# ls
fence_xvm.key
[root@rhel7host cluster]# scp fence_xvm.key [email protected]:/etc/cluster/
[email protected]'s password:
fence_xvm.key
## send the key file to server1 and server4
[root@rhel7host cluster]# scp fence_xvm.key [email protected]:/etc/cluster/
[email protected]'s password:
fence_xvm.key
Start the service:
[root@rhel7host cluster]# systemctl start fence_virtd.service
[root@rhel7host cluster]# netstat -ntlup |grep 1229
udp 0 0 0.0.0.0:1229 0.0.0.0:* 11612/fence_virtd
## port 1229 is open
Then add the stonith resource on server1:
[root@server1 ~]# pcs stonith create vmfence fence_xvm pcmk_host_map="server1:node1;server4:node4" op monitor interval=1min    # host map: cluster hostname before the colon, VM (domain) name after it
[root@server1 ~]# pcs property set stonith-enabled=true    # re-enable stonith
[root@server1 ~]# crm_verify -L -V    # verify
[root@server1 ~]# pcs status    # check
Resource Group: mfsgroup
vip (ocf::heartbeat:IPaddr2): Started server1
mfsdata (ocf::heartbeat:Filesystem): Started server1
mfsd (systemd:moosefs-master): Started server1    # all resources on server1
vmfence (stonith:fence_xvm): Started server4    # fence running on server4
Now stop the cluster on server1:
[root@server1 ~]# pcs cluster stop server1
server1: Stopping Cluster (pacemaker)...
server1: Stopping Cluster (corosync)...
The resources switch to server4, and fence is now also on server4.
Start server1 again:
[root@server1 ~]# pcs cluster start server1
server1: Starting Cluster (corosync)...
server1: Starting Cluster (pacemaker)...
Fence moves to server1: fence always runs on a different host from the resources.
Now crash server4 and see whether fence reboots it:
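One common way to simulate such a crash (an assumption on my part; the original does not show the command used) is the magic SysRq trigger:

```shell
# On server4: crash the kernel immediately. The host hangs, its cluster
# heartbeat stops, and vmfence should power-cycle the VM.
echo c > /proc/sysrq-trigger
```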
All resources switch to server1,
and server4 reboots.
Once server4 is back up,
fence moves back onto server4.
That is how fencing works: when a host's kernel crashes and it hangs, fence reboots it so that its resources are released, which prevents resource contention in the cluster.
Using a MySQL resource
Delete the vip, mfsdata, and mfsd resources:
[root@server1 ~]# pcs resource delete vip
Attempting to stop: vip... Stopped
[root@server1 ~]# pcs resource delete mfsdata
Attempting to stop: mfsdata... Stopped
[root@server1 ~]# pcs resource delete mfsd
Attempting to stop: mfsd... Stopped
Only the fence resource is left, and the mfs service is stopped on both server1 and server4.
Install mariadb on server1 and server4:
yum install -y mariadb-server
Wipe /dev/sdb1 (mind the hidden files inside):
mount /dev/sdb1 /mnt
rm -fr /mnt/*
umount /mnt/
Then mount /dev/sdb1 as the mysql data directory:
[root@server1 ~]# mount /dev/sdb1 /var/lib/mysql/
[root@server1 ~]# chown mysql.mysql /var/lib/mysql/
[root@server1 mysql]# systemctl start mariadb.service
[root@server1 mysql]# ls
aria_log.00000001 aria_log_control ibdata1 ib_logfile0 ib_logfile1 mysql mysql.sock performance_schema test
This way the contents of the mysql data directory actually live on the shared disk.
Create the resources:
[root@server1 mysql]# pcs resource create vip ocf:heartbeat:IPaddr2 ip=172.25.254.100 cidr_netmask=32 op monitor interval=30s
[root@server1 ~]# pcs resource create mysql_data ocf:heartbeat:Filesystem \
device=/dev/sdb1 directory=/var/lib/mysql fstype=xfs op monitor interval=30s
[root@server1 ~]# pcs resource create mariadb systemd:mariadb op monitor interval=1min
[root@server1 ~]# pcs resource show
vip (ocf::heartbeat:IPaddr2): Started server1
mysql_data (ocf::heartbeat:Filesystem): Started server1
mariadb (systemd:mariadb): Starting server4
The resources are not all in one group yet, so create a resource group:
[root@server1 ~]# pcs resource group add mysql_group mysql_data vip mariadb
[root@server1 ~]# pcs resource show
Resource Group: mysql_group
mysql_data (ocf::heartbeat:Filesystem): Started server1
vip (ocf::heartbeat:IPaddr2): Started server1
mariadb (systemd:mariadb): Stopping server4
[root@server1 ~]# pcs resource show
Resource Group: mysql_group
mysql_data (ocf::heartbeat:Filesystem): Started server1
vip (ocf::heartbeat:IPaddr2): Started server1
mariadb (systemd:mariadb): Started server1
## the display updates after a few seconds
[root@server1 ~]# pcs status
Cluster name: mycluster
Full list of resources:
vmfence (stonith:fence_xvm): Started server4    # fence on server4
Resource Group: mysql_group
mysql_data (ocf::heartbeat:Filesystem): Started server1
vip (ocf::heartbeat:IPaddr2): Started server1
mariadb (systemd:mariadb): Started server1
Stop server1 and check that server4 takes over:
the shared disk is mounted on server4 automatically.
Once server1 is back up, fence moves onto server1.