DRBD Failure Testing and Split-Brain Handling
2014-06-27 by dongnan
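Environment
See "Using DRBD and Heartbeat for MySQL High Availability".
Failure Testing
Simulate a failure of the primary node pn1 and test whether the backup node pn2 takes over successfully: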
sync && init 6
Test Results
The vip, drbd, and mysql resources were all taken over successfully by the backup node pn2.
vip
ifconfig eth0:0
eth0:0 Link encap:Ethernet HWaddr 00:50:56:9C:00:0D
inet addr:172.27.233.48 Bcast:172.27.233.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
drbd
mount | tail -n1
/dev/drbd0 on /mysqldata type ext4 (rw)
mysql
mysqladmin ping
mysqld is alive
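The three manual checks above could be combined into one script. The following is a minimal sketch, not part of the original setup: the function names has_vip, drbd_is_mounted, and check_takeover are made up for illustration, and the address, interface, device, and mount point are the values shown in the output above.

```shell
# Hypothetical helpers (names are not from the original setup).

# Succeeds if the ifconfig output on stdin contains the given IPv4 address.
has_vip() {
    grep -qF "inet addr:$1 "
}

# Succeeds if the mount output on stdin shows device $1 mounted on $2.
drbd_is_mounted() {
    awk -v dev="$1" -v mp="$2" \
        '$1 == dev && $3 == mp { found = 1 } END { exit !found }'
}

# Runs all three checks on the node that should now own the resources.
check_takeover() {
    ifconfig eth0:0 | has_vip 172.27.233.48 || { echo "vip missing"; return 1; }
    mount | drbd_is_mounted /dev/drbd0 /mysqldata || { echo "drbd not mounted"; return 1; }
    mysqladmin ping >/dev/null 2>&1 || { echo "mysqld not answering"; return 1; }
    echo "takeover ok"
}
```

Running check_takeover on pn2 after the failover should print "takeover ok" only if all three resources have moved.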
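Test Procedure
- Test 1: shut down the primary node PN1. vip/drbd/mysqld switch automatically to PN2; after PN1 reboots, its drbd state becomes secondary.
- Test 2: shut down the backup node PN2. vip/drbd/mysqld switch automatically to PN1; after PN2 reboots, its drbd state becomes secondary.
- Test 3: stop the mysqld service on the primary node PN1. No failover is triggered.
- Test 4: stop the drbd service on the primary node PN1. No failover is triggered.
- Test 5: stop the heartbeat service on the primary node PN1. Resources switch automatically to the backup node PN2.
- Test 6: pull the power cord on the backup node PN2. Resources switch automatically to the primary node PN1.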
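DRBD Split-Brain
When does DRBD split-brain occur?
Split-brain occurs when both drbd nodes hold the Primary role at the same time.
What can cause split-brain?
- The heartbeat link fails, so heartbeat assumes the peer node is dead and promotes the local DRBD role to Primary. Once the link recovers, both DRBD nodes are Primary and DRBD is in split-brain.
- An operator mistakenly sets both nodes to the Primary role, which also puts DRBD in split-brain.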
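Either cause leaves a node in the Primary role while its connection to the peer is down, and that condition is visible in /proc/drbd. The following is a minimal sketch, assuming the 8.4-style status line shown later in this article; drbd_field and primary_while_disconnected are hypothetical helper names. A match only signals risk: whether split-brain actually happened is decided by DRBD when the nodes reconnect.

```shell
# Hypothetical helpers; both read /proc/drbd-style text from stdin.

# Prints the value of field $1 (cs, ro, or ds) from the first device line.
drbd_field() {
    tr ' ' '\n' | sed -n "s/^$1://p" | head -n 1
}

# Succeeds if this node is Primary while it has no connection to its peer.
primary_while_disconnected() {
    status=$(cat)
    cs=$(printf '%s\n' "$status" | drbd_field cs)
    ro=$(printf '%s\n' "$status" | drbd_field ro)
    case "$cs:$ro" in
        StandAlone:Primary/*|WFConnection:Primary/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A possible use is `primary_while_disconnected < /proc/drbd && echo "Primary without a peer"` in a monitoring job on each node.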
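Node Maintenance
Routine Maintenance
Stop the heartbeat service; if the node is Primary, it releases its resources automatically. When maintenance is finished, start the heartbeat service again.
Full-Cluster Maintenance
- Shut down the Secondary node first, then the Primary node.
- After maintenance, the nodes can be started in any order; the host defined in haresources ends up as Primary.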
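Which host that is can be read from the first field of the resource line in haresources. The following is a minimal sketch, assuming the heartbeat v1 haresources format and the conventional path /etc/ha.d/haresources (the path is an assumption; this article does not state it). preferred_node is a hypothetical helper name.

```shell
# Hypothetical helper: prints the node name from the first resource line
# (skipping comments and blank lines) of the haresources file given as $1.
preferred_node() {
    awk '!/^[ \t]*#/ && NF { print $1; exit }' "${1:-/etc/ha.d/haresources}"
}
```

Comparing the result with `uname -n` on each node shows which one should end up holding the resources once both are back up.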
Test
DRBD Status
# PN1 primary node
cat /proc/drbd
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@pn1, 2013-12-06 14:48:27
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:360 nr:80 dw:440 dr:22957 al:6 bm:4 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
# PN2 backup node
cat /proc/drbd
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@pn2, 2013-12-06 15:08:20
0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
ns:80 nr:408 dw:408 dr:80 al:0 bm:4 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
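Simulating DRBD Split-Brain
Unplug the network cable on PN1 to simulate a heartbeat link failure, then plug it back in.
Split-Brain State
# PN1 primary node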
cat /proc/drbd
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@pn1, 2013-12-06 14:48:27
0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
ns:0 nr:108 dw:348 dr:4305 al:11 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:192
# PN2 backup node
cat /proc/drbd
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@pn2, 2013-12-06 15:08:20
0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
ns:0 nr:0 dw:256 dr:4065 al:6 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:20512
Note that both the primary and the backup node are in the Primary role.
DRBD Logs
PN1 primary node
Dec 18 17:09:43 pn1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
Dec 18 17:09:43 pn1 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Dec 18 17:09:43 pn1 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Dec 18 17:09:43 pn1 kernel: d-con mysql: conn( WFReportParams -> Disconnecting )
Dec 18 17:09:43 pn1 kernel: d-con mysql: error receiving ReportState, e: -5 l: 0!
Dec 18 17:09:43 pn1 kernel: d-con mysql: meta connection shut down by peer.
Dec 18 17:09:43 pn1 kernel: d-con mysql: asender terminated
Dec 18 17:09:43 pn1 kernel: d-con mysql: Terminating asender thread
Dec 18 17:09:43 pn1 kernel: d-con mysql: Connection closed
Dec 18 17:09:43 pn1 kernel: d-con mysql: conn( Disconnecting -> StandAlone )
Dec 18 17:09:43 pn1 kernel: d-con mysql: receiver terminated
Dec 18 17:09:43 pn1 kernel: d-con mysql: Terminating receiver thread
PN2 backup node
Dec 18 17:09:43 pn2 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
Dec 18 17:09:43 pn2 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Dec 18 17:09:43 pn2 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Dec 18 17:09:43 pn2 kernel: d-con mysql: conn( WFReportParams -> Disconnecting )
Dec 18 17:09:43 pn2 kernel: d-con mysql: error receiving ReportState, e: -5 l: 0!
Dec 18 17:09:43 pn2 kernel: d-con mysql: asender terminated
Dec 18 17:09:43 pn2 kernel: d-con mysql: Terminating asender thread
Dec 18 17:09:43 pn2 kernel: d-con mysql: Connection closed
Dec 18 17:09:43 pn2 kernel: d-con mysql: conn( Disconnecting -> StandAlone )
Dec 18 17:09:43 pn2 kernel: d-con mysql: receiver terminated
Dec 18 17:09:43 pn2 kernel: d-con mysql: Terminating receiver thread
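Both nodes log the same "Split-Brain detected" message, so the event is easy to find after the fact. The following is a minimal sketch, assuming kernel messages go to /var/log/messages as in the tail commands used in this article; split_brain_events is a hypothetical helper name.

```shell
# Hypothetical helper: prints "month day time host" for every split-brain
# detection found in the syslog file given as $1.
split_brain_events() {
    awk '/Split-Brain detected/ { print $1, $2, $3, $4 }' "${1:-/var/log/messages}"
}
```

Run on each node, this lists when DRBD dropped the connection because of split-brain.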
Resolution
Backup Node
Demote the backup node PN2 to the secondary role and reconnect, discarding its local changes:
/etc/init.d/heartbeat stop
drbdadm secondary mysql
drbdadm connect --discard-my-data mysql
Node status
cat /proc/drbd
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@pn2, 2013-12-06 15:08:20
0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
ns:0 nr:0 dw:740 dr:4405 al:10 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:316
Primary Node
Reconnect the primary node PN1:
drbdadm connect mysql
cat /proc/drbd
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@pn1, 2013-12-06 14:48:27
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:20576 nr:0 dw:500 dr:24973 al:12 bm:12 lo:2 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
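After the reconnect, PN2 resynchronizes from PN1, which can take a while on a large device. The following is a minimal sketch of waiting for that to finish, assuming a single DRBD resource as in this setup; drbd_in_sync and wait_for_sync are hypothetical helper names, and the 300-second default timeout is arbitrary.

```shell
# Hypothetical helpers; $status_file defaults to /proc/drbd.

# Succeeds if the status file shows a connected, fully up-to-date pair.
drbd_in_sync() {
    grep -q 'cs:Connected .*ds:UpToDate/UpToDate' "${1:-/proc/drbd}"
}

# Polls once per second for up to $1 seconds (default 300).
# $2 optionally names the status file to read.
wait_for_sync() {
    remaining=${1:-300}
    while [ "$remaining" -gt 0 ]; do
        drbd_in_sync "$2" && return 0
        sleep 1
        remaining=$((remaining - 1))
    done
    return 1
}
```

For example, `wait_for_sync 600 && /etc/init.d/heartbeat start` on PN2 would hold back heartbeat until the data is consistent again.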
Verification
PN1 primary node
drbdadm verify mysql
PN1 primary node log
tail /var/log/messages
Dec 18 18:05:38 pn1 kernel: block drbd0: conn( Connected -> VerifyS )
Dec 18 18:05:38 pn1 kernel: block drbd0: Starting Online Verify from sector 0
Dec 18 18:10:00 pn1 kernel: block drbd0: Online verify done (total 262 sec; paused 0 sec; 40020 K/sec)
Dec 18 18:10:00 pn1 kernel: block drbd0: conn( VerifyS -> Connected )
Dec 18 18:10:00 pn1 kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
Dec 18 18:10:00 pn1 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
PN2 backup node log
tail /var/log/messages
Dec 18 18:05:37 pn2 kernel: block drbd0: conn( Connected -> VerifyT )
Dec 18 18:05:37 pn2 kernel: block drbd0: Online Verify start sector: 0
Dec 18 18:10:00 pn2 kernel: block drbd0: Online verify done (total 262 sec; paused 0 sec; 40020 K/sec)
Dec 18 18:10:00 pn2 kernel: block drbd0: conn( VerifyT -> Connected )
Dec 18 18:10:00 pn2 kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
Dec 18 18:10:00 pn2 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
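The verify run reports its findings as out-of-sync blocks, which also appear in the oos counter of /proc/drbd. The following is a minimal sketch for reading that counter, assuming a single resource; drbd_oos and verify_clean are hypothetical helper names. Online verify also needs a verify-alg configured for the resource, which this article does not show.

```shell
# Hypothetical helpers; $1 optionally names the status file to read.

# Prints the out-of-sync counter (in KiB) of the first device.
drbd_oos() {
    tr ' ' '\n' < "${1:-/proc/drbd}" | sed -n 's/^oos://p' | head -n 1
}

# Succeeds if no blocks are marked out of sync.
verify_clean() {
    [ "$(drbd_oos "$1")" = "0" ]
}
```

If the counter is non-zero after a verify, the DRBD documentation describes resynchronizing the flagged blocks by disconnecting and then reconnecting the resource.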
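Summary
- Stopping the drbd or mysqld service does not trigger a VIP failover.
- Failover happens only when heartbeat stops receiving the peer's heartbeat packets and considers the peer node dead.
- heartbeat releases resources in this order: stop the mysqld service, umount the device, demote drbd to the secondary role, and bring down the vip.
- Stopping the heartbeat service makes the node release its resources automatically, which is useful for routine maintenance.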
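The release order described in the summary can be written out as the equivalent manual commands. This is only a sketch for illustration: heartbeat performs these steps itself through its resource scripts, the init script name mysqld is an assumption, and run and release_resources are hypothetical helper names. By default the sketch only prints the commands instead of executing them.

```shell
# Hypothetical helpers. With DRY_RUN=1 (the default) commands are only printed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Mirrors the order in which heartbeat gives up the resources;
# stops at the first step that fails.
release_resources() {
    run /etc/init.d/mysqld stop &&
    run umount /mysqldata &&
    run drbdadm secondary mysql &&
    run ifconfig eth0:0 down
}
```

Calling release_resources prints the four steps in order; setting DRY_RUN=0 would actually execute them, which should only be done on a node where heartbeat is not managing the resources.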