
DRBD Failure Testing and Split-Brain Handling


2014-06-27 by dongnan

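Environment

See "Using DRBD and Heartbeat for MySQL High Availability" for the setup.

Failure testing

Simulate a failure of the primary node pn1 and check whether the backup node pn2 takes over: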

sync && init 6

Test results

The backup node pn2 successfully took over the VIP, DRBD, and MySQL.

vip

ifconfig eth0:0
eth0:0    Link encap:Ethernet  HWaddr 00:50:56:9C:00:0D
inet addr:172.27.233.48  Bcast:172.27.233.255  Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

drbd

mount | tail -n1
/dev/drbd0 on /mysqldata type ext4 (rw)

mysql

mysqladmin ping
mysqld is alive
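
When the failover test is repeated many times, the checks above can be wrapped in small helpers. The following is only a sketch: the function names `has_vip` and `drbd_mounted` are invented for illustration, and the parsing assumes the `ifconfig` and `mount` output formats shown above.

```shell
#!/bin/sh
# Hypothetical helpers; both read command output on stdin so they can be
# tried against saved output as well as on a live node.

# Succeeds if the given address appears in ifconfig-style output.
has_vip() {
    # The trailing space keeps 172.27.233.4 from matching 172.27.233.48.
    grep -Fq "inet addr:$1 "
}

# Succeeds if the given device is mounted on the given mount point.
drbd_mounted() {
    dev=$1
    mnt=$2
    # mount prints "<device> on <mountpoint> type <fstype> (<options>)"
    awk -v dev="$dev" -v mnt="$mnt" \
        '$1 == dev && $2 == "on" && $3 == mnt { found = 1 } END { exit !found }'
}

# Example usage on the node that should now hold the resources:
#   ifconfig eth0:0 | has_vip 172.27.233.48      && echo "vip: ok"
#   mount | drbd_mounted /dev/drbd0 /mysqldata   && echo "drbd: ok"
#   mysqladmin ping >/dev/null 2>&1              && echo "mysql: ok"
```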

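Test procedure

  • Test 1: shut down the PN1 primary node. vip/drbd/mysqld switched to PN2 automatically; after PN1 rebooted, its DRBD state became secondary.
  • Test 2: shut down the PN2 backup node. vip/drbd/mysqld switched to PN1 automatically; after PN2 rebooted, its DRBD state became secondary.
  • Test 3: stop the mysqld service on the PN1 primary node. No failover was triggered.
  • Test 4: stop the drbd service on the PN1 primary node. No failover was triggered.
  • Test 5: stop the heartbeat service on the PN1 primary node. Resources switched to the PN2 backup node automatically.
  • Test 6: pull the power cord on the PN2 backup node. Resources switched to the PN1 primary node automatically.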

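DRBD split-brain

When does DRBD split-brain occur?

Split-brain occurs when both DRBD nodes are in the Primary role at the same time, typically because each was promoted while the connection between them was down.

What can cause split-brain?

  • The heartbeat link fails, so heartbeat concludes that the peer node is dead and switches the local DRBD role to Primary. When the link recovers, both DRBD nodes are Primary and DRBD is in split-brain.
  • An operator mistakenly sets both nodes to the Primary role.

Node maintenance

Routine maintenance

Stop the heartbeat service; if the node is Primary, it releases its resources automatically. Start the heartbeat service again once maintenance is finished.

Full-cluster maintenance

  • Shut down the Secondary node first, then the Primary node.
  • After maintenance, the nodes can be started in any order; the host defined in haresources ends up as Primary.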

Test

DRBD status

# PN1 (primary node)
cat /proc/drbd

version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@pn1, 2013-12-06 14:48:27
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:360 nr:80 dw:440 dr:22957 al:6 bm:4 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

# PN2 (backup node)
cat /proc/drbd

version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@pn2, 2013-12-06 15:08:20
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:80 nr:408 dw:408 dr:80 al:0 bm:4 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
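
For scripted checks, the three fields that matter most here (connection state `cs`, roles `ro`, disk states `ds`) can be pulled out of the status line. This is a minimal sketch that assumes the DRBD 8.4 `/proc/drbd` layout shown above; `drbd_state` is a name made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print "cs ro ds" for one minor number.
# Reads /proc/drbd-style text on stdin; the minor defaults to 0.
drbd_state() {
    minor=${1:-0}
    awk -v m="$minor:" '
        $1 == m {
            cs = ""; ro = ""; ds = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^cs:/) cs = substr($i, 4)
                if ($i ~ /^ro:/) ro = substr($i, 4)
                if ($i ~ /^ds:/) ds = substr($i, 4)
            }
            print cs, ro, ds
        }'
}

# Example usage:
#   drbd_state 0 < /proc/drbd
```

Each `ro` and `ds` value is printed as local/peer, matching the order used in `/proc/drbd`.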

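Simulating DRBD split-brain

Unplug the network cable on PN1 to simulate a heartbeat link failure, then plug it back in.

Split-brain state

# PN1 (primary node)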
cat /proc/drbd

version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@pn1, 2013-12-06 14:48:27
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:108 dw:348 dr:4305 al:11 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:192

# PN2 (backup node)
cat /proc/drbd

version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@pn2, 2013-12-06 15:08:20
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:256 dr:4065 al:6 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:20512

Note that both the primary and the backup node are now in the Primary role.
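
A monitoring script could flag this condition from `/proc/drbd`. The sketch below treats "local role Primary, peer Unknown, connection StandAlone or WFConnection" as suspicious. That rule is an assumption made for this example: it also matches a primary whose peer is simply powered off, so the kernel log still has to confirm an actual split-brain. The function name `drbd_disconnected_primary` is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeeds if any DRBD minor is a Primary that has
# lost its peer. Reads /proc/drbd-style text on stdin.
drbd_disconnected_primary() {
    awk '
        $1 ~ /^[0-9]+:$/ {
            cs = ""; ro = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^cs:/) cs = substr($i, 4)
                if ($i ~ /^ro:/) ro = substr($i, 4)
            }
            if (ro == "Primary/Unknown" && (cs == "StandAlone" || cs == "WFConnection"))
                hit = 1
        }
        END { exit !hit }'
}

# Example usage:
#   drbd_disconnected_primary < /proc/drbd && echo "WARNING: primary without peer"
```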

DRBD logs

PN1 (primary node)

Dec 18 17:09:43 pn1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
Dec 18 17:09:43 pn1 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Dec 18 17:09:43 pn1 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Dec 18 17:09:43 pn1 kernel: d-con mysql: conn( WFReportParams -> Disconnecting )
Dec 18 17:09:43 pn1 kernel: d-con mysql: error receiving ReportState, e: -5 l: 0!
Dec 18 17:09:43 pn1 kernel: d-con mysql: meta connection shut down by peer.
Dec 18 17:09:43 pn1 kernel: d-con mysql: asender terminated
Dec 18 17:09:43 pn1 kernel: d-con mysql: Terminating asender thread
Dec 18 17:09:43 pn1 kernel: d-con mysql: Connection closed
Dec 18 17:09:43 pn1 kernel: d-con mysql: conn( Disconnecting -> StandAlone )
Dec 18 17:09:43 pn1 kernel: d-con mysql: receiver terminated
Dec 18 17:09:43 pn1 kernel: d-con mysql: Terminating receiver thread

PN2 (backup node)

Dec 18 17:09:43 pn2 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
Dec 18 17:09:43 pn2 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Dec 18 17:09:43 pn2 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Dec 18 17:09:43 pn2 kernel: d-con mysql: conn( WFReportParams -> Disconnecting )
Dec 18 17:09:43 pn2 kernel: d-con mysql: error receiving ReportState, e: -5 l: 0!
Dec 18 17:09:43 pn2 kernel: d-con mysql: asender terminated
Dec 18 17:09:43 pn2 kernel: d-con mysql: Terminating asender thread
Dec 18 17:09:43 pn2 kernel: d-con mysql: Connection closed
Dec 18 17:09:43 pn2 kernel: d-con mysql: conn( Disconnecting -> StandAlone )
Dec 18 17:09:43 pn2 kernel: d-con mysql: receiver terminated
Dec 18 17:09:43 pn2 kernel: d-con mysql: Terminating receiver thread
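
The "Split-Brain detected" line is the one worth alerting on. As a rough sketch, the helper below extracts the timestamp, host, and device of each such event. It assumes the syslog layout of the excerpts above; `split_brain_events` is a name invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: list split-brain events found in syslog text on stdin.
# Expected fields: month day time host "kernel:" "block" "<device>:" message...
split_brain_events() {
    awk '/Split-Brain detected/ {
        dev = $7
        sub(/:$/, "", dev)      # drop the trailing colon from "drbd0:"
        print $1, $2, $3, $4, dev
    }'
}

# Example usage:
#   split_brain_events < /var/log/messages
```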

Resolution

Backup node

Force the PN2 backup node down to the Secondary role and discard its local changes:

/etc/init.d/heartbeat stop
drbdadm secondary mysql
drbdadm connect --discard-my-data mysql

Node status

cat /proc/drbd

version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@pn2, 2013-12-06 15:08:20
 0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:0 dw:740 dr:4405 al:10 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:316

Primary node

Reconnect from the PN1 primary node:

drbdadm connect mysql
cat /proc/drbd

version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@pn1, 2013-12-06 14:48:27
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:20576 nr:0 dw:500 dr:24973 al:12 bm:12 lo:2 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
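
The recovery steps above are easy to mix up under pressure, and `--discard-my-data` throws away every change made on the node where it is run. One cautious approach is to print the commands for review before running anything. The sketch below does only that; the function name `split_brain_plan` and the role labels `victim` and `survivor` are invented for this example, and the commands are the ones used in this article.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the recovery commands, runs nothing.
#   victim   = node whose changes are discarded (PN2 in this article)
#   survivor = node whose data is kept (PN1 in this article)
split_brain_plan() {
    role=$1
    res=$2
    case "$role" in
        victim)
            # Stopping heartbeat first releases the mount, so the
            # demotion to Secondary can succeed.
            echo "/etc/init.d/heartbeat stop"
            echo "drbdadm secondary $res"
            echo "drbdadm connect --discard-my-data $res"
            ;;
        survivor)
            echo "drbdadm connect $res"
            ;;
        *)
            echo "usage: split_brain_plan victim|survivor RESOURCE" >&2
            return 2
            ;;
    esac
}

# Example usage (review the output before piping it to sh):
#   split_brain_plan victim mysql      # on PN2
#   split_brain_plan survivor mysql    # on PN1
```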

Verification

PN1 (primary node)

drbdadm verify mysql

PN1 primary node log

tail /var/log/messages

Dec 18 18:05:38 pn1 kernel: block drbd0: conn( Connected -> VerifyS )
Dec 18 18:05:38 pn1 kernel: block drbd0: Starting Online Verify from sector 0
Dec 18 18:10:00 pn1 kernel: block drbd0: Online verify  done (total 262 sec; paused 0 sec; 40020 K/sec)
Dec 18 18:10:00 pn1 kernel: block drbd0: conn( VerifyS -> Connected )
Dec 18 18:10:00 pn1 kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
Dec 18 18:10:00 pn1 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.

PN2 backup node log

tail /var/log/messages

Dec 18 18:05:37 pn2 kernel: block drbd0: conn( Connected -> VerifyT )
Dec 18 18:05:37 pn2 kernel: block drbd0: Online Verify start sector: 0
Dec 18 18:10:00 pn2 kernel: block drbd0: Online verify  done (total 262 sec; paused 0 sec; 40020 K/sec)
Dec 18 18:10:00 pn2 kernel: block drbd0: conn( VerifyT -> Connected )
Dec 18 18:10:00 pn2 kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
Dec 18 18:10:00 pn2 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
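
Besides reading the log, the `oos` counter in `/proc/drbd` can be checked after the verify run; it should be 0 on both nodes, as in the Connected status shown earlier. The helper below is a small sketch that assumes the 8.4 `/proc/drbd` layout, where `oos` is the amount of out-of-sync data in KiB. The name `drbd_oos_kib` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the oos value (KiB out of sync) of every
# statistics line in /proc/drbd-style text read from stdin.
drbd_oos_kib() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^oos:[0-9]+$/) print substr($i, 5)
    }'
}

# Example usage:
#   [ "$(drbd_oos_kib < /proc/drbd)" = "0" ] && echo "in sync"
```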

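Summary

  • Stopping the drbd or mysqld service does not trigger a VIP failover.
  • Failover happens only when heartbeat stops receiving the peer's heartbeat packets and considers the peer node dead.
  • When heartbeat releases resources, it stops the mysqld service, unmounts the device, demotes DRBD to the Secondary role, and brings down the VIP.
  • Stopping the heartbeat service makes the node release its resources automatically, which is useful for routine maintenance.

References

Using DRBD and Heartbeat for MySQL High Availability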
