MooseFS 2.x Destructive Testing


2015-05-18 by dongnan

Environment

Operating system: CentOS 6.6 amd64

Configuration

ChunkServer: physical machine, 4Core CPU/16GB Mem/single 1TB disk/1GB NIC
Master: VM 4Core CPU/4GB Mem/100GB Disk/1GB NIC
Client: VM 4Core CPU/4GB Mem/50GB Disk/1GB NIC

IP

master: 172.27.244.69
client: 172.27.244.99
chunk1: 172.27.244.31
chunk2: 172.27.244.32
chunk3: 172.27.244.33

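Goals

  • Disaster test of a MooseFS cluster.
  • This is a test environment; the client performs no writes.
  • Simulate a power failure (pulling the power plug twice) so that the Master and all ChunkServers fail.

Error messages

After the servers were restarted, the Master could not start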

/etc/init.d/mfsmaster status
mfsmaster is stopped

Error log

tail /var/log/messages
Apr 27 11:32:31 test6 mfsmaster[20600]: set gid to 499
Apr 27 11:32:31 test6 mfsmaster[20600]: set uid to 498
Apr 27 11:32:31 test6 mfsmaster[20600]: monotonic clock function: clock_gettime
Apr 27 11:32:31 test6 mfsmaster[20600]: monotonic clock speed: 5334 ops / 10 mili seconds
Apr 27 11:32:31 test6 mfsmaster[20600]: exports file has been loaded
Apr 27 11:32:31 test6 mfsmaster[20600]: topology file has been loaded
Apr 27 11:32:31 test6 mfsmaster[20600]: can't find metadata.mfs - try using option '-a'
Apr 27 11:32:31 test6 mfsmaster[20600]: init: metadata manager failed !!!
Apr 27 11:32:31 test6 mfsmaster[20600]: exititng ...
Apr 27 11:32:31 test6 mfsmaster[20600]: process exited successfully (status:1)
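
The decisive line in this log is `can't find metadata.mfs - try using option '-a'`. A minimal sketch for spotting it in a syslog file, assuming the log format shown above; the function name `master_needs_restore` is hypothetical and not part of MooseFS:

```shell
# Sketch: detect the "missing metadata.mfs" failure in a syslog file.
# master_needs_restore is a hypothetical helper, not a MooseFS tool.
master_needs_restore() {
    logfile="${1:-/var/log/messages}"
    # -F treats each pattern as a fixed string, not a regular expression.
    grep -F "mfsmaster" "$logfile" | grep -qF "can't find metadata.mfs"
}
```

It could be used as `master_needs_restore /var/log/messages && echo "consider mfsmaster -a"`.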

Metadata directory

cd /var/lib/mfs

# metadata files
ll metadata.mfs.back*
-rw-r----- 1 mfs mfs 44685761 Apr 26 08:00 metadata.mfs.back
-rw-r----- 1 mfs mfs 44685761 Apr 26 07:00 metadata.mfs.back.1
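
The listing shows only `metadata.mfs.back*` files and no `metadata.mfs`, which matches the error in the log. A small sketch of that check, assuming the directory layout shown here; `check_mfs_metadata` is a hypothetical helper and its messages are illustrative:

```shell
# Sketch: report which metadata file exists in an MFS data directory.
# check_mfs_metadata is a hypothetical helper, not a MooseFS command.
check_mfs_metadata() {
    dir="${1:-/var/lib/mfs}"
    if [ -f "$dir/metadata.mfs" ]; then
        echo "metadata.mfs found: a normal start should work"
    elif [ -f "$dir/metadata.mfs.back" ]; then
        echo "only metadata.mfs.back found: try 'mfsmaster -a'"
    else
        echo "no metadata file found in $dir"
        return 1
    fi
}
```

For the directory above, `check_mfs_metadata /var/lib/mfs` would take the second branch.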

Solution

Back up the data

cp -r /var/lib/mfs/ /root/mfs-backup
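
Since the Master is stopped at this point, a plain copy of the directory is consistent. A slightly more defensive sketch, assuming GNU or BSD `cp` and `diff`: it writes to a timestamped directory and verifies the copy before any recovery is attempted. The helper name `backup_mfs_dir` and the backup naming scheme are hypothetical:

```shell
# Sketch: copy the metadata directory to a timestamped backup and verify it.
# backup_mfs_dir is a hypothetical helper; both paths are given by the caller.
backup_mfs_dir() {
    src="${1%/}"          # strip a trailing slash, if any
    dest_root="$2"
    dest="$dest_root/mfs-backup-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest_root" || return 1
    # -a preserves ownership, permissions and timestamps.
    cp -a "$src" "$dest" || return 1
    # diff -r exits non-zero if any file differs or is missing.
    diff -r "$src" "$dest" >/dev/null || return 1
    # Print the backup location for the caller.
    echo "$dest"
}
```

For example, `backup_mfs_dir /var/lib/mfs /root` prints the path of the new backup directory on success.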

Attempt recovery

# run the command
/usr/sbin/mfsmaster -a

open files limit has been set to: 4096
working directory: /var/lib/mfs # note
lockfile created and locked
initializing mfsmaster modules ...
exports file has been loaded
topology file has been loaded
loading metadata ...            # note
loading sessions data ... ok (0.0000)
loading objects (files,directories,etc.) ... ok (0.6090)
loading names ... ok (0.7774)
loading deletion timestamps ... ok (0.0005)
loading quota definitions ... ok (0.0000)
loading xattr data ... ok (0.0000)
loading posix_acl data ... ok (0.0000)
loading open files data ... ok (0.0000)
loading chunkservers data ... ok (0.0001)
loading chunks data ... ok (0.4473)
checking filesystem consistency ... ok
connecting files and chunks ... ok
all inodes: 485719
directory inodes: 494
file inodes: 485225
chunks: 485460
metadata file has been loaded   # note
stats file has been loaded
master <-> metaloggers module: listen on *:9419
master <-> chunkservers module: listen on *:9420
main master server module: listen on *:9421
mfsmaster daemon initialized properly

Master status

/etc/init.d/mfsmaster status
mfsmaster (pid 21404) is running...

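Metadata directory

The most important part of maintaining MFS is maintaining the metadata server, and the most important directory on the metadata server is /var/lib/mfs (the path may differ depending on the installation). Every change to the data stored in MFS, such as writes, modifications, and updates, is recorded in a file in this directory, so keeping this directory safe is the basis for the safety and reliability of the whole MFS file system.

The data under /var/lib/mfs consists of two parts:

  • One part is the metadata server's change logs, with file names like changelog.*.mfs
  • The other part is the metadata file metadata.mfs; while mfsmaster is running, this file is named metadata.mfs.back

As long as both parts are kept safe, the master server can be redeployed from the backed-up metadata files even if the metadata server is fatally damaged.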
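
The two parts described above can be collected into a single archive. A minimal sketch, assuming the file names listed above and a `tar` with gzip support; `archive_mfs_metadata` is a hypothetical helper, and the output path must be absolute because the function changes directory:

```shell
# Sketch: archive metadata and changelog files from an MFS data directory.
# archive_mfs_metadata is a hypothetical helper; adjust paths to your install.
archive_mfs_metadata() {
    dir="$1"
    out="$2"   # absolute path of the archive to create
    (
        cd "$dir" || exit 1
        # Collect only files that exist, so an unmatched glob is skipped.
        set --
        for f in metadata.mfs metadata.mfs.back* changelog.*.mfs; do
            if [ -e "$f" ]; then
                set -- "$@" "$f"
            fi
        done
        # Fail if neither metadata nor changelog files were found.
        [ "$#" -gt 0 ] || exit 1
        tar -czf "$out" "$@"
    )
}
```

A call such as `archive_mfs_metadata /var/lib/mfs /root/mfs-meta.tar.gz` could be run from cron. Note that copying files while the Master is running only gives a point-in-time snapshot; this is an illustration of what needs to be saved, not a replacement for a proper replication setup.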

Verification

The MFS cluster has recovered

Command help

/usr/sbin/mfsmaster -h
usage: /usr/sbin/mfsmaster [-vfun] [-t locktimeout] [-c cfgfile] [-i] [-a] [-e] [-x [-x]] [start|stop|restart|reload|info|test|kill]

-v : print version number and exit
-f : run in foreground
-u : log undefined config variables
-n : do not attempt to increase limit of core dump size
-t locktimeout : how long wait for lockfile
-c cfgfile : use given config file
-i : ignore some metadata structure errors (attach orphans to root, ignore names without inode, etc.)
-a : automatically restore metadata from change logs    # note
-e : start without metadata (download all from leader)
-x : produce more verbose output
-xx : even more verbose output

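Summary

  • The data could be recovered only because neither the Master metadata nor the ChunkServer data was lost; this includes files whose number of copies was set to 1.
  • In other words, keeping the Master metadata safe (e.g. with DRBD) and keeping enough copies (3 copies) is what protects the data.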
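
The number of copies is controlled per file or directory with the MooseFS client tools `mfssetgoal` and `mfsgetgoal`. A hedged sketch of raising it to 3 recursively: the wrapper `set_goal_sketch`, its `DRY_RUN` switch, and the mount point `/mnt/mfs` are assumptions for illustration, not taken from this test setup.

```shell
# Sketch: set the replication goal recursively on an MFS mount.
# set_goal_sketch and DRY_RUN are hypothetical; mfssetgoal is the MooseFS
# client tool. With DRY_RUN=1 (the default) the command is only printed.
set_goal_sketch() {
    goal="$1"
    path="$2"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "mfssetgoal -r $goal $path"
    else
        mfssetgoal -r "$goal" "$path"
    fi
}
```

Running `set_goal_sketch 3 /mnt/mfs` prints the command that would be executed; with `DRY_RUN=0` it is run for real, after which `mfsgetgoal /mnt/mfs` can be used to check the result.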