Kubernetes Cluster: Data Backup
2020-05-03 by dongnan
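Environment
In the previous three articles, a K8S cluster environment was deployed with Kubeadm:
The test K8S cluster consists of one Master (management) node and two Worker (compute) nodes.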
Goal
Back up the important data on the K8S cluster's Master.
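Backup plan
Because the K8S cluster is deployed on Alibaba Cloud, the plan combines ECS snapshots (full backup) with an incremental backup of ETCD and other core data.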
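An ECS snapshot is taken every 8 hours, which gives 3 snapshots per day, and each snapshot is kept for 7 days:
- To restore, roll back to the most recent ECS snapshot.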
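The snapshot schedule implies a fixed number of retained snapshots. A minimal sketch of that arithmetic, using only the figures stated here (8-hour interval, 7-day retention); the variable names are illustrative:

```python
# Snapshot schedule from the plan: one ECS snapshot every 8 hours, kept 7 days.
SNAPSHOT_INTERVAL_HOURS = 8
RETENTION_DAYS = 7

snapshots_per_day = 24 // SNAPSHOT_INTERVAL_HOURS
retained_snapshots = snapshots_per_day * RETENTION_DAYS

print(snapshots_per_day)    # 3
print(retained_snapshots)   # 21
```

A restore from snapshots alone can therefore be up to 8 hours out of date.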
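The K8S cluster's core data is stored in ETCD. The backup program syncs it every 5 minutes, and the synced data is kept on a second disk:
- To restore, copy the backed-up data back to the corresponding directories.
- The second disk covers the extreme case where the main disk's snapshots are unusable: the cluster is rebuilt with kubeadm and the data is restored from the backup.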
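The "copy the data back" step could look roughly like the sketch below. It only builds the rsync commands and does not run them. The backup root `/data/backup/k8s` and the `component/volume` layout are assumptions based on the backup script in this article, and `restore_commands` is a hypothetical helper name.

```python
import os

BACKUP_ROOT = "/data/backup/k8s"  # assumed backup root on the second disk

# component -> {volume name: original directory}; a subset for illustration
K8S_DIRS = {
    "etcd": {"etcd-data": "/var/lib/etcd/"},
    "scheduler": {"kubeconfig": "/etc/kubernetes/"},
}


def restore_commands(dirs, backup_root=BACKUP_ROOT):
    """Return rsync argument lists that copy each backup back to its source."""
    commands = []
    for component, volumes in dirs.items():
        for name, target in volumes.items():
            # Trailing slash: copy the directory's contents, not the directory.
            source = os.path.join(backup_root, component, name) + "/"
            commands.append(["rsync", "-a", source, target])
    return commands


for cmd in restore_commands(K8S_DIRS):
    print(" ".join(cmd))
```

The commands are printed rather than executed because a real restore would presumably need the etcd Pod stopped before its data directory is overwritten.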
Even with full system snapshots plus incremental core-data backups, this plan still has a 5-minute blind spot: up to 5 minutes of data may be lost.
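To make the blind spot concrete, the sketch below computes how much data would be missing for a given failure time. It assumes syncs start on a fixed 5-minute grid (minute 0, 5, 10, ...) and that the interval divides 60; the failure time and the function name are hypothetical.

```python
from datetime import datetime


def last_sync_before(failure, interval_minutes=5):
    """Latest sync start at or before `failure`, on a fixed minute grid."""
    minute = failure.minute - failure.minute % interval_minutes
    return failure.replace(minute=minute, second=0, microsecond=0)


failure = datetime(2020, 4, 20, 10, 58, 30)  # hypothetical failure time
synced = last_sync_before(failure)

print(synced)            # 2020-04-20 10:55:00
print(failure - synced)  # 0:03:30
```

The loss is always below one full interval, which is where the 5-minute bound comes from.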
Target data
The Master's components run in the kube-system namespace:
kubectl -n kube-system get pod
Three of the system Pods have data volumes that need to be backed up:
etcd-khost0
kube-apiserver-khost0
kube-scheduler-khost0
The others, calico-node-xxx and kube-proxy-xxx, are Pods that run on every worker node and hold no important data.
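Which Pods mount host directories can also be read from the JSON form of the Pod list instead of running `describe` on each Pod. A minimal sketch, assuming input in the shape of `kubectl -n kube-system get pod -o json`; the sample below is hand-written and trimmed, and `example-pod` is a made-up name:

```python
def host_paths(pod_list):
    """Map each Pod name to the hostPath directories it mounts."""
    result = {}
    for pod in pod_list.get("items", []):
        paths = [
            vol["hostPath"]["path"]
            for vol in pod["spec"].get("volumes", [])
            if "hostPath" in vol
        ]
        if paths:
            result[pod["metadata"]["name"]] = paths
    return result


# Hand-written sample, not real cluster output.
sample = {
    "items": [
        {
            "metadata": {"name": "etcd-khost0"},
            "spec": {"volumes": [
                {"name": "etcd-certs",
                 "hostPath": {"path": "/etc/kubernetes/pki/etcd"}},
                {"name": "etcd-data",
                 "hostPath": {"path": "/var/lib/etcd"}},
            ]},
        },
        {
            "metadata": {"name": "example-pod"},  # hypothetical Pod
            "spec": {"volumes": [
                {"name": "cfg", "configMap": {"name": "example"}},
            ]},
        },
    ]
}

print(host_paths(sample))
```

Pods without any hostPath volume are left out of the result.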
etcd
The Master's ETCD Pod has two data volumes:
kubectl -n kube-system describe pod etcd-khost0
Name:       etcd-khost0
Namespace:  kube-system
#... (omitted)
Volumes:
  etcd-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/pki/etcd
    HostPathType:  DirectoryOrCreate
  etcd-data:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/etcd
    HostPathType:  DirectoryOrCreate
- Certificates are stored in /etc/kubernetes/pki/etcd
- Data is stored in /var/lib/etcd
Data directory
du -sh /var/lib/etcd/member/*
131M /var/lib/etcd/member/snap
367M /var/lib/etcd/member/wal
ll /var/lib/etcd/member/snap/
total 133468
drwx------ 2 root root 4096 Apr 26 14:21 ./
drwx------ 4 root root 4096 Apr 3 13:00 ../
-rw-r--r-- 1 root root 7135 Apr 26 10:41 0000000000000002-00000000005af394.snap
-rw-r--r-- 1 root root 7135 Apr 26 11:36 0000000000000002-00000000005b1aa5.snap
-rw-r--r-- 1 root root 7135 Apr 26 12:31 0000000000000002-00000000005b41b6.snap
-rw-r--r-- 1 root root 7135 Apr 26 13:25 0000000000000002-00000000005b68c8.snap
-rw-r--r-- 1 root root 7135 Apr 26 14:20 0000000000000002-00000000005b8fd9.snap
-rw------- 1 root root 136597504 Apr 26 14:29 db
ll /var/lib/etcd/member/wal/
total 375024
drwx------ 2 root root 4096 Apr 26 14:18 ./
drwx------ 4 root root 4096 Apr 3 13:00 ../
-rw------- 1 root root 64000648 Apr 25 06:18 000000000000002f-00000000005477dc.wal
-rw------- 1 root root 64000888 Apr 25 17:00 0000000000000030-0000000000563e46.wal
-rw------- 1 root root 64000968 Apr 26 03:42 0000000000000031-00000000005804a2.wal
-rw------- 1 root root 64002696 Apr 26 14:18 0000000000000032-000000000059cae5.wal
-rw------- 1 root root 64000000 Apr 26 14:29 0000000000000033-00000000005b8e21.wal
-rw------- 1 root root 64000000 Apr 26 14:18 1.tmp
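Note the time range that ETCD retains: the listings above hold only the five most recent .snap files (about four hours) and the five most recent .wal files (about a day and a half).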
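The file names themselves show how far apart the retained snapshots are. The sketch below assumes that a .snap name consists of two 16-digit hexadecimal fields, the Raft term and the Raft index; that naming is an assumption about etcd's on-disk format and is not stated in this article. `parse_etcd_name` is a hypothetical helper.

```python
def parse_etcd_name(filename):
    """Split 'TERM-INDEX.snap' into two integers parsed as hexadecimal."""
    stem = filename.rsplit(".", 1)[0]
    first, second = stem.split("-")
    return int(first, 16), int(second, 16)


# File names copied from the listing of /var/lib/etcd/member/snap/.
snap_files = [
    "0000000000000002-00000000005af394.snap",
    "0000000000000002-00000000005b1aa5.snap",
    "0000000000000002-00000000005b41b6.snap",
    "0000000000000002-00000000005b68c8.snap",
    "0000000000000002-00000000005b8fd9.snap",
]

indexes = [parse_etcd_name(name)[1] for name in snap_files]
gaps = [later - earlier for earlier, later in zip(indexes, indexes[1:])]
print(gaps)  # [10001, 10001, 10002, 10001]
```

Under that assumption, a snapshot is written roughly every 10,000 entries, which on this cluster is a little under an hour apart.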
Certificate directory
The certificate directory holds the key certificates that the K8S cluster relies on:
ll /etc/kubernetes/pki/etcd/
total 40
drwxr-xr-x 2 root root 4096 Apr 3 12:59 ./
drwxr-xr-x 3 root root 4096 Apr 3 12:59 ../
-rw-r--r-- 1 root root 1017 Apr 3 12:59 ca.crt
-rw------- 1 root root 1675 Apr 3 12:59 ca.key
-rw-r--r-- 1 root root 1094 Apr 3 12:59 healthcheck-client.crt
-rw------- 1 root root 1675 Apr 3 12:59 healthcheck-client.key
-rw-r--r-- 1 root root 1127 Apr 3 12:59 peer.crt
-rw------- 1 root root 1679 Apr 3 12:59 peer.key
-rw-r--r-- 1 root root 1127 Apr 3 12:59 server.crt
-rw------- 1 root root 1679 Apr 3 12:59 server.key
apiserver
The apiserver has five data volumes:
kubectl -n kube-system describe pod kube-apiserver-khost0
Name:       kube-apiserver-khost0
Namespace:  kube-system
#... (omitted)
Volumes:
  ca-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/ssl/certs
    HostPathType:  DirectoryOrCreate
  etc-ca-certificates:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/ca-certificates
    HostPathType:  DirectoryOrCreate
  k8s-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/pki
    HostPathType:  DirectoryOrCreate
  usr-local-share-ca-certificates:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/local/share/ca-certificates
    HostPathType:  DirectoryOrCreate
  usr-share-ca-certificates:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/share/ca-certificates
    HostPathType:  DirectoryOrCreate
Note: to avoid backing up the same data twice, keep in mind that /etc/kubernetes/pki/etcd/, the etcd certificate directory listed earlier, is a subdirectory of /etc/kubernetes/pki.
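Nested directories like this can be filtered out automatically before the backup list is built. A minimal sketch, assuming all paths are absolute; `drop_nested` is a hypothetical helper, not part of the backup script:

```python
import os


def drop_nested(paths):
    """Remove every path that lies inside another path in the list."""
    # A parent is a string prefix of its children, so it sorts before them.
    normalized = sorted({os.path.normpath(p) for p in paths})
    kept = []
    for path in normalized:
        inside_kept = any(
            os.path.commonpath([path, parent]) == parent for parent in kept
        )
        if not inside_kept:
            kept.append(path)
    return kept


candidates = [
    "/etc/kubernetes/pki/etcd/",
    "/etc/kubernetes/pki",
    "/var/lib/etcd/",
]
print(drop_nested(candidates))  # ['/etc/kubernetes/pki', '/var/lib/etcd']
```

`os.path.commonpath` compares whole path components, so a sibling such as /etc/kubernetes/pki-old would not be mistaken for a child of /etc/kubernetes/pki.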
scheduler
The scheduler has only one data volume, a mapped configuration file:
kubectl -n kube-system describe pod kube-scheduler-khost0
Name:       kube-scheduler-khost0
Namespace:  kube-system
#... (omitted)
Volumes:
  kubeconfig:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/scheduler.conf
    HostPathType:  FileOrCreate
Backup program
A script written in Python does the backup work.
Script
The code is as follows:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
"""
# backup k8s meta data.
# k8s-etcd
etcd = {
    # "etcdCerts": '/etc/kubernetes/pki/etcd/',  # duplicate
    "etcd": '/var/lib/etcd/'
}
# k8s-api-server
apiServer = {
    "ca-certs": '/etc/ssl/certs/',
    "etc-ca-certificates": '/etc/ca-certificates/',
    # "k8s-certs": '/etc/kubernetes/pki/',  # duplicate
    "usr-local-share-ca-certificates": '/usr/local/share/ca-certificates/',
    "usr-share-ca-certificates": '/usr/share/ca-certificates/'
}
# k8s-scheduler
scheduler = {
    # "kubeconfig": '/etc/kubernetes/scheduler.conf'
    "kubeconfig": '/etc/kubernetes/'
}
# rancher
rancher = {
    "rancher": '/var/lib/docker/volumes/rancher-data/'
}
from collections import ChainMap
item = ChainMap(etcd, apiServer, scheduler, rancher)
"""
import datetime
import os
import subprocess


def sync_k8s_meta_data(dirs):
    """Mirror every source directory in dirs into backup_dir with rsync."""
    log_date = datetime.datetime.today().strftime("%Y-%m-%d %H:%M:%S")
    cmd_dir = '/usr/bin/rsync'
    backup_dir = '/data/backup/k8s/'  # matches the paths shown in cron.log
    print(log_date)
    for k0, v0 in dirs.items():
        # One sub-directory per component, e.g. /data/backup/k8s/etcd
        dir0 = os.path.join(backup_dir, k0)
        os.makedirs(dir0, exist_ok=True)
        print('%s:' % k0)
        for k1, v1 in v0.items():
            dir1 = os.path.join(dir0, k1)
            if not os.path.exists(v1):
                print("%s not found !!!" % v1)
            else:
                # The logged line says -av; rsync itself runs with -a only,
                # so no per-file output ends up in the log.
                print("%s -av --delete %s %s/" % (cmd_dir, v1, dir1))
                # --delete removes files from the backup that no longer
                # exist in the source directory.
                subprocess.run([cmd_dir, '-a', '--delete', v1, dir1 + '/'])


if __name__ == '__main__':
    # component -> {volume name: source directory}
    k8s_dirs = {
        'etcd': {'etcd-data': '/var/lib/etcd/'},
        # /etc/kubernetes/ also covers pki/ and pki/etcd/
        'scheduler': {'kubeconfig': '/etc/kubernetes/'},
        'rancher': {'rancher-data': '/var/lib/docker/volumes/rancher-data/'},
        'apiserver': {
            'usr-share-ca-certificates': '/usr/share/ca-certificates/',
            'usr-local-share-ca-certificates': '/usr/local/share/ca-certificates/',
            'etc-ca-certificates': '/etc/ca-certificates/',
            'ca-certs': '/etc/ssl/certs/'},
    }
    sync_k8s_meta_data(k8s_dirs)
Note that the backup script depends on rsync. If it is not installed, install it first, for example:
apt install rsync
Usage
For example:
python3 /root/shell/k8s.py
Scheduled task, run every 5 minutes:
# Make the script executable
chmod +x /root/shell/k8s.py
crontab -l | tail -n 1
*/5 * * * * /root/shell/k8s.py >> /root/shell/cron.log 2>&1
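The `*/5` in the minute field means "every value from 0 that is a multiple of 5". A small sketch that expands it, handling only the `*/N` form; `expand_step` is a hypothetical helper and not a full cron parser:

```python
def expand_step(field, low=0, high=59):
    """Expand a cron field of the form '*/N' into the matching values."""
    if not field.startswith("*/"):
        raise ValueError("only the '*/N' form is handled in this sketch")
    step = int(field[2:])
    return list(range(low, high + 1, step))


minutes = expand_step("*/5")
print(minutes[0], minutes[-1])  # 0 55
print(len(minutes) * 24)        # 288
```

With this schedule the script runs 12 times per hour, or 288 times per day.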
Verification
Check the log:
head /root/shell/cron.log
2020-04-20 10:55:01
etcd:
/usr/bin/rsync -av --delete /var/lib/etcd/ /data/backup/k8s/etcd/etcd-data/
scheduler:
/usr/bin/rsync -av --delete /etc/kubernetes/ /data/backup/k8s/scheduler/kubeconfig/
rancher:
/usr/bin/rsync -av --delete /var/lib/docker/volumes/rancher-data/ /data/backup/k8s/rancher/rancher-data/
apiserver:
/usr/bin/rsync -av --delete /usr/share/ca-certificates/ /data/backup/k8s/apiserver/usr-share-ca-certificates/
/usr/bin/rsync -av --delete /usr/local/share/ca-certificates/ /data/backup/k8s/apiserver/usr-local-share-ca-certificates/
/usr/bin/rsync -av --delete /etc/ca-certificates/ /data/backup/k8s/apiserver/etc-ca-certificates/
/usr/bin/rsync -av --delete /etc/ssl/certs/ /data/backup/k8s/apiserver/ca-certs/
Check the data:
# Only two files differ within the backup window
diff -r etcd/etcd-data/ /var/lib/etcd/
Binary files etcd/etcd-data/member/snap/db and /var/lib/etcd/member/snap/db differ
Binary files etcd/etcd-data/member/wal/00000000000005d3-000000000a0ae8ee.wal and /var/lib/etcd/member/wal/00000000000005d3-000000000a0ae8ee.wal differ
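The same check can be scripted, for example to alert when more files differ than expected. A minimal sketch using the standard library; `differing_files` is a hypothetical helper, and it only reports files present in the backup, so files that exist only in the live directory are not listed:

```python
import filecmp
import os


def differing_files(backup_dir, live_dir):
    """Return relative paths that differ from, or are missing in, live_dir."""
    differing = []
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            backup_path = os.path.join(root, name)
            rel = os.path.relpath(backup_path, backup_dir)
            live_path = os.path.join(live_dir, rel)
            # shallow=False compares file contents, not just size and mtime.
            if not os.path.isfile(live_path) or not filecmp.cmp(
                backup_path, live_path, shallow=False
            ):
                differing.append(rel)
    return sorted(differing)
```

Calling `differing_files("etcd/etcd-data", "/var/lib/etcd")` on the Master would be expected to list the same files that `diff -r` reports.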
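Summary
Operating a K8S cluster in practice involves backups as well as removing, adding, and updating nodes. The next article covers how to remove a node from the cluster.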