Kubernetes Cluster: Data Backup


2020-05-03 by dongnan

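Environment

In the previous three articles, a K8S cluster environment was deployed with kubeadm.

The test K8S cluster consists of one Master (management) node and two Worker (compute) nodes.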

Goal

Back up the important data on the Master of the K8S cluster.

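Backup plan

Because the K8S cluster runs on Alibaba Cloud, the plan combines ECS snapshots (full backup) with an incremental backup of ETCD and other core data.

An ECS snapshot is taken every 8 hours, which gives 3 snapshots per day, and each snapshot is kept for 7 days:

  • To restore, use the most recent ECS snapshot.

The core data of the K8S cluster lives in ETCD. The backup program syncs it every 5 minutes and stores the copy on a second disk:

  • To restore, copy the backed-up data back to the corresponding directories.
  • The second disk covers the extreme case where the snapshot of the main disk is unusable: the cluster is rebuilt with kubeadm and the data is then restored.

Full system snapshots plus incremental backups of the core data still leave a 5-minute blind spot: up to 5 minutes of data may be lost.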
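
The figures in this plan can be checked with a little arithmetic. The sketch below is only an illustration: the function and constant names are hypothetical, and it assumes that both schedules run exactly on time and that a sync finishes instantly.

```python
from datetime import timedelta

# Figures taken from the backup plan.
SNAPSHOT_INTERVAL = timedelta(hours=8)
SNAPSHOT_RETENTION = timedelta(days=7)
SYNC_INTERVAL = timedelta(minutes=5)


def snapshots_per_day(interval: timedelta) -> int:
    """Number of snapshots taken in 24 hours at a fixed interval."""
    return timedelta(days=1) // interval


def retained_snapshots(interval: timedelta, retention: timedelta) -> int:
    """Number of snapshots that exist at any time within the retention window."""
    return retention // interval


def worst_case_loss(sync_interval: timedelta) -> timedelta:
    """Upper bound on lost data: a failure right before the next sync."""
    return sync_interval


if __name__ == "__main__":
    print(snapshots_per_day(SNAPSHOT_INTERVAL))                       # 3
    print(retained_snapshots(SNAPSHOT_INTERVAL, SNAPSHOT_RETENTION))  # 21
    print(worst_case_loss(SYNC_INTERVAL))                             # 0:05:00
```

If the second disk is lost as well, the bound grows to the snapshot interval, that is 8 hours.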

Target data

The Master uses the kube-system namespace:

kubectl -n kube-system get pod

Three of the system Pods have data volumes that need to be backed up:

  • etcd-khost0
  • kube-apiserver-khost0
  • kube-scheduler-khost0

The others, calico-node-xxx and kube-proxy-xxx, are Pods that run on every worker node and hold no important data.
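
Instead of reading the `describe` output by eye, the host paths mounted by each Pod can be pulled out of the Pod specs. A minimal sketch, assuming the JSON comes from `kubectl -n kube-system get pod -o json`; the helper name `host_path_volumes` and the trimmed sample data are hypothetical.

```python
import json


def host_path_volumes(pod_list: dict) -> dict:
    """Map each Pod name to {volume name: host path} for its hostPath volumes."""
    result = {}
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        volumes = {}
        for vol in pod.get("spec", {}).get("volumes", []):
            host_path = vol.get("hostPath")
            if host_path:  # skip configMap, secret and other volume types
                volumes[vol["name"]] = host_path["path"]
        if volumes:
            result[name] = volumes
    return result


# Hypothetical sample in the shape that kubectl returns, cut down to two Pods.
SAMPLE = json.loads("""
{"items": [
  {"metadata": {"name": "etcd-khost0"},
   "spec": {"volumes": [
     {"name": "etcd-certs", "hostPath": {"path": "/etc/kubernetes/pki/etcd"}},
     {"name": "etcd-data", "hostPath": {"path": "/var/lib/etcd"}}]}},
  {"metadata": {"name": "example-pod"},
   "spec": {"volumes": [{"name": "cfg", "configMap": {"name": "example"}}]}}
]}
""")

if __name__ == "__main__":
    for pod, vols in host_path_volumes(SAMPLE).items():
        print(pod, vols)
```

On a real Master the sample would be replaced by the parsed output of the kubectl command, for instance via `subprocess.check_output`.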

etcd

The ETCD Pod on the Master has two data volumes:

kubectl -n kube-system describe pod etcd-khost0 
Name:                 etcd-khost0
Namespace:            kube-system
#...omitted
Volumes:
  etcd-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/pki/etcd
    HostPathType:  DirectoryOrCreate
  etcd-data:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/etcd
    HostPathType:  DirectoryOrCreate

  • Certificates are stored in the directory /etc/kubernetes/pki/etcd
  • Data is stored in the directory /var/lib/etcd

Data directory

du -sh /var/lib/etcd/member/*
131M    /var/lib/etcd/member/snap
367M    /var/lib/etcd/member/wal

ll /var/lib/etcd/member/snap/
total 133468
drwx------ 2 root root      4096 Apr 26 14:21 ./
drwx------ 4 root root      4096 Apr  3 13:00 ../
-rw-r--r-- 1 root root      7135 Apr 26 10:41 0000000000000002-00000000005af394.snap
-rw-r--r-- 1 root root      7135 Apr 26 11:36 0000000000000002-00000000005b1aa5.snap
-rw-r--r-- 1 root root      7135 Apr 26 12:31 0000000000000002-00000000005b41b6.snap
-rw-r--r-- 1 root root      7135 Apr 26 13:25 0000000000000002-00000000005b68c8.snap
-rw-r--r-- 1 root root      7135 Apr 26 14:20 0000000000000002-00000000005b8fd9.snap
-rw------- 1 root root 136597504 Apr 26 14:29 db

ll /var/lib/etcd/member/wal/
total 375024
drwx------ 2 root root     4096 Apr 26 14:18 ./
drwx------ 4 root root     4096 Apr  3 13:00 ../
-rw------- 1 root root 64000648 Apr 25 06:18 000000000000002f-00000000005477dc.wal
-rw------- 1 root root 64000888 Apr 25 17:00 0000000000000030-0000000000563e46.wal
-rw------- 1 root root 64000968 Apr 26 03:42 0000000000000031-00000000005804a2.wal
-rw------- 1 root root 64002696 Apr 26 14:18 0000000000000032-000000000059cae5.wal
-rw------- 1 root root 64000000 Apr 26 14:29 0000000000000033-00000000005b8e21.wal
-rw------- 1 root root 64000000 Apr 26 14:18 1.tmp

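Note the time range that the ETCD files cover: in this listing the WAL segments span Apr 25 06:18 to Apr 26 14:29, and the .snap files span Apr 26 10:41 to 14:20.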
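
The same time range can be read programmatically from the file modification times. A minimal sketch: the helper name `mtime_range` is hypothetical, and reading /var/lib/etcd requires root, so without root the demo reports the directory as not found.

```python
import os
from datetime import datetime


def mtime_range(directory: str, suffix: str = ".wal"):
    """Return (oldest, newest) modification time of files ending in suffix.

    Returns None when the directory holds no matching files.
    """
    times = [
        entry.stat().st_mtime
        for entry in os.scandir(directory)
        if entry.is_file() and entry.name.endswith(suffix)
    ]
    if not times:
        return None
    return datetime.fromtimestamp(min(times)), datetime.fromtimestamp(max(times))


if __name__ == "__main__":
    wal_dir = "/var/lib/etcd/member/wal"  # path from the listing
    if os.path.isdir(wal_dir):
        span = mtime_range(wal_dir)
        if span:
            oldest, newest = span
            print("WAL files cover", oldest, "to", newest, "=", newest - oldest)
    else:
        print(wal_dir, "not found")
```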

Certificate directory

The certificate directory holds the key certificates used by the K8S cluster:

ll /etc/kubernetes/pki/etcd/

total 40
drwxr-xr-x 2 root root 4096 Apr  3 12:59 ./
drwxr-xr-x 3 root root 4096 Apr  3 12:59 ../
-rw-r--r-- 1 root root 1017 Apr  3 12:59 ca.crt
-rw------- 1 root root 1675 Apr  3 12:59 ca.key
-rw-r--r-- 1 root root 1094 Apr  3 12:59 healthcheck-client.crt
-rw------- 1 root root 1675 Apr  3 12:59 healthcheck-client.key
-rw-r--r-- 1 root root 1127 Apr  3 12:59 peer.crt
-rw------- 1 root root 1679 Apr  3 12:59 peer.key
-rw-r--r-- 1 root root 1127 Apr  3 12:59 server.crt
-rw------- 1 root root 1679 Apr  3 12:59 server.key
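
The script in this article copies the etcd data directory with rsync. As a complement, etcdctl has a `snapshot save` subcommand that writes a single snapshot file, and the certificates listed here are what such a call needs. The sketch below only builds and prints the command. The endpoint `https://127.0.0.1:2379`, the output path and the helper name are assumptions rather than part of the original setup, and etcdctl would have to be installed on the host.

```python
import os

PKI = "/etc/kubernetes/pki/etcd"  # certificate directory from the listing


def snapshot_command(out_file, endpoint="https://127.0.0.1:2379", pki=PKI):
    """Build an `etcdctl snapshot save` command line (endpoint is assumed)."""
    return [
        "etcdctl",
        "--endpoints", endpoint,
        "--cacert", os.path.join(pki, "ca.crt"),
        "--cert", os.path.join(pki, "healthcheck-client.crt"),
        "--key", os.path.join(pki, "healthcheck-client.key"),
        "snapshot", "save", out_file,
    ]


if __name__ == "__main__":
    # Hypothetical output path; etcdctl older than 3.4 also needs
    # ETCDCTL_API=3 in the environment to enable the v3 commands.
    print(" ".join(snapshot_command("/tmp/etcd-snapshot.db")))
```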

apiserver

The apiserver has 5 data volumes:

kubectl -n kube-system describe pod kube-apiserver-khost0
Name:                 kube-apiserver-khost0
Namespace:            kube-system
#...omitted
Volumes:
  ca-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/ssl/certs
    HostPathType:  DirectoryOrCreate
  etc-ca-certificates:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/ca-certificates
    HostPathType:  DirectoryOrCreate
  k8s-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/pki
    HostPathType:  DirectoryOrCreate
  usr-local-share-ca-certificates:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/local/share/ca-certificates
    HostPathType:  DirectoryOrCreate
  usr-share-ca-certificates:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/share/ca-certificates
    HostPathType:  DirectoryOrCreate

Note: to avoid backing up the same data twice, remember that /etc/kubernetes/pki/etcd/, listed above for the etcd backup, is a subdirectory of /etc/kubernetes/pki.
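
Spotting such overlaps can be automated. A minimal sketch, assuming all paths are absolute; the helper name `drop_nested` is hypothetical.

```python
import os


def drop_nested(paths):
    """Return the paths with every entry that lies inside another entry removed."""
    normalized = sorted({os.path.normpath(p) for p in paths})
    kept = []
    for path in normalized:
        # After sorting, a parent directory always comes before its children.
        if not any(os.path.commonpath([path, parent]) == parent for parent in kept):
            kept.append(path)
    return kept


if __name__ == "__main__":
    candidates = [
        "/etc/kubernetes/pki/etcd",
        "/etc/kubernetes/pki",
        "/var/lib/etcd",
        "/etc/ssl/certs",
    ]
    for path in drop_nested(candidates):
        print(path)
```

With these four candidates the nested etcd certificate directory is dropped and the other three paths remain.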

scheduler

The scheduler has only one data volume, a mapped configuration file:

kubectl -n kube-system describe pod kube-scheduler-khost0 
Name:                 kube-scheduler-khost0
Namespace:            kube-system
#...omitted
Volumes:
  kubeconfig:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/scheduler.conf
    HostPathType:  FileOrCreate

Backup program

The backup is done by a script written in Python.

Script

The code is as follows:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

"""
# backup k8s meta data.

# k8s-etcd
etcd = {
    # "etcdCerts": '/etc/kubernetes/pki/etcd/', # duplicate
    "etcd": '/var/lib/etcd/'
}
# k8s-api-server
apiServer = {
    "ca-certs": '/etc/ssl/certs/',
    "etc-ca-certificates": '/etc/ca-certificates/',
    # "k8s-certs": '/etc/kubernetes/pki/', # duplicate
    "usr-local-share-ca-certificates": '/usr/local/share/ca-certificates/',
    "usr-share-ca-certificates": '/usr/share/ca-certificates/'
}
# k8s-scheduler
scheduler = {
    # "kubeconfig": '/etc/kubernetes/scheduler.conf'
    "kubeconfig": '/etc/kubernetes/'
}
# rancher
rancher = {
    "rancher": '/var/lib/docker/volumes/rancher-data/'
}

from collections import ChainMap
item = ChainMap(etcd, apiServer, scheduler, rancher)
"""

import os
import datetime


def sync_k8s_meta_data(dirs):
    """Mirror every configured source path into the backup directory with rsync."""
    log_date = datetime.datetime.today().strftime("%Y-%m-%d %H:%M:%S")
    cmd_dir = '/usr/bin/rsync'
    backup_dir = '/data/backup/k8s/'  # on the second disk, as in the cron log
    print(log_date)

    for k0, v0 in dirs.items():
        # One sub-directory per component: etcd, scheduler, rancher, apiserver.
        dir0 = os.path.join(backup_dir, k0)
        os.makedirs(dir0, exist_ok=True)
        print('%s:' % k0)

        for k1, v1 in v0.items():
            dir1 = os.path.join(dir0, k1)

            if not os.path.exists(v1):
                print("%s not found !!!" % v1)
            else:
                # The log line says -av, but rsync actually runs with -a only,
                # which keeps cron.log free of per-file output.
                print("%s -av --delete %s %s/" % (cmd_dir, v1, dir1))
                os.system("%s -a --delete %s %s/" % (cmd_dir, v1, dir1))

if __name__ == '__main__':

    k8s_dirs = {
        'etcd': {'etcd-data': '/var/lib/etcd/'},
        'scheduler': {'kubeconfig': '/etc/kubernetes/'},
        'rancher': {'rancher-data': '/var/lib/docker/volumes/rancher-data/'},
        'apiserver': {
            'usr-share-ca-certificates': '/usr/share/ca-certificates/',
            'usr-local-share-ca-certificates': '/usr/local/share/ca-certificates/',
            'etc-ca-certificates': '/etc/ca-certificates/',
            'ca-certs': '/etc/ssl/certs/'},
    }

    sync_k8s_meta_data(k8s_dirs)
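
The script runs rsync through os.system and does not look at the exit status, so a failed sync goes unnoticed. One possible variation, sketched below, passes the arguments as a list and reports a non-zero exit code. The helper names `rsync_command` and `run_sync` are hypothetical and not part of the original script.

```python
import subprocess


def rsync_command(src, dest, rsync="/usr/bin/rsync"):
    """Build the rsync call as an argument list, so no shell quoting is needed."""
    return [rsync, "-a", "--delete", src, dest.rstrip("/") + "/"]


def run_sync(src, dest, runner=subprocess.run):
    """Run one sync and report whether rsync exited with status 0."""
    cmd = rsync_command(src, dest)
    print(" ".join(cmd))  # logs exactly the command that is executed
    completed = runner(cmd)
    if completed.returncode != 0:
        print("rsync failed with exit code %d: %s" % (completed.returncode, src))
        return False
    return True


if __name__ == "__main__":
    # Only print the command here; run_sync() would actually execute it.
    print(" ".join(rsync_command("/var/lib/etcd/", "/backup/k8s/etcd/etcd-data")))
```

In the script, the os.system line could then be replaced by a call such as `run_sync(v1, dir1)`.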

Note that the backup script depends on rsync. If it is not installed, install it first, for example:

apt install rsync

Usage

For example:

python3 /root/shell/k8s.py

Scheduled task, run every 5 minutes:

# Make the script executable
chmod +x /root/shell/k8s.py

crontab -l | tail -n 1
*/5 * * * * /root/shell/k8s.py >> /root/shell/cron.log 2>&1

Verification

Check the log:

head /root/shell/cron.log

2020-04-20 10:55:01
etcd:
/usr/bin/rsync -av --delete /var/lib/etcd/ /data/backup/k8s/etcd/etcd-data/
scheduler:
/usr/bin/rsync -av --delete /etc/kubernetes/ /data/backup/k8s/scheduler/kubeconfig/
rancher:
/usr/bin/rsync -av --delete /var/lib/docker/volumes/rancher-data/ /data/backup/k8s/rancher/rancher-data/
apiserver:
/usr/bin/rsync -av --delete /usr/share/ca-certificates/ /data/backup/k8s/apiserver/usr-share-ca-certificates/
/usr/bin/rsync -av --delete /usr/local/share/ca-certificates/ /data/backup/k8s/apiserver/usr-local-share-ca-certificates/
/usr/bin/rsync -av --delete /etc/ca-certificates/ /data/backup/k8s/apiserver/etc-ca-certificates/
/usr/bin/rsync -av --delete /etc/ssl/certs/ /data/backup/k8s/apiserver/ca-certs/
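
Since every run starts with a timestamp line, the log can also be used to check that the cron job is still running. A minimal sketch: the helper names are hypothetical, and the 10-minute threshold (two missed runs) is an assumption.

```python
import re
from datetime import datetime, timedelta

# A line that consists only of a timestamp, as printed at the start of a run.
STAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$")


def last_run(log_lines):
    """Return the newest timestamp line in the log, or None if there is none."""
    latest = None
    for line in log_lines:
        line = line.strip()
        if STAMP.match(line):
            stamp = datetime.strptime(line, "%Y-%m-%d %H:%M:%S")
            if latest is None or stamp > latest:
                latest = stamp
    return latest


def is_stale(log_lines, now, max_age=timedelta(minutes=10)):
    """True when no run was logged within max_age before now."""
    latest = last_run(log_lines)
    return latest is None or now - latest > max_age


if __name__ == "__main__":
    sample = ["2020-04-20 10:55:01", "etcd:", "scheduler:"]
    print(last_run(sample))
    print(is_stale(sample, now=datetime(2020, 4, 20, 10, 58, 0)))
```

For the real log, the lines would come from reading /root/shell/cron.log and `now` from `datetime.now()`.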

Check the data:

# Within the backup window only two files differ
diff -r etcd/etcd-data/ /var/lib/etcd/
Binary files etcd/etcd-data/member/snap/db and /var/lib/etcd/member/snap/db differ
Binary files etcd/etcd-data/member/wal/00000000000005d3-000000000a0ae8ee.wal and /var/lib/etcd/member/wal/00000000000005d3-000000000a0ae8ee.wal differ
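
The same comparison can be scripted with the standard library module filecmp. A minimal sketch with a hypothetical helper name. By default `filecmp.dircmp` compares shallowly, so files whose type, size and modification time all match are treated as equal without reading their content.

```python
import filecmp
import os


def differing_files(left, right):
    """Recursively list files that differ or exist on only one side."""
    found = []

    def walk(cmp, rel):
        for name in cmp.diff_files:
            found.append(("differ", os.path.join(rel, name)))
        for name in cmp.left_only:
            found.append(("only in backup", os.path.join(rel, name)))
        for name in cmp.right_only:
            found.append(("only in source", os.path.join(rel, name)))
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(rel, name))

    walk(filecmp.dircmp(left, right), "")
    return sorted(found)


if __name__ == "__main__":
    backup, source = "etcd/etcd-data/", "/var/lib/etcd/"  # paths from the diff call
    if os.path.isdir(backup) and os.path.isdir(source):
        for kind, path in differing_files(backup, source):
            print(kind, path)
    else:
        print("backup or source directory not found")
```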

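Summary

In day-to-day K8S operations a cluster needs backups as well as node removal, node addition, updates and other maintenance. The next article covers how to remove a node from the cluster.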
