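Preface
Nearly all Kubernetes components are stateless: all cluster state lives in etcd, which is effectively the database of the whole cluster. That makes a reliable etcd backup essential. This post describes how we handle it at our company.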
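Backing up etcd data to S3
There are many etcd backup solutions, but they differ very little: almost all of them rely on the etcdctl command to do the actual work.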
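To make the etcdctl-based approach concrete, here is a minimal sketch of taking a v3 snapshot by hand. It is not taken from etcd-backup itself; the function names, the file-name layout, and the certificate paths are assumptions chosen to match the manifest later in this post. A single endpoint is used because a snapshot is taken from one member.

```shell
# Sketch: take an etcd v3 snapshot with etcdctl.
# Function names, file-name layout and paths are illustrative assumptions.

# Build a timestamped file name, e.g. k8s-prod-1-etcd-v3-20240101T030000Z.db
snapshot_name() {
  local prefix="$1"
  printf '%s-etcd-v3-%s.db\n' "$prefix" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Save a snapshot from a single etcd endpoint into the given file.
take_snapshot() {
  local endpoint="$1" out="$2"
  ETCDCTL_API=3 etcdctl \
    --endpoints="$endpoint" \
    --cacert=/certs/ca.crt \
    --cert=/certs/server.crt \
    --key=/certs/server.key \
    snapshot save "$out"
}

# Usage (requires etcdctl and a reachable etcd member):
#   take_snapshot https://127.0.0.1:2379 "/tmp/$(snapshot_name k8s-prod-1)"
```

The resulting file is what then gets uploaded to remote storage.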
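Why S3?
- We already use AWS heavily, and we want backups to live on highly available storage rather than on the machines that run etcd.
- S3 also supports lifecycle rules. Once a rule is configured, AWS automatically deletes old backups and keeps the recent ones.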
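As an illustration of the lifecycle point, the sketch below generates an expiration rule and shows how it could be applied with the AWS CLI. The 30-day retention, the rule ID, and the assumption that backups are stored under a key prefix equal to the cluster prefix are all hypothetical; adjust them to how your bucket is actually laid out.

```shell
# Sketch: an S3 lifecycle rule that expires old backups.
# Retention period, rule ID and key prefix are assumptions.

# Print a lifecycle configuration that expires objects under a prefix after N days.
lifecycle_json() {
  local prefix="$1" days="$2"
  cat <<EOF
{
  "Rules": [
    {
      "ID": "expire-old-etcd-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "${prefix}" },
      "Expiration": { "Days": ${days} }
    }
  ]
}
EOF
}

# Usage (requires the AWS CLI and permission to modify the bucket):
#   lifecycle_json "k8s-prod-1" 30 > lifecycle.json
#   aws s3api put-bucket-lifecycle-configuration \
#     --bucket mybucket \
#     --lifecycle-configuration file://lifecycle.json
```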
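Implementation
We use the etcd-backup project largely as is. We forked it and made one small change, mainly to the Dockerfile, so that the bundled etcdctl matches the etcd version we run in production.
The modified Dockerfile: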
FROM alpine:3.8
RUN apk add --no-cache curl
# Get etcdctl
ENV ETCD_VER=v3.2.24
RUN \
    cd /tmp && \
    curl -L https://storage.googleapis.com/etcd/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz | \
    tar xz -C /usr/local/bin --strip-components=1
COPY ./etcd-backup /
ENTRYPOINT ["/etcd-backup"]
CMD ["-h"]
After that, build and push the image with docker build and friends, as usual.
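A rough sketch of that build step follows. The Dockerfile copies a prebuilt ./etcd-backup binary, so the binary has to exist first; because the base image is Alpine (musl libc), a static build with CGO disabled is the safe choice. That the fork builds with a plain `go build` from the repository root is an assumption, and the `run` helper with its `DRY_RUN` switch is only there so the commands can be previewed.

```shell
# Sketch: build the binary and the image. The go build invocation is an assumption.

# Print commands instead of running them when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build a static Linux binary, then build and push the image.
build_and_push() {
  local image="$1"
  run env CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o etcd-backup .
  run docker build -t "$image" .
  run docker push "$image"
}

# Usage:
#   DRY_RUN=1 build_and_push iyacontrol/etcd-backup:0.1   # preview the commands
#   build_and_push iyacontrol/etcd-backup:0.1             # build and push
```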
Deploying on Kubernetes
A Kubernetes CronJob is a good fit here. My policy is to take a backup every three hours.
cronjob.yaml:
# batch/v1beta1 was removed in Kubernetes 1.25; use batch/v1 on 1.21 and later.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  # Minute 0 of every third hour. "* */3 * * *" would fire every minute
  # during those hours instead of once.
  schedule: "0 */3 * * *"
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 2
  jobTemplate:
    spec:
      # Job timeout
      activeDeadlineSeconds: 300
      template:
        spec:
          tolerations:
          # Tolerate master taint
          - key: node-role.kubernetes.io/master
            operator: Exists
            effect: NoSchedule
          nodeSelector:
            node-role.kubernetes.io/master: ""
          # Container creates etcd backups.
          # Run container in host network mode on G8s masters
          # to be able to use 127.0.0.1 as etcd address.
          # For etcd v2 backups container should have access
          # to etcd data directory. To achieve that,
          # mount the etcd data directory (/var/lib/etcd here) as a volume.
          containers:
          - name: etcd-backup
            image: iyacontrol/etcd-backup:0.1
            args:
            # backup guest clusters only on production installations
            # testing installation can have many broken guest clusters
            - -prefix=k8s-prod-1
            - -etcd-v2-datadir=/var/lib/etcd
            - -etcd-v3-endpoints=https://172.xx.xx.221:2379,https://172.xx.xx.83:2379,https://172.xx.xx.246:2379
            - -etcd-v3-cacert=/certs/ca.crt
            - -etcd-v3-cert=/certs/server.crt
            - -etcd-v3-key=/certs/server.key
            - -aws-s3-bucket=mybucket
            - -aws-s3-region=us-east-1
            volumeMounts:
            - mountPath: /var/lib/etcd
              name: etcd-datadir
            - mountPath: /certs
              name: etcd-certs
            env:
            - name: ETCDBACKUP_AWS_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: etcd-backup
                  key: ETCDBACKUP_AWS_ACCESS_KEY
            - name: ETCDBACKUP_AWS_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: etcd-backup
                  key: ETCDBACKUP_AWS_SECRET_KEY
            - name: ETCDBACKUP_PASSPHRASE
              valueFrom:
                secretKeyRef:
                  name: etcd-backup
                  key: ETCDBACKUP_PASSPHRASE
          volumes:
          - name: etcd-datadir
            hostPath:
              path: /var/lib/etcd
          - name: etcd-certs
            hostPath:
              path: /etc/kubernetes/pki/etcd/
          # Do not restart pod, job takes care of restarting failed pod.
          restartPolicy: Never
          hostNetwork: true
Note: the toleration and the nodeSelector work together so that the pod is scheduled onto a master node.
Then secret.yaml:
apiVersion: v1
kind: Secret
metadata:
  name: etcd-backup
  namespace: kube-system
type: Opaque
data:
  ETCDBACKUP_AWS_ACCESS_KEY: QUtJTI0TktCT0xQRlEK
  ETCDBACKUP_AWS_SECRET_KEY: aXJ6eThjQnM2MVRaSkdGMGxDeHhoeFZNUDU4ZGRNbgo=
  ETCDBACKUP_PASSPHRASE: ""
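One thing to watch when filling in the data fields: values must be base64-encoded, and encoding with plain `echo` also encodes a trailing newline. The sample secret key above ends in `go=`, which decodes to a final newline, so it was probably produced that way. Whether etcd-backup trims such whitespace is not something this post verifies, so encoding without the newline is the safer habit. The sketch below shows one way to do that; the `b64` helper name and the manual job name are hypothetical.

```shell
# Sketch: encode Secret values without a trailing newline.

# Base64-encode a value exactly as given (printf adds no newline, unlike echo).
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Usage:
#   b64 '<access-key-id>'    # paste the result under data: in secret.yaml
#
# Alternatively, let kubectl do the encoding (requires cluster access):
#   kubectl -n kube-system create secret generic etcd-backup \
#     --from-literal=ETCDBACKUP_AWS_ACCESS_KEY='<access-key-id>' \
#     --from-literal=ETCDBACKUP_AWS_SECRET_KEY='<secret-access-key>' \
#     --from-literal=ETCDBACKUP_PASSPHRASE='<passphrase>'
#
# Then apply the CronJob and trigger a one-off run to check that it works:
#   kubectl apply -f cronjob.yaml
#   kubectl -n kube-system create job --from=cronjob/etcd-backup etcd-backup-manual
#   kubectl -n kube-system logs -f job/etcd-backup-manual
```

Triggering a manual job right after deployment avoids waiting up to three hours to find out that a credential or certificate path is wrong.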
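Summary
We previously tried etcd-operator for backups. In practice it did not work well for us: it introduces many concepts, its components are complex, and much of its code is too rigid to adapt.
In the end we chose etcd-backup, mainly because it is simple: less is more. It is written in Go, the source is easy to read, and extending it for our own needs is straightforward.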