RC, Deployment, and DaemonSet are all designed for stateless services: the IP, hostname, and start/stop order of the Pods they manage are random, and when a managed Pod is rebuilt, its IP and hostname change. A StatefulSet, by contrast, manages stateful services such as MySQL or MongoDB clusters.
A StatefulSet is essentially a variant of Deployment (it reached GA in Kubernetes v1.9) designed to solve the problems of stateful services. The Pods it manages have fixed names and a defined start/stop order; in a StatefulSet, the Pod name serves as the network identity (hostname), and each Pod must also be backed by stable persistent storage.
A Deployment is paired with an ordinary Service, whereas a StatefulSet is paired with a headless Service. A headless Service differs from a normal Service in that it has no Cluster IP: resolving its name returns the endpoint list of all Pods behind that headless Service.
Take a Redis cluster as an example. Since the Redis containers play different roles (master vs. slave), each Redis container must keep its original hostname and mount its original volume after being rebuilt; only then does each shard remain healthy. Moreover, each Redis shard owns a different set of slots and stores different data, so each shard must connect to its own storage to keep data from being overwritten or mixed up. (Note: in a Deployment, the volume defined in the Pod template is shared by all replicas and holds the same data, because every Pod is created from the same template.)
To guarantee that each container mounts the correct volume, Kubernetes introduced the volumeClaimTemplate.
A StatefulSet therefore suits applications that need the following properties:
1) Stable, unique network identifiers;
2) Stable, persistent storage;
3) Ordered, graceful deployment and scaling;
4) Ordered, graceful termination and deletion;
5) Ordered rolling updates.
A complete StatefulSet application consists of three parts: a headless Service, the StatefulSet controller, and a volumeClaimTemplate.
Example 1:
The PVs in this example are provisioned statically, so first prepare the PVs:
[root@k8s-master-dev statefulset]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv01 5Gi RWO,RWX Retain Available 27m
pv02 10Gi RWO,RWX Retain Available 27m
pv03 15Gi RWO,RWX Retain Available 27m
[root@k8s-master-dev statefulset]#
Then define a StatefulSet application:
[root@k8s-master-dev statefulset]# vim statefulset-demo.yaml
[root@k8s-master-dev statefulset]# cat statefulset-demo.yaml
apiVersion: v1
kind: Service
metadata:
  name: ngx-svc
  labels:
    app: ngx-svc
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: ngx-pod
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ngx
spec:
  serviceName: ngx-svc  # declares which headless Service this StatefulSet belongs to
  replicas: 2
  selector:
    matchLabels:
      app: ngx-pod      # has to match .spec.template.metadata.labels
  template:
    metadata:
      labels:
        app: ngx-pod    # has to match .spec.selector.matchLabels
    spec:
      containers:
      - name: ngx
        image: nginx:1.15-alpine
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: ngxvol
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: ngxvol
    spec:
      accessModes: ["ReadWriteMany"]
      resources:
        requests:
          storage: 5Gi
[root@k8s-master-dev statefulset]# kubectl apply -f statefulset-demo.yaml
service/ngx-svc created
statefulset.apps/ngx created
[root@k8s-master-dev statefulset]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 1d
ngx-svc ClusterIP None <none> 80/TCP 15s
[root@k8s-master-dev statefulset]# kubectl get sts
NAME DESIRED CURRENT AGE
ngx 2 2 30s
[root@k8s-master-dev statefulset]# kubectl get pods
NAME READY STATUS RESTARTS AGE
ngx-0 1/1 Running 0 35s
ngx-1 1/1 Running 0 34s
[root@k8s-master-dev statefulset]#
Each Pod name is defined as $(statefulset name)-0, $(statefulset name)-1, and so on. Each Pod's FQDN is defined as: $(pod name).$(headless service name).$(namespace).svc.cluster.local.
Each PVC name is composed of two parts: $(volumeClaimTemplates.name)-$(pod name), which records which volumeClaimTemplate created the PVC and which Pod it is permanently mounted to. When the original Pod is deleted, the PVC remains unchanged and the data is not lost (manually deleting the PVC releases the PV). When the replacement Pod is created, it inherits the original Pod name and mounts the original volume again.
[root@k8s-master-dev statefulset]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv01 5Gi RWO,RWX Retain Bound default/ngxvol-ngx-0 35m
pv02 10Gi RWO,RWX Retain Bound default/ngxvol-ngx-1 35m
pv03 15Gi RWO,RWX Retain Available 35m
[root@k8s-master-dev statefulset]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ngxvol-ngx-0 Bound pv01 5Gi RWO,RWX 5m
ngxvol-ngx-1 Bound pv02 10Gi RWO,RWX 5m
[root@k8s-master-dev statefulset]#
[root@k8s-master-dev manifests]# kubectl exec -it ngx-0 -- /bin/sh
/ # nslookup ngx-0.ngx-svc.default.svc.cluster.local
nslookup: can't resolve '(null)': Name does not resolve
Name: ngx-0.ngx-svc.default.svc.cluster.local
Address 1: 10.244.4.2 ngx-0.ngx-svc.default.svc.cluster.local
/ #
/ #
/ # nslookup ngx-1.ngx-svc.default.svc.cluster.local
nslookup: can't resolve '(null)': Name does not resolve
Name: ngx-1.ngx-svc.default.svc.cluster.local
Address 1: 10.244.1.101 ngx-1.ngx-svc.default.svc.cluster.local
/ #
/ # [root@k8s-master-dev manifests]#
[root@k8s-master-dev statefulset]# kubectl exec ngx-0 -- ls /usr/share/nginx/html
[root@k8s-master-dev statefulset]# kubectl exec -it ngx-0 -- /bin/sh
/ # echo ngx-0 > /usr/share/nginx/html/index.html
/ # [root@k8s-master-dev statefulset]#
[root@k8s-master-dev statefulset]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
ngx-0 1/1 Running 0 9m 10.244.1.98 k8s-node1-dev <none>
ngx-1 1/1 Running 0 9m 10.244.2.63 k8s-node2-dev <none>
[root@k8s-master-dev statefulset]# curl http://10.244.1.98
ngx-0
[root@k8s-master-dev statefulset]# kubectl delete pod/ngx-0
pod "ngx-0" deleted
[root@k8s-master-dev statefulset]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
ngx-0 1/1 Running 0 8s 10.244.1.99 k8s-node1-dev <none>
ngx-1 1/1 Running 0 10m 10.244.2.63 k8s-node2-dev <none>
[root@k8s-master-dev statefulset]# curl http://10.244.1.99
ngx-0
[root@k8s-master-dev statefulset]#
Pods are scaled up and down in order, as shown below:
[root@k8s-master-dev statefulset]# kubectl scale sts ngx --replicas=3
statefulset.apps/ngx scaled
[root@k8s-master-dev statefulset]# kubectl get pods
NAME READY STATUS RESTARTS AGE
ngx-0 1/1 Running 0 8m
ngx-1 1/1 Running 0 18m
ngx-2 1/1 Running 0 3s
[root@k8s-master-dev statefulset]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ngxvol-ngx-0 Bound pv01 5Gi RWO,RWX 18m
ngxvol-ngx-1 Bound pv02 10Gi RWO,RWX 18m
ngxvol-ngx-2 Bound pv03 15Gi RWO,RWX 12s
[root@k8s-master-dev statefulset]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv01 5Gi RWO,RWX Retain Bound default/ngxvol-ngx-0 49m
pv02 10Gi RWO,RWX Retain Bound default/ngxvol-ngx-1 48m
pv03 15Gi RWO,RWX Retain Bound default/ngxvol-ngx-2 48m
[root@k8s-master-dev statefulset]# kubectl patch sts ngx -p '{"spec":{"replicas":2}}'
statefulset.apps/ngx patched
[root@k8s-master-dev statefulset]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ngxvol-ngx-0 Bound pv01 5Gi RWO,RWX 20m
ngxvol-ngx-1 Bound pv02 10Gi RWO,RWX 20m
ngxvol-ngx-2 Bound pv03 15Gi RWO,RWX 1m
[root@k8s-master-dev statefulset]# kubectl get pods
NAME READY STATUS RESTARTS AGE
ngx-0 1/1 Running 0 9m
ngx-1 1/1 Running 0 20m
[root@k8s-master-dev statefulset]#
[root@k8s-master-dev statefulset]# kubectl delete -f statefulset-demo.yaml
service "ngx-svc" deleted
statefulset.apps "ngx" deleted
[root@k8s-master-dev statefulset]# kubectl delete pvc --all
persistentvolumeclaim "ngxvol-ngx-0" deleted
persistentvolumeclaim "ngxvol-ngx-1" deleted
persistentvolumeclaim "ngxvol-ngx-2" deleted
[root@k8s-master-dev statefulset]# kubectl delete -f ../volumes/pv-vol-demo.yaml
persistentvolume "pv01" deleted
persistentvolume "pv02" deleted
persistentvolume "pv03" deleted
[root@k8s-master-dev statefulset]#
Update strategies
In Kubernetes 1.7 and later, the .spec.updateStrategy field lets you configure or disable automated rolling updates of a Pod's containers, labels, resource requests/limits, and annotations.
OnDelete: when .spec.updateStrategy.type is set to OnDelete, the StatefulSet controller does not update the StatefulSet's Pods automatically. The user must delete Pods manually for the controller to create new ones.
RollingUpdate: when .spec.updateStrategy.type is set to RollingUpdate, Pods are rolled automatically; this is the default strategy when .spec.updateStrategy is unspecified.
The StatefulSet controller deletes and recreates each Pod in the StatefulSet, proceeding in order of Pod termination (from the largest ordinal to the smallest) and updating one Pod at a time. It waits until an updated Pod is Running and Ready before updating the next one.
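The two strategies can be set directly in the StatefulSet spec. A minimal sketch, reusing the ngx StatefulSet from the example above and showing only the relevant fields:

```yaml
# OnDelete: the controller never replaces Pods on its own;
# a Pod only picks up the new template after you delete it manually
spec:
  updateStrategy:
    type: OnDelete
---
# RollingUpdate (the default): Pods are replaced automatically,
# one at a time, from the highest ordinal down to 0
spec:
  updateStrategy:
    type: RollingUpdate
```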
Partitions: the RollingUpdate strategy can be partitioned by setting .spec.updateStrategy.rollingUpdate.partition. When a partition is specified and the StatefulSet's .spec.template is updated, only Pods with an ordinal greater than or equal to the partition are updated.
Pods with an ordinal less than the partition are not updated; even if they are deleted, they are recreated from the previous template. If a StatefulSet's .spec.updateStrategy.rollingUpdate.partition is greater than its .spec.replicas, updates to .spec.template are not propagated to any Pod. In most cases partitions are not needed.
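A partitioned rolling update can also be declared in the manifest. A sketch based on the example's ngx StatefulSet: with partition: 1 and two replicas, a change to .spec.template would be rolled out to ngx-1 only, leaving ngx-0 on the old template:

```yaml
spec:
  replicas: 2
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 1   # only Pods with ordinal >= 1 receive the new template
```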
Example: modify the update strategy, then update the image (note that with partition 4 greater than the replica count here, the new image will not be rolled out to any Pod until the partition is lowered):
kubectl patch sts ngx -p '{"spec":{"updateStrategy":{"rollingUpdate":{"partition":4}}}}'
kubectl set image sts/ngx ngx=nginx:latest