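These are notes from hands-on work with KubeEdge, covering open questions and the solutions found so far. They are updated from time to time.
Miscellaneous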
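Building KubeEdge fails with 2 GB of memory; 4 GB works.
If replicas of the same pod expose the same node port, scaling out fails because that port on the node is already taken.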
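Run the program once to generate the configuration file, then edit it. Mind where the configuration file is located and mind the platform architecture: on an ARM platform, if the pause image is not kubeedge/pause-arm:3.1, an error occurs.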
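Check the hostname. It must be valid (lowercase letters, digits, hyphen -, dot .); otherwise the node cannot register, and sometimes the only message returned is err:<nil>, which gives nothing to troubleshoot with.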
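The edge host needs a default gateway; otherwise the edge program hits a segmentation fault at run time. According to the related issue this has been fixed, but it still happens.
KubeEdge is not fully equivalent to k8s; some k8s commands are not implemented yet. For example, there are no commands for inspecting or running containers.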
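The hostname and default-gateway notes above can be turned into a small pre-flight check to run on an edge host before starting edgecore. The following is only a sketch: the function names are made up for illustration, the hostname rule is the RFC 1123 DNS subdomain rule that Kubernetes applies to node names, and the gateway check assumes a Linux host whose IPv4 routing table is readable in the /proc/net/route format.

```python
import re

# RFC 1123 DNS subdomain, the rule Kubernetes applies to node names:
# lowercase alphanumerics, '-' and '.', every label starting and ending
# with an alphanumeric, at most 253 characters in total.
_LABEL = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?"
_SUBDOMAIN = re.compile(rf"{_LABEL}(\.{_LABEL})*")


def is_valid_node_name(name: str) -> bool:
    """Return True if name looks usable as a node name (hypothetical helper)."""
    return 0 < len(name) <= 253 and _SUBDOMAIN.fullmatch(name) is not None


def has_default_route(route_table: str) -> bool:
    """Return True if text in /proc/net/route format contains a default route.

    Columns: Iface Destination Gateway Flags RefCnt Use Metric Mask ...
    """
    for line in route_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue
        destination, flags, mask = fields[1], int(fields[3], 16), fields[7]
        # A default route has destination 0.0.0.0/0 and the flags
        # RTF_UP (0x1) and RTF_GATEWAY (0x2) both set.
        if destination == "00000000" and mask == "00000000" and flags & 0x3 == 0x3:
            return True
    return False
```

On the edge host, the first function can be fed `socket.gethostname()` and the second the contents of /proc/net/route. A failing check points at one of the two problems described above before edgecore is even started.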
Related bugs I have collected
Notes from 2020.3.19:
The kubectl exec and kubectl logs commands are not supported; the project says they will be supported later. To be watched.
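Scheduling information is insufficient. kubectl describe only shows that the pod was scheduled to some node; whether it then succeeded or failed is not shown. The only option is to go to the node machine and check the logs with docker logs.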
Problems
Pods cannot be scheduled
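Environment: 3 hosts with k8s already deployed; k8s was cleaned up first.
Create a deployment the usual k8s way and list the pods: they show Pending. Delete a pod and it shows Terminating. On another attempt, one pod was able to run on one of the nodes; after scaling out, pods ran on that node while those on the other node stayed Pending. After a whole night nothing had changed.
After force-stopping cloudcore and edgecore, the nodes show NotReady in k8s. The containers on the nodes keep running.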
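Questions:
Why can the pods not be scheduled? And how can pods be shut down gracefully before stopping cloudcore? No method found so far.
Cloud side prints: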
messagehandler.go:448] write error, connection for node edge-node2 will be closed, affected event id: dba8d7ec-ffa4-4c6f-ac6e-accfa527a366, parent_id: , group: resource, source: edgecontroller, resource: default/pod/nginx-deployment-77698bff7d-jdm8k, operation: update, reason tls: use of closed connection
Edge side prints:
process.go:130] failed to send message: tls: use of closed connection
process.go:196] websocket write error: failed to send message, error: tls: use of closed connection
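Guess: the connection was dropped, yet the node status shows Ready; the reason is unknown.
Follow-up: deleted the deployment, waited a while, deployed again, and it succeeded.
With a normal connection, after running overnight the node is in the NotReady state. Pods are destroyed and recreated over and over.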
# kubectl get pod
NAME READY STATUS RESTARTS AGE
led-light-mapper-deployment-94bbdf88-26h2d 0/1 Terminating 0 14h
led-light-mapper-deployment-94bbdf88-2hwxq 0/1 Terminating 0 90m
led-light-mapper-deployment-94bbdf88-4f8pd 0/1 Terminating 0 80m
led-light-mapper-deployment-94bbdf88-52p9w 0/1 Terminating 0 15m
led-light-mapper-deployment-94bbdf88-8t9cl 0/1 Terminating 0 30m
led-light-mapper-deployment-94bbdf88-9bpt7 0/1 Terminating 0 95m
led-light-mapper-deployment-94bbdf88-9nfk6 0/1 Terminating 0 65m
led-light-mapper-deployment-94bbdf88-c8wtb 0/1 Terminating 0 85m
led-light-mapper-deployment-94bbdf88-kpcx4 0/1 Terminating 0 75m
led-light-mapper-deployment-94bbdf88-kwgqs 0/1 Terminating 0 35m
led-light-mapper-deployment-94bbdf88-l6hn2 0/1 Terminating 0 55m
led-light-mapper-deployment-94bbdf88-pk6fx 0/1 Terminating 0 5m1s
led-light-mapper-deployment-94bbdf88-qk9gj 0/1 Terminating 0 60m
led-light-mapper-deployment-94bbdf88-sgns2 0/1 Terminating 0 100m
led-light-mapper-deployment-94bbdf88-sk8gf 0/1 Terminating 0 20m
led-light-mapper-deployment-94bbdf88-svkgr 0/1 Terminating 0 50m
led-light-mapper-deployment-94bbdf88-tjz7z 0/1 Terminating 0 45m
led-light-mapper-deployment-94bbdf88-vwx7w 0/1 Pending 0 1s
led-light-mapper-deployment-94bbdf88-xfsc8 0/1 Terminating 0 10m
led-light-mapper-deployment-94bbdf88-xpq8k 0/1 Terminating 0 40m
led-light-mapper-deployment-94bbdf88-zhj24 0/1 Terminating 0 25m
led-light-mapper-deployment-94bbdf88-zncjg 0/1 Terminating 0 70m
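With a listing this long it helps to summarise the pods by status instead of reading it line by line. The sketch below is one possible way to do that, assuming the plain-text table printed by `kubectl get pod` with a header row containing a STATUS column; the function name `count_pod_statuses` is hypothetical and not part of any tool.

```python
from collections import Counter


def count_pod_statuses(kubectl_output: str) -> Counter:
    """Count pods per STATUS value in the text printed by `kubectl get pod`."""
    lines = [ln for ln in kubectl_output.splitlines() if ln.strip()]
    if not lines:
        return Counter()
    # Locate the STATUS column from the header so the code does not
    # depend on a fixed column position.
    status_col = lines[0].split().index("STATUS")
    counts = Counter()
    for line in lines[1:]:
        fields = line.split()
        if len(fields) > status_col:
            counts[fields[status_col]] += 1
    return counts
```

Applied to the listing above, it would report 21 pods in Terminating and 1 in Pending, which makes the destroy-and-recreate loop easy to spot.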
Checking the edge side:
I0319 09:17:05.425874 2147 communicate.go:151] has msg
I0319 09:17:05.426062 2147 communicate.go:155] redo task due to no recv
I0319 09:17:05.427233 2147 communicate.go:151] has msg
I0319 09:17:05.427416 2147 communicate.go:155] redo task due to no recv
I0319 09:17:05.428657 2147 dtcontext.go:69] CommModule is healthy 1584580625
context_channel.go:175] the message channel is full, message: {Header:{ID:5f072fe2-b8cf-411e-8aee-16e927f27433 ParentID: Timestamp:1584580605260 ResourceVersion:391570 Sync:false} Router:{Source:edgecontroller Group:resource Operation:update Resource:default/pod/led-light-mapper-deployment-94bbdf88-26h2d} Content:map[metadata:map[creationTimestamp:2020-03-18T10:23:50Z deletionGracePeriodSeconds:30 deletionTimestamp:2020-03-18T23:40:09Z generateName:led-light-mapper-deployment-94bbdf88- labels:map[app:led-light-mapper pod-template-hash:94bbdf88] name:led-light-mapper-deployment-94bbdf88-26h2d namespace:default ownerReferences:[map[apiVersion:apps/v1 blockOwnerDeletion:true controller:true kind:ReplicaSet name:led-light-mapper-deployment-94bbdf88 uid:52c44b48-1214-4b10-9007-23093a953a40]] resourceVersion:391570 selfLink:/api/v1/namespaces/default/pods/led-light-mapper-deployment-94bbdf88-26h2d uid:12002c7e-69fe-4a31-bf66-759d78380abe] spec:map[containers:[map[image:latelee/led-light-mapper:v1.1 imagePullPolicy:IfNotPresent name:led-light-mapper-container resources:map[] securityContext:map[privileged:true] terminationMessagePath:/dev/termination-log terminationMessagePolicy:File volumeMounts:[map[mountPath:/opt/kubeedge/ name:config-volume] map[mountPath:/var/run/secrets/kubernetes.io/serviceaccount name:default-token-gb4kq readOnly:true]]]] dnsPolicy:ClusterFirst enableServiceLinks:true hostNetwork:true nodeName:latelee.org.ttucon-2142ec priority:0 restartPolicy:Always schedulerName:default-scheduler securityContext:map[] serviceAccount:default serviceAccountName:default terminationGracePeriodSeconds:30 tolerations:[map[effect:NoExecute key:node.kubernetes.io/not-ready operator:Exists tolerationSeconds:300] map[effect:NoExecute key:node.kubernetes.io/unreachable operator:Exists tolerationSeconds:300]] volumes:[map[configMap:map[defaultMode:420 name:device-profile-config-edge-node2] name:config-volume] map[name:default-token-gb4kq secret:map[defaultMode:420 secretName:default-token-gb4kq]]]] status:map[phase:Pending qosClass:BestEffort]]}
DNS warnings:
I0319 16:25:18.563472 17947 record.go:24] Warning MissingClusterDNS kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
I0319 16:25:18.563724 17947 record.go:24] Warning MissingClusterDNS pod: "webgin-deployment-747c6887f5-dwmtb_default(1ceb1dd6-6dae-4aff-a2c6-d0de64373031)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
I0319 16:25:18.563902 17947 record.go:19] Warning DNSConfigForming Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 8.8.8.8 8.8.4.4 2001:4860:4860::8888
E0319 16:25:18.564035 17947 dns.go:135] Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 8.8.8.8 8.8.4.4 2001:4860:4860::8888
I0319 16:30:09.037479 17947 edged.go:808] consume added pod [webgin-deployment-7ccff86d8b-s227c] successfully
I0319 16:30:10.506631 17947 record.go:19] Normal Started Started container webgin
E0319 16:30:10.507199 17947 kuberuntime_container.go:172] Failed to create legacy symbolic link "/var/log/containers/webgin-deployment-747c6887f5-f6547_default_webgin-1772b70cd7725f77c30b9cf47e3ce57159d9fdccf47c0c19aed8edf779c52c16.log" to container "1772b70cd7725f77c30b9cf47e3ce57159d9fdccf47c0c19aed8edf779c52c16" log "/var/log/pods/default_webgin-deployment-747c6887f5-f6547_abc27c3c-50f1-49e9-9f2e-b00fa802dc7f/webgin/0.log": symlink /var/log/pods/default_webgin-deployment-747c6887f5-f6547_abc27c3c-50f1-49e9-9f2e-b00fa802dc7f/webgin/0.log /var/log/containers/webgin-deployment-747c6887f5-f6547_default_webgin-1772b70cd7725f77c30b9cf47e3ce57159d9fdccf47c0c19aed8edf779c52c16.log: no such file or directory
I0319 16:30:10.507557 17947 edged.go:808] consume added pod [webgin-deployment-747c6887f5-f6547] successfully
I0319 16:30:10.667156 17947 edged.go:648] sync loop ignore event: [ContainerDied], with pod [1ceb1dd6-6dae-4aff-a2c6-d0de64373031] not found
W0319 16:30:10.685178 17947 docker_sandbox.go:394] failed to read pod IP from plugin/docker: Couldn't find network status for default/webgin-deployment-747c6887f5-f6547 through plugin: invalid network status for
W0319 16:30:10.871129 17947 docker_sandbox.go:394] failed to read pod IP from plugin/docker: Couldn't find network status for default/webgin-deployment-747c6887f5-f6547 through plugin: invalid network status for
I0319 16:30:10.914857 17947 container_manager_linux.go:880] Found 44 PIDs in root, 44 of them are not to be moved
I0319 16:30:11.088286 17947 edged.go:645] sync loop get event [ContainerStarted], ignore it now.
I0319 16:30:11.327738 17947 edged.go:645] sync loop get event [ContainerStarted], ignore it now.
W0319 16:30:12.413498 17947 docker_sandbox.go:394] failed to read pod IP from plugin/docker: Couldn't find network status for default/webgin-deployment-747c6887f5-f6547 through plugin: invalid network status for
W0319 16:30:12.543879 17947 docker_sandbox.go:394] failed to read pod IP from plugin/docker: Couldn't find network status for default/webgin-deployment-747c6887f5-f6547 through plugin: invalid network status for
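The DNSConfigForming warning in the log above appears when the node's resolv.conf lists more nameservers than the kubelet DNS code keeps (three), so the extra ones are dropped. A minimal sketch for checking a resolv.conf ahead of time follows; the function name `split_nameservers` is hypothetical, and it only parses text in the usual resolv.conf format rather than calling any KubeEdge or Kubernetes API.

```python
MAX_NAMESERVERS = 3  # number of nameservers kept; the rest are omitted


def split_nameservers(resolv_conf: str, limit: int = MAX_NAMESERVERS):
    """Return (applied, omitted) nameserver lists parsed from resolv.conf text."""
    servers = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        # Only lines of the form "nameserver <address>" matter here;
        # comment lines start with '#' or ';' and never match.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers[:limit], servers[limit:]
```

If the second list is non-empty for the contents of /etc/resolv.conf on the edge host, the same warning can be expected, and trimming the file to three nameserver entries should silence it.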
Log of a successfully deployed pod:
I0319 16:25:18.564503 17947 edged.go:808] consume added pod [webgin-deployment-747c6887f5-dwmtb] successfully
I0319 16:25:18.564974 17947 proxy.go:318] [L4 Proxy] process other resource: kube-system/endpoints/kube-scheduler
I0319 16:25:18.688263 17947 edged_volumes.go:54] Using volume plugin "kubernetes.io/empty-dir" to mount wrapped_default-token-gb4kq