Fixing a highly available eureka cluster that fails to start on k8s

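Yesterday I ran into a problem: a eureka cluster that ran fine outside k8s stopped working once it was deployed to k8s.

This post records how I tracked the problem down.

Checking the logs with kubectl logs -f XXX -n XXX showed the following error: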


com.netflix.discovery.shared.transport.TransportException: Cannot execute request on any known server
	at com.netflix.discovery.shared.transport.decorator.RetryableEurekaHttpClient.execute(RetryableEurekaHttpClient.java:111) ~[eureka-client-1.6.2.jar:1.6.2]
	at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator.getApplications(EurekaHttpClientDecorator.java:134) ~[eureka-client-1.6.2.jar:1.6.2]
	at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator$6.execute(EurekaHttpClientDecorator.java:137) ~[eureka-client-1.6.2.jar:1.6.2]
	at com.netflix.discovery.shared.transport.decorator.SessionedEurekaHttpClient.execute(SessionedEurekaHttpClient.java:77) ~[eureka-client-1.6.2.jar:1.6.2]
	at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator.getApplications(EurekaHttpClientDecorator.java:134) ~[eureka-client-1.6.2.jar:1.6.2]
	at com.netflix.discovery.DiscoveryClient.getAndStoreFullRegistry(DiscoveryClient.java:1013) [eureka-client-1.6.2.jar:1.6.2]
	at com.netflix.discovery.DiscoveryClient.getAndUpdateDelta(DiscoveryClient.java:1055) [eureka-client-1.6.2.jar:1.6.2]
	at com.netflix.discovery.DiscoveryClient.fetchRegistry(DiscoveryClient.java:929) [eureka-client-1.6.2.jar:1.6.2]
	at com.netflix.discovery.DiscoveryClient.refreshRegistry(DiscoveryClient.java:1451) [eureka-client-1.6.2.jar:1.6.2]
	at com.netflix.discovery.DiscoveryClient$CacheRefreshThread.run(DiscoveryClient.java:1418) [eureka-client-1.6.2.jar:1.6.2]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]

My eureka yaml file is as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: eureka01-deployment
  namespace: wx-prod
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: eureka01-prod
        regcenter: eureka
        track: stable
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: "kubernetes.io/hostname"
              labelSelector:
                matchExpressions:
                  - key: regcenter
                    operator: In
                    values:
                      - eureka
      containers:
        - name: eureka
          image: harbor.prod.com/kube-prod/eureka:1.0
          imagePullPolicy: Always
          resources:
            requests:
              cpu: "500m"
              memory: "1024Mi"
            limits:
              cpu: "1000m"
              memory: "2048Mi"
          ports:
            - containerPort: 8761
          env:
            - name: eureka.server.enable-self-preservation
              value: "false"
            - name: eureka.client.service-url.defaultZone
              value: http://wx:[email protected]:8761/eureka/,http://vass_wx:[email protected]:8761/eureka/
      imagePullSecrets:
        - name: harbor-secret-name
  selector:
    matchLabels:
      app: eureka01-prod

---
apiVersion: v1
kind: Service
metadata:
  name: eureka01-service
  namespace: wx-prod
  labels:
    app: eureka01-svc
spec:
  type: NodePort
  selector:
    app: eureka01-prod
  ports:
    - port: 8761
      targetPort: 8761
      nodePort: 30001

 

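Taken literally, the error means the eureka nodes cannot find each other. Since this was a k8s environment I had just built, and the eureka URLs use k8s internal domain names, my first thought was to check whether coreDNS was working properly.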

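The troubleshooting steps below follow this post: https://blog.csdn.net/alva_xu/article/details/85160552

1. Run kubectl get pod -n kube-system and check that the coreDNS pods are healthy.

2. Pull busybox and start it inside the k8s cluster.

3. Run kubectl exec -ti busybox -- nslookup kubernetes.default to confirm whether domain name resolution works.

Everything turned out to be completely normal.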
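
A minimal sketch of a throwaway Pod for steps 2 and 3. The image tag busybox:1.28 and the sleep command are assumptions made for illustration; the Pod is placed in the default namespace so that the kubectl exec command in step 3 works without -n.

```yaml
# Throwaway Pod used only to run nslookup from inside the cluster.
apiVersion: v1
kind: Pod
metadata:
  name: busybox          # matches the name used by "kubectl exec -ti busybox"
  namespace: default
spec:
  restartPolicy: Never
  containers:
    - name: busybox
      image: busybox:1.28            # assumed tag, not taken from the original setup
      command: ["sleep", "3600"]     # keep the container alive for an hour
```

Saved as busybox.yaml (a hypothetical file name), it can be created with kubectl apply -f busybox.yaml and deleted once the check is done. Assuming the default cluster domain cluster.local, the service defined earlier should also resolve under the name eureka01-service.wx-prod.svc.cluster.local.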

 

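The environment was fine, so the problem had to be a mismatch between the environment and the code. I therefore went on to check the registry's bootstrap file: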

server:
  port: ${hostPort}
eureka:
  client:
    service-url:
      defaultZone: http://wx:wx@${eureka.node01.name}:${eureka.node01.port}/eureka/,http://wx:wx@${eureka.node02.name}:${eureka.node02.port}/eureka/
    fetch-registry: true
    register-with-eureka: true
  instance:
    # requires hosts entries to be configured
    #hostname: ${eureka.hostname}
    instance-id: ${spring.application.name}:${server.port}
    prefer-ip-address: true
    ip-address: ${ipAddress}
  server:
    peer-node-read-timeout-ms: 1000
    #### self-preservation; set to true in production
    enable-self-preservation: ${selfPreservation:true}
spring:
  application:
    name: eureka
security:
  basic:
    enabled: true
  user:
    name: wx
    password: wx

Then it hit me: if all three eureka nodes register with the same application.name:port as their instance ID, could that be what causes the problem? So I changed the file to:

server:
  port: 8761
eureka:
  client:
    service-url:
      defaultZone: http://localhost:8761/eureka/
    fetch-registry: true
    register-with-eureka: true
  instance:
    instance-id: ${spring.cloud.client.ipAddress}:${server.port}
    prefer-ip-address: true
  server:
    peer-node-read-timeout-ms: 1000
    #### self-preservation; set to true in production
    enable-self-preservation: false
spring:
  application:
    name: eureka
security:
  basic:
    enabled: true
  user:
    name: wx
    password: wx

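Because the deployment inside k8s uses ClusterIP, each eureka instance should have a different IP, so using IP + port as the instance ID keeps the IDs distinct.

After restarting with this configuration, the problem was completely solved. Noting it down here for future reference.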
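
To make the difference concrete, here is a sketch of how the two instance-id settings resolve, assuming every node listens on port 8761. The IP addresses in the comments are made up for illustration.

```yaml
eureka:
  instance:
    # Before: resolves to "eureka:8761" on every node, so all nodes share one ID.
    # instance-id: ${spring.application.name}:${server.port}

    # After: resolves to something like "10.244.1.7:8761" on one node and
    # "10.244.2.9:8761" on another (hypothetical addresses), so each ID is unique.
    instance-id: ${spring.cloud.client.ipAddress}:${server.port}
    prefer-ip-address: true
```

On Spring Cloud Finchley and later, the property is spelled spring.cloud.client.ip-address; the camelCase form used in this post belongs to earlier release lines.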
