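Fixing a Eureka high-availability cluster that fails to start in a k8s environment

Yesterday I ran into a problem: a Eureka cluster that ran fine outside k8s would no longer run once it was moved into a k8s environment.

Here is a record of how I tracked it down:

Checking the logs with kubectl logs -f XXX -n XXX showed this error: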


com.netflix.discovery.shared.transport.TransportException: Cannot execute request on any known server
	at com.netflix.discovery.shared.transport.decorator.RetryableEurekaHttpClient.execute(RetryableEurekaHttpClient.java:111) ~[eureka-client-1.6.2.jar:1.6.2]
	at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator.getApplications(EurekaHttpClientDecorator.java:134) ~[eureka-client-1.6.2.jar:1.6.2]
	at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator$6.execute(EurekaHttpClientDecorator.java:137) ~[eureka-client-1.6.2.jar:1.6.2]
	at com.netflix.discovery.shared.transport.decorator.SessionedEurekaHttpClient.execute(SessionedEurekaHttpClient.java:77) ~[eureka-client-1.6.2.jar:1.6.2]
	at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator.getApplications(EurekaHttpClientDecorator.java:134) ~[eureka-client-1.6.2.jar:1.6.2]
	at com.netflix.discovery.DiscoveryClient.getAndStoreFullRegistry(DiscoveryClient.java:1013) [eureka-client-1.6.2.jar:1.6.2]
	at com.netflix.discovery.DiscoveryClient.getAndUpdateDelta(DiscoveryClient.java:1055) [eureka-client-1.6.2.jar:1.6.2]
	at com.netflix.discovery.DiscoveryClient.fetchRegistry(DiscoveryClient.java:929) [eureka-client-1.6.2.jar:1.6.2]
	at com.netflix.discovery.DiscoveryClient.refreshRegistry(DiscoveryClient.java:1451) [eureka-client-1.6.2.jar:1.6.2]
	at com.netflix.discovery.DiscoveryClient$CacheRefreshThread.run(DiscoveryClient.java:1418) [eureka-client-1.6.2.jar:1.6.2]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]

My Eureka YAML file is as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: eureka01-deployment
  namespace: wx-prod
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: eureka01-prod
        regcenter: eureka
        track: stable
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: "kubernetes.io/hostname"
              labelSelector:
                matchExpressions:
                  - key: regcenter
                    operator: In
                    values:
                      - eureka
      containers:
        - name: eureka
          image: harbor.prod.com/kube-prod/eureka:1.0
          imagePullPolicy: Always
          resources:
            requests:
              cpu: "500m"
              memory: "1024Mi"
            limits:
              cpu: "1000m"
              memory: "2048Mi"
          ports:
            - containerPort: 8761
          env:
            - name: eureka.server.enable-self-preservation
              value: "false"
            - name: eureka.client.service-url.defaultZone
              value: http://wx:[email protected]:8761/eureka/,http://vass_wx:[email protected]:8761/eureka/
      imagePullSecrets:
        - name: harbor-secret-name
  selector:
    matchLabels:
      app: eureka01-prod

---
apiVersion: v1
kind: Service
metadata:
  name: eureka01-service
  namespace: wx-prod
  labels:
    app: eureka01-svc
spec:
  type: NodePort
  selector:
    app: eureka01-prod
  ports:
    - port: 8761
      targetPort: 8761
      nodePort: 30001

 

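Taken literally, the error means the Eureka nodes cannot find each other. Since this was a k8s environment I had only just built, and the Eureka URLs use k8s internal domain names, my first thought was to check whether CoreDNS was working properly.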

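The troubleshooting steps were as follows.

I followed the steps from this post: https://blog.csdn.net/alva_xu/article/details/85160552

1. Run kubectl get pod -n kube-system and check that the CoreDNS pods are healthy;

2. Pull busybox and start it inside the k8s cluster;

3. Run kubectl exec -ti busybox -- nslookup kubernetes.default to confirm whether domain name resolution works.

Everything turned out to be working normally.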
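The three checks above can be wrapped in a small helper. This is only a sketch under some assumptions: kubectl is already configured for the cluster, the CoreDNS pods carry the usual k8s-app=kube-dns label, and the busybox:1.28 image tag and the function name check_cluster_dns are my own choices rather than anything from the referenced post.

```shell
#!/bin/sh
# Sketch of the DNS checks described above (names and image tag are assumptions).

check_cluster_dns() {
    # Name to resolve; defaults to the API server's service name.
    name="${1:-kubernetes.default}"

    # 1. Are the CoreDNS pods present and Running?
    kubectl get pods -n kube-system -l k8s-app=kube-dns || return 1

    # 2. Start a throwaway busybox pod and wait until it is Ready.
    kubectl run busybox --image=busybox:1.28 --restart=Never -- sleep 3600 || return 1
    kubectl wait --for=condition=Ready pod/busybox --timeout=60s || return 1

    # 3. Resolve the name from inside the cluster.
    kubectl exec busybox -- nslookup "$name"
    rc=$?

    # Remove the test pod regardless of the lookup result.
    kubectl delete pod busybox --ignore-not-found
    return "$rc"
}
```

Calling check_cluster_dns with no argument resolves kubernetes.default as in step 3; passing a service name such as eureka01-service.wx-prod checks the Eureka service itself.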

 

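With the environment ruled out, the only remaining suspect was a mismatch between the environment and the application, so I went on to check the registry's bootstrap file: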

server:
  port: ${hostPort}
eureka:
  client:
    service-url:
      defaultZone: http://wx:wx@${eureka.node01.name}:${eureka.node01.port}/eureka/,http://wx:wx@${eureka.node02.name}:${eureka.node02.port}/eureka/
    fetch-registry: true
    register-with-eureka: true
  instance:
    # requires entries in hosts
    #hostname: ${eureka.hostname}
    instance-id: ${spring.application.name}:${server.port}
    prefer-ip-address: true
    ip-address: ${ipAddress}
  server:
    peer-node-read-timeout-ms: 1000
    #### self-preservation; set to true in production
    enable-self-preservation: ${selfPreservation:true}
spring:
  application:
    name: eureka
security:
  basic:
    enabled: true
  user:
    name: wx
    password: wx

Then it occurred to me: if all three Eureka nodes register with the same application.name:port as their instance ID, could that be what causes the problem? So I changed the file to:

server:
  port: 8761
eureka:
  client:
    service-url:
      defaultZone: http://localhost:8761/eureka/
    fetch-registry: true
    register-with-eureka: true
  instance:
    instance-id: ${spring.cloud.client.ipAddress}:${server.port}
    prefer-ip-address: true
  server:
    peer-node-read-timeout-ms: 1000
    #### self-preservation; set to true in production
    enable-self-preservation: false
spring:
  application:
    name: eureka
security:
  basic:
    enabled: true
  user:
    name: wx
    password: wx

Because the deployment runs inside k8s and I use ClusterIP, each Eureka instance should have a different IP, so using IP + port as the instance ID keeps the IDs distinct.
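To make the reasoning concrete, here is a tiny illustration of how many distinct instance IDs each naming scheme produces. It is a sketch, not a reproduction of Eureka's registry: the pod IPs are hypothetical, and it assumes the registry treats the instance ID as the unique key within one application name.

```shell
#!/bin/sh
# Illustration only: the pod IPs below are hypothetical.
POD_IPS="10.244.1.12 10.244.2.7 10.244.3.21"
APP_NAME="eureka"
PORT=8761

# Count how many distinct instance IDs a naming scheme yields across the pods.
# $1 is "name" (application name + port) or "ip" (pod IP + port).
count_unique_ids() {
    for ip in $POD_IPS; do
        if [ "$1" = "name" ]; then
            echo "${APP_NAME}:${PORT}"
        else
            echo "${ip}:${PORT}"
        fi
    done | sort -u | wc -l | tr -d ' '
}

echo "name:port scheme -> $(count_unique_ids name) unique ID(s)"
echo "ip:port scheme   -> $(count_unique_ids ip) unique ID(s)"
```

With the name:port scheme all three pods end up with the single ID eureka:8761, while the ip:port scheme gives each pod its own ID.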

After restarting, the problem was completely resolved. Noting it down here for future reference.
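One way to double-check the result is to list the instance IDs the registry currently holds. The helper below is a sketch with several assumptions: the function name list_instance_ids is hypothetical, the server answers on /eureka/apps with its default XML output, and the wx:wx credentials from the bootstrap file above are still in effect.

```shell
#!/bin/sh
# List the instance IDs currently registered with a Eureka server.
# $1: base URL of the server, e.g. http://<node-ip>:30001 (the NodePort above).
list_instance_ids() {
    curl -s -u wx:wx "$1/eureka/apps" \
        | grep -o '<instanceId>[^<]*</instanceId>' \
        | sed -e 's/<[^>]*>//g'
}
```

If the change took effect, each Eureka pod should appear as its own ip:port entry rather than as one shared ID.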
