Temporary workaround for Spring Boot Admin flapping between Offline and Up due to Actuator health timeouts

Symptoms

Starting at 7:23 in the morning, Spring Boot Admin (SBA) monitoring repeatedly sent service offline / up notifications.

Temporary workaround

Change the SBA configuration:

spring.boot.admin.monitor.period=180000
spring.boot.admin.monitor.status-lifetime=180000
spring.boot.admin.monitor.read-timeout=120000

Change the Zuul configuration:

ribbon.ReadTimeout=120000
ribbon.ConnectTimeout=120000

TODO

  • What caused actuator health to suddenly become slow?

Troubleshooting

The SBA log shows:

Couldn't retrieve status for Application [id=20e256cd, name=ADMIN-SERVICE, managementUrl=http://172.19.222.xxx:yyyy/, healthUrl=http://172.19.222.xxx:yyyy/health, serviceUrl=http://172.19.222.xxx:yyyy/]

org.springframework.web.client.ResourceAccessException: I/O error on GET request for "http://172.19.222.xxx:yyyy/health": Read timed out; nested exception is java.net.SocketTimeoutException: Read timed out

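Services in availability zones B and D show the same symptom;

Ruled out the network: monitor / gateway / rest are all in zone B, and only rest-d1 is in zone D;

Ruled out the SBA service: restarting the sba service had no effect;

Ruled out the SBA machine: rebooting the sba machine had no effect;

Tentatively narrowed the problem down to actuator: calling actuator health manually timed out;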

http http://172.19.222.xxx:yyyy/health

http: error: Request timed out (30s).
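
To repeat the manual check and record how long each call takes, the request can be timed from a small script. This is a minimal sketch and was not part of the original investigation. probe_health is a hypothetical helper name, the 30 s default mirrors the timeout in the output above, and the URL has to be replaced with the real host and port, which are masked in this write-up.

```python
import socket
import time
import urllib.error
import urllib.request


def probe_health(url, timeout_s=30.0):
    """Call a health URL once and return (outcome, elapsed_seconds).

    outcome is the HTTP status code (int), or the string "timeout" or
    "error" when no HTTP response arrived.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()
            outcome = resp.status
    except urllib.error.HTTPError as exc:
        # A non-2xx answer is still an answer (e.g. an unhealthy status code).
        outcome = exc.code
    except (socket.timeout, TimeoutError):
        # Read timeouts are raised directly while waiting for the response.
        outcome = "timeout"
    except urllib.error.URLError as exc:
        # Connect-phase failures are wrapped in URLError.
        is_timeout = isinstance(exc.reason, (socket.timeout, TimeoutError))
        outcome = "timeout" if is_timeout else "error"
    except OSError:
        outcome = "error"
    return outcome, time.monotonic() - start
```

Calling probe_health on the health URL every few minutes and logging the elapsed time would show whether the endpoint is consistently slow or only slow in bursts.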

Changed the actuator configuration to disable unused or unimportant health checks; no effect;

management.health.db.enabled=false
management.health.mail.enabled=false
management.health.redis.enabled=false
management.health.mongo.enabled=false

Checked the MySQL monitoring and confirmed the database was healthy;

Changed the SBA configuration to raise the timeouts; no effect;

spring.boot.admin.monitor.period=180000
spring.boot.admin.monitor.status-lifetime=180000
spring.boot.admin.monitor.read-timeout=120000

Changed the Zuul configuration to raise the timeouts; this worked.

ribbon.ReadTimeout=120000
ribbon.ConnectTimeout=120000
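
One possible reading of this result: when a request crosses several hops, each with its own read timeout, the shortest timeout fires first, so raising only the SBA value changes nothing while a lower Ribbon limit is still in place. The sketch below only illustrates that reasoning. The function names are hypothetical, and the 5000 ms Ribbon value is an invented placeholder, not a measured or documented default.

```python
def effective_read_timeout(hops):
    """Return (hop_name, timeout_ms) for the hop whose read timeout fires first.

    hops maps a label for each hop to its read timeout in milliseconds.
    """
    if not hops:
        raise ValueError("at least one hop is required")
    name = min(hops, key=hops.get)
    return name, hops[name]


def will_time_out(hops, response_ms):
    """True if a response taking response_ms exceeds the shortest read timeout."""
    _, limit_ms = effective_read_timeout(hops)
    return response_ms > limit_ms


# Hypothetical values: SBA already raised, Ribbon still at a lower placeholder limit.
before = {"sba.read-timeout": 120000, "ribbon.ReadTimeout": 5000}
# After also raising the Ribbon read timeout, as in the configuration above.
after = {"sba.read-timeout": 120000, "ribbon.ReadTimeout": 120000}

print(effective_read_timeout(before))
print(will_time_out(before, 30000), will_time_out(after, 30000))
```

With these placeholder numbers, a health response that takes 30 s is cut off in the first setup and allowed through in the second, which matches the order of the two attempts above. It does not explain why the health endpoint became slow in the first place.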

References

Service health status requests time out after integrating Spring Boot Admin with Eureka and Actuator https://www.bitdoom.com/2018/03/21/p140/

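After starting the spring boot admin project, many services showed the status DOWN. This turned out to be caused by the actuator health endpoint responding very slowly and timing out. After investigation, the database-related management health check properties had to be disabled, which resolved the problem.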

Long health request + Read Timeout https://github.com/codecentric/spring-boot-admin/issues/494

The values shown in the ui are fetched via the zuul proxy. You can use zuul.host.socket-timeout-millis (default: 10000) and zuul.host.connect-timeout-millis (default: 2000) to control these timeouts.
But you should better fix your slow /health responses, as they are made quite often.

spring-cloud-zuul timeout configuration does not work https://stackoverflow.com/questions/49525707/spring-cloud-zuul-timeout-configuration-does-not-work

Try to define the below properties instead if you are using Zuul with Eureka.
ribbon:
  ReadTimeout: 60000
  ConnectTimeout: 20000
If you are using Zuul with Eureka, Zuul will use RibbonRoutingFilter for routing instead of SimpleHostRoutingFilter. In this case, HTTP requests are handled by Ribbon.

Table 3. Spring Boot Admin Server configuration options
https://codecentric.github.io/spring-boot-admin/1.5.5/#spring-boot-admin-server
