A Discussion of Service Discovery Systems in Microservice Architecture

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article discusses the technology choices and design trade-offs of service discovery from several angles, including service invocation patterns, consistency trade-offs, and health-check modes for service providers. We hope it helps you choose or operate a service discovery system more smoothly."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Background of Service Discovery Systems"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When you first came across microservice governance, you may have wondered: why is a service discovery system necessary at all? Couldn't a service simply read a local configuration file at startup and then receive dynamic updates pushed from a remote configuration system?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In fact, that approach works when the number of service nodes is small. But what about the following scenarios?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. In the microservice world, scaling service nodes out and in and iterating on service versions are routine, so service consumers must detect changes in node information (network addresses, node counts) quickly."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. When the number of service nodes is very large, node failures also become routine. Service providers must report their health status promptly so that unhealthy nodes can be removed (or have their weight reduced) in time."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. 
When a service is deployed across multiple availability zones, node information must be synchronized between zones. When the service in one zone becomes unavailable, consumers can switch to another zone promptly (automatically through the load-balancing algorithm, or manually in an emergency), which enables multi-active deployment and high availability."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4. The storage behind service discovery should be distributed, so that basic service discovery still works when some discovery nodes are unavailable."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5. Besides IP and port, we need more information, such as node weights and routing tags."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Traditional approaches such as file-based configuration or DNS cannot satisfy all of the above requirements at once, so we need to design a service discovery system that fits the microservice architecture."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Service-to-Service Invocation Patterns"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#006EFF","name":"user"}}],"text":"Client-Side Discovery"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The client asks the service discovery system (think of it as a database that stores the location of every node of every service provider) for the IP and port of all instances of a given provider, applies a load-balancing strategy, and calls a service instance directly."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A classic example is the solution from Netflix: Netflix Eureka provides service discovery, while Netflix 
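Ribbon, a communication SDK library integrated into the client, provides load balancing and failover."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/4a\/c4\/4a81c440a5cd6f8528dyyc9c9cbd8dc4.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This pattern removes the dependency on a centralized single point (an API gateway or load balancer), which avoids the performance bottleneck and failure risk of that single point. Because the load-balancing logic lives in the client, each client can also choose an algorithm that suits its own configuration, such as consistent hashing. The pattern has drawbacks, though. Client-side load balancing is distributed: every client acts on its own without a global view, so in some situations heavy contention among clients leaves the backend provider nodes unevenly loaded. In addition, the client's business logic is coupled with the service discovery logic. If services are written in different programming languages, an SDK is needed for each language, and if the discovery logic changes one day, every client node has to be redeployed."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#006EFF","name":"user"}}],"text":"Server-Side Discovery"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here the logic that the client used to run (fetching the service list, load balancing, circuit breaking, and failover) is extracted into a dedicated service. Unlike a traditional load balancer, this load balancer works closely with the service discovery system: it subscribes in real time to the list of provider nodes, acts as a reverse proxy, and dispatches each request to a suitable endpoint."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A representative example is the Kubernetes service discovery solution: the kube-proxy running on each Node watches Services and 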
Endpoints objects in real time. When the kube-proxy on a Node detects a change to Services or Endpoints, it installs the corresponding iptables or IPVS rules on that Node so that users can reach the service through the Service's ClusterIP. Once kube-proxy has set up the required rules, a request sent to the ClusterIP from a Node or a client Pod inside the cluster is routed and forwarded by those iptables or IPVS rules and finally delivered to a real backend Pod."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/00\/f9\/00c18a52f94101d4dd8dyy167c0a90f9.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This pattern is transparent to the client: all details are isolated between the load balancer and the service discovery system, so the cross-language problem above does not arise, and updating the logic only requires redeploying the load balancer and the service registry. The costs are equally clear. The architecture gains an extra forwarding hop, so latency increases; the system gains another point of failure, which makes operations harder; and the load balancer itself may become a performance bottleneck."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We rarely encounter server-side discovery in everyday work, but because it is completely non-intrusive, it suits legacy systems as a transitional step toward a microservice architecture."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Consistency Trade-offs in Service Discovery"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Let us first recall the CAP theorem: in a distributed system, Consistency, Availability, and Partition 
Tolerance cannot all hold at the same time."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Consistency: at any given moment, all replicas of the data in the distributed system are in the same state."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Availability: the system can still respond to user requests after some nodes in the cluster go down."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Partition tolerance: the system keeps working when communication between network partitions fails."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CP service discovery systems built on Raft- or Paxos-style consensus algorithms, such as Consul, Zookeeper, and Etcd, inevitably sacrifice high availability in order to guarantee linearizable (strongly consistent) data. During a network partition, for example, heartbeat, registration, and deregistration operations can time out and fail. And because the consensus algorithm redirects every write request to the leader node, writes cannot scale horizontally. AP service discovery systems such as Eureka instead emphasize eventual consistency (data converges to a consistent state within a bounded time, for example within 3s), sacrificing data consistency to maximize service availability."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/6d\/e0\/6d96727feffc3e59c3da1a7be03dd9e0.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Service discovery with Zookeeper"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Consider the cross-datacenter disaster-recovery scenario in the figure above, where Zookeeper, which guarantees strong consistency, is used for service discovery. If the network between datacenter 1 and datacenter 2 is cut for some reason, Provider B cannot register with the Zookeeper Follower, because every write to a Zookeeper Follower is strongly consistent and must be synchronized to the ZK Leader. Datacenter 2 can therefore no longer register anything. Yet the network between Consumer B and Provider B is still healthy and they could call each other without any problem; Consumer B cannot reach Provider B only because Provider B failed to register. The lesson is that a service discovery system should put service availability first: refusing registration in order to preserve data consistency is unacceptable in production. We could, of course, deploy two independent Zookeeper clusters, one per datacenter, and write a tool that synchronizes data between them so that they act as master and slave for each other, but that introduces new complexity and demands considerable effort to keep the synchronized data consistent."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So does adopting the eventually consistent design of Netflix Eureka cover every scenario and solve everything? Consider the following situation:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/e0\/3b\/e06d0a051a9405c9c9e6460f2b13553b.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Eureka serverA and Eureka serverB synchronize data with each other, but the network between Eureka serverC and Eureka serverA and serverB has failed, so serverC cannot synchronize."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ProviderB registers its service information with Eureka serverA and keeps reporting heartbeats, so ProviderB's node information exists on both Eureka serverA and Eureka serverB. Because replication is broken, however, it cannot be synchronized to Eureka serverC. When ConsumerA first queries Eureka serverA, it gets correct node information, but when its next request reaches Eureka serverC it gets wrong node information, which overwrites the correct information obtained earlier."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To avoid this, the logic above has to change: the client SDK needs to query all three Eureka 
server nodes, read the dirty time from the ProviderB node information returned by each of them (the dirty time is maintained by ProviderB and carried in every heartbeat report, which guarantees that it increases monotonically), and pick the entry with the latest dirty time. This ensures that it reads the correct information. In practice this scenario is rare in production, because in most cases the Eureka servers, Providers, and Consumers are all deployed in the same datacenter: if Eureka serverC cannot communicate with the other Eureka server nodes, ConsumerA most likely cannot reach Eureka serverC either. And if Eureka serverC is deployed in another datacenter, ConsumerA would not normally query it across datacenters."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Health-Check Modes for Service Providers"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#006EFF","name":"user"}}],"text":"Client Heartbeat"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The client proactively sends a heartbeat to the server at a fixed interval to indicate that its service is healthy. The heartbeat can be sent over TCP or over HTTP."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Alternatively, the client can keep a long-lived socket 
connection to the server and implement its own heartbeat over it."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With client heartbeats, however, both a maintained long connection and proactive heartbeats only show that the link is healthy; they do not necessarily show that the service itself is healthy."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#006EFF","name":"user"}}],"text":"Server-Side Active Probing"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The server calls an HTTP endpoint of the service publisher to perform the health check."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For RPC applications that do not expose an HTTP service, the server calls an interface of the service publisher to perform the health check."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A script can be executed to perform a comprehensive check."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Having the server actively call the service is a fairly accurate way to check health: a successful response shows that the service really is healthy. But active probing has its own problems. Having the registry actively call an interface of an RPC 
service cannot be done in a generic way. And in many environments the network from the registry to the service publisher is not reachable, so the server cannot initiate health checks; a common workaround is to deploy an agent on the host machine that probes the interface on the server's behalf, which is how Consul implements its health checks."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/a8\/52\/a88d8068203cf9d3057d0bbaf4372152.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Consumer-Side Subscription Mechanisms"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Push: there are two classic implementations. One pushes over a long-lived socket connection, with Zookeeper as the typical example; the other uses long polling over HTTP connections. Both deliver changes as soon as they happen. However, both socket-based push and HTTP long polling can lose notify messages, and both are complex to implement."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Periodic polling: in Eureka, for example, the client pulls the registry from the server at a fixed interval (30 seconds by default) to keep its copy up to date. This subscription model, based on short-lived HTTP connections, is the simplest and most general to implement. Its downside is that node information can be up to 30s stale, and during that window requests may be sent to nodes that have already gone offline."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Combined push and pull: in Consul, for example, the client and the Consul 
server establish an HTTP long connection that lasts at most 30s. If anything changes during that period, the change is pushed immediately; if nothing changes, the client opens a new connection as soon as the 30s elapse and starts the next subscription round. This model combines the convenience and generality of short-lived HTTP connections with the benefit of immediate push."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Bringing Services Online and Offline"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#006EFF","name":"user"}}],"text":"Graceful Startup"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. The service exposes a generic health-check endpoint (for example, the Spring Boot Actuator module ships with the \/actuator\/health endpoint, and gRPC defines a standard health-checking model). The service discovery SDK checks this endpoint to determine whether the service is ready to receive traffic, and registers the node only once it is ready."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. 
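The SDK can also expose a callback interface; once the service is fully ready, it calls this interface to tell the SDK to register."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#006EFF","name":"user"}}],"text":"Graceful Shutdown"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After the service discovery SDK receives a SIGTERM or SIGINT signal from the system, it must first deregister its own instance. If the service framework provides a graceful shutdown capability, the SDK can then call it directly; the call blocks until all in-flight requests have completed or timed out, and only then does the process exit (a gRPC server provides a graceful shutdown method directly, and a Spring web application can achieve this with Java's ThreadPoolExecutor.awaitTermination). Without graceful shutdown support, the service has to sleep for a while before exiting so that all HTTP and RPC requests can finish."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Disaster Recovery and High Availability for Service Discovery"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Server Side"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Node information is stored in a distributed fashion to begin with, so losing a minority of nodes does not affect overall availability."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When a majority of nodes are down, a strongly consistent system enters a read-only mode in which writes are rejected (for example, Zookeeper, or Consul with stale reads enabled). In an eventually consistent system, the client SDK automatically retries and switches to a healthy node, so neither reads nor writes are affected."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When every server node is down, the server must have persisted the Provider node information registered earlier, and after restarting it should stay in a protection mode for a while. During that period it does not evict unhealthy Provider nodes (their heartbeats could not be reported while the server was down); otherwise a large number of Provider nodes could expire within a single TTL."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Protection against transient network outages: when the server detects that heartbeats from a large share of provider nodes have stopped arriving, it automatically enters protection mode, in which provider nodes are not evicted for failed heartbeats."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#006EFF","name":"user"}}],"text":"Client Side"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The client SDK needs to be able to skip unavailable nodes: when one server node is unavailable, it should immediately switch to the next node and retry (sleeping for a random 0-3s before switching so that a retry storm does not overwhelm a single node). Also check that the SDK's per-request timeout is set sensibly. We found that the default timeout in some official service discovery SDKs is too long; for example, the Java Consul 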
SDK has a default timeout of 10 minutes. In production, if a transient network outage prevents the response packet from coming back, the SDK's heartbeat request blocks for the whole time, the next heartbeat cannot be sent, and the node is wrongly taken offline in the registry."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When every server node is unavailable, the SDK can keep serving from its in-memory cache."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If the client restarts and the in-memory data is gone, it falls back to local configuration."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#006EFF","name":"user"}}],"text":"Metadata in Service Registration"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Is it enough for a service to register only its serviceName, IP, and port? In a large microservice system, consumers may also need to know the protocols a service supports, its tags (for example, A\/B testing and blue-green releases filter on these tags as routing information), its health status, and its scheduling weight. In production, however, putting too much information into the registry is generally discouraged because it degrades performance; API descriptions generated by Swagger, for instance, are better stored separately."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Summary"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These modest observations are the pitfalls we ran into, and the lessons we took away, while developing and maintaining the service discovery system in the Tencent Cloud microservice framework TSF. If you would rather not step into the same pitfalls, you can use "},{"type":"link","attrs":{"href":"http:\/\/mp.weixin.qq.com\/s?__biz=MzAxMTQ2NTA1Mg==&mid=2247483676&idx=1&sn=009b08384b8c46eddc6d1941fb72586c&chksm=9b41f92fac367039ef03378c7580cd9b7c4413c2801ce839b0f524fe0e1f14fb4ce8257da9e3&scene=21#wechat_redirect","title":null,"type":null},"content":[{"type":"text","text":"Tencent Cloud microservice framework TSF"}]},{"type":"text","text":" directly, which provides service discovery and other microservice governance features."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule"},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Header image: Unsplash"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Author: 曹國樑"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original article: "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/IhsLvbhr8-jwg4nW-P7CRQ","title":null,"type":null},"content":[{"type":"text","text":"https:\/\/mp.weixin.qq.com\/s\/IhsLvbhr8-jwg4nW-P7CRQ"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original title: A Discussion of Service Discovery Systems in Microservice Architecture"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Source: Tencent Cloud Middleware (騰訊雲中間件), WeChat official account [ID:gh_6ea1bc2dd5fd]"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Reprint notice: copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, credit the source."}]}]}