Implementing Yelp's Failover Strategy

How Yelp's engineers orchestrate their traffic failover process while striking a delicate balance between reliability, performance, and cost efficiency.

On the surface, the process is simple: Yelp's site reliability engineers sometimes shift traffic to prevent user-facing errors. Behind the scenes, however, it requires intricate coordination among production systems, infrastructure teams, and hundreds of developers and the services they own. This article describes how Yelp's Production Engineering and Compute Infrastructure teams implemented a failover strategy that balances reliability, performance, and cost efficiency.

What is a traffic failover?

Yelp serves requests from two AWS regions, one on the US East Coast and one on the West Coast. Most user traffic consists of read-only requests. These are sent to the nearest region, and additional logic keeps the load evenly balanced between the two. Occasionally a region becomes unhealthy because of misconfigured infrastructure, a degraded critical datastore, or (rarely) a problem with AWS itself. When that happens, we may start returning HTTP 500 errors to users and must take remedial action quickly.

One tool Yelp uses to mitigate such incidents is failover: quickly shifting traffic from the unhealthy region to the healthy one. A partial shift relieves pressure on the failing systems and gives them room to recover. Traffic can also be shifted entirely, which is a full failover. To shift traffic, all we need to do is update a Git-controlled YAML file. Even in an emergency, however, merging and pushing that change requires review. Approval usually comes from the secondary on-call production engineer, a manager, or an engineer involved in the current incident.

[Image: https://static001.geekbang.org/infoq/0a/0a2e296cf9304553f890f23ed46c8378.jpeg]
Excerpt from the traffic management configuration file

Yelp's on-call engineers regularly practice both partial and full failovers. The drills confirm that our infrastructure can absorb sudden shifts in load and that our teams can run the procedure easily at any time. The failover itself is just a simple configuration change, but situations that call for a full failover tend to be stressful and unpredictable. Primary on-call engineers therefore need to be familiar with the procedure so it does not add to that stress.

When failovers fail

A major change in traffic patterns can overwhelm the healthy region that is suddenly serving global traffic. In Yelp's early days, we often sent too much traffic to a healthy region too quickly and "melted" it. When everything works as expected, most of our services and machine clusters can scale up within a few minutes. In those critical moments, though, we cannot afford to wait even a few minutes: our response has to take effect immediately.

Moreover, adding production capacity to constantly evolving infrastructure can be complicated. A recent low-level configuration change can slow down or even prevent us from getting new healthy machines. This can quickly escalate into the worst case, where we cannot scale the healthy region and end up returning HTTP 500 errors to users.

Maintaining double capacity

A good way to prevent meltdowns is to always keep extra compute capacity available.

One way to do this is to run more machines. If we double the number of running machines, we always have the compute capacity a failover needs. We also no longer have to add machines during an emergency, which removes a step from the failover process. More importantly, it removes the chance of errors while provisioning those instances and reduces our dependence on the Compute Infrastructure team.

However, keeping many machines idle just for failovers would be wasteful, so we spread the required containers across all available machines. Under normal conditions, each machine allocates only 50% of its compute resources to services. Services can then absorb load spikes and deliver more consistent performance, for the same cost as keeping the extra machines idle.

[Image: https://static001.geekbang.org/infoq/5b/5b09f61f249412c9522dac91c11af62c.jpeg]
Spreading containers evenly across machines gives services more room.

With enough machines to handle a failover, we gained reliability, because spreading containers across more hosts means a single host failure affects fewer services. We also gained more consistent performance. We still had to eliminate the critical minutes of delay needed to schedule additional replicas of our services during an emergency failover. Traffic shifts had to complete in seconds rather than minutes to minimize the number of HTTP 500 errors we might return.

Always ready to shift traffic

Suppose that during a failover we wait to schedule extra containers for the new load, download their Docker images, and warm up their workers before they serve traffic. All of that wastes precious time. To eliminate this delay, we keep the extra capacity inside the containers that are part of each service's normal configuration, so nothing needs to be added during a failover. This detail may seem minor, but it is the key to our reliability strategy, because it keeps us ready to fail over at any moment.

[Image: https://static001.geekbang.org/infoq/3e/3ee8a29981f70da8c7f718dcfe663bd6.jpeg]

Provisioning extra container capacity in advance has several key advantages:

- Spare resources on every machine give users consistently high performance, and those resources are already reserved by the services.
- We have enough containers to handle a sudden doubling of traffic.
- We depend less on the compute infrastructure's ability to schedule containers at critical moments.

To implement this strategy, containers must be sized correctly and must request the right amount of resources from the compute platform. In a service-oriented architecture, developers are directly responsible for configuring their own services. That configuration has to reflect our failover strategy. Every service must be configured to use exactly 50% of the resources allocated to it, which is the headroom it needs to handle double the load during a failover.

Setting the right values

Less is more

The following example applies to most services with single-threaded workers. This autoscaling behavior is documented internally, but it is not intuitive, and usually only the few people who have taken the time to study it understand it.

Suppose a service has a fixed number of workers, say four, so a container can serve four requests at a time, and requests four CPUs. The autoscaler is configured to keep the service at around 50% CPU utilization.

Now imagine the service is underperforming. A developer would naturally think: "Just give it more CPUs so it gets faster or less bottlenecked." So they raise the CPU count to eight.

At least two factors can make this change slow the service down even further:

- Workers are mostly single-threaded, so a container will never use more than four CPU cores, no matter how many it requests.
- With eight CPUs and a 50% utilization target, each container has to use four cores to reach the target. Its four workers could only do that if every one of them were fully CPU-bound. In practice the service never reaches its target utilization, so the autoscaler starts scaling down the number of containers and reduces the service's total traffic capacity.

Perhaps counterintuitively, a better configuration is to reduce the number of requested CPUs to two, because:

- In a service-oriented architecture, most services spend much of their time waiting on other services, so each worker is unlikely to need a full CPU core.
- Each worker uses, on average, about 50% of a CPU core, so the service now easily reaches the autoscaler's target. The autoscaler responds by adding containers, which increases the total worker capacity.

This illustrates why autoscaling settings in production should always be adapted to a service's internal architecture.

The main benefit of a service-oriented architecture is developer velocity. Developers can deploy the services they own several times a day without worrying about batching code changes, merge conflicts, release schedules, and so on. Teams have full control over their services and the related configuration, including resource allocation details such as CPU, memory, and autoscaling settings.

[Image: https://static001.geekbang.org/infoq/6f/6fd3b092d12d46cb3ebee950ac478e35.jpeg]
Enabling autoscaling for a service is easy on PaaSTA, Yelp's compute platform.

Resource allocation is a challenge. To be ready for failover, every service needs a precise resource allocation so that its containers use exactly 50% of their resources under normal conditions. How could we expect software engineers on dozens of teams to tune these settings precisely in production?

Finding the optimal settings

Over time, some of Yelp's core components, such as the search infrastructure, have been carefully optimized for performance. Their owners know exactly how the application behaves under heavy production load. They know how many threads the service can use, what its typical wait time and actual CPU time are, and how often garbage collection runs and what it costs. Most teams at Yelp, however, do not have all of this knowledge. Most services used default values, and as they grew in size and complexity, "tuning" mostly consisted of adding resources here and there.

Abstracting resource declarations

We needed to help developers find the right settings for their services. We therefore invested in tooling that monitors, recommends, and even abstracts away the autoscaling settings and resource requirements of all production services. We now analyze each service's resource usage and generate optimal settings for CPU, memory, disk, and more. Developers can opt out when needed: manually set values always take precedence over both the templated safe defaults and the optimized defaults.

[Image: https://static001.geekbang.org/infoq/c9/c957461636f5be4206f5249d10570768.jpeg]
New containers based on the configuration recommendations run at 50% utilization.

Generating optimized CPU settings ensures that services autoscale correctly and have enough capacity for a failover, and it brings reliability benefits as well. For example, if a service hits its memory limit and is killed, its optimized defaults are automatically updated within two hours with a higher memory allocation. Before we deployed autotuning, diagnosing such cases took days. Now we fix them much faster.

Autotuning all the defaults

After offering optimized settings as an option for a few weeks and building tooling to monitor success metrics, we were confident enough to move from a recommendation system to optimized values enabled by default. We call these "autotuned defaults." Unless a developer manually opts out, every service now automatically declares and uses an optimized amount of resources. This change had a side benefit: services that were previously over-provisioned now receive appropriate resources, which significantly reduced compute costs. Best of all, developers enjoy simpler service configurations.

Setting ourselves up for success

Allocating the right amount of extra resources for failover means we always have the compute capacity we need. Automating service resource allocation takes a large burden off service owners and improves the overall process. Together, these strategies have simplified incident response and reduced downtime during the most severe outages.

One of the most interesting aspects of our failover and autoscaling strategy is organizational. Building failover headroom into every container made several teams more productive. The Production Engineering team now controls the configuration of all services, which is a prerequisite for successful failovers. The Compute Infrastructure team can focus on improving the platform without worrying too much about its ability to handle services during a failover. Developers no longer have to go through a time-consuming process to tune resource allocation or autoscaling configuration for failover. Instead, they can focus on what they do best: building better products for our users and community.

About the authors:

Mathieu Frappier joined Yelp in 2015 and has contributed to many areas of its production infrastructure. He is passionate about making complex systems more efficient.

Dorothy Jung is an engineering manager at Yelp. She has presented many reliability best practices at LISA and SREcon.

Qui Nguyen is a senior engineer and tech lead at Yelp. She enjoys using computers to solve all kinds of problems.

Original article: https://increment.com/reliability/yelp-traffic-failover-strategy/
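Yelp's rule of thumb is to size each service so it uses 50% of its resources under normal load. The arithmetic behind it is simple: with two regions and traffic split evenly, a full failover doubles the load on the healthy region. The Python sketch below models that under two assumptions: an even split, and fixed capacity during the failover. It is an illustration, not Yelp's tooling, and the function name and parameters are hypothetical.

```python
"""Simplified two-region failover capacity model (illustrative only).

Assumptions: traffic is split evenly between the two regions, and the
healthy region's capacity stays fixed during the failover (no time to
scale up).
"""


def post_failover_utilization(baseline_util: float, shifted_fraction: float) -> float:
    """Return the healthy region's utilization after a partial or full failover.

    baseline_util: share of capacity each region uses under normal load.
    shifted_fraction: share of the unhealthy region's traffic moved over
        (0.0 = no failover, 1.0 = full failover).
    """
    if not 0.0 <= shifted_fraction <= 1.0:
        raise ValueError("shifted_fraction must be between 0.0 and 1.0")
    # With an even split, the unhealthy region carries as much traffic as
    # the healthy one, so shifting part of it scales the healthy load up.
    return baseline_util * (1.0 + shifted_fraction)


if __name__ == "__main__":
    for baseline in (0.5, 0.7):
        for shifted in (0.0, 0.5, 1.0):
            util = post_failover_utilization(baseline, shifted)
            verdict = "fits" if util <= 1.0 else "overloaded"
            print(f"baseline {baseline:.0%}, shift {shifted:.0%}: {util:.0%} ({verdict})")
```

At a 50% baseline, a full failover lands exactly at 100% of capacity. A region that normally runs at 70% would be pushed to 140%, the kind of overload the article describes as "melting" a region.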
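The example of a service with four single-threaded workers can be made concrete with a small model of where a utilization-targeting autoscaler settles. This is a sketch under stated assumptions, not a description of PaaSTA's autoscaler. It assumes three things. First, the autoscaler adds or removes containers until each one uses `target` of its requested CPUs. Second, a single-threaded worker can use at most one core. Third, total demand is a hypothetical 20 cores, which corresponds to ten four-CPU containers at the 50% target, with each worker using half a core.

```python
import math
from dataclasses import dataclass


@dataclass
class SteadyState:
    replicas: int        # containers the autoscaler settles on
    total_workers: int   # requests the service can serve concurrently
    worker_busy: float   # average share of a core each worker uses


def steady_state(demand_cores: float, cpus_requested: float,
                 workers_per_container: int, target: float = 0.5) -> SteadyState:
    """Simplified model of a utilization-targeting autoscaler (assumed behavior).

    demand_cores: total CPU the service's traffic needs, in cores.
    """
    cores_at_target = cpus_requested * target
    if cores_at_target > workers_per_container:
        # Single-threaded workers can never use this many cores, so the
        # target is unreachable and the autoscaler would keep scaling down.
        raise ValueError("target utilization is unreachable")
    replicas = math.ceil(demand_cores / cores_at_target)
    total_workers = replicas * workers_per_container
    return SteadyState(replicas, total_workers, demand_cores / total_workers)


if __name__ == "__main__":
    for cpus in (8, 4, 2):
        s = steady_state(demand_cores=20, cpus_requested=cpus, workers_per_container=4)
        print(f"{cpus} CPUs: {s.replicas} containers, {s.total_workers} workers, "
              f"each worker busy {s.worker_busy:.0%} of a core, "
              f"{s.replicas * cpus} CPUs requested in total")
```

In all three cases the service requests 40 CPUs in total. With eight CPUs per container, though, it ends up with only 20 workers, all fully busy. In a real service that also waits on other services, workers cannot be 100% busy on CPU, so the service stays below target and keeps shrinking. With two CPUs per container, the same budget yields 80 workers with plenty of headroom.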
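Yelp's precedence rule for resource settings works as a layered lookup. Manually set values always win, optimized (autotuned) defaults come next unless the service opts out, and templated safe defaults come last. The sketch below expresses that rule with `collections.ChainMap`. The helper name, the key names (`cpus`, `mem`, `disk`), the `autotune_enabled` opt-out flag, and the values are illustrative assumptions, not Yelp's implementation.

```python
from collections import ChainMap
from typing import Mapping


def resolve_resources(
    safe_defaults: Mapping[str, float],
    autotuned: Mapping[str, float],
    manual: Mapping[str, float],
    autotune_enabled: bool = True,
) -> dict[str, float]:
    """Merge resource settings by precedence (hypothetical helper).

    Highest first: values the developer set manually, then autotuned
    defaults unless the service opted out, then templated safe defaults.
    ChainMap returns the value from the first layer that defines a key.
    """
    layers = [manual]
    if autotune_enabled:
        layers.append(autotuned)
    layers.append(safe_defaults)
    return dict(ChainMap(*layers))


if __name__ == "__main__":
    safe = {"cpus": 1.0, "mem": 1024, "disk": 1024}
    tuned = {"cpus": 0.4, "mem": 1536}
    manual = {"mem": 2048}  # a developer override always wins
    print(resolve_resources(safe, tuned, manual))
    print(resolve_resources(safe, tuned, manual, autotune_enabled=False))
```

Keeping the layers separate rather than overwriting one config in place means the autotuner can refresh its layer at any time without clobbering a developer's explicit choices.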
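The article says Yelp analyzes each service's resource usage to generate optimized settings. It also says a service's memory allocation is raised automatically after the service is killed for exceeding its limit, but the exact algorithm is not described. The sketch below is a hypothetical illustration only. It sizes the CPU request so that observed 95th-percentile usage sits at the 50% target, and it bumps memory after an out-of-memory kill. The percentile, the 20% memory headroom, and the 1.5x bump are all assumptions.

```python
import math
from typing import Sequence


def nearest_rank_percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def recommend_cpus(cpu_usage_cores: Sequence[float], target: float = 0.5,
                   minimum: float = 0.1) -> float:
    """Request enough CPU that p95 usage sits at the target utilization."""
    p95 = nearest_rank_percentile(cpu_usage_cores, 95)
    return max(minimum, round(p95 / target, 2))


def recommend_mem_mb(current_mem_mb: int, peak_mem_mb: int, oom_killed: bool,
                     headroom_pct: int = 120, oom_bump_pct: int = 150) -> int:
    """Size memory from the observed peak, bumping it after an OOM kill."""
    recommended = math.ceil(peak_mem_mb * headroom_pct / 100)
    if oom_killed:
        # The observed peak is capped by the old limit, so it understates
        # real demand; grow past the limit that killed the container.
        recommended = max(recommended, math.ceil(current_mem_mb * oom_bump_pct / 100))
    return recommended


if __name__ == "__main__":
    usage = [0.2] * 19 + [1.0]  # hypothetical per-container CPU samples, in cores
    print(recommend_cpus(usage))
    print(recommend_mem_mb(current_mem_mb=1024, peak_mem_mb=1000, oom_killed=True))
```

Sizing from a high percentile rather than the maximum keeps one outlier sample from doubling the request. Sizing from the observed peak without an OOM kill can also shrink over-provisioned services, which matches the cost savings the article reports.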