A Summary of Practices for Building System Stability

2020 was destined to be an extraordinary year. The spread of the pandemic disrupted everyone's existing plans. It also created new demand for handling business online. As engineers, we had to build new system capabilities in a short time and keep those systems running stably. With the year coming to an end, it is a good time to summarize my experience and thinking on building system stability.

## Opening

Before discussing service stability, let's talk about **SLA**. An **SLA** (service-level agreement) is commonly used to measure service stability. It is usually expressed as a number of "nines". The more nines, the more of the year the service is available, the shorter its downtime, and the more reliable it is. An SLA typically records the concrete commitments a service provider makes to its users on quality, availability, and responsibility.

> 3 nines, i.e. 99.9%: annual downtime = 365 * 24 * 60 * (1 - 99.9%) = 525.6 min (about 8.76 hours)
>
> 4 nines, i.e. 99.99%: annual downtime = 365 * 24 * 60 * (1 - 99.99%) = 52.56 min
>
> 5 nines, i.e. 99.999%: annual downtime = 365 * 24 * 60 * (1 - 99.999%) = 5.256 min

Behind a demanding service-level agreement lies a whole set of engineering practices and requirements that make it achievable.

## 1. What does building system stability mean?

Many developers have their own understanding of what system stability means. They may still wonder whether that understanding is one-sided or non-standard. So what criteria and boundaries can we use?

Let's look at the definition from Wikipedia:

> Stability is a term in mathematics and engineering that describes whether a system produces a bounded output for every bounded input.
>
> If it does, the system is called stable; otherwise, it is called unstable.

Put simply, system stability **essentially means that the system responds deterministically**.

From another angle, building service stability means ensuring that the system meets the service levels promised in its SLA.

## 2. Why do we need to build system stability?

One thing is certain: building service stability is essential. It matters both for keeping systems running normally day to day and for keeping major holiday campaigns running smoothly.

Let's look at a few incidents caused by service stability failures:

> 1) On the day before China's National Day holiday in 2020, demand peaked on what was called "the hardest day to get a ride in 2020". The Didi and Dida ride-hailing platforms went down one after the other;
>
> 2) Amazon Prime Day 2018: during an outage on Amazon's member day, customers could not add items to their carts and check out. The outage cost the company as much as US$99 million;
>
> 3) In 2015, a computer system upgrade at the Industrial and Commercial Bank of China (ICBC) in some regions slowed counter and electronic-channel services. At times, the bank was unable to process transactions at all;
>
> 4) In 2012, an air-conditioning failure in the machine room of 12306, China's railway ticketing website, forced it to suspend online ticket sales, refunds, and ticket changes.

![](https://static001.geekbang.org/infoq/70/70d72b31bb80abe4d9b7135fe3881841.png)

Service stability is extremely important to a company. Failures cause direct financial losses, and they can also seriously affect whole industries and people's daily lives. That is why building service stability matters so much.

## 3. Why is building system stability hard?

When it comes to stability and how to improve stability metrics, many possible optimizations come to mind:

e.g., adding servers, scaling out, timeouts and retries, service degradation, resource isolation and backups, code logic optimization, asynchronous and event-driven processing...

So what are the main difficulties in building system stability?

### 3.1 The challenges are considerable

- **Unknown traffic**

This is especially true for a newly launched business, where the biggest unknown in stability building is the traffic peak. With no past experience to refer to, we cannot tell whether it will be in the millions, the tens of millions, or even higher.

- **Large amount of change**

This kind of stability work usually accompanies a requirement to launch some capability XX in a short time. That often involves changes at many levels of the system, from the bottom up. These include adjustments to underlying data structures, reworked business logic, and improvements to how users interact with the product. With little time and many changes, quality is hard to guarantee.

- **Uncertainty**

Software engineering is often described as "the study of building and maintaining effective, practical, high-quality software using engineering methods". It covers every aspect of building software, down to the smallest detail. Any tiny oversight can bring down the whole system, which makes uncertainty an especially serious problem.

### 3.2 Building system stability is a large, systemic undertaking

It involves many finely divided and complex stages, and leaves no room for even a small oversight.

In terms of system composition, we can distinguish single-service stability from multi-service cluster stability.

- **Single-service stability**

Mainly includes:
- controllable feature configuration
- cache acceleration (a powerful tool)
- service isolation (from third-party services)
- fallback plans for abnormal scenarios
- service monitoring and timely response

- **Cluster stability**

Mainly includes:
- a sound system architecture
- good cluster deployment
- well-designed circuit breaking and rate limiting
- a load-testing mechanism
- a fine-grained monitoring system

## 4. Where do we start with building system stability?

### 4.1 Prerequisites

Before proposing solutions for building system stability, we need to be clear about the prerequisites:

- **Know the business**: you need to be familiar with the complete business flow and have a firm grasp of it;
- **Know the architecture**: you need to know the system's technical architecture well and have some hands-on experience with it.

Stability work can only be broken down and optimized meaningfully once you command both the business and the architecture. Only then is there a basic level of assurance.

### 4.2 Phases

Usually, when we talk about building system stability, we treat it as a dedicated topic. Viewed as a process, it covers the following aspects:

- **Prerequisite**: set clear goals (baselines)
- **Before an incident**: request-path optimization, service performance optimization and load testing, contingency planning, failure drills
- **During an incident**: fault monitoring, problem localization, loss mitigation, problem fixing
- **After an incident**: incident post-mortems, remediation and improvement, capturing and sharing lessons learned

Building service stability really is a large, systemic undertaking that covers every aspect.

![](https://static001.geekbang.org/infoq/1c/1cdd0211502cfccbecda60cce86dccb0.png)

## 5. Key actions for building system stability

As the breakdown in the previous part shows, stability building covers many points, and they are varied. More often, we run a dedicated service-stability project that designs solutions for specific problems in specific scenarios.

We can understand the whole through one small part. Starting from a single service, let's distill the key points of stability building. After all, only when every single service is stable and reliable can the stability of the cluster, and of the whole system, be assured.

Suppose the system faces a sudden surge of request traffic. How do we keep the service stable?

The **key actions for stability building** fall into the following categories:

### 5.1 Peak shaving and rate limiting

Take the classic flash-sale scenarios, such as the rush for train tickets during the Spring Festival or flash sales on e-commerce platforms on Double 11. Hundreds of millions of users pour in within a short time, producing enormous instantaneous traffic (high concurrency).

However much server capacity you add in advance, there is always a processing limit. Peak shaving and rate limiting are therefore indispensable, much like a city restricting driving to stagger the morning and evening rush hours. Flash-sale scenarios need a similar solution.

So how is this implemented in practice?

- **Shaving peaks with a message queue**

A message queue buffers instantaneous traffic by turning synchronous direct calls into asynchronous indirect delivery. The queue absorbs the traffic peak at one end and delivers messages smoothly at the other.

A message queue works like a reservoir. It holds back floodwater from upstream and reduces the peak flow entering the river downstream, preventing or reducing flood damage.

![](https://static001.geekbang.org/infoq/4a/4a5aea3eecedc173e97aef50a7095599.png)

- **Filtering invalid requests with a gate**

A traffic gate mainly establishes **a verification mechanism that filters out invalid requests, so that core services are shielded from invalid external traffic**. A commonly used approach is a Bloom filter.

![](https://static001.geekbang.org/infoq/b2/b2f0df34957d7f6e180be8da1a54325a.png)

- **Adjusting the product strategy**

Adjusting the product strategy is a particularly effective measure, sometimes even more effective than technical improvements.

For example:
- use a queuing strategy to spread out highly concurrent requests;
- stagger the times when campaigns are announced so that high-concurrency requests do not all arrive at the same moment...

### 5.2 Cache acceleration

Caching is a powerful tool for handling concurrency and can effectively improve system throughput. When necessary, multiple cache levels can be added, along business and technical dimensions, to keep the hit rate high.

The main idea is to put Redis between the application servers and the database as a cache layer, so that fewer requests hit the database directly.

![](https://static001.geekbang.org/infoq/06/06969177fb8341906dc2a8383514f5f8.png)

### 5.3 Asynchronous processing

The counterpart of asynchronous is synchronous. Everything happens in order, one task at a time, and the next task starts only after the previous one has finished, a bit like candied haws (tanghulu) threaded on a single stick. A synchronous request has to be processed and answered in real time, and the session ends if a timeout is exceeded. Throughout, the caller keeps waiting until the callee finishes processing and returns.

Asynchronous processing does not block the current thread waiting for the work to finish. Instead, the thread can carry on with subsequent operations. When another thread completes the work, it notifies this thread through a callback.

> One point deserves emphasis: asynchrony is a design concept, and asynchronous operation is not the same as multithreading. Common message middleware, publish/subscribe broadcast patterns, and similar mechanisms can all be used to implement asynchronous processing.

## 6. Lessons from building stability

### 6.1 Do load testing properly

Load-test the system in advance so that you know where you stand and can head off problems before they occur. Load estimates should be realistic, not blindly inflated. For performance bottlenecks, try to **optimize them in advance, or at least monitor and guard them closely**.

### 6.2 Contingency plans are a must

**You must have contingency plans.** Developers tend to be confident, which is both a good and a bad thing, so we need to plan for the worst. However experienced an engineer is, they cannot enumerate every accident that might happen. Failures tend to occur exactly where no plan exists (Murphy's law).

### 6.3 Build a complete monitoring system

Build comprehensive monitoring and alerting so that we are not left blind and deaf, and errors are noticed promptly. When deciding where to place monitoring, the main principle is: **no dependency can be trusted!**

### 6.4 Respond quickly

It is like changing an engine on a plane in flight. Whatever failure occurs, immediately bring every resource to bear to stop the losses "fast". Services should be graded by priority so you can focus on what matters and protect core services first. If a problem genuinely cannot be located quickly, degrade services layer by layer. The main goals: **keep the problem from spreading, stop the losses, and recover quickly**.

## Summary

### Key points of stability building

- **Peak shaving and rate limiting**: when resources hit their limits, apply technical and business measures to shave traffic peaks and keep services stable;
- **Cache acceleration**: use caching to handle concurrency and effectively improve throughput, while watching out for hot-key and big-key problems;
- **Asynchronous processing** (synchronous → asynchronous): effectively improves responsiveness, while ensuring eventual consistency of the data.

### Technology serves the business

**Technology has to prove itself by solving real problems.** Application scenarios are key. Don't optimize for the sake of technology; ultimately, technology serves application scenarios and real-world business.

Try treating the business-level goal as the ultimate goal and using every technical means to achieve it. This maximizes the value of technology.

### Stay flexible, not formulaic

Stability solutions should be adapted flexibly to each scenario, never applied mechanically. In practice, the key is to control the main course of action. When there are several possible paths, pick the one with the best return on investment. One way to drive a course of action is to be problem-driven (problem detection → problem analysis → problem containment → problem resolution).

Thanks for reading!
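The SLA downtime figures in the Opening section come from a single line of arithmetic. The Python sketch below reproduces them. It assumes a 365-day year and ignores leap years and scheduled maintenance windows. The function name is our own.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def annual_downtime_minutes(availability: float) -> float:
    """Return the maximum downtime per year (in minutes) allowed by an
    availability target such as 0.999 (three nines)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return MINUTES_PER_YEAR * (1.0 - availability)


if __name__ == "__main__":
    for nines, target in [(3, 0.999), (4, 0.9999), (5, 0.99999)]:
        print(f"{nines} nines ({target:.3%}): {annual_downtime_minutes(target):.3f} min/year")
```

Running it prints 525.600, 52.560 and 5.256 minutes for three, four and five nines respectively. Each extra nine cuts the allowed downtime by a factor of ten.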
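A small, deterministic simulation can illustrate the message-queue "reservoir" from section 5.1. This is a toy model, not a real message broker:

- an in-memory `deque` stands in for the queue;
- time is divided into discrete ticks;
- the function name `shave_peaks` and its parameters (`consume_per_tick`, `queue_capacity`) are hypothetical.

It shows a burst being absorbed by the queue and drained at a fixed rate. Requests beyond the queue's capacity are rejected.

```python
from collections import deque
from typing import Deque, List, Tuple


def shave_peaks(arrivals: List[int], consume_per_tick: int,
                queue_capacity: int) -> Tuple[List[int], int]:
    """Simulate a bounded queue between a bursty producer and a fixed-rate consumer.

    arrivals[t] is the number of requests arriving in tick t. Returns how many
    requests the downstream service processed in each tick, plus how many were
    rejected because the queue was full. Ticks continue after the last arrival
    until the queue is drained.
    """
    if consume_per_tick <= 0 or queue_capacity < 0:
        raise ValueError("consume_per_tick must be positive and queue_capacity non-negative")
    queue: Deque[Tuple[int, int]] = deque()
    processed_per_tick: List[int] = []
    rejected = 0
    tick = 0
    while tick < len(arrivals) or queue:
        incoming = arrivals[tick] if tick < len(arrivals) else 0
        for request_id in range(incoming):
            if len(queue) < queue_capacity:
                queue.append((tick, request_id))  # buffer the request (the "reservoir")
            else:
                rejected += 1  # queue full: shed load rather than overload downstream
        # The downstream service never sees more than consume_per_tick requests per tick.
        handled = 0
        while queue and handled < consume_per_tick:
            queue.popleft()
            handled += 1
        processed_per_tick.append(handled)
        tick += 1
    return processed_per_tick, rejected


if __name__ == "__main__":
    processed, rejected = shave_peaks([1000, 0, 0, 0, 0], consume_per_tick=300, queue_capacity=800)
    print(processed, rejected)
```

Take a burst of 1,000 requests, a consumer rate of 300 per tick and room for 800 queued requests. The downstream service handles 300, 300 and then 200 requests over three ticks, and 200 requests are rejected up front. In a real deployment these limits would come from load testing.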
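Section 5.1 calls for rate limiting but does not name an algorithm. The token bucket is one common choice, swapped in here for illustration. Tokens refill at a fixed `rate` up to `capacity`, and a request is admitted only if a token is available.

A few assumptions apply to this sketch:
- The class name and the injectable `clock` parameter are our own. The clock lets the behaviour be tested without sleeping.
- It is single-threaded. A limiter shared across threads or instances would need a lock or a centralized implementation.

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Token-bucket rate limiter (single-threaded sketch).

    Tokens refill continuously at `rate` per second, up to `capacity`.
    Each admitted request consumes one token; with no token left, it is rejected.
    """

    def __init__(self, rate: float, capacity: float,
                 clock: Optional[Callable[[], float]] = None) -> None:
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self._clock = clock or time.monotonic
        self._tokens = float(capacity)  # start full: a short burst is allowed
        self._last = self._clock()

    def _refill(self) -> None:
        now = self._clock()
        elapsed = max(0.0, now - self._last)
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now

    def try_acquire(self, tokens: float = 1.0) -> bool:
        """Consume `tokens` and return True if available, else return False."""
        self._refill()
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False
```

A handler would call `try_acquire()` before doing any work, and return a "busy, please retry" response when it returns `False`. `capacity` sets how large a burst is tolerated, while `rate` sets the sustained throughput.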
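The traffic gate in section 5.1 relies on a Bloom filter. Below is a minimal Python sketch. It assumes the set of valid keys can be loaded into the filter ahead of time; here these are hypothetical product IDs such as `sku-42`.

A Bloom filter never rejects a key that was added. It may occasionally let an invalid key through (a false positive), so it reduces invalid traffic rather than eliminating it.

The sketch makes two implementation choices that the article does not prescribe:
- Sizing uses the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.
- Bit positions are derived by double hashing from one SHA-256 digest.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01) -> None:
        if expected_items <= 0 or not 0 < false_positive_rate < 1:
            raise ValueError("invalid sizing parameters")
        # m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.size = math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2)
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not added"; True means "probably added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    valid_ids = BloomFilter(expected_items=1000)
    for i in range(1000):
        valid_ids.add(f"sku-{i}")
    # Gate: requests whose ID is definitely invalid are rejected before the core service.
    print(valid_ids.might_contain("sku-42"), valid_ids.might_contain("no-such-sku"))
```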
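Section 5.2 places Redis between the service and the database. The usual read path for that layout is cache-aside:
1. Read from the cache.
2. On a miss, load the value from the database.
3. Write it back to the cache with an expiry.

To keep the sketch runnable without a Redis server, a small in-memory `TTLCache` stands in for Redis. `load_from_db` is a hypothetical loader. With the redis-py client, the two cache calls would roughly correspond to `get(key)` and `set(key, value, ex=ttl)`. Only the read path is shown; write-path invalidation, hot keys and big keys need separate handling.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for Redis: string keys with a per-entry expiry time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict the expired entry
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)


def get_with_cache_aside(cache: TTLCache, key: str,
                         load_from_db: Callable[[str], str],
                         ttl_seconds: float = 60.0) -> str:
    """Cache-aside read: serve from the cache, fall back to the database on a miss."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(key)  # only cache misses reach the database
    cache.set(key, value, ttl_seconds)
    return value
```

Repeated reads of the same key within the TTL are served from the cache, so the database sees one query per key per expiry period instead of one per request.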
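Section 5.3 contrasts blocking calls with callbacks. Below is a minimal sketch using Python's standard `concurrent.futures`:
- each order is accepted immediately;
- the slow follow-up work runs on a worker pool; it is a hypothetical `send_notification`, simulated with `time.sleep`;
- a completion callback collects its result.

As the article stresses, threads are only one way to get asynchrony. Message middleware or publish/subscribe would decouple the work in the same spirit across processes.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List


def send_notification(order_id: int) -> str:
    """Hypothetical slow downstream call (e.g. an SMS), simulated with a sleep."""
    time.sleep(0.05)
    return f"notified order {order_id}"


def place_orders(order_ids: List[int]) -> List[str]:
    """Accept each order immediately and hand the slow notification to a worker pool."""
    notifications: List[str] = []
    lock = threading.Lock()

    def on_done(future: Future) -> None:
        # Completion callback: runs when the background task finishes.
        with lock:
            notifications.append(future.result())

    with ThreadPoolExecutor(max_workers=4) as pool:
        for order_id in order_ids:
            # submit() returns a Future at once, so the caller is not blocked here.
            pool.submit(send_notification, order_id).add_done_callback(on_done)
            print(f"order {order_id} accepted")
    # Leaving the `with` block calls shutdown(wait=True), so every task and callback has run.
    return sorted(notifications)
```

`place_orders([1, 2, 3])` prints each "accepted" line without waiting for any notification. It returns the three notification results once the pool shuts down. If a task raised an exception, `future.result()` would re-raise it inside the callback, so real code should catch and log it there.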
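Sections 3.2 and 6.4 mention circuit breaking, fallback plans, and degrading services when a dependency misbehaves. This follows the principle that no dependency can be trusted. Below is a minimal, single-threaded circuit-breaker sketch; its names and thresholds are hypothetical:
- After `failure_threshold` consecutive failures, the breaker "opens".
- While open, it serves the fallback without calling the dependency for `reset_timeout` seconds.
- After that, it lets one trial call through.

It illustrates the general pattern, not the article's own implementation.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Minimal circuit breaker (not thread-safe).

    After `failure_threshold` consecutive failures the breaker opens: calls go
    straight to the fallback for `reset_timeout` seconds. After that, one trial
    call is allowed through; success closes the breaker, failure reopens it.
    """

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if failure_threshold < 1 or reset_timeout < 0:
            raise ValueError("invalid breaker settings")
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                return fallback()  # open: degrade without touching the dependency
            self._opened_at = None  # timeout elapsed: allow one trial call
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # (re)open the breaker
            return fallback()
        self._failures = 0  # a success closes the breaker
        return result
```

For example, `breaker.call(lambda: fetch_recommendations(user), lambda: [])` would degrade to an empty list while a (hypothetical) recommendation service is failing, keeping the core flow available.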