Catching the Troublemaking Monkey: China Mobile's Chaos Engineering Practice

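{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"You are fast asleep when a barrage of phone calls wakes you. You splash water on your face, rush to the office, and join a discussion already under way. As night recedes and the sky brightens, your thinking grows clearer; watching the program run smoothly on your screen again, you breathe a sigh of relief. Then exhaustion washes over you like a tide."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This scene is familiar to anyone in engineering, and it is one of the main threats to the hair follicles of young programmers. Hair falls out, but inspiration strikes. China Mobile's first ideas about chaos engineering also came on just such a night, after a wake-up call."}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Out of Failure"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Early one morning, after handling an incident until five or six o'clock, the operations team for the 磐基 PaaS platform at the China Mobile Information Technology Center asked itself: we have prepared so much in terms of process and operations, so why does none of it help at the critical moment? The 磐基 PaaS platform currently supports only a hundred-plus China Mobile systems and will have to support more in the future. How can it truly live up to its motto, “ride the boat to the cloud, steady as bedrock (磐基)”?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"China Mobile's 磐基 PaaS platform is a cloud-native platform developed in-house by China Mobile Information Technology. It currently comprises 144 K8S clusters, nearly 11,000 production nodes, and more than 200,000 container instances. It provides microservice framework support and service governance capabilities, addressing the management complexity that comes with the distributed nature of microservices. Chaos engineering, for its part, is a discipline created specifically to proactively understand and cope with complex systems, and it is now widely used in distributed architectures and cloud-native environments."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Chaos engineering was born to build capability and confidence in a system's ability to withstand turbulent conditions in production. In 2010, to keep failures that might occur during its migration from hurting the user experience, Netflix developed Chaos Monkey to test its systems ahead of time. In practice, the idea and the tooling were continually refined, giving rise to the role of “chaos engineer”; the approach spread through the engineering community and was formally summarized as “chaos engineering” in “Principles Of Chaos”. As more and more large companies began to learn about and practice it, the continually evolving field of chaos engineering became a new technical area within CNCF in 2018."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"China Mobile's approach to handling failures is to “make trouble for itself”: to detect risks, predict risks, and dig out risks."}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Into “Chaos”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because business scenarios, team structures, implementation approaches, and other factors differ, criteria for judging system stability are hard to standardize. Previously, cloud-migration architecture reviews and high-availability test plans for business systems were derived subjectively by engineers from experience. How can system stability, functional soundness, and service completeness be demonstrated? With traditional approaches, we can only show that a system may have problems, not that it will never have them; because the space of possible failures is unbounded, we cannot prove a system correct. So China Mobile's 磐基 PaaS platform introduced chaos engineering, using the logical rigor it brings to a system to demonstrate the system's correctness."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Fault injection, failure testing, and chaos engineering are easily confused in the industry. In the 磐基 team's view, chaos engineering does not create chaos or manufacture failures; it makes the chaos inherent in a system visible. The field as a whole encompasses related areas such as antifragility, fault injection, and failure testing, so it is more than testing; its method can be understood as a combination of exploratory testing and observability techniques. “A key difference between chaos engineering and antifragility is that chaos engineering makes people recognize that chaos is inherent in the system, which improves the team's resilience, whereas antifragility aims to make the system stronger in response to chaos. But chaos engineering can use the methods above to help us understand the system, so we see the relationship as that of a larger set and the smaller sets it contains.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The gold standards of chaos engineering are:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Build a hypothesis around steady-state behavior"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Vary the real-world failure events introduced"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Run experiments in production"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Run automated experiments continuously"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Minimize the blast radius"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The development path for 磐基's chaos capability covers first contact with and use of chaos engineering, construction of a chaos platform, and deeper application."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"According to the 磐基 team, the chaos engineering effort began with small-scale, scenario-level pilots that used tools to implement high-availability scenarios. While the platform was being built, the team held business planning workshops, formed a chaos team, and ran user trials. The platform's development has included user business workshops, steady-state defense, rollout to applications, and process refinement. For tool selection, the criteria were support for different application and deployment architectures, support for fault scenarios across different resource types, and ease of deployment; on that basis, China Mobile's 磐基 PaaS platform chose ChaosBlade, open-sourced by Alibaba, as the foundation of its chaos engineering capability."}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"China Mobile's Chaos Engineering Experience"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"China Mobile's chaos engineering practice has gone through several iterations. “Our initial platform work actually started from fault injection, and then, following the principles of chaos engineering, we added monitoring to it. For example, when you run a chaos engineering test as part of a drill, certain metrics may rise; if our understanding of the system is incomplete, an injected fault may spread.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"What kind of company needs chaos engineering? In terms of industry, the 磐基 team recommends it for companies whose services users notice most directly, such as those in finance and securities. At China Mobile, a failure in a highly user-visible service, such as phone service suspension or card and voucher sales, reaches users immediately and degrades their experience. After all, business volume may have a ceiling, but changes never stop, and failures will never disappear entirely. The value of chaos engineering is that it can reduce the probability of failures and give the system a chance to respond and recover quickly when they occur. So the services closest to users, and those users notice most, are the ones that should consider chaos engineering experiments."}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"About the interviewees:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Zhao Chun (趙淳), director of operations and maintenance for 磐基 PaaS, is responsible for promoting, delivering, and subsequently operating 磐基 PaaS across the group, provincial, and specialized companies. Zhao holds several international certifications, is familiar with cloud-native technologies such as containers, microservices, databases, and middleware, and has many years of experience building and operating provincial BOSS and CRM systems."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Yan Jun (嚴俊), a member of the 磐基 PaaS platform development group, has worked for many years in big data, cloud native, and related fields, and leads the development and construction of the chaos engineering capability of China Mobile's 磐基 PaaS platform. Yan is currently responsible mainly for technical research and implementation for the platform in areas such as edge cloud, chaos engineering, and AIOPS."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Yuan Qingbin (袁慶彬), a member of the China Mobile 磐基 platform operations team and an IT operations expert, has long focused on IaaS and PaaS. Yuan is responsible for service continuity governance, platform architecture optimization, and improvement of capability components for the 磐基 platform."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"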
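content":[{"type":"text","text":"Chao Yuanning (晁元寧), a member of the 磐基 PaaS platform operations and maintenance group, is a PaaS delivery expert and chaos engineering expert who has long focused on cloud native, PaaS, and chaos engineering. Chao handles operations for projects moving onto the 磐基 PaaS platform, working mainly with cloud-migration projects from the group company, specialized companies, and provincial companies, and is responsible for project liaison, delivery, and maintenance, with extensive experience in delivery, operations, and emergency drills."}]}]},{"type":"horizontalrule"},{"type":"heading","attrs":{"align":"center","level":4},"content":[{"type":"text","text":"Scan the QR code below to enter the prize quiz"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Take part in China's first chaos engineering survey report"}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/78\/b4\/78ca90e0bd194c1745abb38a57f3c4b4.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"25%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To build a full picture of how chaos engineering is developing in China, the China Academy of Information and Communications Technology (CAICT, 中國信通院), together with the Chaos Engineering Lab, has launched a questionnaire for the “China Chaos Engineering Survey Report” (《中國混沌工程調查報告》). It explores the current state of system stability in China as well as chaos engineering usage, industry adoption, technical maturity, and future trends, with the aim of spreading the concept of chaos engineering in China, improving the stability of domestic systems, and advancing software quality."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The survey is jointly organized by CAICT with the Chaos Engineering Lab, infoQ, VCEC, and the China Cloud Native Community. Participants have a chance to win gifts such as laptop bags and T-shirts. Scan the QR code above to open the questionnaire."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Members of the Chaos Engineering Lab include:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/05\/38\/059b75d168a41a99d5aeef0a116ee838.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}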