Moving Past Misconceptions and Toward Intelligent Chaos Engineering Fault Drills

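{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“At what stage can an enterprise start to consider practicing chaos engineering?”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Chaos engineering can be considered at any stage. Especially in development and test environments, the earlier it is introduced, the better.”"}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Intelligence Starts with Chaos Engineering Practice"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The exchange above comes from InfoQ's interview with a technical expert at the Industrial and Commercial Bank of China (ICBC). Enterprises practice chaos engineering in order to uncover system-level and application-level problems early. In the view of Wang Binghui (王炳輝), senior researcher at the ICBC Software Development Center, the earlier chaos engineering is adopted, the better. He believes an enterprise can consider practicing chaos engineering at any stage, but applying it in production calls for caution."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“First, the business system itself should have a degree of high availability, so that injecting a fault into a single node does not affect the whole application. Second, a fairly complete monitoring system is needed, so that even if something unexpected happens during a chaos engineering experiment, the problem can be located quickly and accurately. Finally, there must be a fairly complete set of emergency response plans, so that even if chaos engineering has an irreversible impact on the application, the system can be restored in the shortest possible time. "},{"type":"text","marks":[{"type":"strong"}],"text":"Once an enterprise meets these three prerequisites, we believe it can start to consider running chaos engineering in production"},{"type":"text","text":".” Even then, an enterprise running chaos engineering in production should pilot it in non-critical, non-sensitive scenarios first and only then extend it step by step to critical, sensitive applications."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"How should an enterprise get started with "},{"type":"link","attrs":{"href":"https:\/\/xie.infoq.cn\/article\/7a949e5e6a02734cce1830c5c","title":"xxx","type":null},"content":[{"type":"text","text":"chaos engineering"}]},{"type":"text","text":"? Three questions need to be considered first: What is the requirement? What kind of platform is to be built? Do the production applications already meet the prerequisites for chaos engineering? Wang Binghui believes that clarifying the requirement comes first. “For a smaller enterprise, or for a small proof of concept, the simplest approach is to bring in an open-source fault-testing tool and run the corresponding verification in the relevant scenario.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Take "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s?__biz=MjM5MDE0Mjc4MA==&mid=2651076881&idx=3&sn=6f57e8c5ce82ad3e60db2a8ce2d395c1&chksm=bdb9c1428ace4854a8ee7acb196f16cea97dc241cb42c7b98542eb521179078e8fbf905ef64b&scene=27#wechat_redirect","title":"xxx","type":null},"content":[{"type":"text","text":"ICBC"}]},{"type":"text","text":" as an example. Its underlying fault injection is divided into multiple modules covering virtual machines, containers, and physical machines, and the drill content and methods differ for each type of target server. ICBC drew people familiar with the underlying system architecture from the various project teams to discuss requirements together, decide which functions would ultimately be implemented and which people needed to be brought in, and only then built the platform."}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Pitfalls in Practicing Chaos Engineering"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Even when the prerequisites for chaos engineering are met, it is easy to go astray in practice. Wang Binghui has observed that chaos engineering fault drills are particularly prone to emphasizing practice over theory. At the outset, many enterprises are keen to build out the capabilities of a fault drill platform, run all kinds of fault drills, find problems, fix them, and repeat the cycle indefinitely."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“This does produce good results early on. But as the stability of the application systems gradually improves, this coarse-grained approach uncovers fewer and fewer problems, and chaos engineering fault injection slowly starts to resemble a tester's routine test cases: when a new scenario comes along, a standard set of fault-injection test cases is applied to it, and if the tests pass, availability is judged to be good. This creates a problem. During design and development, developers subconsciously harden the system against, or work around, the routine chaos test scenarios, so the drill results look good even though the intended goal has not actually been achieved.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In his view, chaos engineering is a highly theoretical discipline that requires comprehensive, systematic chaos test design for different platforms and applications. Different systems are likely to have different weak points, so the corresponding chaos tests should differ as well. This is especially true of business lines that span platforms and systems: their call chains are long and may even involve calls between heterogeneous systems, so uncovering the latent weaknesses of the whole business chain through chaos engineering requires testers with strong skills in designing chaos engineering scenarios and cases."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"What Is Special About Chaos Engineering in the Financial Industry"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Chaos engineering is a very important means of measuring an application's level of high availability, yet the industry does not currently have a unified standard for it. ICBC has therefore put forward an internal high-availability evaluation system for its applications, which assigns different levels according to how sensitive an application is."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For applications with a higher rating, the failure of some nodes, or even a campus-level failure, should not affect normal business transactions. The requirements for fault recovery time are also stricter for critical, sensitive applications than for ordinary ones. “So ICBC's experience is that you can first establish a chaos testing rating system that fits the enterprise's needs, and then set different evaluation thresholds for different applications.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Turning to the financial industry specifically, Wang Binghui stressed how it differs from other industries:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. The financial industry demands higher system reliability than traditional systems that do not handle accounts."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. 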
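The overall technical architecture of financial systems can be more complex than that of the newer internet software industry. For example, several of the major banks used to run their core transactions on mainframe systems and are now gradually moving to cloud computing and distributed architectures, a situation internet companies rarely encounter."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. The financial industry faces stricter regulatory requirements than internet companies. Systems that involve customer information and account information in particular are tightly regulated."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As a result, the financial industry tends to be more cautious when carrying out chaos engineering and does not lightly run chaos tests in production. Every chaos test must comply not only with the enterprise's internal rules but also with the industry's regulatory policies and regulations."}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Intelligent Chaos Engineering Fault Drills Are an Important Future Direction"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Although chaos engineering has been around for more than a decade since it first emerged, we believe that, both in China and internationally, it is still at an early stage of development overall,” said Wang Binghui. The vast majority of chaos engineering tools and products still rely on manual fault injection. Some enterprises may be able to inject and withdraw faults automatically, but they remain a long way from being "},{"type":"text","marks":[{"type":"strong"}],"text":"intelligent"},{"type":"text","text":": for example, sensing the technical or business architecture intelligently, generating and executing the corresponding fault drill tasks, and at the same time automatically monitoring the relevant technical and business metrics during execution so as to adjust the fault injection mode and withdraw injected faults dynamically."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“At present no vendor in China is anywhere near fault drills that are completely free of reliance on human operators. But as enterprises in China have deepened their understanding of chaos engineering over the past few years, it has been developing very rapidly, and I believe that in the coming years, "},{"type":"text","marks":[{"type":"strong"}],"text":"intelligent chaos engineering fault drills will be an important direction for the development of chaos engineering"},{"type":"text","text":".”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"What is an "},{"type":"text","marks":[{"type":"strong"}],"text":"intelligent chaos engineering fault drill?"},{"type":"text","text":" It is one that can generate and execute fault drill tasks based on an application's business and technical architecture, while automatically monitoring the relevant technical and business metrics during execution and dynamically adjusting the fault injection mode or withdrawing the injected faults."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“At present, the users of ICBC's chaos engineering "},{"type":"link","attrs":{"href":"https:\/\/xie.infoq.cn\/article\/b241775f7efa5ab8851c970a0","title":"xxx","type":null},"content":[{"type":"text","text":"fault drill"}]},{"type":"text","text":" platform are the testers of the individual applications. The platform cannot perceive which business is running on the target servers or what the tester intends to achieve with a given test. It only knows which chaos tests the testers have run, and inferring the business running on the target servers, or the purpose of the test, from the chaos tests themselves is often inaccurate. As a result, the chaos engineering platform, the business applications, and the infrastructure cannot interoperate well, which holds back the automation and intelligence of chaos engineering. We may focus on solutions in this area in the future,” Wang Binghui explained."}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Chaos Engineering Offers More Than Technical Value"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Chaos engineering has been a hot topic in the industry ever since it emerged, and how it should be understood is still actively debated. People in different industries, enterprises, and roles have always understood it somewhat differently. Business staff may see it as a way to verify the correctness and idempotency of business logic before go-live. Engineers may see it as a way to verify the fault tolerance of the system architecture and to make incident response more efficient. Product staff may see it as a way to improve the user experience."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In Wang Binghui's view, the fundamental purpose of ICBC building a chaos engineering fault drill platform and running fault drills on each application in its distributed systems is still to improve the overall reliability of ICBC's systems. “Chaos engineering is a fairly common means of improving the reliability of platforms and systems, and it has been validated in practice by many vendors in China and abroad. Beyond that, chaos engineering also conveys a mindset of proactively creating, discovering, and resolving faults. That mindset is a quality that those of us who work in IT, and operations staff in particular, must have.” Wang Binghui stressed that the change in mindset brought about by chaos engineering is of considerable significance to ICBC."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"About the interviewee:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Wang Binghui is a senior researcher at the ICBC Software Development Center. He led the construction of ICBC's open API platform and its distributed operations support system, and is mainly responsible for research and implementation in areas including the open API platform, distributed monitoring, chaos engineering, traffic recording and replay, and high-availability assurance."}]}]},{"type":"horizontalrule"},{"type":"heading","attrs":{"align":"center","level":4},"content":[{"type":"text","text":"Scan the QR code below to take part in the prize survey"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Take part in China's first chaos engineering survey report"}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/78\/b4\/78ca90e0bd194c1745abb38a57f3c4b4.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"25%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To gain a full picture of how chaos engineering is developing in China, the China Academy of Information and Communications Technology (CAICT), together with the Chaos Engineering Lab, has launched a questionnaire for the “China Chaos Engineering Survey Report”. The survey looks in depth at the current state of system stability in China and at the usage, industry adoption, technical maturity, and future trends of chaos engineering, with the aim of spreading awareness of chaos engineering in China, improving the stability of domestic systems, and advancing software quality."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The questionnaire is jointly initiated by CAICT together with the Chaos Engineering Lab, InfoQ, VCEC, and the China Cloud Native Community. Participants have a chance to win gifts such as laptop bags and T-shirts. Scan the QR code above to open the questionnaire."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Members of the Chaos Engineering Lab include:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/05\/38\/059b75d168a41a99d5aeef0a116ee838.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}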