Architectural Approaches to Mitigating Cloud Service Outages

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article was originally published on the "},{"type":"link","attrs":{"href":"https:\/\/www.forelse.io\/posts\/architectures-for-mitigating-aws-outages\/","title":"","type":null},"content":[{"type":"text","text":"For Else website"}]},{"type":"text","text":" and was translated and shared by InfoQ China with the permission of the original author, Jeff Carter."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A while ago, "},{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/article\/qiNZJ24ZaEF7gTrgUGdM","title":"","type":null},"content":[{"type":"text","text":"AWS suffered an outage"}]},{"type":"text","text":", and it prompted a lot of discussion about architectures and how well they cope with this kind of failure. Because these approaches differ widely in cost, complexity, and trade-offs, I want to give a high-level overview of a few of them and then look more closely at one architecture that many of those conversations overlook."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Multi-Cloud"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First is the discussion about the value of multi-cloud. The idea is to run your application in more than one cloud."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e6\/e6f0dee93644a85168f1ddc14ec2926a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By spreading the load across several providers, we survive when any one of them fails. In theory this sounds great: two cloud providers are unlikely to go down at the same time. In practice, though, it is hard to do at the application level, for several reasons:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Each cloud's infrastructure is different"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Deployment complexity increases substantially"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Bandwidth between the two clouds is expensive"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For these reasons, multi-cloud is not a practical route to high availability, apart from a few edge cases."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Multi-Region"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next is the discussion about multi-Region. An AWS Region is made up of multiple availability zones (AZs), and each AZ is one or more data centers with independent power, networking, and connectivity. Running across several AZs within one Region provides high availability, but it does not provide disaster recovery (DR). For that, we need more than one Region. A very simplified multi-Region layout looks like this:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e3\/e39574627075aa0543a15416f43ec223.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This approach solves several of the multi-cloud problems:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The application still runs in a single cloud, so the infrastructure stays the same"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Regions are fully independent, so you get the same availability benefit"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Bandwidth between Regions costs far less than bandwidth between clouds"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Unfortunately, most of the commentary centers on Active-Active multi-Region, where load is distributed across several Regions at once. That brings a lot of complexity around keeping persistent data in sync. It also makes deployment more complex, and there are so many places for things to go wrong that the setup itself may cause more downtime than the AWS outages it is meant to guard against."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Multi-Region DR"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This is the option that has been overlooked lately. The idea is that only one Region is active at a time, and when disaster strikes, a standby Region takes over (hence DR). It offers the same benefits as the approach above while avoiding most of the complexity of a full Active-Active architecture. The standby Region does not need to be fully built out; only the persistent data has to be replicated there."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f1\/f1c3f98701b111b30740445aef40d4db.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"But wait, won't it take a while to deploy the full application stack when disaster strikes? Yes, it will, and that is acceptable. For the most common outage scenarios, high availability through multiple AZs is enough. If an entire Region has problems, as we saw on AWS a while ago, spending less than an hour standing up a new application stack from backups is still preferable to an outage of more than eight hours. Automation can streamline the process, but even with a manual (but well-rehearsed) procedure, what matters is having a fallback option."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So let's look at this architecture in more detail:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The application is deployed in the primary Region as usual"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Persistent data is handled with AWS managed services, backups, and replicas, which usually takes only a configuration setting or two:"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Add an RDS read replica in a different Region"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Create a DynamoDB global table"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Enable S3 bucket replication"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On failover, deploy the application in the other Region and update the DNS settings"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Test this process regularly"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Is this a silver bullet? Definitely not. It will not suit every type of workload, and it certainly will not cover every type of outage. It is, however, a relatively simple and reasonably cost-effective option."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Summary"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In short, outages will happen. That takes nothing away from the value of AWS, but it does underline the importance of good architecture and planning. We can design very expensive and complex systems to mitigate these outages, but for most customers that is overkill and impractical. Fortunately, there are other options that offer a “good enough” solution with reasonable trade-offs, and they should be considered “best practice” when working on AWS."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original article:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/www.forelse.io\/posts\/architectures-for-mitigating-aws-outages","title":null,"type":null},"content":[{"type":"text","text":"https:\/\/www.forelse.io\/posts\/architectures-for-mitigating-aws-outages"}]}]}]}
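
The trade-off described under Multi-Region DR (under an hour to rebuild the stack versus more than eight hours of outage) can be put into rough numbers. The following is a minimal sketch, not something from the original article: it assumes exactly one Region-wide incident per year, and it uses 8 hours and 1 hour as illustrative boundary values for "more than eight hours" and "less than an hour". The function name `availability` is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; assumes a non-leap year


def availability(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the fraction of the period during which the service was up."""
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must be between 0 and the period length")
    return 1 - downtime_hours / period_hours


# Assumption: a single Region-wide incident in the year.
wait_for_region = availability(8)   # ride out the outage in the primary Region
fail_over_with_dr = availability(1)  # rebuild the stack in the standby Region

print(f"No DR plan:      {wait_for_region:.3%}")    # 99.909%
print(f"Multi-Region DR: {fail_over_with_dr:.3%}")  # 99.989%
```

Under these assumptions, the DR plan moves yearly availability from roughly "three nines" to close to "four nines" without running a second active stack. Real figures depend on how often Region-wide incidents occur and how well rehearsed the failover is.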