Three Major Architecture Upgrades in Ten Years: How Weibo Leveled Up to Handle “Extreme Hot Spots”

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Three years ago, Weibo buckled under the sudden traffic when a popular actress and a top-tier male star got married; today, it withstood the hot spot of their divorce."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"“A reliable architecture is never designed up front; it evolves step by step.” Nothing describes the transformation of Weibo's system architecture better."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"As of October 2020, Weibo had 523 million monthly active users. As a leading brand in Chinese-language social media, Weibo has long been the primary platform on which trending public events spread. Such events tend to be unpredictable and sudden: traffic can double within 10 minutes, or grow even more. Responding quickly to these surges while keeping online services stable is a major challenge."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Over the past few years, Weibo has been hit by traffic surges from breaking hot-spot events countless times. Early on, these surges did expose a number of stability problems, so much so that whenever a celebrity story broke, users would joke “Is Weibo down?”, “Is Weibo down again?”, or “Weibo actually stayed up this time”. Meanwhile, the Weibo R&D team kept “leveling up”: the technical solutions and system architecture behind Weibo were adjusted and optimized many times and became increasingly stable. Since the pandemic began in 2020, Weibo has seen several extreme hot-spot events in which overall traffic doubled or even tripled, and it got through all of them smoothly. For this article, we interviewed Liu Daoru (劉道儒), R&D Director at the Sina Weibo R&D Center, who witnessed and led several upgrades of Weibo's back-end architecture, and asked him how the team approached architecture transformation and high-availability assurance around “extreme hot spots”, a scenario unique to Weibo."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Ten Years of Weibo's Back-End Architecture: Three Major Upgrades"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Liu joined Weibo in 2010 and, starting in 2011, took part in and led the transformation and optimization of Weibo's core systems. According to him, Weibo has gone through three major back-end architecture upgrades over the past ten years."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/d5\/56\/d504cc9b0e3d1cb481a12bcce3bf5856.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#a5a5a5","name":"user"}}],"text":"Weibo's current overall architecture"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Distributed platform architecture (2011-2014): "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"To cope with rapidly growing traffic, the R&D team rebuilt Weibo's core systems from a PHP architecture into a platform-based Java architecture and built infrastructure for distributed caching, storage at the 100-billion scale, cross-region multi-active deployment, monitoring, and service-oriented design. After the transformation, Weibo had a highly available architecture able to support DAU at the 100-million scale and storage at the 100-billion scale."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Elastic hybrid cloud architecture (2015-2019): "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"From 2015 on, Weibo's traffic kept growing and hot spots became frequent, so traffic could multiply at any moment. To handle hot spots at a controllable cost, the team built an elastic hybrid cloud architecture based on Docker and public cloud. Starting with core services, it took three to four years to roll the architecture out across Weibo's businesses, and eventually every major business gained rapid scale-out capability; the latest elastic scale-out rate is 5,000 servers in 10 minutes. During the same period, the team also built Weibo's hot-spot linkage mechanism and WeiboMesh, its Service Mesh architecture, and rolled both out to all businesses. (Further reading: "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s?__biz=MzIzNjUxMzk2NQ==&mid=2247490910&idx=1&sn=40efd42c17826c537dd00f0da30d1d6c&chksm=e8d7e29cdfa06b8a620b5cd9a1a1b584928400a7f6d7b3c143794596a502fb1b06e2f577622b&scene=27#wechat_redirect","title":null,"type":null},"content":[{"type":"text","text":"《從觀望到落地:新浪微博Service 
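Mesh自研實踐全過程》"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":")"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Intelligent cloud-native architecture (2020 to present)"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":": As elasticity improved, the scale of services and resources that a single operations engineer or DBA could support grew substantially; each Weibo DBA manages more than 10,000 resource ports on average. But as architectural complexity kept rising, providing consistently high-quality support across the board became the new challenge. The R&D team began making databases, caches, message queues, and other resources intelligent and elastically schedulable, and completed a “whole-lease, retail” (整租零售) solution built on high-spec Alibaba Cloud Shenlong (神龍) servers that can cut unit cost by 50%. In parallel, the operations and DBA teams are being upgraded into DevOps teams. Because the cloud-native transformation is an enormous amount of work, it is still in progress."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Beyond the core systems, the R&D team also worked with the search, Hot Weibo, advertising, live streaming, video, and other businesses on hybrid cloud and cloud-native upgrades, which substantially improved site-wide hot-spot handling and stability."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"A successful architecture upgrade can bring a company considerable benefits, but its costs must be weighed in advance as well."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"In Liu's view, an architecture transformation not only consumes a great deal of the architect team's time, it also requires business R&D teams to set aside considerable time and energy to support it, and it can even slow down business development. So "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"unless the problem is fatal and widespread, or the change can improve efficiency by ten times or more, do not lightly launch a large architecture transformation project."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Because architecture upgrades are so costly, only one major transformation can be carried out every two to three years, and choosing one means giving up all the other candidates. This requires that the chosen "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"transformation be distinctive, technically leading, and have a strong fan-out effect"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Distinctiveness means that the problem the transformation solves should be strongly differentiated. If it is not, everyone else can quickly do the same, and the value of the result is greatly diminished. Take Weibo as an example: the R&D team "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"chooses its transformation directions mainly around “extreme hot spots”, a scenario distinctive to Weibo"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":". Automated scaling of web services and automated scaling of databases were both driven by this scenario, which makes it easier to stay ahead in it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Technical leadership is very valuable for the technology brand, recruiting, and R&D competitiveness. The fan-out effect means the chosen direction should let a single breakthrough lift a whole area. For example, building on its leading elasticity, Weibo improved service governance, release and deployment, rapid development, load testing, and rapid fault handling; building on cloud-native capabilities, it gained productized cross-region multi-active deployment, a resource cloud, whole-lease retail, and integration of online and offline workloads. Together these capabilities form a complete cloud-native infrastructure system."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Liu added that, to further amortize the high cost of a transformation, the chosen direction should be one that can be productized and can broadly support the company's business scenarios; a direction that is valuable for only one business scenario is not suitable for large-scale investment. Records of business incidents and problems can serve as a reference: check how much of the past year's incidents and problems the chosen direction would have covered. Prioritize high-frequency directions, which both solve many concrete problems and are more likely to win strong support from the business teams."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Handling “Extreme Hot Spots”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Deploying a hybrid cloud architecture and giving services elastic scale-out capability is a powerful tool for Weibo: it addresses insufficient internal resource redundancy when frequent hot-spot events bring sudden traffic."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Weibo hot spots come in many types. Earthquake and gaming hot spots mainly hit the Feed, while celebrity hot spots mainly hit Hot Search and the Hot Weibo feed. Different access paths map to slightly different technical paths, but in general the traffic peak passes through every layer in turn, like a real flood crest: the network ingress, the Layer 4/7 tier, the access layer, the business layer, the platform layer, and the resource layer."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/99\/fe\/9944f3eb180d4224c0acd4db44b277fe.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"The layers differ in how easy they are to scale out, how much it costs to keep them highly redundant, and how high their peaks get."},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" The network ingress, the Layer 4/7 tier, and the access layer are usually fairly easy to scale out, and keeping them highly redundant is relatively cheap. The platform layer runs a large server fleet, so keeping it highly redundant is expensive. Traffic to businesses such as Hot Search and Hot Weibo rises extremely fast and can grow 5 to 10 times within 3 minutes. The resource layer is slow to scale out and has to be kept highly redundant at all times. In addition, because hot-spot readiness has to be built layer by layer and business by business, any system or business that has not yet been covered comes under heavy pressure when a hot spot hits."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Because the business paths and technical paths of hot spots are so varied, guarding one or two points is never enough; yet covering every system and business would be very expensive in both servers and engineering effort. This is what makes hot-spot handling at Weibo challenging."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Liu said that after years of practice and development, "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"the most effective approach is still to bring the businesses and paths where hot spots occur most often under hot-spot linkage, quantify traffic and maintain service redundancy through dynamic scaling, and then detect hot spots in real time and trigger coordinated scale-out."},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" Specifically, a detection service monitors changes in service traffic in real time. If traffic grows rapidly and reaches the predefined level-1 heat threshold, the detection service broadcasts a level-1 heat message. Different services and modules respond differently to different heat levels. For example, the alerting and notification module notifies developers, operations engineers, and others via IVR phone calls, email, and internal IM messages so that they can respond. Each business system, on receiving the message, carries out operations such as scale-out and degradation according to its own predefined plan."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/d5\/a3\/d56840cc379896036e3381d0yy2274a3.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Meanwhile, all businesses are made to support dynamic scaling as part of routine work, so that even a business meeting its first hot spot can scale out quickly and restore service. For shared infrastructure such as networks and dedicated lines, the infrastructure team continuously monitors capacity and maintains sufficient redundancy."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Compared with five years ago, Weibo now has a complete hot-spot response system, including heat levels, intensity levels, hot-spot prediction with high-redundancy maintenance, linked scale-out, and linked mobilization. It can detect a hot spot within 2 minutes and respond with systematic, coordinated degradation and scale-out, and it can elastically add 5,000 servers in 10 minutes, around the clock (7×24). All core businesses are covered by hot-spot linkage, and redundancy monitoring and maintenance has been established for infrastructure along the full path, including the Layer 4/7 tier and dedicated lines."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Since the pandemic began in 2020, Weibo has seen several extreme hot-spot events in which overall traffic doubled or even tripled. In every case the hot-spot linkage mechanism kept the whole site stable, and the largest single elastic scale-out reached tens of thousands of servers."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Challenges in High-Availability Assurance"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Beyond the highly challenging breaking hot-spot scenario, the reliability of Weibo's architecture faces challenges on other fronts as well; cross-region multi-active deployment is one of them."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Weibo began experimenting with "},{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/article\/weibo-multi-datacenter-deployments","title":null,"type":null},"content":[{"type":"text","text":"cross-region multi-active deployment"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" as early as ten years ago, and the work has continued for a full decade."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"From 2010 to 2014, the Weibo R&D team explored and built an initial cross-region multi-active technology stack and, after weighing various factors, eventually settled on a same-city multi-active architecture for core services. The team took some detours along the way. It first chose a solution based on MySQL triggers, but problems such as replica lag and out-of-order message delivery caused that first attempt to fail. In 2012, drawing on Yahoo's approach, the team developed wmb, Weibo's multi-datacenter message distribution service, which solved cross-region message synchronization, and Weibo's core systems achieved a Beijing-Guangzhou dual-active architecture. By 2014, however, as Weibo's businesses grew more complex, the cost of broad cross-region multi-active deployment had become enormous, and the core systems withdrew from Guangzhou and fell back to a same-city dual-active architecture in Beijing. Only later, once the hybrid cloud architecture was in place, did this become a same-city multi-active architecture again."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Since 2015, the team has gradually built a series of prerequisite technology systems, including the elastic hybrid cloud, WeiboMesh, a unified operations platform, a data backup and recovery platform, metrics governance and AIOps, the resource cloud, and whole-lease retail technology, and has progressively unified the infrastructure stack. The team is now working toward leading capabilities such as elastic scaling of 80,000 servers (Pods) in 5 minutes, transfer and recovery of data at the 100 TB scale in 10 minutes, cross-language intelligent service governance, and elastic database scaling, so that businesses can support cross-region multi-active deployment with little or no awareness of it. By the end of 2020, with the development of its cloud-native and resource cloud technologies, Weibo finally had the technical ability to do cross-region multi-active deployment site-wide at low cost."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"In Liu's view, with technology where it is today, cross-region multi-active deployment is no longer purely a technical problem; it is more a matter of balancing cost against disaster-recovery capability. It not only greatly increases the cost of databases, servers, and the like, but also requires that new business development, the retrofitting of existing businesses, and infrastructure construction all add support for it. For long-distance multi-active deployment, not only the core businesses but all related businesses must go multi-active, and the associated costs increase by orders of magnitude."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Because cross-region multi-active deployment raises the redundancy, and therefore the cost, of databases and other resources, the team is currently focusing on the ability to rapidly migrate and rebuild 10,000 servers in another region within 1 hour (further reading: 《"},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s?__biz=MjM5MDE0Mjc4MA==&mid=2651072332&idx=2&sn=8ca5bc34dd3687011a9a343a897bd1cd&chksm=bdb9df1f8ace5609a5b83409cb6c37941673d3460844be7bc494beca228669fae5eb8fcab684&mpshare=1&scene=23&srcid=0330AT7cwDdr3TI7M8dD4vTc&sharer_sharetime=1618388398069&sharer_shareid=981118977eb8fe323bd68d2c3e035ce0%23rd","title":null,"type":null},"content":[{"type":"text","text":"業界前所未有:10分鐘部署十萬量級資源、1小時完成微博後端異地重建"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"》)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"In addition to the overall architecture and infrastructure work led by the Infrastructure Department, Weibo's policies and mechanisms, such as incident response and severity grading, service SLA guarantees, emergency drills for major events, and on-duty coverage and support during major events, play an equally important role in keeping the whole system highly available."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Moreover, as one of the earliest mobile internet products, Weibo has been iterating for more than ten years and has many business lines and a mix of new and old services. How to maintain and retrofit the many legacy services and resources is also a major challenge. To address it, the team built the WeiboMesh architecture and a unified operations platform to enable standardized, cross-language governance of old and new architectures, and set up a dedicated architecture transformation team to help business teams with transformation and upgrades."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Although Weibo's core systems are written in Java, the rapid growth of the advertising and recommendation systems in recent years has brought more and more C++ use cases. Because the C++ stack was built over a short period and by scattered teams, it was not complete enough to meet business needs. Since the second half of 2020, taking the opportunity of upgrading and refactoring the recommendation engine, Weibo has formed a C++ architect team and improved the development, architecture, and operations systems of its C++ stack. At the upcoming "},{"type":"link","attrs":{"href":"https:\/\/qcon.infoq.cn\/2021\/beijing\/presentation\/3481","title":null,"type":null},"content":[{"type":"text","text":"QCon Beijing"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" conference this May, Ma Qin (馬駸), who leads the refactoring project, will share the team's experience."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Thoughts on Technology Selection in the Cloud-Native Wave"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Infrastructure and the cloud-native ecosystem as a whole are thriving, and areas such as AIOps, edge computing, container orchestration, and cloud-native databases are all developing rapidly."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"As monitoring systems mature and big data technology advances, AIOps can make infrastructure governance more intelligent and greatly improve the efficiency of problem detection, fault localization, traffic forecasting, and service governance. Edge computing, beyond serving static content, already offers function-style dynamic services and will soon offer general dynamic services; combined with 5G, it will give users powerful dynamic computing within milliseconds. Container orchestration can schedule resources globally with peak shifting, allocating web, database, big data, and other resources in a unified way and greatly improving overall utilization. Cloud-native databases keep raising the level of automated database governance; capabilities such as automatic sharding, automatic scaling, and automatic migration will soon become reality."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"In Liu's view, the steady stream of new technologies and solutions in IT is something engineers should be glad about. Behind them lies strong, unmet, and challenging demand. They also represent a new technology wave that carries large dividends, which include not only major gains in efficiency or cost but also benefits for engineers' promotions and for the growth of R&D teams."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"He also noted, however, that if a technology stays hot for several years in a row, it is still evolving and not yet mature: some key difficulties remain unsolved, and it cannot yet be adopted broadly. At that stage it is better suited to companies and teams with strong technical capability and a particularly good scenario fit, who can try it first, land it in some scenarios, and capture the dividends early."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Any new advanced technology brings new challenges along with its convenience and benefits. Only by adjusting the organization and systems in time can a team master a new technology stack more quickly."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"On the positive side, the cloud-native wave has greatly accelerated the automation and intelligent evolution of infrastructure. Traditional manual operations and support for businesses have shrunk considerably, while infrastructure development and support work has grown considerably. Infrastructure teams therefore need to keep evolving their staffing, with more architects who have strong all-round skills and more DevOps engineers. The cloud-native wave will greatly improve system stability and governance and save substantial cost, and the work of engineers and architects will gain more recognition."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"At the same time, until cloud-native technology is fully mature, a cloud-native architecture carries more risk than what came before. In the past, many systems were independent and many operations were manual, so problems were numerous but rarely occurred simultaneously. With cloud native, the presence of many central nodes and central systems, together with extensive automation, means far fewer everyday problems but a much higher risk of global failures. To meet this new reliability challenge, Liu recommends building sound monitoring and fault-recovery systems around the central systems, making sure every automated operation can be manually overridden, and running disaster-recovery drills regularly."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"On technology roadmap and architecture choices, Liu has always held that the technology that fits your own business is the good technology: R&D teams should choose their technical route based on the characteristics of their business rather than blindly following trends."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Take Weibo as an example. When Weibo was choosing a Docker orchestration technology in 2015, the operations engineers at the time were more familiar and comfortable with managing and scheduling resources by IP, so the team passed on Swarm and also did not adopt K8s, which was already becoming popular. That choice helped the team quickly land Docker at Weibo and put it into wide use. It was not until 2020, when Weibo needed to partition and schedule the resources of high-spec servers, that it moved to K8s. Around 2016, Weibo's architectural thinking was largely “conservative” and “pragmatic”: the team was small, its R&D capacity was weak, and the systems had many defects and challenges, so conservatism was a sound defensive strategy. Today, Weibo has a sizable architect team and a relatively complete set of systems and technologies, and taking the high ground in new technology has become a priority. The R&D team now does more tracking of new technology directions and forecasting of their end states, matches them against the needs and problems of each business line, and continuously revises its R&D plans accordingly. Directions with no future, or in which Weibo is not competitive, are abandoned early, while new strategic directions receive more aggressive investment and experimentation, so that advanced technology can empower business growth."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"With hybrid cloud and cloud-native technologies developing rapidly, Liu also has some new thoughts on back-end technology selection. "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"First, look at what cloud vendors are good at."},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" If a cloud solution is far better than building in-house in cost and efficiency, using the vendor's solution directly is a good choice. If vendor lock-in is a concern, use services from two or more cloud vendors at the same time."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Second, choose a technical route and architecture strategy that suits your company's business; the route must match your own distinctive business scenarios to be competitive"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":". Then keep building in those areas: build in-house what should be built in-house, and do not be conservative when it is time to adopt new technology, because only new technology brings large dividends."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Finally, for scenarios and needs that cloud vendors do not cover and that are not on your own technical route, borrow mature industry solutions directly; good enough is good enough"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":", and keep your energy focused on your own route. Of course, if your company has deep pockets and invests generously in infrastructure, you can build along several routes at once, but spreading across everything is not advisable; however large the investment, it is hard to match the investment and scale of the top cloud vendors."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"About the interviewee:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Liu Daoru, R&D Director, Sina Weibo R&D Center"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Liu Daoru heads Weibo's Infrastructure Department and previously worked at Sogou and other companies. He is responsible for R&D of foundational platforms such as Weibo's cloud platform, operations, and DBA, and of back-end systems such as the relationship feed and the recommendation engine, and also for Weibo's hot-spot response and site-wide stability. He specializes in cloud-native architecture and in the design and high-availability assurance of large-scale distributed systems, with extensive hands-on experience in cloud native, hybrid cloud, and the storage, processing, access, and high availability of large-scale data. As project lead he has headed Weibo's multi-datacenter deployment, the Weibo A\/B testing platform, Weibo hybrid cloud, and client-side Feed performance optimization, and he was a key contributor to Weibo's platform transformation and platform stability projects. His current focus is cloud native, hybrid cloud, big data architecture, and AIOps."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Recommended event:"},{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"QCon, the global software development conference, will be held in Beijing on May 29-31. It brings together 150+ speakers and features 29 tracks on trending technologies, including Serverless, Flutter, DDD, audio and video, cloud native, intelligent finance, big data, digital transformation, and artificial intelligence. The content comes from practice and is aimed at the community. "},{"type":"link","attrs":{"href":"https:\/\/qcon.infoq.cn\/2021\/beijing?utm_source=wechat&utm_medium=infoq&utm_campaign=9&utm_term=0502&utm_content=arti1","title":"xxx","type":null},"content":[{"type":"text","text":"Learn more"}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"."}]}]}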