An Analysis of the Evolution of Wikipedia's Technical Architecture

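{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Internet systems span wildly different industries and domains, but in essence they all move real-world, offline scenarios online and use computer systems to serve people faster. Put another way, the internet is a powerful tool whose economies of scale exceed those of any earlier tool. A physical store could once serve only customers within a few kilometers; today, anyone in the world is a potential customer.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The internet has one general rule: scale determines everything. The scale it brings has raised our expectations by many orders of magnitude. Reaching sales in the hundreds of millions used to take a long time; today a single livestream host, working in their own distinctive way with internet tools, can do it in a short time.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Given an effect this large, it is natural to want to know which companies run the most-visited sites in the world. I also have a selfish motive: I want to understand how the software architecture of top internet companies is built. Scale confronts their systems with enormous challenges, such as high performance, high concurrency, high availability, massive data volumes, security, and cost, and yet their sites withstand enormous user traffic.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So I looked up the world's most-visited sites and found Wikipedia ranked fifth globally, behind Google, Facebook, YouTube, and Baidu. What surprised me is that, compared with other high-traffic sites, Wikipedia's engineering and operations team is only a dozen or so people, and its servers are donated hardware: few in number, and surely not top of the line in performance or quality. That made me very curious: compared with deep-pocketed internet giants, how does Wikipedia use such limited resources to deliver a high-quality service to users around the world? It is a question worth thinking about.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce0cd5e2cadbd5a2c1bf112c10fecc78.jpeg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text"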
,"text":"To better understand how Wikipedia's architecture evolved, I divide its development into three stages: founding, growth, and maturity. This also makes it easier to see how Wikipedia became what it is today.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. Founding Stage","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Unlike traditional software systems, internet applications can no longer be designed once and delivered for permanent use. That is a demand of the times and an inevitable result of rapid economic and social change. So when Wikipedia was founded, it presumably did what many internet startups do: build a basic architecture around one core, innovative business need, get to market quickly, attract a first batch of users, and thereby validate that the model is feasible and effective. In addition, internet products grow incrementally, so the system has to iterate continuously in response to user needs.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At this stage, therefore, the rule is to do whatever is simplest and fastest. The software architecture and deployment look like this:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5e/5e364a75940077b383b62dfd82b6166d.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Why do I think so? Wikipedia was founded in 2001, when the number of internet users was still modest and the world had just been through the dot-com bubble. A single-server application was enough to get the system online quickly, and a single-server database was enough as well.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It is much like Google starting in a Stanford University lab, Facebook starting in Zuckerberg's Harvard dorm room, and Alibaba starting in Jack Ma's home.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"ori
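gin":null},"content":[{"type":"text","text":"2. Growth Stage","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the success of the first stage, Wikipedia won user acceptance, and large numbers of users poured in, using it directly or indirectly. One point needs adding here: two major factors shape the architecture of an internet system, the complexity of the business and the size of the user base. Wikipedia's business complexity is low, since it mainly lets users look up information. The questions it has to answer are therefore how to keep entries accurate and how to respond to user requests faster.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The users who edit entries, and the edit operations themselves, are certainly far fewer than the users and operations that read them. Editing is in fact not really a feature for ordinary visitors; it is more like a management system for updating article content, which only people with permission can use. This part puts little pressure on the system. Of course, as content grows, more and more users gain editing rights, but however that changes, readers far outnumber editors.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Serving reads is therefore the critical part. To cope with heavy read traffic, the system architecture has to be upgraded and Wikipedia's services refined.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The data layer is the first to feel the load, especially from the large volume of reads. To speed up responses, the database is split into separate read and write paths. As traffic keeps growing, most lookups hit the same data, so there is no need to query the database every time; a cache is added to hold static data and speed up responses further.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As Wikipedia grows, both its content and its visitors keep increasing. At this point load balancing can be introduced: add application servers, form an application server cluster, and spread user requests across it to reduce the load on any single server.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The deployment architecture for the growth stage is shown below:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":n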
ull}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/89/89d6c76e62762c3cea90afb7f524b50a.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. Maturity Stage","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At this stage Wikipedia's user base and business requirements have stabilized, and the architecture has been upgraded repeatedly through the founding and growth stages. The work now is further optimization of each component, with the main goal of serving users faster. The system is refined continuously, guided by metrics such as PV, DAU, TPS, and QPS.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because Wikipedia serves users worldwide, it runs multiple data centers in an active-active, multi-site setup and routes each user to the nearest one based on the user's address, which requires upgrading the load balancing. The first step is DNS load balancing: the user's domain lookup goes to a DNS server, which uses its address list to pick the IP address of the data center closest to the user.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/bb/bb075cf8c105ef866e546dbe37191713.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Wikipedia uses GeoDNS as its DNS load balancer.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"However, we know that DNS load balancing can only partition users by geography and does not allow a load-balancing algorithm to be configured; it is mechanical and inflexible. That is its weakness. Wikipedia's load balancing is therefore built in three tiers: the first uses DNS for geographic partitioning, the second uses the software load balancer LVS, and the third uses LVS again. Among software load balancers, LVS handles roughly hundreds of thousands of concurrent connections, compared with tens of thousands for Nginx, so it performs considerably better.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because most users come to Wikipedia simply to read, Wikipedia places reverse proxy servers, the Squid caching layers, between the second and third tiers to cache static page data. When a request arrives and the reverse proxy already holds the requested data, it returns that data to the user directly instead of forwarding the request to an application server. This also reduces the computing load on the application servers.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Why not use a CDN here? I believe the reason is cost. Wikipedia's main income is donations, so it does not have the deep pockets of a for-profit internet company; it has neither the need nor the money to deploy CDN servers at network operators.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To make search more precise and lookups more efficient, Wikipedia uses a search engine. As we know, fuzzy queries on a relational database such as MySQL perform very poorly, and MySQL's storage engines and index structures offer little room to improve them. A search engine is middleware built specifically for search and offers far more capable search features. Wikipedia's search engine is built on Lucene.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once the data grows to a certain point, it exceeds MySQL's capacity, so the database is scaled out to increase storage. The database also serves as the data source for the search engine.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The cache is likewise moved to a distributed caching system to increase cache capacity.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Wikipedia uses reverse proxies to return cached data to users. But when an entry is edited, the reverse proxies must be told promptly that their copy is no longer valid, so a message notification service was built: invalidation notification.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Below is the architecture diagram of Wikipedia today. The application servers are written in PHP.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/cb/cb31c598d02bcbd132b7fa833b33d769.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"That concludes my thoughts on the evolution of Wikipedia's architecture. Why go through this exercise? Because I think the final architecture, the answer, matters less than how the system got there: how the architecture was upgraded, how technologies were selected, and how one option was chosen over another. That process matters more; without the reasoning behind it, what use is memorizing more answers? Society changes quickly today, and the answer you understand now may no longer be the answer before long.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This reasoning process is therefore what matters most, and what is most worth studying and deliberately practicing.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, thank you very much for reading. My own knowledge is limited, and this article surely contains many mistakes. If you can spot them, that shows real skill: you know which points are wrong and which are acceptable. I hope you will point them out and help me improve.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
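The article describes a reverse proxy that answers reads from its cache and an invalidation notification that tells it when an edited entry is stale. The sketch below illustrates that interaction in Python under simplifying assumptions: `PageStore` and `ReverseProxyCache` are hypothetical names invented for this example, everything runs in one process, and nothing here reflects how Squid or Wikipedia's own software actually implements purging.

```python
from typing import Callable, Dict, List


class PageStore:
    """Hypothetical stand-in for the application servers plus database."""

    def __init__(self) -> None:
        self._pages: Dict[str, str] = {}
        self._subscribers: List[Callable[[str], None]] = []
        self.render_count = 0  # number of times the origin had to render a page

    def subscribe(self, callback: Callable[[str], None]) -> None:
        # Caches register here to receive invalidation notifications.
        self._subscribers.append(callback)

    def edit(self, title: str, text: str) -> None:
        # Save the new revision, then tell every cache its copy is stale.
        self._pages[title] = text
        for notify in self._subscribers:
            notify(title)

    def render(self, title: str) -> str:
        self.render_count += 1
        return f"<h1>{title}</h1><p>{self._pages[title]}</p>"


class ReverseProxyCache:
    """Hypothetical cache that forwards only misses to the origin."""

    def __init__(self, origin: PageStore) -> None:
        self._origin = origin
        self._cache: Dict[str, str] = {}
        origin.subscribe(self.invalidate)

    def invalidate(self, title: str) -> None:
        # Drop the stale copy; the next read repopulates it.
        self._cache.pop(title, None)

    def get(self, title: str) -> str:
        if title not in self._cache:
            self._cache[title] = self._origin.render(title)
        return self._cache[title]


origin = PageStore()
proxy = ReverseProxyCache(origin)

origin.edit("Squid", "A caching proxy.")
first = proxy.get("Squid")    # cache miss: rendered by the origin
second = proxy.get("Squid")   # cache hit: the origin is not contacted

origin.edit("Squid", "A caching and forwarding proxy.")
third = proxy.get("Squid")    # invalidated by the edit, so rendered again
```

Because reads far outnumber edits, most requests take the cache-hit path, and the origin renders a page only once per revision that is actually read. In a real deployment the notification would travel over the network to many proxy servers rather than through an in-process callback.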