🌏【架構師指南】分佈式技術知識點總結(數據處理)

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"數據分析","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"從傳統的基於關係型數據庫並行處理集羣、用於內存計算近實時的,到目前的基於hadoop的海量數據的分析,數據的分析在大型電子商務網站中應用非常廣泛,包括流量統計、推薦引擎、趨勢分析、用戶行爲分析、數據挖掘分類器、分佈式索引等等。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"並行處理集羣有商業的EMC Greenplum,Greenplum的架構採用了MPP(大規模並行處理),基於postgresql的大數據量存儲的分佈式數據庫。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"內存計算方面有SAP的HANA,開源的nosql內存型的數據庫mongodb也支持mapreduce進行數據的分析。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"海量數據的離線分析目前互聯網公司大量的使用Hadoop、Spark、Blink、Flink。Hadoop在可伸縮性、健壯性、計算性能和成本上具有無可替代的優勢,事實上已成爲當前互聯網企業主流的大數據分析平臺。","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hadoop通過MapReduce的分佈式處理框架,用於處理大規模的數據,伸縮性也非常好;但是MapReduce最大的不足是不能滿足實時性的場景,主要用於離線的分析。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於MapRduce模型編程做數據的分析,開發上效率不高,位於hadoop之上Hive的出現使得數據的分析可以類似編寫sql的方式進行,sql經過語法分析、生成執行計劃後最終生成MapReduce任務進行執行,這樣大大提高了開發的效率,做到以ad-hoc(計算在query發生時)方式進行的分析。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"基於MapReduce模型的分佈式數據的分析都是離線分析,執行上是暴力掃描,無法利用類似索引的機制;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"開源的Cloudera Impala是基於MPP的並行編程模型的,底層是Hadoop存儲的高性能的實時分析平臺,可以大大降低數據分析的延遲。","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"目前Hadoop使用的版本是Hadoop1.0,一方面原有的MapReduce框架存在JobTracker單點的問題,另外一方面JobTracker在做資源管理的同時又做任務的調度工作,隨着數據量的增大和Job任務的增多,明顯存在可擴展性、內存消耗、線程模型、可靠性和性能上的缺陷瓶頸;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Hadoop2.0 yarn對整個框架進行了重構,分離了資源管理和任務調度,從架構設計上解決了這個問題。","attrs":{}}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"實時計算","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在互聯網領域,實時計算被廣泛實時","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"監控分析、流控、風險控制","attrs":{}},{"type":"text","text":"等領域。電商平臺系統或者應用對日常產生的大量日誌和異常信息,需要經過實時過濾、分析,以判定是否需要預警;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同時需要對系統做自我保護機制,比如對模塊做流量的控制,以防止非預期的對系統壓力過大而引起的系統癱瘓,流量過大時,可以採取拒絕或者引流等機制;","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"有些業務需要進行風險的控制,比如彩票中有些業務需要根據系統的實時銷售情況進行限號與放號。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原始基於單節點的計算,隨着系統信息量爆炸式產生以及計算的複雜度的增加,單個節點的計算已不能滿足實時計算的要求,需要進行多節點的分佈式的計算,分佈式實時計算平臺就出現了。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}},{"type":"strong","attrs":{}}],"text":"這裏所說的實時計算,其實是流式計算,概念前身其實是CEP複雜事件處理,相關的開源產品如Esper,業界分佈式的流計算產品Yahoo S4,Twitter storm、flink、blink等,以storm和blink和flink開源產品使用最爲廣泛。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於實時計算平臺,從架構設計上需要考慮以下幾個因素:","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"伸縮性","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"隨着業務量的增加,計算量的增加,通過增加節點處理,就可以處理。","attrs":{}}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"高性能、低延遲","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從數據流入計算平臺數據,到計算輸出結果,需要性能高效且低延遲,保證消息得到快速的處理,做到實時計算。","attrs":{}}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"可靠性","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"保證每個數據消息得到一次完整處理。","attrs":{}}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"容錯性","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 系統可以自動管理節點的宕機失效,對應用來說,是透明的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 整個集羣的管理是通過zookeeper來進行的。客戶端提交拓撲到nimbus。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Nimbus針對該拓撲建立本地的目錄根據topology的配置計算task,分配task,在zookeeper上建立assignments節點存儲task和supervisor機器節點中woker的對應關係。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" zookeeper上創建taskbeats節點來監控task的心跳;啓動topology。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Supervisor去zookeeper上獲取分配的tasks,啓動多個woker進行,每個woker生成task,一個task一個線程;根據topology信息初始化建立task之間的連接;Task和Task之間是通過zeroMQ管理的;之後整個拓撲運行起來。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Tuple是流的基本處理單元,也就是一個消息,Tuple在task中流轉,Tuple的發送和接收過程如下:","attrs":{}}]},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"發送Tuple,Worker提供了一個transfer的功能,用於當前task把tuple發到到其他的task中。以目的taskid和tuple參數,序列化tuple數據並放到transfer queue中","attrs":{}},{"type":"text","text":"。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"在0.8版本之前,這個queue是LinkedBlockingQueue,0.8之後是DisruptorQueue","attrs":{}},{"type":"text","text":"。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"在0.8版本之後,每一個worker綁定inbound transfer queue和outbound queue,inbound queue用於接收message,outbound queue用於發送消息","attrs":{}},{"type":"text","text":"。","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 發送消息時,單個線程從transferqueue中拉取數據,把這個tuple通過zeroMQ發送到其他worker中。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 接收Tuple,每個worker都會監聽zeroMQ的tcp端口來接收消息,消息放到DisruptorQueue中後,後從queue中獲取message(taskid,tuple),根據目的taskid,tuple的值路由到task中執行。每個tuple可以emit到direct steam中,也可以發送到regular stream中,在Reglular方式下,由Stream Group(stream id-->component id -->outbound tasks)功能完成當前tuple將要發送的Tuple的目的地。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 通過以上分析可以看到,Storm在伸縮性、容錯性、高性能方面的從架構設計的角度得以支撐;同時在可靠性方面,Storm的ack組件利用異或xor算法在不失性能的同時,保證每一個消息得到完整處理的同時。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"實時推送","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 實時推送應用場景非常多,比如系統的監控動態的實時曲線繪製,手機消息的推送,web實時聊天等。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 實時推送有很多技術可以實現,有Comet方式,有websocket方式等。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Comet基於服務器長連接的“服務器推”技術,包含兩種:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Long Polling:服務器端在接到請求後掛起,有更新時返回連接即斷掉,然後客戶端再發起新的連接","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Stream方式:每次服務端數據傳送不會關閉連接,連接只會在通信出現錯誤時,或是連接重建時關閉(一些防火牆常被設置爲丟棄過長的連接, 服務器端可以設置一個超時時間, 超時後通知客戶端重新建立連接,並關閉原來的連接)。","attrs":{}}]}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"Websocket:長連接,全雙工通信","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 是HTML5的一種新的協議。它實現了瀏覽器與服務器的雙向通訊。webSocket API中,瀏覽器和服務器端只需要通過一個握手的動作,便能形成瀏覽器與客戶端之間的快速雙向通道,使得數據可以快速的雙向傳播。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Socket.io是一個NodeJS websocket庫,包括客戶端的JS和服務端的的nodejs,用於快速構建實時的web應用。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"數據存儲","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 數據庫存儲大體分爲以下幾類,有關係型(事務型)的數據庫,以oracle、mysql爲代表,有keyvalue數據庫,以redis和memcached db爲代表,有文檔型數據庫如mongodb,有列式分佈式數據庫以HBase,cassandra,dynamo爲代表,還有其他的圖形數據庫、對象數據 庫、xml數據庫等。每種類型的數據庫應用的業務領域是不一樣的,下面從內存型、關係型、分佈式三個維度針對相關的產品做性能可用性等方面的考量分析。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"內存型數據庫","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 內存型的數據庫,以高併發高性能爲目標,在事務性方面沒那麼嚴格,以開源nosql數據庫mongodb、redis爲例。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"Mongodb","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"通信方式","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"多線程方式,主線程監聽新的連接,連接後,啓動新的線程做數據的操作(IO切換)。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"數據結構","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據庫-->collection-->record","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MongoDB在數據存儲上按命名空間來劃分,一個collection是一個命名空間,一個索引也是一個命名空間。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同一個命名空間的數據被分成很多個Extent,Extent之間使用雙向鏈表連接。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在每一個Extent中,保存了具體每一行的數據,這些數據也是通過雙向鏈接連接的。","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每一行數據存儲空間不僅包括數據佔用空間,還可能包含一部分附加空間,這使得在數據update變大後可以不移動位置。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"索引以BTree結構實現。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果你開啓了jorunaling日誌,那麼還會有一些文件存儲着你所有的操作記錄。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"持久化存儲","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MMap方式把文件地址映射到內存的地址空間,直接操作內存地址空間就可以操作文件,不用再調用write,read操作,性能比較高。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"mongodb調用mmap把磁盤中的數據映射到內存中的,所以必須有一個機制時刻的刷數據到硬盤才能保證可靠性,多久刷一次是與syncdelay參數相關的。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"journal(進行恢復用)是Mongodb中的redo log,而Oplog則是負責複製的binlog。如果打開journal,那麼即使斷電也只會丟失100ms的數據,這對大多數應用來說都可以容忍了。從1.9.2+,mongodb都會默認打開journal功能,以確保數據安全。而且journal的刷新時間是可以改變的,2-300ms的範圍,使用--journalCommitInterval命令。Oplog和數據刷新到磁盤的時間是60s,對於複製來說,不用等到oplog刷新磁盤,在內存中就可以直接複製到Sencondary節點。","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"事務支持","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Mongodb只支持對單行記錄的原子操作","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HA集羣","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"用的比較多的是Replica Sets,採用選舉算法,自動進行leader選舉,在保證可用性的同時,可以做到強一致性要求。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當然對於大量的數據,mongodb也提供了數據的切分架構Sharding.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"部署平臺","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"監控、統計","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大型分佈式系統涉及各種設備,比如網絡交換機,普通PC機,各種型號的網卡,硬盤,內存等等,還有應用業務層次的監控,數量非常多的時候,出現錯誤的概率也會變大,並且有些監控的時效性要求比較高,有些達到秒級別;在大量的數據流中需要過濾異常的數據,有時候也對數據會進行上下文相關的複雜計算,進而決定是否需要告警。因此監控平臺的性能、吞吐量、已經可用性就比較重要,需要規劃統一的一體化的監控平臺對系統進行各個層次的監控。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"平臺的數據分類","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"應用業務級別:應用事件、業務日誌、審計日誌、請求日誌、異常、請求業務metrics、性能度量","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"系統級別:CPU、內存、網絡、IO","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"時效性要求","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"閥值,告警:","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"實時計算:","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"近實時分鐘計算","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"按小時、天的離線分析","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"實時查詢","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"節點中Agent代理可以接收日誌、應用的事件以及通過探針的方式採集數據,agent採集數據的一個原則是和業務應用的流程是異步隔離的,不影響交易流程。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據統一通過collector集羣進行收集,按照數據的不同類型分發到不同的計算集羣進行處理;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有些數據時效性不是那麼高,比如按小時進行統計,放入hadoop集羣;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有些數據是請求流轉的跟蹤數據,需要可以查詢的,那麼就可以放入solr集羣\\ES集羣進行索引;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有些數據需要進行實時計算的進而告警的,需要放到storm集羣中進行處理。","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據經過計算集羣處理後,結果存儲到Mysql或者HBase中。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"監控的web應用可以把監控的實時結果推送到瀏覽器中,也可以提供API供結果的展現和搜索。","attrs":{}}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章