🌏【架構師指南】分佈式技術知識點總結（數據處理）

原創

2021-07-23 11:44

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"數據分析","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"從傳統的基於關係型數據庫並行處理集羣、用於內存計算近實時的，到目前的基於hadoop的海量數據的分析，數據的分析在大型電子商務網站中應用非常廣泛，包括流量統計、推薦引擎、趨勢分析、用戶行爲分析、數據挖掘分類器、分佈式索引等等。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"並行處理集羣有商業的EMC Greenplum，Greenplum的架構採用了MPP(大規模並行處理)，基於postgresql的大數據量存儲的分佈式數據庫。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"內存計算方面有SAP的HANA，開源的nosql內存型的數據庫mongodb也支持mapreduce進行數據的分析。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"海量數據的離線分析目前互聯網公司大量的使用Hadoop、Spark、Blink、Flink。Hadoop在可伸縮性、健壯性、計算性能和成本上具有無可替代的優勢，事實上已成爲當前互聯網企業主流的大數據分析平臺。","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hadoop通過MapReduce的分佈式處理框架，用於處理大規模的數據，伸縮性也非常好；但是MapReduce最大的不足是不能滿足實時性的場景，主要用於離線的分析。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於MapRduce模型編程做數據的分析，開發上效率不高，位於hadoop之上Hive的出現使得數據的分析可以類似編寫sql的方式進行，sql經過語法分析、生成執行計劃後最終生成MapReduce任務進行執行，這樣大大提高了開發的效率，做到以ad-hoc(計算在query發生時)方式進行的分析。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"基於MapReduce模型的分佈式數據的分析都是離線分析，執行上是暴力掃描，無法利用類似索引的機制；","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"開源的Cloudera Impala是基於MPP的並行編程模型的，底層是Hadoop存儲的高性能的實時分析平臺，可以大大降低數據分析的延遲。","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"目前Hadoop使用的版本是Hadoop1.0，一方面原有的MapReduce框架存在JobTracker單點的問題，另外一方面JobTracker在做資源管理的同時又做任務的調度工作，隨着數據量的增大和Job任務的增多，明顯存在可擴展性、內存消耗、線程模型、可靠性和性能上的缺陷瓶頸；","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Hadoop2.0 yarn對整個框架進行了重構，分離了資源管理和任務調度，從架構設計上解決了這個問題。","attrs":{}}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"實時計算","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在互聯網領域，實時計算被廣泛實時","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"監控分析、流控、風險控制","attrs":{}},{"type":"text","text":"等領域。電商平臺系統或者應用對日常產生的大量日誌和異常信息，需要經過實時過濾、分析，以判定是否需要預警；","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同時需要對系統做自我保護機制，比如對模塊做流量的控制，以防止非預期的對系統壓力過大而引起的系統癱瘓，流量過大時，可以採取拒絕或者引流等機制；","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"有些業務需要進行風險的控制，比如彩票中有些業務需要根據系統的實時銷售情況進行限號與放號。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原始基於單節點的計算，隨着系統信息量爆炸式產生以及計算的複雜度的增加，單個節點的計算已不能滿足實時計算的要求，需要進行多節點的分佈式的計算，分佈式實時計算平臺就出現了。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}},{"type":"strong","attrs":{}}],"text":"這裏所說的實時計算，其實是流式計算，概念前身其實是CEP複雜事件處理，相關的開源產品如Esper，業界分佈式的流計算產品Yahoo S4,Twitter storm、flink、blink等，以storm和blink和flink開源產品使用最爲廣泛。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於實時計算平臺，從架構設計上需要考慮以下幾個因素：","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"伸縮性","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"隨着業務量的增加，計算量的增加，通過增加節點處理，就可以處理。","attrs":{}}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"高性能、低延遲","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從數據流入計算平臺數據，到計算輸出結果，需要性能高效且低延遲，保證消息得到快速的處理，做到實時計算。","attrs":{}}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"可靠性","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"保證每個數據消息得到一次完整處理。","attrs":{}}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"容錯性","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 系統可以自動管理節點的宕機失效，對應用來說，是透明的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 整個集羣的管理是通過zookeeper來進行的。客戶端提交拓撲到nimbus。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Nimbus針對該拓撲建立本地的目錄根據topology的配置計算task，分配task，在zookeeper上建立assignments節點存儲task和supervisor機器節點中woker的對應關係。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" zookeeper上創建taskbeats節點來監控task的心跳；啓動topology。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Supervisor去zookeeper上獲取分配的tasks，啓動多個woker進行，每個woker生成task，一個task一個線程；根據topology信息初始化建立task之間的連接;Task和Task之間是通過zeroMQ管理的；之後整個拓撲運行起來。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Tuple是流的基本處理單元，也就是一個消息，Tuple在task中流轉，Tuple的發送和接收過程如下：","attrs":{}}]},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"發送Tuple，Worker提供了一個transfer的功能，用於當前task把tuple發到到其他的task中。以目的taskid和tuple參數，序列化tuple數據並放到transfer queue中","attrs":{}},{"type":"text","text":"。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"在0.8版本之前，這個queue是LinkedBlockingQueue，0.8之後是DisruptorQueue","attrs":{}},{"type":"text","text":"。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"在0.8版本之後，每一個worker綁定inbound transfer queue和outbound queue，inbound queue用於接收message，outbound queue用於發送消息","attrs":{}},{"type":"text","text":"。","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 發送消息時，單個線程從transferqueue中拉取數據，把這個tuple通過zeroMQ發送到其他worker中。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 接收Tuple，每個worker都會監聽zeroMQ的tcp端口來接收消息，消息放到DisruptorQueue中後，後從queue中獲取message(taskid,tuple)，根據目的taskid,tuple的值路由到task中執行。每個tuple可以emit到direct steam中，也可以發送到regular stream中，在Reglular方式下，由Stream Group（stream id-->component id -->outbound tasks）功能完成當前tuple將要發送的Tuple的目的地。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 通過以上分析可以看到，Storm在伸縮性、容錯性、高性能方面的從架構設計的角度得以支撐；同時在可靠性方面，Storm的ack組件利用異或xor算法在不失性能的同時，保證每一個消息得到完整處理的同時。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"實時推送","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 實時推送應用場景非常多，比如系統的監控動態的實時曲線繪製，手機消息的推送，web實時聊天等。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 實時推送有很多技術可以實現，有Comet方式，有websocket方式等。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Comet基於服務器長連接的“服務器推”技術，包含兩種：","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Long Polling：服務器端在接到請求後掛起，有更新時返回連接即斷掉，然後客戶端再發起新的連接","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Stream方式:每次服務端數據傳送不會關閉連接，連接只會在通信出現錯誤時，或是連接重建時關閉（一些防火牆常被設置爲丟棄過長的連接，服務器端可以設置一個超時時間，超時後通知客戶端重新建立連接，並關閉原來的連接）。","attrs":{}}]}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"Websocket：長連接，全雙工通信","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 是HTML5的一種新的協議。它實現了瀏覽器與服務器的雙向通訊。webSocket API中，瀏覽器和服務器端只需要通過一個握手的動作，便能形成瀏覽器與客戶端之間的快速雙向通道，使得數據可以快速的雙向傳播。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Socket.io是一個NodeJS websocket庫，包括客戶端的JS和服務端的的nodejs，用於快速構建實時的web應用。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"數據存儲","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 數據庫存儲大體分爲以下幾類，有關係型（事務型）的數據庫，以oracle、mysql爲代表，有keyvalue數據庫，以redis和memcached db爲代表，有文檔型數據庫如mongodb，有列式分佈式數據庫以HBase，cassandra,dynamo爲代表，還有其他的圖形數據庫、對象數據庫、xml數據庫等。每種類型的數據庫應用的業務領域是不一樣的，下面從內存型、關係型、分佈式三個維度針對相關的產品做性能可用性等方面的考量分析。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"內存型數據庫","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 內存型的數據庫，以高併發高性能爲目標，在事務性方面沒那麼嚴格，以開源nosql數據庫mongodb、redis爲例。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"Mongodb","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"通信方式","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"多線程方式，主線程監聽新的連接，連接後，啓動新的線程做數據的操作（IO切換）。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"數據結構","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據庫-->collection-->record","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MongoDB在數據存儲上按命名空間來劃分，一個collection是一個命名空間，一個索引也是一個命名空間。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同一個命名空間的數據被分成很多個Extent，Extent之間使用雙向鏈表連接。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在每一個Extent中，保存了具體每一行的數據，這些數據也是通過雙向鏈接連接的。","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每一行數據存儲空間不僅包括數據佔用空間，還可能包含一部分附加空間，這使得在數據update變大後可以不移動位置。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"索引以BTree結構實現。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果你開啓了jorunaling日誌，那麼還會有一些文件存儲着你所有的操作記錄。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"持久化存儲","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MMap方式把文件地址映射到內存的地址空間，直接操作內存地址空間就可以操作文件，不用再調用write,read操作，性能比較高。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"mongodb調用mmap把磁盤中的數據映射到內存中的，所以必須有一個機制時刻的刷數據到硬盤才能保證可靠性，多久刷一次是與syncdelay參數相關的。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"journal（進行恢復用）是Mongodb中的redo log，而Oplog則是負責複製的binlog。如果打開journal，那麼即使斷電也只會丟失100ms的數據，這對大多數應用來說都可以容忍了。從1.9.2+，mongodb都會默認打開journal功能，以確保數據安全。而且journal的刷新時間是可以改變的，2-300ms的範圍,使用--journalCommitInterval命令。Oplog和數據刷新到磁盤的時間是60s，對於複製來說，不用等到oplog刷新磁盤，在內存中就可以直接複製到Sencondary節點。","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"事務支持","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Mongodb只支持對單行記錄的原子操作","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HA集羣","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"用的比較多的是Replica Sets，採用選舉算法，自動進行leader選舉，在保證可用性的同時，可以做到強一致性要求。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當然對於大量的數據，mongodb也提供了數據的切分架構Sharding.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"部署平臺","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"監控、統計","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大型分佈式系統涉及各種設備，比如網絡交換機，普通PC機，各種型號的網卡，硬盤，內存等等，還有應用業務層次的監控，數量非常多的時候，出現錯誤的概率也會變大，並且有些監控的時效性要求比較高，有些達到秒級別；在大量的數據流中需要過濾異常的數據，有時候也對數據會進行上下文相關的複雜計算，進而決定是否需要告警。因此監控平臺的性能、吞吐量、已經可用性就比較重要，需要規劃統一的一體化的監控平臺對系統進行各個層次的監控。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"平臺的數據分類","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"應用業務級別：應用事件、業務日誌、審計日誌、請求日誌、異常、請求業務metrics、性能度量","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"系統級別：CPU、內存、網絡、IO","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"時效性要求","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"閥值，告警：","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"實時計算：","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"近實時分鐘計算","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"按小時、天的離線分析","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"實時查詢","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"節點中Agent代理可以接收日誌、應用的事件以及通過探針的方式採集數據，agent採集數據的一個原則是和業務應用的流程是異步隔離的，不影響交易流程。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據統一通過collector集羣進行收集，按照數據的不同類型分發到不同的計算集羣進行處理；","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有些數據時效性不是那麼高，比如按小時進行統計，放入hadoop集羣；","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有些數據是請求流轉的跟蹤數據，需要可以查詢的，那麼就可以放入solr集羣\\ES集羣進行索引；","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有些數據需要進行實時計算的進而告警的，需要放到storm集羣中進行處理。","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據經過計算集羣處理後，結果存儲到Mysql或者HBase中。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"監控的web應用可以把監控的實時結果推送到瀏覽器中，也可以提供API供結果的展現和搜索。","attrs":{}}]}]}

發表評論

所有評論

還沒有人評論，想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.

相關文章

京東廣告研發——效率爲王：廣告統一檢索平臺實踐

1、系統概述實踐證明，將互聯網流量變現的在線廣告是互聯網最成功的商業模式，而電商場景是在線廣告的核心場景。京東服務中國數億的用戶和大量的商家，商品池海量。平臺在兼顧用戶體驗、平臺、廣告主收益的前提推送商品具有挑戰性。京東廣告檢索平臺

2024-04-25 23:17:47

Sealos 雲主機正式上線，便宜，便宜，便宜！

我們基於 Sealos 雲開發的能力，僅用三天時間就上線 Sealos 的雲主機能力，現在不太懂容器的同學也可以在 Sealos 上開心的使用虛擬機了，本文先說 Sealos 雲主機的優勢，再聊聊我們是怎麼這麼快實現上線的，以及爲什麼我們要

2024-04-26 21:14:40

從零開始學架構V2-架構設計流程-2

一、架構設計流程架構的設計的是爲了降低整體的複雜性，那麼架構設計的第一步就是熟悉業務，識別其中的核心訴求，僅考慮技術的話就是識別複雜度。 1.1 識別複雜度架構的複雜度主要來源於第一節中介紹的“高性能”“高可用”“可擴展”等幾個方面，實

2024-04-25 23:56:26

從零開始學架構V2-初識架構設計-1

一、架構設計的主要目的爲了解決軟件系統複雜度帶來的問題二、複雜性來源軟件的架構設計是一個非常複雜的過程；基於業務&技術現狀、公司成本、團隊規模、團隊技術能力、近三年業務發展規模預測、技術發展趨勢等條件篩選出合適的技術、編寫多種架構設計

2024-04-25 23:56:25

三十分鐘入門基礎Go（Java小子版）

前言 Go語言定義 Go（又稱 Golang）是 Google 的 Robert Griesemer，Rob Pike 及 Ken Thompson 開發的一種靜態、強類型、編譯型語言。Go 語言語法與 C 相近，但功能上有：內存安

2024-04-25 23:17:43

Stable Diffusion中的embedding

Stable Diffusion中的embedding 嵌入，也稱爲文本反轉，是在 Stable Diffusion 中控制圖像樣式的另一種方法。在這篇文章中，我們將學習什麼是嵌入，在哪裏可以找到它們，以及如何使用它們。什麼是嵌入embe

2024-04-25 21:31:13

「實戰應用」如何用圖表控件LightningChart創建2D氣泡圖

LightningChartJS是Web上性能特高的圖表庫，具有出色的執行性能 - 使用高數據速率同時監控數十個數據源。 GPU加速和WebGL渲染確保您的設備的圖形處理器得到有效利用，從而實現高刷新率和流暢的動畫，常用於貿易，工程，航空航

2024-04-25 11:36:06

詳解數倉的向量化執行引擎

本文分享自華爲雲社區《GaussDB(DWS)向量化執行引擎詳解》，作者： yd_212508532。前言適用版本：【基線功能】傳統的行執行引擎大多采用一次一元組的執行模式，這樣在執行過程中CPU大部分時間並沒有用來處理數據，更

2024-04-25 10:33:17

百度安全多篇議題入選Blackhat Asia以硬技術發現“芯”問題

Blackhat Asia 2024於4月中旬在新加坡隆重舉行。此次大會聚集了業界最傑出的信息安全專業人士和研究者，爲參會人員提供了安全領域最新的研究成果和發展趨勢。在本次大會上，百度安全共有三篇技術議題被大會收錄，主要圍繞自動駕駛控制器安

2024-04-25 09:33:19

前端面試題 - 說一下原型和原型鏈？

前端面試題 - 說一下原型和原型鏈？ JavaScript 中，萬物皆對象，對象分爲普通對象和函數對象。所有的函數都是函數對象（typeof f === 'function'），其他都是普通對象（typeof o === 'object'

2024-04-24 23:51:10

前端面試題 - JS的垃圾回收機制？

前端面試題 - JS的垃圾回收機制？有兩種垃圾回收策略：標記清除：標記階段即爲所有活動對象做上標記，清除階段則把沒有標記（也就是非活動對象）銷燬。引用計數：它把對象是否不再需要簡化定義爲對象有沒有其他對象引用到它。如果沒有引用指向該

2024-04-24 23:51:03

數據結構筆記淺記（十三）哈希表

「哈希表 hash table」，又稱「散列表」，它通過建立鍵 key 與值 value 之間的映射，實現高效的元素查詢。具體而言，我們向哈希表中輸入一個鍵 key ，則可以在 𝑂(1) 時間內獲取對應的值 value 。從本質上看，哈

2024-04-24 23:39:16

一則 TCP 緩存超負荷導致的 MySQL 連接中斷的案例分析

除了 MySQL 本身之外，如何分析定位其他因素的可能性？作者：龔唐傑，愛可生 DBA 團隊成員，主要負責 MySQL 技術支持，擅長 MySQL、PG、國產數據庫。愛可生開源社區出品，原創內容未經授權不得隨意使用，轉載請聯繫小編並註

2024-04-24 23:20:53

離開工位老是忘記鎖屏？試着讓電腦自動完成這事吧！

1.場景說明公司要求離開工位要立刻鎖定電腦屏幕防止信息泄露，但無論是使用鎖屏快捷鍵還是設置觸發角，總感覺不得勁。想想汽車現在基本都是自動鎖車了，電腦它就不能自己鎖屏嗎？於是抽空蒐羅了一些自動化的解決方案，並按照Win和Mac進行分類。

2024-04-24 23:17:17

RocketMQ 之 IoT 消息解析：物聯網需要什麼樣的消息技術?

前言：從初代開源消息隊列崛起，到 PC 互聯網、移動互聯網爆發式發展，再到如今 IoT、雲計算、雲原生引領了新的技術趨勢，消息中間件的發展已經走過了 30 多個年頭。目前，消息中間件在國內許多行業的關鍵應用中扮演着至關重要的角色。隨着數

2024-04-24 23:40:04

24小時熱門文章

最新文章

最新評論文章