Zhangmen Education's In-House APM in Practice

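{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Why We Built Our Own APM System at Zhangmen Education","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For data analysis, Zhangmen Education's back-end services use the open-source Apache SkyWalking system. SkyWalking provides a very convenient SDK that meets our needs in many scenarios, but it still cannot fully cover many of the requirements of our customized front-end business scenarios. For that reason, our front-end team needs an APM system of its own.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Where the Tianyan (Sky Eye) System Is Used Today","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Tianyan system mainly records information about external consumer-facing (C-end) users. Zhangmen Education currently has 400+ front-end projects, and 100+ applications have been integrated with Tianyan, close to 30% of all projects. Coverage is mainly Web, H5, and App scenarios.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Module Structure of the Tianyan System","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/16/16f165b7c882a117038252229dcfe93d.jpeg","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Probe: Data collection and reporting is the starting point of an APM system. The probe collects data inside the client program and sends it to the collector on our servers. The hardest part of designing the probe is deciding which data we need and how to obtain it, for example the 95th/99th percentile lines that are very closely tied to user experience.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Collector and storage: In themselves, the collector and the storage layer are just simple applications. But given the variety of topic types across our data sources, the huge log volume, and the need to keep the system stable and reliable, they place much higher technical demands on us.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data visualization UI: The UI is another of our core products. Common metrics such as PV and UV are exposed at this layer, which makes these key data available to the business.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3}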
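,"content":[{"type":"text","text":"Capabilities the Tianyan System Enables","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Exception alerting: Compared with back-end applications, the notion of alerting on exceptions is less established on the front end. A back-end exception such as redis-timeout is fatal, and the front end has few comparable scenarios. But scenarios that severely degrade the user experience now matter more and more to an internet company, which requires us to find and provide a way for front-end teams to alert on these key metrics.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Ticket tracing: C-end users often report faults to us. In the past we could only offer the back-end API call chain to analyze the problem. But if the problem lies in the user's App itself, such as stuttering, we need the user's device and network conditions to analyze it.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Business metric analysis: For a front-end application, rendering and interacting with a page can be broken down into many small steps. When a new page opens, for example, which steps are involved, and how does each one perform? If this data is recorded and analyzed in a targeted way, the front-end team can optimize accordingly.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Data Types a Front-End APM Focuses On","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/08/087f9fcbc35e93fbcac2e7c35510913f.jpeg","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Our APM system currently incorporates many data types specific to Zhangmen Education's customized scenarios. These data types may not suit every company; that depends on your company's specific business scenarios. In Zhangmen's technology department, much of what we report is tightly coupled to specific products and projects.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"General-purpose data types include PV, UV, device information, traffic information, system information, App foreground/background liveness information, and so on. In addition, H5 and App differ considerably in how data is collected, and they also report it differently.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Issues in Data Collection and the Timing of Data Reporting","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","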
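attrs":{"src":"https://static001.geekbang.org/infoq/d9/d907ee761027cfdbad8ee5118a6b6f70.jpeg","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first issue is the accuracy of the data source. Compared with the back end, front-end data collection is subject to many more influencing factors. A typical back-end access timeout, for example, can be recorded quickly at the moment it happens, whereas the front end has to deal with delay. Front-end collection also loses data in many situations. These factors occur with very high probability, which makes front-end data collection considerably more challenging.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The second issue is the timing of data reporting. In the C-end user's environment, business interaction and the reporting of collected data share the same bandwidth. We must give priority to the business and avoid affecting the user experience as far as possible, so reporting needs a degree of scheduling and control: for example, lengthening or shortening the reporting interval automatically, so that the system discovers for itself when it is a good time to report. We also need some ability to defer reporting and to judge what volume of data is appropriate to send, rather than reporting at a fixed interval and in a fixed quantity.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Future Outlook","attrs":{}}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"We hope to further improve the data reporting success rate. Our current success rate is around 98%, and we hope to raise it above 99%.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"The Tianyan system was originally developed to fill gaps in our company's customized scenarios, so we do not want to build it behind closed doors. Going forward, we will talk with the business teams and take on more technical and business requirements, so that the system and the company ultimately empower each other.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Our number of topics and our log volume are gradually growing. Searching this much data for a particular item still leaves a lot of room for performance improvement, and we may evaluate other solutions to address this in the future.","attrs":{}}]}]}]}]}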
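
The reporting-timing idea described above (an interval that widens or narrows automatically, plus deferred, volume-aware sending) can be illustrated with a small sketch. This is not the Tianyan implementation: the class name `AdaptiveReporter`, the option names, and the doubling/halving policy are all assumptions made for illustration. The sketch only decides when to send; the actual network call is left to the caller.

```typescript
// Hypothetical sketch: every name and threshold here is illustrative,
// not part of the Tianyan system's real API.
interface SchedulerOptions {
  minIntervalMs: number; // fastest allowed reporting interval
  maxIntervalMs: number; // slowest allowed reporting interval
  maxBatchSize: number;  // queue size that forces a flush
}

interface TelemetryEvent {
  name: string;
  timestamp: number;
}

class AdaptiveReporter {
  private opts: SchedulerOptions;
  private queue: TelemetryEvent[] = [];
  private intervalMs: number;
  private lastFlushAt = 0;

  constructor(opts: SchedulerOptions) {
    this.opts = opts;
    this.intervalMs = opts.minIntervalMs;
  }

  enqueue(event: TelemetryEvent): void {
    this.queue.push(event);
  }

  // Assumed policy: double the interval while business traffic is active
  // (give it the bandwidth), halve it when idle, and clamp to [min, max].
  adjust(businessBusy: boolean): number {
    const next = businessBusy ? this.intervalMs * 2 : this.intervalMs / 2;
    this.intervalMs = Math.min(
      this.opts.maxIntervalMs,
      Math.max(this.opts.minIntervalMs, next),
    );
    return this.intervalMs;
  }

  // Flush when the batch is full (to bound memory), or when the current
  // interval has elapsed and there is something to send.
  shouldFlush(now: number): boolean {
    if (this.queue.length === 0) return false;
    if (this.queue.length >= this.opts.maxBatchSize) return true;
    return now - this.lastFlushAt >= this.intervalMs;
  }

  // Hands the pending batch to the caller, who performs the upload.
  flush(now: number): TelemetryEvent[] {
    const batch = this.queue;
    this.queue = [];
    this.lastFlushAt = now;
    return batch;
  }
}
```

In a browser, the caller might pass the returned batch to something like `navigator.sendBeacon`, and derive the `businessBusy` signal from in-flight business requests; both choices are assumptions outside what the talk describes.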