Frontend Exception Monitoring and Governance for Hosted Pages in Practice

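{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Introduction","attrs":{}},{"type":"text","text":": As a business iterates quickly, monitoring and optimization are no longer limited to user behavior and performance. Frontend exception monitoring reflects the real experience on the user's side more directly. Fine-grained monitoring lets a team discover problems proactively and in time, which reduces losses, and targeted analysis and governance can even bring business gains. Drawing on the ad hosting team's experience with exception monitoring and governance, this article summarizes our practice across exception collection, alerting, investigation and analysis, and governance and optimization.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Full text: 8,455 characters; estimated reading time: 19 minutes.","attrs":{}}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"I. Preface","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Behavior, performance, and exception tracking are familiar topics in frontend engineering. In practice, many teams adopt them in the order behavior > performance > exception. This is easy to understand: behavior statistics show a team's gains more directly in the short term, and some tracking points are business requirements in their own right, such as PV and UV statistics after a feature launches.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For online services, backend exception monitoring is usually mandatory, and most service exceptions that are discovered proactively are found on the backend. What role can frontend exception monitoring play? Is the investment worthwhile from a manager's point of view? How should exceptions be monitored so that problems are found sooner and losses are contained? Faced with these questions, many teams' frontend exception monitoring efforts end before they begin.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Our team has gathered some thoughts and lessons from practice, and we hope they are of some help to readers.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.1 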
Business background","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We are Baidu's ad hosting business, and we build sites for customers in many industries. The sites run on several carriers, including mobile and desktop web sites, mini programs, and HN (a React Native-like solution in the Baidu App), and they receive a large volume of traffic every day.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For users, we must ensure a smooth reading and interaction experience.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For advertisers, we must provide a high level of quality assurance.","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Through frontend exception monitoring and governance, the business team has gained many benefits, such as discovering problems early, containing losses in time, and improving ad performance.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.1.1 
Problems to solve","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As mentioned at the start of the article, for a long stretch of the business's growth the team focused on improving backend monitoring and alerting. Once service stability had been brought up to a certain standard, however, we found that some online problems were still hard to detect, for example:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"The whole page, or parts of it, renders abnormally, which hurts the user experience and even ad conversion and cost. Possible causes include:","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Static resource loading failures, including script resources and image assets","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"API access failures","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"JS execution exceptions","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compared with backend exception monitoring, resource loading and JS execution exceptions are additional scenarios that only frontend exception monitoring covers. End-to-end API stability is also closer to what users actually perceive, and it better reveals how the network affects stability.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Discovering problems early in low-traffic scenarios","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number
":0,"align":null,"origin":null},"content":[{"type":"text","text":"Product releases often need to be validated with low-traffic rollouts and A/B tests, and some problems are triggered only in specific scenarios. Because traffic is limited, such problems are hard to spot from fluctuations in service data, and they tend to be discovered only after a wider rollout causes greater harm or after customers complain.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Frontend exception monitoring helps us handle these scenarios well. The sections below describe our main work and experience in four stages: ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"exception collection","attrs":{}},{"type":"text","text":", ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"improving monitoring and alerting","attrs":{}},{"type":"text","text":", ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"exception investigation","attrs":{}},{"type":"text","text":", and ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"exception governance","attrs":{}},{"type":"text","text":".","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Note: this article discusses the topic mainly from the perspective of business applications and does not go into detail about the general-purpose tracking ingestion service, data processing, or display platform; fortunately, the team already has dedicated people and a platform for those. Based on the needs of our business scenarios, we worked with the platform to design and support ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"business exception monitoring","attrs":{}},{"type":"text","text":" in addition to ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"general monitoring","attrs":{}},{"type":"text","text":", which is described later.","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"II. Exception Collection","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first step is to send exceptions to the collection service as tracking points. This covers the general approaches described in many articles, such as capturing error events through listeners on window, 
as well as business exceptions that are more hidden but have a large impact on the business.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.1 General exception collection","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"General exception collection is a non-intrusive approach that requires no explicit action from business developers. When an exception occurs in the system, errors are collected through event bubbling, event capturing, or hook functions provided by frameworks.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Collecting exceptions on a page mainly involves two kinds of scenarios:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Resource loading exceptions caused by network requests, such as an image or a script link failing to load","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Runtime exceptions, most of which come from code compatibility issues or edge cases that were not considered","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"For resource loading exceptions, there are two monitoring approaches in practice","attrs":{}},{"type":"text","text":":","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Use the resource's own onerror event to report the error when the resource fails to load. This usually relies on a build tool to add onerror logic to the relevant resources at build time, for example using script-ext-html-webpack-plugin to add an onerror 
attribute to every script tag.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Use:","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"window.addEventListener('error', fn, true)\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For runtime exceptions, we usually monitor in the following way:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Add the following handler at the top level of the page:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"window.onerror or window.addEventListener('error', fn)\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This approach has a limitation: it cannot capture exceptions from promise rejections that are not caught. In practice, an additional event listener is therefore usually added to capture unhandled promise rejections.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"window.addEventListener('unhandledrejection', 
fn)\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For runtime exceptions, some frontend frameworks also provide configuration hooks that simplify day-to-day development.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"React","attrs":{}},{"type":"text","text":" ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"framework:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Starting with React 16, the framework supports componentDidCatch for capturing exceptions thrown during render. Note its limitations when using it; see error boundary","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(https://reactjs.org/docs/error-boundaries.html)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Error boundaries do not catch errors for:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Event handlers","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Asynchronous code (e.g. 
","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"setTimeout","attrs":{}}],"attrs":{}},{"type":"text","text":" or ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"requestAnimationFrame","attrs":{}}],"attrs":{}},{"type":"text","text":" callbacks)","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Server side rendering","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Errors thrown in the error boundary itself (rather than its children)","attrs":{}}]}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Vue","attrs":{}},{"type":"text","text":" ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"framework:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Vue provides a similar global error configuration. The method below assigns a handler for uncaught errors during component rendering and watchers.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"Vue.config.errorHandler = (err, vm, info) => 
{}\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Since 2.2.0, this hook also captures errors in component lifecycle hooks. Likewise, when this hook is ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"undefined","attrs":{}}],"attrs":{}},{"type":"text","text":", captured errors are logged with ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"console.error","attrs":{}}],"attrs":{}},{"type":"text","text":" instead of crashing the app.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Since 2.4.0, this hook also captures errors thrown inside Vue custom event handlers.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Since 2.6.0, this hook also captures errors thrown inside ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"v-on","attrs":{}}],"attrs":{}},{"type":"text","text":" DOM listeners. In addition, if any of the covered hooks or handlers returns a Promise chain (e.g. async functions), the error from that Promise chain will also be handled.","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Note:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Among captured exceptions, we often see ones whose message is \"Script error\". This happens when a site requests and executes a cross-origin script: if that script throws, the global error listener captures an exception whose message is only \"Script error\". Because of browser security restrictions, the specific error details are not exposed, which makes troubleshooting very difficult. Today, most projects deploy their bundled resource files separately to a CDN, so the resources are referenced from a CDN 
domain that differs from the domain the page runs on.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A common solution is to use the build tool to add the following to script tags:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"the crossorigin (https://developer.mozilla.org/zh-CN/docs/Web/HTML/Attributes/crossorigin) attribute, and to add access-control-allow-origin: yourorigin.com to the resource's response headers. With both in place, when JS loaded from a CDN address throws at runtime, the global error listener can obtain the full error details.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule","attrs":{}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.2 Business exception collection","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.2.1 
How to define business exceptions","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On top of general exception collection, the system adds tracking for business-defined exceptions. This is an explicit-instrumentation approach, as opposed to the automatic approach that application developers are unaware of. Developers explicitly emit data points in the program, usually together with some runtime data from that moment.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Why add this approach? The exception data collected by the standard approach carries limited information, mostly stack traces, and some scenarios are not served well:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Exceptions that cannot be captured directly at the low level","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Even when no red error appears in the console, some problems still need attention from a business perspective.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, in the app download business, customers bind a channel download package to a page and then use that page for ad delivery. Sometimes a customer mistakenly delivers an Android download package to iOS 
users. Nothing goes wrong during page rendering in this case, but it clearly hurts ad conversion, so from a business perspective it needs to be discovered and resolved.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Some stack traces carry insufficient information","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We need some runtime data to support later localization and analysis of a problem, such as the id of the account being accessed and certain key business data in the application state at that moment.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, we once found a large number of \"onAndroidBack is not defined\" exceptions. Using the product line id carried in the exception data, we quickly identified the product line that produced them and, after talking with its developers, located the faulty code and added a compatibility fix.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"High analysis cost and slow alerting","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Anyone who has tried it knows that analyzing and aggregating exception data is not easy, especially when the goal is to pinpoint a precise problem in very generic stack traces and then alert, investigate, and contain losses in time. Business exceptions let us locate the root cause more directly when an exception occurs. The aim is not to cut corners in the analysis stage but to keep the data service's computation simple and direct, so that large-scale data processing is faster and manual analysis and repair after an alert are more efficient.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.3 
Exception collection protocol","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To support both ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"general exceptions","attrs":{}},{"type":"text","text":" and ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"business-defined exceptions","attrs":{}},{"type":"text","text":", we need a unified data protocol for transport and storage.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Design of the transport and storage protocol:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The transport protocol follows two principles: the top-level schema stays stable, and business information is extensible. Another key consideration is that downstream data processing modules must be able to integrate quickly. We therefore defined the overall data transport format as follows:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first-level keys keep the data structure supported by the general data processing module; the most important fields are listed below. The meta field holds data fields common to the ad hosting business. The extra field inside meta is opened up further: through the API exposed by the tracking SDK, developers can upload additional business data to help with later investigation.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"{\n  exception: // stores the stack trace and related error details\n  request: // stores information about the current page\n  meta: {\n    xxx: // business fields\n    extra: {\n      // fields that developers can extend freely\n    }\n  
}}\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This design supports flexible business extension while still meeting the requirements of stability and completeness.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"A natural question here: why can't the fields in extra be flattened into meta?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The design is closely tied to how indexes are built on the underlying table and to the complexity of downstream data processing. The business fields in meta are common to all hosted pages. To make later queries easier, they are stored as columns in the database and must be enumerated in advance, so we reviewed the whole business to define these first-level keys. The fields that ended up as first-level keys fall into two groups:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"id fields used for attribution analysis in the business","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Fields that assist investigation, typified by the extra field","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data stored in the first-level keys of meta is costly to upgrade, because upstream and downstream must both understand its business meaning and upgrade together. To let business teams extend the information flexibly, we add an extra field to meta and store it in the database as a string. For storage we use BaikalDB (https://github.com/baidu/BaikalDB), 
developed in-house at Baidu, which supports real-time storage and retrieval of this kind of structured information well.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To report both general and custom exceptions, we capture exceptions in layers.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d1/d124f711e5e429b4315f8e6c05784ffa.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"General exceptions","attrs":{}},{"type":"text","text":" are reported as follows:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"window.addEventListener('error', error => {\n  logSdk.addWindowErrorLog(error);\n}, true);\n\n// Use Vue's error hook to report runtime JS exceptions\nVue.config.errorHandler = (err, vm, info) => {\n  logSdk.addCustomErrorLog({\n    errorKey: xx, // placeholder: exceptions collected by the framework use a fixed default errorKey\n    error: err,\n    userExtra: {\n      message: info\n    }\n  });\n};\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Business-defined exceptions","attrs":{}},{"type":"text","text":" are reported as follows:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"try {\n  // xxx business logic\n} catch (e) {\n  logSdk.addCustomErrorLog({\n    errorKey: 'xxx', // the specific business type\n    error: e,\n    userExtra: {\n      // business-defined extension fields; maps to extra in the transport protocol\n    }\n  
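})}\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When the page opens, the tracking SDK is instantiated with the business meta information. When an exception occurs and the business code catches it, the business code builds the error parameters itself and calls the API to report it. Exceptions not caught by business code are reported through the framework's unified exception handling. In addition, a global error event handler is registered to capture resource loading exceptions and any other remaining exceptions.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This layered model both meets business teams' need for custom exceptions and ensures that exceptions they do not catch are still collected and reported uniformly.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Exception logs must be preprocessed before they are stored. We use the company's internal stream computing platform to run real-time ETL on each log record, and finally store the data in meta, together with some data obtained at the nginx layer, in the database in real time.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After weighing everything from the generality of the transport protocol to the efficiency of storage and queries, we ended up with a single table that stores online exception logs. It has a very large number of columns, making it a large wide table, and it provides the data behind the aggregation and alerting described later.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"III. Improving Monitoring and Alerting","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To turn a large volume of data into accurate and efficient alerts, we follow the flow in the figure below: create monitoring items from the tracking metadata, then set alert policies on the statistics of those monitoring items.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b2/b246fb1775d24e5bf708dc0bb8b633a6.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"orig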
in":null}},{"type":"horizontalrule","attrs":{}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"3.1 Defining monitoring items","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As described above, the monitoring platform collects exceptions into one large wide table. Conditional analysis over multiple aggregated items on this table covers the vast majority of monitoring needs.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Monitoring item","attrs":{}},{"type":"text","text":": a filter on one column, for example the URL contains a certain query or the business type falls within a certain range. The platform supports the filter conditions shown in the figure below, including regular expressions.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f3/f3c4288a80299fc6f1d7b6eafd001e72.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Monitoring aggregation","attrs":{}},{"type":"text","text":": the intersection of multiple monitoring items, for example (business line === XXX) && (request status code === 500).","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"3.2 
Setting alert policies","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"An alert policy has three key elements: the aggregation period, the alert recipient group, and the trigger mechanism.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Aggregation period","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A monitoring item is a statistical rule, and the aggregation period is the window over which the rule is counted. Set the period according to data volume, importance, and similar factors. For the highest-priority, fluctuation-sensitive exceptions related to ad conversion, for example, we set a 30-second near-real-time alert; for other items the window can be widened so that highly volatile monitoring items do not raise frequent false alarms.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fc/fc185d1003b6b474033ac1fd77415011.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Trigger mechanism","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Alerts can be triggered by a threshold or by fluctuation. For exception counts that are fairly stable, for example business metrics that stay roughly flat from day to day, a threshold works. For exception counts that fluctuate, compare against yesterday, last week, or two weeks ago; for example, user traffic follows a regular pattern within a day, and the exception count rises and falls with it.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null
,"origin":null},"content":[{"type":"text","text":"如下圖波動較大,沒有明顯的時間規律,適合用閾值:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/64/6427f9f35016769f3387e1b8f6529e1b.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如下圖異常數量有時間規律,可以設定","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"波動報警","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d8/d8411b24a652ad3a8e7bcd5dd260cb8a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在實踐過程中,我們常常雙管齊下,平衡報警的準招率是一個很不簡單的事情。我們也會一邊觀察、治理,一邊調參。更多的一些挑戰和方案,會在後面提到。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"報警接收組","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過郵件、即時通信消息、短信等方式保證報警觸達。最重要的經驗是,報警接收人永遠不要","attrs":{}},{"ty
pe":"text","marks":[{"type":"strong","attrs":{}}],"text":"單點依賴","attrs":{}},{"type":"text","text":" !","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"3.3 挑戰","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"前面提到,異常監控報警的準確率與召回率(準招)很重要,也很有挑戰。一般 Server 服務會有網關,運行環境穩定,而前端代碼運行的環境更加不可控,這對完善監控是很大的挑戰。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"挑戰1: 如何建立完備的監控?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"異常都上報之後,必須保證每類異常都能夠被感知。正常來說,按照異常的類型(資源加載異常、API 異常、JS 執行異常)來分類,針對每一類異常建立監控,即可滿足完備性的要求。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但是在實踐中發現,這種設置不適合託管頁。文章開頭提到,託管頁的業務場景覆蓋不同的端。不同端之間的流量差異巨大,如果不同端複用同一個異常監控項,流量小的端產生的錯誤很容易被淹沒在整體異常中。因此,從託管業務出發,將異常劃分爲兩個維度:異常類型和異常所在的端。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c9/c9b6a6a0ef23851aebcfeec434103215.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"將這兩個維度組合後建立的監控項,既可以滿足完備性的要求,又可以及時地發現各個端上產生的問題。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"numb
er":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"挑戰2: 如何提高報警的精準度?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"託管頁的報警是基於各種條件進行實時的聚合,然後與預設的閾值進行對比來判斷是否觸發了報警。理論上來講,報警的準確度取決於業務方,只要聚合的條件足夠精準,報警就足夠精準。但這需要在成本與實踐之間反覆試驗:在報警觸發之前,很難預知什麼樣的條件能夠排除掉無效報警。因此,在業務的不斷實踐探索中,沉澱了一些通用的異常聚合條件以提升報警的精準性:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"排除爬蟲流量(通過ua)","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"只看商業流量(通過商業投放參數判斷)","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"逐步完善的異常黑名單(已知的無法解決的異常,比如外部注入導致的 \"Script error\" 等異常)","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"舉例來說:","attrs":{}},{"type":"text","text":"一開始設置來自某個業務線的 JS 異常報警時,聚合條件如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"業務線 = xxx && 錯誤類型 = 
js異常\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"優化後的報警聚合條件爲:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"業務線 = xxx\n&& 錯誤類型 = js異常\n&& 商業流量標誌 != '' // 排除掉非商業流量\n&& ua not like '爬蟲ua' // 排除掉爬蟲流量\n&& error_message not like 'Script error' // 排除掉黑名單中的異常\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲避免每個報警項都重複的設置相同的聚合條件,把一些通用的數據在頂層進行過濾,在提升報警精準性的同時減少了每個業務方的配置工作。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d9/d9dc81bedbf61155f39ca08fffa014ac.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"挑戰3:帶有明顯週期性的異常監控項如何設置監控的同比和環比?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"針對有明顯週期性的異常監控在初期設置的時候,一般都會比較謹慎。設置的過小,產生的","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"無效報警","attrs":{}},{"type":"text","text":"會很多。設置的過大,有報警時無法及時觸發。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[
{"type":"text","text":"這種情況在實踐中發現:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"同環比的設置不應該在一開始設置的,應該觀察一段時間再設置。比如,同比昨天的數據,這個閾值的設置應該在至少積累2天數據後再設置,以實際每天數據的波動情況來進行合理的閾值百分比設置。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"同比環比的設置不應一成不變的,應該隔一段時間更新一次。隨着業務的發展,線上的異常請求是不斷變化的,如果發現一段時間內的報警變多了,而排查後發現大部分是無效的報警,這個時候,你就需要重新考慮你的報警設置的是否合理了。","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"挑戰4:如何監控報警後的問題跟進情況?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"託管頁的異常治理,不僅是一個基礎能力的建設。還希望形成異常問題發現、異常跟進、異常解決的工程化能力閉環。異常的跟進打通公司內部任務管理平臺,針對每一個報警創建一個任務卡片,由具體的異常負責人進行跟進。當問題解決後,可以在任務卡片上進行具體信息的錄入,以此來實現每個異常都有專人跟進處理的目標。爲提升問題的跟進率,我們還會基於任務卡片的信息進行例行化的統計,針對卡片停留時長、卡片個數等進行分析計算並打通內部即時通訊工具對卡片統計信息進行例行化推送。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/16/16ec51c3c307fc119cac75b2129b60eb.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text",
"text":"===","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"四、異常排查","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在收到一個異常報警後,快速定位到報警產生的原因是一個非常常見的業務場景。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從實踐中,我們總結出幾點能夠快速提升異常排查效率的方式:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"善用聚合","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"很多時候,線上的異常數據是在某個區間內來回波動。當突然出現一個突刺時,通過聚合可以快速的查詢到問題所在。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a4/a4cf6d34e0e9de51304e53b334a5a22e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b0/b003d125c6450a1e856b2e83f793ea94.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0
,"align":null,"origin":null},"content":[{"type":"text","text":"通過這些聚合條件,可以快速的發現這些突發異常的相似點。常用的聚合選項可以有 ip、ua、設備 id、URL等。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"例如:一次線上的資源加載失敗報警中,發現異常日誌中的頁面URL、資源失敗URL、投放參數均不相同。排除了個別廣告頁加大投放流量的可能,也排除了機器腳本刷頁面的可能。最後通過 ip 聚合後,發現異常都集中在某個地區,和 CDN 同學反饋後,確認這個地區確實存在網絡故障,及時推動處理,避免了更大範圍的損失。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"完善基礎能力的支持","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"目前線上的 JS 都是壓縮之後的。一旦產生了異常信息,在異常堆棧中存儲的也是壓縮之後的信息,不便於問題的排查。因此,我們協同下游錯誤分析平臺,上傳託管頁相關的 sourcemap 資源。這樣在產生的 JS 執行異常中的報錯信息就可以通過 sourcemap 
文件,直接定位到原始錯誤文件位置,方便開發者快速的定位到發生問題的代碼位置,提升問題排查效率。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"五、異常治理","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面介紹的異常收集、異常報警以及異常的排查,偏重於一種被動的場景。觸發了線上報警,纔會介入問題的排查。但其實除了這些偶爾的突刺帶來的報警問題跟進,我們也主動出擊,針對線上現存的一些異常,探究一些通用的方案,以主動優化線上的異常場景,提升託管頁線上的穩定性。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先,爲了統一治理目標,協同各方一起處理線上異常,我們從以下幾步出發進行線上異常治理目標的設定:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.明確異常錯誤類型","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 JS 執行異常,API 異常,資源加載異常的基礎上,再次進行細分。最終落地4個異常類型,分別爲:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"JS 執行異常","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"API 
異常","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖片資源加載異常","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SCRIPT 資源加載異常","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.清洗數據","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於託管落地頁運行的場景不一,爲排除一些測試數據或者網絡爬蟲數據的影響,在數據的篩選時只看來自商業流量的錯誤。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"線上已知的一些由於端上注入導致的一些不影響前端穩定性的異常錯誤信息,建立錯誤信息的黑名單,通過具體的錯誤信息,排除此類錯誤的干擾。","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"3.建立合適的數據標準","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲抹平不同產品線之間的流量差異,我們提出了","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"單次廣告點擊產生的異常數","attrs":{}},{"type":"text","text":"的概念。將異常數量的絕對值變成了一個以廣告流量爲基準的相對值,以此來衡量不同流量產品線下的異常量。這樣歸一化之後,排除了廣告流量對異常數據量的影響。","attrs":{}}]},{"typ
e":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"針對和","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"網絡情況","attrs":{}},{"type":"text","text":"相關指標比如:單次廣告點擊圖片/ SCRIPT 加載失敗數以及 API 請求失敗數","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"建立數據標準的流程是:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"給出基準時間段","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"計算出基準時間範圍內每天不同產品線的單次廣告點擊圖片/ SCRIPT 加載失敗數以及 API 
請求失敗數,並給出80分位值","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"以基準時間內最小的一個80分位值作爲優化的目標。(可基於業務自行調整)","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"核心思路","attrs":{}},{"type":"text","text":":此類異常都是由於網絡原因導致的,不同產品線之間的值應該趨於一致。基於此,取80分位值作爲優化的基準線,沒有達到這個基準線的產品線除了網絡因素外一定存在其他的問題,可以推動這些產品線向這個統一的標準對齊。(爲了避免某一天的數據過於極端,可以考慮取平均值或者去除突刺數據後取最小值來得到最後的目標值)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"針對和運行時關係比較密切的指標:單次廣告點擊 JS 失敗數","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"建立數據標準的流程是:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"給出基準時間段","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"按照 errorKey,errorMessage 進行聚合並按照從大到小排序。在結果中找出是由於託管頁自身的 JS 執行時導致的異常。這些異常是預期可以被優化到0的異常。排除掉這些得到的最終值再除以落地頁的流量得到一天的單次廣告點擊 JS 
執行異常的數據。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"以基準時間內最小的一個值作爲優化的目標。(可基於業務自行調整)","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"建立優化目標後,便可以針對性的優化。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"針對由於網絡原因導致的資源加載異常核心採取以下思路。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"改用 CDN 鏈接或減少資源大小可降低第一次加載失敗率","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"圖片使用 CDN 鏈接","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"圖片進行合理壓縮或使用更高壓縮比的圖片格式(如webp等)","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"重試可以降低最終資源加載失敗率","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們針對 API 請求異常、SCRIPT 加載異常、圖片加載異常分別從底層出發,建立了相應的重試機制。其中,除 SCRIPT 加載異常的重試業務方無感知外,API 
請求異常以及圖片加載異常,業務方都可以通過傳入相關參數來進行業務表達,以支持不同的業務場景。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"針對 JS 執行時的異常,我們建立了一個完整的處理流程:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"從通用的異常監控中發現業務可優化異常;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"通過細化具體的監控條件,針對該業務異常建立單獨的監控;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"上線優化方案,處理該類異常;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"觀察監控數據下降是否符合預期。","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過以上四步來優化每一類具體的業務異常。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過以上方式,我們設定了合適的目標,並進行了針對性的優化。最終,每個異常指標的數據在針對性治理後均有不同程度的下降,同時,在異常治理時引入線上實驗以衡量降低線上異常數對廣告轉化的影響,實驗結果表明:app 
下載,以及線索的轉化均有所提升。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"六、後記","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"異常治理是一條難但正確的道路。在業務落地實踐中,遇到了很多問題和挑戰,我們完成了從0到1的過程,探索了一種可持續的前端異常監控與治理的方式,但是很多事情還需要深耕,這樣才能不斷的降低託管頁前端的異常數量,提升託管頁線上的穩定性。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"---------- END ----------","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"百度 Geek 說","attrs":{}}]}]}