# New-Generation Cloud-Network Collection and Control: The Collection Architecture

The 5G era is one in which cloud and network converge, spanning cloud, network, and terminal under the model of "converged cloud-network products and an integrated cloud-network operations system". The collection platform is positioned as the technical foundation of China Telecom's new-generation cloud-network operations system (OSS 3.0) and China Mobile's "5+2+N" network-management middle platform (OSS 4.0). It provides collection and control services across the entire network and every specialty. It interfaces with specialty network-management systems (EMS/OMC), individual network elements, DPI data platforms, and more.

![Figure](https://static001.geekbang.org/infoq/22/22af185cae2d6924388284cf0bc825c5.png)

## Challenges

For data collection, both the requirements of the operators and the architectural evolution of the vendors are converging on a **"framework + services/plug-ins"** model. The framework generates and schedules tasks, while the services execute them; the difficulties and load each side faces are quite different.

Generating and scheduling tasks is broadly uniform across specialties. Executing them, by contrast, has to absorb every difference in process, protocol, data format, and output. The hard and critical parts of a collection product therefore lie mainly at the service/plug-in layer, where the challenges are as follows.

### Business variability

- Variation in processing steps: data consumers in different specialties need raw data processed differently. For example, the network-optimization system only needs a given set of wireless performance data parsed and delivered as text files. The performance-analysis system, by contrast, requires the same data to be normalized and then delivered both as text files and to HDFS. Internal processing also differs: in file-based collection, collection and parsing are two separate steps, whereas in network-element command collection they are merged with normalization into a single step for efficiency.
- Variation in collection method: protocols differ by specialty; the mainstream ones include CLI, TL1, SNMP, FTP/SFTP, HTTP, Socket, and Telemetry.
- Variation in parsing: sources deliver raw data as XML, CSV, or JSON files, message streams, command output, and so on, and each must be parsed differently.
- Variation in output: by data category, performance data goes to the file system or HDFS; configuration data must go to both the file system and a database; alarm messages go to a message queue or to files.

Each of these differences has to be implemented per project according to business requirements. This places high demands on the product's extensibility: it must be **agile, highly cohesive, loosely coupled, and highly configurable.**

### Reliability

Collection is the foundation of the upper-layer network-management systems, and the reliability of that foundation largely determines how well those systems work. Lost or interrupted alarm collection, for example, reduces the completeness of fault generation and the accuracy of root-cause analysis in the fault-management system. Lost configuration data keeps device resources from being registered in inventory and brought into the network. Service provisioning then either finds no matching resources or allocates the wrong ones.

### Efficiency

Device alarms are unpredictable. They may surge at any moment, even into an alarm storm in which traffic rises one to two orders of magnitude above normal. Yet the latency requirement for alarm processing cannot be relaxed (it must stay within 3 seconds); otherwise congestion delays end-to-end fault generation. Likewise, latency in collecting and processing configuration data is felt directly by front-end users. Imagine a field engineer who powers up a device and connects it to the network, then has to wait several minutes before the device appears in the system.

### Transparency

A single collection run can take anywhere from 1 second to 2 minutes, and everything in between is pure back-end computation. If we only report information when a task starts and finishes, the black box in between leaves users uneasy: is it still running, or has it crashed? Collection volumes are also huge, so it is impractical to have people check the completion quality of every collection task by hand. We need a simple way to judge how well each collection completed.

### Low cost

Anyone familiar with the operator OSS space knows that operators have steadily reduced their investment per system or project in recent years. Meanwhile, vendors are under pressure to raise profit margins and end-to-end staff efficiency. How can this tension be resolved? **Reducing the difficulty and cost of product delivery and operations is a very important direction.**

## Architecture approach

As noted above, the hard and critical parts of a collection product lie mainly at the service/plug-in layer. Below we briefly walk through the design approach for the service/plug-in architecture.

### 1 container + N components

To meet the demand for high extensibility, we borrowed the architecture of a computer or network-device chassis. The host has at least one motherboard with a standardized architecture, into which users plug cards of various functions and specifications as needed. The motherboard supplies power and data exchange, while the cards implement business functions: graphics, sound, network, storage, and so on.

As analyzed above, collection scenarios are highly diverse. Following the chassis approach, we implement each function used directly during collection as a distinct component. Each component covers only its own scope, with no overlap, which keeps it highly cohesive. We also provide a collection service container. Externally, it receives collection tasks and reports results and heartbeat status. Internally, it schedules the order in which components work and the exchange of sub-task data between them. A component only needs to implement the standard interface to run in the container, so N kinds of components can be orchestrated into far more than N collection flows.

![Figure](https://static001.geekbang.org/infoq/48/482ed685f0bfca92caf51ef0295cb937.png)

In the actual implementation, we refine this container + component pattern further. All components used for a given collection source are packaged into one thread group, and each component is instantiated across multiple threads according to the collection flow template.

Composing components on demand lets the product adapt quickly to a wide range of business scenarios. As with LEGO, users only need to understand what each component does to assemble the collection scenario they need, which makes business extension agile. The scenarios cited earlier under variation in processing steps can be handled this way, as shown in the figure below. The steps differ from one scenario's collection flow to another, and some flows even branch.

![Figure](https://static001.geekbang.org/infoq/e0/e044a062882a8787f1462ae306cf0d98.png)

### Component hierarchy

I once had the chance to discuss collection architecture with an architect from a competing vendor. He said they had built a similar architecture, but it never made it into production. The reason was the sheer number of components: their collection product offered more than 200 of them. During delivery, developers could not get to grips with the existing components and wrote new ones instead, so the count kept growing.

Data collection in every specialty follows roughly the same steps: collect, parse, normalize, distribute, notify. Following object-oriented design, we defined the component inheritance hierarchy strictly. This keeps the number of components under control and reduces complexity in practice.

![Figure](https://static001.geekbang.org/infoq/f8/f885cf61770a453ecfae008f670a25c3.webp)

The root component, AbstractComponent, derives six categories of abstract components, which in turn derive the concrete functional components. Developers working on a requirement only need to decide where each new component sits in the tree and focus on the component itself, not the container. They inherit as much parent functionality as possible to maximize code reuse. This cuts the effort and time needed to build new functional components, addressing both agility and cost in business extension.

### Asynchronous task scheduling between step components

Within the container, components are independent and work in parallel. Like stations on a factory assembly line, each handles only the work at its own station, so work pieces must be passed from station to station. To avoid blocking caused by components waiting on one another, we use asynchronous task scheduling. What passes between collection components falls into two kinds: work in progress or finished output, and task notification messages.

![Figure](https://static001.geekbang.org/infoq/09/0985768cd774adaf1a4924d3dc9ed1a4.png)

For work in progress and finished output, coarse-grained data is kept in local files (backed by high-speed SSDs). Finer-grained data goes to Redis, and very fine-grained data can even be folded into the messages themselves. Messages can be carried by MQ/Kafka, or even by in-memory queues inside the JVM. The in-memory option is paired with task compensation and message re-collection mechanisms to cover rare abnormal interruptions.

This asynchronous scheduling shortens the execution time of each collection task and improves overall collection efficiency. In the large-file collection analysis later on, we build on this approach to upgrade the collection components further.

### Visual flow designer

To make it easier to orchestrate components into collection flows, we provide a visual collection-flow designer. The designer groups registered components by category. Users drag components onto the design canvas to form collection steps and connect the steps to set the order in which components run. They then configure each step's component with extension parameters as the business requires. A reference implementation of the designer is shown below:

![Figure](https://static001.geekbang.org/infoq/e3/e39e483c77e213dc798421df72188407.webp)

In practice, most collection flows are reusable. We pre-design a dozen or so common flow templates for the common scenarios, combined with the interface types each source provides:

- message collection: alarms or performance data pushed by the source;
- file collection: data files already generated by the OMC or network element;
- command collection: invoking OMC or network-element query commands to fetch live metrics.

For special scenarios, such as multi-branch processing or persistence, users copy one of these templates into a new template and make a few simple changes.

### Standardized logging

As noted earlier, a collection task can run for anywhere from 1 second upward, with no fixed upper bound, and a black-box process is alarming for users. The collection service therefore has to emit event logs continuously while it runs. These events must cover everything and also be easy to understand, so that front-line project staff and back-end R&D can communicate cheaply. It should work the way it does with Oracle, where quoting "ORA-00001" is enough for everyone to know how to handle the problem.

We therefore defined a specification of system events and made it a development standard, especially for components, instrumenting the container and components as they are implemented. For exception-level events, different automatic or manual handling tools can be provided according to the event type (error code). Part of the event specification table is shown below for reference.

![Figure](https://static001.geekbang.org/infoq/35/3568901f3cb63e3e98f2ad85392ba6b3.png)

This standardizes operations for front-line project staff and gives the front line and the back end a shared vocabulary. Combined with the WOODY automated operations tool, it addresses both transparency and low-cost operations.

### Exception handling

The collection platform runs silently in the background; its ideal state is that nobody notices it exists. That demands considerable stability and the ability to handle all kinds of exceptions automatically. By source, we divide these exceptions roughly into three categories:

- business-function exceptions;
- software-architecture (non-business) exceptions;
- runtime-environment exceptions (IaaS, PaaS, network, and so on).

![Figure](https://static001.geekbang.org/infoq/a5/a5e1e74a8426657691328a555b901236.webp)

Business-function exception handling has to be designed from business analysis as each component is implemented. The figure above is only a sample from our FTP component. Every other component needs its own analysis and implementation, using the standardized logs to surface information to the front end. This capability grows as components are added. Because our components follow an inheritance-based, object-oriented design, their business exception handling keeps accumulating.

For software-architecture exceptions, we group multiple service instances (containers) into one abstract service. When an instance's heartbeat stops or a task times out, the task can be moved to another node (see the deployment section). Note, however, that messages may be lost when a message collection task is moved or restarted, so components must implement compensation.

Runtime-environment issues are not analyzed in detail here. The PaaS platform and IaaS services are cloud-based, so they are largely provided and handled centrally by the infrastructure department.

### Standardized rollout process

Going from component development through flow design to loading and use follows these steps:

![Figure](https://static001.geekbang.org/infoq/6e/6e29b1085f96d835c777365a7a0417be.webp)

- Component requirement analysis: design the flow from the requirements, and decide whether new components are needed and where they belong in the hierarchy.
- Component implementation: override or reuse the parent class's `init` (initialization). Implement `dealMessage` (the new functional logic), `dealDataIn` (fetching input), `dealDataFinish` (completing output), and `judgeDataOut` (deciding whether processing is complete).
- Component registration: once development is done, package the component and deploy it to the environment. Operations staff then register it in component management, filling in the component code/name, component type, class path, extension parameters, and so on.
- Collection flow design: operations staff design the collection flow, associate collection steps with functional components, and set component parameters.
- Collection binding: set the scope the flow applies to, including network-element vendor, model, and version.

## Typical scenarios

### File collection

Collecting files over FTP is one of the most common collection scenarios. A single file is usually in the megabyte range or smaller, so the architecture above only needs a single-step component that supports multiple instances and threads. Collection plus computation and conversion then finishes within 10 seconds. Wireless performance files, however, include very large ones. A single file is around 100 MB and contains about 100,000 records (base-station cells) with roughly 1,000 performance counters each. Processed conventionally, one such file takes nearly 1 GB of JVM memory and about 10 minutes. Collecting from several OMCs at once causes out-of-memory errors. Staggering the collections instead delays the output of wireless performance metrics and hurts the timeliness of downstream network-optimization analysis.

For very large files, we trade memory for disk. The large file is broken into fragments, all processing works on the fragments, and the results are finally merged in order. This is much like a large TCP payload being split into multiple IP packets in transit. It lowers the service's memory usage while preserving processing performance.

![Figure](https://static001.geekbang.org/infoq/ce/ce4b388c900462cc8cbff4257f0c1d64.webp)

The split-and-parse logic works as follows. When a source file exceeds a certain size, it is split into multiple smaller files stored on disk for processing. Only two of these small files are read and processed at any one time, which keeps the service's memory usage low.

### Message collection

Message collection is a resident (long-running) task. A task order triggers the service, acting as either server or client, to subscribe to messages from the collection source. The service then receives whatever the source pushes. The most common case is alarm collection.

![Figure](https://static001.geekbang.org/infoq/76/765e532314ac15291136eb32e017c96a.webp)

In the alarm collection flow, the alarm messages passed between components are very fine-grained, so both task notifications and processed output travel through Kafka. Alarm collection must also guarantee that nothing is lost and that storms are handled smoothly and in order; both are essential. The current alarm processing flow handles alarm storms through three mechanisms: storm detection and interception, extraction by alarm severity, and deferred dispatch. Active-alarm synchronization and historical-alarm synchronization guarantee alarm continuity.

**Alarm storms:** during normal processing, the alarm volume over the previous minute is checked against the storm threshold. Once it reaches the threshold, high-severity alarms are extracted from subsequent messages and processed and dispatched first. The remaining alarms are written to the FQ buffer for deferred dispatch. If the output rate of high-severity alarms is below the threshold, alarms are taken from the FQ in FIFO order until the total output rate reaches the threshold.

**Alarm continuity:** every alarm output is checked against the alarm serial number to determine whether alarms are contiguous. When a gap appears, active-alarm synchronization and historical-alarm synchronization reprocess the alarm data. They retrieve the missing alarms, which then go through normal alarm processing.

## Deployment

The sections above analyzed the design of the service/plug-in layer as a whole, and this design makes production deployment straightforward. We divide service deployment modes into the following categories, each paired with specific hardware so that the architecture's advantages are fully realized:

- Very large / large-grained file collection: high-speed SSDs
- High-volume file collection / alarm collection: high-speed network (for frequent interaction with Redis and message middleware)
- Complex command-based collection: multiple CPUs

In practice, we choose each service/plug-in's deployment environment by these categories. We also isolate the deployments of different specialties to avoid security risks from network connections between them. Multiple collection instances are combined into one collection service. This balances the task load, moves tasks to healthy instances when an instance fails, and re-dispatches tasks that time out, keeping collection highly reliable.

![Figure](https://static001.geekbang.org/infoq/8f/8f3fbb508ebf9f2fd3d07acb3dffcf7b.webp)

## Results in practice

A 2020 unified collection project for China Mobile in one province was the first real-world project on the new architecture. It covered 3 specialties (wireless, core network, transport), 6 vendors, and 219 OMCs, grouped into 90+ collection scenarios. With the "container + component" architecture, the collection flow for each scenario was configured quickly. This resolved the variation in processing steps, collection methods, parsing methods, and output. Collection for every specialty was loaded and live in under 60 days.

Multi-instance components and asynchronous task scheduling gave the following results:

- Alarm messages from a single source are collected at 6,000 messages per second with latency under 1 second.
- With multiple sources, a single 8-core VM reaches 30,000 messages per second, a tenfold improvement over the previous collection system.
- Configuration/performance file collection outputs 36 GB per hour, 5 times the efficiency of the previous system.
- A 60 MB 4G wireless performance file is collected in 2 minutes, 3 times as efficient as before.

After several months in production, the system has run stably and silently; day to day, users only look at the data-quality report it produces each day. Because the system is stable and its operations are simple and automated, it needs only 0.5 of a person to operate.

In a multi-cloud management project for China Mobile in another province, the architecture proved itself again. Collection modes, data processing, and output differed substantially. Even so, the complex multi-cloud collection business was implemented purely by adding multi-cloud components, with the service container left unchanged. Multi-cloud collection is fairly complex, so for reasons of space we will not analyze it here; we will share that design in a later article on our collection-and-control products.
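A minimal Python sketch can illustrate three ideas from the article: the "1 container + N components" model, the component tree rooted at AbstractComponent, and the flow templates the designer produces. It assumes a linear flow with no branches and models only one of the six abstract categories. Every name except `AbstractComponent` is a hypothetical stand-in, including `deal_message`, `CsvParser`, `Normalizer`, `Container`, and the template format. The method names loosely mirror the `init`/`dealMessage` hooks from the rollout steps and are not the product's real API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

# Hypothetical sketch of "1 container + N components"; not the product's real API.


class AbstractComponent(ABC):
    """Root of the component tree; every component implements one standard interface."""

    def __init__(self, params: Optional[Dict[str, Any]] = None) -> None:
        self.params = params or {}  # extension parameters set in the flow designer

    def init(self) -> None:
        """Shared initialization hook; subclasses may override or reuse it."""

    @abstractmethod
    def deal_message(self, item: Any) -> List[Any]:
        """Turn one input item into zero or more output items."""


class AbstractParser(AbstractComponent, ABC):
    """One abstract category (parsing); concrete parsers inherit from it."""


class CsvParser(AbstractParser):
    def deal_message(self, item: str) -> List[Dict[str, str]]:
        # Naive split: assumes no quoted fields or embedded commas.
        header, *rows = [line.split(",") for line in item.strip().splitlines()]
        return [dict(zip(header, row)) for row in rows]


class Normalizer(AbstractComponent):
    def deal_message(self, item: Dict[str, str]) -> List[Dict[str, str]]:
        rename = self.params.get("rename", {})  # vendor field name -> standard name
        return [{rename.get(k, k): v for k, v in item.items()}]


class Container:
    """Instantiates components from a flow template and chains their outputs."""

    def __init__(self, registry: Dict[str, type]) -> None:
        self.registry = registry  # component code -> class, as in component registration

    def run(self, template: List[Dict[str, Any]], source_item: Any) -> List[Any]:
        stages = []
        for step in template:
            component = self.registry[step["component"]](step.get("params"))
            component.init()
            stages.append(component)
        items = [source_item]
        for component in stages:  # feed each step's outputs into the next step
            items = [out for item in items for out in component.deal_message(item)]
        return items
```

A new scenario then only needs another registered subclass and a new template; the container itself does not change.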
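The asynchronous hand-off between step components can be sketched with Python's standard `queue` and `threading` modules. Each stage runs in its own thread and passes results through a bounded queue, so no stage waits for a downstream stage to finish a whole task. This is only a stand-in under stated assumptions. The article hands data off through local files, Redis, MQ/Kafka, or JVM in-memory queues. The sketch omits the compensation and re-collection mechanisms, and it has no error handling: a stage that raises will stall the pipeline. `run_pipeline` and its parameters are hypothetical.

```python
import queue
import threading
from typing import Callable, Iterable, List

_STOP = object()  # sentinel that tells each stage the stream has ended


def _feed(items: Iterable[object], outbox: queue.Queue) -> None:
    for item in items:
        outbox.put(item)
    outbox.put(_STOP)


def _stage_worker(fn: Callable[[object], Iterable[object]], inbox: queue.Queue, outbox: queue.Queue) -> None:
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        for out in fn(item):  # a stage may emit zero, one, or many items
            outbox.put(out)


def run_pipeline(stages: List[Callable[[object], Iterable[object]]],
                 items: Iterable[object], capacity: int = 100) -> List[object]:
    # One bounded queue per hand-off; a full queue makes the upstream stage wait (back-pressure).
    queues = [queue.Queue(maxsize=capacity) for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=_feed, args=(items, queues[0]), daemon=True)]
    threads += [
        threading.Thread(target=_stage_worker, args=(fn, queues[i], queues[i + 1]), daemon=True)
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    results: List[object] = []
    while (out := queues[-1].get()) is not _STOP:  # drain the last queue until shutdown
        results.append(out)
    for t in threads:
        t.join()
    return results
```

Because every hand-off queue is bounded, a slow stage throttles its upstream neighbour instead of letting memory grow without limit.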
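The following sketch shows what an event specification with stable error codes might look like, in the spirit of the standardized logging section. The codes (`COL-1001` and so on), their messages, and the remedies are invented for illustration. The article's actual event table appears only as a figure, so the sketch mirrors only its general shape: code, level, message, and handling.

```python
import json
import time
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Level(Enum):
    INFO = "INFO"
    WARN = "WARN"
    ERROR = "ERROR"


@dataclass(frozen=True)
class EventSpec:
    code: str      # stable code that front-line staff and R&D can both quote
    level: Level
    template: str  # message text with named placeholders
    remedy: str    # suggested automatic or manual handling


# Hypothetical excerpt of an event-specification table; codes and wording are invented.
EVENT_SPECS = {
    s.code: s
    for s in (
        EventSpec("COL-1001", Level.INFO, "task {task_id} started on {node}", ""),
        EventSpec("COL-2001", Level.ERROR, "FTP login to {host} failed", "verify OMC account, then retry"),
        EventSpec("COL-2002", Level.WARN, "file {path} not generated yet", "schedule re-collection"),
    )
}


def emit(code: str, **fields: object) -> str:
    """Render one structured log line; an unknown code raises KeyError so every event stays catalogued."""
    spec = EVENT_SPECS[code]
    return json.dumps({
        "ts": round(time.time(), 3),
        "code": spec.code,
        "level": spec.level.value,
        "message": spec.template.format(**fields),
    })


def remedy_for(code: str) -> Optional[str]:
    """Look up the handling suggestion for WARN and ERROR events; INFO events have none."""
    spec = EVENT_SPECS[code]
    return spec.remedy if spec.level is not Level.INFO else None
```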
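The abstract-service failover from the exception-handling and deployment sections can be sketched as a heartbeat check plus least-loaded reassignment. `AbstractService`, `fail_over`, and the timeout semantics are hypothetical. The sketch leaves out re-dispatching timed-out tasks. It also omits the compensation that message collection needs after a task is moved.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instance:
    name: str
    last_heartbeat: float
    tasks: List[str] = field(default_factory=list)


class AbstractService:
    """Hypothetical: several collection-container instances presented as one logical service."""

    def __init__(self, heartbeat_timeout: float) -> None:
        self.heartbeat_timeout = heartbeat_timeout
        self.instances: Dict[str, Instance] = {}

    def heartbeat(self, name: str, now: float) -> None:
        self.instances.setdefault(name, Instance(name, now)).last_heartbeat = now

    def _alive(self, inst: Instance, now: float) -> bool:
        return now - inst.last_heartbeat <= self.heartbeat_timeout

    def assign(self, task: str, now: float) -> str:
        """Give the task to the least-loaded live instance."""
        live = [i for i in self.instances.values() if self._alive(i, now)]
        if not live:
            raise RuntimeError("no live instance available")
        target = min(live, key=lambda i: len(i.tasks))
        target.tasks.append(task)
        return target.name

    def fail_over(self, now: float) -> Dict[str, str]:
        """Move tasks off instances whose heartbeat expired; returns {task: new instance}."""
        moved: Dict[str, str] = {}
        for inst in list(self.instances.values()):
            if not self._alive(inst, now) and inst.tasks:
                orphans, inst.tasks = inst.tasks, []
                for task in orphans:
                    moved[task] = self.assign(task, now)
        return moved
```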
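The split, process, and merge approach for very large files can be sketched with the standard `csv` module and a two-worker `ThreadPoolExecutor`. This mirrors the rule that only two fragments are read and processed at a time. The sketch assumes a CSV source with a header row, leaves to the caller the size threshold that triggers splitting, and uses hypothetical function names. Real OMC performance files may be XML or another format.

```python
import csv
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Dict, List

Row = Dict[str, str]


def split_file(src: str, workdir: str, rows_per_chunk: int) -> List[str]:
    """Cut the source CSV into numbered fragment files on disk, repeating the header in each."""
    paths: List[str] = []
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        while rows := list(islice(reader, rows_per_chunk)):
            path = os.path.join(workdir, f"part_{len(paths):05d}.csv")
            with open(path, "w", newline="") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                writer.writerows(rows)
            paths.append(path)
    return paths


def process_part(path: str, transform: Callable[[Row], Row]) -> str:
    """Load one fragment, transform every row, and write the result next to it."""
    with open(path, newline="") as f:
        rows = [transform(row) for row in csv.DictReader(f)]
    out_path = path + ".out"
    with open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return out_path


def merge(parts: List[str], dest: str) -> None:
    """Concatenate processed fragments in order, keeping only the first header."""
    with open(dest, "w", newline="") as out:
        writer = csv.writer(out)
        for i, part in enumerate(parts):
            with open(part, newline="") as f:
                reader = csv.reader(f)
                header = next(reader)
                if i == 0:
                    writer.writerow(header)
                writer.writerows(reader)  # stream rows; no whole fragment held in memory


def process_large_file(src: str, dest: str, transform: Callable[[Row], Row],
                       rows_per_chunk: int = 10_000, parallel: int = 2) -> None:
    with tempfile.TemporaryDirectory() as workdir:
        parts = split_file(src, workdir, rows_per_chunk)
        # At most `parallel` fragments are loaded into memory at once.
        with ThreadPoolExecutor(max_workers=parallel) as pool:
            outputs = list(pool.map(lambda p: process_part(p, transform), parts))
        merge(outputs, dest)
```

`pool.map` returns results in submission order, so the merge step reassembles fragments in their original sequence even when they finish out of order.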
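The alarm-storm rule can be expressed as a small throttle. It counts arrivals over the trailing 60 seconds. During a storm, it dispatches high-severity alarms at once and parks the rest in a FIFO queue, which is drained up to an output limit. This is a sketch under assumptions. The severity scale (1 = most severe), the per-call output limit, and all names are hypothetical, and the FQ buffer is modelled as an in-memory `deque`.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Alarm:
    seq: int
    severity: int  # assumed scale: 1 = critical ... 4 = warning


class StormThrottle:
    """Hypothetical storm handler: prioritize severe alarms, defer the rest to a FIFO queue."""

    def __init__(self, storm_threshold: int, output_limit: int,
                 high_severity: int = 2, window: float = 60.0) -> None:
        self.storm_threshold = storm_threshold  # arrivals per window that signal a storm
        self.output_limit = output_limit        # max alarms released per call while throttling
        self.high_severity = high_severity      # severities <= this bypass the queue
        self.window = window
        self.arrivals: Deque[float] = deque()   # arrival timestamps in the trailing window
        self.fq: Deque[Alarm] = deque()         # deferred alarms, released FIFO

    def in_storm(self, now: float) -> bool:
        while self.arrivals and now - self.arrivals[0] >= self.window:
            self.arrivals.popleft()
        return len(self.arrivals) >= self.storm_threshold

    def dispatch(self, now: float, incoming: List[Alarm]) -> List[Alarm]:
        storm = self.in_storm(now)  # judged on the previous window, before this batch
        self.arrivals.extend([now] * len(incoming))
        if not storm and not self.fq:
            return list(incoming)  # normal operation: pass everything through
        out = [a for a in incoming if a.severity <= self.high_severity]
        self.fq.extend(a for a in incoming if a.severity > self.high_severity)
        while self.fq and len(out) < self.output_limit:  # top up from the FIFO queue
            out.append(self.fq.popleft())
        return out
```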
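Alarm continuity checking reduces to watching serial numbers for gaps and re-collecting the missing range before continuing. In this sketch, `fetch_missing` is a hypothetical hook standing in for active- or historical-alarm synchronization. Serial-number wraparound and duplicate suppression are not handled.

```python
from typing import Callable, Iterable, List, Optional, Tuple


class SequenceTracker:
    """Tracks the highest alarm serial number seen and reports gaps before a new one."""

    def __init__(self, last_seq: Optional[int] = None) -> None:
        self.last_seq = last_seq

    def observe(self, seq: int) -> List[Tuple[int, int]]:
        """Return the missing inclusive range that precedes seq, if any."""
        gaps: List[Tuple[int, int]] = []
        if self.last_seq is not None and seq > self.last_seq + 1:
            gaps.append((self.last_seq + 1, seq - 1))
        if self.last_seq is None or seq > self.last_seq:
            self.last_seq = seq
        return gaps


def deliver_in_order(seqs: Iterable[int], fetch_missing: Callable[[int, int], List[int]]) -> List[int]:
    """Forward alarms, first re-collecting any gap through the caller-supplied resync hook."""
    tracker = SequenceTracker()
    delivered: List[int] = []
    for seq in seqs:
        for start, end in tracker.observe(seq):
            delivered.extend(fetch_missing(start, end))
        delivered.append(seq)
    return delivered
```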