百分點大數據技術團隊:輿情平臺架構實踐與演進

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"編者按"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"現代社會每天都有大量信息產生,抖音、小紅書等自媒體的普及,不斷豐富着人們表達看法、傳播訴求、分享信息的渠道和形式。如何完成多源異構數據的收集和處理,挖掘海量信息中的價值,洞察事件背後的觀點和情緒,是做好政府和企業輿情監測工作不可忽視的問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"百分點輿情洞察系統(Mediaforce)是一款面向政企客戶的輿情監測SaaS 產品,自2014年上線至今,已累計服務客戶近萬家,積累了逾20 PB的全網數據,通過構建豐富的上層應用,爲客戶提供精準、實時、全面、多維度的洞察服務。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文從底層數據治理、上層應用架構,以及數據個性化和智能化角度,分享了大數據平臺架構、AI平臺架構和微服務架構在輿情產品上的實踐。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"一、平臺架構簡介"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"伴隨着互聯網內容形態的蓬勃發展,Mediaforce 
平臺數據量增長迅速,在產品創新和迭代過程中,自身平臺架構也在不斷的演進。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"互聯網輿情本質上是對互聯網公開信息的採集、分析、研判,併產生業務價值,是一個價值數據挖掘的過程,我們覆蓋了90%以上的網絡公開數據,包含但不限於以下信源:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在線新聞、報刊、貼吧、博客、論壇、微博、微信、APP客戶端;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"電視、廣播等;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"社交自媒體:抖音、快手、小紅書等。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"百分點科技通過對以上數據進行存儲、挖掘、可視化分析等一系列處理,最終爲用戶呈現多終端觸達、一站式的輿情監測和價值分析平臺。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"到目前爲止,大體分爲如下三個平臺架構,對應職責如下:"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"大數據平臺架構"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據共享:統一業務數據存儲,結合業務實際場景對數據進行關聯使用,避免數據重複存儲,降低溝通成本;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"服務共享:統一服務架構,避免服務孤島,統一服務的訪問入口和訪問規則;"}]}]},{"type":"list
item","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"易於使用:通過平臺服務和工具的形式暴露平臺能力,屏蔽平臺底層細節。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"AI平臺架構"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據層:以平臺化能力應對數據收集、數據準備等繁重工作,同時結合業務,構建數據流轉閉環;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"深度學習平臺層:實現多租戶及彈性的資源分配、模型庫擴展、可視化訓練和調整、滾動更新等能力;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"應用和工具層:藉助REST\/gRPC接口開放模型能力,對接金融領域輿情、定製化行業標籤、離線數據預測等場景。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"微服務架構"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"拆分:遵循業務垂直拆分和功能水平拆分的總原則,並從業務側儘量規避分佈式事務;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雲原生:減少微服務架構的運維成本,藉助容器化技術,實現資源動態感知、擴縮容等特性。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"二、大數據平臺架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"百分點輿情洞察系統最初是通過自主構建IDC來支撐,IaaS層由單獨的運維團隊來進行維護。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大數據平臺(IaaS層除外)分層如下:"}]},{"type":"paragraph","attrs":{"i
ndent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/0b\/0b7f4b651ed24e75ff094c62f0aae683.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"輿情的數據應用場景不同於海量日誌、海量商品檢索等的側重於簡單標籤聚合,輿情應用完全基於自然語言全文檢索,同時結合內存複雜聚合計算。爲了保證檢索準確率,往往會配置複雜的關鍵詞和距離限定,因此對於檢索引擎的內存優化策略要求很高。可以說,數據存儲和檢索架構的升級,是輿情業務的核心之一。在百分點科技大數據平臺架構演進歷程中,"},{"type":"text","marks":[{"type":"strong"}],"text":"大致可以分爲三個階段:業務共享數據倉庫階段、業務自建數據集市階段、湖倉一體階段。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 共享數據倉庫階段"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在業務規模初期,大部分精力集中於業務系統的迭代和開發,採用共享數據倉庫的解決方案。流程如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c7\/c7d869006d62eea1bf00c48b68694561.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可以看到,隨着客戶規模和數據量的增大,以及業務複雜度的提升,僅僅依靠共享的數據倉庫,已經無法滿足需求。產生的主要問題如下:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"業務側查詢響應時長無法保證;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content"
:[{"type":"text","text":"複雜查詢以及聚合操作,加重Elasticsearch Cluster負擔,甚至引起節點OOM;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"冷熱數據未分離。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 自建數據集市階段"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"隨着客戶量及數據量的增多,百分點科技對數據倉庫進行了冷熱數據隔離,並通過自主構建數據集市來滿足業務的快速響應。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/6a\/6ac998e5cc0e1f99cc005f1ed7f6fee4.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面將從數據倉庫層、數據集市層進行介紹。"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ES 
Cluster從2.3.4升級到6.0.0(當時最新版本);"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據倉庫的核心改動是冷熱數據分離:熱數據使用SSD硬盤存儲,且只存儲近一週數據;冷數據使用HDD硬盤,存儲近兩年數據。互聯網數據具有良好的時序性,因此按天拆分,在保證集羣運維便利的同時,滿足數據變更\/刪除的業務需求;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據集市以業務最小查詢單位(話題)爲粒度進行拆分和構建,可以認爲是將上層業務需要的結果預計算後存儲至數據集市層。這樣,業務只需查詢自己獨有的庫便可完成分析和響應。其中需要相對複雜的機制來保障數據一致性,這裏不做介紹。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"調整後,業務查詢響應延遲基本可控,並且具有良好的隔離性。但同時也面臨着下述挑戰:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"離線數據(2年以上歷史數據)以HDFS爲存儲介質,不支持更新、無法查詢複用;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在目前數據集市層的拆分粒度下,由於業務邏輯複雜,需要藉助內存計算,在以年爲跨度的查詢週期下顯得力不從心;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據集市層實時數據的計算具有一定的延遲,需要保留熱數據集羣來支持實時數據的查詢,架構不夠優雅。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. 
湖倉一體化階段"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"隨着輿情在客戶羣中深入使用,在保證查詢低延遲的情況下,需要能支撐3~5年的長跨度數據檢索。同時爲應對SaaS產品矩陣的擴充,需要易用、可擴展的數據平臺支撐。本次架構優化的核心目標爲:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"低響應延遲下,大跨度查詢可擴展至3~5年(秒級);"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"靈活的爲其他業務應用做好平臺支撐,加強ODS、DW建設;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"減少ES Cluster數據冗餘;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"簡化數據集市層計算鏈路,提高數據時效性。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.1 
數據集市層"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對客戶和線上日誌進行分析,得到如下結果:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(1)客戶數據量級"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/29\/2977ea795599a55e055a928491abb323.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對線上客戶數據量進行採樣,統計一年數據量,千萬級數據量的客戶羣體佔1%。因此,我們將目標定義爲:在千萬級數據量下,複雜聚合查詢分析的響應時長在3~5秒內。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(2)查詢類型統計"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"藉助數據集市,將大量依賴全文檢索的聚合統計分析場景轉化爲OLAP場景。對線上日誌進行分析,二次全文檢索查詢流量佔比不到20%。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/6d\/6db4b282e09311107a672f05eccca32d.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"依據上述結論,將數據集市層要解決的問題彙總如下:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"
origin":null},"content":[{"type":"text","text":"80%查詢是OLAP場景,20%查詢是全文檢索;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"需要支持實時更新;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據規模支持千萬級別,並支持擴展;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"查詢響應時長在3~5秒。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通常來說,面對海量數據的低成本存儲+高效檢索的需求,業界通常使用HBase+ Elasticsearch的組合方案,但該方案除了開發維護複雜、數據一致性弱等常見問題,通常還要由Elasticsearch來承擔OLAP,以及全文檢索的功能職責。對於重OLAP查詢場景,使用MPP查詢引擎往往能獲得較低的查詢延遲,如:Clickhouse、DorisDB等。在考慮支持實時更新等多種條件下,我們將方案集中於Elasticsearch、TiDB+ Elasticsearch、DorisDB+Elasticsearch三種技術進行嘗試:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Elasticsearch"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ES是一款面向OLAP場景的全文檢索分析引擎,下面是在Elasticsearch 
7.8.0環境中的測試:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(1)集羣環境"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/69\/6920aa8ae804077d485f3ae754ee8365.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(2)測試索引"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用單shard、無副本、百萬級別索引32個,十萬級別索引18個。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(3)測試結論"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"將客戶端併發數等價於索引數目,持續20輪進行壓測。對業務進行抽象,選取如下測試用例:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n{\"size\":0,\"query\":{\"bool\":{\"filter\":[{\"bool\":{\"adjust_pure_negative\":true,\"boost\":1}},{\"range\":{\"pubTime\":{\"from\":1551430186000,\"to\":1615366186000,\"include_lower\":true,\"include_upper\":true,\"boost\":1}}},{\"bool\":{\"adjust_pure_negative\":true,\"boost\":1}}],\"must_not\":[{\"term\":{\"mask\":{\"value\":true,\"boost\":1}}}],\"adjust_pure_negative\":true,\"boost\":1}},\"track_total_hits\":2147483647,\"aggregations\":{\"termsAgg\":{\"terms\":{\"field\":\"titleSimHash\",\"size\":2000,\"min_doc_count\":1,\"shard_min_doc_count\":0,\"show_term_doc_count_error\":false,\"order\":[{\"_count\":\"desc\"},{\"_key\":\"asc\"}]}},\"carAgg\":{\"cardinality\":{\"field\":\"titleSimHash\",\"precisi
on_threshold\":10000}}}}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/21\/211f3cfc82da172723aa22958ac03922.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"測試中發現集羣相對穩定;與單線程相比,多線程下平均延遲高於1s的情況也較少。在Elasticsearch 6.0.0上進行相同的測試,其中平均延遲高於1s的佔80%。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"TiDB+Elasticsearch"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"TiDB 4.0版本已經是一款HTAP混合型分析引擎。將測試數據集限定爲千萬級,並在測試中設置tidb_hashagg_final_concurrency=20和tidb_hashagg_partial_concurrency=20,平均耗時穩定在8s~9s。由於聚合後的基數較大,壓力都集中在TiDB側,未能達到用TiDB替代ES承擔OLAP場景的目的。更多信息請參照AskTUG:"},{"type":"text","marks":[{"type":"underline"}],"text":"千萬級數據group by性能調優"},{"type":"sup","content":[{"type":"text","text":"[1]"}]},{"type":"text","text":"。隨着TiDB 5.0發佈,TiFlash已經不僅僅是一個列式存儲引擎這麼簡單。TiFlash引入了MPP模式,使得整個TiFlash從單純的存儲節點升級成爲一個全功能的分析引擎。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[1] 
https:\/\/asktug.com\/t\/topic\/68474\/1"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"DorisDB+Elasticsearch"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MPP引擎的列式存儲設計對數據更新極其不友好。藉助DorisDB的更新模型引擎,內部通過版本號,可以支持大規模的數據實時更新,當然在查詢時需要完成多版本合併。同時Doris On ES將Doris的分佈式查詢規劃能力和ES(Elasticsearch)的全文檢索能力相結合,提供更完善的OLAP分析場景解決方案。目前Doris On ES不支持sum、avg、min\/max等聚合操作的下推,計算方式是批量流式地從ES獲取所有滿足條件的文檔,然後在Doris中進行計算。在測試場景下,其性能可以滿足OLAP場景的需求。但實踐中發現,由於自建IDC機器較爲老舊,無法支持SIMD指令,致使無法安裝DorisDB。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在目前的業務場景下,百分點科技最終選擇單一的Elasticsearch來作爲數據集市層的存儲和計算引擎。後續如果數據集市有更大的數據量以及業務低延遲的OLAP查詢場景,還是會考慮結合MPP查詢引擎來滿足業務的擴展。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.2 數據倉庫層"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在之前的很長一段時間內,Elasticsearch Cluster承擔了大量數倉的職能,並通過多集羣進行冷熱數據隔離。在本次調整中,百分點科技藉助索引生命週期管理(ILM)和Hot\/Warm架構,實現在一個集羣中進行數據管理。在實踐中,我們將Elasticsearch率先升級到7.12.0,以滿足向量化檢索等更多場景。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.3 
源數據層"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"之前會將採集的數據存儲至kafka,作爲數據傳輸中轉。但kafka一般存儲的時間週期較短,且功能單一。因此需要一套統一的存儲計算平臺,需要滿足如下要求:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"全量的離線數據是通過ES-Hadoop進行按天備份,後續的變更就無法做到同步,複用性、靈活性較差;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖片、音視頻等非結構化數據的接入,需要方便與上層機器學習應用深度融合;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"輔助數據倉庫,構建數據集市,保證實時性。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在最新的架構中,百分點科技將數據先入湖,構建ODS,輔助構建上層DW和DM。關於Data Lake,最終選取Hudi作爲源數據層存儲計算方案,並做了以下嘗試:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Iceberg"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Iceberg工程架構具有極高的抽象,可以與各種引擎無縫融合。字符串模糊匹配是一種重要場景,測試中遇到以下問題:如果某個字段存儲爲空字符串,在匹配中就會出現異常:"},{"type":"text","marks":[{"type":"underline"}],"text":"java.lang.IllegalArgumentException: Truncate length should be positive"},{"type":"sup","content":[{"type":"text","text":"[2]"}]},{"type":"text","text":"。另外就是查詢對Stream相關支持還處於開發階段,對於增量數據處理只能以Java 
Api方式實現。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[2] https:\/\/github.com\/apache\/iceberg\/issues\/2065"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Hudi"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hudi顯得尤爲成熟,但是與 Spark 引擎綁定的較爲緊密。在Hudi 0.6中對底層代碼進行抽象,以適配Flink等主流計算引擎。同時其完善的增量查詢機制非常適合實時數據集市的構建。另外Hudi Table並不需要提前創建,可以在寫入數據時自動創建,這也是區別於Iceberg的一個點。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hudi的引入,爲底層數據平臺帶來了ACID能力,並且提供較好實時性。特別是爲數據集市實時數據構建帶來便捷,提供可擴展性。目前的簡易數據架構如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/1a\/1a438e9b754352a3e3deadff4db27a0e.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"三、AI平臺架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在海量的文本數據上,利用豐富的數據挖掘、深度學習、人工智能算法,訓練在線和離線語義模型,一站式挖掘滿足客戶需要的輿情分析需求。在這一歷程中,大致分爲兩個階段:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"文本分析平臺:將通用文本能力服務化;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"par
agraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"深度學習建模平臺:高效、易用、低門檻的模型定製開發平臺。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在上述演進中,最主要的變化在於各行各業都已經積累了較多的高價值數據,並且越來越需要定製滿足自己場景的個性化模型。下面主要從這兩個階段分別展開對應的工作。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"文本分析平臺"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在輿情分析場景中,依賴於分詞、詞性、新詞發現、命名實體、主體分類、文本聚類、關鍵詞提取、自動摘要、文本去重、情感分析、內容轉換(簡繁、拼音)、自動糾錯、自動補全、文檔解析等各種功能。產品架構和數據流程如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/3b\/3beae2980355c57f9301afe93002b4ff.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"深度學習建模平臺"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"隨着深度遷移學習成熟和行業應用,帶來最大的益處在於可以依據少量的訓練數據便可以得到較好的訓練結果。從下述對比中:可以看到Bert在少訓練集下就能達到較好的結果,也爲後續的定製化模型奠定了基礎。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/33\/33d97fd29df06d496dc1a500e28811bb.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"
輿情繫統本身可以看作爲信息工程架構,客戶可以容忍數據精準度,但是不允許相同的數據持續犯錯。可學習、可持續、可定製已經變的尤爲重要。這也是深度學習建模平臺的由來。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面是整體的業務架構和流程分析,具體技術細節可參照:"},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s?__biz=MjM5MzI5NjY2MA==&mid=2653787712&idx=1&sn=359e5152af83f2f5864f0c84c8abc33b&scene=21#wechat_redirect","title":null,"type":null},"content":[{"type":"text","text":"NLP模型開發平臺在輿情分析中的設計和實踐"}],"marks":[{"type":"underline"}]},{"type":"text","text":"。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/2e\/2e906bba9f05c52a02e22fc6d21ab646.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"四、微服務架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面對互聯網架構演進之路進行總結如下,其中帶顏色標記的爲實踐中的產物。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/72\/722a9939741351001b57f49be8bad3dc.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"輿情業務應用系統從最核心幾個業務功能,目前已經擴展至幾十個業務模塊。同時藉助成熟的底層模塊,快速沉澱出金融輿情、行業版等衆多項目。大致經過以下三個階段。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 
單體架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在業務初期,使用SpringBoot作爲單體應用開發程序,可極大加快業務推進速度,簡易架構如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/05\/05317e434b7de0819128ed16cfc50233.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"單體架構的優點在於其易開發、易測試、易部署、易擴展,但是業務耦合嚴重,也爲業務擴展、服務治理帶來了新的挑戰。例如:登錄服務和查詢服務在一個單體應用中,因爲查詢服務是一個耗內存的操作,高峯時會引起FullGC,致使登錄功能異常。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 微服務架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"微服務可以定義如下:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一種架構風格,將單體應用劃分成一組小的服務,服務之間相互協作,實現業務功能。每個服務運行在獨立的進程中,服務間採用輕量級的通信機制協作(通常是HTTP\/JSON);"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每個服務圍繞業務能力進行構建,並且能夠通過自動化機制獨立地部署;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"很少有集中式的服務管理,每個服務可以使用不同的語言開發,使用不同的存儲技術;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"參考:https:\/\/www.martinfowler.com\/articles\/microservic
es.html。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"隨着業務擴展,業務耦合嚴重,開發效率低下、排查問題困難等。秉承業務維度垂直拆分和功能維度水平拆分的原則,同時儘量避免分佈式事務等複雜度問題。拆分後架構圖如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/61\/618b9cd581a5c3dc098b691f582c690b.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"微服務拆分功效:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"業務邏輯層:拆分後服務模塊30+;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"監控體系建立:日誌監控、Metrics監控、調用鏈監控、告警系統、健康檢查;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"配置中心:靈活可視化的配置管理中心;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"開發效率、團隊協作能力提升。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. 
雲原生架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雲原生包含了一組應用的模式,用於幫助企業快速,持續,可靠,規模化的交付業務軟件。其特點如下:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"容器化封裝:以容器爲基礎,提高整體開發水平,形成代碼和組件重用,簡化雲原生應用程序的維護,在容器中運行應用程序和進程,並作爲應用程序部署的獨立單元,實現高水平資源隔離;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"動態管理:通過集中式的編排調度系統來動態的管理和調度;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"面向微服務:明確服務間的依賴,互相解耦。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"藉助百分點科技內部雲平臺,將微服務結構容器化封裝,極大的降低了部署、運維的成本,也爲服務的穩定性增加了保證機制。下面主要介紹一下雲平臺的基礎概念和應用成效。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"平臺基礎概念:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"命名空間"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"管理常規用戶的資源訪問權限的中央載體,讓一組用戶組織和管理他們的內容,並與其它羣體區隔開來。是用戶賬號的唯一公共URL訪問地址。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"容器"}]},{"type":"paragraph","attrs":{"indent":0,"numb
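er":0,"align":null,"origin":null},"content":[{"type":"text","text":"A Docker container is the basic unit of resource partitioning and scheduling, and it packages the entire software runtime environment. Docker is a platform designed for developers and administrators to build, ship, and run distributed applications."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Image"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Contains the file system structure and contents needed to start a Docker container, and is therefore the basis for starting one. Images are built in layers."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Project"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Consists of multiple versions of an image, each identified by a tag."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Build"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The process of transforming input parameters into a resulting object, most often used to turn input parameters or source code into a runnable image. Docker containers are created from build images and pushed to the integrated container image registry (Harbor)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"S2I build: produces a runnable image by injecting application source code into a Docker image and assembling a new Docker image. The new image combines the base image with the built source code and can be used together with the docker 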
run command. S2I supports incremental builds, which can reuse previously downloaded dependencies and artifacts from earlier builds."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Service"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The smallest unit for deploying applications on the platform. One service is one functional unit, such as a MySQL database service. A service defines a logical set of container instances and the policy for accessing them. It contains at least one container instance and is typically used to give a group of similar containers a permanent IP. Internally, a service load-balances incoming requests and proxies them to the backing container instances. Backing containers can be added or removed freely while the service stays available."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Quota"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The maximum number of objects that can be created in a single namespace, and the compute\/memory\/storage resources that each container may request."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Advanced orchestration"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Orchestration template: describes a set of objects that can be parameterized and processed to produce services, build configurations, and deployment configurations, so that deployable applications can be created for developers on demand."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Hierarchy of platform resource objects:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c9\/c925713bcd0ebc8d65df6a17e2a2119a.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Currently, 
the platform supports three code build modes:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d4\/d4310647043c195c6f10d18b706ea11b.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Smart build"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Based on the Builder image provided by the platform, this mode automatically downloads the application source code and compiles it on top of the base image."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dockerfile build"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Users write their own Dockerfile. After specifying the code repository, the Dockerfile location, and the code branch, they can build the project image."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A custom Dockerfile can specify a custom base image, build-time environment variables, configuration, and so on to produce a more complex build or runtime environment, which makes this mode more flexible than smart build."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Push build"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Through the push build workflow provided by the platform, a locally customized image is uploaded to the image registry. Once imported, the image can be deployed, debugged, and used on the platform."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The platform's Scale feature covers horizontal and vertical scaling. The following is an example of horizontal scaling:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":
"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/23\/237934a8c173a8df6b5a926e29593f97.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The platform provides container instance monitoring, which charts a container's CPU, memory, and network usage over a selected time range:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d1\/d1c9c77b1e8fcaaafa8a25faa7a63e3d.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Summary"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Enterprise SaaS generally revolves around three stages: customer acquisition, conversion, and retention. Platform usability and the accuracy and timeliness of data are all core factors in customer retention. Over years of practice, the big data architecture has used the data lake as its ODS layer to keep raw-data processing efficient and flexible while opening data processing capabilities to other business lines. The AI platform architecture provides an end-to-end, closed-loop pipeline for building personalized, intelligent services. The microservice architecture, through containerization, greatly reduces maintenance costs while keeping production stable. As the SaaS product matrix expands, Percent Technology (百分點科技) is actively exploring areas such as financial public opinion monitoring and corporate brand monitoring, and the underlying platform architecture has played an important role in getting these businesses to market quickly."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article is reprinted from: 百分點大數據團隊 (ID: baifendian_com)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original link: "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/DWzemgVoH1Z_ltArspLhJQ","title":"xxx","type":null},"content":[{"type":"text","text":"百分點大數據技術團隊:輿情平臺架構實踐與演進"}]}]}]}