Data Governance “PAI” Implementation Methodology

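{"type":"doc","content":[
{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Editor's note"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the fifth major factor of production, data has become an important tool and basis for decision-making in governments and enterprises. Data is increasingly diverse, data needs are increasingly individual, and data applications are increasingly intelligent, while 2B and 2G sectors still struggle with uneven data quality, data applications that fail to deliver value, and data assets that are hard to accumulate. Doing data governance well and strengthening data governance capability has therefore become a top priority in the digital transformation of governments and enterprises."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Drawing on years of data governance project experience, the 百分點 (Baifendian) big data technology team has distilled an implementation methodology for doing data governance well and strengthening data governance capability."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Building data governance systems has been an active area of exploration in the industry in recent years. The policy document 《中共中央、國務院關於構建更加完善的要素市場化配置體制機制的意見》 (Opinions of the CPC Central Committee and the State Council on Building a More Complete System and Mechanism for the Market-Based Allocation of Factors of Production) names data as the fifth major factor of production, which carries unusual significance. Unlike factors such as labor, however, data is intangible and scattered across many data silos, so strengthening data governance capability is a necessary step toward realizing its value."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Based on years of data governance projects for government departments and enterprises of many kinds, 百分點 proposes that a data governance platform should have four capabilities: aggregate (聚), govern (治), connect (通), and use (用), and that project implementation should follow PDCA as its overall guiding principle."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/93\/57\/93c7e72a14c846b9e8e5004f0298a457.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Building the four capabilities"},{"type":"text","text":":"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Aggregate (聚)"},{"type":"text","text":": the data aggregation capability. Data sources differ, data types vary widely, and timeliness requirements are uneven, so data governance must first be able to bring every kind of data into the platform. Getting the data in is the first step."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Govern (治)"},{"type":"text","text":": data governance in the narrow sense, covering data standards, data quality, metadata, data security, data lifecycle, and master data. The core goals are unified data standards; metadata that shows where data assets live and supports impact analysis and lineage; continuous improvement of data quality; secure and reliable data assets; a mechanism for retiring and destroying data assets; and unified, consistently used core master data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Connect (通)"},{"type":"text","text":": the capability to link and integrate data. Raw business data is scattered across business systems, where it is organized to support business workflows. Later data needs are framed around real business objects rather than individual systems, so the data has to be reorganized by business entity. For example, a government agency's comprehensive analysis of a person usually involves property, education level, social insurance and housing fund (五險一金), tax payments, and family members, which means using the ID card number to link data from the housing authority, transportation bureau, education bureau, human resources and social security bureau, tax bureau, health commission, and other agencies. This capability is the basis for meeting diverse analysis needs later, the foundation on which data assets accumulate, and another focus of platform construction."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Use (用)"},{"type":"text","text":": the data service capability. Data assets deliver real value only when they empower front-line business, so letting business departments quickly find and conveniently use the data assets they need is another core capability of a data governance platform."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"P: plan, defining standards, plans, and processes; D: do, using products and tools to support delivery; C: check, assurance through both business and technical checks; A: act, continuously optimizing and improving data quality and services."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/e9\/09\/e9e9c34c2fcf08718d691b9cc0666b09.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Combining hands-on delivery of data governance projects with the four capabilities and the PDCA guiding principle, we propose the “PAI” implementation methodology: process-oriented, automation, and intelligence. These three stages build on one another to raise data governance capability step by step, laying a solid foundation for data to empower business and drive business innovation in governments and enterprises."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Process orientation maps the execution of a data governance project into a defined process, standardizes the inputs and outputs of each process node, and turns those inputs and outputs into templates. It also flags the key points to watch at each node."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Automation builds automated tooling for the nodes and standard inputs and outputs that process orientation has defined, for example automated data ingestion and automated script development. This reduces manual effort and repetitive work, letting the team focus on the business and on exploring new technology."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Intelligence draws on experience and assets accumulated from past projects to make recommendations for new projects or new domains, such as models to create and data quality audit rules."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"I. Process-Oriented Data Governance"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data governance projects usually follow a waterfall development model whose core process includes requirements, design, development, testing, and go-live. Process orientation breaks the delivery steps down in detail, distills and standardizes the work of both the project team and the customer, and defines the standard inputs and outputs of each step. The figure below shows how process nodes, node deliverables, and the four platform capabilities map to one another:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/10\/ed\/108ce8103dc0f41796922713b4703ced.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Requirements, high-level design, and detailed design are the core process nodes during execution, so these three are covered in detail below."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Requirements Research"}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.1 Requirements Research Process"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The overall research process for data projects is as follows:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/5a\/63\/5a63e05aff96yye8fc0b7740c5cyy563.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Research is the foundation of the whole project. It has to capture the current state of the business and its data in detail, and it has to capture customer requirements accurately so that the project goals are clear. As the figure above shows, it falls into three major phases: preparation, execution, and follow-up consolidation and confirmation."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Preparation covers finalizing the research plan and pre-research preparation. Where conditions allow, hold a requirements research kickoff meeting (this can be skipped if the topic was already covered at the project kickoff). Pre-research preparation means gaining a thorough understanding of the customer's organizational structure and business, so that the execution phase is well targeted, the findings are more detailed, and customer requirements are captured more accurately."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Execution usually involves two rounds. The first round focuses on understanding how the business currently runs, the business data to be connected, and customer requirements. The second round confirms specific business and data details and validates the analyzed requirements with the customer. Detailed questions about individual systems are handled offline, without a third full round."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The follow-up phase puts the customer requirements and the current state of the customer's business and data through internal and external review and sign-off, and records the list of items to be built in this phase of the project in a Requirements Specification (《需求規格說明書》)."}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.2 Requirements Research Work Items"}]},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/7f\/cd\/7fb23c2f155855dd99b8ac363e2f8acd.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The table above describes the work done by the customer and the project team at the key nodes of requirements research, along with the inputs and outputs, and sets out the overall principles, research methods, and related requirements for this phase."}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.3 Requirements Research Considerations"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(1) Requirements gathering"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Requirements of key stakeholders"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Who the real users are and what they need"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Questions to ask before eliciting requirements: what does the customer manage, what do they focus on, how do they manage it today, what is missing, and what work is repetitive?"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(2) Requirements validation"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3W validation: who will use it, in what scenario, and to solve which problems?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Prototype sketches"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(3) Requirements management"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Core requirements (a requirement must fit into the business process and deliver real value)"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Identify whether a requirement is common across the industry (do it if there is spare capacity and drop it otherwise; it is not needed from a project management perspective but is valuable from an industry perspective)"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(4) Requirements confirmation"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Produce a written Requirements Specification"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Always obtain signed confirmation (changes are allowed later, but major changes must be recorded)"}]}]}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. High-Level Design"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"High-level design for a data governance project covers four areas: network architecture, data flow architecture, standards library, and data warehouse. The overall goal is to define how data enters and leaves the platform (network conditions), how data is organized and flows inside the platform (data flow architecture and data warehouse model), and which standards and specifications data must follow inside the platform (standards library). The work items, inputs, and outputs for each area are shown below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/2a\/3f\/2a2f92978432f275604175ff615e303f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.1 Network Architecture Diagram"}]},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/ac\/aa\/ac592f71052fb09f9yy604bc076cddaa.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The network architecture should define the hardware deployment plan, the network conditions of the systems to be connected, and who will use the platform and how they will access it, so that data ingestion and data service needs can be met."}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.2 Data Flow Diagram"}]},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/e3\/ba\/e3a66a5fb7f45edd75f0dd0577b1a8ba.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The data flow architecture should define how each kind of data is processed and where it flows, so that later processing and storage approaches can be settled."}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.3 Data Standards Content Diagram"}]},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/1f\/49\/1f72132725fe53526d32b62f46053149.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The standards library should define the standards and specifications the platform follows, keeping platform construction consistent and laying a solid foundation for empowering the business later."}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.4 Data Warehouse Subject Areas and Core Entities Diagram"}]},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/66\/0c\/6604d86d67c50eb6188e937901d1350c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data warehouse design should define the subject areas and key entities, and identify the entities around which data will later be linked and integrated, so that complex and changing data needs are better supported."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. Detailed Design"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Detailed design covers each work module that the project actually delivers and specifies how each part is implemented. The modules, work items, inputs, and outputs are shown below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/34\/e3\/3472776e42bc383b215b79883868aee3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"II. Automated Data Governance"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once a data governance project has been made process-oriented, the work and its deliverables are fairly clear. The process still involves a large amount of development, however, and much of that work is highly repetitive or similar: the workflow and technology are the same and only the configuration differs. This is what motivates automating the nodes defined by the process. Developers configure the task-specific parts, and the corresponding development tasks or scripts are then generated in a uniform way."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Automation is usually achieved in one of two ways: purchasing mature data governance software or developing tools in-house. The process nodes that can be automated are the ones marked in blue under “工序” (process step) in the figure below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/60\/a5\/609e3f016be02d78d9c82dddf446a5a5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Note: process nodes such as requirements research and model design involve offline interviews and business understanding. They depend mainly on communicating with people to obtain business knowledge and requirements rather than on anything purely expressible in a computer language, and the results often vary from person to person, so for now this work remains largely manual."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data ingestion, script development, and data quality auditing take up the most time in day-to-day work, so these three are covered in detail below."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Batch Data Ingestion"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data ingestion is the first step for every data governance platform, and batch ingestion accounts for 70-90% of the ingestion workload. Automation means abstracting the task-specific parts into configuration items and generating the corresponding tasks from those items. After abstraction, the configuration items for batch ingestion are:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Source system: the database type of the source system"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Source database: the database name in the source system (connection details are managed centrally elsewhere)"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Source table: the table name in the source database"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Target system: the database type of the target"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Target database: the name of the target database"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Target table: the table name in the target database"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Incremental\/full: 1 means full load, 0 means incremental load"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/25\/68\/2504bfc9874286ea8c62b7e8c7b3b668.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"An example configuration is shown above. Whether the tool is sqoop, datax, or something similar, the corresponding commands or configuration files can be generated in bulk, producing ingestion jobs in batches and automating ingestion. Ingestion efficiency improves by more than 75%, and afterwards only the correctness of the ingested data needs to be verified."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Script Development"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Processing scripts for the resource library and the subject library account for 50%-80% of the overall development work. Analysis of how this data is processed shows that the common processing patterns generally fall into the following types:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/4e\/c5\/4e5190292b802a3d79673680e899f3c5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Summarizing these patterns yields the following data processing methods:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/57\/10\/577fe36891a2e85ef168bd0fd8d7d310.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Choosing one of these methods together with the Mapping document is enough to generate the corresponding resource library or subject library script automatically. Development efficiency rises substantially, by more than 60% overall (model and Mapping design still have to be done manually)."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. DQC"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data quality checking is a key step in the PDCA guiding principle and a required stage for finding data problems and verifying that data standards and specifications have actually been applied. Each specific rule can be implemented through a product or through custom development, and automated checking then only requires the corresponding configuration. The items checked are as follows:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/10\/29\/10aa708c165eyy4fb3e044d3bf3abf29.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"III. Intelligent Data Governance"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After the automation stage, steps such as data warehouse model design and Mapping still involve a great deal of manual work. Most of it is tightly bound to business domain knowledge and the actual state of the data, and sound planning and design depend on specialist business knowledge and industry experience. Mastering industry knowledge quickly and building industry experience is the new obstacle in data governance. Accumulating industry knowledge more effectively and automatically offering design and processing suggestions is the new challenge in the “deep water” of data governance. Intelligent data governance opens up new ground for this work."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Across the data governance process, the nodes where intelligence can contribute are the ones marked in red under “工序” (process step) in the figure below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/01\/92\/01da38c4ec315fc4ca9deeecede99192.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first step toward intelligence is accumulating business knowledge and industry experience into a knowledge base. A data governance knowledge base should include standards documents, models (data elements), DQC rules and data cleansing schemes, data processing algorithms for scripts, an indicator library, and a business knowledge Q&A library. The figure below shows what it covers and the overall flow:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/2d\/26\/2daaf57b2b57c1ae34a5841e5f48aa26.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Standards Documents"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In 2B and 2G sectors, and especially in 2G, national, industry, and local bodies have published a large number of standards documents. These impose constraints at both the business and technical levels and guide the development of new business systems. The standards knowledge base covers: a. online viewing of national, industry, and local standards; b. online full-text search of relevant standards; c. structured parsing of the content of each standard."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Data Elements (Models)"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Across industries, naming and models in technical standards are a common concern and one of the more time-consuming parts of data middle platform and data governance projects. Subject models that are already fairly stable in finance have not yet converged in other industries. For companies serving 2B and 2G markets, distilling data element standards, or even subject models, for a specific industry is therefore critical both to understanding the industry and to delivering similar projects later. The content includes entity category, entity name, Chinese name, English name, data type, and referenced standard."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. DQC (Data Quality Auditing) & Data Cleansing Schemes"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The key to data governance is improving data quality, so it is especially important to accumulate data cleansing schemes and data quality audit rules, both industry-specific and common across industries. Examples of general rules include validating that an ID card number has 18 digits (converting 15-digit numbers to 18), that a mobile number has 11 digits (with a country code for international numbers), and that dates and email addresses are correctly formatted."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4. Script Development"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In data projects, concrete development begins once the data mapping has been confirmed. Because processing patterns have much in common, they can be distilled into a small set of processing types. Transaction records, for example, are usually handled by appending each day's new data, while status data is kept as a history zipper table (拉鍊表). Every step here can be implemented by an automated program, and the standards accumulated above help standardize script development further."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"5. Indicator Library"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Understanding of an industry shows, to a degree, in how well its indicator system is built. Two important tests are whether the commonly used industry indicators are fully covered and whether the rules for computing each indicator are unambiguous. Building an industry indicator library is critical to accumulating business knowledge."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"6. Business Knowledge Q&A Library"}]},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/52\/42\/52fcd71576fa6a0eae724f64dc9d7e42.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The most direct expression of accumulated industry knowledge is a business knowledge Q&A library. All kinds of business knowledge can be added to it over time and served to users more conveniently through Q&A and other forms of interaction. In the ecology and environment domain, for example, the calculation rules for AQI, common air pollutants, and the emission limits for various pollution indicators can all be stored as question-answer pairs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/a2\/7c\/a23264af42e1b6192aabb0e1685c837c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As this knowledge accumulates, intelligent recommendations become possible during data governance. As the figure above shows, when entities and attributes are being identified, NLP techniques combined with knowledge base rules can recommend matches based on similarity."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As industry knowledge continues to grow and mature, the platform can later recommend industry subject models and master data models directly, along with data standards and data quality check rules for entities and attributes."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Summary"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Process orientation is the first step in data governance and the basis for automation and intelligence. It organizes and standardizes the materials used at each node, including business process diagrams, network architecture diagrams, and business system inventories. Once industry knowledge has been organized and refined, it forms an industry edition of the knowledge base (with a general edition factored out). For standards documents, for example, this means: 1. compiling code tables; 2. compiling data element standards (the standards that correspond to the industry model of the data warehouse)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Automation builds automated tooling for the work that process orientation has standardized. It covers warehouse model design, standardization, script development, DQC, and automated construction of the indicator system, and it includes automated program generation and automated checking. Automated program generation first frees up capacity and improves efficiency, and second makes development more consistent. Automated checking includes: 1. finding data problems and producing quality reports (general issues such as uniqueness and null values); 2. industry knowledge checks (built into the industry edition, covering the data issues that matter most in each industry, with the knowledge base refined continuously)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Intelligence builds on process orientation and automation to offer intelligent suggestions for data linking and integration, subject models, and data processing checks, reducing manual analysis."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The overall approach is to first solve standardized execution on projects, then raise construction efficiency and consistency through automation, and finally, building on accumulated business knowledge, achieve intelligent construction across the whole process."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"This article is reprinted from the WeChat official account 百分點 (ID: baifendian_com)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original link"},{"type":"text","text":":"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/MulmH1nnQorlzxS-357MXg","title":"","type":null},"content":[{"type":"text","text":"Data Governance “PAI” Implementation Methodology"}]}]}
]}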
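The general DQC rules mentioned in the DQC and data cleansing section above (18-digit ID card validation with 15-to-18 conversion, 11-digit mobile numbers) can be sketched as configurable checks. This is a minimal illustration and not the article's implementation. The function names, the `id_card` and `mobile` field names, and the `audit` helper are hypothetical. The check-character weights follow the GB 11643-1999 standard. The century prefix `19` used when upgrading 15-digit numbers is an assumption that fits the legacy format. The mobile rule checks only the 11-digit length stated in the article and does not handle country codes.

```python
import re

# Position weights and check characters defined by GB 11643-1999.
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"


def id_check_char(first17: str) -> str:
    """Compute the 18th (check) character from the first 17 digits."""
    total = sum(int(digit) * weight for digit, weight in zip(first17, WEIGHTS))
    return CHECK_CHARS[total % 11]


def upgrade_id_15_to_18(id15: str) -> str:
    """Convert a legacy 15-digit ID number to the 18-character form.

    Assumption: the two-digit birth year belongs to the 1900s.
    """
    if not re.fullmatch(r"[0-9]{15}", id15):
        raise ValueError("expected exactly 15 digits")
    first17 = id15[:6] + "19" + id15[6:]
    return first17 + id_check_char(first17)


def is_valid_id18(id18: str) -> bool:
    """Check the length, character set, and check character of an 18-character ID."""
    if not re.fullmatch(r"[0-9]{17}[0-9X]", id18):
        return False
    return id_check_char(id18[:17]) == id18[17]


def is_valid_mobile(number: str) -> bool:
    """Check that a mobile number is exactly 11 digits (no country code)."""
    return re.fullmatch(r"[0-9]{11}", number) is not None


# Hypothetical rule configuration: field name -> rule function.
RULES = {"id_card": is_valid_id18, "mobile": is_valid_mobile}


def audit(rows):
    """Return (row_index, field, value) for every value that fails its rule."""
    return [
        (index, field, row[field])
        for index, row in enumerate(rows)
        for field, rule in RULES.items()
        if field in row and not rule(row[field])
    ]
```

Keeping the rules in a plain field-to-function mapping mirrors the configuration-driven approach the article describes: adding a date or email rule would mean registering one more entry, with no change to `audit`.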