乾貨 | 攜程平臺化常態化數據治理之路

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"一、背景"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據的重要性不言而喻。每個數據工程師每天會產生大量數據,但這些數據佔用的成本、帶來的價值、質量如何,以及在保證安全的前提下是否能夠更高效地使用,是每個公司在大數據發展到一定階段後都會遇到的問題。而攜程由於涉及的業務線多,數倉團隊多,數據安全高效地流通也是一個治理難點。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"二、治理思路"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"何爲數據治理?數據治理和衆多新興學科一樣,也有很多種定義。IBM認爲,數據治理是根據企業的數據管控政策,利用組織人員、流程和技術的相互協作,使企業能夠將“數據作爲資產”來管理和應用。根據伯森和杜波夫的定義,數據治理是一個關注於管理信息的質量、一致性、可用性、安全性和可得性的過程。這個過程與數據的擁有和管理的職責緊密相關。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通常認爲,數據治理是圍繞數據資產展開的一系列工作,以服務組織各層決策爲目標,是數據管理技術、過程、標準和政策的集合。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"綜上,數據治理離不開數據資產的沉澱,只有對數據有宏觀地把控、明細地探究,才能貼合數據特性進行治理。所以要進行集團層面的數據治理,就需要集團層面的數據資產平臺。攜程數據資產管理平臺(大禹)應運而生。攜程數據治理體系的目標是可以讓每一位數據生產者對各自擁有的數據進行常態化治理。而目前階段數據治理的核心目標就是提升數據價值、提高數據質量、促進數據流通。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"數據價值"},{"type":"text","text":":首先要治理的就是低價值甚至無價值的數據,例如長期無訪問、生命週期過長的數據。其次計算資源消耗較多的數據要進行歸因分析,針對性優化。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"數據質量"},{"type
":"text","text":":完善表的元數據信息包括責任人、數倉分層、主題、重要等級和敏感等級,配置數據質量監控,重點治理無人維護的數據。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"數據流通"},{"type":"text","text":":保障安全的前提下,提高權限審批效率,促進數據流轉。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"三、方案實施"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.1 元數據建設"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據治理的首要工作是搭建元數據數倉。元數據一般分爲四類:技術元數據、操作元數據、管理元數據和業務元數據,分別描述了數據的物理化、處理過程、管理過程及數據定義。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"技術元數據:存儲相關數據,包括表的元數據、字段元數據等。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"操作元數據:ETL相關數據,包括調度元數據、執行元數據、調度之間的血緣元數據等。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"管理元數據:包括管理者信息、監控日誌、管理日誌、管理成效等。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"業務元數據:包括數據標準、數據質量、數據指標、數據字典、數據代碼、數據安全等。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"現階段最爲豐富的數據是技術元數據和操作元數據, 
有了這些元數據就可以對計算\/存儲成本、元數據完整度、數據質量監控的覆蓋率\/通過率、臨時表、無人維護表等進行統計分析,進而推進相關專項治理。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.2 專項治理"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.2.1 成本治理"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大多數的數據工程師關注的是需求交付,對存儲、計算成本認識不足。目前集團大數據集羣計算成本和存儲成本比例是4:6,通過初步治理,可節約年成本數千萬元。在大禹(數據資產管理平臺)上可以直觀地看到每個員工擁有的Hive表數、日均存儲成本、日均計算成本和在完成數據治理後預計節省的年成本。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"3.2.1.1 計算成本"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"計算成本主要來自於CPU資源的消耗,根據每個調度任務對CPU核數和時間的佔用情況估算出成本。CPU的運行成本根據集羣的運營情況,計爲10元\/1M VCS(每個CPU核佔用的秒數)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"計算資源主要消耗在ETL調度和Adhoc查詢,由此我們對典型低效SQL進行了歸因分析。選擇部分BU作爲試點,針對單次消耗大於10元的高消耗調度進行優化。雖然集團內這些高消耗調度佔比1%,但是佔據了千萬量級的年計算成本。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/5a\/5a7695ab4165d0f70abedeec0f46cb46.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"表1:高消耗問題歸因及解決方案"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於Adhoc查詢而言,1%超過30元\/次,13%超過0.3元\/次。僅這14%的查詢就佔據了超一半的算力成本。除了邏輯、業務、分區層面的優化,技術參數優化也進行全面推廣。例如常見的幾類MR優化:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1)合併小文件:配置Map輸入合併、Map\/Reduce輸出合併。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)合理控制reducer數量"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"參數1:hive.exec.reducers.bytes.per.reducer(默認1G)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"參數2:hive.exec.reducers.max(默認爲999)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"reducer的計算公式爲:min(參數2,總輸入數據量\/參數1),也可以通過設置mapred.reduce.tasks直接控制reducer個數。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3)使用相同的連接鍵:當對 3 張或更多表進行 join 時,如果 on 條件使用相同字段,會合併爲一個 MapReduce Job。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4)SMB(sort merge bucket join):用於兩張大表進行join,但需要預先給每張大表基於join的字段建立桶。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"set 
hive.enforce.bucketing = true; --啓用桶表"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"set hive.optimize.bucketmapjoin = true; "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"set hive.optimize.bucketmapjoin.sortedmerge = true; "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"還有將數據傾斜的異常值打散或單獨處理、啓用壓縮、矢量化執行等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"3.2.1.2 存儲成本"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"存儲成本重點治理長期無訪問數據和用戶行爲數據(UBT),其次統一表存儲格式爲ORC,採用冷熱存儲、EC存儲,最後清理重複的大文件和業務不再需要的數據。通過這些治理手段,新增存儲需求縮減50%,佔總存儲的20%。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1)近30天無訪問表的成本佔據總存儲的20%,其中99%是臨時表。這些無訪問表由BU內部進行確認清理,一些日誌表或者集團的用戶行爲數據等需要長期保存的會加入白名單,沒有加入白名單的表會自動刪除。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)用戶行爲數據之前全鏈路保存了三年的歷史,通過逐漸縮短整個流程數據的生命週期達到縮減成本的目的。爲了做到治理過程中下游無感知,將原表改爲備份表再創建一個原表表名的視圖,逐漸縮短視圖可讀的時間範圍,待下游使用無異常之後可將備份表的生命週期縮短。這個優化節省了大量存儲成本。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3)
由於歷史遺留問題,之前表的數據格式未完全統一。RCFile佔比13.46%,Avro佔比1.99%,壓縮表佔比5.4%,非結構化數據佔比24.15%。所以將這些表轉化爲ORC格式,同時提升計算效率和存儲能力。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4)將不常用但需要保存的數據進行冷存儲。冷存儲的成本爲熱存儲的40%,使用EC技術可進一步壓縮到20%。但是冷存儲會影響查詢的性能,需要根據數據的使用場景綜合考慮。這個優化也節省了不小的存儲成本。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.2.2 質量規範"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先完善表的元數據信息,配置數據質量監控(DQC),其次重點治理無人維護的表和臨時表。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1)完善元數據信息:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/16\/16504d6918a83d739a7c3223ac8dc808.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"表的元數據信息"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"目前統計到的表和字段的元數據信息見上圖,從中選取了12個重要指標作爲完整性維度的統計,如下圖。歷史表的完整性也會按照設定的截止時間進行批量補充,同時新建正式表嚴格按照完整性的規範建立,否則無法創建。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/8d\/8dc26660f74e65951e324183c5749ec0.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"parag
raph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)配置DQC:原則上每個正式使用的表都需要配置DQC校驗,比如保證調度完成後的數據要大於一定數量,今天和昨天的數據波動要在一定的範圍,某些情境下需要主鍵唯一,或者自定義校驗規則。校驗規則分爲強規則、弱規則。強規則會熔斷下游,防止錯誤數據影響到下游的使用,對生產造成不可逆的影響。弱規則會觸發郵件警告。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3)無人維護表治理:因爲離職轉崗等原因,有些表的責任人缺失,給下游使用造成了一些困難。我們首先將無人維護表的明細開放給各BU,推動BU補全責任人信息。後期開發了資源轉移系統,離職或轉崗前會將責任人名下的資源進行一鍵轉移。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4)臨時表治理:臨時表數量佔總表數量的比例較高,需要進行治理。我們明確了臨時表的使用規範,只是作爲臨時使用,七天後自動刪除。可以用來進行探索性分析、排障,但是不可用於報表依賴、調度依賴、數據傳輸。調度任務中產生的中間表需要在任務結束後刪除。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.2.3 
數據流通"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據流通主要關注的是共享數據。有兩個來源:跨BU合作的項目,中臺提供的服務於全業務的數據比如:統一訂單數據等。重點治理的是跨BU合作的項目中由於組織架構的改變、項目組變動、數據源變更等原因產生的權限外溢。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"現階段的治理考慮兩個方面:既要增加BU之間的數據流通性、提高數據價值,又要及時治理權限外溢、敏感數據泄露。易用性與安全性之間的平衡存在一定挑戰。爲此我們上線級聯審批功能。對於設置級聯審批的表,其下游表的權限審批需要上游表owner共同參與,進一步加強了數據安全性。同時上線了基於密級的差異化審批流程。對於高密表從嚴把控,低密表則儘量簡化審批流程,方便數據快速流通。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"四、平臺化與常態化"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據資產管理平臺目前有三大功能模塊,分別是資產盤點、治理工具、健康分析。三個模塊的關係如下圖所示:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/6d\/6d1eca17c84366800816948231dac5cd.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中資產盤點主要是資產數據看板,包含集團、BU組織和個人的資產概覽,成本分析,質量和數據共享相關指標。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第二個模塊是數據治理。數據屬主可以在“我的工作臺”對有問題的數據進行便捷地治理。需要治理的數據都會以問題標籤的形式進行分類展示。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第三個模塊是數據健康分析。分爲資源利用、管理規範、成果交付、數據安全四個維度對數據的健康狀態進行統計。BU內部想要提質降本、提高開發效率,健康分會是一個最直
觀的指標。如果有BU疏於數據治理,那麼相應的健康分和BU之間的排名就會下降,以此來促進常態化治理。下圖爲數據健康分總覽。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/1e\/1e900593efe6fcec320ab76bec779962.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"資源利用"},{"type":"text","text":":考察近7天CPU離散係數、高消耗調度成本系數及近45天無訪問表成本佔比。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c2\/c2f3b6ee288abfe0e1fce1317da707ef.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/66\/66a03e115fc930191dda02bffa11934d.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b7\/b72a9de8eb39e487b4c121e5c5d4ec9f.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"資源利用健康分"}]},{"type":"paragraph","a
ttrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"管理規範"},{"type":"text","text":":考察表的元數據(數倉分層、責任人、重要等級、基線、敏感等級和主題等)完整性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c1\/c15e051eacca4ab5d862ef27ab4acf14.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"管理規範健康分"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"成果交付"},{"type":"text","text":":考察失敗調度佔比和查詢時長。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/bc\/bc80545926468142436336d7f2c5596b.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/8b\/8ba3bc99ac87f2452a66ee08672579c0.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"
center","origin":null},"content":[{"type":"text","text":"成果交付健康分"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"數據安全"},{"type":"text","text":":重點考察對敏感數據的使用是否存在風險。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"五、總結"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據治理是一個比較寬泛的概念,每個公司需要治理的數據不一樣,並且同一公司不同的發展階段治理的內容也不一樣。需要決策層根據數據體系發展的階段確定本階段治理的核心目標,以此來展開治理。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"現階段我們針對數據的成本、質量、流通三個維度的重點問題進行了治理。下階段將會有更高的治理要求。同時由於數據在不斷產生,治理也不是一勞永逸的,所以藉助平臺讓每個數據生產者可以便捷地進行常態化治理是必經之路。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文轉載自:攜程技術中心(ID:ctriptech)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文鏈接:"},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/B9T_hNBfm8nl85BYvmhftA","title":"xxx","type":null},"content":[{"type":"text","text":"乾貨 | 攜程平臺化常態化數據治理之路"}]}]}]}