Deep Dive | Describing and Driving Business with Data: Ctrip's Practice in Standardized Metric Management

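{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1. Background"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Since its founding in 2017, Ctrip Finance has inherited the internet industry's habit of “small steps, fast iteration” and has grown rapidly ever since. However, frequent business iteration and a decentralized data organization have made data governance considerably harder. The difficulties are most visible in how metrics are used:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The business iterates frequently, and data knowledge lags behind changes to the business model, so different data users end up with significantly different understandings of the business."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data teams are scattered and metrics are built redundantly. This wastes resources, and because metric definitions are not managed consistently, it also leads to frequent disagreements when metrics are used."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Standards exist only on paper. They cannot be systematically enforced and validated across the full data development lifecycle, which creates data quality risks."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To address these problems, we drew on OneData, a relatively mature methodology in the industry. OneData proposes three unifications for data construction: unified metric definitions, unified data warehouse modeling, and a unified development process. Building on this, and taking into account the organizational structure and business characteristics specific to Ctrip Finance, we designed and implemented a solution at two levels, standardizing metric definitions and systematizing process management, so that data can effectively support and drive rapid business growth."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/be\/be7a338a38c3bf881e578420f646217e.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2. Standardizing Metric Definitions: Thoughts on Defining Metrics"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":n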
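ull,"origin":null},"content":[{"type":"text","text":"How well data warehouse builders understand the business directly determines the quality of the warehouse, so unifying the data model with the business model has always been a central concern for us. Previously we organized and recorded business knowledge in documents or wikis. In a metric document, for example, we would pin down a metric's definition with a detailed written description. But this approach is limited by how well the document maintainer understands the business and the requirements, and by the inherent limits of informal natural-language description, so it easily leads to ambiguity."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Reviewing our day-to-day work, we found that resolving this ambiguity requires two things: consolidating metric definitions in a single place, and standardizing the definition process."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Consolidating definitions means providing a single place to enter and look up metrics, so that every metric's definition logic is visible to all users. This is easy to implement in a system. But how do we standardize the definition process itself? The key is making sure the work of defining a metric can be broken down into a repeatable workflow."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We reviewed how data analysts define metrics in their daily work and identified four steps that can be turned into a workflow, as shown in the figure below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e3\/e344f7ac2da6967410f6edbcb239c629.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We use the metric “daily number of registered users of the finance app on iOS” as an example to walk through the definition process:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first step is to identify the business line and scenario that the metric quantifies. In this example, the metric quantifies the “user acquisition” scenario of the “Finance” business unit."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"typ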
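e":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, map out the business flow (or events) involved in that scenario, identify the key business steps, and locate the business node that the metric quantifies, namely “registration”."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Then decide how to quantify that business event, that is, state explicitly that the “registration” event is measured as “number of users”."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, break the metric down by dimension. This example involves two dimensions, “daily” and “iOS”. “Daily” aggregates the metric along the date dimension; “iOS” restricts the registration source and acts as a drill-down filter."}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"3. Systematizing Process Management: System Design in Practice"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As described above, the metric definition process has now been broken down into standard steps. In data standards management, however, standards are usually fairly easy to write and hard to put into practice, mostly because nothing effectively enforces them. We therefore began developing a “metric standardization management system”, using tooling to enforce the predefined standards at every stage: metric definition, development, and application."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In this system, we abstracted the four steps of metric definition into four modules: “business units and data domains”, “business processes”, “dimension management”, and “metric design”. The sections below describe how each module is implemented."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d2\/d2b9824996557564ba0465495c8ece1b.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.1 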
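Business Units and Data Domains"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first step in building a metric system is to abstract the business scenario under analysis and determine where it belongs. A business unit is a line of business with its own distinct scenarios and processes. It is a coarse-grained division, and different business units overlap very little. Based on their own understanding and abstraction of the business, data users can create a dedicated metric system under their business unit and build its data independently. A data domain is an abstract grouping of one class of business activities within a business unit. It is a relatively high-level standard for classifying data, a collection formed by abstracting, refining, and combining the enterprise's business processes. It usually corresponds to the subject layer of the data warehouse and is oriented toward business analysis. Together, business units and data domains organize business processes effectively and make metrics quick to locate."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.2 Business Processes"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The business process is arguably the core module of the system. Each business action or event is abstracted as a business process, and each business process records the specific tables that link it to the data warehouse. This is how the business model and the data model are unified."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.2.1 Concept"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A business process represents an indivisible event within a data domain. It stands for a single business event that is concrete, well defined, and unambiguous within that domain. Business processes are generally used to mark the key events in a business activity."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Business processes fall into two types: transactional and snapshot. A transactional business process represents an action at a point in time, so it always has a time attribute. A “transaction”, for example, may involve business processes such as order placement, payment, cancellation, refund, and shipment, with corresponding time attributes such as order time, payment time, refund time, and shipment time. A snapshot business process represents a periodic measurement rather than an event, such as outstanding loan balance or inventory. Its only time attribute is the observation time."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.2.2 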
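Data Table Mapping"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A business process is generally linked to a fact table from the warehouse's dimensional model. A transactional business process corresponds to a transaction fact table or an accumulating snapshot fact table, while a snapshot business process generally corresponds to a periodic snapshot fact table. The relationship between business processes and fact tables is usually one-to-one, but one-to-many and many-to-one cases also occur. For example, multi-transaction fact tables and accumulating snapshot fact tables record the facts produced by several business processes in a single table. When building a business process, we therefore maintain not only its relationship to the fact table but also a “constraint condition” to handle such cases."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/64\/644288cb60650c6ee19161797acb48b8.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.2.3 Data Flow Diagram"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In day-to-day work, business staff draw business flowcharts to explain how the business logic flows, helping developers see the whole picture and sort out the details. To drive the business with data, data analysts need to know not only the business flow but also the data flow, that is, they must be able to translate the business flow into a data flow. For this reason, the system presents the “business processes” of a given business scenario as a business flowchart, helping data users see more clearly where each business process sits within the overall business domain and the scenario that triggers its associated fact table."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/08\/081fa4b6e08bdd25ad8b077b26b262ce.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.3 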
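Dimension Management"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A dimension is the context in which a business process (or event) occurs, and it reflects one class of attributes of the business. To describe a given “business process” fully, the fact table associated with it would therefore need to carry enough denormalized dimension attributes. In today's data warehouses, which are built mainly on big data frameworks such as Hadoop, fact tables mostly take the form of wide tables with denormalized dimension attributes, in order to reduce resource consumption at query time and improve computing efficiency. But heavy denormalization also makes the model less stable. Dimension attributes can change. Once an attribute has been denormalized into a fact table, it is recorded there together with the fact. If the attribute's value changes later, the fact table has already been generated and its contents are generally not modified, so the recorded attribute no longer matches the real one and the data becomes wrong. The benefits and drawbacks of denormalizing dimension attributes therefore have to be weighed together."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So how can we describe a business process fully without denormalizing attributes into the fact table? We extend a business process's attributes through two features built on top of it: “derived attributes” and “associated dimensions”."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Derived attribute: a SQL expression post-processes event attributes that already exist in the business process's fact table to produce a new attribute value."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Associated dimension: an existing event attribute of the business process is joined to a dimension table, bringing the dimension table's attributes into the business process."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b8\/b80923183a4dec8229b2259fe7e6f7fb.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Maintaining business processes and dimensions not only lets data describe the business accurately, it also gives the data a “soul”: the warehouse's subject tables become easier to understand and use, which greatly helps their adoption."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.4 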
Metric Design"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.4.1 Definition"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A metric is a measured fact produced as the business operates. The purpose of metric design is to make a metric's name, calculation method, and business interpretation consistent inside and outside the company, so that different departments do not interpret the same metric differently. Defining a metric means quantifying a business process along different dimensions. Abstracted further, this consists of two parts: quantification and dimensions."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Quantification: the definition of the metric's statistical rule."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dimensions: finer-grained analysis of the event from different angles."}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.4.2 Atomic Metrics"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For quantification, the system provides the concept of an “atomic metric”. An atomic metric is a measure defined on a single business process. It cannot be decomposed further in business terms and has a clear business meaning. Its core function is to define the metric's aggregation logic."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/7d\/7dcbbfac4457bf875cec536a20d26182.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.4.3 
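Derived Metrics"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"An atomic metric has no statistical granularity (that is, no entity dimension that carries the metric), so no specific data corresponds to it. Only after granularity and related information are added to an atomic metric can it be mapped to concrete data, and the result is a “derived metric”. If an atomic metric is just an abstract logical definition, a derived metric is the actual measured value once that definition is made concrete. A derived metric is composed of an atomic metric, dimension attributes, and qualifier attributes. The metric mentioned earlier, “daily number of registered users of the finance app on iOS”, is a derived metric. Its decomposition is shown in the figure below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/4c\/4ca570ddab3423171488053265bf470f.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because a derived metric inherits the atomic metric's meta-attributes, such as business scenario, associated event, statistical logic, and description, defining metrics as atomic and derived further narrows how a metric can be defined and improves consistency. In software engineering terms, an atomic metric can be thought of as a parent class, and a derived metric as a concrete class that inherits from the “atomic metric abstract class”. An atomic metric cannot be instantiated directly; objects can only be constructed through a derived metric, and every derived metric must depend on some atomic metric."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.4.4 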
Metric Logic Parsing"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Following the idea that design is development, we found that once the full definition flow of “business process”, “atomic metric”, and “derived metric” is complete, the metric's logic is expressed in the “common language” data analysts know best: SQL, which enforces a consistent metric definition."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The business process defines the metric's source tables (its fact table and associated dimension tables) and the star-schema join relationships."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The atomic metric defines the metric's aggregation logic (sum\/avg\/count)."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The derived metric defines the metric's grouping (group by) and filtering (where) logic."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/15\/15a0c2909864ec4334c10c11be11516b.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"4. Applications in Practice"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.1 
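Building Metric Projects"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once metric definitions were standardized, they were consolidated in one place, making metrics visible, searchable, and trustworthy. What was still missing was consolidation at the application level. Without it, the earlier definition work is more like a “castle in the air” and still cannot solve metric consistency in actual use."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the figure below shows, before applications were consolidated (before the refactoring), the metric system served only to register metrics and describe their definitions. Scheduled jobs were developed with the system's definitions merely as a reference, so definition and development were not unified (for example, the logic for building “metric B” was still duplicated in two jobs). The whole development process remained “requirement-oriented” and siloed. This not only violated the principle of data consistency but also prevented reuse, wasting resources and raising later maintenance costs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/de\/de4d837d55314ae4b62a9ae07eb2b75e.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We therefore adopted the principle that configuration is development and used the system to keep metrics and jobs unified. The development process changed from “requirement-oriented development” to “metric-oriented development”, decoupling metric jobs from one another: each metric corresponds to exactly one job, which improves development and operations efficiency. We also built a “metric project” module. It lets users group related metrics of the same granularity into a project that maps one-to-one to a requirement, which ensures consolidation at the application level."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.2 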
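Guiding Modeling"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As mentioned above, a metric's statistical logic comes from the fact tables and dimension tables in the warehouse, and these two kinds of tables are exactly what dimensional modeling produces. Dimensional modeling builds models starting from the needs of analysis and decision-making, and the resulting data model serves those needs. Which business areas to model, and along which dimensions, can therefore be worked out backwards from the metric definitions. Defining metrics in this system is, in effect, a dimensional modeling process. The figure below compares the two:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c2\/c24c3c375e446dc1a20468a312d3fd55.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The steps of metric definition, namely defining data domains, breaking down business processes, and refining dimensions, match the process of building the bus matrix in dimensional modeling. In addition, because the system displays the SQL behind each metric definition, data analysts can easily find the relevant fact and dimension tables and, through the data table mapping of business processes, understand the scenario in which a fact table is populated and what it means. This greatly helps promote the warehouse model, which in turn increases reuse of subject tables and reduces the resources wasted on repeated computation."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.3 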
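Business Mapping and Data Research"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Understanding the business” is the foundation of data work. A data warehouse builder is a technical expert and should also be “more than half” a business expert, because only by understanding the business can one get the most value out of data and quantify and drive the business accurately."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Yet many data analysts are “too close to the data and too far from the business” and often struggle to work out the business flow behind the data. To address this, the “business process” module of the metric system helps data analysts research the business and the data and discover how the two relate, so that they can truly “describe and drive the business with data”."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"5. Summary"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The hardest part of data warehouse work is not ETL development, which is the part that involves coding, but finding accurate, trustworthy data amid the clutter and using it to describe the business accurately, in other words translating business flows into data flows. Following the principle of “turn work into workflows, and workflows into tools”, we used a system to standardize this part of the work and consolidated business and data research, metric definition, and metric application under one standard. This brought the following benefits:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It lowers the cost of data research and presents the findings in a friendlier form, so that data staff understand the business better and business staff can easily understand the data."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The metric definition process closely matches the warehouse's “dimensional modeling” theory, unifying the business model and the data model."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It lowers the cost of using data, reduces metric redundancy, and improves the efficiency of data development and operations."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the Author"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Chao is a senior data analysis manager at Ctrip who focuses on data governance, data warehousing, and data analysis, and is committed to improving the efficiency and value of data use."}]},{"type":"paragraph","attrs":{"inden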
t":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article is reposted from: Ctrip Technology (ID: ctriptech)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original link: "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/_17P0bwKVmzvJ2nEWjSUBw","title":"xxx","type":null},"content":[{"type":"text","text":"Deep Dive | Describing and Driving Business with Data: Ctrip's Practice in Standardized Metric Management"}]}]}]}
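The article's section 3.4.4 describes how a metric's SQL falls out of three layers: the business process supplies the source table, the atomic metric supplies the aggregation, and the derived metric supplies the grouping and filters. The Python sketch below illustrates that composition. It is an illustration under stated assumptions, not the system's actual implementation: the class names, the table `dw.fact_user_register`, and the columns `user_id`, `register_date`, and `platform` are all hypothetical, and dimension-table joins are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BusinessProcess:
    """Source of a metric: a fact table plus an optional constraint condition."""
    name: str
    fact_table: str                   # hypothetical table name
    constraint: Optional[str] = None  # isolates one process in a shared fact table


@dataclass
class AtomicMetric:
    """Aggregation logic defined on one business process; has no granularity."""
    name: str
    process: BusinessProcess
    aggregate: str                    # e.g. "COUNT(DISTINCT user_id)"


@dataclass
class DerivedMetric:
    """Atomic metric plus grouping dimensions and qualifier filters."""
    name: str
    atomic: AtomicMetric
    group_by: List[str] = field(default_factory=list)
    filters: List[str] = field(default_factory=list)

    def to_sql(self) -> str:
        process = self.atomic.process
        select_cols = self.group_by + [f"{self.atomic.aggregate} AS {self.name}"]
        # The process-level constraint always applies before metric-level filters.
        conditions = ([process.constraint] if process.constraint else []) + self.filters
        sql = f"SELECT {', '.join(select_cols)}\nFROM {process.fact_table}"
        if conditions:
            sql += "\nWHERE " + " AND ".join(conditions)
        if self.group_by:
            sql += "\nGROUP BY " + ", ".join(self.group_by)
        return sql


# Hypothetical definition of the article's example metric.
register = BusinessProcess("register", "dw.fact_user_register")
user_cnt = AtomicMetric("register_user_cnt", register, "COUNT(DISTINCT user_id)")
daily_ios = DerivedMetric(
    "daily_ios_register_user_cnt",
    user_cnt,
    group_by=["register_date"],
    filters=["platform = 'IOS'"],
)
print(daily_ios.to_sql())
```

Because `DerivedMetric` can only be built from an `AtomicMetric`, two derived metrics that share an atomic metric necessarily share its aggregation and source table, which is the consistency property the article attributes to the atomic/derived split.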