Reading Notes on 《華爲數據之道》: Chapter 5, Building a Data Foundation for “Connection and Sharing”

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"1 A data foundation framework supporting the digital transformation of non-digital-native enterprises","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By building a data foundation, Huawei brings internal and external data together, reorganizes and connects it, and gives it clear definitions and a unified structure. While respecting data security and privacy, this makes data easier to obtain and ultimately breaks down data silos and data monopolies.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The data foundation mainly serves the following goals:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1) Manage structured and unstructured data in a unified way.","attrs":{}},{"type":"text","text":" Data is treated as an asset, so that its producer, business source, requesters, and consumers can all be traced.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2) Open up data supply channels.","attrs":{}},{"type":"text","text":" Provide rich raw materials, semi-finished products, and finished products for data consumption, meeting the needs of different scenarios such as self-service analysis and digital operations across the company.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"3) Keep company data complete, consistent, and shared.","attrs":{}},{"type":"text","text":" Monitor the state of data at every stage of the end-to-end data pipeline and, from the perspective of underlying storage, diagnose redundant, duplicated, and “zombie” data, reducing the cost of maintaining and using data.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4) Keep data secure and controllable.","attrs":{}},{"type":"text","text":" Based on the data security management policy, use data permission controls together with technical measures such as data service encapsulation so that confidential and private data is consumed lawfully and compliantly.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1.1 
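Overall architecture of the data foundation","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Huawei's data foundation consists of two layers, the data lake and data topic connection","attrs":{}},{"type":"text","text":". It gathers the company's internal and external data in one place ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"and reorganizes and connects that data","attrs":{}},{"type":"text","text":" to provide data services for business visualization, analysis, and decision-making.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The data lake is a logical collection of all kinds of raw data. Besides being “raw”, it is also “massive” and “diverse”. The data lake keeps data in its original format and, in principle, does not clean or process it. However, where data assets come from multiple heterogeneous sources, they need to be integrated, and the data assets must be registered.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data topic connection links the data in the data lake by business flow/event and by object/subject and applies rule-based computation to it, producing topic data oriented toward data consumption. This data is multi-angle, multi-level, and multi-granularity, and it supports business analysis, decision-making, and execution. Depending on the consumption need, there are five main ways of connecting data: multidimensional models, graph models, indicators, tags, and algorithm models.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1.2 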
Construction strategy for the data foundation","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A data foundation cannot be built overnight. Construction has to start from the business, adapt to circumstances, and continue over time.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Huawei's data foundation follows a construction strategy of “coordinated promotion, use driving construction, urgent needs first”.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Asset construction for the data foundation follows four principles:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) Data security","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) Dual drive of demand and planning","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3) Multi-scenario data supply","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4) Compliance with the information architecture","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"2 Data lake: achieving “logical convergence” of enterprise data","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2.1 
Three characteristics of Huawei's data lake","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1) Logically unified","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Huawei's data lake is not a single physical store. It consists of multiple physical stores divided by data type, business region, and so on, which are defined, aligned, and managed through a unified metadata semantic layer.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2) Diverse types","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The data lake stores data of every type, including structured data generated by internal IT systems, unstructured text data from business transactions and internal management, equipment operating data captured by sensors across the company's campuses, and external media data.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"3) Raw records","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Huawei's data lake is a convergence of raw data. It does not transform, clean, or process the data in any way, preserving its most original characteristics and leaving rich possibilities for later processing and consumption.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2.2 Six standards for lake entry","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1) Identify the Data Owner","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Data Owner is the owner of the process that produces the data and is responsible for the end-to-end management of the data under their charge. The Data Owner defines the data standards and confidentiality levels for data entering the lake, takes on data quality problems found during data consumption, and sets the roadmap for data management work to keep improving data quality.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2) Publish data standards","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"
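Data entering the lake must have a corresponding business data standard. A business data standard describes the meaning and business rules of “attribute-level” data that the whole company must follow. It represents the company's shared understanding of a piece of data, and once that understanding is made explicit and published, it must be followed across the enterprise as a data standard.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"3) Certify data sources","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Certifying data sources ensures that data enters the lake from the correct source. Certification should follow the company's data source management requirements. In general, a data source is the application system in which a piece of data is first formally published in the business, and it is certified by the professional data management organization.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4) Define data confidentiality levels","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Defining the confidentiality level is a prerequisite for lake entry. For data in the lake to be shared fully without causing information security problems, all data entering the lake must be classified. Classification should be done at the attribute level.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"5) Assess data quality","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data quality underpins the results of data consumption. Data does not need to be cleaned when it enters the lake, but its quality does need to be assessed.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"6) Register metadata","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Metadata registration means associating the business metadata of the incoming data with its technical metadata, including the mapping between logical entities and physical tables and the mapping between business attributes and table fields. Connecting business and technical metadata lets data consumers quickly search the lake using business semantics, lowers the barrier to consuming lake data, and allows more business analysts to understand and consume it.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2.3 Lake entry methods","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Lake entry follows Huawei's information architecture and uses the logical data entity as its unit of granularity. When a logical data entity first enters the lake, the completeness of its information should be considered. In principle, all attributes of a logical data entity should enter the lake at once, so that the same entity does not enter the lake repeatedly and add to the workload.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"There are two main lake entry methods: physical and virtual","attrs":{}},{"type":"text","text":". Depending on the consumption scenario and need, one logical entity may use different entry methods. The two methods work together to meet the needs of data connection and user data consumption, and the data steward is responsible for providing data through the appropriate method for each consumption scenario.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Physical lake entry copies the raw data into the data lake, by means such as batch processing, data replication and synchronization, and message and stream integration.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Virtual lake entry does not physically store the raw data in the lake. Instead, the data enters the lake through integration based on corresponding virtual tables. It offers strong real-time performance and is generally suited to small data volumes, since large batch operations may affect the source system.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2.4 Lake entry for structured data","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Structured data is data that is logically expressed and implemented as two-dimensional tables. It strictly follows format and length specifications and is mainly stored and managed in relational databases.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Two scenarios trigger lake entry for structured data: first, the enterprise data management organization proactively plans and coordinates it based on business needs; second, it responds to a request from a data consumer.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The lake entry process for structured data consists of four steps: analyzing and managing the entry requirement, checking entry conditions and assessing entry standards, performing the entry, and registering metadata.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2.5 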
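Lake entry for unstructured data","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Unstructured data includes unformatted text, documents in various formats, images, audio, video, and other diverse, heterogeneous files.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Unstructured data can enter the lake in four ways: basic feature metadata, parsed file content, file relationships, and the original files.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"3 Data topic connection: turning data into “information”","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On top of the data lake, Huawei builds a data connection layer. Based on different analysis scenarios, it connects cross-domain data through five types of connection, processing data from “raw materials” into “semi-finished” and “finished” products to support the data consumption needs of different scenarios.","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Multidimensional models","attrs":{}},{"type":"text","text":" provide business-oriented analysis from multiple perspectives and dimensions. Based on clear business relationships, they establish fact tables, dimension tables, and the joins between them to enable multidimensional data query and analysis.","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Graph models","attrs":{}},{"type":"text","text":" target association and impact analysis between data. By establishing relationships between data objects and between data instances, they help the business quickly locate related impacts.","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Tags","attrs":{}},{"type":"text","text":" delineate a specific business scope. Within the context of a business scenario, algorithms such as abstraction, induction, and inference are used to compute and generate symbols that represent the features of a target object. A tag is one angle from which users subjectively observe, understand, and describe an object.","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Indicators","attrs":{}},{"type":"text","text":" measure business results, efficiency, and quality. They are statistical values, computed from data according to explicit business rules, that measure the overall characteristics of a target and objectively reflect the state of a particular business activity.","attrs":{}}]}],"attrs":{}},{"type"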
:"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Algorithm models","attrs":{}},{"type":"text","text":" target intelligent analysis scenarios. They use mathematical modeling to abstract, simulate, and emulate the real world, providing advanced analytical methods that support business judgment and decisions.","attrs":{}}]}],"attrs":{}}],"attrs":{}}]}
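The six lake-entry standards in section 2.2 read naturally as a checklist applied to each logical data entity, so a small Python sketch may help make them concrete. Everything here is an assumption made for illustration: the `LogicalEntity` class, its field names, and the `unmet_standards` function are hypothetical and are not taken from the book or from any Huawei tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LogicalEntity:
    """Hypothetical description of a logical data entity about to enter the lake."""
    name: str
    attributes: List[str]
    owner: Optional[str] = None           # standard 1: Data Owner
    standard_published: bool = False      # standard 2: business data standard
    source_certified: bool = False        # standard 3: certified data source
    confidentiality: Dict[str, str] = field(default_factory=dict)  # standard 4: level per attribute
    quality_assessed: bool = False        # standard 5: assessed, not cleaned
    physical_table: Optional[str] = None  # standard 6: entity -> physical table
    field_mapping: Dict[str, str] = field(default_factory=dict)    # standard 6: attribute -> field


def unmet_standards(entity: LogicalEntity) -> List[str]:
    """Return the lake-entry standards that the entity does not yet meet."""
    unmet = []
    if not entity.owner:
        unmet.append("data owner")
    if not entity.standard_published:
        unmet.append("data standard")
    if not entity.source_certified:
        unmet.append("certified source")
    # Confidentiality is defined at the attribute level, so every attribute needs one.
    if any(a not in entity.confidentiality for a in entity.attributes):
        unmet.append("confidentiality level")
    if not entity.quality_assessed:
        unmet.append("quality assessment")
    # Registration links the entity to a table and every attribute to a field.
    mapped = all(a in entity.field_mapping for a in entity.attributes)
    if not entity.physical_table or not mapped:
        unmet.append("metadata registration")
    return unmet


order = LogicalEntity(name="purchase_order", attributes=["order_no", "amount"])
print(unmet_standards(order))  # a fresh entity still fails every standard
```

Checking confidentiality and field mappings per attribute mirrors the notes' point that classification and metadata registration both work at the attribute level, and that all attributes of an entity should enter the lake together.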