iQIYI's Practice in Building a Data Warehouse Platform and Services

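{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article first introduces iQIYI's overall business landscape, then the design of Data Warehouse 1.0 and the problems it ran into. It then explains how the architecture evolved into Data Warehouse 2.0 in response to those shortcomings, along with the problems 2.0 needs to solve and the goals it needs to reach."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/05\/yy\/05ddc1653349e6889ec3a9f1fe6aceyy.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure above shows iQIYI's product matrix. iQIYI started as a video business, and new businesses later grew up around it. With video at the center and core IP as the anchor, the company branched into short video, mini video, 奇巴布, iQIYI Reading, 叭噠, 泡泡, 奇秀 live streaming, iQIYI Knowledge, sports, e-commerce, and many other businesses, growing from a single apple tree into an apple orchard and forming a pan-entertainment ecosystem."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The product matrix covers many businesses. Each one produces its own data and has its own distinct product form. The data warehouse therefore has to support business-oriented data exploration and analysis within a single business scenario, and it also has to extract and refine common data from what multiple businesses share, so that horizontal, cross-business analysis is possible. The goal is to guide the businesses and empower them with data. The businesses also support and influence one another, which leads to frequent data exchange between them."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Data Warehouse 1.0"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/c9\/38\/c9e6bdbb601b1c4e16df7fe21598yy38.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure above shows the Data Warehouse 1.0 architecture, which is divided into five layers. At the bottom is the raw data layer; above it are the detail layer, the aggregation layer, and the application layer. On the right is the dimension layer, which serves the whole warehouse and manages conformed dimensions."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The raw data layer stores raw data from the various data-producing systems. It has three main sources. Pingback delivery: every business product is instrumented with tracking points under a unified specification, the collected data is reported, and automated processing then parses and stores it. Business databases: data produced by business back ends, such as membership orders and literature orders, is synchronized directly into the raw data layer through data integration. Third-party external data: data that mainly comes from sources outside the company."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The detail layer reconstructs business processes and stores data at the finest granularity. Raw data goes through ETL here according to different patterns, which covers data cleansing and part of the business-logic processing."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The aggregation layer stores non-detail data, typically lightly and heavily aggregated data produced by various computations. It is built mainly with dimensional modeling."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The application layer holds result data produced to meet specific business needs, so it is highly customized. It mainly serves data applications, external systems, and people who need specific data. It is the interface between the data warehouse and the outside world, connecting mainly to other systems such as business databases and reporting systems."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}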
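The layering described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the field names (`user_id`, `biz`, `duration_seconds`), the sample records, and the function names `to_detail`, `to_aggregate`, and `to_application` are all hypothetical and are not taken from iQIYI's actual warehouse.

```python
from collections import defaultdict

# Raw data layer: records kept exactly as delivered.
# Hypothetical Pingback-style events; every value arrives as a string.
RAW_EVENTS = [
    {"user_id": "u1", "biz": "video", "duration_seconds": "120"},
    {"user_id": "u2", "biz": "video", "duration_seconds": "30"},
    {"user_id": "u1", "biz": "reading", "duration_seconds": "45"},
    {"user_id": "", "biz": "video", "duration_seconds": "oops"},  # malformed
]

# Dimension layer: one conformed business dimension shared by all layers.
DIM_BIZ = {"video": "Video", "reading": "iQIYI Reading"}


def to_detail(raw_events):
    """Detail layer: cleanse raw records while keeping the finest granularity."""
    detail = []
    for event in raw_events:
        # Drop records with a missing user or a non-numeric duration.
        if not event["user_id"] or not event["duration_seconds"].isdigit():
            continue
        # Drop records whose business key is not in the conformed dimension.
        if event["biz"] not in DIM_BIZ:
            continue
        detail.append({
            "user_id": event["user_id"],
            "biz": event["biz"],
            "duration_seconds": int(event["duration_seconds"]),
        })
    return detail


def to_aggregate(detail):
    """Aggregation layer: total duration and distinct users per business."""
    seconds = defaultdict(int)
    users = defaultdict(set)
    for row in detail:
        seconds[row["biz"]] += row["duration_seconds"]
        users[row["biz"]].add(row["user_id"])
    return {
        biz: {"total_seconds": seconds[biz], "users": len(users[biz])}
        for biz in seconds
    }


def to_application(aggregate):
    """Application layer: a report-shaped result for one specific consumer."""
    return [
        {"business": DIM_BIZ[biz], **metrics}
        for biz, metrics in sorted(aggregate.items())
    ]


if __name__ == "__main__":
    for line in to_application(to_aggregate(to_detail(RAW_EVENTS))):
        print(line)
```

Each function reads only from the layer directly below it, and both the detail and application layers look up the same `DIM_BIZ` table. That shared lookup is a toy stand-in for the role the dimension layer plays in keeping dimensions consistent across the warehouse.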