Building iQIYI's Data Warehouse Platform and Services in Practice

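{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article first introduces iQIYI's overall business, the design of Data Warehouse 1.0, and the problems it ran into. It then explains how the architecture evolved into Data Warehouse 2.0 to address those shortcomings, what problems Data Warehouse 2.0 had to solve, and what goals it set out to achieve."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/05\/yy\/05ddc1653349e6889ec3a9f1fe6aceyy.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This figure shows iQIYI's product matrix. iQIYI began as a video business, and new businesses later grew up around it. Centered on video and its core IP, iQIYI has expanded into short video, mini video, Qibabu, iQIYI Reading, Bada, Paopao, Qixiu Live, iQIYI Knowledge, sports, e-commerce and many other businesses, growing from an apple tree into an apple orchard and building a pan-entertainment ecosystem."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The product matrix spans many businesses, each of which produces its own data and has its own distinctive product form. The data warehouse must support business-oriented data exploration and analysis within a specific business scenario. It must also work across business scenarios, extracting and refining common data from what the businesses share so that analysts can explore horizontally across businesses. Both serve the goal of guiding the businesses and empowering them with data. In addition, the businesses support and influence one another, so data is exchanged frequently between them."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Data Warehouse 1.0"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/c9\/38\/c9e6bdbb601b1c4e16df7fe21598yy38.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The architecture of Data Warehouse 1.0 is shown above. It is organized into five parts. At the bottom is the raw data layer, above which sit the detail layer, the aggregation layer and the application layer in turn. On the right is the dimension layer, which serves the entire warehouse and manages conformed dimensions."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The raw data layer stores raw data from the various data-producing systems, in three main categories. Pingback delivery: every business product is instrumented with tracking points under a unified specification, the collected data is reported, and the tracking data is then parsed and stored by automated processing. Business databases: mainly data generated by business back ends, such as membership orders and literature orders, which is synchronized directly from the business databases into the raw data layer through data integration. Third-party external data: data that comes mainly from sources outside the company."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The detail layer reconstructs business processes and stores data at the finest granularity. Raw data is processed through ETL according to different patterns, which covers data cleansing and part of the business logic."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The aggregation layer stores non-detail data, usually lightly or heavily aggregated data produced by various computations, and is built mainly with dimensional modeling."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The application layer holds result data produced to meet business needs. It is highly customized and is provided mainly to related data applications, external systems, and people who need specific data. It is the data warehouse's interface to the outside world, connecting mainly to other systems such as business databases and reporting systems."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}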
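To make the Data Warehouse 1.0 layering concrete, here is a minimal Python sketch of how records might flow from the raw data layer through the detail and aggregation layers to an application-layer result, with a shared dimension table standing in for the dimension layer. It is not iQIYI's implementation: the Pingback field names (`uid`, `biz`, `dur`, `plat`), the `DIM_PLATFORM` table, and every function name are hypothetical, and a real warehouse would run these steps as batch ETL jobs over tables rather than over Python lists.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Raw data layer (hypothetical sample): Pingback records kept exactly as reported.
RAW_PINGBACKS = [
    "uid=u1&biz=video&dur=120&plat=ios",
    "uid=u2&biz=video&dur=abc&plat=android",  # malformed duration
    "uid=u1&biz=reading&dur=300&plat=ios",
    "uid=u3&biz=video&dur=60&plat=ios",
]

# Dimension layer (hypothetical): a conformed platform dimension shared by all
# layers, so every business labels platforms the same way.
DIM_PLATFORM = {"ios": "iOS", "android": "Android"}


@dataclass(frozen=True)
class DetailRow:
    """Detail layer: one cleansed event at the finest granularity."""
    user_id: str
    business: str
    platform: str
    duration_s: int


def to_detail(raw: str) -> Optional[DetailRow]:
    """ETL step into the detail layer: parse, validate, and cleanse one raw record."""
    try:
        fields = dict(kv.split("=", 1) for kv in raw.split("&"))
        duration = int(fields["dur"])
        row = DetailRow(fields["uid"], fields["biz"], fields["plat"], duration)
    except (KeyError, ValueError):
        return None  # cleansing: drop records that cannot be parsed
    # Keep only platforms known to the conformed dimension.
    return row if row.platform in DIM_PLATFORM else None


def aggregate(rows: List[DetailRow]) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Aggregation layer: light aggregation by the (business, platform) dimensions."""
    users = defaultdict(set)
    duration = defaultdict(int)
    for r in rows:
        key = (r.business, DIM_PLATFORM[r.platform])
        users[key].add(r.user_id)
        duration[key] += r.duration_s
    return {k: {"uv": len(users[k]), "duration_s": duration[k]} for k in users}


def app_report(agg, business: str) -> List[Tuple[str, int, int]]:
    """Application layer: a result table customized for one business's report."""
    return sorted(
        (platform, m["uv"], m["duration_s"])
        for (biz, platform), m in agg.items()
        if biz == business
    )


if __name__ == "__main__":
    details = [d for d in map(to_detail, RAW_PINGBACKS) if d is not None]
    print(app_report(aggregate(details), "video"))  # [('iOS', 2, 180)]
```

In this sketch each layer reads only from the layer below it plus the shared dimension, so a new report could reuse the same detail and aggregated data instead of re-parsing raw Pingback records.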