数仓缓慢变化维深层讲解

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"前言","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 维度缓慢变化为","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"SCD","attrs":{}}],"attrs":{}},{"type":"text","text":"(Slowly Changing Dimensions)","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"一些维度表的数据不是静态的,而是会随着时间而缓慢地变化","attrs":{}}],"attrs":{}},{"type":"text","text":"(这里的缓慢是相对事实表而言,事实表数据变化的速度比维度表快,如果还不知道什么是事实表和维度表请看→","attrs":{}},{"type":"link","attrs":{"href":"https://blog.csdn.net/qq_43791724/article/details/112120563","title":""},"content":[{"type":"text","text":"数仓模型设计详细讲解","attrs":{}}]},{"type":"text","text":")把处理维度表数据历史变化的问题,称为缓慢变化维问题,简称SCD问题。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ae/ae2c93a87b2ff682a48cfd1931c5b905.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"举例说明","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 例如:用根据用户维度,统计不同出生年份的消费金额占比。(80后、90后、00后)。而期间,用户可能去","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"修改","attrs":{}}],"attrs":{}},{"type":"text","text":"用户数据,例如:","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"将出生日期改成了 1992年。此时,用户维度表就发生了变化","attrs":{}}],"attrs":{}},{"type":"text","text":"。当然这个变化相对事实表的变换要慢。但这个用户维度表的变化,就是缓慢变化维。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7b/7be69223aacbc6c643ed7896d62edda1.png","alt":"在这里插入图片描述","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这个用户的数据不是一直不变,而是有可能发生变化。例如:用户修改了出生日期、或者用户修改了住址。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":" 一、SCD问题的几种解决方案","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以下为解决缓慢变化维问题的几种办法:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"保留原始值","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"改写属性值","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"增加维度新行","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"增加维度新列","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"添加历史表","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.1 保留原始值","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"某一个属性值绝不会变化。事实表始终按照该原始值进行分组。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"例如:出生日期的数据,始终按照用户第一次填写的数据为准","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.2 改变属性值","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"对其相应需要","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"重写维度行中的旧值,以当前值替换。因此其始终反映最近的情况","attrs":{}}],"attrs":{}},{"type":"text","text":"。","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"当一个维度值的数据源发生变化,并且不需要在维度表中保留变化历史时,通常用新数据来覆盖旧数据。这样的处理使属性所反映的中是最新的赋值。","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用户维度表","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"修改前:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/64/6403936876eb77dd591636f585f0eb3e.png","alt":"在这里插入图片描述","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"修改后:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2d/2dd7425ab32008f9ffeb708e98e61fad.png","alt":"在这里插入图片描述","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这种方法有个前提,用户不关心这个数据的变化","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这样处理,易于实现,但是","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"没有保留历史数据,无法分析历史变化信息","attrs":{}}],"attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.3 增加维度新行","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"数据仓库系统的目标之一是正确地表示历史","attrs":{}}],"attrs":{}},{"type":"text","text":"。典型代表就是","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"拉链表","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"保留历史的数据,并插入新的数据","attrs":{}}],"attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用户维度表","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"修改前:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/bd/bdd8fc6d5716b2f7070df7418bb7b01e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"修改后:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/be/bed548c046fb61803b51546071f52bd2.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.4 增加维度新列","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"用不同的字段来保存不同的值,就是在表中增加一个字段,这个字段用来保存变化后的当前值","attrs":{}}],"attrs":{}},{"type":"text","text":",而原来的值则被称为变化前的值。总的来说,这种方法通过添加字段来保存变化后的痕迹。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用户维度表","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"修改前:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/bd/bdd8fc6d5716b2f7070df7418bb7b01e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"修改后","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a0/a0be88fad7d2d78109402cb14ec07447.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.5 使用历史表","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"另外建一个表来保存历史记录","attrs":{}}],"attrs":{}},{"type":"text","text":",这种方式就是将历史数据与当前数据完全分开来,在维度中只保存当前最新的数据。 ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"用户维度表","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b6/b6d8dd01bd2614e78ad83ab2f0b4de21.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"用户维度历史表","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/99/99d5e58ceb3297fc100441c6bfa05972.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这种方式的优点是可以同时分析当前及前一次变化的属性值,缺点是只保留了最后一次变化信息。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"思考","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 我在这里给大家提个场景题,比如我们在淘宝上够买了一件商品,从下单-支付-发货-配送-确认收货这个几步流。需求:统计出在发送到配置过程中转了几次?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"小结:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 今天给大家分享了SCD解决方案,但是其实以上的解决方案不是很好,其实数仓有一个非常好的解决缓慢变化维","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"拉链表","attrs":{}}],"attrs":{}},{"type":"text","text":"既保留了历史数据又不会造成数据冗余,拉链表我们下期讲。信自己,努力和汗水总会能得到回报的。我是大数据老哥,我们下期见~~~","attrs":{}}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"获取Flink面试题,Spark面试题,程序员必备软件,hive面试题,Hadoop面试题,Docker面试题,\n简历模板等资源请去GitHub自行下载 https://github.com/lhh2002/Framework-Of-BigData","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章