數倉緩慢變化維深層講解

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"前言","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 維度緩慢變化爲","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"SCD","attrs":{}}],"attrs":{}},{"type":"text","text":"(Slowly Changing Dimensions)","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"一些維度表的數據不是靜態的,而是會隨着時間而緩慢地變化","attrs":{}}],"attrs":{}},{"type":"text","text":"(這裏的緩慢是相對事實表而言,事實表數據變化的速度比維度錶快,如果還不知道什麼是事實表和維度表請看→","attrs":{}},{"type":"link","attrs":{"href":"https://blog.csdn.net/qq_43791724/article/details/112120563","title":""},"content":[{"type":"text","text":"數倉模型設計詳細講解","attrs":{}}]},{"type":"text","text":")把處理維度表數據歷史變化的問題,稱爲緩慢變化維問題,簡稱SCD問題。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ae/ae2c93a87b2ff682a48cfd1931c5b905.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"舉例說明","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 例如:用根據用戶維度,統計不同出生年份的消費金額佔比。(80後、90後、00後)。而期間,用戶可能去","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"修改","attrs":{}}],"attrs":{}},{"type":"text","text":"用戶數據,例如:","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"將出生日期改成了 1992年。此時,用戶維度表就發生了變化","attrs":{}}],"attrs":{}},{"type":"text","text":"。當然這個變化相對事實表的變換要慢。但這個用戶維度表的變化,就是緩慢變化維。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7b/7be69223aacbc6c643ed7896d62edda1.png","alt":"在這裏插入圖片描述","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這個用戶的數據不是一直不變,而是有可能發生變化。例如:用戶修改了出生日期、或者用戶修改了住址。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":" 一、SCD問題的幾種解決方案","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以下爲解決緩慢變化維問題的幾種辦法:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"保留原始值","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"改寫屬性值","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"增加維度新行","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"增加維度新列","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"添加歷史表","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.1 保留原始值","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"某一個屬性值絕不會變化。事實表始終按照該原始值進行分組。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"例如:出生日期的數據,始終按照用戶第一次填寫的數據爲準","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.2 改變屬性值","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對其相應需要","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"重寫維度行中的舊值,以當前值替換。因此其始終反映最近的情況","attrs":{}}],"attrs":{}},{"type":"text","text":"。","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當一個維度值的數據源發生變化,並且不需要在維度表中保留變化歷史時,通常用新數據來覆蓋舊數據。這樣的處理使屬性所反映的中是最新的賦值。","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用戶維度表","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"修改前:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/64/6403936876eb77dd591636f585f0eb3e.png","alt":"在這裏插入圖片描述","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"修改後:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2d/2dd7425ab32008f9ffeb708e98e61fad.png","alt":"在這裏插入圖片描述","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這種方法有個前提,用戶不關心這個數據的變化","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這樣處理,易於實現,但是","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"沒有保留歷史數據,無法分析歷史變化信息","attrs":{}}],"attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.3 增加維度新行","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"數據倉庫系統的目標之一是正確地表示歷史","attrs":{}}],"attrs":{}},{"type":"text","text":"。典型代表就是","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"拉鍊表","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"保留歷史的數據,並插入新的數據","attrs":{}}],"attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用戶維度表","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"修改前:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/bd/bdd8fc6d5716b2f7070df7418bb7b01e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"修改後:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/be/bed548c046fb61803b51546071f52bd2.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.4 增加維度新列","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"用不同的字段來保存不同的值,就是在表中增加一個字段,這個字段用來保存變化後的當前值","attrs":{}}],"attrs":{}},{"type":"text","text":",而原來的值則被稱爲變化前的值。總的來說,這種方法通過添加字段來保存變化後的痕跡。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用戶維度表","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"修改前:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/bd/bdd8fc6d5716b2f7070df7418bb7b01e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"修改後","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a0/a0be88fad7d2d78109402cb14ec07447.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.5 使用歷史表","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"另外建一個表來保存歷史記錄","attrs":{}}],"attrs":{}},{"type":"text","text":",這種方式就是將歷史數據與當前數據完全分開來,在維度中只保存當前最新的數據。 ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"用戶維度表","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b6/b6d8dd01bd2614e78ad83ab2f0b4de21.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"用戶維度歷史表","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/99/99d5e58ceb3297fc100441c6bfa05972.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這種方式的優點是可以同時分析當前及前一次變化的屬性值,缺點是隻保留了最後一次變化信息。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"思考","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 我在這裏給大家提個場景題,比如我們在淘寶上夠買了一件商品,從下單-支付-發貨-配送-確認收貨這個幾步流。需求:統計出在發送到配置過程中轉了幾次?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"小結:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 今天給大家分享了SCD解決方案,但是其實以上的解決方案不是很好,其實數倉有一個非常好的解決緩慢變化維","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"拉鍊表","attrs":{}}],"attrs":{}},{"type":"text","text":"既保留了歷史數據又不會造成數據冗餘,拉鍊表我們下期講。信自己,努力和汗水總會能得到回報的。我是大數據老哥,我們下期見~~~","attrs":{}}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"獲取Flink面試題,Spark面試題,程序員必備軟件,hive面試題,Hadoop面試題,Docker面試題,\n簡歷模板等資源請去GitHub自行下載 https://github.com/lhh2002/Framework-Of-BigData","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章