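A zipper table records history: every change in an entity's state, from its first recorded state up to its current one.
A customer's whole life can be captured in the history table with just a handful of rows, one per state change, which avoids the massive storage cost of recording the customer's state for every single day: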
NAME (name)  START-DATE (start date)  END-DT (end date)  STAT (status)
client       19000101                 19070901           H (at home)
client       19070901                 19130901           A (primary school)
client       19130901                 19160901           B (junior high school)
client       19160901                 19190901           C (senior high school)
client       19190901                 19230901           D (university)
client       19230901                 19601231           E (company)
client       19601231                 29991231           H (retired, at home)
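Each row's interval excludes its end date. On 19070901, for example, client is already in state A, not H. So apart from the last row, whose state has not changed yet and whose END-DT is therefore the maximum date 29991231, a row's state no longer holds on that row's END-DT. In other words, the interval includes its start and excludes its end: [START-DATE, END-DT).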
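The start-inclusive, end-exclusive rule can be sketched as a small lookup. This is only an illustration: the function name `status_on` and the tuple layout are assumptions, and the rows are copied from the table above.

```python
# Rows from the table above as (start, end, status); the end date is exclusive.
HISTORY = [
    (19000101, 19070901, "H"),
    (19070901, 19130901, "A"),
    (19130901, 19160901, "B"),
    (19160901, 19190901, "C"),
    (19190901, 19230901, "D"),
    (19230901, 19601231, "E"),
    (19601231, 29991231, "H"),
]


def status_on(date, history=HISTORY):
    """Return the status in effect on `date` (an integer YYYYMMDD), or None."""
    for start, end, status in history:
        if start <= date < end:  # start is included, end is excluded
            return status
    return None


print(status_on(19070831))  # H
print(status_on(19070901))  # A
```

Because the end date is excluded, consecutive rows can share a boundary date without any date matching two rows.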
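Algorithm (the zipper-table algorithm is really one case of the slowly changing dimension pattern met earlier; implementing it as a stored procedure takes a little more work):
1. Load today's full snapshot into the ND (NewDay) table;
2. Extract yesterday's full snapshot from the history table into the OD (OldDay) table;
3. (ND - OD) is the set of rows that are new or changed today, i.e. today's delta, denoted W_I;
4. (OD - ND) is the set of rows whose state ends today and whose chain must be closed, denoted W_U;
5. Insert all of W_I into the history table; these are new rows, with start_date set to today and end_date set to the max value;
6. Update the history rows that correspond to W_U: start_date stays unchanged and end_date becomes today, which closes the chain.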
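The six steps can be sketched with plain Python sets. This is a minimal sketch, not a definitive implementation: the function name `zipper_step`, the dict-based row layout and the sentinel `MAX_DATE` are assumptions made for illustration, and rows are compared on (id, status) only.

```python
MAX_DATE = "299901"  # assumed sentinel end date for rows that are still open


def zipper_step(history, new_day, today):
    """Apply one day's full snapshot to a zipper-table history.

    history: list of dicts with keys id, status, start, end
    new_day: dict mapping id -> status (today's full snapshot, ND)
    today:   period label used as start/end date for changed rows
    """
    # Step 2: OD is yesterday's snapshot, i.e. the rows still open in history.
    old_day = {r["id"]: r["status"] for r in history if r["end"] == MAX_DATE}

    nd = set(new_day.items())
    od = set(old_day.items())
    w_i = nd - od  # step 3: new or changed today
    w_u = od - nd  # step 4: state ended today

    result = []
    for r in history:
        if r["end"] == MAX_DATE and (r["id"], r["status"]) in w_u:
            r = {**r, "end": today}  # step 6: close the chain
        result.append(r)
    for key, status in sorted(w_i):  # step 5: insert the new open rows
        result.append({"id": key, "status": status, "start": today, "end": MAX_DATE})
    return result


hist = [
    {"id": 1, "status": "a", "start": "200712", "end": MAX_DATE},
    {"id": 2, "status": "b", "start": "200712", "end": MAX_DATE},
]
for row in zipper_step(hist, {1: "a", 2: "c", 3: "d"}, "200801"):
    print(row)
```

In this sketch the closing happens before the insert, the reverse of steps 5 and 6. The result is the same, because the rows to close are matched on (id, status) and the newly inserted rows never carry the old status.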
A concrete example follows.

OD (on the first day this equals HIS)
User ID  Status  Start   End
1        1       200712  299901
2        2       200712  299901
3        3       200712  299901
4        4       200712  299901
5        5       200712  299901

ND
User ID  Status  Start   End
1        2       200801  299901
2        2       200801  299901
3        4       200801  299901
4        4       200801  299901
5        6       200801  299901

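W_I = ND - OD (all of W_I is inserted into the history table; these are the new rows). The comparison uses only user ID and status; if the date columns were compared as well, users 2 and 4 would wrongly count as changed.
User ID  Status  Start   End
1        2       200801  299901
3        4       200801  299901
5        6       200801  299901
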
W_U = OD - ND (these history rows are updated: start_date stays unchanged and end_date becomes the current date)
User ID  Status  Start   End
1        1       200712  299901
3        3       200712  299901
5        5       200712  299901

INSERT step: insert W_I into HIS
User ID  Status  Start   End
1        1       200712  299901
2        2       200712  299901
3        3       200712  299901
4        4       200712  299901
5        5       200712  299901
1        2       200801  299901
3        4       200801  299901
5        6       200801  299901

UPDATE step: close the W_U rows in HIS
User ID  Status  Start   End
1        1       200712  200801
2        2       200712  299901
3        3       200712  200801
4        4       200712  299901
5        5       200712  200801
1        2       200801  299901
3        4       200801  299901
5        6       200801  299901
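As a check on the final HIS table, the full snapshot for any period can be rebuilt from it, which is what step 2 of the algorithm relies on. The sketch below assumes a hypothetical helper `snapshot` and copies the rows from the last table above.

```python
# Final HIS table from the example above: (user_id, status, start, end)
HIS = [
    (1, 1, "200712", "200801"),
    (2, 2, "200712", "299901"),
    (3, 3, "200712", "200801"),
    (4, 4, "200712", "299901"),
    (5, 5, "200712", "200801"),
    (1, 2, "200801", "299901"),
    (3, 4, "200801", "299901"),
    (5, 6, "200801", "299901"),
]


def snapshot(his, period):
    """Return {user_id: status} as it stood in `period` (a 'YYYYMM' string)."""
    # Fixed-width digit strings compare in chronological order.
    live = [(uid, status) for uid, status, start, end in his
            if start <= period < end]
    return dict(sorted(live))


print(snapshot(HIS, "200712"))  # {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
print(snapshot(HIS, "200801"))  # {1: 2, 2: 2, 3: 4, 4: 4, 5: 6}
```

The first result matches the statuses in OD and the second matches those in ND, so no daily snapshot is lost even though only eight rows are stored.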
Reposted from: http://blog.csdn.NET/paopaomm/article/details/7491400
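Another example, this time in SQL

A practical example (Teradata). Here xx is a placeholder for a column list (the key plus the tracked attributes in step 3, the key only in step 4), and 'max_date' and 'current_date' stand for the actual date values.

1. Define two temporary tables: one for today's full snapshot, the other for the rows that need to be inserted or updated.

CREATE VOLATILE TABLE VT_xxxx_NEW AS xxxx WITH NO DATA ON COMMIT PRESERVE ROWS;
CREATE VOLATILE SET TABLE VT_xxxx_CHG, NO LOG AS xxxx WITH NO DATA ON COMMIT PRESERVE ROWS;

2. Load today's full snapshot (this is ND).

INSERT INTO VT_xxxx_NEW (xx) SELECT xx, cur_date, max_date FROM xxxx_sorce;

3. Extract the new or changed rows from VT_xxxx_NEW into VT_xxxx_CHG.

INSERT INTO VT_xxxx_CHG (xx)
SELECT xx FROM VT_xxxx_NEW
WHERE (xx) NOT IN (SELECT xx FROM xxxx_HIS WHERE end_date = 'max_date');

4. Close the superseded rows in the history table: change their end_date from the max value to the current date.

UPDATE A1 FROM xxxx_HIS A1, VT_xxxx_CHG A2
SET End_Date = 'current_date'
WHERE A1.xx = A2.xx AND A1.End_Date = 'max_date';

5. Insert the new or changed rows into the target table.

INSERT INTO xxxx_HIS SELECT * FROM VT_xxxx_CHG;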
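Steps 3 to 5 can be tried end to end with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: SQLite stands in for Teradata, the table names `his`, `nd` and `chg` and the function `apply_snapshot` are hypothetical, and the `(xx) NOT IN (...)` test is written as `NOT EXISTS` on (id, status).

```python
import sqlite3

MAX_DATE = "299901"  # open-ended end date, as in the example data


def apply_snapshot(conn, today):
    """Steps 3 to 5: find changed rows, close old rows, insert new rows."""
    cur = conn.cursor()
    cur.execute("DELETE FROM chg")
    # Step 3: ND rows whose (id, status) has no open row in the history
    cur.execute(
        """INSERT INTO chg (id, status, start_date, end_date)
           SELECT n.id, n.status, ?, ? FROM nd AS n
           WHERE NOT EXISTS (SELECT 1 FROM his AS h
                             WHERE h.id = n.id AND h.status = n.status
                               AND h.end_date = ?)""",
        (today, MAX_DATE, MAX_DATE),
    )
    # Step 4: close the open history row of every changed id
    cur.execute(
        """UPDATE his SET end_date = ?
           WHERE end_date = ? AND id IN (SELECT id FROM chg)""",
        (today, MAX_DATE),
    )
    # Step 5: append the changed rows as the new open rows
    cur.execute("INSERT INTO his SELECT * FROM chg")
    conn.commit()


conn = sqlite3.connect(":memory:")
for name in ("his", "nd", "chg"):
    conn.execute(
        f"CREATE TABLE {name} "
        "(id INTEGER, status TEXT, start_date TEXT, end_date TEXT)"
    )
conn.executemany(
    "INSERT INTO his VALUES (?, ?, '200712', ?)",
    [(i, str(i), MAX_DATE) for i in range(1, 6)],
)
conn.executemany(
    "INSERT INTO nd (id, status) VALUES (?, ?)",
    [(1, "2"), (2, "2"), (3, "4"), (4, "4"), (5, "6")],
)
apply_snapshot(conn, "200801")
for row in conn.execute("SELECT * FROM his ORDER BY start_date, id"):
    print(row)
```

The update in step 4 runs before the insert in step 5 on purpose: if the new rows were inserted first, the filter on the open end date would close them as well.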
An example I wrote myself (Oracle syntax):

/* Zipper table: a history table that records every change to an entity, from its initial state to its current state. */

DROP TABLE old_tb_his;
DROP TABLE new_tb;

CREATE TABLE old_tb_his (
  id         NUMBER(10,0),
  status     VARCHAR2(20),
  start_date VARCHAR2(20),
  end_date   VARCHAR2(20)
);

INSERT INTO old_tb_his VALUES (1, '1', '200712', '299901');
INSERT INTO old_tb_his VALUES (2, '2', '200712', '299901');
INSERT INTO old_tb_his VALUES (3, '3', '200712', '299901');
INSERT INTO old_tb_his VALUES (4, '4', '200712', '299901');
INSERT INTO old_tb_his VALUES (5, '5', '200712', '299901');
COMMIT;

SELECT * FROM old_tb_his;

-- new_tb has the same structure as old_tb_his and starts empty (WHERE 2 = 1 is never true)
CREATE TABLE new_tb AS SELECT * FROM old_tb_his WHERE 2 = 1;

INSERT INTO new_tb VALUES (1, '2', '200801', '299901');
INSERT INTO new_tb VALUES (2, '2', '200801', '299901');
INSERT INTO new_tb VALUES (3, '4', '200801', '299901');
INSERT INTO new_tb VALUES (4, '4', '200801', '299901');
INSERT INTO new_tb VALUES (5, '6', '200801', '299901');
COMMIT;

SELECT * FROM new_tb;

/*
First attempt, which does not work. MERGE updates the rows that match and inserts the rows that do not,
whereas the zipper algorithm has to update (close) the rows that do NOT match and also insert the non-matching rows.

MERGE INTO old_tb_his
USING new_tb
   ON (old_tb_his.id = new_tb.id AND old_tb_his.status = new_tb.status)
 WHEN MATCHED THEN UPDATE SET old_tb_his.end_date = new_tb.start_date
 WHEN NOT MATCHED THEN INSERT VALUES (new_tb.id, new_tb.status, new_tb.start_date, new_tb.end_date);
*/

/*
Second attempt, not valid either: MERGE has no WHEN NOT MATCHED THEN UPDATE clause.

MERGE INTO old_tb_his
USING new_tb
   ON (old_tb_his.id = new_tb.id AND old_tb_his.status = new_tb.status)
 WHEN NOT MATCHED THEN UPDATE SET old_tb_his.end_date = new_tb.start_date;
*/

SELECT * FROM old_tb_his;
SELECT * FROM new_tb;

-- Two global temporary tables with the same columns as old_tb_his.
-- Note the naming: old_tb_his_temp receives the new or changed rows taken from new_tb (W_I),
-- and new_tb_temp receives the superseded rows taken from old_tb_his (W_U).
CREATE GLOBAL TEMPORARY TABLE old_tb_his_temp (
  id         NUMBER(10,0),
  status     VARCHAR2(20),
  start_date VARCHAR2(20),
  end_date   VARCHAR2(20)
)
ON COMMIT DELETE ROWS;

CREATE GLOBAL TEMPORARY TABLE new_tb_temp (
  id         NUMBER(10,0),
  status     VARCHAR2(20),
  start_date VARCHAR2(20),
  end_date   VARCHAR2(20)
)
ON COMMIT DELETE ROWS;

-- W_I: rows of new_tb whose (id, status, end_date) has no identical row in old_tb_his
INSERT INTO old_tb_his_temp
SELECT *
  FROM new_tb t
 WHERE t.id NOT IN (SELECT id
                      FROM (SELECT t1.id, t1.status, t1.end_date
                              FROM old_tb_his t1
                            INTERSECT
                            SELECT t2.id, t2.status, t2.end_date
                              FROM new_tb t2));

-- W_U: rows of old_tb_his whose (id, status, end_date) has no identical row in new_tb
INSERT INTO new_tb_temp
SELECT *
  FROM old_tb_his t
 WHERE t.id NOT IN (SELECT id
                      FROM (SELECT t1.id, t1.status, t1.end_date
                              FROM old_tb_his t1
                            INTERSECT
                            SELECT t2.id, t2.status, t2.end_date
                              FROM new_tb t2));

SELECT * FROM old_tb_his_temp;
SELECT * FROM new_tb_temp;

-- No COMMIT here: both temporary tables are ON COMMIT DELETE ROWS,
-- so committing now would empty them before they are used.

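-- Step 5: insert the new or changed rows (W_I) into the history table
INSERT INTO old_tb_his
SELECT * FROM old_tb_his_temp;

SELECT * FROM old_tb_his;

-- Step 6: close the superseded rows. Only rows that are still open (end_date = '299901')
-- are touched, so rows closed on earlier days keep their end_date.
MERGE INTO old_tb_his
USING old_tb_his_temp
   ON (old_tb_his.id = old_tb_his_temp.id AND old_tb_his.status <> old_tb_his_temp.status)
 WHEN MATCHED THEN UPDATE SET old_tb_his.end_date = old_tb_his_temp.start_date
      WHERE old_tb_his.end_date = '299901';

-- Alternative to the MERGE above (run one or the other): the same close-out as a correlated UPDATE
UPDATE old_tb_his
   SET old_tb_his.end_date = (SELECT old_tb_his_temp.start_date
                                FROM old_tb_his_temp
                               WHERE old_tb_his_temp.id = old_tb_his.id)
 WHERE old_tb_his.end_date = '299901'
   AND EXISTS (SELECT 1
                 FROM old_tb_his_temp
                WHERE old_tb_his.id = old_tb_his_temp.id
                  AND old_tb_his.status <> old_tb_his_temp.status);

COMMIT;

SELECT * FROM old_tb_his;
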
-- General form of the close-out step, where t1 holds today's new or changed rows and t2 is the history table
MERGE INTO t2
USING t1
   ON (t2.id = t1.id AND t2.status <> t1.status)
 WHEN MATCHED THEN UPDATE SET t2.end_date = t1.start_date
      WHERE t2.end_date = '299901';

-- or, equivalently:
UPDATE t2
   SET t2.end_date = (SELECT t1.start_date FROM t1 WHERE t1.id = t2.id)
 WHERE t2.end_date = '299901'
   AND EXISTS (SELECT 1
                 FROM t1
                WHERE t2.id = t1.id
                  AND t2.status <> t1.status);