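Hive Zipper Tables
Pros and cons of zipper tables
They save storage space, especially when the data volume is large, because an unchanged record is stored once with a validity range instead of being copied into every daily snapshot;
For transactional data such as orders, looking up historical operations is very convenient: for example, viewing the snapshot at a point in time or over a time range, checking the status of one order at some past moment, or counting how many times a user's records were updated during a past period, and so on
However, they only suit scenarios where historical data is updated infrequently. With 10 million orders a day that are updated more than 1,000 times a day, a zipper table is a poor fit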
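The lookups described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the rows are a small hypothetical sample shaped like a zipper table, `snapshot_as_of` and `versions_of` are made-up helper names, and the English status strings stand in for the real status values.

```python
# Hypothetical zipper rows: (orderid, status, dw_start_date, dw_end_date).
# ISO-8601 date strings compare correctly as plain strings.
OPEN_END = "9999-12-31"  # marks the currently valid version of a record

history = [
    (1, "created", "2015-08-18", "2015-08-21"),
    (1, "paid", "2015-08-22", OPEN_END),
    (8, "created", "2015-08-21", "2015-08-21"),
    (8, "paid", "2015-08-22", OPEN_END),
    (9, "created", "2015-08-22", OPEN_END),
]


def snapshot_as_of(rows, day):
    """Return the rows that were valid on `day` (both bounds inclusive)."""
    return [r for r in rows if r[2] <= day <= r[3]]


def versions_of(rows, orderid):
    """Return every version of one order, oldest first."""
    return sorted((r for r in rows if r[0] == orderid), key=lambda r: r[2])


# Orders 1 and 8 were both still in the "created" state on 2015-08-21.
print(snapshot_as_of(history, "2015-08-21"))
# Order 1 has two versions: created, then paid.
print(versions_of(history, 1))
```

In Hive the same idea would be a filter of the form `dw_start_date <= day AND dw_end_date >= day` on the latest partition of the zipper table.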
Demo: trying it out and implementing it
Build a zipper table for order analysis
Prepare the order transaction table
CREATE TABLE `orders`(
`orderid` int,
`createtime` string,
`modifiedtime` string,
`status` string)
PARTITIONED BY (
`p_event_date` string);
Prepare the order increment table (this table can be retained longer, which makes it easier to recompute or roll back data)
CREATE TABLE `default.t_ods_orders_inc`(
`orderid` int,
`createtime` string,
`modifiedtime` string,
`status` string)
PARTITIONED BY (
`p_event_date` string);
Prepare the order history table (this table only needs to be retained for a limited time)
CREATE TABLE `default.t_dw_orders_his`(
`orderid` int,
`createtime` string,
`modifiedtime` string,
`status` string,
`dw_start_date` string,
`dw_end_date` string)
PARTITIONED BY (
`p_event_date` string);
Prepare the order data (status values: 創建 = created, 支付 = paid, 完成 = completed)
>show partitions `default`.`orders`;
p_event_date=2015-08-21
p_event_date=2015-08-22
p_event_date=2015-08-23
>SELECT * from `default`.`orders`;
1 2015-08-18 2015-08-18 創建 2015-08-21
2 2015-08-18 2015-08-18 創建 2015-08-21
3 2015-08-19 2015-08-21 支付 2015-08-21
4 2015-08-19 2015-08-21 完成 2015-08-21
5 2015-08-19 2015-08-20 支付 2015-08-21
6 2015-08-20 2015-08-20 創建 2015-08-21
7 2015-08-20 2015-08-21 支付 2015-08-21
8 2015-08-21 2015-08-21 創建 2015-08-21
1 2015-08-18 2015-08-22 支付 2015-08-22
2 2015-08-18 2015-08-22 完成 2015-08-22
3 2015-08-19 2015-08-21 支付 2015-08-22
4 2015-08-19 2015-08-21 完成 2015-08-22
5 2015-08-19 2015-08-20 支付 2015-08-22
6 2015-08-20 2015-08-22 支付 2015-08-22
7 2015-08-20 2015-08-21 支付 2015-08-22
8 2015-08-21 2015-08-22 支付 2015-08-22
9 2015-08-22 2015-08-22 創建 2015-08-22
10 2015-08-22 2015-08-22 支付 2015-08-22
1 2015-08-18 2015-08-23 完成 2015-08-23
2 2015-08-18 2015-08-22 完成 2015-08-23
3 2015-08-19 2015-08-23 完成 2015-08-23
4 2015-08-19 2015-08-21 完成 2015-08-23
5 2015-08-19 2015-08-23 完成 2015-08-23
6 2015-08-20 2015-08-22 支付 2015-08-23
7 2015-08-20 2015-08-21 支付 2015-08-23
8 2015-08-21 2015-08-23 完成 2015-08-23
9 2015-08-22 2015-08-22 創建 2015-08-23
10 2015-08-22 2015-08-22 支付 2015-08-23
11 2015-08-23 2015-08-23 創建 2015-08-23
12 2015-08-23 2015-08-23 創建 2015-08-23
13 2015-08-23 2015-08-23 支付 2015-08-23
Compute the daily order increment
>INSERT overwrite TABLE t_ods_orders_inc PARTITION (p_event_date="${date}")
SELECT orderid,createtime,modifiedtime,status
FROM orders
WHERE (createtime = "${date}" OR modifiedtime = "${date}") and p_event_date="${date}";
>show partitions `default`.`t_ods_orders_inc`;
p_event_date=2015-08-20
p_event_date=2015-08-21
p_event_date=2015-08-22
>SELECT * from `default`.`t_ods_orders_inc`;
orderid createtime modifiedtime status p_event_date
1 2015-08-18 2015-08-18 創建 2015-08-20
2 2015-08-18 2015-08-18 創建 2015-08-20
3 2015-08-19 2015-08-21 支付 2015-08-20
4 2015-08-19 2015-08-21 完成 2015-08-20
5 2015-08-19 2015-08-20 支付 2015-08-20
6 2015-08-20 2015-08-20 創建 2015-08-20
7 2015-08-20 2015-08-21 支付 2015-08-20
3 2015-08-19 2015-08-21 支付 2015-08-21
4 2015-08-19 2015-08-21 完成 2015-08-21
7 2015-08-20 2015-08-21 支付 2015-08-21
8 2015-08-21 2015-08-21 創建 2015-08-21
1 2015-08-18 2015-08-22 支付 2015-08-22
6 2015-08-20 2015-08-22 支付 2015-08-22
8 2015-08-21 2015-08-22 支付 2015-08-22
2 2015-08-18 2015-08-22 完成 2015-08-22
10 2015-08-22 2015-08-22 支付 2015-08-22
9 2015-08-22 2015-08-22 創建 2015-08-22
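Compute the zipper table for the latest day from the order data and the daily increment
(The order data starts on the 21st, so everything before the 21st is treated as the increment for the 20th.)
Prepare the order increment for the 20th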
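INSERT overwrite TABLE t_ods_orders_inc PARTITION (p_event_date = "2015-08-20")
SELECT orderid,createtime,modifiedtime,status
FROM orders
-- read only the 2015-08-21 snapshot; without this filter every snapshot partition contributes duplicate rows
WHERE p_event_date = "2015-08-21" AND createtime <= "2015-08-20";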
Compute the zipper table for the 20th from the order increment of the 20th
INSERT overwrite TABLE t_dw_orders_his partition(p_event_date="2015-08-20")
SELECT orderid,createtime,modifiedtime,status,
createtime AS dw_start_date,
"9999-12-31" AS dw_end_date
FROM t_ods_orders_inc
WHERE p_event_date = "2015-08-20";
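Compute the zipper table for the 21st from the increment of the 21st. The first half of the union all decides whether a row of yesterday's zipper table needs its validity period closed (9999-12-31 marks the currently valid row). The second half of the union all appends today's increment and marks every one of those rows as currently valid.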
INSERT overwrite TABLE t_dw_orders_his partition(p_event_date="2015-08-21")
SELECT
orderid,
createtime,
modifiedtime,
status,
dw_start_date,
dw_end_date
FROM (
select
a.orderid,
a.createtime,
a.modifiedtime,
a.status,
a.dw_start_date,
case
when b.orderid is not null and a.dw_end_date>"2015-08-21" then "2015-08-20"
else a.dw_end_date
end as dw_end_date
from ( select * from t_dw_orders_his where p_event_date="2015-08-20")a
left join (select * from t_ods_orders_inc where p_event_date="2015-08-21")b
on (a.orderid=b.orderid)
union all
select
orderid,
createtime,
modifiedtime,
status,
modifiedtime AS dw_start_date,
'9999-12-31' AS dw_end_date
from t_ods_orders_inc where p_event_date="2015-08-21"
)c
order by orderid,dw_start_date;
>SELECT * from t_dw_orders_his WHERE p_event_date="2015-08-21";
1 2015-08-18 2015-08-18 創建 2015-08-18 9999-12-31 2015-08-21
2 2015-08-18 2015-08-18 創建 2015-08-18 9999-12-31 2015-08-21
3 2015-08-19 2015-08-21 支付 2015-08-19 2015-08-20 2015-08-21
3 2015-08-19 2015-08-21 支付 2015-08-21 9999-12-31 2015-08-21
4 2015-08-19 2015-08-21 完成 2015-08-19 2015-08-20 2015-08-21
4 2015-08-19 2015-08-21 完成 2015-08-21 9999-12-31 2015-08-21
5 2015-08-19 2015-08-20 支付 2015-08-19 9999-12-31 2015-08-21
6 2015-08-20 2015-08-20 創建 2015-08-20 9999-12-31 2015-08-21
7 2015-08-20 2015-08-21 支付 2015-08-20 2015-08-20 2015-08-21
7 2015-08-20 2015-08-21 支付 2015-08-21 9999-12-31 2015-08-21
8 2015-08-21 2015-08-21 創建 2015-08-21 9999-12-31 2015-08-21
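The merge step can be reproduced in memory to check the result above. This is a sketch under assumptions, not the Hive job: `initial_load` and `zipper_merge` are hypothetical helper names, the English status strings stand in for the status values in the table, and a set lookup replaces the left join.

```python
from datetime import date, timedelta

OPEN_END = "9999-12-31"  # marks the currently valid version

# Increment rows from the text: (orderid, createtime, modifiedtime, status)
inc_0820 = [
    (1, "2015-08-18", "2015-08-18", "created"),
    (2, "2015-08-18", "2015-08-18", "created"),
    (3, "2015-08-19", "2015-08-21", "paid"),
    (4, "2015-08-19", "2015-08-21", "completed"),
    (5, "2015-08-19", "2015-08-20", "paid"),
    (6, "2015-08-20", "2015-08-20", "created"),
    (7, "2015-08-20", "2015-08-21", "paid"),
]
inc_0821 = [
    (3, "2015-08-19", "2015-08-21", "paid"),
    (4, "2015-08-19", "2015-08-21", "completed"),
    (7, "2015-08-20", "2015-08-21", "paid"),
    (8, "2015-08-21", "2015-08-21", "created"),
]


def initial_load(inc):
    """First day: every row starts at its createtime and is open-ended."""
    return [r + (r[1], OPEN_END) for r in inc]


def zipper_merge(prev_history, inc, day):
    """Close yesterday's open rows for changed orders, then append today's rows."""
    prev_day = (date.fromisoformat(day) - timedelta(days=1)).isoformat()
    changed = {r[0] for r in inc}  # plays the role of the left join on orderid
    merged = []
    for oid, ctime, mtime, status, start, end in prev_history:
        if oid in changed and end > day:
            end = prev_day  # the old version stops being valid yesterday
        merged.append((oid, ctime, mtime, status, start, end))
    # Today's rows become the current versions, starting at modifiedtime.
    merged.extend(r + (r[2], OPEN_END) for r in inc)
    return sorted(merged, key=lambda r: (r[0], r[4]))


his_0820 = initial_load(inc_0820)
his_0821 = zipper_merge(his_0820, inc_0821, "2015-08-21")
for row in his_0821:
    print(row)
```

The run yields 11 rows with the same validity ranges as the query output for `p_event_date="2015-08-21"`. Passing the previous day's result and the next increment to `zipper_merge` again advances the table by one more day.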
In the same way, compute the zipper table for the 22nd from the order increment of the 22nd
INSERT overwrite TABLE t_dw_orders_his partition(p_event_date="2015-08-22")
SELECT
orderid,
createtime,
modifiedtime,
status,
dw_start_date,
dw_end_date
FROM (
select
a.orderid,
a.createtime,
a.modifiedtime,
a.status,
a.dw_start_date,
case
when b.orderid is not null and a.dw_end_date>"2015-08-22" then "2015-08-21"
else a.dw_end_date
end as dw_end_date
from ( select * from t_dw_orders_his where p_event_date="2015-08-21")a
left join (select * from t_ods_orders_inc where p_event_date="2015-08-22")b
on (a.orderid=b.orderid)
union all
select
orderid,
createtime,
modifiedtime,
status,
modifiedtime AS dw_start_date,
'9999-12-31' AS dw_end_date
from t_ods_orders_inc where p_event_date="2015-08-22"
)c
order by orderid,dw_start_date;
>SELECT * from t_dw_orders_his WHERE p_event_date="2015-08-22";
1 2015-08-18 2015-08-18 創建 2015-08-18 2015-08-21 2015-08-22
1 2015-08-18 2015-08-22 支付 2015-08-22 9999-12-31 2015-08-22
2 2015-08-18 2015-08-18 創建 2015-08-18 2015-08-21 2015-08-22
2 2015-08-18 2015-08-22 完成 2015-08-22 9999-12-31 2015-08-22
3 2015-08-19 2015-08-21 支付 2015-08-19 2015-08-20 2015-08-22
3 2015-08-19 2015-08-21 支付 2015-08-21 9999-12-31 2015-08-22
4 2015-08-19 2015-08-21 完成 2015-08-19 2015-08-20 2015-08-22
4 2015-08-19 2015-08-21 完成 2015-08-21 9999-12-31 2015-08-22
5 2015-08-19 2015-08-20 支付 2015-08-19 9999-12-31 2015-08-22
6 2015-08-20 2015-08-20 創建 2015-08-20 2015-08-21 2015-08-22
6 2015-08-20 2015-08-22 支付 2015-08-22 9999-12-31 2015-08-22
7 2015-08-20 2015-08-21 支付 2015-08-20 2015-08-20 2015-08-22
7 2015-08-20 2015-08-21 支付 2015-08-21 9999-12-31 2015-08-22
8 2015-08-21 2015-08-21 創建 2015-08-21 2015-08-21 2015-08-22
8 2015-08-21 2015-08-22 支付 2015-08-22 9999-12-31 2015-08-22
9 2015-08-22 2015-08-22 創建 2015-08-22 9999-12-31 2015-08-22
10 2015-08-22 2015-08-22 支付 2015-08-22 9999-12-31 2015-08-22
Likewise, compute the zipper table for the 23rd from the order increment of the 23rd, and so on. That is the whole idea.
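Before chaining further days, one plausible sanity check is that every order has exactly one open row and that its validity ranges do not overlap. The sketch below runs that check on the 2015-08-22 result shown above. The helper names are hypothetical and only the columns needed for the check are kept.

```python
from collections import Counter

OPEN_END = "9999-12-31"

# (orderid, dw_start_date, dw_end_date) from the 2015-08-22 zipper result.
his_0822 = [
    (1, "2015-08-18", "2015-08-21"), (1, "2015-08-22", OPEN_END),
    (2, "2015-08-18", "2015-08-21"), (2, "2015-08-22", OPEN_END),
    (3, "2015-08-19", "2015-08-20"), (3, "2015-08-21", OPEN_END),
    (4, "2015-08-19", "2015-08-20"), (4, "2015-08-21", OPEN_END),
    (5, "2015-08-19", OPEN_END),
    (6, "2015-08-20", "2015-08-21"), (6, "2015-08-22", OPEN_END),
    (7, "2015-08-20", "2015-08-20"), (7, "2015-08-21", OPEN_END),
    (8, "2015-08-21", "2015-08-21"), (8, "2015-08-22", OPEN_END),
    (9, "2015-08-22", OPEN_END),
    (10, "2015-08-22", OPEN_END),
]


def orders_without_single_open_row(rows):
    """Return the order ids that do not have exactly one open-ended row."""
    open_counts = Counter(oid for oid, _start, end in rows if end == OPEN_END)
    all_orders = {oid for oid, _start, _end in rows}
    return sorted(oid for oid in all_orders if open_counts[oid] != 1)


def overlapping_versions(rows):
    """Return (orderid, start) pairs that begin before the previous version ended."""
    bad = []
    last_end = {}
    for oid, start, end in sorted(rows):  # sorted by orderid, then start date
        if oid in last_end and start <= last_end[oid]:
            bad.append((oid, start))
        last_end[oid] = end
    return bad


print(orders_without_single_open_row(his_0822))  # []
print(overlapping_versions(his_0822))            # []
```

Both lists come back empty for this data, so each of the 10 orders has one current version and a gap-free, non-overlapping history.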