Hive zipper tables (拉鍊表) and a simple implementation

Hive zipper tables

Pros and cons of zipper tables

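Saves space, especially with large data volumes: an unchanged record is stored once instead of being copied into every daily full snapshot;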

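For transactional data such as orders, the history of changes is easy to inspect: for example, a historical snapshot at a given point in time or over a given period, the status of a given order at some point in the past, or how many times a given user made updates during some period.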

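However, it only suits scenarios where historical data is updated relatively infrequently, for example 10 million orders a day with only a little over 1,000 updates a day.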
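The point-in-time lookups listed above all come down to one filter: a version is valid on day D when dw_start_date <= D <= dw_end_date, where dw_end_date is the day before the order's next change (the convention used in the results later in this post). A minimal Python sketch, assuming a few rows of the 2015-08-22 zipper table have been loaded into memory; `Version`, `status_on` and `snapshot` are hypothetical names for illustration. Dates are 'yyyy-MM-dd' strings, so string comparison matches chronological order, just as the HiveQL below compares date strings directly.

```python
from typing import NamedTuple, Optional


class Version(NamedTuple):
    orderid: int
    status: str
    dw_start_date: str
    dw_end_date: str


# Rows for orders 1 and 8, copied from the 2015-08-22 zipper table.
HISTORY = [
    Version(1, "創建", "2015-08-18", "2015-08-21"),
    Version(1, "支付", "2015-08-22", "9999-12-31"),
    Version(8, "創建", "2015-08-21", "2015-08-21"),
    Version(8, "支付", "2015-08-22", "9999-12-31"),
]


def status_on(history: list[Version], orderid: int, day: str) -> Optional[str]:
    """Status of `orderid` as of `day`, or None if the order did not exist yet."""
    for v in history:
        if v.orderid == orderid and v.dw_start_date <= day <= v.dw_end_date:
            return v.status
    return None


def snapshot(history: list[Version], day: str) -> dict[int, str]:
    """Rebuild the full orderid -> status snapshot valid on `day`."""
    return {v.orderid: v.status for v in history
            if v.dw_start_date <= day <= v.dw_end_date}


print(status_on(HISTORY, 1, "2015-08-20"))  # 創建
print(status_on(HISTORY, 1, "2015-08-22"))  # 支付
print(snapshot(HISTORY, "2015-08-21"))      # {1: '創建', 8: '創建'}
```

In HiveQL the same idea is a filter such as `WHERE dw_start_date <= '2015-08-21' AND dw_end_date >= '2015-08-21'` on the latest partition of `t_dw_orders_his`.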

Demo and implementation

Build a zipper table for order analysis.

Prepare the order transaction table (each daily partition holds a full snapshot of all orders)

CREATE TABLE `orders`(
  `orderid` int, 
  `createtime` string, 
  `modifiedtime` string, 
  `status` string)
PARTITIONED BY ( 
  `p_event_date` string);

Prepare the order increment table (keep this table for a longer period so that data can be recomputed or rolled back)

CREATE TABLE `default`.`t_ods_orders_inc`(
  `orderid` int, 
  `createtime` string, 
  `modifiedtime` string, 
  `status` string)
PARTITIONED BY ( 
  `p_event_date` string);

Prepare the order history table, i.e. the zipper table (it only needs to be kept for a limited period)

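CREATE TABLE `default`.`t_dw_orders_his`(
  `orderid` int, 
  `createtime` string, 
  `modifiedtime` string, 
  `status` string, 
  `dw_start_date` string COMMENT 'first day this version is valid', 
  `dw_end_date` string COMMENT 'last day this version is valid; 9999-12-31 means still current')
PARTITIONED BY ( 
  `p_event_date` string);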

Prepare the order data (status values: 創建 = created, 支付 = paid, 完成 = completed)

>show partitions `default`.`orders`;
p_event_date=2015-08-21
p_event_date=2015-08-22
p_event_date=2015-08-23
>SELECT * from  `default`.`orders`;
1	2015-08-18	2015-08-18	創建	2015-08-21
2	2015-08-18	2015-08-18	創建	2015-08-21
3	2015-08-19	2015-08-21	支付	2015-08-21
4	2015-08-19	2015-08-21	完成	2015-08-21
5	2015-08-19	2015-08-20	支付	2015-08-21
6	2015-08-20	2015-08-20	創建	2015-08-21
7	2015-08-20	2015-08-21	支付	2015-08-21
8	2015-08-21	2015-08-21	創建	2015-08-21
1	2015-08-18	2015-08-22	支付	2015-08-22
2	2015-08-18	2015-08-22	完成	2015-08-22
3	2015-08-19	2015-08-21	支付	2015-08-22
4	2015-08-19	2015-08-21	完成	2015-08-22
5	2015-08-19	2015-08-20	支付	2015-08-22
6	2015-08-20	2015-08-22	支付	2015-08-22
7	2015-08-20	2015-08-21	支付	2015-08-22
8	2015-08-21	2015-08-22	支付	2015-08-22
9	2015-08-22	2015-08-22	創建	2015-08-22
10	2015-08-22	2015-08-22	支付	2015-08-22
1	2015-08-18	2015-08-23	完成	2015-08-23
2	2015-08-18	2015-08-22	完成	2015-08-23
3	2015-08-19	2015-08-23	完成	2015-08-23
4	2015-08-19	2015-08-21	完成	2015-08-23
5	2015-08-19	2015-08-23	完成	2015-08-23
6	2015-08-20	2015-08-22	支付	2015-08-23
7	2015-08-20	2015-08-21	支付	2015-08-23
8	2015-08-21	2015-08-23	完成	2015-08-23
9	2015-08-22	2015-08-22	創建	2015-08-23
10	2015-08-22	2015-08-22	支付	2015-08-23
11	2015-08-23	2015-08-23	創建	2015-08-23
12	2015-08-23	2015-08-23	創建	2015-08-23
13	2015-08-23	2015-08-23	支付	2015-08-23

Compute the daily order increment (rows created or modified on the given day)

>INSERT overwrite TABLE t_ods_orders_inc PARTITION (p_event_date="${date}")
SELECT orderid,createtime,modifiedtime,status
FROM orders
WHERE (createtime = "${date}" OR modifiedtime = "${date}") and p_event_date="${date}";

>show partitions  `default`.`t_ods_orders_inc`;
p_event_date=2015-08-20
p_event_date=2015-08-21
p_event_date=2015-08-22
>SELECT * from   `default`.`t_ods_orders_inc`;
orderid	createtime	modifiedtime	status	p_event_date
1	2015-08-18	2015-08-18	創建	2015-08-20
2	2015-08-18	2015-08-18	創建	2015-08-20
3	2015-08-19	2015-08-21	支付	2015-08-20
4	2015-08-19	2015-08-21	完成	2015-08-20
5	2015-08-19	2015-08-20	支付	2015-08-20
6	2015-08-20	2015-08-20	創建	2015-08-20
7	2015-08-20	2015-08-21	支付	2015-08-20
3	2015-08-19	2015-08-21	支付	2015-08-21
4	2015-08-19	2015-08-21	完成	2015-08-21
7	2015-08-20	2015-08-21	支付	2015-08-21
8	2015-08-21	2015-08-21	創建	2015-08-21
1	2015-08-18	2015-08-22	支付	2015-08-22
6	2015-08-20	2015-08-22	支付	2015-08-22
8	2015-08-21	2015-08-22	支付	2015-08-22
2	2015-08-18	2015-08-22	完成	2015-08-22
10	2015-08-22	2015-08-22	支付	2015-08-22
9	2015-08-22	2015-08-22	創建	2015-08-22
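The incremental query keeps only the rows whose createtime or modifiedtime equals the target day. A minimal Python sketch of the same filter, applied to the p_event_date=2015-08-22 snapshot of `orders` listed earlier; `Order` and `extract_increment` are hypothetical names for illustration.

```python
from typing import NamedTuple


class Order(NamedTuple):
    orderid: int
    createtime: str
    modifiedtime: str
    status: str


def extract_increment(snapshot: list[Order], day: str) -> list[Order]:
    """Rows of one day's full snapshot that were created or modified that day."""
    return [o for o in snapshot if o.createtime == day or o.modifiedtime == day]


# The p_event_date=2015-08-22 partition of `orders`.
SNAPSHOT_0822 = [
    Order(1, "2015-08-18", "2015-08-22", "支付"),
    Order(2, "2015-08-18", "2015-08-22", "完成"),
    Order(3, "2015-08-19", "2015-08-21", "支付"),
    Order(4, "2015-08-19", "2015-08-21", "完成"),
    Order(5, "2015-08-19", "2015-08-20", "支付"),
    Order(6, "2015-08-20", "2015-08-22", "支付"),
    Order(7, "2015-08-20", "2015-08-21", "支付"),
    Order(8, "2015-08-21", "2015-08-22", "支付"),
    Order(9, "2015-08-22", "2015-08-22", "創建"),
    Order(10, "2015-08-22", "2015-08-22", "支付"),
]

inc = extract_increment(SNAPSHOT_0822, "2015-08-22")
print(sorted(o.orderid for o in inc))  # [1, 2, 6, 8, 9, 10]
```

This matches the 2015-08-22 partition of `t_ods_orders_inc` above. The filter relies on modifiedtime being updated on every change; a change that leaves modifiedtime untouched would never reach the increment.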

Compute the latest day's zipper table from the order data and the daily increments

(The order data starts on the 21st; all data from before the 21st is treated as the 20th's increment.)

Prepare the order increment for the 20th

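INSERT overwrite TABLE t_ods_orders_inc PARTITION (p_event_date = "2015-08-20")
SELECT orderid,createtime,modifiedtime,status
FROM orders
-- read only the first snapshot (2015-08-21); without this filter every partition would be scanned and each order duplicated
WHERE createtime <= "2015-08-20" AND p_event_date = "2015-08-21";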

Compute the 20th's zipper table from the 20th's increment (the initial load)

INSERT overwrite TABLE t_dw_orders_his partition(p_event_date="2015-08-20")
SELECT orderid,createtime,modifiedtime,status,
createtime AS dw_start_date,
"9999-12-31" AS dw_end_date
FROM t_ods_orders_inc
WHERE p_event_date = "2015-08-20";
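The initial load can be mirrored in Python: take the first snapshot (2015-08-21), treat every order created on or before the 20th as the 20th's increment, and open one version per order starting at its createtime. A sketch under those assumptions; `Order`, `HisRow` and `initial_load` are hypothetical names, and the tables are plain in-memory lists rather than Hive partitions.

```python
from typing import NamedTuple

OPEN_END = "9999-12-31"  # dw_end_date of the currently valid version


class Order(NamedTuple):
    orderid: int
    createtime: str
    modifiedtime: str
    status: str


class HisRow(NamedTuple):
    orderid: int
    createtime: str
    modifiedtime: str
    status: str
    dw_start_date: str
    dw_end_date: str


# The p_event_date=2015-08-21 partition of `orders` (the first snapshot).
SNAPSHOT_0821 = [
    Order(1, "2015-08-18", "2015-08-18", "創建"),
    Order(2, "2015-08-18", "2015-08-18", "創建"),
    Order(3, "2015-08-19", "2015-08-21", "支付"),
    Order(4, "2015-08-19", "2015-08-21", "完成"),
    Order(5, "2015-08-19", "2015-08-20", "支付"),
    Order(6, "2015-08-20", "2015-08-20", "創建"),
    Order(7, "2015-08-20", "2015-08-21", "支付"),
    Order(8, "2015-08-21", "2015-08-21", "創建"),
]


def initial_load(snapshot: list[Order], cutoff: str) -> list[HisRow]:
    """Treat orders created on or before `cutoff` as that day's increment and
    open one version per order, starting at its createtime."""
    inc = [o for o in snapshot if o.createtime <= cutoff]
    return [HisRow(*o, dw_start_date=o.createtime, dw_end_date=OPEN_END) for o in inc]


his_0820 = initial_load(SNAPSHOT_0821, "2015-08-20")
for row in his_0820:
    print(row)
```

Order 8 is excluded because it was created on the 21st. Later daily loads use modifiedtime instead of createtime as dw_start_date, because from then on each increment row records a change made on that day.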

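Compute the 21st's zipper table from the 21st's increment. The upper half of the UNION ALL decides whether each row of yesterday's zipper table needs its validity period closed (9999-12-31 marks the currently valid version): if the order appears in today's increment and the row is still open, its dw_end_date is set to the previous day. The lower half merges in today's increment and marks all of it as currently valid.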

INSERT overwrite TABLE t_dw_orders_his partition(p_event_date="2015-08-21")
SELECT 
	orderid,
	createtime,
	modifiedtime,
	status,
	dw_start_date,
	dw_end_date
FROM (
	select 
		a.orderid,
		a.createtime,
		a.modifiedtime,
		a.status,
		a.dw_start_date,
		-- close the still-open version of each order that changed today: it now ends yesterday
		case
			when b.orderid is not null and a.dw_end_date>"2015-08-21" then "2015-08-20"
			else a.dw_end_date
			end as  dw_end_date 
		from ( select * from t_dw_orders_his where p_event_date="2015-08-20")a
		left join (select * from t_ods_orders_inc where p_event_date="2015-08-21")b
		on (a.orderid=b.orderid)
		-- below: today's increment rows become the new open versions
		union all
		select 
		orderid,
		createtime,
		modifiedtime,
		status,
		 modifiedtime AS dw_start_date,
		'9999-12-31' AS dw_end_date 
		from t_ods_orders_inc where p_event_date="2015-08-21"
		)c
order by orderid,dw_start_date;
>SELECT * from t_dw_orders_his WHERE p_event_date="2015-08-21";
1	2015-08-18	2015-08-18	創建	2015-08-18	9999-12-31	2015-08-21
2	2015-08-18	2015-08-18	創建	2015-08-18	9999-12-31	2015-08-21
3	2015-08-19	2015-08-21	支付	2015-08-19	2015-08-20	2015-08-21
3	2015-08-19	2015-08-21	支付	2015-08-21	9999-12-31	2015-08-21
4	2015-08-19	2015-08-21	完成	2015-08-19	2015-08-20	2015-08-21
4	2015-08-19	2015-08-21	完成	2015-08-21	9999-12-31	2015-08-21
5	2015-08-19	2015-08-20	支付	2015-08-19	9999-12-31	2015-08-21
6	2015-08-20	2015-08-20	創建	2015-08-20	9999-12-31	2015-08-21
7	2015-08-20	2015-08-21	支付	2015-08-20	2015-08-20	2015-08-21
7	2015-08-20	2015-08-21	支付	2015-08-21	9999-12-31	2015-08-21
8	2015-08-21	2015-08-21	創建	2015-08-21	9999-12-31	2015-08-21
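The daily merge is easier to follow as a pure function. The Python sketch below assumes yesterday's zipper table and today's increment fit in memory as lists; `HisRow` and `merge_day` are hypothetical names. Fed the 20th's zipper table (derived from the 20th's increment with dw_start_date = createtime) and the 21st's increment, it should reproduce the 11 rows listed above.

```python
from datetime import date, timedelta
from typing import NamedTuple

OPEN_END = "9999-12-31"  # dw_end_date of the currently valid version


class HisRow(NamedTuple):
    orderid: int
    createtime: str
    modifiedtime: str
    status: str
    dw_start_date: str
    dw_end_date: str


def merge_day(prev_his: list[HisRow], inc: list[tuple], day: str) -> list[HisRow]:
    """Zipper table for `day` from yesterday's table and today's increment
    rows (orderid, createtime, modifiedtime, status)."""
    yesterday = (date.fromisoformat(day) - timedelta(days=1)).isoformat()
    # Plays the role of the LEFT JOIN on orderid; assumes one increment row per order.
    changed = {row[0] for row in inc}
    # Upper half of the UNION ALL: still-open versions of changed orders end yesterday.
    kept = [
        r._replace(dw_end_date=yesterday)
        if r.orderid in changed and r.dw_end_date > day
        else r
        for r in prev_his
    ]
    # Lower half: today's increment rows become the new open versions.
    opened = [HisRow(oid, ct, mt, st, mt, OPEN_END) for oid, ct, mt, st in inc]
    # Same ordering as ORDER BY orderid, dw_start_date.
    return sorted(kept + opened, key=lambda r: (r.orderid, r.dw_start_date))


HIS_0820 = [
    HisRow(1, "2015-08-18", "2015-08-18", "創建", "2015-08-18", OPEN_END),
    HisRow(2, "2015-08-18", "2015-08-18", "創建", "2015-08-18", OPEN_END),
    HisRow(3, "2015-08-19", "2015-08-21", "支付", "2015-08-19", OPEN_END),
    HisRow(4, "2015-08-19", "2015-08-21", "完成", "2015-08-19", OPEN_END),
    HisRow(5, "2015-08-19", "2015-08-20", "支付", "2015-08-19", OPEN_END),
    HisRow(6, "2015-08-20", "2015-08-20", "創建", "2015-08-20", OPEN_END),
    HisRow(7, "2015-08-20", "2015-08-21", "支付", "2015-08-20", OPEN_END),
]
INC_0821 = [
    (3, "2015-08-19", "2015-08-21", "支付"),
    (4, "2015-08-19", "2015-08-21", "完成"),
    (7, "2015-08-20", "2015-08-21", "支付"),
    (8, "2015-08-21", "2015-08-21", "創建"),
]

his_0821 = merge_day(HIS_0820, INC_0821, "2015-08-21")
for row in his_0821:
    print(row)
```

Each run reads only yesterday's zipper partition and today's increment and overwrites today's partition, so rerunning a day after fixing its increment gives the same result; later days then have to be recomputed in order. This fits the advice above to keep the increment table longer.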

As above, compute the 22nd's zipper table from the 22nd's increment

INSERT overwrite TABLE t_dw_orders_his partition(p_event_date="2015-08-22")
SELECT 
	orderid,
	createtime,
	modifiedtime,
	status,
	dw_start_date,
	dw_end_date
FROM (
	select 
		a.orderid,
		a.createtime,
		a.modifiedtime,
		a.status,
		a.dw_start_date,
		case 
			when b.orderid is not null and a.dw_end_date>"2015-08-22" then "2015-08-21"
			else a.dw_end_date
			end as  dw_end_date 
		from ( select * from t_dw_orders_his where p_event_date="2015-08-21")a
		left join (select * from t_ods_orders_inc where p_event_date="2015-08-22")b
		on (a.orderid=b.orderid)
		union all
		select 
		orderid,
		createtime,
		modifiedtime,
		status,
		 modifiedtime AS dw_start_date,
		'9999-12-31' AS dw_end_date 
		from t_ods_orders_inc where p_event_date="2015-08-22"
		)c
order by orderid,dw_start_date;
>SELECT * from t_dw_orders_his WHERE p_event_date="2015-08-22";
1	2015-08-18	2015-08-18	創建	2015-08-18	2015-08-21	2015-08-22
1	2015-08-18	2015-08-22	支付	2015-08-22	9999-12-31	2015-08-22
2	2015-08-18	2015-08-18	創建	2015-08-18	2015-08-21	2015-08-22
2	2015-08-18	2015-08-22	完成	2015-08-22	9999-12-31	2015-08-22
3	2015-08-19	2015-08-21	支付	2015-08-19	2015-08-20	2015-08-22
3	2015-08-19	2015-08-21	支付	2015-08-21	9999-12-31	2015-08-22
4	2015-08-19	2015-08-21	完成	2015-08-19	2015-08-20	2015-08-22
4	2015-08-19	2015-08-21	完成	2015-08-21	9999-12-31	2015-08-22
5	2015-08-19	2015-08-20	支付	2015-08-19	9999-12-31	2015-08-22
6	2015-08-20	2015-08-20	創建	2015-08-20	2015-08-21	2015-08-22
6	2015-08-20	2015-08-22	支付	2015-08-22	9999-12-31	2015-08-22
7	2015-08-20	2015-08-21	支付	2015-08-20	2015-08-20	2015-08-22
7	2015-08-20	2015-08-21	支付	2015-08-21	9999-12-31	2015-08-22
8	2015-08-21	2015-08-21	創建	2015-08-21	2015-08-21	2015-08-22
8	2015-08-21	2015-08-22	支付	2015-08-22	9999-12-31	2015-08-22
9	2015-08-22	2015-08-22	創建	2015-08-22	9999-12-31	2015-08-22
10	2015-08-22	2015-08-22	支付	2015-08-22	9999-12-31	2015-08-22
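A zipper table can silently go out of shape after a faulty rerun, so a sanity check is useful: for every order, the versions should form a gap-free, non-overlapping chain ending in exactly one open version. A hypothetical Python check `find_problems`, run on the (orderid, dw_start_date, dw_end_date) columns of the 22nd's result above and assuming this post's convention that a closed version ends the day before the next one starts.

```python
from collections import defaultdict
from datetime import date, timedelta

OPEN_END = "9999-12-31"

# (orderid, dw_start_date, dw_end_date) of the 2015-08-22 zipper table.
HIS_0822 = [
    (1, "2015-08-18", "2015-08-21"), (1, "2015-08-22", OPEN_END),
    (2, "2015-08-18", "2015-08-21"), (2, "2015-08-22", OPEN_END),
    (3, "2015-08-19", "2015-08-20"), (3, "2015-08-21", OPEN_END),
    (4, "2015-08-19", "2015-08-20"), (4, "2015-08-21", OPEN_END),
    (5, "2015-08-19", OPEN_END),
    (6, "2015-08-20", "2015-08-21"), (6, "2015-08-22", OPEN_END),
    (7, "2015-08-20", "2015-08-20"), (7, "2015-08-21", OPEN_END),
    (8, "2015-08-21", "2015-08-21"), (8, "2015-08-22", OPEN_END),
    (9, "2015-08-22", OPEN_END),
    (10, "2015-08-22", OPEN_END),
]


def find_problems(rows: list[tuple[int, str, str]]) -> list[str]:
    """Describe every order whose versions are not a gap-free chain ending in
    exactly one open version."""
    versions = defaultdict(list)
    for oid, start, end in rows:
        versions[oid].append((start, end))
    problems = []
    for oid, vs in sorted(versions.items()):
        vs.sort()
        if sum(end == OPEN_END for _, end in vs) != 1:
            problems.append(f"order {oid}: expected exactly one open version")
        for (_, end), (next_start, _) in zip(vs, vs[1:]):
            if end == OPEN_END:
                problems.append(f"order {oid}: open version is followed by another version")
                continue
            expected = (date.fromisoformat(end) + timedelta(days=1)).isoformat()
            if next_start != expected:
                problems.append(f"order {oid}: gap or overlap after {end}")
    return problems


print(find_problems(HIS_0822))  # []
```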

Likewise, the 23rd's zipper table is computed the same way from the 23rd's increment, and so on; that's the whole idea.
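The 21st and 22nd statements differ only in the target day and the previous day, which is why every later day works the same way. A hypothetical Python helper `render_merge` that fills both dates into the statement for any target day. Passing both dates as variables, the way `${date}` is passed to the incremental query, would work too; this sketch just makes the previous-day arithmetic explicit, including month boundaries.

```python
from datetime import date, timedelta

# The daily merge statement from above, with the two dates as placeholders.
MERGE_TEMPLATE = """\
INSERT OVERWRITE TABLE t_dw_orders_his PARTITION (p_event_date="{day}")
SELECT orderid, createtime, modifiedtime, status, dw_start_date, dw_end_date
FROM (
  SELECT a.orderid, a.createtime, a.modifiedtime, a.status, a.dw_start_date,
         CASE WHEN b.orderid IS NOT NULL AND a.dw_end_date > "{day}" THEN "{prev}"
              ELSE a.dw_end_date END AS dw_end_date
  FROM (SELECT * FROM t_dw_orders_his WHERE p_event_date="{prev}") a
  LEFT JOIN (SELECT * FROM t_ods_orders_inc WHERE p_event_date="{day}") b
    ON (a.orderid = b.orderid)
  UNION ALL
  SELECT orderid, createtime, modifiedtime, status,
         modifiedtime AS dw_start_date, '9999-12-31' AS dw_end_date
  FROM t_ods_orders_inc WHERE p_event_date="{day}"
) c
ORDER BY orderid, dw_start_date;"""


def render_merge(day: str) -> str:
    """Fill in the target day and the previous day (yesterday's partition,
    which is also the closing date for changed orders)."""
    prev = (date.fromisoformat(day) - timedelta(days=1)).isoformat()
    return MERGE_TEMPLATE.format(day=day, prev=prev)


print(render_merge("2015-08-23"))
```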
