【Hive】Using a temporary table to keep the full data set

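Requirement:
In a Hive environment, table a is the full (snapshot) table and table b is the incremental table (it holds only the data produced by the current day's run).

Rows that are in a but not in b must remain in a,

and rows that are in b but not in a must be appended to a. For ids present in both tables, the newer row from b replaces the one in a.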
Approach 1:
Use a left outer join to pick out the rows that are in a but not in b,
then union those rows with all the rows of b.
--------------------- Create the sample data (demonstrated in Oracle)

-- a is the full table, b is the day's increment (ids 1 and 2 updated, id 5 new)
with a as(
select 1 as id, 'Lisi' as name ,'2019-10-01' as time from dual
union all
select 2 as id, 'Wangmen' as name,'2019-10-01' as time from dual
union all
select 3 as id, 'Zhaoliu' as name,'2019-10-01' as time from dual
union all
select 4 as id, 'Pangsan' as name,'2019-10-01' as time from dual
),
b as(
select 1 as id, 'Lisi' as name,'2019-10-03' as time from dual
union all
select 2 as id, 'Wangmen' as name,'2019-10-03' as time from dual
union all
select 5 as id, 'Huangsan' as name,'2019-10-03' as time from dual
)

-- Approach 1: keep the a-only rows via LEFT JOIN ... IS NULL, then UNION ALL with b
select a.id, a.name,a.time
  from a
  left join b
    on a.id = b.id
 where b.id is null
union all
select b.id,b.name,b.time
  from b
;
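Approach 1 can be checked without an Oracle or Hive instance by replaying the same query in SQLite through Python's standard `sqlite3` module. This is a sketch under that substitution. Real tables a and b replace the `WITH ... FROM dual` sample data, and no Hive- or Oracle-specific behaviour is exercised.

```python
import sqlite3

# Sample data from the Oracle demo, stored as real tables in an in-memory
# SQLite database instead of WITH ... FROM dual CTEs.
SAMPLE_SQL = """
CREATE TABLE a (id INTEGER, name TEXT, time TEXT);
CREATE TABLE b (id INTEGER, name TEXT, time TEXT);
INSERT INTO a VALUES (1,'Lisi','2019-10-01'), (2,'Wangmen','2019-10-01'),
                     (3,'Zhaoliu','2019-10-01'), (4,'Pangsan','2019-10-01');
INSERT INTO b VALUES (1,'Lisi','2019-10-03'), (2,'Wangmen','2019-10-03'),
                     (5,'Huangsan','2019-10-03');
"""

# Approach 1: anti-join keeps rows only in a, UNION ALL adds every row of b.
APPROACH_1 = """
SELECT a.id, a.name, a.time
  FROM a
  LEFT JOIN b
    ON a.id = b.id
 WHERE b.id IS NULL          -- rows that exist only in a
UNION ALL
SELECT b.id, b.name, b.time  -- every row of the increment b
  FROM b
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SAMPLE_SQL)
rows = sorted(conn.execute(APPROACH_1).fetchall())  # sort by id for display
for row in rows:
    print(row)
```

Rows 3 and 4 come from the anti-join branch. Rows 1, 2 and 5 come from b, so the updated versions of ids 1 and 2 win.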

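Approach 2:
First union the rows of a and b,
then use the analytic function row_number() to number the rows within each id in descending time order. For duplicated ids, keep only the row with the latest time (rr = 1). This works because time is a 'YYYY-MM-DD' string, whose lexical order matches chronological order.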
--------------------- Create the sample data (demonstrated in Oracle)

-- a is the full table, b is the day's increment (ids 1 and 2 updated, id 5 new)
with a as(
select 1 as id, 'Lisi' as name ,'2019-10-01' as time from dual
union all
select 2 as id, 'Wangmen' as name,'2019-10-01' as time from dual
union all
select 3 as id, 'Zhaoliu' as name,'2019-10-01' as time from dual
union all
select 4 as id, 'Pangsan' as name,'2019-10-01' as time from dual
),
b as(
select 1 as id, 'Lisi' as name,'2019-10-02' as time from dual
union all
select 2 as id, 'Wangmen' as name,'2019-10-02' as time from dual
union all
select 5 as id, 'Huangsan' as name,'2019-10-02' as time from dual
)

-- Approach 2: UNION ALL, then keep the newest row per id with row_number()
SELECT id
      ,NAME
      ,TIME
      ,rr
  FROM (SELECT id
              ,NAME
              ,TIME
              ,row_number() over(PARTITION BY id ORDER BY TIME DESC) AS rr
          FROM (SELECT a.id
                      ,a.name
                      ,a.time
                  FROM a a
                UNION ALL
                SELECT b.id
                      ,b.name
                      ,b.time
                  FROM b b) c) d
 WHERE d.rr = 1
;
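Approach 2 can be replayed the same way. This sketch again uses SQLite through Python's `sqlite3` module. It assumes the linked SQLite library is version 3.25.0 or newer, the first release that supports window functions such as row_number().

```python
import sqlite3

# Sample data from the Approach 2 demo (b rows dated 2019-10-02).
# Assumes SQLite >= 3.25.0 so that row_number() OVER (...) is available.
SAMPLE_SQL = """
CREATE TABLE a (id INTEGER, name TEXT, time TEXT);
CREATE TABLE b (id INTEGER, name TEXT, time TEXT);
INSERT INTO a VALUES (1,'Lisi','2019-10-01'), (2,'Wangmen','2019-10-01'),
                     (3,'Zhaoliu','2019-10-01'), (4,'Pangsan','2019-10-01');
INSERT INTO b VALUES (1,'Lisi','2019-10-02'), (2,'Wangmen','2019-10-02'),
                     (5,'Huangsan','2019-10-02');
"""

# Approach 2: union both tables, number rows per id newest-first, keep rr = 1.
APPROACH_2 = """
SELECT id, name, time
  FROM (SELECT id, name, time,
               row_number() OVER (PARTITION BY id ORDER BY time DESC) AS rr
          FROM (SELECT id, name, time FROM a
                UNION ALL
                SELECT id, name, time FROM b) c) d
 WHERE d.rr = 1              -- keep only the newest version of each id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SAMPLE_SQL)
rows = sorted(conn.execute(APPROACH_2).fetchall())  # sort by id for display
for row in rows:
    print(row)
```

Each id keeps only its newest row. Ids 1, 2 and 5 carry 2019-10-02, and ids 3 and 4 keep 2019-10-01. Unlike Approach 1, this approach uses time to decide which version wins. If a and b held the same id with the same time, row_number() would pick one of them arbitrarily.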

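Because the test data set is small, it is not yet clear which approach performs better; this needs further investigation.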
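The title mentions a temporary table, but the queries above stop at the SELECT. One way to complete the flow is to materialise the merged result of Approach 1 in a temporary table and then replace the contents of a with it. The sketch below does this in SQLite through Python's `sqlite3` module, which is an assumption. In Hive the counterparts would typically be `CREATE TEMPORARY TABLE ... AS SELECT` and `INSERT OVERWRITE TABLE a`. The temp table name `a_merged` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, name TEXT, time TEXT);
CREATE TABLE b (id INTEGER, name TEXT, time TEXT);
INSERT INTO a VALUES (1,'Lisi','2019-10-01'), (2,'Wangmen','2019-10-01'),
                     (3,'Zhaoliu','2019-10-01'), (4,'Pangsan','2019-10-01');
INSERT INTO b VALUES (1,'Lisi','2019-10-03'), (2,'Wangmen','2019-10-03'),
                     (5,'Huangsan','2019-10-03');
""")

# `with conn` commits on success and rolls back the pending transaction on error.
with conn:
    # Step 1: stage the merged snapshot (Approach 1) in a hypothetical temp table.
    conn.execute("""
        CREATE TEMP TABLE a_merged AS
        SELECT a.id AS id, a.name AS name, a.time AS time
          FROM a LEFT JOIN b ON a.id = b.id
         WHERE b.id IS NULL
        UNION ALL
        SELECT b.id, b.name, b.time FROM b
    """)
    # Step 2: replace the full table's contents. SQLite has no
    # INSERT OVERWRITE, so DELETE + INSERT stands in for it here.
    conn.execute("DELETE FROM a")
    conn.execute("INSERT INTO a SELECT id, name, time FROM a_merged")
    # Step 3: the staging table is no longer needed.
    conn.execute("DROP TABLE a_merged")

full_table = sorted(conn.execute("SELECT id, name, time FROM a").fetchall())
for row in full_table:
    print(row)
```

Staging through a temporary table separates the read of a from the overwrite of a. After the run, a holds the five merged rows and the staging table is gone.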
