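In a database, tables are normally associated with one another through JOIN. This can be thought of as a "horizontal association"; when several large tables are involved, a "horizontal association" tends to be slow.
"Vertical association": UNION the tables together, then GROUP BY to deduplicate, which produces the same "association" effect. On large tables, "vertical association" is much faster than "horizontal association".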
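Example: T1, T2, T3, T4, T5, each table holding 50 million rows.
"Horizontal association": conceptually, a JOIN filters its conditions on top of the Cartesian product, which in the worst case is 50 million * 50 million * 50 million * 50 million * 50 million row combinations. [Hadoop: more than several hours]
"Vertical association": 50 million + 50 million + 50 million + 50 million + 50 million = 250 million rows. [Hadoop: within 30 minutes]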
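To put the two figures side by side, here is a small arithmetic sketch. It takes the worst-case model above at face value; a real engine joining on equality keys would normally not materialize the full Cartesian product, so the first number is an upper bound rather than a measured cost. The constant names are illustrative only.

```python
# Row counts from the example: five tables of 50 million rows each.
ROWS_PER_TABLE = 50_000_000
TABLES = 5

# Worst case for the "horizontal" approach: an unfiltered Cartesian product.
cartesian_rows = ROWS_PER_TABLE ** TABLES

# The "vertical" approach only stacks the tables on top of each other.
union_rows = ROWS_PER_TABLE * TABLES

print(f"Cartesian product upper bound: {cartesian_rows:.3e} row combinations")
print(f"UNION ALL input size:          {union_rows:,} rows")
```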
"Vertical association" sample code:
INSERT OVERWRITE TABLE T_DM_SCORE
SELECT
    '20160606',
    CUST_NO,
    ID_NUMBER,
    NAME,
    -- MAX keeps the real value and discards the -90000 placeholders
    MAX(SCORE1) AS SCORE1,
    MAX(SCORE2) AS SCORE2,
    MAX(SCORE3) AS SCORE3,
    MAX(SCORE4) AS SCORE4,
    MAX(SCORE5) AS SCORE5,
    0 AS SCORE6,
    0 AS SCORE7,
    (MAX(SCORE1) + MAX(SCORE2) + MAX(SCORE3) + MAX(SCORE4) + MAX(SCORE5)) AS SCORE_ALL,
    MAX(ASSET_MAVG_SUM) AS ASSET_MAVG_SUM
FROM
(
    -- T0 supplies ASSET_MAVG_SUM; every score column is a placeholder
    SELECT CUST_NO, CERT_ID AS ID_NUMBER, CUST_NAME AS NAME,
           -90000 AS SCORE1, -90000 AS SCORE2, -90000 AS SCORE3, -90000 AS SCORE4, -90000 AS SCORE5,
           NVL(ASSET_MAVG_SUM, 0) AS ASSET_MAVG_SUM
    FROM T_GET_WAIVER_ALL_CUST_BPH_INFO T0
    UNION ALL
    -- T1 supplies SCORE1
    SELECT CUST_NO, ID_NUMBER, NAME,
           SCORE AS SCORE1, -90000 AS SCORE2, -90000 AS SCORE3, -90000 AS SCORE4, -90000 AS SCORE5,
           -90000 AS ASSET_MAVG_SUM
    FROM T_DM_ABILITY_SCORE T1
    UNION ALL
    -- T2 supplies SCORE2
    SELECT CUST_NO, ID_NUMBER, NAME,
           -90000 AS SCORE1, SCORE AS SCORE2, -90000 AS SCORE3, -90000 AS SCORE4, -90000 AS SCORE5,
           -90000 AS ASSET_MAVG_SUM
    FROM T_DM_ACTIVETY_SCORE T2
    UNION ALL
    -- T3 supplies SCORE3
    SELECT CUST_NO, ID_NUMBER, NAME,
           -90000 AS SCORE1, -90000 AS SCORE2, SCORE AS SCORE3, -90000 AS SCORE4, -90000 AS SCORE5,
           -90000 AS ASSET_MAVG_SUM
    FROM T_DM_BEHAVIOR_SCORE T3
    UNION ALL
    -- T4 supplies SCORE4
    SELECT CUST_NO, ID_NUMBER, NAME,
           -90000 AS SCORE1, -90000 AS SCORE2, -90000 AS SCORE3, SCORE AS SCORE4, -90000 AS SCORE5,
           -90000 AS ASSET_MAVG_SUM
    FROM T_DM_CREDIT_SCORE T4
    UNION ALL
    -- T5 supplies SCORE5
    SELECT CUST_NO, ID_NUMBER, NAME,
           -90000 AS SCORE1, -90000 AS SCORE2, -90000 AS SCORE3, -90000 AS SCORE4, SCORE AS SCORE5,
           -90000 AS ASSET_MAVG_SUM
    FROM T_DM_IDENTITY_SCORE T5
) TA
GROUP BY CUST_NO, ID_NUMBER, NAME;
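The same pattern can be tried on a toy data set. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Hive, and the tables t_ability and t_credit, their columns, and the sample rows are all hypothetical. It assumes, as the query above does, that every real score is greater than the -90000 placeholder.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_ability (cust_no TEXT, score INTEGER);  -- hypothetical table
    CREATE TABLE t_credit  (cust_no TEXT, score INTEGER);  -- hypothetical table
    INSERT INTO t_ability VALUES ('C1', 70), ('C2', 55);
    INSERT INTO t_credit  VALUES ('C1', 80), ('C3', 90);
""")

# Stack both tables with UNION ALL, padding the column each table lacks
# with the -90000 placeholder, then collapse to one row per customer.
vertical_rows = conn.execute("""
    SELECT cust_no, MAX(score1) AS score1, MAX(score2) AS score2
    FROM (
        SELECT cust_no, score AS score1, -90000 AS score2 FROM t_ability
        UNION ALL
        SELECT cust_no, -90000 AS score1, score AS score2 FROM t_credit
    ) ta
    GROUP BY cust_no
    ORDER BY cust_no
""").fetchall()

for row in vertical_rows:
    print(row)
```

In this toy run, a customer that appears in only one table (C2, C3) keeps the -90000 placeholder in the other column, so the result behaves like a full outer join with a sentinel value rather than an inner join. Any total computed by adding the MAX columns, as SCORE_ALL is above, would include that sentinel for such customers.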
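How to read "vertical association": the UNION ALL step stacks the tables into one, and the GROUP BY step then deduplicates the result, which achieves the "association".
Note: this does not apply to associating tables that are in a one-to-many relationship.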
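The one-to-many limitation can be seen on a small example. This is a sketch under assumptions: sqlite3 again stands in for Hive, and t_cust, t_order, and their rows are hypothetical. A JOIN returns one row per matching child row, while UNION ALL followed by GROUP BY with MAX collapses them to a single row per key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_cust  (cust_no TEXT, name TEXT);       -- hypothetical "one" side
    CREATE TABLE t_order (cust_no TEXT, amount INTEGER);  -- hypothetical "many" side
    INSERT INTO t_cust  VALUES ('C1', 'Ann');
    INSERT INTO t_order VALUES ('C1', 10), ('C1', 25), ('C1', 40);
""")

# "Horizontal" JOIN: every order row survives.
join_rows = conn.execute("""
    SELECT c.cust_no, c.name, o.amount
    FROM t_cust c
    JOIN t_order o ON o.cust_no = c.cust_no
    ORDER BY o.amount
""").fetchall()

# "Vertical" UNION ALL + GROUP BY: MAX keeps only one amount per customer.
vertical_rows = conn.execute("""
    SELECT cust_no, MAX(name) AS name, MAX(amount) AS amount
    FROM (
        SELECT cust_no, name, -90000 AS amount FROM t_cust
        UNION ALL
        SELECT cust_no, NULL AS name, amount FROM t_order
    ) ta
    GROUP BY cust_no
""").fetchall()

print(len(join_rows), "rows from JOIN")
print(len(vertical_rows), "row from UNION ALL + GROUP BY:", vertical_rows)
```

The two smaller orders are silently lost in the grouped result, which is why the technique fits only tables that hold at most one row per grouping key.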