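Background:
When Hive explodes a collection column into rows with LATERAL VIEW, there is a special case: if the array or map being exploded is empty (for example MAP()), the rows that hold it disappear from the result:
Source data: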
SELECT
'1' AS id,
MAP() AS purchase_info
UNION ALL
SELECT
'2' AS id,
MAP() AS purchase_info
UNION ALL
SELECT
'3' AS id,
str_to_map('2019-11-28:100,2019-11-27:1') AS purchase_info
UNION ALL
SELECT
'3' AS id,
str_to_map('2019-11-28:200,2019-11-27:2') AS purchase_info
Exploding the source data with a plain LATERAL VIEW:
SELECT
id,
info.purchase_date,
info.amount
FROM
(
SELECT
'1' AS id,
MAP() AS purchase_info
UNION ALL
SELECT
'2' AS id,
MAP() AS purchase_info
UNION ALL
SELECT
'3' AS id,
str_to_map('2019-11-28:100,2019-11-27:1') AS purchase_info
UNION ALL
SELECT
'3' AS id,
str_to_map('2019-11-28:200,2019-11-27:2') AS purchase_info
) src LATERAL VIEW EXPLODE(purchase_info) info AS purchase_date, amount
The result contains only the rows for id '3'. Every row whose purchase_info is an empty MAP has disappeared, which is not what we expect.
Solution: to keep the rows with empty data, add the OUTER keyword after LATERAL VIEW.
SELECT
id,
info.purchase_date,
info.amount
FROM
(
SELECT
'1' AS id,
MAP() AS purchase_info
UNION ALL
SELECT
'2' AS id,
MAP() AS purchase_info
UNION ALL
SELECT
'3' AS id,
str_to_map('2019-11-28:100,2019-11-27:1') AS purchase_info
UNION ALL
SELECT
'3' AS id,
str_to_map('2019-11-28:200,2019-11-27:2') AS purchase_info
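) src LATERAL VIEW OUTER EXPLODE(purchase_info) info AS purchase_date, amount
With OUTER, the rows for id '1' and '2' are kept, with purchase_date and amount set to NULL, which matches expectations.
All that remains is to convert the NULL values in purchase_date and amount to whatever default is needed.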
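The NULL conversion mentioned above could look like the following minimal sketch. It uses COALESCE to substitute defaults. The placeholder values 'N/A' and '0' and the subquery alias src are assumptions chosen for illustration, not part of the original query. Both defaults are strings because str_to_map returns string keys and values.

```sql
SELECT
  id,
  -- Rows from an empty MAP come back as NULL; replace them with defaults.
  COALESCE(info.purchase_date, 'N/A') AS purchase_date,  -- 'N/A' is an assumed placeholder
  COALESCE(info.amount, '0') AS amount                    -- '0' is an assumed placeholder
FROM
(
  SELECT
    '1' AS id,
    MAP() AS purchase_info
  UNION ALL
  SELECT
    '3' AS id,
    str_to_map('2019-11-28:100,2019-11-27:1') AS purchase_info
) src LATERAL VIEW OUTER EXPLODE(purchase_info) info AS purchase_date, amount
```

If amount is needed as a number downstream, a CAST around the COALESCE would be the natural next step.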