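Background: when reading certain fields from a Hive table, some of the JSON strings contain an array. How can that array be read and expanded into multiple rows?
Procedure:
1. Sample data: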
["[{\"pet_skill_avg_level\":0,\"pet_guard_star\":0,\"pet_type\":0,\"pet_step\":0,\"pet_skill_num\":0,\"pet_adv_score\":0,\"pet_level\":0,\"pet_fight_star\":0,\"pet_id\":\"0\"},{\"pet_skill_avg_level\":0,\"pet_guard_star\":0,\"pet_type\":0,\"pet_step\":0,\"pet_skill_num\":0,\"pet_adv_score\":0,\"pet_level\":0,\"pet_fight_star\":0,\"pet_id\":\"0\"},{\"pet_skill_avg_level\":0,\"pet_guard_star\":0,\"pet_type\":0,\"pet_step\":0,\"pet_skill_num\":0,\"pet_adv_score\":0,\"pet_level\":0,\"pet_fight_star\":0,\"pet_id\":\"0\"},{\"pet_skill_avg_level\":0,\"pet_guard_star\":0,\"pet_type\":0,\"pet_step\":0,\"pet_skill_num\":0,\"pet_adv_score\":0,\"pet_level\":0,\"pet_fight_star\":0,\"pet_id\":\"0\"},{\"pet_skill_avg_level\":0,\"pet_guard_star\":0,\"pet_type\":0,\"pet_step\":0,\"pet_skill_num\":0,\"pet_adv_score\":0,\"pet_level\":0,\"pet_fight_star\":0,\"pet_id\":\"0\"}]"]
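This is the JSON string stored in one of the fields; its content is an array.
The goal is to split the five elements of the array into five rows.
Implementation: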
SELECT
    col
FROM
    (
        SELECT
            split(
                regexp_replace(
                    regexp_extract(params['pet_info'], '^\\[(.+)\\]$', 1),
                    '\\}\\,\\{', '\\}\\|\\|\\{'
                ),
                '\\|\\|'
            ) AS pet_info
        FROM
            db_a.dwd_event_log
        WHERE
            p_date = '${DATE}'
            AND app_id = 165018
            AND event = 'pet_flow'
        -- fetch only one record for this query
        LIMIT 1
    ) info
    LATERAL VIEW explode(info.pet_info) ss AS col
Step-by-step explanation:
1. regexp_extract(params['pet_info'], '^\\[(.+)\\]$', 1)
strips the outer square brackets, so the field's data is cleaned into this format:
{"pet_skill_avg_level":0,"pet_guard_star":0,"pet_type":0,"pet_step":0,"pet_skill_num":0,"pet_adv_score":0,"pet_level":0,"pet_fight_star":0,"pet_id":"0"},{"pet_skill_avg_level":0,"pet_guard_star":0,"pet_type":0,"pet_step":0,"pet_skill_num":0,"pet_adv_score":0,"pet_level":0,"pet_fight_star":0,"pet_id":"0"},{"pet_skill_avg_level":0,"pet_guard_star":0,"pet_type":0,"pet_step":0,"pet_skill_num":0,"pet_adv_score":0,"pet_level":0,"pet_fight_star":0,"pet_id":"0"},{"pet_skill_avg_level":0,"pet_guard_star":0,"pet_type":0,"pet_step":0,"pet_skill_num":0,"pet_adv_score":0,"pet_level":0,"pet_fight_star":0,"pet_id":"0"},{"pet_skill_avg_level":0,"pet_guard_star":0,"pet_type":0,"pet_step":0,"pet_skill_num":0,"pet_adv_score":0,"pet_level":0,"pet_fight_star":0,"pet_id":"0"}
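2. Use regexp_replace to replace the separator between the objects (the comma in '},{') with '||', giving '}||{'. Note that the special characters must be escaped.
3. Use split to break the string on '||' into an array.
4. Use lateral view explode to expand the array into multiple rows, one element per row.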
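The four steps above can be tried without access to the table by running them against a string literal. The following is only a sketch: the two-element JSON array is a hypothetical stand-in for params['pet_info'], and it assumes a Hive version that allows SELECT without a FROM clause.

```sql
-- Hypothetical inline sample: a two-element JSON array held in a string.
-- Step 1 (regexp_extract): strip the outer [ and ].
-- Step 2 (regexp_replace): turn every "},{" into "}||{".
-- Step 3 (split): cut the string on "||" into an array.
-- Step 4 (lateral view explode): emit one row per array element.
SELECT
    ss.col
FROM
    (
        SELECT
            split(
                regexp_replace(
                    regexp_extract(
                        '[{"pet_id":"1","pet_level":3},{"pet_id":"2","pet_level":5}]',
                        '^\\[(.+)\\]$',
                        1
                    ),
                    '\\}\\,\\{', '\\}\\|\\|\\{'
                ),
                '\\|\\|'
            ) AS pet_info
    ) info
    LATERAL VIEW explode(info.pet_info) ss AS col;
```

This should return two rows, one JSON object each. Note that splitting on '},{' only works when the array elements are flat objects: a nested object, or a string value that itself contains '},{', would be cut in the wrong place.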
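Result: for the sample data above, the query returns five rows, one JSON object per row in the col column.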
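Once each array element sits in its own row, individual fields can be pulled out of the JSON text. A minimal sketch, assuming Hive's built-in get_json_object function and a hypothetical two-element literal in place of the real params['pet_info'] column (the field names pet_id and pet_level come from the sample data):

```sql
-- Hypothetical inline sample instead of db_a.dwd_event_log.
-- get_json_object returns each extracted value as a string.
SELECT
    get_json_object(ss.col, '$.pet_id')    AS pet_id,
    get_json_object(ss.col, '$.pet_level') AS pet_level
FROM
    (
        SELECT
            split(
                regexp_replace(
                    regexp_extract(
                        '[{"pet_id":"1","pet_level":3},{"pet_id":"2","pet_level":5}]',
                        '^\\[(.+)\\]$',
                        1
                    ),
                    '\\}\\,\\{', '\\}\\|\\|\\{'
                ),
                '\\|\\|'
            ) AS pet_info
    ) info
    LATERAL VIEW explode(info.pet_info) ss AS col;
```

To apply this to the real table, the inner SELECT would be replaced by the subquery from the implementation above; cast the extracted values if numeric types are needed.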