1. Row to column
1.1 Problem statement:
How do we turn
a b 1,2,3
c d 4,5,6
into:
a b 1
a b 2
a b 3
c d 4
c d 5
c d 6
1.2 Source data:
test.txt
a b 1,2,3
c d 4,5,6
1.3 Solution
Approach 1:
drop table test_jzl_20140701_test;
Create the table:
create table test_jzl_20140701_test
(
col1 string,
col2 string,
col3 string
)
row format delimited fields terminated by ' '
stored as textfile;
Load the data:
load data local inpath '/home/jiangzl/shell/test.txt' into table test_jzl_20140701_test;
View all rows in the table:
select * from test_jzl_20140701_test;
a b 1,2,3
c d 4,5,6
Expand each element of the array into its own row:
select col1,col2,name
from test_jzl_20140701_test
lateral view explode(split(col3,',')) col3 as name;
a b 1
a b 2
a b 3
c d 4
c d 5
c d 6
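If the position of each element within the original list also matters, Hive's posexplode can be used in place of explode. The following is a minimal sketch against the same table, assuming Hive 0.13 or later; the aliases t, pos and val are names chosen here for illustration and are not part of the original example.

```sql
-- Sketch only: same table as approach 1; t, pos and val are illustrative aliases.
-- posexplode emits a zero-based index (pos) alongside each array element (val).
select col1, col2, pos, val
from test_jzl_20140701_test
lateral view posexplode(split(col3, ',')) t as pos, val;
```

The extra pos column makes it possible to restore the original element order later, which plain explode does not expose.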
Approach 2:
drop table test_jzl_20140701_test1;
Create the table:
create table test_jzl_20140701_test1
(
col1 string,
col2 string,
col3 array<int>
)
row format delimited
fields terminated by ' '
collection items terminated by ',' -- delimiter between array elements
stored as textfile;
Load the data:
load data local inpath '/home/jiangzl/shell/test.txt' into table test_jzl_20140701_test1;
View all rows in the table:
select * from test_jzl_20140701_test1;
a b [1,2,3]
c d [4,5,6]
Expand each element of the array into its own row:
select col1,col2,name
from test_jzl_20140701_test1
lateral view explode(col3) col3 as name;
a b 1
a b 2
a b 3
c d 4
c d 5
c d 6
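One caveat with the array-typed table: a plain lateral view drops any source row for which explode produces no output, for example a row whose array is empty. If such rows should be kept, LATERAL VIEW OUTER (available from Hive 0.12) emits them with NULL in the exploded column. A hedged sketch, with t and val as illustrative aliases:

```sql
-- Sketch only: same table as approach 2; t and val are illustrative aliases.
-- OUTER keeps rows whose col3 yields no elements; val is NULL for those rows.
select col1, col2, val
from test_jzl_20140701_test1
lateral view outer explode(col3) t as val;
```

With the sample data every array has three elements, so the result matches the plain lateral view; the difference only shows up once empty or NULL arrays appear.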
1.4 Additional notes:
select * from test_jzl_20140701_test;
a b 1,2,3
c d 4,5,6
select t.list[0],t.list[1],t.list[2] from (
select (split(col3,',')) list from test_jzl_20140701_test)t;
OK
1 2 3
4 5 6
-- Check the array length
select size(split(col3,',')) list from test_jzl_20140701_test;
3
3
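Keep in mind that split returns array<string>, so the elements above are strings even though they look like numbers. To do arithmetic on them, cast each element after exploding. The sketch below sums the values per (col1, col2) pair; the aliases t, val and total are illustrative names introduced here.

```sql
-- Sketch only: t, val and total are illustrative aliases.
-- split yields strings, so cast each element before aggregating numerically.
select col1, col2, sum(cast(val as int)) as total
from test_jzl_20140701_test
lateral view explode(split(col3, ',')) t as val
group by col1, col2;
```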
2. Column to row
2.1 Problem statement:
How do we use Hive to turn
a b 1
a b 2
a b 3
c d 4
c d 5
c d 6
into:
a b 1,2,3
c d 4,5,6
2.2 Source data:
test.txt
a b 1
a b 2
a b 3
c d 4
c d 5
c d 6
2.3 Solution:
drop table tmp_jiangzl_test;
Create the table:
create table tmp_jiangzl_test
(
col1 string,
col2 string,
col3 string
)
row format delimited fields terminated by '\t'
stored as textfile;
Load the data:
load data local inpath '/home/jiangzl/shell/test.txt' into table tmp_jiangzl_test;
Aggregate:
select col1,col2,concat_ws(',',collect_set(col3))
from tmp_jiangzl_test
group by col1,col2;
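Two properties of collect_set are worth knowing here: it removes duplicate values, and Hive does not guarantee the order of the collected elements. If duplicates must be kept or a stable order is needed, one possible variant is collect_list (Hive 0.13 or later) wrapped in sort_array. This is a sketch rather than the original author's method, and the alias col3_list is an illustrative name.

```sql
-- Sketch only: col3_list is an illustrative alias.
-- collect_list keeps duplicates; sort_array makes the element order deterministic.
-- col3 is a string column, so the sort is lexicographic ('10' sorts before '2').
select col1, col2, concat_ws(',', sort_array(collect_list(col3))) as col3_list
from tmp_jiangzl_test
group by col1, col2;
```

For the single-digit sample data, lexicographic and numeric order coincide; for multi-digit values a numeric sort would need the elements cast to int before sorting and back to string before concat_ws.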
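Row-to-column and column-to-row conversions are probably a must-ask topic in data team interviews; in practice, the question comes down to how to use these two functions, explode (with lateral view) and collect_set.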