Syntax
lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
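As a rough illustration of the grammar, the sketch below shows a UDTF that produces one column and two that produce several, which is where the optional extra columnAlias entries come in. The table events and its columns are hypothetical and not part of the original example; explode() and posexplode() are Hive built-ins (posexplode() assumes Hive 0.13 or later).

```sql
-- Hypothetical table: events(id INT, tags ARRAY<STRING>, props MAP<STRING,STRING>)

-- One output column: explode() on an array yields one row per element.
SELECT id, tag
FROM events
LATERAL VIEW explode(tags) t AS tag;

-- Two output columns: explode() on a map yields a key and a value per entry.
SELECT id, k, v
FROM events
LATERAL VIEW explode(props) p AS k, v;

-- Two output columns: posexplode() yields the 0-based position and the element.
SELECT id, pos, tag
FROM events
LATERAL VIEW posexplode(tags) t AS pos, tag;
```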
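Data preparation
Suppose we have a table pageAds with two columns: pageid, a STRING, and adid_list, a comma-separated list of ad IDs stored as ARRAY<INT>. In the source file the two fields are separated by a tab.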
mahao@ubuntu:~$ cat pageAds.txt
"front_page" 1,2,3
"contact_page" 3,4,5
hive> CREATE TABLE pageAds(pageid STRING,adid_list Array<INT>) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ',';
OK
Time taken: 3.458 seconds
hive> LOAD DATA LOCAL INPATH 'pageAds.txt' INTO TABLE pageAds;
Loading data to table default.pageads
OK
Time taken: 1.377 seconds
hive> SELECT * FROM pageAds;
OK
"front_page" [1,2,3]
"contact_page" [3,4,5]
Time taken: 2.127 seconds, Fetched: 2 row(s)
hive>
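Count how many times each ad ID appears.
First split the ad IDs into separate rows: explode() takes the column to split, and the AS clause names the column that holds the split values: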
hive> SELECT pageid,adid FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;
OK
"front_page" 1
"front_page" 2
"front_page" 3
"contact_page" 3
"contact_page" 4
"contact_page" 5
Group by adid and count:
hive> SELECT adid,count(1) FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid GROUP BY adid;
OK
1 1
2 1
3 2
4 1
5 1
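The table above stores adid_list as ARRAY<INT>. If the IDs were instead kept as a single comma-separated STRING, one possible approach is to build the array on the fly with split() before exploding it. The table pageAdsRaw and the alias adid_str below are hypothetical and only serve this sketch.

```sql
-- Hypothetical table: pageAdsRaw(pageid STRING, adid_list STRING), e.g. adid_list = '1,2,3'
-- split() returns ARRAY<STRING>, so each exploded element is cast back to INT.
SELECT CAST(adid_str AS INT) AS adid, count(1) AS cnt
FROM pageAdsRaw
LATERAL VIEW explode(split(adid_list, ',')) adTable AS adid_str
GROUP BY CAST(adid_str AS INT);
```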
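Multiple lateral view clauses
A FROM clause can be followed by more than one lateral view clause. A later lateral view can reference columns from any table that appears to its left, including the columns produced by earlier lateral views. For example:
Table data: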
hive> select * from baseTable;
OK
[1,2] ["'a'","'b'","'c'"]
[3,4] ["'d'","'e'","'f'"]
Two lateral view clauses:
hive> select mycol1,col2 from baseTable lateral view explode(col1) tb1 as mycol1
> lateral view explode(col2) tb2 as mycol2;
OK
1 ["'a'","'b'","'c'"]
1 ["'a'","'b'","'c'"]
1 ["'a'","'b'","'c'"]
2 ["'a'","'b'","'c'"]
2 ["'a'","'b'","'c'"]
2 ["'a'","'b'","'c'"]
3 ["'d'","'e'","'f'"]
3 ["'d'","'e'","'f'"]
3 ["'d'","'e'","'f'"]
4 ["'d'","'e'","'f'"]
4 ["'d'","'e'","'f'"]
4 ["'d'","'e'","'f'"]
Note that in the statement above the two lateral views are applied in the order in which they appear.
Reposted from http://yugouai.iteye.com/blog/1849902
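The two-view query selects the original array col2 rather than the exploded column mycol2, which is why each array is repeated in its output. The sketch below first selects mycol2 instead, so the pairing becomes visible, and then shows a later lateral view reading a column produced by an earlier one. The table nestedTable and the aliases inner_arr and elem are hypothetical.

```sql
-- Same baseTable as above: each element of col1 is paired with each element
-- of col2 from the same base row, so 2 x 3 = 6 rows per base row, 12 in total.
SELECT mycol1, mycol2
FROM baseTable
LATERAL VIEW explode(col1) tb1 AS mycol1
LATERAL VIEW explode(col2) tb2 AS mycol2;

-- Hypothetical table: nestedTable(nested ARRAY<ARRAY<INT>>)
-- The second lateral view explodes inner_arr, a column that exists only
-- because the first lateral view produced it.
SELECT inner_arr, elem
FROM nestedTable
LATERAL VIEW explode(nested) t1 AS inner_arr
LATERAL VIEW explode(inner_arr) t2 AS elem;
```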