Lateral View Syntax

Syntax

lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*  

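Preparing the Data

Suppose we have a table pageAds with two columns. The first is pageid, a STRING. The second is adid_list, an Array<INT> holding the page's ad IDs, which are comma-separated in the source file.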

mahao@ubuntu:~$ cat pageAds.txt 
"front_page"    1,2,3
"contact_page"  3,4,5


hive> CREATE TABLE pageAds(pageid STRING,adid_list Array<INT>) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ',';
OK
Time taken: 3.458 seconds
hive> LOAD DATA LOCAL INPATH 'pageAds.txt' INTO TABLE pageAds;
Loading data to table default.pageads
OK
Time taken: 1.377 seconds
hive> SELECT * FROM pageAds;
OK
"front_page"    [1,2,3]
"contact_page"  [3,4,5]
Time taken: 2.127 seconds, Fetched: 2 row(s)
hive> 
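
To make the two delimiters in the CREATE TABLE statement concrete, here is a minimal Python sketch of how one line of pageAds.txt maps to a row: the tab separates the fields and the comma separates the array items. The function name parse_line is hypothetical, the sketch assumes the two fields in the file are separated by a real tab character, and it models only these delimiters, not how Hive actually reads the file.

```python
def parse_line(line, field_sep="\t", item_sep=","):
    """Split one text line into (pageid, adid_list) using the table's delimiters."""
    pageid, raw_ids = line.rstrip("\n").split(field_sep)
    # The double quotes in the file are not stripped: they stay part of the string.
    adid_list = [int(item) for item in raw_ids.split(item_sep)]
    return pageid, adid_list


# The two lines of pageAds.txt, written with an explicit tab (assumed separator).
lines = ['"front_page"\t1,2,3\n', '"contact_page"\t3,4,5\n']
rows = [parse_line(line) for line in lines]
for pageid, adid_list in rows:
    print(pageid, adid_list)
```

As in the SELECT * output above, the quotes around each page name are kept as part of the pageid value.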

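The goal is to count how many times each ad ID appears.

First, split out the ad IDs. explode() takes the array column to expand into one row per element, and the AS clause names the resulting column: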

hive> SELECT pageid,adid FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;
OK
"front_page"    1
"front_page"    2
"front_page"    3
"contact_page"  3
"contact_page"  4
"contact_page"  5

Then group by adid and count:

hive> SELECT adid,count(1) FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid GROUP BY adid;
OK
1   1
2   1
3   2
4   1
5   1
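
The two queries above can be modeled in a few lines of Python. This is only a sketch of the semantics, not of how Hive executes the query: the table contents are hard-coded from the SELECT * output, and the function name lateral_view_explode is hypothetical.

```python
from collections import Counter

# Rows of pageAds as shown by SELECT * above (hard-coded for illustration).
page_ads = [
    ('"front_page"', [1, 2, 3]),
    ('"contact_page"', [3, 4, 5]),
]


def lateral_view_explode(rows):
    """Yield one (pageid, adid) pair per array element,
    modeling LATERAL VIEW explode(adid_list) adTable AS adid."""
    for pageid, adid_list in rows:
        for adid in adid_list:
            yield pageid, adid


exploded = list(lateral_view_explode(page_ads))

# GROUP BY adid with count(1): tally how many exploded rows carry each ad ID.
counts = Counter(adid for _, adid in exploded)
for adid in sorted(counts):
    print(adid, counts[adid])
```

Each source row is joined with the rows that explode() generates from its own array, which is why pageid is repeated once per ad ID before the grouping step.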

Multiple Lateral Views

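A FROM clause can be followed by more than one LATERAL VIEW clause, and each later LATERAL VIEW can reference the columns of any table or lateral view that appears before it. For example:

The table data: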

hive> select * from baseTable;
OK
[1,2]   ["'a'","'b'","'c'"]
[3,4]   ["'d'","'e'","'f'"]

Two LATERAL VIEW clauses:

hive> select mycol1,col2 from baseTable lateral view explode(col1) tb1 as mycol1
    > lateral view explode(col2) tb2 as mycol2;
OK
1   ["'a'","'b'","'c'"]
1   ["'a'","'b'","'c'"]
1   ["'a'","'b'","'c'"]
2   ["'a'","'b'","'c'"]
2   ["'a'","'b'","'c'"]
2   ["'a'","'b'","'c'"]
3   ["'d'","'e'","'f'"]
3   ["'d'","'e'","'f'"]
3   ["'d'","'e'","'f'"]
4   ["'d'","'e'","'f'"]
4   ["'d'","'e'","'f'"]
4   ["'d'","'e'","'f'"]

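Note that in the statement above, the two LATERAL VIEW clauses are applied in the order in which they appear. Because the query selects the original col2 column rather than the exploded mycol2, every output row shows the full array, repeated once for each element that the second explode() produced.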
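
The row multiplication can be modeled with nested loops. The following Python sketch is an illustration of the semantics only, with the table contents hard-coded from the output above and a hypothetical function name, two_lateral_views. It also carries mycol2 along, which the query above does not select.

```python
# Rows of baseTable as shown above: (col1, col2).
base_table = [
    ([1, 2], ["'a'", "'b'", "'c'"]),
    ([3, 4], ["'d'", "'e'", "'f'"]),
]


def two_lateral_views(rows):
    """Apply explode(col1) and then explode(col2), in that order, to every row."""
    for col1, col2 in rows:
        for mycol1 in col1:        # first LATERAL VIEW: tb1 AS mycol1
            for mycol2 in col2:    # second LATERAL VIEW: tb2 AS mycol2
                yield mycol1, mycol2, col2


result = list(two_lateral_views(base_table))
for mycol1, mycol2, col2 in result:
    print(mycol1, mycol2, col2)
```

Each base row with 2 elements in col1 and 3 elements in col2 yields 2 x 3 = 6 rows, so the two base rows give the 12 rows seen in the output.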
Reposted from http://yugouai.iteye.com/blog/1849902
