Hive lateral view and explode explained

1.explode

The Hive wiki describes explode as follows:

explode() takes in an array (or a map) as an input and outputs the elements of the array (map) as separate rows. UDTFs can be used in the SELECT expression list and as a part of LATERAL VIEW.

As an example of using explode() in the SELECT expression list, consider a table named myTable that has a single column (myCol) and two rows:

[Image not preserved: the two rows of myTable]

Then running the query:

<code class="hljs sql">SELECT explode(myCol) AS myNewCol FROM myTable;</code>

will produce:
[Image not preserved: query output]

The usage with Maps is similar:

<code class="hljs sql">SELECT explode(myMap) AS (myMapKey, myMapValue) FROM myMapTable;</code>

In one sentence: explode takes a complex array or map held in a single Hive row and splits it into multiple rows.
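
The row-multiplying behaviour can be modelled outside Hive. The following is a minimal Python sketch, not Hive code: the function names `explode_array` and `explode_map` and the sample values are assumptions made purely for illustration.

```python
def explode_array(rows):
    """Model explode() on an array column: one output row per element."""
    return [element for array in rows for element in array]


def explode_map(rows):
    """Model explode() on a map column: one (key, value) row per entry."""
    return [(key, value) for mapping in rows for key, value in mapping.items()]


# Hypothetical sample data: two input rows, each holding one array / one map.
array_rows = [[100, 200, 300], [400, 500, 600]]
map_rows = [{"a": 1, "b": 2}, {"c": 3}]

exploded_array = explode_array(array_rows)
exploded_map = explode_map(map_rows)

print(exploded_array)  # [100, 200, 300, 400, 500, 600]
print(exploded_map)    # [('a', 1), ('b', 2), ('c', 3)]
```

Two input rows become six output rows for the array case, and a map yields two columns per output row, which is why the map query above needs two aliases.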

A usage example:
Table xxx has a field mvt of type string, with data in the following format:

[{"eid":"38","ex":"affirm_time_Android","val":"1","vid":"31","vr":"var1"},{"eid":"42","ex":"new_comment_Android","val":"1","vid":"34","vr":"var1"},{"eid":"40","ex":"new_rpname_Android","val":"1","vid":"1","vr":"var1"},{"eid":"19","ex":"hotellistlpage_Android","val":"1","vid":"1","vr":"var01"},{"eid":"29","ex":"bookhotelpage_Android","val":"0","vid":"1","vr":"var01"},{"eid":"17","ex":"trainMode_Android","val":"1","vid":"1","vr":"mode_Android"},{"eid":"44","ex":"ihotelList_Android","val":"1","vid":"36","vr":"var1"},{"eid":"47","ex":"ihotelDetail_Android","val":"0","vid":"38","vr":"var1"}]

A first attempt with explode:

<code class="hljs sql">select explode(split(regexp_replace(mvt,'\\[|\\]',''),'\\},\\{')) from ods_mvt_hourly where day=20160710 limit 10;</code>

The result looks like this:
{"eid":"38","ex":"affirm_time_Android","val":"1","vid":"31","vr":"var1"
"eid":"42","ex":"new_comment_Android","val":"1","vid":"34","vr":"var1"
"eid":"40","ex":"new_rpname_Android","val":"1","vid":"1","vr":"var1"
"eid":"19","ex":"hotellistlpage_Android","val":"1","vid":"1","vr":"var01"
"eid":"29","ex":"bookhotelpage_Android","val":"0","vid":"1","vr":"var01"
"eid":"17","ex":"trainMode_Android","val":"1","vid":"1","vr":"mode_Android"
"eid":"44","ex":"ihotelList_Android","val":"1","vid":"36","vr":"var1"
"eid":"47","ex":"ihotelDetail_Android","val":"0","vid":"38","vr":"var1"}
{"eid":"38","ex":"affirm_time_Android","val":"1","vid":"31","vr":"var1"
"eid":"42","ex":"new_comment_Android","val":"1","vid":"34","vr":"var1"
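
The missing braces in this output come from the split step itself: the delimiter `},{` is consumed, so only the first fragment keeps its `{` and only the last keeps its `}`. A minimal Python sketch of the same two string operations follows; it is not Hive code, and the shortened `mvt` value is a hypothetical sample.

```python
import re

# Shortened, hypothetical sample of the mvt string (three elements only).
mvt = '[{"eid":"38","val":"1"},{"eid":"42","val":"1"},{"eid":"40","val":"0"}]'

# Counterpart of regexp_replace(mvt, '\\[|\\]', ''): drop the outer brackets.
without_brackets = re.sub(r"\[|\]", "", mvt)

# Counterpart of split(..., '\\},\\{'): the delimiter "},{" is removed,
# so the braces between neighbouring elements disappear.
pieces = re.split(r"\},\{", without_brackets)

for piece in pieces:
    print(piece)
```

The middle fragment ends up with no braces at all, so none of the three pieces is valid JSON on its own.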

2.lateral view

The Hive wiki explains it as follows:

Lateral View Syntax

lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
fromClause: FROM baseTable (lateralView)*

Description

Lateral view is used in conjunction with user-defined table generating functions such as explode(). As mentioned in Built-in Table-Generating Functions, a UDTF generates zero or more output rows for each input row. A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.

Example

Consider the following base table named pageAds. It has two columns: pageid (name of the page) and adid_list (an array of ads appearing on the page)
[Image not preserved: column definitions of pageAds]

An example table with two rows:
[Image not preserved: the two example rows of pageAds]

and the user would like to count the total number of times an ad appears across all pages.
A lateral view with explode() can be used to convert adid_list into separate rows using the query:

<code class="hljs sql">SELECT pageid, adid
FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;</code>

The resulting output will be
[Image not preserved: query output]

Then in order to count the number of times a particular ad appears, count/group by can be used:

<code class="hljs sql">SELECT adid, count(1)
FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid
GROUP BY adid;</code>

The resulting output will be
[Image not preserved: query output]

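As this shows, lateral view and UDTFs such as explode are natural partners: explode splits one row holding a complex structure into multiple rows, and lateral view joins those rows back to the source row so that they can be selected alongside other columns and aggregated.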
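
The join-then-aggregate behaviour can be modelled in a few lines. This is a minimal Python sketch rather than Hive code; the function name `lateral_view_explode` and the sample rows are assumptions chosen to match the shape of pageAds (a page name plus a list of ad ids).

```python
from collections import Counter

# Hypothetical sample rows for pageAds: (pageid, adid_list).
page_ads = [
    ("front_page", [1, 2, 3]),
    ("contact_page", [3, 4, 5]),
]


def lateral_view_explode(rows):
    """Pair every base row with each element its array produces."""
    return [(pageid, adid) for pageid, adid_list in rows for adid in adid_list]


# Counterpart of: SELECT pageid, adid FROM pageAds LATERAL VIEW explode(...)
joined = lateral_view_explode(page_ads)

# Counterpart of: SELECT adid, count(1) ... GROUP BY adid
ad_counts = Counter(adid for _, adid in joined)

print(joined)
print(sorted(ad_counts.items()))
```

In this model a page whose list is empty contributes no rows, which is consistent with the wiki's statement that a UDTF generates zero or more output rows for each input row.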

3.Example

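Back to the example from part 1: the rows that explode produced above are not valid JSON. Combining lateral view with explode, and putting the missing braces back, yields well-formed JSON: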

<code class="hljs sql">SELECT ecrd,
       CASE
         WHEN instr(mvtstr,'{')=0 AND instr(mvtstr,'}')=0 THEN concat('{',mvtstr,'}')
         WHEN instr(mvtstr,'{')=0 AND instr(mvtstr,'}')>0 THEN concat('{',mvtstr)
         WHEN instr(mvtstr,'}')=0 AND instr(mvtstr,'{')>0 THEN concat(mvtstr,'}')
         ELSE mvtstr
       END AS mvt
FROM ods.ods_mvt_hourly
LATERAL VIEW explode(split(regexp_replace(mvt,'\\[|\\]',''),'\\},\\{')) addTable AS mvtstr
WHERE day='20160710' AND ecrd IS NOT NULL
LIMIT 10</code>

The query returns:
xxx
{"eid":"38","ex":"affirm_time_Android","val":"1","vid":"31","vr":"var1"}
xxx
{"eid":"42","ex":"new_comment_Android","val":"1","vid":"34","vr":"var1"}
xxx
{"eid":"40","ex":"new_rpname_Android","val":"1","vid":"1","vr":"var1"}
xxx
{"eid":"19","ex":"hotellistlpage_Android","val":"1","vid":"1","vr":"var01"}
xxx
{"eid":"29","ex":"bookhotelpage_Android","val":"0","vid":"1","vr":"var01"}
xxx
{"eid":"17","ex":"trainMode_Android","val":"1","vid":"1","vr":"mode_Android"}
xxx
{"eid":"44","ex":"ihotelList_Android","val":"1","vid":"36","vr":"var1"}
xxx
{"eid":"47","ex":"ihotelDetail_Android","val":"1","vid":"38","vr":"var1"}
xxx
{"eid":"38","ex":"affirm_time_Android","val":"1","vid":"31","vr":"var1"}
xxx
{"eid":"42","ex":"new_comment_Android","val":"1","vid":"34","vr":"var1"}
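
The brace-repair logic of the CASE expression can be checked in isolation. The following minimal Python sketch mirrors its four branches and then parses each result to confirm it is valid JSON; it is not Hive code, and the function name `repair_braces` and the shortened fragments are hypothetical.

```python
import json


def repair_braces(fragment):
    """Mirror the CASE expression: add whichever brace is missing."""
    has_open = "{" in fragment    # instr(mvtstr,'{') > 0
    has_close = "}" in fragment   # instr(mvtstr,'}') > 0
    if not has_open and not has_close:
        return "{" + fragment + "}"
    if not has_open:
        return "{" + fragment
    if not has_close:
        return fragment + "}"
    return fragment


# Hypothetical fragments shaped like the output of the split step.
fragments = [
    '{"eid":"38","val":"1"',
    '"eid":"42","val":"1"',
    '"eid":"40","val":"0"}',
]

repaired = [repair_braces(fragment) for fragment in fragments]

# json.loads raises an error if any repaired string is not valid JSON.
parsed = [json.loads(text) for text in repaired]

for text in repaired:
    print(text)
```

Like the SQL version, this only tests for the presence of a brace anywhere in the fragment, so it assumes flat objects whose values contain no `{` or `}` characters, as in the sample data.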
