Introduction to Common Hive UDF and UDTF Functions: lateral view explode()

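Preface:

Hive is a data warehouse built on Hadoop MapReduce that lets you query data with HQL. This post only gives a quick overview of the UDFs most often used in Hive. For a complete and detailed reference, see the official wiki: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF.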
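Definitions:

UDF (User-Defined Function): a user-defined function that processes data one row at a time, producing one output value per input row.

UDTF (User-Defined Table-Generating Function): handles cases where one input row must produce multiple output rows (a one-to-many mapping).

UDAF (User-Defined Aggregation Function): a user-defined aggregate function that operates on multiple rows and produces a single row.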
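Usage:

1. A UDF can be used directly in a SELECT statement; it formats the query results before they are output.

2. When writing a UDF, keep the following in mind:

a) A custom UDF must extend org.apache.hadoop.hive.ql.exec.UDF.

b) It must implement an evaluate method.

c) evaluate can be overloaded.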

Hive built-in functions:
1. String length function: length
Syntax: length(string A)
Return value: int
Description: returns the length of string A
Example:

hive> select length('abcedfg') from dual;
7

2. String reverse function: reverse
Syntax: reverse(string A)
Return value: string
Description: returns string A reversed

Example:


hive> select reverse('abcedfg') from dual;
gfdecba

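3. String concatenation function: concat
Syntax: concat(string A, string B…)
Return value: string
Description: returns the concatenation of the input strings; any number of input strings is supported

Example: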


hive> select concat('abc','def','gh') from dual;
abcdefgh

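4. String concatenation with a separator: concat_ws

Syntax: concat_ws(string SEP, string A, string B…)
Return value: string
Description: returns the concatenation of the input strings, with SEP placed between each pair of strings
Example: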

hive> select concat_ws(',','abc','def','gh') from dual;
abc,def,gh
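concat and concat_ws also differ in how they treat NULL. As far as I know, concat returns NULL as soon as any argument is NULL, while concat_ws simply skips NULL arguments (and returns NULL only when the separator itself is NULL). The sketch below is a minimal Python model of that behavior, not Hive's implementation; None stands in for SQL NULL, and concat_ws's array<string> form is not modeled.

```python
from typing import Optional


def concat(*args: Optional[str]) -> Optional[str]:
    """Model of Hive concat: any NULL argument makes the whole result NULL."""
    if any(a is None for a in args):
        return None
    return "".join(args)


def concat_ws(sep: Optional[str], *args: Optional[str]) -> Optional[str]:
    """Model of Hive concat_ws: NULL arguments are skipped, a NULL separator gives NULL."""
    if sep is None:
        return None
    return sep.join(a for a in args if a is not None)


print(concat("abc", "def", "gh"))          # abcdefgh
print(concat("abc", None, "gh"))           # None
print(concat_ws(",", "abc", "def", "gh"))  # abc,def,gh
print(concat_ws(",", "abc", None, "gh"))   # abc,gh
```

In practice this means concat_ws is the safer choice when some columns may be NULL.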

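5. Substring functions: substr, substring

Syntax: substr(string A, int start [, int len]), substring(string A, int start [, int len])
Return value: string
Description: returns the part of string A from position start to the end (or len characters when len is given). Positions start at 1; a negative start counts back from the end of the string.
Example: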

hive> select substr('abcde',3) from dual;
cde
hive> select substring('abcde',3) from dual;
cde
hive> select substr('abcde',-1) from dual;   (same as Oracle)
e
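The position rules are easy to get wrong, so here is a minimal Python model of substr/substring as I understand Hive's behavior: start is 1-based, a negative start counts back from the end, start = 0 behaves like start = 1, and a start beyond the string length gives an empty string. hive_substr is a hypothetical helper name, and the model is a sketch to check against your own Hive version.

```python
from typing import Optional


def hive_substr(s: str, start: int, length: Optional[int] = None) -> str:
    """Model of Hive substr/substring: 1-based start, negative start counts from the end."""
    n = len(s)
    if length is None:
        length = n  # no length given: take everything up to the end
    if length <= 0 or abs(start) > n:
        return ""  # non-positive length or out-of-range start gives an empty string
    if start > 0:
        begin = start - 1  # 1-based position -> 0-based index
    elif start < 0:
        begin = n + start  # -1 is the last character
    else:
        begin = 0  # start = 0 behaves like start = 1
    return s[begin:begin + length]


print(hive_substr("abcde", 3))     # cde
print(hive_substr("abcde", -1))    # e
print(hive_substr("abcde", 0, 2))  # ab
print(hive_substr("abcde", 1, 2))  # ab
```

The last two calls show why code written with 0-based habits (for example substring(x, 0, 10)) still works for the first field but shifts every later offset by one.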

6. String case conversion

Upper-case functions: upper, ucase

Lower-case functions: lower, lcase

Syntax: lower(string A), lcase(string A)
Return value: string
Description: returns string A in lower case
Example:

hive> select lower('abSEd') from dual;
absed
hive> select lcase('abSEd') from dual;
absed

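7. Trim functions

Left trim: ltrim

Right trim: rtrim

Syntax: ltrim(string A), rtrim(string A)
Return value: string
Description: returns string A with the leading (ltrim) or trailing (rtrim) spaces removed

8. Regular-expression replace function: regexp_replace

Syntax: regexp_replace(string A, string B, string C)
Return value: string
Description: replaces every part of string A that matches the Java regular expression B with C. Note that escape characters are required in some cases.
Example: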

hive> select regexp_replace('foobar', 'oo|ar', '') from dual;
fb
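A minimal Python model of regexp_replace, assuming Python's re module behaves like Java's regex engine for simple patterns such as these. It shows that every match is replaced and why escaping matters: an unescaped . matches any character. One real difference to keep in mind: Hive (Java) refers to capture groups in the replacement as $1, while Python uses \1, so this model is only faithful for plain replacement strings.

```python
import re
from typing import Optional


def regexp_replace(s: Optional[str], pattern: Optional[str],
                   replacement: Optional[str]) -> Optional[str]:
    """Model of Hive regexp_replace: every match of pattern in s is replaced."""
    if s is None or pattern is None or replacement is None:
        return None
    return re.sub(pattern, replacement, s)


print(regexp_replace("foobar", "oo|ar", ""))  # fb
# A bare "." is a wildcard, so every character is replaced:
print(regexp_replace("a.b.c", ".", "-"))      # -----
# Escaping the dot matches only literal dots (the regex \. in both engines):
print(regexp_replace("a.b.c", r"\.", "-"))    # a-b-c
```

Inside a HiveQL string literal the backslash itself usually has to be doubled, so the escaped pattern above is typically written '\\.' in the query.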

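9. Regular-expression extraction function: regexp_extract

Syntax: regexp_extract(string subject, string pattern, int index)
Return value: string
Description: matches string subject against the regular expression pattern and returns the group selected by index (0 returns the whole match, 1 the first capture group, and so on). Note that escape characters are required in some cases.
Example: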

hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 1) from dual;
the
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 2) from dual;
bar
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 0) from dual;
foothebar
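A minimal Python model of regexp_extract: the pattern is searched for anywhere in the string, and index selects the group (0 for the whole match). Returning an empty string when nothing matches is my assumption about Hive's behavior, and it is labeled as such in the code.

```python
import re
from typing import Optional


def regexp_extract(subject: Optional[str], pattern: Optional[str],
                   index: int) -> Optional[str]:
    """Model of Hive regexp_extract: return group `index` of the first match."""
    if subject is None or pattern is None:
        return None
    m = re.search(pattern, subject)  # first match anywhere in the string
    if m is None:
        return ""  # assumption: Hive returns an empty string when nothing matches
    return m.group(index)  # 0 = whole match, 1..n = capture groups


print(regexp_extract("foothebar", "foo(.*?)(bar)", 1))  # the
print(regexp_extract("foothebar", "foo(.*?)(bar)", 2))  # bar
print(regexp_extract("foothebar", "foo(.*?)(bar)", 0))  # foothebar
```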

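10. URL parsing functions: parse_url, parse_url_tuple (a UDTF)

Syntax: parse_url(string urlString, string partToExtract [, string keyToExtract]). parse_url_tuple works like parse_url(), but it can extract several parts at once and return them together.
Return value: string (parse_url_tuple returns one column per requested part)
Description: returns the specified part of the URL. Valid values of partToExtract are HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO; keyToExtract is used with QUERY to pick out a single query parameter.
Example: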

hive> select parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') from dual;
facebook.com  
hive> select parse_url_tuple('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY:k1', 'QUERY:k2');  
v1 v2
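To make the part names concrete, here is a simplified Python model of parse_url and parse_url_tuple built on urllib.parse. It covers only HOST, PATH, PROTOCOL, REF, and QUERY (with or without a key). The key lookup uses a (^|&)key=value regex, which is my assumption about how Hive extracts query parameters. Note that urlsplit lowercases the host name.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit


def parse_url(url: str, part: str, key: Optional[str] = None) -> Optional[str]:
    """Simplified model of Hive parse_url for a subset of partToExtract values."""
    u = urlsplit(url)
    if part == "HOST":
        return u.hostname  # note: urlsplit lowercases the host
    if part == "PATH":
        return u.path
    if part == "PROTOCOL":
        return u.scheme
    if part == "REF":
        return u.fragment or None  # no fragment -> NULL
    if part == "QUERY":
        if key is None:
            return u.query or None  # the whole query string
        # Assumed lookup rule: value of the first "key=value" pair in the query
        m = re.search(r"(?:^|&)" + re.escape(key) + r"=([^&]*)", u.query)
        return m.group(1) if m else None
    raise ValueError("part not covered by this sketch: " + part)


def parse_url_tuple(url: str, *parts: str) -> Tuple[Optional[str], ...]:
    """Model of parse_url_tuple: 'QUERY:k1' means parse_url(url, 'QUERY', 'k1')."""
    results = []
    for spec in parts:
        name, _, key = spec.partition(":")
        results.append(parse_url(url, name, key or None))
    return tuple(results)


url = "http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1"
print(parse_url(url, "HOST"))                        # facebook.com
print(parse_url(url, "QUERY", "k1"))                 # v1
print(parse_url_tuple(url, "QUERY:k1", "QUERY:k2"))  # ('v1', 'v2')
```

Because parse_url_tuple returns several columns from one call, it is usually paired with lateral view rather than called once per part.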

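11. JSON parsing function: get_json_object

Syntax: get_json_object(string json_string, string path)
Return value: string
Description: parses the JSON string json_string and returns the content specified by path. Returns NULL if the input JSON string is invalid.
Example: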

hive> select get_json_object('{"store":
>   {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
>    "bicycle":{"price":19.95,"color":"red"}
>   },
>  "email":"amy@only_for_json_udf_test.net",
>  "owner":"amy"
> }
> ','$.owner') from dual;
amy
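A simplified Python model of get_json_object that supports only $ followed by .name and [index] steps (Hive's own path syntax supports more, such as the * wildcard). Strings are returned as-is, other values as compact JSON text, and invalid JSON or a missing path gives None (NULL). This is a sketch for understanding path lookup, not Hive's parser.

```python
import json
import re
from typing import Any, Optional


def get_json_object(json_string: Optional[str], path: Optional[str]) -> Optional[str]:
    """Simplified model of get_json_object for paths like $.a.b[0].c."""
    if json_string is None or path is None or not path.startswith("$"):
        return None
    try:
        node: Any = json.loads(json_string)
    except ValueError:
        return None  # invalid JSON -> NULL
    # Each step is either ".name" or "[index]"
    for name, index in re.findall(r"\.([^.\[\]]+)|\[(\d+)\]", path[1:]):
        if name:
            if not isinstance(node, dict) or name not in node:
                return None
            node = node[name]
        else:
            i = int(index)
            if not isinstance(node, list) or i >= len(node):
                return None
            node = node[i]
    if node is None:
        return None
    if isinstance(node, str):
        return node
    return json.dumps(node, separators=(",", ":"))  # numbers/objects as JSON text


doc = ('{"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],'
       '"bicycle":{"price":19.95,"color":"red"}},'
       '"email":"amy@only_for_json_udf_test.net","owner":"amy"}')
print(get_json_object(doc, "$.owner"))                # amy
print(get_json_object(doc, "$.store.fruit[0].type"))  # apple
print(get_json_object(doc, "$.store.bicycle.price"))  # 19.95
```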

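12. Set lookup function: find_in_set

Syntax: find_in_set(string str, string strList)
Return value: int
Description: returns the position at which str first appears in strList, where strList is a comma-separated string. Returns 0 if str is not found (the list must be separated by commas; otherwise 0 is returned).
Example: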

hive> select find_in_set('ab','ef,ab,de') from dual;
2
hive> select find_in_set('at','ef,ab,de') from dual;
0
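A minimal Python model of find_in_set. Per the Hive wiki, it returns NULL if either argument is NULL and 0 if the first argument contains a comma; the last call shows why a list separated by anything other than commas yields 0.

```python
from typing import Optional


def find_in_set(s: Optional[str], str_list: Optional[str]) -> Optional[int]:
    """Model of Hive find_in_set: 1-based position of s in a comma-separated list, 0 if absent."""
    if s is None or str_list is None:
        return None  # NULL in, NULL out
    if "," in s:
        return 0  # a search string containing a comma can never match
    for position, item in enumerate(str_list.split(","), start=1):
        if item == s:
            return position
    return 0


print(find_in_set("ab", "ef,ab,de"))  # 2
print(find_in_set("at", "ef,ab,de"))  # 0
print(find_in_set("ab", "ef;ab;de"))  # 0, because ';' is not a separator
```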

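13. Turning one row into multiple rows: explode (posexplode is available as of Hive 0.13.0)

Description: converts an array or map in a single input row into multiple output rows
Syntax: explode(array (or map))
Example: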


hive> select explode(split(concat_ws(',','1','2','3','4','5','6','7','8','9'),',')) from test.dual;  
1  
2  
3  
4  
5  
6  
7  
8  
9
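A minimal Python model of explode and posexplode, with Python lists and dicts standing in for Hive arrays and maps: an array yields one row per element, a map yields one (key, value) row per entry, and posexplode adds a 0-based position. Note that Hive's split takes a regular expression, while the Python str.split used here splits on a literal string.

```python
from typing import Any, Dict, Iterator, List, Tuple, Union


def explode(value: Union[List[Any], Dict[Any, Any]]) -> Iterator[Any]:
    """Model of explode: one row per array element, or one (key, value) row per map entry."""
    if isinstance(value, dict):
        yield from value.items()
    else:
        yield from value


def posexplode(values: List[Any]) -> Iterator[Tuple[int, Any]]:
    """Model of posexplode: like explode on an array, plus a 0-based position column."""
    yield from enumerate(values)


# Like split(concat_ws(',', '1', ..., '9'), ',') in the query above
row = ",".join(str(i) for i in range(1, 10)).split(",")
for item in explode(row):
    print(item)  # prints 1 through 9, one per line
```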

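14. Multi-row transformation: lateral view

Description: lateral view is used together with UDTFs such as json_tuple, parse_url_tuple, and explode (explode is often applied to the array returned by split). It splits one row into multiple rows, and the resulting rows can then be aggregated.
Example: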


hive> select s.x,sp from test.dual s lateral view explode(split(concat_ws(',','1','2','3','4','5','6','7','8','9'),',')) t as sp;  
x sp  
a 1  
b 2  
a 3

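To explain: after FROM comes your table name (here test.dual, aliased as s). After the table name you add lateral view explode(...) containing your row-splitting expression, and you must give it aliases: a table alias (t here) and an alias for the generated column (sp here). In the select list, s.x refers to a column of the original table; my table has only this one column, x. Each row of s is paired with every value that explode produces for it.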
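The pairing rule is easiest to see in code. Below is a minimal Python model of LATERAL VIEW, assuming a hypothetical two-row table with a single column x (the real contents of test.dual are not given here) and a three-element list instead of nine for brevity. Each input row is combined with every value the UDTF returns for it, and a row for which the UDTF returns nothing is dropped, which is the case LATERAL VIEW OUTER exists to handle.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Row = Dict[str, Any]


def lateral_view(rows: Iterable[Row], udtf: Callable[[Row], Iterable[Any]],
                 col_alias: str) -> Iterator[Row]:
    """Model of LATERAL VIEW udtf(...) t AS col_alias (non-OUTER)."""
    for row in rows:
        for value in udtf(row):
            yield {**row, col_alias: value}  # original columns plus the generated one


# Hypothetical two-row table standing in for test.dual
dual = [{"x": "a"}, {"x": "b"}]
result = list(lateral_view(dual, lambda r: "1,2,3".split(","), "sp"))
for r in result:
    print(r["x"], r["sp"])  # prints a 1, a 2, a 3, b 1, b 2, b 3 (one pair per line)
```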

A query with multiple lateral views looks like this:


SELECT * FROM exampleTable LATERAL VIEW explode(col1) myTable1 AS myCol1 LATERAL VIEW explode(myCol1) myTable2 AS myCol2;
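Chained lateral views are applied left to right, so each one multiplies the rows produced by the previous one. A minimal Python model of the query above, assuming (hypothetically) that col1 is an array of arrays so that explode(myCol1) is valid:

```python
from typing import Any, Dict, Iterator, List


def chained_lateral_views(rows: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Model of:
    SELECT * FROM exampleTable
      LATERAL VIEW explode(col1) myTable1 AS myCol1
      LATERAL VIEW explode(myCol1) myTable2 AS myCol2
    """
    for row in rows:
        for my_col1 in row["col1"]:  # first lateral view: one row per inner array
            for my_col2 in my_col1:  # second lateral view: one row per element of it
                yield {**row, "myCol1": my_col1, "myCol2": my_col2}


# Hypothetical one-row exampleTable whose col1 is [[1, 2], [3]]
example_table = [{"col1": [[1, 2], [3]]}]
for r in chained_lateral_views(example_table):
    print(r["myCol1"], r["myCol2"])  # [1, 2] 1 / [1, 2] 2 / [3] 3
```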

Example of extracting one row of data into multiple columns of a new table:

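http_referer holds the captured request path including its parameters, in which illegal characters were escaped with \. The path is parsed into the host, path, query string, and so on, and the results are stored in a new table: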


drop table if exists t_ods_tmp_referurl;  
create table t_ods_tmp_referurl as
SELECT a.*,b.*  
FROM ods_origin_weblog a LATERAL VIEW parse_url_tuple(regexp_replace(http_referer, "\"", ""), 'HOST', 'PATH','QUERY', 'QUERY:id') b as host, path, query, query_id;

Copy the table and cut the timestamp down to the day:


-- Hive substring positions are 1-based (0 behaves like 1); offsets assume time_local looks like 'yyyy-MM-dd HH:mm:ss'
drop table if exists t_ods_tmp_detail;
create table t_ods_tmp_detail as
select b.*,substring(time_local,1,10) as daystr,
substring(time_local,12) as tmstr,
substring(time_local,6,2) as month,
substring(time_local,9,2) as day,
substring(time_local,12,2) as hour
from t_ods_tmp_referurl b;
