Hive Built-in Table-Generating Functions (UDTF): explode, posexplode, json_tuple, parse_url_tuple, stack

Original

涤生手记

2020-07-07 03:36

0. Hive built-in table-generating functions

Built-in Table-Generating Functions (UDTF)

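An ordinary user-defined function, such as concat(), takes a single input row and produces a single output row. A table-generating function, in contrast, transforms a single input row into multiple output rows.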

Row-set column types	Name(Signature)	Description
T	explode(ARRAY<T> a)	Explodes an array to multiple rows. Returns a row-set with a single column (col), one row for each element from the array.
Tkey,Tvalue	explode(MAP<Tkey,Tvalue> m)	Explodes a map to multiple rows. Returns a row-set with two columns (key,value), one row for each key-value pair from the input map. (As of Hive 0.8.0.)
int,T	posexplode(ARRAY<T> a)	Explodes an array to multiple rows with an additional positional column of int type (position of items in the original array, starting with 0). Returns a row-set with two columns (pos,val), one row for each element from the array.
T1,...,Tn	inline(ARRAY<STRUCT<f1:T1,...,fn:Tn>> a)	Explodes an array of structs to multiple rows. Returns a row-set with N columns (N = number of top-level elements in the struct), one row per struct from the array. (As of Hive 0.10.)
T1,...,Tn/r	stack(int r,T1 V1,...,Tn/r Vn)	Breaks up n values V1,...,Vn into r rows. Each row will have n/r columns. r must be constant.
string1,...,stringn	json_tuple(string jsonStr,string k1,...,string kn)	Takes a JSON string and a set of n keys, and returns a tuple of n values. This is a more efficient version of the `get_json_object` UDF because it can get multiple keys with just one call.
string1,...,stringn	parse_url_tuple(string urlStr,string p1,...,string pn)	Takes a URL string and a set of n URL parts, and returns a tuple of n values. This is similar to the `parse_url()` UDF but can extract multiple parts at once out of a URL. Valid part names are: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO, QUERY:<KEY>.

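1. Examples

1.1 explode, usually used together with lateral view

It is typically used to expand an array or map column so that each element (or key-value pair) becomes its own row.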

select explode(array('A','B','C'));
select explode(array('A','B','C')) as col;
select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf;
select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf as col;

select explode(map('A',10,'B',20,'C',30));
select explode(map('A',10,'B',20,'C',30)) as (key,value);
select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf;
select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf as key,value;
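The queries above can be hard to picture without a Hive session, so here is a minimal Python sketch of what explode and lateral view do to rows. It is only a model of the semantics described in the table, not Hive's implementation; the `pages` rows and the helper names `explode` and `lateral_view` are hypothetical and exist only for this illustration.

```python
def explode(collection):
    """Yield one output row (a tuple) per array element or per map entry."""
    if isinstance(collection, dict):
        for key, value in collection.items():
            yield (key, value)   # two columns: key, value
    else:
        for item in collection:
            yield (item,)        # single column: col


def lateral_view(rows, column, udtf):
    """Join each input row with the rows the UDTF generates from one column."""
    for row in rows:
        for generated in udtf(row[column]):
            yield tuple(row.values()) + generated


# Hypothetical input table: one row per page, with an array column "ads".
pages = [
    {"page": "home", "ads": ["A", "B"]},
    {"page": "cart", "ads": ["C"]},
]

array_rows = list(explode(["A", "B", "C"]))
map_rows = list(explode({"A": 10, "B": 20, "C": 30}))
joined = [(page, ad) for page, _, ad in lateral_view(pages, "ads", explode)]

print(array_rows)
print(map_rows)
print(joined)
```

In this model a row whose array is empty produces no output at all, which is why the nested loop is a reasonable mental picture for a lateral view.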

1.2 Using posexplode


select posexplode(array('A','B','C'));
select posexplode(array('A','B','C')) as (pos,val);
select tf.* from (select 0) t lateral view posexplode(array('A','B','C')) tf;
select tf.* from (select 0) t lateral view posexplode(array('A','B','C')) tf as pos,val;
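As a rough illustration of the extra position column, the following Python sketch models posexplode with `enumerate`. The function name simply mirrors the Hive one; this is an assumption-level model of the behaviour described in the table, not Hive code.

```python
def posexplode(array):
    """Yield (pos, val) rows; pos is the zero-based index in the original array."""
    for pos, val in enumerate(array):
        yield (pos, val)


rows = list(posexplode(["A", "B", "C"]))
print(rows)
```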

1.3 Using inline

select inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02')));
select inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) as (col1,col2,col3);
select tf.* from (select 0) t lateral view inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) tf;
select tf.* from (select 0) t lateral view inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) tf as col1,col2,col3;
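To make the difference from explode concrete, this small Python sketch models each struct as a tuple (an assumption made only for this illustration). Exploding the array would give one column that holds the whole struct, whereas inline spreads the struct's fields across separate columns.

```python
from datetime import date

# Structs modelled as plain tuples, matching the values used in the queries above.
structs = [("A", 10, date(2015, 1, 1)), ("B", 20, date(2016, 2, 2))]

# explode-style result: one row per struct, a single column holding the struct.
exploded = [(s,) for s in structs]

# inline-style result: one row per struct, one column per struct field.
inlined = [tuple(s) for s in structs]

print(len(exploded[0]), len(inlined[0]))
```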

1.4 Using stack

select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01');
select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') as (col0,col1,col2);
select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') tf;
select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') tf as col0,col1,col2;
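The "n values into r rows of n/r columns" rule can be sketched in a few lines of Python. This is a simplified model: it assumes n is divisible by r and raises an error otherwise, which is a choice made for the sketch rather than a statement about how Hive treats that case.

```python
from datetime import date


def stack(r, *values):
    """Split the values into r rows of len(values) / r columns, filled row by row."""
    if r <= 0 or len(values) % r != 0:
        raise ValueError("this sketch only handles n divisible by a positive r")
    width = len(values) // r
    return [tuple(values[i * width:(i + 1) * width]) for i in range(r)]


rows = stack(2, "A", 10, date(2015, 1, 1), "B", 20, date(2016, 1, 1))
print(rows)
```

Six values with r = 2 give two rows of three columns each, matching the col0, col1, col2 aliases in the queries above.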

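1.5 Using json_tuple

Hive 0.7 introduced json_tuple(), which is a UDTF. It takes a JSON string and a set of keys, and returns a tuple of values from a single function call. This is much more efficient than calling get_json_object repeatedly to retrieve several keys from the same JSON string. In effect it is an enhanced get_json_object that returns multiple values at once, packaged as a tuple.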


select a.timestamp, b.*
from log a lateral view json_tuple(a.appevent, 'eventid', 'eventname') b as f1, f2;
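The efficiency argument is that the JSON string is parsed once and then all keys are read from the parsed result. The Python sketch below models that idea with the standard `json` module. The sample `appevent` value is made up for illustration, and returning None for missing keys or unparseable input is a choice of this sketch, not a documented guarantee about Hive.

```python
import json


def _as_string(value):
    """Keep None and strings as they are; serialise anything else back to JSON text."""
    if value is None or isinstance(value, str):
        return value
    return json.dumps(value)


def json_tuple(json_str, *keys):
    """Parse the JSON once, then return one string (or None) per requested key."""
    try:
        obj = json.loads(json_str)
    except (TypeError, ValueError):
        return tuple(None for _ in keys)
    if not isinstance(obj, dict):
        return tuple(None for _ in keys)
    return tuple(_as_string(obj.get(key)) for key in keys)


# Hypothetical content of the appevent column for one row of the log table.
appevent = '{"eventid": 101, "eventname": "click"}'
f1, f2 = json_tuple(appevent, "eventid", "eventname")
print(f1, f2)
```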

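1.6 Using parse_url_tuple

parse_url_tuple() is a UDTF. It is similar to the parse_url() UDF, but it can extract several parts of a given URL at once and returns them as a tuple. For example, parse_url_tuple('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY:k1', 'QUERY:k2') returns a tuple with the values 'v1' and 'v2'. This is more efficient than calling parse_url() several times. All input parameters and output columns are of type string.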
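For readers who want to check what each part name refers to, here is a minimal Python sketch built on `urllib.parse`. It covers only HOST, PATH, QUERY, REF, PROTOCOL and QUERY:<KEY>, and rejecting other part names is a simplification of this sketch. It models the behaviour described above and is not Hive's implementation.

```python
from urllib.parse import parse_qs, urlsplit


def parse_url_tuple(url, *parts):
    """Split the URL once, then return one string (or None) per requested part."""
    pieces = urlsplit(url)
    query = parse_qs(pieces.query)
    simple = {
        "HOST": pieces.hostname,
        "PATH": pieces.path,
        "QUERY": pieces.query,
        "REF": pieces.fragment,
        "PROTOCOL": pieces.scheme,
    }
    result = []
    for part in parts:
        if part.startswith("QUERY:"):
            values = query.get(part[len("QUERY:"):])
            result.append(values[0] if values else None)
        elif part in simple:
            result.append(simple[part] or None)  # empty string becomes None
        else:
            raise ValueError("part not covered by this sketch: " + part)
    return tuple(result)


url = "http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1"
print(parse_url_tuple(url, "QUERY:k1", "QUERY:k2"))
print(parse_url_tuple(url, "HOST", "PATH", "REF"))
```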
