Hive built-in functions (a summary of commonly used built-in functions)

Original

2020-06-16 12:33

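show functions; -- list all built-in functions (271 in total in this Hive version)
describe function sum; -- show the description of the sum function
describe function extended sum; -- show the description of a built-in function together with usage examples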

Example table: stu (address is an array, score is a map)

id	name	address	score	credit
01	huang	hebi,changzhou,dalian	chinese:80,math:90	3.2
02	meng	hebi,taiyuan	chinese:85,math:70	4
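The queries below index address like an array and explode score like a map, so the table needs complex column types. The original post does not show the DDL; the following is only a sketch of one way stu could be declared, with the column types and delimiters assumed from the sample rows above.

```sql
-- Hypothetical DDL for the sample table; types and delimiters are
-- assumptions inferred from the two sample rows, not taken from the post.
create table if not exists stu (
  id      string,              -- kept as string so '01' keeps its leading zero
  name    string,
  address array<string>,       -- e.g. hebi,changzhou,dalian
  score   map<string,int>,     -- e.g. chinese:80,math:90
  credit  double
)
row format delimited
  fields terminated by '\t'            -- columns are tab-separated
  collection items terminated by ','   -- separates array items and map entries
  map keys terminated by ':';          -- separates a map key from its value
```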

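Ordinary functions (one-to-one: each call returns one value per input row; max() and min() in the examples below are aggregate functions, which are many-to-one and reduce multiple rows to a single value)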

1.ceil(X): rounds up, returning the smallest integer not less than X;

select ceil(max(credit)) from stu;

The maximum credit is 4, so ceil(4) = 4

2.floor(X): rounds down, returning the largest integer not greater than X;

select floor(min(credit)) from stu;

The minimum credit is 3.2, so floor(3.2) = 3

3.abs(X): returns the absolute value
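abs() is the only function in this list without an example, so here is a minimal one. The literal values are chosen purely for illustration.

```sql
-- abs() removes the sign of a number and leaves non-negative values unchanged.
select abs(-3.2);   -- output: 3.2
select abs(5);      -- output: 5
```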

4.array(x,y,z): builds an array from its arguments

select array(1,2,3) ;

Output: [1,2,3]

5.map(a,b,c,d): builds a map; the number of arguments must be even, with odd positions as keys and even positions as values

select map(1,2,3,4) ;

Output: {1:2,3:4} (key 1 maps to value 2, key 3 maps to value 4)
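Once a column holds an array or a map, individual elements can be read back with bracket syntax. The sketch below assumes that in stu the address column is an array of strings and score is a map from subject to mark, as the later sections of this post treat them; the column aliases are made up for illustration.

```sql
-- Reading elements out of array and map columns of the sample table stu.
select name,
       address[0]        as first_city,   -- array indexes start at 0
       size(address)     as city_count,   -- number of elements in the array
       score['math']     as math_score,   -- look up a map value by its key
       map_keys(score)   as subjects,     -- array of the map's keys
       map_values(score) as marks         -- array of the map's values
from stu;
```

For the row huang this would give hebi as first_city, 3 as city_count and 90 as math_score.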

6.concat(x,y,z) / concat_ws(separator,x,y,z): string concatenation; concat_ws joins its arguments with the given separator;

select concat('hello','-','world','-','!');
select concat_ws('-', 'hello','world','!');
-- both output: hello-world-!

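7.substring(str, pos, len): extracts a substring; pos is 1-based, a negative pos counts from the end of the string, and omitting len returns everything through the end of the string

select substring('hello world',1,3); -- output: hel
select substring('hello world',1); -- output: hello world
select substring('hello world',-3,2); -- output: rl
select substring('hello world',-3); -- output: rld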

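8.instr(str1, str2): returns the 1-based position of the first occurrence of str2 in str1, or 0 if it does not occur

select instr('hello','l'); -- output: 3
select instr('hello','x'); -- output: 0, because str1 contains no x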

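9.nvl(value, default_value) / coalesce(value, default_value): typically used to handle missing values

select address[2] from stu; -- output: dalian NULL (array indexes start at 0, so address[2] is the third element)
select nvl(address[2],'china') from stu; -- output: dalian china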

10.if(condition, value if true, value if false): conditional function

select if(address[2] is null, 'china', address[2]) from stu; -- output: dalian china
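The same null-replacement can also be written with coalesce or a case expression, both standard in Hive. This is just an alternative sketch against the same stu table, not something the original post shows.

```sql
-- coalesce returns its first non-NULL argument.
select coalesce(address[2], 'china') from stu;   -- output: dalian china

-- The equivalent case expression, useful when there are more than two branches.
select case
         when address[2] is null then 'china'
         else address[2]
       end
from stu;                                        -- output: dalian china
```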

Explode functions (one-to-many: one value is exploded into multiple rows)

1.explode(x): explode function

select explode(address) from stu; -- x is an array:

Output:

hebi

changzhou

dalian

hebi

taiyuan

select explode(score) from stu; -- x is a map:

Output:

chinese 80

math 90

chinese 85

math 70

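If the exploded rows also need to carry other columns, for example to find out who scored 90 in math, use a lateral view:

select [table columns], [exploded columns] from [table name] lateral view explode([column to explode]) [view alias] as [aliases for the exploded columns];

Map column

select name, ts.sub, ts.res from stu lateral view explode(score) ts as sub,res; -- score is a map, so two aliases (sub, res) are needed

Output: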

huang chinese 80

huang math 90

meng chinese 85

meng math 70
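To answer the question raised earlier (who scored 90 in math?), the exploded columns can be filtered in a where clause. A minimal sketch, assuming the marks in score are stored as integers:

```sql
-- Each (subject, mark) pair becomes its own row, so it can be filtered directly.
select name
from stu
lateral view explode(score) ts as sub, res
where ts.sub = 'math'
  and ts.res = 90;      -- output: huang
```

For a single known key, `select name from stu where score['math'] = 90;` gives the same answer without exploding; the lateral view form is more useful when all subjects need to be scanned.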

Array column

select name, ts.city from stu lateral view explode(address) ts as city; -- exploding address yields a single column, so one alias (city) is enough

Output:

huang hebi

huang changzhou

huang dalian

meng hebi

meng taiyuan

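Window ranking functions

1.row_number(), rank(), dense_rank(): rank rows within groups; if no grouping is needed, drop partition by [column]

row_number() over(partition by [column] order by [column]) -- numbers the rows of each group 1...n
rank() over(partition by [column] order by [column]) -- ranking in which 2 tied rows take up 2 positions
dense_rank() over(partition by [column] order by [column]) -- ranking in which 2 tied rows take up 1 position

Example: table stu, here with columns sname, age, departmant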

huang 18 math

he 17 math

meng 19 chinese

ji 17 math

li 16 chinese

liu 15 math

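row_number(): number the people in each department from oldest to youngest, so that row 1 is the oldest person in the department:

select t.sname,t.age,t.departmant, row_number() over(partition by departmant order by age desc) as rank FROM stu t;

Result: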

huang 18 math 1

he 17 math 2

ji 17 math 3

liu 15 math 4

meng 19 chinese 1

li 16 chinese 2

rank(): rank by age within each department (ties share a rank and the following rank is skipped):

select t.sname,t.age,t.departmant, rank() over(partition by departmant order by age desc) as rank FROM stu t;

Result:

huang 18 math 1

he 17 math 2

ji 17 math 2

liu 15 math 4 (the rank jumps from 2 straight to 4)

meng 19 chinese 1

li 16 chinese 2

dense_rank(): rank by age within each department (ties share a rank and no rank is skipped):

select t.sname,t.age,t.departmant, dense_rank() over(partition by departmant order by age desc) as rank FROM stu t;

Result:

huang 18 math 1

he 17 math 2

ji 17 math 2

liu 15 math 3 (the rank goes from 2 to 3: two people share rank 2 but take up only one position)

meng 19 chinese 1

li 16 chinese 2
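The queries above only attach a rank to every row. To actually keep just the oldest person in each department, the ranked result can be wrapped in a subquery and filtered. A sketch using the same stu table; the alias names rn and ranked are made up for illustration.

```sql
-- Keep only the top row of each department (a "top-N per group" query with N = 1).
select sname, age, departmant
from (
  select t.sname,
         t.age,
         t.departmant,
         row_number() over (partition by t.departmant order by t.age desc) as rn
  from stu t
) ranked
where rn = 1;
-- result for the sample data: huang 18 math, and meng 19 chinese
```

row_number() returns exactly one row per department even when ages tie; switching to rank() or dense_rank() in the same query would return every person tied for the top age instead.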
