Hive Study Notes, Part 7: Built-in Functions

Welcome to my GitHub

https://github.com/zq2599/blog_demos

Contents: a categorized index of all my original articles with companion source code, covering Java, Docker, Kubernetes, DevOps, and more.

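"Hive Study Notes" series navigation

  1. Primitive data types
  2. Complex data types
  3. Internal and external tables
  4. Partitioned tables
  5. Bucketing
  6. HiveQL basics
  7. Built-in functions
  8. Sqoop
  9. Basic UDFs
  10. User-defined aggregate functions (UDAF)
  11. UDTF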

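About this article

  • This is the seventh article in the "Hive Study Notes" series. The previous article covered the common HiveQL statements; this one briefly walks through the commonly used built-in functions, in the following parts:
  1. Math
  2. String
  3. JSON processing
  4. Conversion
  5. Date
  6. Conditional
  7. Aggregate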

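Preparing the data

  1. This walkthrough uses two tables, a student table and an address table, both with very simple fields. The student table has an address ID field that holds the unique ID of a record in the address table:

[Figure: fields of the student table and the address table]

  2. First create the address table:
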

create table address (addressid int, province string, city string) 
row format delimited 
fields terminated by ',';
  3. Create the file address.txt with the following content:
1,guangdong,guangzhou
2,guangdong,shenzhen
3,shanxi,xian
4,shanxi,hanzhong
6,jiangshu,nanjing
  4. Load the data into the address table:
load data 
local inpath '/home/hadoop/temp/202010/25/address.txt' 
into table address;
  5. Create the student table; its addressid field references the addressid field of the address table:
create table student (name string, age int, addressid int) 
row format delimited 
fields terminated by ',';
  6. Create the file student.txt with the following content:
tom,11,1
jerry,12,2
mike,13,3
john,14,4
mary,15,5
  7. Load the data into the student table:
load data 
local inpath '/home/hadoop/temp/202010/25/student.txt' 
into table student;
  8. The data needed for this walkthrough is now ready, as shown below:
hive> select * from address;
OK
1	guangdong	guangzhou
2	guangdong	shenzhen
3	shanxi	xian
4	shanxi	hanzhong
6	jiangshu	nanjing
Time taken: 0.043 seconds, Fetched: 5 row(s)
hive> select * from student;
OK
tom	11	1
jerry	12	2
mike	13	3
john	14	4
mary	15	5
Time taken: 0.068 seconds, Fetched: 5 row(s)
  • Now let's try out the built-in functions.
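Because student.addressid points at address.addressid, the two tables can be joined. The query below is a minimal sketch and not part of the original walkthrough; the aliases s and a are my own. It uses a left outer join on the assumption that every student should appear in the result: mary's addressid is 5, which has no row in address, so her province and city would come back as NULL.

```sql
-- Illustrative only: list each student with the matching address, if any.
-- Students whose addressid has no match in address get NULL columns.
select s.name, s.age, a.province, a.city
from student s
left outer join address a
  on s.addressid = a.addressid;
```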

Overview

  1. Open the Hive console;
  2. Run the command show functions; to list the built-in functions:
hive> show functions;
OK
!
!=
%
&
*
+
-
/
<
<=
<=>
<>
=
==
>
>=
^
abs
acos
add_months
and
array
array_contains
ascii
asin
assert_true
atan
avg
base64
between
bin
case
cbrt
ceil
ceiling
coalesce
collect_list
collect_set
compute_stats
concat
concat_ws
context_ngrams
conv
corr
cos
count
covar_pop
covar_samp
create_union
cume_dist
current_database
current_date
current_timestamp
current_user
date_add
date_format
date_sub
datediff
day
dayofmonth
decode
degrees
dense_rank
div
e
elt
encode
ewah_bitmap
ewah_bitmap_and
ewah_bitmap_empty
ewah_bitmap_or
exp
explode
factorial
field
find_in_set
first_value
floor
format_number
from_unixtime
from_utc_timestamp
get_json_object
greatest
hash
hex
histogram_numeric
hour
if
in
in_file
index
initcap
inline
instr
isnotnull
isnull
java_method
json_tuple
lag
last_day
last_value
lcase
lead
least
length
levenshtein
like
ln
locate
log
log10
log2
lower
lpad
ltrim
map
map_keys
map_values
matchpath
max
min
minute
month
months_between
named_struct
negative
next_day
ngrams
noop
noopstreaming
noopwithmap
noopwithmapstreaming
not
ntile
nvl
or
parse_url
parse_url_tuple
percent_rank
percentile
percentile_approx
pi
pmod
posexplode
positive
pow
power
printf
radians
rand
rank
reflect
reflect2
regexp
regexp_extract
regexp_replace
repeat
reverse
rlike
round
row_number
rpad
rtrim
second
sentences
shiftleft
shiftright
shiftrightunsigned
sign
sin
size
sort_array
soundex
space
split
sqrt
stack
std
stddev
stddev_pop
stddev_samp
str_to_map
struct
substr
substring
sum
tan
to_date
to_unix_timestamp
to_utc_timestamp
translate
trim
trunc
ucase
unbase64
unhex
unix_timestamp
upper
var_pop
var_samp
variance
weekofyear
when
windowingtablefunction
xpath
xpath_boolean
xpath_double
xpath_float
xpath_int
xpath_long
xpath_number
xpath_short
xpath_string
year
|
~
Time taken: 0.003 seconds, Fetched: 216 row(s)
  3. Taking lower as an example, run describe function lower; to see the description of that function:
hive> describe function lower;
OK
lower(str) - Returns str with all characters changed to lowercase
Time taken: 0.005 seconds, Fetched: 1 row(s)
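When the one-line summary is not enough, Hive also accepts the extended keyword in the same command. This is only a sketch: what it prints (typically usage examples or synonyms) varies by Hive version, so no output is shown here.

```sql
-- Same lookup as above, but asks for the longer description of lower
describe function extended lower;
```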
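  • Next, starting with the arithmetic functions, let's try out the commonly used ones;
  • First run the following command so that query results include the column names: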
set hive.cli.print.header=true;

Arithmetic functions

  1. Addition (+):
hive> select name, age, age+1 as add_value from student;
OK
name	age	add_value
tom	11	12
jerry	12	13
mike	13	14
john	14	15
mary	15	16
Time taken: 0.098 seconds, Fetched: 5 row(s)
  2. Subtraction (-), multiplication (*), and division (/) work the same way as addition, so they are not repeated here;
  3. Rounding to the nearest integer with round:
hive> select round(1.1), round(1.6);
OK
_c0	_c1
1.0	2.0
Time taken: 0.028 seconds, Fetched: 1 row(s)
  4. Rounding up with ceil:
hive> select ceil(1.1);
OK
_c0
2
Time taken: 0.024 seconds, Fetched: 1 row(s)
  5. Rounding down with floor:
hive> select floor(1.1);
OK
_c0
1
Time taken: 0.024 seconds, Fetched: 1 row(s)
  6. Exponentiation with pow; for example, pow(2,3) is 2 to the third power, which equals 8:
hive> select pow(2,3);
OK
_c0
8.0
Time taken: 0.027 seconds, Fetched: 1 row(s)
  7. Modulo with pmod (the "positive modulo" function):
hive> select pmod(10,3);
OK
_c0
1
Time taken: 0.059 seconds, Fetched: 1 row(s)
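Two details of the functions above are easy to miss, so here is a short sketch using literal values of my own choosing (they are not from the original walkthrough). round accepts an optional second argument giving the number of decimal places, and pmod differs from the % operator only when the dividend is negative: % keeps the sign of the dividend, while pmod returns a non-negative remainder for a positive divisor.

```sql
-- round with a second argument keeps that many decimal places
select round(3.14159, 2);        -- 3.14

-- % follows the sign of the dividend; pmod wraps the result into 0..2 here
select -10 % 3, pmod(-10, 3);    -- -1 and 2
```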

String functions

  1. Lowercase with lower, uppercase with upper:
hive> select lower(name), upper(name) from student;
OK
_c0	_c1
tom	TOM
jerry	JERRY
mike	MIKE
john	JOHN
mary	MARY
Time taken: 0.051 seconds, Fetched: 5 row(s)
  2. String length with length:
hive> select name, length(name) from student;
OK
tom	3
jerry	5
mike	4
john	4
mary	4
Time taken: 0.322 seconds, Fetched: 5 row(s)
  3. String concatenation with concat:
hive> select concat("prefix_", name) from student;
OK
prefix_tom
prefix_jerry
prefix_mike
prefix_john
prefix_mary
Time taken: 0.106 seconds, Fetched: 5 row(s)
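  4. Substring with substr. Positions are 1-based: substr(xxx,2) returns everything from the second character to the end, and substr(xxx,2,3) returns three characters starting at the second character: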
hive> select substr("0123456",2);
OK
123456
Time taken: 0.067 seconds, Fetched: 1 row(s)
hive> select substr("0123456",2,3);
OK
123
Time taken: 0.08 seconds, Fetched: 1 row(s)
  5. Removing leading and trailing spaces with trim:
hive> select trim("   123   ");
OK
123
Time taken: 0.065 seconds, Fetched: 1 row(s)
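Two more string functions from the show functions list pair naturally with the ones above. The sketch below is my own addition with made-up literals: concat_ws joins strings with a separator (non-string columns such as age need an explicit cast), and split breaks a string into an array. Note that array indexes start at 0, unlike the 1-based positions of substr.

```sql
-- Join name and age with an underscore; age is an int, so cast it to string first
select concat_ws('_', name, cast(age as string)) from student;

-- split returns an array; index 1 is the second element
select split('a,b,c', ',')[1];   -- b
```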

JSON processing (get_json_object)

To try the JSON functions, first prepare some data:

  1. Create the table t15, which has a single field that holds a string:
create table t15(json_raw string) 
row format delimited;
  2. Create the file 015.txt (the name used by the load command in the next step) with the following content:
{"name":"tom","age":"10"}
{"name":"jerry","age":"11"}
  3. Load the data into the t15 table:
load data 
local inpath '/home/hadoop/temp/202010/25/015.txt' 
into table t15;
  4. Use the get_json_object function to parse the json_raw field and extract the name and age attributes:
select 
get_json_object(json_raw, "$.name"), 
get_json_object(json_raw, "$.age") 
from t15;

The result:

hive> select 
    > get_json_object(json_raw, "$.name"), 
    > get_json_object(json_raw, "$.age") 
    > from t15;
OK
tom	10
jerry	11
Time taken: 0.081 seconds, Fetched: 2 row(s)
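get_json_object parses the JSON string once per call. When several attributes are needed, json_tuple (also in the show functions list) can extract them in one pass through a lateral view. The sketch below is my own addition; the alias t and the column names are illustrative. Both functions return strings, so the second query casts age to int before doing arithmetic, assuming the value really is numeric.

```sql
-- Extract name and age in a single pass with json_tuple
select t.name, t.age
from t15
lateral view json_tuple(json_raw, 'name', 'age') t as name, age;

-- The extracted value is a string; cast it before using it as a number
select cast(get_json_object(json_raw, '$.age') as int) + 1 from t15;
```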

Date functions

  1. Current date with current_date:
hive> select current_date();
OK
2020-11-02
Time taken: 0.052 seconds, Fetched: 1 row(s)
  2. Current timestamp with current_timestamp:
hive> select current_timestamp();
OK
2020-11-02 10:07:58.967
Time taken: 0.049 seconds, Fetched: 1 row(s)
  3. Year, month, and day of the month with year, month, and day:
hive> select year(current_date()), month(current_date()), day(current_date());
OK
2020	11	2
Time taken: 0.054 seconds, Fetched: 1 row(s)
  4. year, month, and day also accept the value returned by current_timestamp:
hive> select year(current_timestamp()), month(current_timestamp()), day(current_timestamp());
OK
2020	11	2
Time taken: 0.042 seconds, Fetched: 1 row(s)
  5. Date part of a timestamp with to_date:
hive> select to_date(current_timestamp());
OK
2020-11-02
Time taken: 0.051 seconds, Fetched: 1 row(s)
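Date arithmetic usually comes up right after extracting date parts. As a small sketch of my own (the literal dates are chosen to match the run date shown above), date_add, date_sub, and datediff from the show functions list work on 'yyyy-MM-dd' strings:

```sql
select date_add('2020-11-02', 7);               -- 2020-11-09
select date_sub('2020-11-02', 2);               -- 2020-10-31
select datediff('2020-11-09', '2020-11-02');    -- 7 (first date minus second date, in days)
```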

Conditional functions

  • A conditional expression works much like switch in Java; the syntax is case X when XX then XXX else XXXX end;
  • The example below checks the name field: it returns tom_case when the name equals tom, jerry_case when it equals jerry, and other_case in every other case:
select name,
case name when 'tom' then 'tom_case'
          when 'jerry' then 'jerry_case'
          else 'other_case'
end
from student;

The result:

hive> select name,
    > case name when 'tom' then 'tom_case'
    >           when 'jerry' then 'jerry_case'
    >           else 'other_case'
    > end
    > from student;
OK
tom	tom_case
jerry	jerry_case
mike	other_case
john	other_case
mary	other_case
Time taken: 0.08 seconds, Fetched: 5 row(s)
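The case X when ... form only tests for equality. For range checks, the "searched" form case when <condition> then ... can be used, and for a simple two-way choice there is the if function from the show functions list. The query below is an illustrative sketch; the labels and age thresholds are my own invention.

```sql
select name, age,
       -- searched case: each when carries its own boolean condition
       case when age < 13 then 'younger'
            when age < 15 then 'middle'
            else 'older'
       end as age_group,
       -- if(condition, value_if_true, value_if_false)
       if(age >= 13, 'teen', 'child') as teen_flag
from student;
```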

Aggregate functions

  1. Row count with count:
select count(*) from student;

This triggers a MapReduce job; the result is:

Total MapReduce CPU Time Spent: 2 seconds 170 msec
OK
5
Time taken: 20.823 seconds, Fetched: 1 row(s)
  2. Per-group sum with sum after grouping (sum(1) adds 1 for every row, so here it gives the number of rows in each group):
select province, sum(1) from address group by province;

This triggers a MapReduce job; the result is:

Total MapReduce CPU Time Spent: 1 seconds 870 msec
OK
guangdong	2
jiangshu	1
shanxi	2
Time taken: 19.524 seconds, Fetched: 3 row(s)
  3. Per-group minimum (min), maximum (max), and average (avg) after grouping:
select province, min(addressid), max(addressid), avg(addressid) from address group by province;

This triggers a MapReduce job; the result is:

Total MapReduce CPU Time Spent: 1 seconds 650 msec
OK
guangdong	1	2	1.5
jiangshu	6	6	6.0
shanxi	3	4	3.5
Time taken: 20.106 seconds, Fetched: 3 row(s)
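Two further aggregates are worth a quick look; this is a sketch of my own against the same address table. count(distinct ...) counts unique values, and collect_set (from the show functions list) gathers the distinct values of a group into an array. Like the queries above, these would normally launch a job on the execution engine, so expect similar latency.

```sql
-- Number of distinct provinces: guangdong, shanxi, jiangshu
select count(distinct province) from address;   -- 3

-- All distinct cities of each province, collected into one array per group
select province, collect_set(city) from address group by province;
```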
  • That completes the tour of Hive's commonly used built-in functions; I hope it serves as a useful reference. The next article tries out a commonly used tool: Sqoop

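You are not alone: Xinchen's original articles are with you all the way

  1. Java series
  2. Spring series
  3. Docker series
  4. Kubernetes series
  5. Database and middleware series
  6. DevOps series

Follow my WeChat official account: 程序員欣宸

Search WeChat for「程序員欣宸」. I'm Xinchen, and I look forward to exploring the Java world with you... https://github.com/zq2599/blog_demos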
