【Hive】String Functions

Hive version: hive-1.1.0-cdh5.14.2

1. First-character-to-ASCII function: ascii

Syntax: ascii(string str)
Returns: int
Description: Returns the ASCII code of the first character of str.

0: jdbc:hive2://node03:10000> select ascii('hello') as col1, ascii('hehe') as col2, ascii('Hi') as col3;
+-------+-------+-------+--+
| col1  | col2  | col3  |
+-------+-------+-------+--+
| 104   | 104   | 72    |
+-------+-------+-------+--+

2. String concatenation function: concat

Syntax: concat(string|binary A, string|binary B…)
Returns: string
Description: Concatenates A, B, … in order; any number of arguments may be passed.

0: jdbc:hive2://node03:10000> select concat('hello',' ','world');
+--------------+--+
|     _c0      |
+--------------+--+
| hello world  |
+--------------+--+
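To make the semantics concrete, here is a minimal Python sketch that mimics concat, with None standing in for NULL. It also shows one behavior the description does not spell out: in Hive, concat returns NULL as soon as any argument is NULL. The helper name hive_concat is hypothetical and not part of any Hive API.

```python
from typing import Optional


def hive_concat(*parts: Optional[str]) -> Optional[str]:
    """Mimic Hive's concat(): join every argument; a single NULL (None) makes the result NULL."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)


print(hive_concat("hello", " ", "world"))  # hello world
print(hive_concat("hello", None))          # None
```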

3. String concatenation with a custom separator: concat_ws

Syntax: concat_ws(string SEP, string A, string B…)
Returns: string
Description: Concatenates A, B, … using SEP as the separator.

0: jdbc:hive2://node03:10000> select concat_ws('#','welcome','to','beijing');
+---------------------+--+
|         _c0         |
+---------------------+--+
| welcome#to#beijing  |
+---------------------+--+

4. Array concatenation with a custom separator: concat_ws

Syntax: concat_ws(string SEP, array<string>)
Returns: string
Description: Concatenates the strings in the array using SEP as the separator.

0: jdbc:hive2://node03:10000> select concat_ws('#',array('welcome','to','beijing'));
+---------------------+--+
|         _c0         |
+---------------------+--+
| welcome#to#beijing  |
+---------------------+--+
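Both concat_ws forms can be modeled by one Python function, sketched below under some assumptions. Unlike concat, Hive's concat_ws skips NULL string arguments instead of propagating them, and returns NULL only when SEP itself is NULL. Skipping NULL elements inside an array argument is an assumption of this sketch, so check it on your Hive version. hive_concat_ws is a hypothetical helper name.

```python
from typing import List, Optional, Union

Arg = Union[str, List[Optional[str]], None]


def hive_concat_ws(sep: Optional[str], *args: Arg) -> Optional[str]:
    """Sketch of concat_ws(SEP, A, B, ...) and concat_ws(SEP, array)."""
    if sep is None:
        return None  # a NULL separator yields NULL
    values: List[str] = []
    for arg in args:
        if arg is None:
            continue  # NULL string arguments are skipped, not propagated
        if isinstance(arg, list):
            # Array arguments are flattened; skipping NULL elements is an assumption.
            values.extend(v for v in arg if v is not None)
        else:
            values.append(arg)
    return sep.join(values)


print(hive_concat_ws("#", "welcome", "to", "beijing"))    # welcome#to#beijing
print(hive_concat_ws("#", ["welcome", "to", "beijing"]))  # welcome#to#beijing
```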

5. Value lookup function: field

Syntax: field(val T, val1 T, val2 T, val3 T, …)
Returns: int
Description: Returns the 1-based position of val in the list val1, val2, val3, …, or 0 if it is not found.

0: jdbc:hive2://node03:10000> select field('world','say','hello','world');
+------+--+
| _c0  |
+------+--+
| 3    |
+------+--+
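Here is a minimal Python sketch of the lookup field performs: a linear scan returning a 1-based position, or 0 when nothing matches. Following the Hive documentation, a NULL search value also yields 0. hive_field is a hypothetical helper name.

```python
from typing import Any


def hive_field(val: Any, *candidates: Any) -> int:
    """Return the 1-based position of val among candidates, or 0 if it is absent."""
    if val is None:
        return 0  # a NULL search value never matches
    for position, candidate in enumerate(candidates, start=1):
        if candidate == val:
            return position
    return 0


print(hive_field("world", "say", "hello", "world"))  # 3
```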

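6. Set lookup function: find_in_set

Syntax: find_in_set(string str, string strList)
Returns: int
Description: Returns the 1-based position of the first occurrence of str in strList, a comma-separated list of strings, or 0 if it is not found. Only whole items match, so 'ab' does not match 'abc'.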

0: jdbc:hive2://node03:10000> select find_in_set('ab', 'abc,b,ab,c,def');
+------+--+
| _c0  |
+------+--+
| 3    |
+------+--+
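The Python sketch below shows why find_in_set returns 3 rather than 1 in the example above: strList is split on commas and compared item by item, not searched as a substring. The NULL and embedded-comma rules follow the Hive documentation (NULL if either argument is NULL, 0 if str contains a comma). hive_find_in_set is a hypothetical helper name.

```python
from typing import Optional


def hive_find_in_set(s: Optional[str], str_list: Optional[str]) -> Optional[int]:
    """1-based index of s among the comma-separated items of str_list; 0 if absent."""
    if s is None or str_list is None:
        return None  # NULL in, NULL out
    if "," in s:
        return 0  # a search string containing a comma can never match a single item
    for position, item in enumerate(str_list.split(","), start=1):
        if item == s:  # whole-item comparison, not a substring search
            return position
    return 0


print(hive_find_in_set("ab", "abc,b,ab,c,def"))  # 3
```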

7. Number formatting function: format_number

Syntax: format_number(number x, int d)
Returns: string
Description: Formats x with comma thousands separators, in the style '#,###,###.##', rounded to d decimal places.

0: jdbc:hive2://node03:10000> select format_number(123456789.000, 2) as col1,
. . . . . . . . . . . . . . > format_number(123456789.000, 0) as col2,
. . . . . . . . . . . . . . > format_number(123456789.000, 5) as col3;
+-----------------+--------------+--------------------+--+
|      col1       |     col2     |        col3        |
+-----------------+--------------+--------------------+--+
| 123,456,789.00  | 123,456,789  | 123,456,789.00000  |
+-----------------+--------------+--------------------+--+
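The same formatting can be reproduced in Python with the ',' and '.Nf' format-spec options, as the sketch below shows. Hive formats numbers with Java's DecimalFormat, so results for exact halfway values may round differently. Rejecting a negative d is an assumption of this sketch, and hive_format_number is a hypothetical helper name.

```python
def hive_format_number(x: float, d: int) -> str:
    """Format x with comma thousands separators and d decimal places."""
    if d < 0:
        raise ValueError("d must be non-negative")  # assumption: negative d is rejected
    return f"{x:,.{d}f}"


print(hive_format_number(123456789.000, 2))  # 123,456,789.00
print(hive_format_number(123456789.000, 0))  # 123,456,789
```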

8. JSON parsing function: get_json_object

Syntax: get_json_object(string json_string, string path)
Returns: string
Description: Parses the JSON string json_string and returns the content selected by path. Returns NULL if json_string is invalid.

0: jdbc:hive2://node03:10000> select get_json_object('{"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}], "bicycle":{"price":19.95,"color":"red"} },"email":"amy@only_for_json_udf_test.net","owner":"amy"}','$.owner');
+------+--+
| _c0  |
+------+--+
| amy  |
+------+--+
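To show how a path such as '$.owner' or '$.store.fruit[0].type' walks the parsed document, here is a minimal Python sketch. It supports only the root '$', '.key' steps and '[index]' steps. Hive's path syntax offers more (for example the '*' wildcard), and returning compact JSON text for objects and arrays is an assumption about formatting. hive_get_json_object is a hypothetical helper name.

```python
import json
import re
from typing import Any, Optional

# One path step: ".key" (group 1) or "[index]" (group 2).
_STEP = re.compile(r"\.([^.\[\]]+)|\[(\d+)\]")


def hive_get_json_object(json_string: str, path: str) -> Optional[str]:
    """Sketch of get_json_object supporting only $, .key and [index] steps."""
    try:
        node: Any = json.loads(json_string)
    except (json.JSONDecodeError, TypeError):
        return None  # invalid JSON yields NULL
    if not path.startswith("$"):
        return None
    rest, pos = path[1:], 0
    while pos < len(rest):
        m = _STEP.match(rest, pos)
        if m is None:
            return None  # malformed or unsupported path step
        key, index = m.group(1), m.group(2)
        if key is not None:
            if not isinstance(node, dict) or key not in node:
                return None
            node = node[key]
        else:
            i = int(index)
            if not isinstance(node, list) or i >= len(node):
                return None
            node = node[i]
        pos = m.end()
    if node is None or isinstance(node, str):
        return node
    return json.dumps(node, separators=(",", ":"))  # numbers, objects, arrays as JSON text


doc = '{"store":{"fruit":[{"weight":8,"type":"apple"}]},"owner":"amy"}'
print(hive_get_json_object(doc, "$.owner"))               # amy
print(hive_get_json_object(doc, "$.store.fruit[0].type"))  # apple
```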

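9. Substring search function: instr

Syntax: instr(string str, string substr)
Returns: int
Description: Returns the 1-based position of the first occurrence of substr in str, or 0 if it is not found.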

0: jdbc:hive2://node03:10000> select instr('abcdef','c');
+------+--+
| _c0  |
+------+--+
| 3    |
+------+--+

10. String length function: length

Syntax: length(string A)
Returns: int
Description: Returns the length of string A.

0: jdbc:hive2://node03:10000> select length('hello');
+------+--+
| _c0  |
+------+--+
| 5    |
+------+--+

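11. Substring search function: locate

Syntax: locate(string substr, string str[, int pos])
Returns: int
Description: Returns the position of the first occurrence of substr in str, searching from position pos onward (pos defaults to 1). The position is counted from the start of str, and 0 means not found.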

0: jdbc:hive2://node03:10000> select locate('c', 'cabcdef') as col1, locate('c', 'cabcdef', 2) as col2;
+-------+-------+--+
| col1  | col2  |
+-------+-------+--+
| 1     | 4     |
+-------+-------+--+
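A small Python sketch makes the 1-based arithmetic of locate explicit, and shows that instr(str, substr) is the same search with the argument order swapped. Returning 0 for a starting position below 1 is an assumption of this sketch. hive_locate and hive_instr are hypothetical helper names.

```python
def hive_locate(substr: str, s: str, pos: int = 1) -> int:
    """1-based position of substr in s, searching from position pos; 0 if absent."""
    if pos < 1:
        return 0  # assumption: positions before 1 find nothing
    # str.find uses 0-based offsets and returns -1 when absent, which maps to 0 here.
    return s.find(substr, pos - 1) + 1


def hive_instr(s: str, substr: str) -> int:
    """instr(str, substr) is locate(substr, str) with the arguments swapped."""
    return hive_locate(substr, s)


print(hive_locate("c", "cabcdef"), hive_locate("c", "cabcdef", 2))  # 1 4
print(hive_instr("abcdef", "c"))                                    # 3
```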

12. Lower-case function: lower / lcase

Syntax: lower(string A) / lcase(string A)
Returns: string
Description: Converts string A to lower case.

0: jdbc:hive2://node03:10000> select lower('fOoBaR') as lower, lcase('fOoBaR') as lcase;
+---------+---------+--+
|  lower  |  lcase  |
+---------+---------+--+
| foobar  | foobar  |
+---------+---------+--+

13. Upper-case function: upper / ucase

Syntax: upper(string A) / ucase(string A)
Returns: string
Description: Converts string A to upper case.

0: jdbc:hive2://node03:10000> select upper('fOoBaR') as upper, ucase('fOoBaR') as ucase;
+---------+---------+--+
|  upper  |  ucase  |
+---------+---------+--+
| FOOBAR  | FOOBAR  |
+---------+---------+--+

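14. Capitalize-words function: initcap

Syntax: initcap(string A)
Returns: string
Description: Converts the first letter of each word in A to upper case and all other letters to lower case; words are delimited by whitespace.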

0: jdbc:hive2://node03:10000> select initcap('fOoBaR');
+---------+--+
|   _c0   |
+---------+--+
| Foobar  |
+---------+--+
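The single-word example above hides the per-word behavior, so here is a minimal Python sketch that applies the rule to every whitespace-delimited word. It uses a regular expression substitution; hive_initcap is a hypothetical helper name and Python's case mapping may differ from Java's for some non-ASCII letters.

```python
import re


def hive_initcap(s: str) -> str:
    """Upper-case the first letter of each whitespace-delimited word; lower-case the rest."""
    return re.sub(r"\S+", lambda m: m.group(0)[:1].upper() + m.group(0)[1:].lower(), s)


print(hive_initcap("fOoBaR"))       # Foobar
print(hive_initcap("hELLO wORLD"))  # Hello World
```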

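15. Left-pad function: lpad

Syntax: lpad(string str, int len, string pad)
Returns: string
Description: Left-pads str with pad to a length of len. If len is less than the length of str, str is truncated at the end. If pad is empty, the result is NULL.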

0: jdbc:hive2://node03:10000> select lpad('abcdef', 8, '#') as col1,
. . . . . . . . . . . . . . > lpad('abcdef', 5, '#') as col2,
. . . . . . . . . . . . . . > lpad('abcdef', 8, '') as col3;
+-----------+--------+-------+--+
|   col1    |  col2  | col3  |
+-----------+--------+-------+--+
| ##abcdef  | abcde  | NULL  |
+-----------+--------+-------+--+

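16. Right-pad function: rpad

Syntax: rpad(string str, int len, string pad)
Returns: string
Description: Right-pads str with pad to a length of len. If len is less than the length of str, str is truncated at the end. If pad is empty, the result is NULL.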

0: jdbc:hive2://node03:10000> select rpad('abcdef', 8, '#') as col1,
. . . . . . . . . . . . . . > rpad('abcdef', 5, '#') as col2,
. . . . . . . . . . . . . . > rpad('abcdef', 8, '') as col3;
+-----------+--------+-------+--+
|   col1    |  col2  | col3  |
+-----------+--------+-------+--+
| abcdef##  | abcde  | NULL  |
+-----------+--------+-------+--+
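The padding and truncation rules of lpad and rpad can be sketched in Python as follows. Cycling through a multi-character pad from its first character is assumed to match Hive, and treating a negative length as NULL is likewise an assumption. hive_lpad, hive_rpad and _fill are hypothetical helper names.

```python
from typing import Optional


def _fill(pad: str, n: int) -> str:
    """Repeat pad from its first character until exactly n characters are produced."""
    return (pad * (n // len(pad) + 1))[:n]


def hive_lpad(s: str, length: int, pad: str) -> Optional[str]:
    if pad == "" or length < 0:
        return None  # empty pad yields NULL; the negative-length rule is an assumption
    if length <= len(s):
        return s[:length]  # truncate, keeping the leading characters
    return _fill(pad, length - len(s)) + s


def hive_rpad(s: str, length: int, pad: str) -> Optional[str]:
    if pad == "" or length < 0:
        return None
    if length <= len(s):
        return s[:length]
    return s + _fill(pad, length - len(s))


print(hive_lpad("abcdef", 8, "#"), hive_rpad("abcdef", 8, "#"))  # ##abcdef abcdef##
```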

17. Trim function: trim

Syntax: trim(string A)
Returns: string
Description: Removes spaces from both ends of string A.

0: jdbc:hive2://node03:10000> select concat('|',' foobar ','|') as notrim,  concat('|',trim(' foobar '),'|') as trim;
+-------------+-----------+--+
|   notrim    |   trim    |
+-------------+-----------+--+
| | foobar |  | |foobar|  |
+-------------+-----------+--+

18. Left-trim function: ltrim

Syntax: ltrim(string A)
Returns: string
Description: Removes leading spaces from string A.

0: jdbc:hive2://node03:10000> select concat('|',' foobar ','|') as notrim,  concat('|',ltrim(' foobar '),'|') as ltrim;
+-------------+------------+--+
|   notrim    |   ltrim    |
+-------------+------------+--+
| | foobar |  | |foobar |  |
+-------------+------------+--+

19. Right-trim function: rtrim

Syntax: rtrim(string A)
Returns: string
Description: Removes trailing spaces from string A.

0: jdbc:hive2://node03:10000> select concat('|',' foobar ','|') as notrim,  concat('|',rtrim(' foobar '),'|') as rtrim;
+-------------+------------+--+
|   notrim    |   rtrim    |
+-------------+------------+--+
| | foobar |  | | foobar|  |
+-------------+------------+--+

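20. URL parsing function: parse_url

Syntax: parse_url(string urlString, string partToExtract [, string keyToExtract])
Returns: string
Description: Returns the specified part of the URL. Valid values for partToExtract are HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE and USERINFO. With QUERY, the optional keyToExtract returns the value of that query parameter.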

0: jdbc:hive2://node03:10000> select parse_url('https://www.tableName.com/path1/p.php?k1=v1&k2=v2#Ref1','HOST') as host,
. . . . . . . . . . . . . . > parse_url('https://www.tableName.com/path1/p.php?k1=v1&k2=v2#Ref1','QUERY','k1') as k1,
. . . . . . . . . . . . . . > parse_url('https://www.tableName.com/path1/p.php?k1=v1&k2=v2#Ref1','PROTOCOL') as protocol;
+--------------------+-----+-----------+--+
|        host        | k1  | protocol  |
+--------------------+-----+-----------+--+
| www.tableName.com  | v1  | https     |
+--------------------+-----+-----------+--+
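For reference, the parts used in the example map onto Python's urllib.parse.urlsplit roughly as sketched below. Hive itself relies on Java's URL parsing, so edge cases (IPv6 hosts, unusual schemes, encoded keys) may differ. The sketch covers only HOST, PATH, QUERY, REF and PROTOCOL, and hive_parse_url is a hypothetical helper name.

```python
import re
from typing import Optional
from urllib.parse import urlsplit


def hive_parse_url(url: str, part: str, key: Optional[str] = None) -> Optional[str]:
    """Rough stand-in for parse_url covering HOST, PATH, QUERY, REF and PROTOCOL."""
    u = urlsplit(url)
    if part == "HOST":
        host = u.netloc.rpartition("@")[2]  # drop any user-info prefix
        return host.split(":", 1)[0]        # drop any port; IPv6 literals are not handled
    if part == "PATH":
        return u.path
    if part == "REF":
        return u.fragment or None
    if part == "PROTOCOL":
        return u.scheme
    if part == "QUERY":
        if key is None:
            return u.query or None
        # Value of the first key=value pair whose key matches exactly.
        m = re.search(r"(?:^|&)" + re.escape(key) + r"=([^&]*)", u.query)
        return m.group(1) if m else None
    return None  # AUTHORITY, FILE and USERINFO are left out of this sketch


url = "https://www.tableName.com/path1/p.php?k1=v1&k2=v2#Ref1"
print(hive_parse_url(url, "HOST"), hive_parse_url(url, "QUERY", "k1"))  # www.tableName.com v1
```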

21. Formatted print function: printf

Syntax: printf(String format, Obj… args)
Returns: string
Description: Returns args formatted according to the printf-style format string format.

0: jdbc:hive2://node03:10000> select printf('I am %d %s old', 28, 'years');
+--------------------+--+
|        _c0         |
+--------------------+--+
| I am 28 years old  |
+--------------------+--+

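22. Regular-expression extraction function: regexp_extract

Syntax: regexp_extract(string subject, string pattern, int index)
Returns: string
Description: Matches subject against the regular expression pattern and returns the capture group selected by index; index 0 returns the entire match.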

0: jdbc:hive2://node03:10000> select regexp_extract('foothebar', 'foo(.*?)(bar)', 1) as pattern1,
. . . . . . . . . . . . . . > regexp_extract('foothebar', 'foo(.*?)(bar)', 2) as pattern2,
. . . . . . . . . . . . . . > regexp_extract('foothebar', 'foo(.*?)(bar)', 0) as pattern0;
+-----------+-----------+------------+--+
| pattern1  | pattern2  |  pattern0  |
+-----------+-----------+------------+--+
| the       | bar       | foothebar  |
+-----------+-----------+------------+--+
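The group-index behavior corresponds closely to Python's re.search plus Match.group, as sketched below. Hive uses Java regular expressions, which agree with Python's re for simple patterns like this one but differ in some advanced syntax. Returning an empty string when nothing matches is an assumption of this sketch, and hive_regexp_extract is a hypothetical helper name.

```python
import re
from typing import Optional


def hive_regexp_extract(subject: str, pattern: str, index: int) -> Optional[str]:
    """Return capture group `index` of the first match (0 = the whole match)."""
    m = re.search(pattern, subject)
    if m is None:
        return ""  # assumption: no match yields an empty string
    return m.group(index)  # None if that group did not take part in the match


for i in (1, 2, 0):
    print(hive_regexp_extract("foothebar", "foo(.*?)(bar)", i))  # the, bar, foothebar
```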

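23. Regular-expression replacement function: regexp_replace

Syntax: regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)
Returns: string
Description: Replaces every substring of INITIAL_STRING that matches the regular expression PATTERN with REPLACEMENT.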

0: jdbc:hive2://node03:10000> select regexp_replace('hello world, hello hive','world|hive','NextAction');
+-------------------------------------+--+
|                 _c0                 |
+-------------------------------------+--+
| hello NextAction, hello NextAction  |
+-------------------------------------+--+

24. Repeat function: repeat

Syntax: repeat(string str, int n)
Returns: string
Description: Returns str repeated n times.

0: jdbc:hive2://node03:10000> select repeat('I love China.',3);
+------------------------------------------+--+
|                   _c0                    |
+------------------------------------------+--+
| I love China.I love China.I love China.  |
+------------------------------------------+--+

25. Reverse function: reverse

Syntax: reverse(string A)
Returns: string
Description: Reverses the order of the characters in string A.

0: jdbc:hive2://node03:10000> select reverse('NextAction');
+-------------+--+
|     _c0     |
+-------------+--+
| noitcAtxeN  |
+-------------+--+

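26. Sentence tokenizer function: sentences

Syntax: sentences(string str[, string lang, string locale])
Returns: array<array<string>>
Description: Splits str into sentences, each returned as an array of words. lang and locale are optional arguments.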

0: jdbc:hive2://node03:10000> select sentences('Hello there! How are you?');
+------------------------------------------+--+
|                   _c0                    |
+------------------------------------------+--+
| [["Hello","there"],["How","are","you"]]  |
+------------------------------------------+--+

27. Space generator function: space

Syntax: space(int n)
Returns: string
Description: Returns a string of n spaces.

0: jdbc:hive2://node03:10000> select concat('hello world', space(5), '!');
+--------------------+--+
|        _c0         |
+--------------------+--+
| hello world     !  |
+--------------------+--+

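28. String split function: split

Syntax: split(string str, string pat)
Returns: array<string>
Description: Splits str around matches of pat. pat is a regular expression, so metacharacters such as '.' or '|' must be escaped (for example '\\.').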

0: jdbc:hive2://node03:10000> select split('abxefxmn', 'x');
+-------------------+--+
|        _c0        |
+-------------------+--+
| ["ab","ef","mn"]  |
+-------------------+--+
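Because pat is a regular expression, split behaves like Python's re.split rather than a plain string split, which the sketch below illustrates. An unescaped '.' matches every character, so a literal dot has to be escaped; inside a HiveQL string literal the backslash itself is usually doubled, giving '\\.'. How trailing empty strings are handled may differ between Python and Hive, and hive_split is a hypothetical helper name.

```python
import re
from typing import List


def hive_split(s: str, pat: str) -> List[str]:
    """Split s around matches of the regular expression pat."""
    return re.split(pat, s)


print(hive_split("abxefxmn", "x"))  # ['ab', 'ef', 'mn']
print(hive_split("a.b.c", r"\."))   # ['a', 'b', 'c'], the dot escaped
```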

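29. String-to-map function: str_to_map

Syntax: str_to_map(text[, delimiter1, delimiter2])
Returns: map<string,string>
Description: Splits text into key-value pairs. delimiter1 separates text into K-V pairs, and delimiter2 then splits each pair into key and value. delimiter1 defaults to a comma (','), and delimiter2 defaults to a colon (':').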

0: jdbc:hive2://node03:10000> select str_to_map('name:NextAction,age:28,location:China', ',', ':');
+----------------------------------------------------+--+
|                        _c0                         |
+----------------------------------------------------+--+
| {"location":"China","name":"NextAction","age":"28"} |
+----------------------------------------------------+--+
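The two-level split can be sketched in Python as below. This sketch treats both delimiters as literal strings and maps a pair with no delimiter2 to NULL (None); both choices are assumptions rather than confirmed Hive behavior. As the output above shows, map key order is not guaranteed to follow the input. hive_str_to_map is a hypothetical helper name.

```python
from typing import Dict, Optional


def hive_str_to_map(text: str, delimiter1: str = ",", delimiter2: str = ":") -> Dict[str, Optional[str]]:
    """Split text into pairs on delimiter1, then each pair into key/value on delimiter2."""
    result: Dict[str, Optional[str]] = {}
    for pair in text.split(delimiter1):
        key, sep, value = pair.partition(delimiter2)  # split on the first delimiter2 only
        # Assumption: a pair without delimiter2 maps its key to NULL (None).
        result[key] = value if sep else None
    return result


print(hive_str_to_map("name:NextAction,age:28,location:China"))
```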

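30. Substring function: substr / substring

Syntax: substr(string|binary A, int start[, int len]) / substring(string|binary A, int start[, int len])
Returns: string
Description: Returns len characters of A starting at position start (1-based). If len is omitted, everything up to the end of the string is returned.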

0: jdbc:hive2://node03:10000> select substr('NextAction', 5) as sub_5,
. . . . . . . . . . . . . . > substr('NextAction', 5,1) as sub_5_1;
+---------+----------+--+
|  sub_5  | sub_5_1  |
+---------+----------+--+
| Action  | A        |
+---------+----------+--+
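The 1-based start position translates to a Python slice as sketched below. This sketch deliberately handles only start >= 1 and raises an error otherwise instead of guessing Hive's rules for zero or negative positions. hive_substr is a hypothetical helper name.

```python
from typing import Optional


def hive_substr(a: str, start: int, length: Optional[int] = None) -> str:
    """1-based substring of a; only start >= 1 is handled in this sketch."""
    if start < 1:
        raise ValueError("this sketch only handles start >= 1")
    begin = start - 1  # convert the 1-based position to a 0-based offset
    if length is None:
        return a[begin:]  # no length: take everything to the end
    return a[begin:begin + max(length, 0)]


print(hive_substr("NextAction", 5), hive_substr("NextAction", 5, 1))  # Action A
```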

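31. Character replacement function: translate

Syntax: translate(string input, string from, string to)
Returns: string
Description: Replaces each character of input that occurs in from with the character at the same position in to. Characters of from that have no counterpart in to are deleted. If any argument is NULL, the result is NULL.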

0: jdbc:hive2://node03:10000> select translate('N1xtA23ion', '123', 'ect');
+-------------+--+
|     _c0     |
+-------------+--+
| NextAction  |
+-------------+--+
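The character-by-character mapping used by translate can be expressed with Python's str.translate, as the sketch below shows. Characters in from that have no counterpart in to map to None, which str.translate treats as deletion. Letting the first occurrence of a repeated from character win is an assumption of this sketch. hive_translate is a hypothetical helper name.

```python
from typing import Dict, Optional


def hive_translate(input_str: str, from_chars: str, to_chars: str) -> str:
    """Map each char of from_chars to the char at the same index of to_chars."""
    table: Dict[int, Optional[str]] = {}
    for i, ch in enumerate(from_chars):
        if ord(ch) in table:
            continue  # assumption: the first occurrence of a repeated character wins
        # No counterpart in to_chars: map to None, which deletes the character.
        table[ord(ch)] = to_chars[i] if i < len(to_chars) else None
    return input_str.translate(table)


print(hive_translate("N1xtA23ion", "123", "ect"))  # NextAction
print(hive_translate("abcde", "bd", "x"))          # axce
```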