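1. Notes on Hive built-in functions
Official reference: the Hive function reference in the official documentation
Hive string-processing functions, part 1: Hive built-in string functions (1)
String Functions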
Return Type | Name(Signature) | Description |
---|---|---|
string | regexp_extract(string subject, string pattern, int index) | Returns the string extracted using the pattern. For example, regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar'. Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc. The 'index' parameter is the Java regex Matcher group() method index. See docs/api/java/util/regex/Matcher.html for more information on the 'index' or Java regex group() method. In short, the function extracts from subject the part captured by group number index of the regular expression pattern. Example: hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 2); returns 'bar', because 2 selects the second capture group. |
string | regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT) | Returns the string resulting from replacing all substrings in INITIAL_STRING that match the java regular expression syntax defined in PATTERN with instances of REPLACEMENT. For example, regexp_replace("foobar", "oo|ar", "") returns 'fb'. Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc. If REPLACEMENT is the empty string, the parts that match the pattern are simply removed. |
string | replace(string A, string OLD, string NEW) | Returns the string A with all non-overlapping occurrences of OLD replaced with NEW (as of Hive 1.3.0 and 2.1.0). Example: select replace("ababab", "abab", "Z"); returns "Zab". |
array<string> | split(string str, string pat) | Splits str around pat (pat is a regular expression) and returns the pieces as an array of strings. |
string | translate(string|char|varchar input, string|char|varchar from, string|char|varchar to) | Translates the input string by replacing the characters present in the from string with the corresponding characters in the to string. Char/varchar support added as of Hive 0.14.0. Example: hive> select translate('abcdefga','abc','wo'); returns wodefgw, not wodefga, whereas hive> select replace('abcdefga','abc','wo'); returns wodefga. The two results differ, and that is the difference between replace and translate. |
map<string,string> | str_to_map(text[, delimiter1, delimiter2]) | Splits text into key-value pairs using two delimiters. Delimiter1 separates text into K-V pairs, and Delimiter2 splits each K-V pair. Default delimiters are ',' for delimiter1 and ':' for delimiter2. |
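As a quick check of the index argument of regexp_extract and the default delimiters of str_to_map described in the table, the following sketch can be run in a Hive session. It assumes a Hive version that accepts SELECT without a FROM clause; the input literals and column aliases are made up for illustration.

```sql
-- Illustrative literals only; no table is required.
SELECT
  regexp_extract('foothebar', 'foo(.*?)(bar)', 0) AS whole_match,  -- 'foothebar'
  regexp_extract('foothebar', 'foo(.*?)(bar)', 1) AS group_1,      -- 'the'
  regexp_extract('foothebar', 'foo(.*?)(bar)', 2) AS group_2;      -- 'bar'

-- Default delimiters: ',' between pairs and ':' between key and value.
SELECT str_to_map('a:1,b:2') AS m;                                 -- {"a":"1","b":"2"}

-- Explicit delimiters, followed by a lookup by key.
SELECT str_to_map('k1=v1&k2=v2', '&', '=')['k1'] AS v;             -- 'v1'
```

Index 0 returns the entire match, which is a convenient way to confirm that the pattern matches at all before picking out individual groups.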
2. Using split, regexp_replace, and regexp_extract
2.1 The split function cuts a string on a regular expression and returns an array
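SELECT
SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1','\\?')[0] AS A,
-- Split the URL on ?; the result is an array, and [0] takes the part before the ?.
SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1','\\?')[1] AS A1,
SPLIT('http://facebook.com/index.html','\\?')[0] AS B,
SPLIT('http://facebook.com/index.html','\\?')[1] AS B1,
SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1','\\w/')[0] AS C1
FROM FDM_SOR.T_PLPLFIS_TB_LOAN_APPLY_CEBANK_ED;
Note: regular-expression metacharacters such as ?, } and | must be escaped with \\ here to be matched literally. The same doubling applies to predefined character classes. In a regular expression \w matches a word character (a letter, digit, or underscore), but a lone '\w' in a Hive string literal ends up matching the letter w, so the pattern has to be written as '\\w'.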
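The escaping rule in the note can be tried on small literals. This is a minimal sketch, assuming a Hive version that accepts SELECT without a FROM clause; the input strings are invented for illustration.

```sql
-- '\\.' reaches the regex engine as \. and matches a literal dot.
SELECT split('a.b.c', '\\.') AS parts;                -- ["a","b","c"]

-- '\\|' matches a literal pipe instead of acting as alternation.
SELECT split('x|y|z', '\\|')[1] AS second_part;       -- 'y'

-- '\\s+' splits on runs of whitespace.
SELECT size(split('one two   three', '\\s+')) AS n;   -- 3
```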
2.2 The regexp_replace function is simple to call; the hard part is writing the regular expression passed to it.
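-- Keep uniscid only when the whole value is replaced by the marker, i.e. it matches the pattern; otherwise return NULL.
select
case when regexp_replace(uniscid,'[0-9A-HJ-NPQRTUWXY]{2}\\d{6}[0-9A-HJ-NPQRTUWXY]{10}','~~fbietl~~') = '~~fbietl~~' then uniscid
else null end uniscid
from fdm_sor.aaaaaaaaaaaaaaa;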
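The comparison against the marker string works because regexp_replace rewrites every match, so the result equals the marker when the whole value is a single match (or, as an edge case, when the value is already the marker text itself). An anchored RLIKE test may express the same check more directly. The sketch below is one possible rewrite under that assumption, not the author's method, and the sample value is made up.

```sql
-- Hypothetical sample value; in practice v would be the uniscid column.
SELECT
  CASE
    WHEN v RLIKE '^[0-9A-HJ-NPQRTUWXY]{2}\\d{6}[0-9A-HJ-NPQRTUWXY]{10}$' THEN v
    ELSE NULL
  END AS uniscid
FROM (SELECT '91350100M000100Y43' AS v) t;
```

The ^ and $ anchors force the pattern to cover the value from start to end, which removes the need for a marker string.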
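2.3 The difference between replace and translate
- replace: substitution at the string level
- translate: substitution at the character level
hive> select translate('abcdefga','abc','wo');
wodefgw
hive> select replace('abcdefga','abc','wo');
wodefga
Note that the first result is wodefgw, not wodefga. translate maps a to w and b to o at every position and drops c, which has no counterpart in 'wo', while replace substitutes only the whole substring abc. That is the difference between replace and translate.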
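A second pair of calls, with made-up inputs, shows the same contrast when the from and to strings have equal length. This is a minimal sketch; replace assumes Hive 1.3.0 / 2.1.0 or later, as stated in the function table.

```sql
-- Character level: e -> i and l -> p wherever they occur.
SELECT translate('hello', 'el', 'ip') AS t;   -- 'hippo'

-- String level: only the contiguous substring 'el' is replaced.
SELECT replace('hello', 'el', 'ip') AS r;     -- 'hiplo'
```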