Hive built-in string functions: regexp_extract, regexp_replace, split, replace, translate

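1. Hive built-in function reference

        Official reference: the function list in the official Hive documentation

       Hive string functions series, part 1: Hive built-in string functions 1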

String Functions

Each entry below gives the Return Type and Name(Signature) on one line, followed by the Description.

string regexp_extract(string subject, string pattern, int index)

Returns the string extracted using the pattern. For example, regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar'. Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc. The 'index' parameter is the Java regex Matcher group() method index. See docs/api/java/util/regex/Matcher.html for more information on the 'index' or Java regex group() method.

That is, the function returns the part of subject captured by group number index of the regular expression pattern; index 0 returns the whole match.

hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 2);   -- note: 2 is the capture-group index

             bar
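To make the role of the index argument concrete, the queries below run the same pattern with each group index. This is a small sketch on literal inputs, assuming a Hive version that accepts SELECT without a FROM clause; the 'order 42' input is made up for illustration.

```sql
-- Index 0 returns the entire match; 1 and 2 return the first and second capture groups.
SELECT regexp_extract('foothebar', 'foo(.*?)(bar)', 0);  -- foothebar
SELECT regexp_extract('foothebar', 'foo(.*?)(bar)', 1);  -- the
SELECT regexp_extract('foothebar', 'foo(.*?)(bar)', 2);  -- bar

-- '\\s' and '\\d' reach the regex engine as \s (whitespace) and \d (digit).
SELECT regexp_extract('order 42', 'order\\s(\\d+)', 1);  -- 42
```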

string regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)

Returns the string resulting from replacing all substrings in INITIAL_STRING that match the java regular expression syntax defined in PATTERN with instances of REPLACEMENT. For example, regexp_replace("foobar", "oo|ar", "") returns 'fb'. Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc.

That is, every part of INITIAL_STRING that matches the Java regular expression PATTERN is replaced with REPLACEMENT; if REPLACEMENT is empty, the matching parts are simply removed.
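A few literal examples may help here. They are a sketch, assuming a Hive version that accepts SELECT without a FROM clause; the input strings are made up. The last query assumes the replacement string follows Java's `$n` group-reference syntax, which is how Java regex replacement behaves.

```sql
-- Collapse each run of whitespace into a single space.
SELECT regexp_replace('foo   bar  baz', '\\s+', ' ');   -- foo bar baz

-- Remove every character that is not a digit.
SELECT regexp_replace('tel: 010-1234', '[^0-9]', '');   -- 0101234

-- Reorder captured groups in the replacement (assumed Java $n syntax).
SELECT regexp_replace('2024-01-15', '(\\d+)-(\\d+)-(\\d+)', '$3/$2/$1');  -- 15/01/2024
```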

string replace(string A, string OLD, string NEW)

Returns the string A with all non-overlapping occurrences of OLD replaced with NEW (as of Hive 1.3.0 and 2.1.0). Example: select replace("ababab", "abab", "Z"); returns "Zab".

array<string> split(string str, string pat)

Splits str around pat (pat is a regular expression).

The pieces are returned as an array of strings.
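Because pat is a regular expression rather than a plain delimiter, a short sketch of the two common cases may be useful. The inputs are made up, and the queries assume a Hive version that accepts SELECT without a FROM clause.

```sql
-- A character class splits on any of several delimiters.
SELECT split('a,b;c', '[,;]');          -- ["a","b","c"]

-- A dot is a regex metacharacter, so it is written '\\.' to split on a literal dot.
SELECT split('192.168.0.1', '\\.')[0];  -- 192

-- Array elements are indexed from 0; size() gives the element count.
SELECT size(split('a,b;c', '[,;]'));    -- 3
```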

string translate(string|char|varchar input, string|char|varchar from, string|char|varchar to)

Translates the input string by replacing the characters present in the from string with the corresponding characters in the to string. This is similar to the translate function in PostgreSQL. If any of the parameters to this UDF are NULL, the result is NULL as well. (Available as of Hive 0.10.0, for string types)

Char/varchar support added as of Hive 0.14.0.

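Explanation:

hive>select translate('abcdefga','abc','wo');

       wodefgw   Note: the result is not wodefga. translate maps a to w and b to o, and removes c because it has no counterpart in 'wo'.

hive>select replace('abcdefga','abc','wo');

       wodefga   Note: the two results differ, and that is the difference between replace and translate.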

       

 

     

map<string,string> str_to_map(text[, delimiter1, delimiter2])

Splits text into key-value pairs using two delimiters. Delimiter1 separates text into K-V pairs, and Delimiter2 splits each K-V pair. Default delimiters are ',' for delimiter1 and ':' for delimiter2.

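Explanation:

Delimiter1 separates the text into key-value pairs, and delimiter2 separates the key from the value within each pair. The default is ',' for delimiter1 and ':' for delimiter2.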

hive> select
str_to_map(concat(path_id,':',filter_name))   -- the result is of type map
from FDM_SOR.T_FIBA_MULTI_UBA_CFG_PATH_DETAIL_D
group by path_id,filter_Name;
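The delimiter behavior is easier to see on literal text than on a table column. The following sketch uses made-up input strings and assumes a Hive version that accepts SELECT without a FROM clause.

```sql
-- Default delimiters: ',' between pairs, ':' between key and value.
SELECT str_to_map('k1:v1,k2:v2');             -- {"k1":"v1","k2":"v2"}

-- Explicit delimiters: '&' between pairs, '=' between key and value.
-- A single value is read from the map with [key].
SELECT str_to_map('a=1&b=2', '&', '=')['b'];  -- 2
```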

2. Using split, regexp_replace, and regexp_extract

2.1 The split function: splits a string on a regular expression and returns an array

SELECT
-- Split the URL on ?; the result is an array, and [0] is the part before the ?.
SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1','\\?')[0] AS A,
SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1','\\?')[1] AS A1,   -- the part after the ?
SPLIT('http://facebook.com/index.html','\\?')[0] AS B,
SPLIT('http://facebook.com/index.html','\\?')[1] AS B1,   -- NULL: this URL contains no ?
SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1','\\w\\/')[0] AS C1   -- split on a word character followed by /
FROM  FDM_SOR.T_PLPLFIS_TB_LOAN_APPLY_CEBANK_ED;

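 Note: regex metacharacters such as ?, } and | must be escaped with \\ here to be matched literally. The same doubling applies to predefined character classes: in a Hive string literal, '\w' on its own matches the letter w, while '\\w' matches a word character (a letter, digit, or underscore).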
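The escaping rule can be checked on small literals, and the same URL can be taken one step further to read a single query parameter. This is a sketch, assuming a Hive version that accepts SELECT without a FROM clause. Combining split with str_to_map, and the regexp_extract pattern in the last query, are illustrative choices rather than part of the original query.

```sql
-- An escaped metacharacter is matched literally.
SELECT split('a|b|c', '\\|');       -- ["a","b","c"]

-- '\\d+' is the digit class, so the string is split on each run of digits.
SELECT split('ab1cd22ef', '\\d+');  -- ["ab","cd","ef"]

-- Take the query string after '?', turn it into a map, and read k1.
SELECT str_to_map(
         split('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', '\\?')[1],
         '&', '=')['k1'];           -- v1

-- The same value through a single capture group.
SELECT regexp_extract('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1',
                      'k1=([^&#]*)', 1);  -- v1
```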

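2.2 The regexp_replace function: the call itself is simple; the hard part is writing the regular expression argument.

 select
case when regexp_replace(uniscid,'[0-9A-HJ-NPQRTUWXY]{2}\\d{6}[0-9A-HJ-NPQRTUWXY]{10}','~~fbietl~~') = '~~fbietl~~' then uniscid
       else null end uniscid   -- keep uniscid only when the whole value matches the pattern
from fdm_sor.aaaaaaaaaaaaaaa;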
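The query above works as a whole-value check: if the entire value matches, the replacement leaves nothing but the marker. The sketch below runs that check on two made-up inline values and, for comparison, shows an alternative formulation with RLIKE and explicit ^ and $ anchors. The sample values, the subquery alias t, and the column aliases are hypothetical.

```sql
SELECT
  uniscid,
  -- Approach from the query above: a full match collapses to the marker.
  CASE WHEN regexp_replace(uniscid, '[0-9A-HJ-NPQRTUWXY]{2}\\d{6}[0-9A-HJ-NPQRTUWXY]{10}', '~~fbietl~~') = '~~fbietl~~'
       THEN uniscid END AS via_replace,
  -- Alternative: RLIKE with ^ and $ so that the whole value must match.
  CASE WHEN uniscid RLIKE '^[0-9A-HJ-NPQRTUWXY]{2}\\d{6}[0-9A-HJ-NPQRTUWXY]{10}$'
       THEN uniscid END AS via_rlike
FROM (
  SELECT '91350100M000100Y43' AS uniscid   -- made-up value that fits the pattern
  UNION ALL
  SELECT 'ABC' AS uniscid                  -- made-up value that does not
) t;
```

For these two inputs both columns agree: the first row returns the value in both columns, and the second row returns NULL in both.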

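2.3 The difference between replace and translate

  1. replace: substitution at the string level; the whole of OLD must occur in the input
  2. translate: substitution at the character level; each character of from is mapped to the character at the same position in to
    hive>select translate('abcdefga','abc','wo');
    
           wodefgw   Note: the result is not wodefga.
    
    hive>select replace('abcdefga','abc','wo');
    
           wodefga   Note: the two results differ, and that is the difference between replace and translate.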
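A few more literal cases bring out the character-level versus string-level distinction. This is a sketch with made-up inputs, assuming a Hive version that accepts SELECT without a FROM clause and that provides replace (Hive 1.3.0 / 2.1.0 or later, as noted above).

```sql
-- translate maps character by character: every '-' becomes '/'.
SELECT translate('2024-01-15', '-', '/');      -- 2024/01/15

-- Characters in from with no counterpart in to are removed: here, every digit.
SELECT translate('a1b2c3', '0123456789', '');  -- abc

-- replace needs the whole substring to occur; 'cba' does not, so nothing changes.
SELECT replace('abcdefga', 'cba', 'wo');       -- abcdefga

-- translate with the same arguments still acts on each character:
-- c -> w, b -> o, and a is removed.
SELECT translate('abcdefga', 'cba', 'wo');     -- owdefg
```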

     

 
