Hive内嵌字符处理函数:regexp_extract,regexp_replace,split,replace,translate

1.Hive内嵌函数说明

        官方地址:hive函数大全官方地址

       Hive字符处理函数系列1:hive内嵌字符串函数1

String Functions

Return Type

Name(Signature)

Description

 

 

regexp_extract(string subject, string pattern, int index)

Returns the string extracted using the pattern. For example, regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar.' Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc. The 'index' parameter is the Java regex Matcher group() method index. See docs/api/java/util/regex/Matcher.html for more information on the 'index' or Java regex group() method..

抽取字符串subject中符合正则表达式pattern的第index个部分的子字符串,注意些预定义字符的使用,如第二个参数如果使用'\s'将被匹配到s,'\\s'才是匹配空格

hive>   select   regexp_extract('foothebar', 'foo(.*?)(bar)', 2)   ---注意2是匹配分组的

             'bar.'

  regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)

Returns the string resulting from replacing all substrings in INITIAL_STRING that match the java regular expression syntax defined in PATTERN with instances of REPLACEMENT. For example, regexp_replace("foobar", "oo|ar", "") returns 'fb.' Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc..

按照Java正则表达式PATTERN将字符串INTIAL_STRING中符合条件的部分成REPLACEMENT所指定的字符串,如里REPLACEMENT这空的话,抽符合正则的部分将被去掉  如:regexp_replace("foobar", "oo|ar", "") = 'fb.' 注意些预定义字符的使用,如第二个参数如果使用'\s'将被匹配到s,'\\s'才是匹配空格

string replace(string A, string OLD, string NEW) Returns the string A with all non-overlapping occurrences of OLD replaced with NEW (as of Hive 1.3.0 and 2.1.0). Example: select replace("ababab", "abab", "Z"); returns "Zab".
  split(string str, string pat)

Splits str around pat (pat is a regular expression)..

按照正则表达式pat来分割字符串str,并将分割后的数组字符串的形式返回

string

translate(string|char|varchar input, string|char|varchar from, string|char|varchar to)

Translates the input string by replacing the characters present in the from string with the corresponding characters in the to string. This is similar to the translate function in PostgreSQL. If any of the parameters to this UDF are NULL, the result is NULL as well. (Available as of Hive 0.10.0, for string types)

Char/varchar support added as of Hive 0.14.0.

解释:

hive>select translate('abcdefga','abc','wo')

       wodefgw   注意,结果不是wodefga,

hive>select replace('abcdefga','abc','wo')

       wodefga   注意,两个结果,这就是replace和translate的区别。

       

 

     

map<string,string>

str_to_map(text[, delimiter1, delimiter2])

Splits text into key-value pairs using two delimiters. Delimiter1 separates text into K-V pairs, and Delimiter2 splits each K-V pair. Default delimiters are ',' for delimiter1 and ':' for delimiter2.

解释:

使用两个分隔符将文本拆分为键值对。

分隔符1将文本分成K-V对,分隔符2分割每个K-V对。对于分隔符1默认分隔符是 ',',对于分隔符2默认分隔符是 '='

hive> select 
str_to_map(concat(path_id,':',filter_name ))   ---是map类型的
from FDM_SOR.T_FIBA_MULTI_UBA_CFG_PATH_DETAIL_D 
group by path_id,filter_Name 

 

hive>

2.split,regexp_replace,regexp_extract的使用

2.1 split函数,支持使用正则表达式对字符串进行切割,返回值为数组

SELECT 
SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1','\\?') [0] AS A,  
--对url进行使用?进行切割,返回值是数组,这里取?前面的值。
SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1','\\?') [1] AS A1,
SPLIT('http://facebook.com/index.html','\\?') [0]   AS B ,
SPLIT('http://facebook.com/index.html','\\?') [1] AS B1,
SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1','\\\w\\/') [0] AS C1
FROM  FDM_SOR.T_PLPLFIS_TB_LOAN_APPLY_CEBANK_ED

 注意:所有正则表达式中的预定义字符比如?,},|等需要在这里用\\进行反转义才能表达本意。比如正则表达式中\w表示匹配字母,所以也属于预定义字符,单独的\w表示匹配的是字母w,而\\\w才表示匹配字母。

2.2. regexp_replace函数,比较简单,难的是里面参数正则表达式的书写。

 select 
case when regexp_replace(uniscid,'[0-9A-HJ-NPQRTUWXY]{2}\\d{6}[0-9A-HJ-NPQRTUWXY]{10}','~~fbietl~~') = '~~fbietl~~' then uniscid 
       else null end uniscid,
from fdm_sor.aaaaaaaaaaaaaaa;

2.3 replace和translate的区别

  1. replace:字符串级别的代替
  2. translate:字符级别的代替
    hive>select translate('abcdefga','abc','wo')
    
           wodefgw   注意,结果不是wodefga,
    
    hive>select replace('abcdefga','abc','wo')
    
           wodefga   注意,两个结果,这就是replace和translate的区别。

     

 

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章