Hive built-in string functions: regexp_extract, regexp_replace, split, replace, translate

1. Hive built-in function reference

String Functions

Return Type	Name(Signature)	Description
string	regexp_extract(string subject, string pattern, int index)	Returns the substring of subject captured by group number index of the regular expression pattern. For example, regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar'; the 2 selects the second capture group. The index parameter is the Java regex Matcher group() method index (see docs/api/java/util/regex/Matcher.html), so 0 refers to the whole match. Take care with predefined character classes: '\s' as the second argument matches the letter s; '\\s' is needed to match whitespace.
string	regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)	Returns the string resulting from replacing all substrings in INITIAL_STRING that match the Java regular expression PATTERN with REPLACEMENT. If REPLACEMENT is empty, the matching parts are simply removed: regexp_replace("foobar", "oo|ar", "") returns 'fb'. Take care with predefined character classes: '\s' as the second argument matches the letter s; '\\s' is needed to match whitespace.
string	replace(string A, string OLD, string NEW)	Returns the string A with all non-overlapping occurrences of OLD replaced with NEW (as of Hive 1.3.0 and 2.1.0). Example: select replace("ababab", "abab", "Z"); returns "Zab".
array<string>	split(string str, string pat)	Splits str around pat (pat is a regular expression) and returns the pieces as an array of strings.
string	translate(string|char|varchar input, string|char|varchar from, string|char|varchar to)	Translates the input string by replacing the characters present in the `from` string with the corresponding characters in the `to` string. This is similar to the `translate` function in PostgreSQL. If any of the parameters to this UDF are NULL, the result is NULL as well. (Available as of Hive 0.10.0 for string types; char/varchar support added as of Hive 0.14.0.) Example: select translate('abcdefga','abc','wo') returns 'wodefgw', not 'wodefga', whereas select replace('abcdefga','abc','wo') returns 'wodefga'. That is the difference between replace and translate.
map<string,string>	str_to_map(text[, delimiter1, delimiter2])	Splits text into key-value pairs using two delimiters. Delimiter1 separates text into K-V pairs, and Delimiter2 splits each K-V pair. Default delimiters are ',' for delimiter1 and ':' for delimiter2. Example: select str_to_map(concat(path_id,':',filter_name)) from FDM_SOR.T_FIBA_MULTI_UBA_CFG_PATH_DETAIL_D group by path_id, filter_name returns a value of map type.
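To check the group index and the character-class behaviour from the table without a Hive session, here is a minimal Python sketch. It only imitates the two regex functions with Python's `re` module, which agrees with Java regex for these simple patterns; the helper names mirror the Hive functions but are my own stand-ins, and returning None for a non-match is a choice of this sketch, not a statement about Hive. The patterns are written as the regex engine sees them, that is, after Hive has processed the string literal.

```python
import re

def regexp_extract(subject, pattern, index):
    """Stand-in for Hive's regexp_extract: return capture group `index`."""
    m = re.search(pattern, subject)
    # Assumption of this sketch: None when nothing matches.
    return m.group(index) if m else None

def regexp_replace(initial, pattern, replacement):
    """Stand-in for Hive's regexp_replace: replace every match."""
    return re.sub(pattern, replacement, initial)

# Group 0 is the whole match, 1 and 2 are the parenthesised groups.
print(regexp_extract('foothebar', r'foo(.*?)(bar)', 0))  # foothebar
print(regexp_extract('foothebar', r'foo(.*?)(bar)', 1))  # the
print(regexp_extract('foothebar', r'foo(.*?)(bar)', 2))  # bar

# An empty replacement deletes the matches.
print(regexp_replace('foobar', r'oo|ar', ''))            # fb

# The pattern "s" removes the letter s, the pattern "\s" removes whitespace.
print(regexp_replace('a s b', r's', ''))                 # 'a  b'
print(regexp_replace('a s b', r'\s', ''))                # 'asb'
```

In Hive the last two patterns would be typed as 's' (or '\s', which ends up as the same thing) and '\\s'.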

2. Using split, regexp_replace, and regexp_extract

2.1 The split function cuts a string on a regular expression and returns an array

SELECT
  -- Split the URL on '?'. The result is an array; [0] is the part before the '?'.
  SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', '\\?')[0] AS A,
  -- [1] is the part after the '?'.
  SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', '\\?')[1] AS A1,
  -- This URL contains no '?', so the array has a single element and there is no [1].
  SPLIT('http://facebook.com/index.html', '\\?')[0] AS B,
  SPLIT('http://facebook.com/index.html', '\\?')[1] AS B1,
  -- Split on a word character followed by '/'.
  SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', '\\w/')[0] AS C1
FROM FDM_SOR.T_PLPLFIS_TB_LOAN_APPLY_CEBANK_ED

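Note: regex metacharacters such as ?, }, and | must be written with \\ in front of them in a Hive string literal if they are to be matched literally, because Hive consumes one level of backslashes before the regex engine sees the pattern. Predefined character classes need the same doubling. In a regular expression \w matches a word character (letter, digit, or underscore), but a lone '\w' in a Hive literal is reduced to the plain letter w, so write '\\w'. A three-backslash spelling such as '\\\w' reduces to the same pattern, but '\\w' is the usual form.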
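The escaping rule above is easier to see when the two steps are separated. The Python sketch below is an illustration only: `hive_literal` is a hypothetical helper that models Hive's string-literal processing in the simplest possible way (a backslash followed by any character yields that character; special escapes such as \n and \t are deliberately ignored), and `split` and `at` are stand-ins of mine, with `at` returning None for a missing index as an assumption about how an out-of-range array index behaves.

```python
import re

def hive_literal(body):
    """Simplified, assumed model of how a Hive string literal loses one
    level of backslashes: '\\X' becomes 'X' for any character X."""
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(body[i + 1])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)

def split(s, hive_pattern):
    """Stand-in for Hive's split: unescape the literal, then split by regex."""
    return re.split(hive_literal(hive_pattern), s)

def at(arr, i):
    """Array indexing that yields None instead of failing (assumption)."""
    return arr[i] if 0 <= i < len(arr) else None

url = 'http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1'

# What the regex engine receives for each literal typed in the query.
print(hive_literal(r'\\?'))    # \?
print(hive_literal(r'\w'))     # w
print(hive_literal(r'\\w'))    # \w
print(hive_literal(r'\\\w'))   # \w

print(at(split(url, r'\\?'), 0))   # http://facebook.com/path1/p.php
print(at(split(url, r'\\?'), 1))   # k1=v1&k2=v2#Ref1
print(at(split('http://facebook.com/index.html', r'\\?'), 1))  # None
print(at(split(url, r'\\w/'), 0))  # http://facebook.co
```

The last line shows why C1 in the query is a slightly odd column: the word character that precedes the '/' is part of the separator, so the final m of facebook.com is swallowed.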

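2.2 The regexp_replace function itself is simple; the hard part is writing the regular expression argument.

-- Keep uniscid only when the whole value matches the 18-character pattern:
-- if the entire string is replaced by the marker, nothing else was left over.
select
  case when regexp_replace(uniscid, '[0-9A-HJ-NPQRTUWXY]{2}\\d{6}[0-9A-HJ-NPQRTUWXY]{10}', '~~fbietl~~') = '~~fbietl~~' then uniscid
       else null end as uniscid
from fdm_sor.aaaaaaaaaaaaaaa;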
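The query above uses regexp_replace as a whole-string validator, which is not obvious at first glance. A small Python sketch of the same idea follows; `keep_if_valid` is a name I made up, and the sample value is invented purely so that it fits the pattern. It is not a real code.

```python
import re

# The pattern as the regex engine sees it (one backslash before d).
PATTERN = r'[0-9A-HJ-NPQRTUWXY]{2}\d{6}[0-9A-HJ-NPQRTUWXY]{10}'
MARKER = '~~fbietl~~'

def keep_if_valid(value):
    """Return value if replacing every match leaves exactly the marker,
    i.e. the value is one match with nothing before or after it."""
    if value is None:
        return None
    return value if re.sub(PATTERN, MARKER, value) == MARKER else None

sample = '91110108MA01ABCD2X'            # invented 18-character value
print(keep_if_valid(sample))              # 91110108MA01ABCD2X
print(keep_if_valid(sample[:-1]))         # None: too short, nothing matches
print(keep_if_valid(sample + '9'))        # None: one character left over
print(keep_if_valid('91110108MA01ABCDI2'))  # None: I is not in the allowed set
print(keep_if_valid(MARKER))              # ~~fbietl~~ (see note below)
```

One quirk worth knowing: a value that already equals the marker passes the test, because nothing is replaced and the comparison still succeeds. That is presumably why the marker is such an unlikely string.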

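2.3 The difference between replace and translate

replace: substitution at the string level. The whole substring OLD is replaced by NEW.

translate: substitution at the character level. Each character of from is mapped to the character at the same position in to; here a becomes w, b becomes o, and c, which has no counterpart in to, is dropped.

hive> select translate('abcdefga','abc','wo')

       wodefgw   Note that the result is not wodefga: the trailing a is translated as well.

hive> select replace('abcdefga','abc','wo')

       wodefga   Note the two different results. This is the difference between replace and translate.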
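The character-level versus string-level distinction can be reproduced in a few lines of Python. This is a sketch, not Hive's implementation: `hive_translate` is my own helper built on `str.translate`, and letting the first occurrence win when a character is repeated in `from` is an assumption of the sketch.

```python
def hive_translate(s, frm, to):
    """Character-level mapping: frm[i] -> to[i]; characters of frm
    beyond the length of `to` are deleted from the input."""
    table = {}
    for i, ch in enumerate(frm):
        if ord(ch) in table:
            continue  # assumption: first occurrence in frm wins
        table[ord(ch)] = to[i] if i < len(to) else None  # None deletes
    return s.translate(table)

# translate works per character: a->w, b->o, c is removed.
print(hive_translate('abcdefga', 'abc', 'wo'))   # wodefgw

# replace works on the whole substring 'abc'; the last a is untouched.
print('abcdefga'.replace('abc', 'wo'))           # wodefga

# Non-overlapping occurrences, as in the replace row of the table.
print('ababab'.replace('abab', 'Z'))             # Zab
```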
