Hive內嵌字符處理函數:get_json_object,parse_url

 

1.Hive內嵌函數對Json字符和網址的解析處理

Return Type

Name(Signature)

Description

string parse_url(string urlString, string partToExtract [, string keyToExtract])

Returns the specified part from the URL. Valid values for partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. For example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') returns 'facebook.com'. Also a value of a particular key in QUERY can be extracted by providing the key as the third argument, for example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') returns 'v1'..

尖叫提示:

返回從URL中抽取指定部分的內容,參數url是URL字符串,而參數partToExtract是要抽取的部分,這個參數包含(HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO,

例如:parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') ='facebook.com',如果參數partToExtract值爲QUERY則必須指定第三個參數key  如:parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') =‘v1’

        parse_url(‘http://facebook.com/path/p1.php?query=1‘, ‘HOST’)返回’facebook.com’ ,
        parse_url(‘http://facebook.com/path/p1.php?query=1‘, ‘PATH’)返回’/path/p1.php’ ,
        parse_url(‘http://facebook.com/path/p1.php?query=1‘, ‘QUERY’)返回’query=1’

 

hive>

select  
parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k2') ,
parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') ,
parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY')

結果如下:

v2     v1    k1=v1&k2=v2

string

get_json_object(string json_string, string path)

Extracts json object from a json string based on json path specified, and returns json string of the extracted json object. It will return null if the input json string is invalid. NOTE: The json path can only have the characters [0-9a-z_], i.e., no upper-case or special characters. Also, the keys *cannot start with numbers.* This is due to restrictions on Hive column names..

從指定路徑上的JSON字符串抽取出JSON對象,並返回這個對象的JSON格式,如果輸入的JSON是非法的將返回NULL,注意此路徑上JSON字符串只能由數字 字母 下劃線組成且不能有大寫字母和特殊字符,且key不能由數字開頭,這是由於Hive對列名的限制

 

2.案例演示

1.get_Json_object 

1.src_json表中數據格式:字段名json,如下

{"store":
  {"fruit":\[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
   "bicycle":{"price":19.95,"color":"red"}
  },
 "email":"amy@only_for_json_udf_test.net",
 "owner":"amy"
}

2.代碼演示

hive> SELECT get_json_object(src_json.json, '$.owner') FROM src_json;
amy
 
hive> SELECT get_json_object(src_json.json, '$.store.fruit\[0]') FROM src_json;
{"weight":8,"type":"apple"}
 
hive> SELECT get_json_object(src_json.json, '$.non_exist_key') FROM src_json;
NULL

 

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章