Handling JSON in Hive with a custom JSON SerDe

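Hive ships with UDFs for parsing JSON data, namely get_json_object(…) and json_tuple(…), but they are not very convenient in practice. It would be much nicer to declare a table as usual and have each column map to a key in the JSON. Fortunately, Hive exposes a serialization/deserialization interface, SerDe, so developers only need to implement that interface with their own logic. The example below does this with an open-source tool, Hive-JSON-Serde-develop.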
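For comparison, here is a rough sketch of what the UDF approach looks like. The table raw_json and its single column line are hypothetical names introduced only for this illustration; the JSON keys are the ones used later in this post.

```sql
-- Hypothetical staging table: each row holds one raw JSON document as a string.
create table raw_json (line string);

-- get_json_object extracts one JSON path per call.
select
    get_json_object(line, '$._name') as name,
    get_json_object(line, '$._area') as area
from raw_json;

-- json_tuple extracts several top-level keys at once, through a lateral view.
select t.name, t.area
from raw_json r
lateral view json_tuple(r.line, '_name', '_area') t as name, area;
```

Every query has to repeat the key names and paths, which is the inconvenience a SerDe-backed table is meant to remove.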

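Steps:

1. Download hive-json-serde-0.2.jar

2. Put the jar under lib, or in a folder you create for your own jars

3. In the conf folder of the Hive directory, rename hive-env.sh.template to hive-env.sh, uncomment the last line (in the stock template this should be the HIVE_AUX_JARS_PATH export), and set it to the path where your jar is located

4. Write the Hive CREATE TABLE statement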

create table json_tab (
    `_area` string,
    `_name` string,
    `_sex`  string,
    `_uuid` string
)
-- SerDe class used to read and write rows
row format serde 'org.openx.data.jsonserde.JsonSerDe'
stored as textfile
-- Location of the JSON data
location '/data/test/json/';
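Once the table exists, the JSON keys can be queried like ordinary columns. The sketch below assumes the files under /data/test/json/ contain one JSON object per line whose keys match the column names; the sample record and its values are made up for illustration.

```sql
-- Hypothetical input line (one JSON object per line):
--   {"_area": "east", "_name": "alice", "_sex": "f", "_uuid": "u-0001"}

-- Columns are read directly; no JSON function calls are needed.
-- Backticks are kept because the column names start with an underscore.
select `_name`, `_area`
from json_tab
where `_sex` = 'f'
limit 10;
```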


Note: if the SerDe jar has not been loaded, the job fails with an error like this:

Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1498788221191_0007, Tracking URL = http://zj-db0236deMacBook-Pro.local:8088/proxy/application_1498788221191_0007/
Kill Command = /Users/zj-db0236/Downloads/hadoop-2.7.2/bin/hadoop job  -kill job_1498788221191_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-06-30 11:14:17,128 Stage-1 map = 0%,  reduce = 0%
2017-06-30 11:15:17,675 Stage-1 map = 0%,  reduce = 0%
2017-06-30 11:16:18,346 Stage-1 map = 0%,  reduce = 0%
2017-06-30 11:16:37,869 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_1498788221191_0007 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1498788221191_0007_m_000000 (and more) from job job_1498788221191_0007

Task with the most failures(4): 
-----
Task ID:
  task_1498788221191_0007_m_000000

URL:
  http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1498788221191_0007&tipid=task_1498788221191_0007_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:449)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:154)
	... 22 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.JsonSerde not found
	at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:335)
	at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:353)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:123)
	... 22 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.JsonSerde not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
	at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:305)
	... 24 more


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Job 0: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
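The ClassNotFoundException in the log means the map task could not find the SerDe class on its classpath. Besides the hive-env.sh setting from step 3, one way to check and work around this from inside a Hive session is sketched below. The jar path is a placeholder, and the jar you register must be the one that actually contains the class named in the error.

```sql
-- Placeholder path; point it at wherever the SerDe jar actually lives.
add jar /path/to/hive-json-serde-0.2.jar;

-- Show the jars registered in the current session.
list jars;

-- The "SerDe Library" row shows which SerDe class the table was created with.
describe formatted json_tab;
```

Jars registered with add jar apply only to the current session, so the hive-env.sh approach is the better fit for a permanent setup.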




