JsonSerde - a read/write SerDe for JSON Data

This library enables Apache Hive to read and write data in JSON format. It includes a SerDe (serializer/deserializer) as well as a JSON conversion UDF.

Features

Read data stored in JSON format
Convert data to JSON format during INSERT INTO
Support for JSON arrays and maps
Support for nested data structures
Support for Cloudera’s Distribution Including Apache Hadoop (CDH)
Support for multiple versions of Hadoop
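As a rough illustration of the write path (INSERT INTO), the sketch below models a Hive row as a list of column names and values and serializes it to one compact JSON line, with arrays and maps becoming JSON arrays and objects. The helper name row_to_json_line and the sample values are hypothetical; this is a model of the idea, not the SerDe's actual code.

```python
import json

def row_to_json_line(columns, values):
    """Serialize one row to a single-line JSON object (hypothetical helper).

    columns: column names in table order
    values:  Python values (lists model Hive arrays, dicts model Hive maps)
    """
    record = dict(zip(columns, values))
    # Compact separators; json.dumps escapes newlines inside strings,
    # so every record is written on exactly one line.
    return json.dumps(record, separators=(",", ":"))

if __name__ == "__main__":
    print(row_to_json_line(["text", "tags", "attrs"],
                           ["foo", ["x", "y"], {"k": 1}]))
```

Keeping each record on a single line matters because the read side expects exactly one JSON record per line.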
Installation

Download the latest binaries (json-serde-X.Y.Z-jar-with-dependencies.jar and json-udf-X.Y.Z-jar-with-dependencies.jar) from congiu.net/hive-json-serde. Choose the correct version for CDH 4, CDH 5, or Hadoop 2.3. Place the JARs in hive/lib or load them with ADD JAR in Hive.

JSON Data Files

Upload JSON files to HDFS with hadoop fs -put or LOAD DATA LOCAL. Each JSON record in a data file must be on a single line, and an empty line produces a NULL record. This is because Hadoop splits text files at line breaks (CR/LF) to distribute the work.

The following example will work.

{ "key" : 10 }
{ "key" : 20 }
The following example will not work, because each record spans several lines.

{
"key" : 10
}
{
"key" : 20
}
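If a source file is pretty-printed like the second example, it has to be rewritten to one record per line before Hive can read it. A minimal Python sketch, assuming the input is a sequence of top-level JSON values separated only by whitespace; the function name to_json_lines is hypothetical and not part of json-serde.

```python
import json

def to_json_lines(text):
    """Rewrite concatenated, possibly multi-line JSON values as one
    compact record per line (hypothetical helper)."""
    decoder = json.JSONDecoder()
    pos = 0
    lines = []
    while True:
        # Skip whitespace (including newlines) between records.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode parses one JSON value starting at pos and returns
        # the value together with the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        lines.append(json.dumps(obj, separators=(",", ":")))
    return "\n".join(lines) + "\n" if lines else ""

if __name__ == "__main__":
    pretty = '{\n"key" : 10\n}\n{\n"key" : 20\n}\n'
    print(to_json_lines(pretty), end="")
```

Run on the failing example, this produces the working one-per-line form ({"key":10} and {"key":20} on separate lines).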
Loading a JSON File and Querying Data

Uses json-serde/src/test/scripts/test-without-cr-lf.json.

$ cat test.json

{"text":"foo","number":123}
{"text":"bar","number":345}

$ hadoop fs -put -f test.json /user/data/test.json

$ hive

hive> CREATE DATABASE test;

hive> CREATE EXTERNAL TABLE test ( text string, number int )
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/user/data';

hive> SELECT * FROM test;
OK

foo 123
bar 345
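To make the mapping concrete, here is a simplified Python model of deserialization for the two columns shown in the output: each line is parsed as one JSON object, and its values are picked out by column name in declared column order. This is an illustrative assumption rather than the SerDe's implementation; in particular, treating a missing key as NULL (None) is assumed here. The function name deserialize is hypothetical.

```python
import json

def deserialize(line, columns):
    """Simplified model: map one JSON line to a row in column order."""
    if not line.strip():
        # An empty line yields a NULL record.
        return [None] * len(columns)
    record = json.loads(line)
    # Assumption: keys absent from the JSON object become NULL (None).
    return [record.get(col) for col in columns]

if __name__ == "__main__":
    data = '{"text":"foo","number":123}\n{"text":"bar","number":345}\n'
    for line in data.splitlines():
        # Tab-separated, similar to the Hive CLI output above.
        print(*deserialize(line, ["text", "number"]), sep="\t")
```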
Querying Complex Fields

Uses json-serde/src/test/scripts/data.txt.

hive> CREATE DATABASE test;

hive> CREATE TABLE test (
one boolean,
three array<string>,
two double,
four string )
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;

hive> LOAD DATA LOCAL INPATH 'data.txt' OVERWRITE INTO TABLE test;

hive> select three[1] from test;

gold
yellow
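Hive array indexes are zero-based, so three[1] returns the second element of each array, and an out-of-range index yields NULL. The sketch below models that lookup in Python; the helper name hive_index and the sample arrays are hypothetical, since the contents of data.txt are not reproduced here.

```python
def hive_index(arr, i):
    """Model of Hive's arr[i]: zero-based, NULL (None) when out of range."""
    if arr is None or i < 0 or i >= len(arr):
        return None
    return arr[i]

if __name__ == "__main__":
    # Hypothetical values for the 'three' column.
    rows = [["red", "gold"], ["green", "yellow"], ["blue"]]
    for three in rows:
        print(hive_index(three, 1))
```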
If your JSON is complex, creating tables manually can be tedious. Try hive-json-schema to generate a schema from your data.
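To give a feel for what schema generation involves, the sketch below infers a Hive column type from one sample JSON value: objects become struct, lists become array (typed by their first element), and scalars map to boolean, int/bigint, double, or string. It is a hypothetical toy (the name hive_type and the sample record are made up), not how hive-json-schema works; a real tool has to merge types across many records and handle empty or mixed-type arrays.

```python
import json

def hive_type(value):
    """Infer a Hive type string from one JSON value (toy heuristic)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "int" if -2**31 <= value < 2**31 else "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, list):
        # Assumes homogeneous arrays; empty arrays fall back to string.
        elem = hive_type(value[0]) if value else "string"
        return f"array<{elem}>"
    if isinstance(value, dict):
        fields = ",".join(f"{k}:{hive_type(v)}" for k, v in value.items())
        return f"struct<{fields}>"
    return "string"  # strings and null fall back to string

if __name__ == "__main__":
    sample = json.loads('{"one": true, "three": ["red", "gold"], "two": 19.5, "four": "x"}')
    for name, value in sample.items():
        print(name, hive_type(value))
```

For this hypothetical record the inferred types match the test table above: boolean, array<string>, double, and string.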

See json-serde/src/test/scripts for more examples.

Defining Nested Structures

ADD JAR json-serde-1.3.7-SNAPSHOT-jar-with-dependencies.jar;

CREATE TABLE json_nested_test (
country string,
languages array<string>,
religions map
