This library enables Apache Hive to read and write data in JSON format. It includes a serializer/deserializer (SerDe) and a JSON conversion UDF.
Features
Read data stored in JSON format
Convert data to JSON format during INSERT INTO
Support for JSON arrays and maps
Support for nested data structures
Support for Cloudera’s Distribution Including Apache Hadoop (CDH)
Support for multiple versions of Hadoop
Installation
Download the latest binaries (json-serde-X.Y.Z-jar-with-dependencies.jar and json-udf-X.Y.Z-jar-with-dependencies.jar) from congiu.net/hive-json-serde. Choose the correct version for CDH 4, CDH 5 or Hadoop 2.3. Place the JARs into hive/lib or use ADD JAR in Hive.
JSON Data Files
Upload JSON files to HDFS with hadoop fs -put or LOAD DATA LOCAL. JSON records in data files must appear one per line; an empty line produces a NULL record. This is because Hadoop partitions files as text, using CR/LF as a separator, to distribute work, so a record that spans several lines cannot be parsed.
The following example will work.
{ "key" : 10 }
{ "key" : 20 }
The following example will not work.
{
"key" : 10
}
{
"key" : 20
}
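If your data is in the multi-line form, it has to be rewritten as one record per line before loading. The following is a minimal sketch in Python, not part of this library: the function name to_json_lines is hypothetical, and it assumes the input is a sequence of JSON values separated only by whitespace and small enough to fit in memory.

```python
import json


def to_json_lines(text):
    """Re-serialize concatenated JSON values as one compact record per line."""
    decoder = json.JSONDecoder()
    records = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so advance past it first.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        # json.dumps escapes newlines inside strings, so each record stays on one line.
        records.append(json.dumps(value, separators=(",", ":")))
    return "\n".join(records) + "\n" if records else ""


pretty = """{
  "key" : 10
}
{
  "key" : 20
}
"""
print(to_json_lines(pretty), end="")
```

The result contains no blank lines, so it does not introduce NULL records when the file is read through the SerDe.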
Loading a JSON File and Querying Data
Uses json-serde/src/test/scripts/test-without-cr-lf.json.
$ cat test.json
{"text":"foo","number":123}
{"text":"bar","number":345}
$ hadoop fs -put -f test.json /user/data/test.json
$ hive
hive> CREATE DATABASE test;
hive> CREATE EXTERNAL TABLE test ( text string, number int )
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/user/data';
hive> SELECT * FROM test;
OK
foo 123
bar 345
Querying Complex Fields
Uses json-serde/src/test/scripts/data.txt.
hive> CREATE DATABASE test;
hive> CREATE TABLE test (
one boolean,
three array<string>,
two double,
four string )
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
hive> LOAD DATA LOCAL INPATH 'data.txt' OVERWRITE INTO TABLE test;
hive> select three[1] from test;
gold
yellow
If you have complex JSON, it can be tedious to create tables manually. Try hive-json-schema to build your schema from data.
See json-serde/src/test/scripts for more examples.
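To illustrate the idea of deriving a table definition from data, here is a minimal Python sketch. It is not how hive-json-schema works, and the names hive_type and create_table are hypothetical. It assumes a single sample record is representative, that arrays are homogeneous, that integers fit in bigint, and that keys are valid Hive column names; nulls, empty arrays, and empty objects fall back to string.

```python
import json


def hive_type(value):
    """Map a decoded JSON value to a Hive type name (illustrative mapping)."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, list):
        # Assumes a homogeneous array; an empty array falls back to string.
        element = hive_type(value[0]) if value else "string"
        return f"array<{element}>"
    if isinstance(value, dict) and value:
        fields = ",".join(f"{k}:{hive_type(v)}" for k, v in value.items())
        return f"struct<{fields}>"
    # Strings, nulls, and empty objects all fall back to string.
    return "string"


def create_table(name, sample_line):
    """Build a CREATE TABLE statement from one sample JSON record."""
    record = json.loads(sample_line)
    columns = ",\n".join(f"  {k} {hive_type(v)}" for k, v in record.items())
    return (
        f"CREATE TABLE {name} (\n{columns} )\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        "STORED AS TEXTFILE;"
    )


# Hypothetical sample record, shaped like the table in this section.
sample = '{"one": true, "three": ["red", "yellow"], "two": 19.5, "four": "text"}'
print(create_table("test", sample))
```

Review the generated statement before running it: a JSON object is mapped to a struct here, but a map may be the better fit when the keys vary from record to record.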
Defining Nested Structures
ADD JAR json-serde-1.3.7-SNAPSHOT-jar-with-dependencies.jar;
CREATE TABLE json_nested_test (
country string,
languages array<string>,
religions map