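Recently, in a project, I needed to read data pushed from Kafka and process it with Spark Streaming. The incoming data are JSON strings that have to be converted to a DataFrame for further processing, but the JSON has many fields and the set of fields is not fixed. The conversion I wanted looks like this: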
val NewsDF = sqlContext.createDataFrame(NewsRdd,classOf[News])
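But how do I turn a JSON string into a JavaBean? With this many fields, is there a lazy way to do it?
Jackson offers a good way to do this. I first tried using ObjectMapper directly from Scala, but my JSON is a nested, complex type and I hit an error. I could not find a good fix online and reluctantly gave up on that route. What worked was wrapping Jackson in a Java utility class: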
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.List;
import org.apache.commons.lang3.StringUtils;
import org.codehaus.jackson.JsonNode;
import org.codehaus.jackson.JsonParser;
import org.codehaus.jackson.JsonProcessingException;
import org.codehaus.jackson.map.DeserializationConfig;
import org.codehaus.jackson.map.ObjectMapper;
import org.codehaus.jackson.map.SerializationConfig;
import org.codehaus.jackson.map.annotate.JsonSerialize;
import org.codehaus.jackson.map.util.JSONPObject;
import org.codehaus.jackson.type.JavaType;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class JsonMapper {
    private static Logger logger = LoggerFactory.getLogger(JsonMapper.class);
    private ObjectMapper mapper;

    public JsonMapper(JsonSerialize.Inclusion inclusion) {
        mapper = new ObjectMapper();
        // Set which properties are included on output
        mapper.setSerializationInclusion(inclusion);
        // On input, ignore properties that exist in the JSON string but not on the Java object
        mapper.configure(DeserializationConfig.Feature.FAIL_ON_UNKNOWN_PROPERTIES, false);
        // Forbid deserializing an Enum from its int ordinal(), which is very dangerous
        mapper.configure(DeserializationConfig.Feature.FAIL_ON_NUMBERS_FOR_ENUMS, true);
        // Accept single-quoted field names and string values
        mapper.configure(JsonParser.Feature.ALLOW_SINGLE_QUOTES, true);
    }
    /**
     * Creates a mapper that writes all properties to the JSON string.
     */
    public static JsonMapper buildNormalMapper() {
        return new JsonMapper(JsonSerialize.Inclusion.ALWAYS);
    }

    /**
     * Creates a mapper that writes only non-null properties to the JSON string.
     */
    public static JsonMapper buildNonNullMapper() {
        return new JsonMapper(JsonSerialize.Inclusion.NON_NULL);
    }

    /**
     * Creates a mapper that writes only properties whose initial value has changed.
     */
    public static JsonMapper buildNonDefaultMapper() {
        return new JsonMapper(JsonSerialize.Inclusion.NON_DEFAULT);
    }

    /**
     * Creates a mapper that writes only properties that are neither null nor empty (e.g. List.isEmpty).
     */
    public static JsonMapper buildNonEmptyMapper() {
        return new JsonMapper(JsonSerialize.Inclusion.NON_EMPTY);
    }
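    /**
     * If the object is null, returns "null".
     * If the collection is empty, returns "[]".
     */
    public String toJson(Object object) {
        try {
            return mapper.writeValueAsString(object);
        } catch (IOException e) {
            // NestedException is a project-specific unchecked wrapper that is not shown in this post
            throw NestedException.wrap(e);
        }
    }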
    /**
     * If the JSON string is null, empty, or the string "null", returns null.
     * If the JSON string is "[]", returns an empty collection.
     *
     * To read a collection such as List/Map that is not a simple type like List<String>,
     * first build the type with constructParametricType.
     * @see #constructParametricType(Class, Class...)
     */
    public <T> T fromJson(String jsonString, Class<T> clazz) {
        if (StringUtils.isEmpty(jsonString)) {
            return null;
        }
        try {
            return mapper.readValue(jsonString, clazz);
        } catch (IOException e) {
            throw NestedException.wrap(e);
        }
    }
    /**
     * If the JSON string is null, empty, or the string "null", returns null.
     * If the JSON string is "[]", returns an empty collection.
     *
     * To read a collection such as List/Map that is not a simple type like List<String>,
     * first build the type with constructParametricType.
     * @see #constructParametricType(Class, Class...)
     */
    @SuppressWarnings("unchecked")
    public <T> T fromJson(String jsonString, JavaType javaType) {
        if (StringUtils.isEmpty(jsonString)) {
            return null;
        }
        try {
            return (T) mapper.readValue(jsonString, javaType);
        } catch (IOException e) {
            throw NestedException.wrap(e);
        }
    }
    @SuppressWarnings("unchecked")
    public <T> T fromJson(String jsonString, Class<?> parametrized, Class<?>... parameterClasses) {
        return (T) this.fromJson(jsonString, constructParametricType(parametrized, parameterClasses));
    }

    @SuppressWarnings("unchecked")
    public <T> List<T> fromJsonToList(String jsonString, Class<T> classMeta) {
        return (List<T>) this.fromJson(jsonString, constructParametricType(List.class, classMeta));
    }

    @SuppressWarnings("unchecked")
    public <T> T fromJson(JsonNode node, Class<?> parametrized, Class<?>... parameterClasses) {
        JavaType javaType = constructParametricType(parametrized, parameterClasses);
        try {
            return (T) mapper.readValue(node, javaType);
        } catch (IOException e) {
            throw NestedException.wrap(e);
        }
    }

    @SuppressWarnings("unchecked")
    public <T> T pathAtRoot(String json, String path, Class<?> parametrized, Class<?>... parameterClasses) {
        JsonNode rootNode = parseNode(json);
        JsonNode node = rootNode.path(path);
        return (T) fromJson(node, parametrized, parameterClasses);
    }

    @SuppressWarnings("unchecked")
    public <T> T pathAtRoot(String json, String path, Class<T> clazz) {
        JsonNode rootNode = parseNode(json);
        JsonNode node = rootNode.path(path);
        return (T) fromJson(node, clazz);
    }
    /**
     * Builds a generic type. For List<MyBean>, call constructParametricType(ArrayList.class, MyBean.class);
     * for Map<String,MyBean>, call constructParametricType(HashMap.class, String.class, MyBean.class).
     */
    public JavaType constructParametricType(Class<?> parametrized, Class<?>... parameterClasses) {
        return mapper.getTypeFactory().constructParametricType(parametrized, parameterClasses);
    }
    /**
     * When the JSON contains only some of the bean's properties, updates an existing bean
     * and overwrites only those properties. Returns null (after logging a warning) on failure.
     */
    @SuppressWarnings("unchecked")
    public <T> T update(T object, String jsonString) {
        try {
            return (T) mapper.readerForUpdating(object).readValue(jsonString);
        } catch (JsonProcessingException e) {
            logger.warn("update json string:" + jsonString + " to object:" + object + " error.", e);
        } catch (IOException e) {
            logger.warn("update json string:" + jsonString + " to object:" + object + " error.", e);
        }
        return null;
    }
    /**
     * Outputs data in JSONP format.
     */
    public String toJsonP(String functionName, Object object) {
        return toJson(new JSONPObject(functionName, object));
    }

    /**
     * Sets whether Enum values are read and written using toString().
     * When false, name() is used instead; the default is false.
     * Note: call this after the mapper is created and before any read or write.
     */
    public void setEnumUseToString(boolean value) {
        mapper.configure(SerializationConfig.Feature.WRITE_ENUMS_USING_TO_STRING, value);
        mapper.configure(DeserializationConfig.Feature.READ_ENUMS_USING_TO_STRING, value);
    }
    /**
     * Exposes the underlying mapper for further configuration or for use with other serialization APIs.
     */
    public ObjectMapper getMapper() {
        return mapper;
    }

    public JsonNode parseNode(String json) {
        try {
            return mapper.readValue(json, JsonNode.class);
        } catch (IOException e) {
            throw NestedException.wrap(e);
        }
    }
    /**
     * Outputs all properties.
     * @param object the object to serialize
     * @return the JSON string
     */
    public static String toNormalJson(Object object) {
        return new JsonMapper(JsonSerialize.Inclusion.ALWAYS).toJson(object);
    }

    /**
     * Outputs only non-null properties.
     * @param object the object to serialize
     * @return the JSON string
     */
    public static String toNonNullJson(Object object) {
        return new JsonMapper(JsonSerialize.Inclusion.NON_NULL).toJson(object);
    }

    /**
     * Outputs only properties whose initial value has changed.
     * @param object the object to serialize
     * @return the JSON string
     */
    public static String toNonDefaultJson(Object object) {
        return new JsonMapper(JsonSerialize.Inclusion.NON_DEFAULT).toJson(object);
    }

    /**
     * Outputs only properties that are neither null nor empty (e.g. List.isEmpty).
     * @param object the object to serialize
     * @return the JSON string
     */
    public static String toNonEmptyJson(Object object) {
        return new JsonMapper(JsonSerialize.Inclusion.NON_EMPTY).toJson(object);
    }
    public void setDateFormat(String dateFormat) {
        mapper.setDateFormat(new SimpleDateFormat(dateFormat));
    }

    public static String toLogJson(Object object) {
        JsonMapper jsonMapper = new JsonMapper(JsonSerialize.Inclusion.NON_EMPTY);
        jsonMapper.setDateFormat("yyyy-MM-dd HH:mm:ss");
        return jsonMapper.toJson(object);
    }
}
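The class above calls NestedException.wrap(e), but the post never shows NestedException. For the listing to compile, something like the following would be needed. This is only a guess at its shape, assuming it is an unchecked wrapper around checked exceptions; the real class may differ.

```java
import java.io.IOException;

/**
 * Hypothetical stand-in for the NestedException used by JsonMapper.
 * The original class is not shown in the post; this is one possible shape.
 */
public class NestedException extends RuntimeException {

    private static final long serialVersionUID = 1L;

    private NestedException(Throwable cause) {
        super(cause);
    }

    /** Wraps a checked exception; unchecked exceptions are returned unchanged. */
    public static RuntimeException wrap(Throwable t) {
        if (t instanceof RuntimeException) {
            return (RuntimeException) t;
        }
        return new NestedException(t);
    }

    public static void main(String[] args) {
        RuntimeException wrapped = NestedException.wrap(new IOException("bad json"));
        // The original exception stays reachable through getCause()
        System.out.println(wrapped.getCause().getMessage());
    }
}
```

Returning RuntimeException from wrap is what lets callers write `throw NestedException.wrap(e);` without declaring a checked exception.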
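This approach is very extensible and also handles the special cases of complex types, including nested Lists and nested JSON objects. It is powerful, and it lets me be lazy. With it, the Spark side becomes: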
val news = rdd.map(_._2).map( x => JsonMapper.buildNormalMapper.fromJson(x.trim, classOf[News]))
val sqlContext = new SQLContext(sc)
val newsDF = sqlContext.createDataFrame(news,classOf[News])
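To make the nested-type support mentioned above concrete, here is a small sketch of the Jackson calls that JsonMapper wraps. It assumes Jackson 1.x (the org.codehaus.jackson packages, as in the imports of the class) is on the classpath. The News and Author beans and their fields are hypothetical, since the post does not show the real News class.

```java
import java.io.IOException;
import java.util.List;

import org.codehaus.jackson.map.DeserializationConfig;
import org.codehaus.jackson.map.ObjectMapper;
import org.codehaus.jackson.type.JavaType;

public class NestedJsonDemo {

    /** Hypothetical nested object; not from the original post. */
    public static class Author {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for the News bean, whose fields the post does not show. */
    public static class News {
        private String title;
        private Author author;      // nested JSON object
        private List<String> tags;  // nested list
        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
        public Author getAuthor() { return author; }
        public void setAuthor(Author author) { this.author = author; }
        public List<String> getTags() { return tags; }
        public void setTags(List<String> tags) { this.tags = tags; }
    }

    private static ObjectMapper newMapper() {
        ObjectMapper mapper = new ObjectMapper();
        // Same setting as JsonMapper: ignore JSON fields the bean does not declare
        mapper.configure(DeserializationConfig.Feature.FAIL_ON_UNKNOWN_PROPERTIES, false);
        return mapper;
    }

    /** Reads one JSON object into a News bean, nested parts included. */
    public static News parseOne(String json) {
        try {
            return newMapper().readValue(json, News.class);
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Reads a JSON array into List<News> via a parametric JavaType. */
    public static List<News> parseMany(String json) {
        ObjectMapper mapper = newMapper();
        JavaType listOfNews = mapper.getTypeFactory().constructParametricType(List.class, News.class);
        try {
            return mapper.readValue(json, listOfNews);
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        News n = parseOne("{\"title\":\"Hello\",\"author\":{\"name\":\"Ann\"},\"tags\":[\"a\",\"b\"],\"extra\":1}");
        System.out.println(n.getTitle() + " by " + n.getAuthor().getName() + ", tags=" + n.getTags());
    }
}
```

The unknown "extra" field is skipped rather than rejected, which is what makes this workable when the set of incoming fields is not fixed.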
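This is much more comfortable: no more creating objects one by one with new and then calling a long list of setters. You just get to sit back and watch.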