The Hive CREATE TABLE statement is as follows:
CREATE EXTERNAL TABLE IF NOT EXISTS
students ( id int, name string, gender string, birthday Date, clazz string, phone string, loc string)
COMMENT 'student details'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\r\n' STORED AS TEXTFILE;
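Running it fails with:
FAILED: SemanticException 5:20 LINES TERMINATED BY only supports newline '\n' right now. Error encountered near token ''\r\n''
In short, only '\n' is supported as the line terminator. (I don't quite understand why Hive, now at version 2.x, still does not support custom line terminators, or why the LINES TERMINATED BY clause exists at all if it cannot be changed.)
If the data really does need a different terminator, we can set it indirectly through a Hadoop MapReduce (mapred) input format property.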
In the Hive CLI, enter: set textinputformat.record.delimiter=\r\n;
The CREATE TABLE statement can then leave out LINES TERMINATED BY:
CREATE EXTERNAL TABLE IF NOT EXISTS
students ( id int, name string, gender string, birthday Date, clazz string, phone string, loc string)
COMMENT 'student details'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
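Problem solved: the data loads and displays correctly.
Why this works: in the Hive CLI, run describe extended students;
The output contains two important pieces of information: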
inputFormat:org.apache.hadoop.mapred.TextInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
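These are the classes Hive calls by default for input and output.
The TextInputFormat class is in hadoop-mapreduce-client-core-***.jar.
The key part is its getRecordReader method, which returns a LineRecordReader object. The method already contains the code that accepts a custom string as the record delimiter, so running set textinputformat.record.delimiter=<custom delimiter string>; in the Hive CLI before creating the table is enough to get a custom, multi-character line terminator.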
public class TextInputFormat extends FileInputFormat<LongWritable, Text>
    implements JobConfigurable {

  // ... (other members omitted)

  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit genericSplit, JobConf job, Reporter reporter) throws IOException {
    reporter.setStatus(genericSplit.toString());
    // Read the custom record delimiter from the job configuration, if one was set.
    String delimiter = job.get("textinputformat.record.delimiter");
    byte[] recordDelimiterBytes = null;
    if (null != delimiter) {
      recordDelimiterBytes = delimiter.getBytes(Charsets.UTF_8);
    }
    // A null delimiter makes LineRecordReader fall back to its default line splitting.
    return new LineRecordReader(job, (FileSplit) genericSplit, recordDelimiterBytes);
  }
}
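To make the mechanism concrete, here is a minimal stand-alone sketch of what splitting on a custom delimiter amounts to. It is not Hadoop's LineRecordReader: the class DelimiterSplitSketch and its method splitRecords are hypothetical names invented for illustration, and the whole input is assumed to fit in memory. The only thing it shares with the code above is the idea of turning the delimiter string into UTF-8 bytes and cutting the input wherever those bytes occur.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not part of Hadoop or Hive.
class DelimiterSplitSketch {

    // Cut `data` into records at every occurrence of `delimiter`.
    static List<String> splitRecords(byte[] data, byte[] delimiter) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0; // first byte of the record being collected
        int i = 0;     // current scan position
        while (i <= data.length - delimiter.length) {
            if (matchesAt(data, i, delimiter)) {
                records.add(new String(data, start, i - start, StandardCharsets.UTF_8));
                i += delimiter.length; // skip past the delimiter
                start = i;
            } else {
                i++;
            }
        }
        // Bytes after the last delimiter still form a final record.
        if (start < data.length) {
            records.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
        }
        return records;
    }

    // True if `delimiter` occurs in `data` starting at index `pos`.
    private static boolean matchesAt(byte[] data, int pos, byte[] delimiter) {
        for (int j = 0; j < delimiter.length; j++) {
            if (data[pos + j] != delimiter[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Same conversion as in getRecordReader: delimiter string to UTF-8 bytes.
        byte[] delimiter = "\r\n".getBytes(StandardCharsets.UTF_8);
        byte[] data = "1\tAnn\r\n2\tBob\r\n".getBytes(StandardCharsets.UTF_8);
        for (String record : splitRecords(data, delimiter)) {
            System.out.println(record.replace("\t", " | "));
        }
    }
}
```

The sample rows in main are made up. Any multi-character string, for example "||", can be passed as the delimiter in the same way.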