Hive CREATE TABLE fails with "LINES TERMINATED BY only supports newline '\n' right now": a fix

Original

2020-02-25 17:01

The Hive CREATE TABLE statement is as follows:

CREATE EXTERNAL  TABLE IF NOT EXISTS 
students ( id int, name string, gender string, birthday Date, clazz string, phone string, loc string) 
COMMENT 'student details' 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
LINES TERMINATED BY '\r\n' STORED AS TEXTFILE;

FAILED: SemanticException 5:20 LINES TERMINATED BY only supports newline '\n' right now. Error encountered near token ''\r\n''

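The message means that only '\n' is supported as the line terminator. (I don't quite understand why Hive, now at version 2.x, still doesn't support custom line terminators, or why the LINES TERMINATED BY clause exists at all if it isn't supported.)
If the data really does need a different terminator, it can be set indirectly through a mapred input-format property.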

In the Hive CLI, enter: set textinputformat.record.delimiter=\r\n;

Then leave LINES TERMINATED BY out of the CREATE TABLE statement:

CREATE EXTERNAL  TABLE IF NOT EXISTS 
students ( id int, name string, gender string, birthday Date, clazz string, phone string, loc string) 
COMMENT 'student details' 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
STORED AS TEXTFILE;

Problem solved: the data loads and displays correctly.

Why it works: in the Hive CLI, run describe extended students;

The output contains two important pieces of information:
inputFormat:org.apache.hadoop.mapred.TextInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
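These are the classes Hive calls by default for input and output.
The TextInputFormat class lives in hadoop-mapreduce-client-core-***.jar.
The key part is its getRecordReader method, which returns a LineRecordReader object. The method already contains the code for accepting a custom string as the record delimiter, so entering set textinputformat.record.delimiter=<custom delimiter string>; in the Hive CLI before creating the table is enough to get a custom, multi-character line terminator.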

public class TextInputFormat extends FileInputFormat<LongWritable, Text>
  implements JobConfigurable {

  // ... (other members omitted)

  public RecordReader<LongWritable, Text> getRecordReader(InputSplit genericSplit, JobConf job, Reporter reporter) throws IOException {
    reporter.setStatus(genericSplit.toString());
    // Read the optional custom record delimiter from the job configuration.
    String delimiter = job.get("textinputformat.record.delimiter");
    byte[] recordDelimiterBytes = null;
    if (null != delimiter) {
      recordDelimiterBytes = delimiter.getBytes(Charsets.UTF_8);
    }
    // A null delimiter leaves LineRecordReader on its default line handling.
    return new LineRecordReader(job, (FileSplit) genericSplit, recordDelimiterBytes);
  }
}
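
To make the effect of recordDelimiterBytes more concrete, here is a minimal, self-contained sketch of delimiter-based record splitting. It is not Hadoop's LineRecordReader implementation: the class RecordSplitSketch and the methods splitRecords and matchLength are hypothetical names made up for this illustration, and it works on an in-memory byte array rather than a file split. It assumes that a null delimiter (the property is unset) means a record ends at "\n", "\r" or "\r\n", and that a non-null delimiter must match byte for byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Hadoop or Hive.
class RecordSplitSketch {

    // Splits data into records. A null delimiter stands for the unset
    // property: records then end at "\n", "\r" or "\r\n".
    static List<String> splitRecords(byte[] data, byte[] delimiter) {
        List<String> records = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < data.length) {
            int matchLen = matchLength(data, i, delimiter);
            if (matchLen > 0) {
                // Emit the bytes before the terminator as one record.
                records.add(new String(data, start, i - start, StandardCharsets.UTF_8));
                i += matchLen;
                start = i;
            } else {
                i++;
            }
        }
        // Emit a trailing record that has no terminator after it.
        if (start < data.length) {
            records.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
        }
        return records;
    }

    // Returns the length of the terminator starting at pos, or 0 if none.
    static int matchLength(byte[] data, int pos, byte[] delimiter) {
        if (delimiter == null) {
            if (data[pos] == '\n') return 1;
            if (data[pos] == '\r') {
                return (pos + 1 < data.length && data[pos + 1] == '\n') ? 2 : 1;
            }
            return 0;
        }
        if (delimiter.length == 0 || pos + delimiter.length > data.length) return 0;
        for (int k = 0; k < delimiter.length; k++) {
            if (data[pos + k] != delimiter[k]) return 0;
        }
        return delimiter.length;
    }

    public static void main(String[] args) {
        // Two tab-separated rows terminated by the custom string "||".
        byte[] data = "1\tTom||2\tAnn||".getBytes(StandardCharsets.UTF_8);
        byte[] delim = "||".getBytes(StandardCharsets.UTF_8);
        System.out.println(splitRecords(data, delim));
    }
}
```

Passing null as the second argument exercises the default branch, which is the situation when textinputformat.record.delimiter has not been set in the session.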
