pandas read_csv error: Buffer overflow caught - possible malformed input file.

1. Error symptom

  File "/root/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 1213, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 766, in pandas.parser.TextReader.read (pandas/parser.c:7988)
  File "pandas/parser.pyx", line 788, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8244)
  File "pandas/parser.pyx", line 842, in pandas.parser.TextReader._read_rows (pandas/parser.c:8970)
  File "pandas/parser.pyx", line 829, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8838)
  File "pandas/parser.pyx", line 1833, in pandas.parser.raise_parser_error (pandas/parser.c:22649)
pandas.parser.CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

2. Cause

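On Windows, when the lines of the source data end with the two characters "\r\n" and the data is written with pandas' default to_csv settings (line_terminator: string, default '\n'), the output ends up with an extra "\r" character.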
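One common way an extra "\r" sneaks into a file on Windows is text-mode newline translation: a file opened with open(path, "w") converts every "\n" written into "\r\n", so a record that already ends in "\r\n" comes out as "\r\r\n". Whether this is exactly what happened in the pipeline above is an assumption. The sketch below only simulates that translation, on any OS, with io.TextIOWrapper; write_like_windows_text_mode is a hypothetical helper name.

```python
import io


def write_like_windows_text_mode(text: str) -> bytes:
    r"""Return the bytes a Windows text-mode write would produce for text.

    newline="\r\n" tells TextIOWrapper to translate every "\n" written into "\r\n",
    which is what open(path, "w") does by default on Windows (os.linesep == "\r\n").
    A "\r" that is already in the text is left untouched.
    """
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="utf-8", newline="\r\n")
    wrapper.write(text)
    wrapper.flush()  # push the buffered text down into the BytesIO
    return raw.getvalue()


# Records that already carry "\r\n" gain one extra "\r" per line.
doubled = write_like_windows_text_mode("a,b\r\n1,2\r\n")
```

The second write is the "\r\r\n" pattern: the original "\r" survives and the translated "\n" adds another one in front of it.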

Background:

\r: carriage return (CR), moves the cursor to the start of the line
\n: line feed (LF), moves the cursor down to the next line (new line)

How a line break is encoded depends on the platform:
Unix/Linux: a single 0x0A byte ("\n", LF) ends each line.
Windows: two bytes, 0x0D 0x0A ("\r\n", CR LF), end each line.
Classic Mac OS (before OS X): a single 0x0D byte ("\r", CR) ends each line; modern macOS uses "\n" like Unix.
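To check which convention a file actually uses before handing it to read_csv, a small byte-level check is enough. This is a minimal sketch that looks for the doubled "\r\r\n" pattern first and otherwise inspects only the first line break; detect_line_ending and its return labels are hypothetical, not part of pandas.

```python
# Line-ending conventions described above, as raw bytes.
LINE_ENDINGS = {
    "unix": b"\n",         # LF, 0x0A
    "windows": b"\r\n",    # CR LF, 0x0D 0x0A
    "classic_mac": b"\r",  # CR, 0x0D
}


def detect_line_ending(raw: bytes) -> str:
    """Guess the line-ending convention of a CSV sample (hypothetical helper)."""
    if b"\r\r\n" in raw:
        # An extra CR in front of CR LF: the symptom described in the cause section.
        return "doubled_cr"
    for i, byte in enumerate(raw):
        if byte == 0x0A:  # a bare LF is the first break character
            return "unix"
        if byte == 0x0D:  # CR: Windows if followed by LF, otherwise classic Mac
            return "windows" if raw[i + 1:i + 2] == b"\n" else "classic_mac"
    return "none"
```

For a file on disk, a sample such as open(path, "rb").read(4096) is usually enough to pass in.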

3. Solution (at the source)

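When writing with to_csv, use "\r\n" as the line terminator, then read the file back with read_csv's defaults. (The keyword is line_terminator in older pandas; pandas 1.5 renamed it to lineterminator and pandas 2.0 removed the old spelling.)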

Write:
data.to_csv("./temp/company_hold.csv", line_terminator="\r\n", index=False)

Read:
pandas.read_csv("./temp/company_hold.csv")
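A self-contained round trip of this fix, writing to an in-memory buffer instead of ./temp/company_hold.csv so it runs anywhere. It is a sketch assuming a pandas version where to_csv accepts line_terminator (before 1.5) or lineterminator (1.5 and later); the version check, the sample frame and the helper name to_csv_crlf are illustrative, not part of pandas.

```python
import io

import pandas as pd


def to_csv_crlf(df: pd.DataFrame) -> str:
    """Serialize df to CSV text with explicit "\\r\\n" row endings (hypothetical helper)."""
    buf = io.StringIO()
    # Pick whichever keyword this pandas version understands.
    major, minor = (int(part) for part in pd.__version__.split(".")[:2])
    key = "lineterminator" if (major, minor) >= (1, 5) else "line_terminator"
    df.to_csv(buf, index=False, **{key: "\r\n"})
    return buf.getvalue()


df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
text = to_csv_crlf(df)                 # "a,b\r\n1,x\r\n2,y\r\n"
back = pd.read_csv(io.StringIO(text))  # default reader, no lineterminator needed
```

Writing to a StringIO keeps the example free of any OS newline translation, so the "\r\n" in text is exactly what to_csv emitted.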

4. Appendix

A workaround suggested by other users, which handles the symptom at read time rather than the cause:

pandas.read_csv('./temp/test.csv', lineterminator='\n')

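Adding the parameter lineterminator='\n' makes the parser treat only \n as a line break, so \r no longer affects how rows are split. Any \r characters are then kept as ordinary data, typically at the end of the last column, and may need to be stripped afterwards.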
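A different read-time option, not taken from the sources above, is to normalize every line ending to "\n" before pandas sees the bytes. This is a minimal standard-library sketch; normalize_line_endings and normalized_buffer are hypothetical helpers, and the normalization also rewrites any \r inside quoted fields, which may not be what you want.

```python
import io


def normalize_line_endings(raw: bytes) -> bytes:
    """Convert Windows (CR LF) and classic Mac (CR) line endings to LF."""
    # Replace CR LF first so it becomes a single LF, then turn any lone CR into LF.
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def normalized_buffer(path: str) -> io.BytesIO:
    """Read path as bytes and return an in-memory buffer with normalized endings."""
    with open(path, "rb") as f:
        return io.BytesIO(normalize_line_endings(f.read()))


# Hypothetical usage (path taken from the example above):
# df = pandas.read_csv(normalized_buffer("./temp/test.csv"))
```

A doubled "\r\r\n" becomes an empty line, which read_csv skips by default (skip_blank_lines=True).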

https://blog.csdn.net/qq_23392341/article/details/76851183

https://blog.csdn.net/leiting_imecas/article/details/68928553
