File encoding detection with chardet and handling garbled text

Original post

2020-06-19 23:12

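# Read the CSV line by line in binary mode, keep every line that decodes as UTF-8,
# and report the lines that do not, so the garbled rows can be located.
def save_data(line):
    # Append one decoded line to the cleaned output file (reopened for every line)
    with open("new微博評論.csv", "a+", newline="", encoding="utf-8") as f:
        f.write(line)

with open("微博評論.csv", "rb") as f:  # read the file in binary mode
    for i, line in enumerate(f, start=1):  # i is the 1-based line number
        try:
            n_line = line.decode("utf8")
            save_data(n_line)
        except UnicodeDecodeError as e:
            # Undecodable line: report its number and raw bytes, then skip it
            print(type(e), e)
            print("=========================")
            print(i, line)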
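The loop above drops every line that fails to decode. If the goal is to salvage those lines instead of skipping them, `bytes.decode` also accepts an `errors` argument. The sketch below (the function name `decode_lines` is hypothetical, not from the original post) decodes each line strictly, falls back to `errors="replace"` so that only the bad bytes become U+FFFD, and records which line numbers needed the fallback.

```python
def decode_lines(raw_lines, encoding="utf-8"):
    """Decode byte lines, replacing undecodable bytes instead of dropping the line.

    Hypothetical helper. Returns (decoded_lines, bad_line_numbers);
    line numbers start at 1.
    """
    decoded, bad = [], []
    for i, raw in enumerate(raw_lines, start=1):
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Keep the line, but turn the bad bytes into U+FFFD (replacement character)
            decoded.append(raw.decode(encoding, errors="replace"))
            bad.append(i)
    return decoded, bad


if __name__ == "__main__":
    lines = ["第一行\n".encode("utf-8"), b"bad \xff byte\n", b"ok\n"]
    text, bad = decode_lines(lines)
    print(bad)  # [2]
```

Since a file opened in binary mode yields byte lines, `decode_lines(open("微博評論.csv", "rb"))` would work on the original input as well; `errors="ignore"` is the other built-in option if the bad bytes should simply be dropped.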

Encoding detection with chardet

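import sys

import chardet


def judge(data):
    # Guess the encoding of a bytes object; returns None if chardet cannot tell
    return chardet.detect(data)["encoding"]


def error(e, q=1):
    # Show the message and wait for Enter; exit with a non-zero status if q is truthy
    input(e)
    if q:
        sys.exit(1)


def trans(path):
    with open(path, "rb") as f:
        data = f.read()
    coding = judge(data)
    if coding == "GB2312":
        # GBK is a superset of GB2312, so it decodes more Chinese text without errors
        coding = "GBK"
    try:
        arr = [i.rstrip() for i in data.decode(coding).split("\n")]
        if len(arr) == 1:
            # No "\n" found: the file may use old Mac-style "\r" line endings
            return arr[0].split("\r")
        return arr
    except Exception as e:  # e.g. UnicodeDecodeError, or TypeError when coding is None
        print(e)
        error("[!] Cannot use this text; please use a UTF-8 encoded file")


print(trans("123.txt"))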
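chardet's result is a statistical guess: it can be wrong, and it returns None for some inputs. When the possible encodings are known in advance, a simpler alternative (a swapped-in technique, not part of chardet) is to try a short list of candidates in order. The sketch below assumes the files are either UTF-8 (with or without a BOM) or GBK; the names `decode_with_fallback` and `read_lines` and the candidate list are hypothetical.

```python
def decode_with_fallback(data, candidates=("utf-8-sig", "gbk")):
    """Try each candidate encoding in order; return (text, encoding_used).

    Hypothetical helper. Raises ValueError if no candidate can decode the data.
    """
    for enc in candidates:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this candidate does not fit; try the next one
    raise ValueError(f"none of {candidates} could decode the data")


def read_lines(path):
    # Read the file as bytes, decode it, then split on \n, \r\n, \r
    # (str.splitlines also treats a few other characters as line boundaries)
    with open(path, "rb") as f:
        text, enc = decode_with_fallback(f.read())
    return text.splitlines(), enc
```

The order of the candidates matters: UTF-8 is strict, so GBK-encoded Chinese text usually fails to decode as UTF-8, whereas a double-byte codec such as GBK often decodes other bytes into garbled characters without raising an error. Putting the strictest encoding first avoids silently producing mojibake.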
