Fixing Python 3 UnicodeEncodeError: 'gbk' codec can't encode character '\U0001f608' in position ...

Original

2020-06-28 03:39

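1. Problem description:

When saving a crawled web page to a file, writing the decoded utf-8 text to the file raised the error below, which says that gbk cannot encode \U0001f608: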

UnicodeEncodeError: 'gbk' codec can't encode character '\U0001f608' in position 76036: illegal multibyte sequence

2. Test code:

import urllib.request

res = urllib.request.urlopen('http://www.baidu.com')
htmlBytes = res.read()
print(htmlBytes.decode('utf-8'))

with open("test.html",'w') as f:
    f.write(htmlBytes.decode('utf-8'))

After running:
print outputs the page correctly, but writing it to the file fails.

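3. Error analysis:

Viewing the page source shows that the page really is encoded in utf-8.
The save fails because the default encoding of the open function is not utf-8, so the text cannot be encoded when it is written. Setting the encoding of open to utf-8 is all that is needed.
From the documentation of the open function: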
encoding is the name of the encoding used to decode or encode the
file. This should only be used in text mode. The default encoding is
platform dependent, but any encoding supported by Python can be
passed. See the codecs module for the list of supported encodings.
So on my Windows machine the default encoding is gbk, which is why the encoding has to be set explicitly.
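To check this on your own machine, the sketch below prints the encoding that open falls back to in text mode and confirms that gbk has no mapping for \U0001f608. The helper name can_encode is made up for illustration, and the printed default depends on the platform (on a Chinese-locale Windows it is typically cp936, Python's alias for gbk).

```python
import locale

# Encoding that open() falls back to in text mode when none is given.
# The value is platform dependent, so no expected output is shown here.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


emoji = '\U0001f608'
print(can_encode(emoji, 'gbk'))    # False: gbk has no code for this character
print(can_encode(emoji, 'utf-8'))  # True: utf-8 covers every Unicode code point
```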

4. Solutions:

4.1 Change the terminal output encoding to utf-8:

import io
import sys

# Replace standard output with a wrapper that encodes as utf-8
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf8')
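To see what the wrapper in 4.1 does without touching the real terminal, the following sketch pushes text through an io.TextIOWrapper that sits on top of an in-memory byte stream. The io.BytesIO object stands in for sys.stdout.buffer, and the function name encode_through_wrapper is an assumption made for this example.

```python
import io


def encode_through_wrapper(text, encoding):
    """Write text through a TextIOWrapper and return the bytes it produced."""
    raw = io.BytesIO()  # stand-in for sys.stdout.buffer
    wrapper = io.TextIOWrapper(raw, encoding=encoding)
    wrapper.write(text)
    wrapper.flush()     # push any buffered output down into the byte stream
    data = raw.getvalue()
    wrapper.detach()    # keep raw from being closed together with the wrapper
    return data


print(encode_through_wrapper('\U0001f608', 'utf-8'))  # b'\xf0\x9f\x98\x88'
```

Calling the same function with 'gbk' raises the UnicodeEncodeError from the problem description. On Python 3.7 and later, sys.stdout.reconfigure(encoding='utf-8') should have the same effect on the real standard output without replacing the stream object.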

4.2 Change the encoding used when writing the file to utf-8:

with open(filename, 'w', encoding="utf-8") as f:
    f.write(data)

4.3 The modified code runs correctly.
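The modified script itself is not shown, so here is a minimal sketch of how the test code from section 2 might look with the fix from 4.2 applied. The function names fetch_bytes and save_html are assumptions made for this example; only the encoding="utf-8" argument comes from the fix above.

```python
import urllib.request


def fetch_bytes(url):
    """Download a page and return the raw response body as bytes."""
    with urllib.request.urlopen(url) as res:
        return res.read()


def save_html(html_bytes, path, encoding="utf-8"):
    """Decode the page as utf-8 and write it to path with an explicit encoding."""
    text = html_bytes.decode("utf-8")
    # Passing encoding explicitly stops open() from falling back to the
    # platform default (gbk on the Windows machine described above).
    with open(path, "w", encoding=encoding) as f:
        f.write(text)
    return text
```

With these helpers, the original test becomes save_html(fetch_bytes('http://www.baidu.com'), 'test.html'), and the write no longer depends on the system's default encoding.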
