Web scraping: lxml error "ValueError: Unicode strings with encoding declaration are not supported. Please use bytes"


A note first:

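Please don't ask me for the site. For work reasons I can't share it, and I hope you understand. If you hit an error like the one in the title while extracting data with lxml, my fix may be worth a look. This was the first time I ran into this problem, so I'm writing it down.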

The problem and how I worked through it

Today I was testing a site and ran into a problem. I sent the request with requests, and resp.text on its own returned perfectly good data. The test code:

import requests
from lxml import etree

resp = requests.get(url, headers=headers)
resp_text = resp.text
html = etree.HTML(resp_text)

Then the etree.HTML() call raised an error, on the line html = etree.HTML(resp_text).

Next I used chardet to check the encoding of the response bytes, which came back as gb2312.
Test code:

import chardet

resp_text = resp.content
ren = chardet.detect(resp_text)
print(ren)

So I figured an explicit decode might be needed, even though printing resp.text looked fine. Nothing to lose, so I tried it.

resp_text = resp.content.decode('gb2312')
html = etree.HTML(resp_text)

The error was still raised on the line html = etree.HTML(resp_text).

The traceback:

  File "src\lxml\etree.pyx", line 3170, in lxml.etree.HTML
  File "src\lxml\parser.pxi", line 1872, in lxml.etree._parseMemoryDocument
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
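The traceback can be reproduced without the real site. The sketch below is only an illustration: SAMPLE is a made-up stand-in for the page, and try_parse is a hypothetical helper, not something from the original test code. It assumes the page starts with an XML declaration that carries an encoding attribute, which is the case this error message describes.

```python
from lxml import etree

# Hypothetical stand-in for the real page: markup that starts with an
# XML declaration carrying an encoding attribute.
SAMPLE = '<?xml version="1.0" encoding="gb2312"?><html><body><p>hello</p></body></html>'


def try_parse(markup):
    """Return the parsed root element, or the error message if lxml rejects the input."""
    try:
        return etree.HTML(markup)
    except ValueError as exc:
        return str(exc)


as_str = try_parse(SAMPLE)                    # str input with a declaration
as_bytes = try_parse(SAMPLE.encode("utf-8"))  # the same markup as bytes

print(as_str)                      # the ValueError message
print(as_bytes.findtext(".//p"))   # text of the <p> element
```

The str version is rejected while the bytes version parses, which matches the hint in the error message.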

This problem was heartbreaking.

The final solution:

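This only worked after a lot of attempts. After repeated testing, it turned out that the decoded string has to be encoded again in Python (as utf-8) before it is passed in. As the error message says, lxml does not accept a str that carries an encoding declaration, which this page presumably has, but it does accept bytes.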

resp = requests.get(url,headers=headers)
resp_text = resp.content.decode('gb2312')
html = etree.HTML(resp_text.encode('utf-8'))

Or:

resp = requests.get(url,headers=headers)
resp_text = resp.text
html = etree.HTML(resp_text.encode('utf-8'))
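A variant that skips the decode and re-encode round trip is to hand lxml the raw bytes and tell the parser which encoding they use. This is a sketch under assumptions, not what I ran against the real site: raw is a made-up stand-in for resp.content, and gb2312 is taken from the chardet result above.

```python
from lxml import etree

# Hypothetical stand-in for resp.content: GB2312-encoded bytes that start
# with an XML declaration.
raw = (
    '<?xml version="1.0" encoding="gb2312"?>'
    "<html><body><p>中文</p></body></html>"
).encode("gb2312")

# Tell the parser how the bytes are encoded instead of decoding them first.
parser = etree.HTMLParser(encoding="gb2312")
root = etree.HTML(raw, parser=parser)

text = root.findtext(".//p")
print(text)
```

With a real response this would be etree.HTML(resp.content, parser=parser). If re-encoding as utf-8 ever gives garbled characters on a page that declares another charset, this form may be worth trying.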

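With that the problem was solved. Passing a str directly had never caused trouble before, so I suspect it comes down to how this site declares its encoding. The next time I see this error, this fix is the first thing I'll try.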
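The error message also hints at a second way out: a str is fine as long as it has no declaration. A minimal sketch of that idea, using only the standard library; strip_xml_declaration and the sample string are hypothetical names for illustration, and the regex only handles a declaration at the very start of the text.

```python
import re

# Matches a leading XML declaration such as <?xml version="1.0" encoding="gb2312"?>
XML_DECLARATION = re.compile(r"^\s*<\?xml[^>]*\?>\s*")


def strip_xml_declaration(text: str) -> str:
    """Remove a leading XML declaration, if present, from already-decoded markup."""
    return XML_DECLARATION.sub("", text, count=1)


sample = '<?xml version="1.0" encoding="gb2312"?>\n<html><body><p>中文</p></body></html>'
cleaned = strip_xml_declaration(sample)
print(cleaned)
```

The cleaned str can then be passed to etree.HTML() directly, since it no longer carries an encoding declaration.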

One small problem ate up nearly two hours. Sigh...

If this helped you, feel free to leave a like.
