Get the HTML page
from lxml import etree
html = etree.HTML(content)  # content: the page's HTML source as a string
Write the XPath rule
html = etree.HTML(content)
divs = html.xpath('//div[@class="rank"]//span[@class="span"]')
print(type(divs))
print(divs)
divs is a list of Element objects, so printing it does not show the data directly:
<class 'list'>
[<Element span at 0x16d2edb2848>]
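The list above holds lxml Element objects rather than strings. As a minimal sketch, assuming a made-up one-line page in place of the real content, the text can be read from each element through its .text attribute:

```python
from lxml import etree

# Hypothetical sample page; the real `content` comes from the downloaded HTML.
content = '<div class="rank"><span class="span">No. 1</span></div>'

html = etree.HTML(content)
divs = html.xpath('//div[@class="rank"]//span[@class="span"]')

# Each item is an lxml Element; .text holds the text directly inside the tag.
texts = [el.text for el in divs]
print(texts)
```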
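etree.HTML(): parses the HTML text into an element tree that can be queried with XPath, and automatically repairs malformed HTML.
etree.tostring(): serializes the repaired result; the return type is bytes.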
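To illustrate both calls, here is a small sketch that assumes a deliberately broken fragment (the variable names and the fragment are made up for illustration). It passes a single element, the root, to etree.tostring():

```python
from lxml import etree

# Hypothetical malformed fragment: no <html>/<body>, and the <span> is never closed.
broken = '<div class="rank"><span class="span">No. 1</div>'

root = etree.HTML(broken)

# tostring() takes one element (here the root) and returns bytes by default.
fixed = etree.tostring(root)
print(type(fixed))
print(fixed.decode('utf-8'))
```

The decoded output should show the missing wrapper tags and the closing span tag filled in by the parser.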
html = etree.HTML(content)
divs = html.xpath('//div[@class="rank"]//span[@class="span"]')
d = etree.tostring(divs)  # divs is a list, not a single Element
print(d)
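This raises an error, because etree.tostring() expects a single element rather than the list returned by xpath(): TypeError: Type 'list' cannot be serialized.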
Traceback (most recent call last):
  File "E:/pycharm2019/Test/test.py", line 14, in <module>
    d = etree.tostring(divs)
  File "src/lxml/etree.pyx", line 3443, in lxml.etree.tostring
TypeError: Type 'list' cannot be serialized.
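I searched a lot without finding a solution to this exact problem, and then suddenly remembered that I could append /text() to the end of the rule.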
html = etree.HTML(content)  # parse the HTML page
divs = html.xpath('//div[@class="rank"]//span[@class="span"]/text()')  # extract the data with XPath
print(divs)  # print the data
This returns the target data directly (the etree.tostring call was never needed at all... I was misled by a video tutorial).
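One caveat worth sketching: /text() only selects text nodes that sit directly inside the matched tag. The sample page below is made up for illustration, with a nested b tag in the second span, and compares /text() with the XPath string(.) function evaluated per element:

```python
from lxml import etree

# Hypothetical sample: the second span wraps part of its text in <b>.
content = (
    '<div class="rank">'
    '<span class="span">No. 1</span>'
    '<span class="span">No. <b>2</b></span>'
    '</div>'
)
html = etree.HTML(content)

# /text() selects only text nodes that are direct children of each span.
direct = html.xpath('//div[@class="rank"]//span[@class="span"]/text()')

# string(.) evaluated per element concatenates all descendant text.
spans = html.xpath('//div[@class="rank"]//span[@class="span"]')
full = [str(el.xpath('string(.)')) for el in spans]

print(direct)
print(full)
```

If the target spans contain only plain text, as they appear to on the page discussed here, the two approaches give the same result.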