Question: (Python) How do I extract an HTML tag (the tag together with its content) using XPath?
Description:
<div>
<table>
<tr>
<td>Row value 1</td>
<td>Row value 2</td>
</tr>
<tr>
<td>Row value 3</td>
<td>Row value 4</td>
</tr>
<tr>
<td>Row value 1</td>
<td>Row value 1</td>
</tr>
</table>
</div>
How can I extract the table tag so that the result looks like this:
<table>
<tr>
<td>Row value 1</td>
<td>Row value 2</td>
</tr>
<tr>
<td>Row value 3</td>
<td>Row value 4</td>
</tr>
<tr>
<td>Row value 1</td>
<td>Row value 1</td>
</tr>
</table>
The code is as follows:
from lxml import etree
selector = etree.HTML(html)
content = selector.xpath('//div/table')[0]
print(content)
# <Element table at 0x1bce7463548>
# In other words: how do I convert an Element object to a str?
Solution 1:
Use the find method of BeautifulSoup.
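A minimal sketch of the BeautifulSoup approach, assuming the bs4 package is installed and using the sample markup from the question. The variable names (soup, table, table_html) are illustrative and not from the original answer. Calling str() on the tag returned by find gives the tag together with its content.

```python
from bs4 import BeautifulSoup

# Sample markup from the question
html = """<div>
<table>
<tr>
<td>Row value 1</td>
<td>Row value 2</td>
</tr>
<tr>
<td>Row value 3</td>
<td>Row value 4</td>
</tr>
<tr>
<td>Row value 1</td>
<td>Row value 1</td>
</tr>
</table>
</div>"""

# "html.parser" is Python's built-in parser, so no extra parser package is needed
soup = BeautifulSoup(html, "html.parser")

# find returns the first matching tag, or None if nothing matches
table = soup.find("table")

# str() serializes the tag together with its children
table_html = str(table)
print(table_html)
```

If the page may not contain a table, it is worth checking that table is not None before serializing it.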
Solution 2:
from lxml import etree
from lxml.html import fromstring, tostring
# fromstring returns an HtmlElement object
# selector = fromstring(html)
selector = etree.HTML(html)
content = selector.xpath('//div/table')[0]
print(content)
# tostring returns the element's original HTML markup (as bytes by default)
original_html = tostring(content)
Solution 3:
It seems [div/table] should be enough.
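Solution 4:
from lxml import etree
div = etree.HTML(html)
table = div.xpath('//div/table')[0]
# encoding='unicode' makes tostring return a str rather than bytes
content = etree.tostring(table, pretty_print=True, method='html', encoding='unicode')  # convert to a string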
Article URL: http://www.codes51.com/itwd/4510100.html