python - Web Scraping Basics - lxml.etree (1)

Original

Aldeo

2020-06-22 15:33

This is a tutorial on XML processing with lxml.etree. It briefly introduces the main concepts of the ElementTree API, along with some simple enhancements.

The usual way to import lxml.etree is:

from lxml import etree

The Element class

(1) Creating the root node

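An Element is the main container object of the ElementTree API. Most of the XML tree functionality is accessed through this class. Elements are easily created with the Element factory: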

root = etree.Element("root")
# The XML tag name of an element is available through its tag property:
print(root.tag)

Output

root
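A minimal sketch extending the step above: a fresh element has no children, text, or attributes, its tag can be reassigned, and it can be serialized right away with etree.tostring(). The new tag name "catalog" is only an illustrative value.

```python
from lxml import etree

# Create a root node with the Element factory
root = etree.Element("root")
print(root.tag)  # root

# tag is a writable property, so the element can be renamed
root.tag = "catalog"

# A freshly created element has no children, no text and no attributes
print(len(root), root.text, dict(root.attrib))  # 0 None {}

# tostring() returns bytes by default, or str with encoding="unicode"
print(etree.tostring(root))                      # b'<catalog/>'
print(etree.tostring(root, encoding="unicode"))  # <catalog/>
```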

(2) Creating child elements

Create child elements and add them to a parent element:

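# Method 1: create an element, then append it to the parent
root.append(etree.Element("child1"))
# Method 2: SubElement creates the child and appends it in one step
child2 = etree.SubElement(root, "child2")
child3 = etree.SubElement(root, "child3")
# Serialize the tree to inspect the result
print(etree.tostring(root, pretty_print=True))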

Output

b'<root>\n  <child1/>\n  <child2/>\n  <child3/>\n</root>\n'
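A sketch, assuming you also want attributes and text on the children: SubElement accepts attributes as keyword arguments, and .text sets an element's text content. The tag names and the name="b" attribute are illustrative.

```python
from lxml import etree

root = etree.Element("root")

# SubElement(parent, tag, ...) creates the child and appends it to parent;
# extra keyword arguments become attributes of the new child
child1 = etree.SubElement(root, "child1")
child2 = etree.SubElement(root, "child2", name="b")

# .text sets the text content inside the element
child2.text = "hello"

# encoding="unicode" returns a str instead of bytes
print(etree.tostring(root, encoding="unicode"))
# <root><child1/><child2 name="b">hello</child2></root>

# pretty_print=True adds newlines and two-space indentation
print(etree.tostring(root, encoding="unicode", pretty_print=True))
```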

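(3) Accessing elements like a list

The code below shows how to access child elements, count them, check whether an element has children, and insert a child at a given position.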

# Access a child element by index
child = root[0]
print(child.tag)
# Number of children of the element
print(len(root))
# Index of a given child
print(root.index(root[1]))
# Copy root's children into a regular list
children = list(root)
for child in children:
    print(child.tag)
# Insert a new element at position 0, before the current first child
root.insert(0, etree.Element("child0"))
# Slicing returns lists of elements
start = root[:1]
end   = root[-1:]
print(start[0].tag)
print(end[0].tag)

# Check that root is an Element
print(etree.iselement(root))  # test if it's some kind of Element
# Check whether it has children
if len(root):                 # test if it has children
    print("The root element has children")
# Iterate over the child nodes
for child in root:
    print(child.tag)

Output

child1
3
1
child1
child2
child3
child0
child3
True
The root element has children
child0
child1
child2
child3
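lxml extends this list-like interface with parent and sibling navigation. The sketch below (tag names are illustrative) also shows one behavior that differs from the standard library's xml.etree.ElementTree: assigning an element that is already in the tree moves it instead of copying it.

```python
from lxml import etree

root = etree.Element("root")
for name in ("child0", "child1", "child2", "child3"):
    etree.SubElement(root, name)

# lxml-specific navigation: each element knows its parent and siblings
print(root[1].getprevious().tag)  # child0
print(root[1].getnext().tag)      # child2
print(root[1].getparent().tag)    # root

# remove() takes the element object itself, not an index
root.remove(root[0])
print([c.tag for c in root])      # ['child1', 'child2', 'child3']

# Assigning an existing element moves it (lxml does not copy it),
# so child3 leaves the end of the list and replaces child1
root[0] = root[-1]
print([c.tag for c in root])      # ['child3', 'child2']
```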

(4) Element attributes as a dictionary

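# Attributes behave like a dictionary
# Create attributes directly in the Element factory
root = etree.Element("root", interesting="totally")
print(etree.tostring(root))
# Print attribute values; get() returns None for a missing attribute
print(root.get("interesting"))
print(root.get("hello"))
# Set an attribute
root.set("hello", "Huhu")
print(root.get("hello"))
print(etree.tostring(root))
# Print the attribute names (keys)
print(sorted(root.keys()))
# Iterate over all attributes
for name, value in sorted(root.items()):
    print('%s = %r' % (name, value))
# When you need to look up items, or have some other reason to get a
# "real" dictionary-like object (for example, to pass it around),
# use the attrib property:
attributes = root.attrib
print(attributes["interesting"])
attributes["hello"] = "Aldeo Zhang"
print(attributes["hello"])
print(root.get("hello"))
# Note that attrib is a dict-like object backed by the Element itself.
# Any change to the Element is reflected in attrib and vice versa.
# It also means the XML tree stays alive in memory as long as the
# attrib of one of its Elements is in use. To get an independent
# snapshot of the attributes that does not depend on the XML tree,
# copy it into a dict:
d = dict(root.attrib)
print(sorted(d.items()))

Output

b'<root interesting="totally"/>'
totally
None
Huhu
b'<root interesting="totally" hello="Huhu"/>'
['hello', 'interesting']
hello = 'Huhu'
interesting = 'totally'
totally
Aldeo Zhang
Aldeo Zhang
[('hello', 'Aldeo Zhang'), ('interesting', 'totally')]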
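A short sketch of the attribute API beyond the calls above: passing a dict through attrib=, giving get() a default, deleting through attrib, and confirming that dict(root.attrib) really is a detached snapshot. The attribute names id, lang and class are illustrative only.

```python
from lxml import etree

# attrib= takes a dict; extra keyword arguments are added as attributes too
root = etree.Element("root", attrib={"id": "1"}, lang="en")
print(root.get("id"), root.get("lang"))   # 1 en

# get() accepts a default value for a missing attribute
print(root.get("missing", "n/a"))         # n/a

# attrib supports dict-style membership tests, assignment and deletion
print("lang" in root.attrib)              # True
del root.attrib["lang"]
root.attrib["class"] = "main"
print(etree.tostring(root, encoding="unicode"))  # <root id="1" class="main"/>

# dict(root.attrib) is an independent snapshot: later changes to the
# element do not show up in it
snapshot = dict(root.attrib)
root.set("id", "2")
print(snapshot["id"], root.get("id"))     # 1 2
```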
