Python web scraping basics: lxml.etree (7) - ElementPath

Original

Aldeo

2019-09-07 17:03

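The ElementTree library ships with a simple XPath-like path language called ElementPath. The main difference from XPath is that ElementPath expressions can use the {namespace}tag notation. However, advanced features such as value comparison and functions are not available.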
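As a minimal sketch of the {namespace}tag notation mentioned above: the namespace URI and the element names below are made up for illustration and do not come from the original article.

```python
from lxml import etree

# Hypothetical namespace URI, used only for this illustration.
NS = "http://example.com/ns"

root = etree.XML(
    '<root xmlns="http://example.com/ns">'
    "<item>first</item><item>second</item>"
    "</root>"
)

# The {namespace}tag notation names a namespaced element directly,
# so no prefix-to-URI mapping has to be passed along.
items = root.findall("{%s}item" % NS)
texts = [item.text for item in items]

# A plain tag name does not match elements that live in a namespace.
plain = root.findall("item")

print(texts)
print(plain)
```

Because the expression carries the full namespace URI, it can be stored or passed around without any extra context.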

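In addition to a full XPath implementation, lxml.etree supports the ElementPath language in the same way ElementTree does, even using (almost) the same implementation. The API provides four methods, available on both Elements and ElementTrees: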

iterfind() iterates over all Elements that match the path expression
findall() returns a list of matching Elements
find() efficiently returns only the first match
findtext() returns the .text content of the first match
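The four methods can be compared side by side in a short sketch. The document and variable names here are illustrative assumptions, not part of the original examples.

```python
from lxml import etree

# Small illustrative document (not from the original article).
doc = etree.XML("<doc><p>one</p><p>two</p><q/></doc>")

# iterfind() yields matches lazily, one at a time.
iter_tags = [el.tag for el in doc.iterfind("p")]

# findall() collects every match into a list.
all_p = doc.findall("p")

# find() returns the first match, or None if nothing matches.
first_p = doc.find("p")

# findtext() returns the .text of the first match.
first_text = doc.findtext("p")

# With no match, findtext() returns None unless a default is given.
missing_text = doc.findtext("x")
default_text = doc.findtext("x", default="n/a")

print(iter_tags, len(all_p), first_p.text, first_text, missing_text, default_text)
```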

Here are some examples:

>>> from lxml import etree
>>> root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")

Find a child of an Element:

>>> print(root.find("b"))
None
>>> print(root.find("a").tag)
a

Find an Element anywhere in the tree:

>>> print(root.find(".//b").tag)
b
>>> [ b.tag for b in root.iterfind(".//b") ]
['b', 'b']

Find Elements with a certain attribute:

>>> print(root.findall(".//a[@x]")[0].tag)
a
>>> print(root.findall(".//a[@y]"))
[]
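Once elements have been selected by an attribute predicate, the attribute values can be read with get(). The sketch below uses a slightly larger document than the one above; its contents are an assumption for illustration only.

```python
from lxml import etree

# Illustrative document: two <a> elements carry an x attribute, one does not.
root = etree.XML("<root><a x='1'/><a/><g><a x='2'/></g></root>")

# [@x] keeps only the <a> elements that have an x attribute,
# wherever they sit below root.
with_x = root.findall(".//a[@x]")
values = [el.get("x") for el in with_x]

# Without the predicate, every <a> below root is returned.
all_a = root.findall(".//a")

print(values)
print(len(all_a))
```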

lxml 3.4 added a helper that generates a structural ElementPath expression for an Element:

>>> tree = etree.ElementTree(root)
>>> a = root[0]
>>> print(tree.getelementpath(a[0]))
a/b[1]
>>> print(tree.getelementpath(a[1]))
a/c
>>> print(tree.getelementpath(a[2]))
a/b[2]
>>> tree.find(tree.getelementpath(a[2])) == a[2]
True

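As long as the tree is not modified, this path expression acts as an identifier for the given element and can be used later to find() it again in the same tree. Compared to XPath, ElementPath expressions have the advantage of being self-contained, even for documents that use namespaces.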
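The self-contained property can be illustrated with a namespaced document. This is a sketch only: the namespace URI and element names are hypothetical, and the exact path string depends on what getelementpath() produces for the tree.

```python
from lxml import etree

# Hypothetical namespace URI, used only for this illustration.
NS = "http://example.com/ns"

root = etree.XML('<r xmlns="http://example.com/ns"><a><b/><b/></a></r>')
tree = etree.ElementTree(root)

# Keep a reference to the second <b> so it can be compared later.
second_b = root[0][1]

# The generated path embeds the namespace URI, so no separate
# prefix mapping is needed to resolve it again.
path = tree.getelementpath(second_b)
found = tree.find(path)

print(path)
print(found is second_b)
```

The path is only meaningful while the tree keeps the same structure; inserting or removing siblings can make it point at a different element.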

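The .iter() method is a special case: it only finds specific tags in the tree by their name, not based on a path. That means the following commands are equivalent in the success case: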

>>> print(root.find(".//b").tag)
b
>>> print(next(root.iterfind(".//b")).tag)
b
>>> print(next(root.iter("b")).tag)
b

Note that the .find() method simply returns None if no match is found, whereas the other two examples would raise a StopIteration exception.
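One way to make all three forms behave alike when nothing matches is to give next() a default value. The following is a small sketch; the tag name "missing" is chosen only because it does not occur in the document.

```python
from lxml import etree

root = etree.XML("<root><a/></root>")

# find() already returns None when nothing matches.
via_find = root.find(".//missing")

# Passing a default to next() avoids StopIteration for the iterator forms.
via_iterfind = next(root.iterfind(".//missing"), None)
via_iter = next(root.iter("missing"), None)

# Without a default, next() on an exhausted iterator raises StopIteration.
try:
    next(root.iter("missing"))
    raised = False
except StopIteration:
    raised = True

print(via_find, via_iterfind, via_iter, raised)
```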
