Python: Web Scraping Basics

Original

2020-07-03 22:51

Parsing HTML and XML strings with BeautifulSoup

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup
import re

# String to parse
html_doc = """
<html>
<head>
    <title>The Dormouse's story</title>
</head>
<body>
<p class="title aq">
    <b>
        The Dormouse's story
    </b>
</p>

<p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> 
    and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.
</p>

<p class="story">...</p>
"""


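# Create a BeautifulSoup object from the HTML string.
# from_encoding only applies to byte input; with a str, bs4 warns and
# ignores it, so it is omitted here.
soup = BeautifulSoup(html_doc, 'html.parser')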

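# Print the first title tag
print(soup.title)

# Print the tag name of the first title tag
print(soup.title.name)

# Print the text inside the first title tag
print(soup.title.string)

# Print the tag name of the first title tag's parent
print(soup.title.parent.name)

# Print the first p tag
print(soup.p)

# Print the class attribute of the first p tag
# (class is multi-valued, so this is a list: ['title', 'aq'])
print(soup.p['class'])

# Print the href attribute of the first a tag
print(soup.a['href'])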
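# A tag's attributes can be added, deleted, or modified. Once again,
# attributes are handled exactly like a dictionary.

# Change the href attribute of the first a tag to http://www.baidu.com/
soup.a['href'] = 'http://www.baidu.com/'

# Add a name attribute ('百度', i.e. Baidu) to the first a tag
soup.a['name'] = '百度'

# Delete the class attribute of the first a tag
del soup.a['class']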

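# Print all child nodes of the first p tag
print(soup.p.contents)

# Print the first a tag
print(soup.a)

# Print all a tags as a list
print(soup.find_all('a'))

# Print the first tag whose id attribute equals link3 (here, an a tag)
print(soup.find(id="link3"))

# Get all the text in the document
print(soup.get_text())

# Print all attributes of the first a tag as a dict
print(soup.a.attrs)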


for link in soup.find_all('a'):
    # Get the link's href attribute
    print(link.get('href'))

# Iterate over the children of soup.p and print each one
for child in soup.p.children:
    print(child)

# Regex match: tags whose name contains "b" (here: body and b)
for tag in soup.find_all(re.compile("b")):
    print(tag.name)
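The script above filters only by tag name, by id, and by a regex on the tag name. find() and find_all() also accept class_, attrs, and limit, and select()/select_one() take CSS selectors. The sketch below shows these on a trimmed copy of the same markup, together with the difference between .string and get_text(). It assumes bs4 4.7 or later, where select() is backed by the soupsieve package that bs4 installs as a dependency.

```python
import re

from bs4 import BeautifulSoup

# A trimmed copy of the markup used in the script above.
html_doc = """
<p class="title aq"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# class_ has a trailing underscore because `class` is a Python keyword.
# A multi-valued class such as "title aq" matches if any one class matches.
title_p = soup.find("p", class_="aq")

# Keyword filters combine with the tag name; limit stops after N matches.
first_two = soup.find_all("a", class_="sister", limit=2)

# attrs= takes a dict; a compiled regex is matched against attribute values.
later_links = soup.find_all("a", attrs={"id": re.compile(r"^link[23]$")})

# CSS selectors: select() returns a list, select_one() the first match or None.
story_hrefs = [a["href"] for a in soup.select("p.story > a.sister")]
third = soup.select_one("#link3")

# .string descends through single-child tags but is None when a tag has
# several children; get_text() always joins all descendant text.
story = soup.find("p", class_="story")

print(title_p["class"])
print([a["id"] for a in first_two])
print([a["id"] for a in later_links])
print(story_hrefs)
print(third.get_text())
print(title_p.string)
print(story.string)
print(story.get_text(" ", strip=True))
```

As a rule of thumb, select() reads well when the query is naturally a CSS path, and find_all() with keyword filters is easier when the criteria are computed in Python (regexes, functions).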

Full documentation (Chinese translation):

https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/
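The title mentions XML, but the script only uses html.parser. Beautiful Soup parses real XML through lxml's XML parser, selected with the feature name "xml". The minimal sketch below assumes that setup and falls back to html.parser when lxml is not installed. The catalog document and the make_xml_soup helper are hypothetical, made up for illustration.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample document, not taken from the original post.
xml_doc = """
<catalog>
  <book id="bk101">
    <title>The Dormouse's story</title>
    <price>5.95</price>
  </book>
  <book id="bk102">
    <title>Alice's Adventures</title>
    <price>3.50</price>
  </book>
</catalog>
"""


def make_xml_soup(markup):
    """Hypothetical helper: parse as XML, falling back to html.parser."""
    try:
        # "xml" selects lxml's XML parser (requires `pip install lxml`).
        return BeautifulSoup(markup, "xml")
    except FeatureNotFound:
        # html.parser is always available, but it lowercases tag names and
        # does not understand XML namespaces, so it is only a fallback.
        return BeautifulSoup(markup, "html.parser")


soup = make_xml_soup(xml_doc)

# Map each book's id attribute to its title text.
titles = {book["id"]: book.find("title").get_text() for book in soup.find_all("book")}
# Prices stay strings; convert them explicitly if arithmetic is needed.
prices = [book.price.string for book in soup.find_all("book")]

print(titles)
print(prices)
```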
