Python -BS4詳細介紹

Python 在處理html方面有很多的優勢,一般情況下是要先學習正則表達式的.
在應用過程中有很多模塊是非常方便的,先嚐試使用BeautifulSoup和Urllib進行網頁的處理,僅供學習.
首先列舉所需要導入的模塊:
from bs4 import BeautifulSoup # 處理獲取的網頁信息
import bs4                                    # 用於判讀各類類型
import os                                      #系統模塊,詳細信息整理於下一章節
import re                                      # 正則表達式,其實用不到
import time                                  # 時間模塊,用於設置超時處理等
from urllib import request              # 用於獲取網頁信息
相關操作:
url = 'HTTP://XXXX' # 定義網頁地址
respons = request.urlopen(url,data=None,timeout=2) # 打開地址
data = respons.read().decode('utf-8') # 讀取網頁信息
soup = BeautifulSoup(data, "html5lib")                          # 用BeautifulSoup 解析
href = soup.find_all('a',target = "XXXX") # BS4最重要的函數,獲取相關節點兒,詳細信息自行學習
###
剩下的就自己處理就行了.

於2018-6-5 補充如下：

關於解析器引用官方文檔截圖：

對象：

1. tag

tag中最重要的屬性: name和attributes

tag.name 和tag["XXX"]

2. tag.string 和 tag.strings 獲取字符內容

3. find_all( name , attrs , recursive , text , **kwargs )

name：tag的name

attrs ：屬性

發表評論

所有評論

還沒有人評論，想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.

Python -BS4詳細介紹

webservice:"The test form is only available for requests from the local machine"

Python -BS4詳細介紹

PB12.6發佈webservice-檢測到有潛在危險的 Request.Form 值

ORACLE EXP命令相關介紹

IIS配置PB開發的web設置回收機制

Mac下配置sublime實現LaTeX

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結