Searching HTML Content in Python

MOOC course study notes
Course link: https://www.bilibili.com/video/BV1ME411E7jE?p=1

Tag structure of the target page

<html>

<head>
	<title>This is a python demo page</title>
</head>

<body>
	<p class="title"><b>The demo python introduces several python courses.</b></p>
	<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to
		professional by tracking the following courses:
		<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a
			href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body>

</html>
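
To make the nesting of the page above easier to see, here is a minimal sketch that parses a local copy of the markup and lists each tag with its depth. The names DEMO_HTML and outline are hypothetical helpers introduced only for this illustration, and the markup is pasted in as a string so the example runs without network access.

```python
from bs4 import BeautifulSoup

# Local copy of the demo page shown above (assumed to match the live page).
DEMO_HTML = """<html>
<head>
	<title>This is a python demo page</title>
</head>
<body>
	<p class="title"><b>The demo python introduces several python courses.</b></p>
	<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to
		professional by tracking the following courses:
		<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a
			href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body>
</html>"""

soup = BeautifulSoup(DEMO_HTML, "html.parser")


def outline(parsed):
    """Return (depth, tag name, attributes) for every tag, in document order."""
    rows = []
    for tag in parsed.find_all(True):
        # tag.parents ends with the BeautifulSoup root object, so subtract 1
        # to give the <html> tag depth 0.
        depth = len(list(tag.parents)) - 1
        rows.append((depth, tag.name, dict(tag.attrs)))
    return rows


for depth, name, attrs in outline(soup):
    print("  " * depth + name, attrs or "")
```

The indented listing shows that the two a tags sit inside the second p tag, which is why searches by class on p and by id on a both land in the same paragraph.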

Content search methods

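import re

import requests
from bs4 import BeautifulSoup

# Download the demo page and parse it with the built-in HTML parser
r = requests.get("https://python123.io/ws/demo.html", timeout=10)
r.raise_for_status()
soup = BeautifulSoup(r.text, 'html.parser')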
# Find every <a> tag in the document
print(soup.find_all('a'))
# Find every <a> and <b> tag by passing a list of names
print(soup.find_all(['a', 'b']))
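# Passing True to find_all returns every tag
for tag in soup.find_all(True):
    print(tag.name)
# Use a regular expression to find tags whose name starts with "b";
# the ^ anchor is needed because the pattern is applied with search()
for tag in soup.find_all(re.compile('^b')):
    print(tag.name)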
# Find <p> tags whose class attribute is "course"
for tag in soup.find_all('p', class_='course'):
    print(tag)
# Find tags whose id attribute is exactly "link1"
for tag in soup.find_all(id='link1'):
    print(tag)
# Use a regular expression to find every tag whose id contains "link"
for tag in soup.find_all(id=re.compile('link')):
    print(tag)
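# Search the string (text) nodes for an exact string
print(soup.find_all(string='Basic Python'))
# text= is the older name for string= and gives the same result; prefer string=
print(soup.find_all(text='Basic Python'))
# Search the string nodes with a regular expression (case-sensitive)
print(soup.find_all(string=re.compile('python')))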
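
The lookups above can also be checked offline. The following is a minimal sketch, assuming a slightly condensed local copy of the demo page (the variable html and the helper names are introduced here for illustration only); it collects the result of each kind of filter so the matches are easy to compare.

```python
import re

from bs4 import BeautifulSoup

# Condensed local copy of the demo page (an assumption; the live page may differ).
html = """<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language.
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and
<a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")


def names(tags):
    """Hypothetical helper: reduce a result list to its tag names."""
    return [t.name for t in tags]


# A list of names matches any of them; results come back in document order.
by_name = names(soup.find_all(["a", "b"]))              # ['b', 'a', 'a']

# A compiled pattern is applied to each tag name; ^ restricts it to a prefix.
starts_with_b = names(soup.find_all(re.compile("^b")))  # ['body', 'b']

# class_ filters on the class attribute (class is a reserved word in Python).
course = soup.find_all("p", class_="course")

# Keyword arguments filter on attributes; a pattern matches part of the value.
link_ids = [t["id"] for t in soup.find_all(id=re.compile("link"))]  # ['link1', 'link2']

# string= searches text nodes: a plain str must match the whole node,
# while a pattern may match anywhere inside it.
exact = soup.find_all(string="Basic Python")
lower = soup.find_all(string=re.compile("python"))

print(by_name, starts_with_b, link_ids, exact, len(course), len(lower))
```

Because the pattern 'python' is case-sensitive, it only matches the title text and the bold sentence, not the strings that contain "Python" with a capital P.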