MOOC course study notes
Course link: https://www.bilibili.com/video/BV1ME411E7jE?p=1
Tag structure of the target page (https://python123.io/ws/demo.html)
<html>
<head>
<title>This is a python demo page</title>
</head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to
professional by tracking the following courses:
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a
href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body>
</html>
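The same page can also be parsed from a string, which is handy for experimenting offline. The sketch below assumes the markup above is pasted into a literal named `DEMO_HTML` (a hypothetical name, not part of the course). It walks the parse tree with bs4's `find_all(True)` and `.parents` to print each tag indented by its nesting depth.

```python
from bs4 import BeautifulSoup

# Hypothetical offline copy of the demo page shown above
DEMO_HTML = """<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body></html>"""


def tag_outline(html):
    """Return (depth, tag name) pairs in document order."""
    soup = BeautifulSoup(html, "html.parser")
    outline = []
    for tag in soup.find_all(True):
        # .parents ends with the BeautifulSoup object itself, so subtract 1
        depth = len(list(tag.parents)) - 1
        outline.append((depth, tag.name))
    return outline


for depth, name in tag_outline(DEMO_HTML):
    print("  " * depth + name)
```

The indentation of the printed outline mirrors the markup: both `<p>` tags sit directly under `<body>`, `<b>` is inside the first `<p>`, and the two `<a>` links are children of the second `<p>`.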
Content search methods
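import re

import requests
from bs4 import BeautifulSoup

# Download the demo page and parse it with Python's built-in HTML parser
r = requests.get("https://python123.io/ws/demo.html")
r.raise_for_status()  # stop early if the request failed
soup = BeautifulSoup(r.text, 'html.parser')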
# Find all <a> tags (find_all is the bs4 name; findAll is a legacy alias)
print(soup.find_all("a"))
# Find all <a> and <b> tags
print(soup.find_all(['a', 'b']))
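# Passing True matches every tag in the document
for tag in soup.find_all(True):
    print(tag.name)
# Use a regular expression to find tags whose names start with "b"
# (bs4 applies the pattern with search(), so without ^ any name containing "b" would match)
for tag in soup.find_all(re.compile('^b')):
    print(tag.name)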
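# Find <p> tags whose class is "course"
# (passing a plain string as attrs, e.g. find_all('p', 'course'), is shorthand for this class filter)
for tag in soup.find_all('p', class_='course'):
    print(tag)
# Find tags whose id attribute is exactly "link1"
for tag in soup.find_all(id='link1'):
    print(tag)
# Use a regular expression to find all tags whose id contains "link"
for tag in soup.find_all(id=re.compile('link')):
    print(tag)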
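# Search the text (string) parts of the document instead of the tags
print(soup.find_all(string='Basic Python'))  # exact match on a whole string
# text= is the older name for string= (the argument was renamed in bs4 4.4.0); both take the same values
print(soup.find_all(string=re.compile('python')))  # case-sensitive, so strings with only "Python" are skipped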
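Because the live page may change or be unreachable, the queries above can also be checked against a fixed copy. This sketch assumes a hypothetical offline copy of the demo markup in `DEMO_HTML` and collects the results of several of the calls above. It also shows two parts of the same `find_all` API: calling a tag, as in `soup("a")`, is shorthand for `soup.find_all("a")`, and `recursive=False` restricts the search to direct children.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical offline copy of the demo page (same tags and attributes as above)
DEMO_HTML = """<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body></html>"""

soup = BeautifulSoup(DEMO_HTML, "html.parser")

# Tag names starting with "b" (the regex is applied with search(), so anchor it with ^)
b_names = [tag.name for tag in soup.find_all(re.compile("^b"))]

# id values containing "link"
link_ids = [tag["id"] for tag in soup.find_all(id=re.compile("link"))]

# Exact string match vs. regex match on text nodes (the regex is case-sensitive)
exact = soup.find_all(string="Basic Python")
lowercase_python = soup.find_all(string=re.compile("python"))

# Calling a tag is shorthand for find_all; recursive=False only checks direct children
shorthand_links = soup("a")
body_children = [tag.name for tag in soup.body.find_all(True, recursive=False)]

print(b_names)                # ['body', 'b']
print(link_ids)               # ['link1', 'link2']
print(exact)                  # ['Basic Python']
print(len(lowercase_python))  # 2 (the <title> text and the <b> text)
print(len(shorthand_links))   # 2
print(body_children)          # ['p', 'p']
```

On this small page the unanchored pattern `'b'` happens to return the same two tags as `'^b'`, because no other tag name contains the letter b. The anchor only makes a difference on pages with tags such as `<table>` or `<tbody>`.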