Matplotlib Python plotting tutorial (莫煩Python)
《零基礎入門學習Python》(小甲魚) P54-64
HTML
from urllib.request import urlopen
html = urlopen(URL).read().decode('utf-8') # URL is the page address; decode() is needed for Chinese text
print(html)
Read the web page, then use regular expressions to select content.
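A minimal sketch of the regular-expression step, assuming the page has already been read and decoded into a string. The `sample_html` content is made up for illustration and stands in for the `html` returned by `urlopen`.

```python
import re

# Hypothetical page content, standing in for the decoded html string.
sample_html = """
<html><head><title>Scraping notes</title></head>
<body>
<p>First paragraph.</p>
<a href="https://example.com/a">A</a>
<a href="https://example.com/b">B</a>
</body></html>
"""

# Non-greedy groups capture only the text between the tags or quotes.
title = re.findall(r"<title>(.+?)</title>", sample_html)
paragraphs = re.findall(r"<p>(.*?)</p>", sample_html, flags=re.DOTALL)
links = re.findall(r'href="(.*?)"', sample_html)

print(title)       # ['Scraping notes']
print(paragraphs)  # ['First paragraph.']
print(links)       # ['https://example.com/a', 'https://example.com/b']
```

Regular expressions work for simple, predictable pages like this one, but they become fragile on nested or irregular HTML, which is where a parser such as BeautifulSoup helps.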
BeautifulSoup
sudo pip3 install beautifulsoup4
sudo pip3 install lxml
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, features='lxml')
print(soup.h1)
all_href = soup.find_all('a', href=True) # only <a> tags that have an href, avoids KeyError
all_href = [l['href'] for l in all_href]
print('\n', all_href)
BeautifulSoup CSS
month = soup.find_all('li', {"class": "month"})
for m in month:
    print(m.get_text())
BeautifulSoup regular expressions
import re
img_links = soup.find_all("img", {"src": re.compile(r'.*?\.jpg')})
Requests
sudo pip3 install requests
- get
- post
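A rough sketch of the two methods with `requests`. The `example.com` URLs, the field names, and the helper names `get_page` and `post_form` are hypothetical. The prepared requests at the end only show what would be sent and do not touch the network.

```python
import requests

def get_page(url, params=None):
    """GET: parameters travel in the URL query string."""
    r = requests.get(url, params=params, timeout=10)
    r.raise_for_status()
    return r.text

def post_form(url, data):
    """POST: form fields travel in the request body."""
    r = requests.post(url, data=data, timeout=10)
    r.raise_for_status()
    return r.text

# Inspect what each method would send, without making a network call.
get_req = requests.Request(
    "GET", "https://example.com/search", params={"q": "python"}
).prepare()
post_req = requests.Request(
    "POST", "https://example.com/login", data={"user": "alice"}
).prepare()

print(get_req.url)    # https://example.com/search?q=python
print(post_req.body)  # user=alice
```

GET suits fetching pages where the query is part of the address. POST suits submitting forms such as logins, where the data should not appear in the URL.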
Downloading
from urllib.request import urlretrieve
import requests
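The two imports above suggest two ways to save a file. A minimal sketch of both follows. The function names are hypothetical, and each function creates the target folder if it is missing.

```python
import os
from urllib.request import urlretrieve

import requests

def download_with_urlretrieve(url, path):
    """Save url to path using only the standard library."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    urlretrieve(url, path)
    return path

def download_with_requests(url, path):
    """Save url to path with requests; the whole body is held in memory."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    r = requests.get(url, timeout=30)
    r.raise_for_status()
    with open(path, "wb") as f:
        f.write(r.content)
    return path
```

Usage would look like `download_with_urlretrieve(IMAGE_URL, "./img/image1.png")`, where `IMAGE_URL` is a placeholder for the file's address.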
Downloading large files
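For large files, one common approach is to stream the response and write it in chunks so the whole file never sits in memory. A sketch assuming `requests` is used; the function name and the default chunk size are arbitrary choices.

```python
import requests

def download_large_file(url, path, chunk_size=8192):
    """Stream url to path chunk by chunk; returns the number of bytes written."""
    written = 0
    # stream=True defers the body download until iter_content is consumed.
    with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()
        with open(path, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                written += len(chunk)
    return written
```

A larger `chunk_size` means fewer writes, while a smaller one keeps memory use lower.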
Speeding up the crawler
- Multi-process distributed crawler
- Asynchronous loading with asyncio
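A minimal sketch of the asyncio idea, using only the standard library. It runs a blocking `urlopen` fetch in worker threads through `asyncio.to_thread` (Python 3.9+), which is a substitute for an async HTTP client such as aiohttp. The names `fetch_html` and `crawl` and the concurrency limit are assumptions made for illustration.

```python
import asyncio
from urllib.request import urlopen

def fetch_html(url):
    """Blocking download of one page, decoded as UTF-8."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")

async def crawl(urls, fetch=fetch_html, max_concurrency=5):
    """Fetch all urls concurrently; returns a dict mapping url to html."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url):
        async with semaphore:
            # Run the blocking fetch in a thread so the event loop stays free.
            return url, await asyncio.to_thread(fetch, url)

    results = await asyncio.gather(*(fetch_one(u) for u in urls))
    return dict(results)
```

Usage would be along the lines of `pages = asyncio.run(crawl(list_of_urls))`. The semaphore caps how many requests are in flight at once, which keeps the load on the target site modest.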
Advanced crawling
- Selenium
- Scrapy