# 爬取一本三國演義 — scrape the full novel "Romance of the Three Kingdoms"
import urllib.request
from bs4 import BeautifulSoup
import time

# Site root and table-of-contents page for the novel.
BASE_URL = 'http://www.shicimingju.com'
TOC_URL = 'http://www.shicimingju.com/book/sanguoyanyi.html'
# Browser-like User-Agent so the site serves the normal page.
HEADERS = {
    'User-Agent': ' Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}


def _fetch_soup(url):
    """Download *url* with the shared headers and return a parsed BeautifulSoup.

    The response is closed via the context manager (the original leaked it).
    """
    request = urllib.request.Request(url=url, headers=HEADERS)
    with urllib.request.urlopen(request) as response:
        return BeautifulSoup(response.read(), 'html.parser')


def main():
    """Scrape every chapter listed in the TOC and append it to 三國演義.txt."""
    # Parse the table of contents: one <a> per chapter with title + relative link.
    toc_soup = _fetch_soup(TOC_URL)
    chapter_links = toc_soup.select('.book-mulu > ul > li > a')

    # `with` guarantees the file is closed even if a request fails mid-run
    # (the original `open`/`close` pair leaked the handle on any exception).
    with open('三國演義.txt', 'w', encoding='utf8') as fp:
        for anchor in chapter_links:
            title = anchor.text
            print('正在爬取--%s--....' % title)
            # Chapter links are site-relative; join with the site root.
            chapter_url = BASE_URL + anchor['href']
            chapter_soup = _fetch_soup(chapter_url)
            # select_one + None check instead of select(...)[0], so one
            # malformed chapter page skips instead of crashing the whole run.
            content_node = chapter_soup.select_one('.chapter_content')
            if content_node is None:
                continue
            fp.write(title + content_node.text)
            print('結束爬取--%s--' % title)
            # Be polite to the server between chapter requests.
            time.sleep(2)


if __name__ == '__main__':
    main()