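A Beginner's Web Scraper in 20 Lines of Python Using String Operations, with a Worked Example (Scraping One Chapter of The Three-Body Problem)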

Original

盖世英雄Zz

2020-06-20 20:36

The Three-Body Problem is an outstanding science fiction novel.

The simple methods we need

1. Import the required package

import urllib.request

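2. try ... except

try:
    ...  # statement 1: the code that might fail
except Exception as e:
    ...  # statement 2: runs only if statement 1 raised an exception
Python tries to run statement 1; if it raises an exception, statement 2 runs instead.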
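As a concrete illustration of this pattern, here is a minimal sketch. The helper name to_int and its default parameter are made up for this example and are not part of the tutorial.

```python
def to_int(text, default=0):
    """Convert text to an int, falling back to default on failure."""
    try:
        # Statement 1: may raise ValueError for input such as 'abc'
        value = int(text)
    except ValueError as e:
        # Statement 2: runs only when statement 1 raised
        print('Conversion failed:', e)
        value = default
    return value


good = to_int('42')
bad = to_int('abc')
```

Catching a specific exception such as ValueError, rather than the broad Exception, keeps unrelated bugs from being silently swallowed.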

3. Fetch content with urlopen

response = urllib.request.urlopen(webList)
# Request the page at the URL webList; response is a file-like object for the reply

4. Read with read()

response.read()
# Read the whole response body; the result is bytes, not str
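To try urlopen and read() without touching the network, one option is a data: URL, which carries its content inline. This is only a sketch for experimenting; the URL below is an assumption of this example, whereas the tutorial itself fetches an http:// page.

```python
import urllib.request

# A data: URL embeds the payload after the comma, so nothing is downloaded.
# (Hypothetical stand-in for the real page address.)
url = 'data:text/plain;charset=utf-8,hello'

# The response object can be used as a context manager and is closed on exit
with urllib.request.urlopen(url) as response:
    raw = response.read()

print(type(raw), raw)
```

The value returned by read() is a bytes object, which is why a decoding step is needed before any string operations.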

5. Decode with decode

response.read().decode('UTF-8')
# Decode the bytes as UTF-8 to get a str
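The round trip between str and bytes can be sketched as follows. The byte strings here are stand-ins for what response.read() would return; whether a given site really serves UTF-8 is something to check per page.

```python
raw = '三體'.encode('utf-8')   # str -> bytes, standing in for response.read()
text = raw.decode('utf-8')     # bytes -> str

# Decoding with the wrong codec raises UnicodeDecodeError
try:
    raw.decode('ascii')
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

# errors='replace' substitutes U+FFFD for undecodable bytes instead of raising
lenient = b'abc\xff'.decode('utf-8', errors='replace')
```

If a page turns out to use another encoding such as GBK, passing that codec name to decode() is the usual fix.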

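6. Replacement methods

html = html.expandtabs()
# Replace every tab character in html with spaces (tab stops every 8 columns by default)

html = html.replace(' ', '')
# Remove all spaces, including the ones expandtabs just produced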
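A small sketch of what these two calls do to a string, using a made-up sample line:

```python
line = 'a\tb c'

# The tab is expanded to spaces up to the next multiple of 8 columns
expanded = line.expandtabs()
# Every space is then removed, including the ones the tab turned into
cleaned = expanded.replace(' ', '')

print(repr(expanded))
print(repr(cleaned))
```

Note that replace(' ', '') also removes spaces inside ordinary text. That is harmless for Chinese prose, which does not separate words with spaces, but it would glue English words together.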

7. Get the length

lenth = len(html)
# Number of characters in the document

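8. Search with find()

index = html.find('neirong">', 0, lenth)
# Index of the first occurrence of the substring between positions 0 and lenth, or -1 if it is not found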
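A minimal sketch of str.find() on a made-up HTML fragment. The marker 'neirong">' is the one the worked example in this post searches for; the surrounding sample markup is an assumption of this sketch.

```python
html = '<div class="neirong"><p>Hello</p></div>'
marker = 'neirong">'

start = html.find(marker)                # index where the marker begins
missing = html.find('no-such-marker')    # find() reports "not found" as -1

# Guard against -1 before using the index, otherwise the slice starts
# from the wrong place without any error being raised
if start != -1:
    body = html[start + len(marker):]
```

Using len(marker) instead of a hard-coded number keeps the offset correct if the marker string is later changed.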

9. Slice the string

html[0:index2]
# Take the characters from index 0 up to, but not including, index2
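Searching and slicing are often combined to pull out the text between two markers. The helper below is a hypothetical illustration (the name between is not from the tutorial) of that idea:

```python
def between(text, start_marker, end_marker):
    """Return the text between two markers, or '' if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return ''
    start += len(start_marker)            # skip past the opening marker
    end = text.find(end_marker, start)    # search only after the opening marker
    if end == -1:
        return ''
    return text[start:end]                # end index is exclusive


inner = between('<p>Hello</p>', '<p>', '</p>')
```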

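10. Write to a file with open ... write

writeFile = open('三體.txt', 'w', encoding='utf-8')
writeFile.write(htm)
writeFile.close()
# Write htm to the file as UTF-8, then close it so the data is flushed to disk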
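An alternative way to write the file is a with block, which closes the file automatically even if an error occurs. This sketch writes into a temporary directory and uses a short stand-in string for htm, both of which are assumptions of the example.

```python
import os
import tempfile

htm = '    第一段。\n'   # stand-in for the extracted chapter text

# Write into a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, '三體.txt')
    # An explicit encoding avoids depending on the platform default
    with open(path, 'w', encoding='utf-8') as writeFile:
        writeFile.write(htm)
    # Read the file back to confirm what was written
    with open(path, encoding='utf-8') as readFile:
        saved = readFile.read()
```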

Worked example: scraping one chapter of The Three-Body Problem

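# Import the required package
import urllib.request
# The page to fetch
webList = 'http://www.51shucheng.net/kehuan/santi/santi1/174.html'
# Try to connect
try:
    response = urllib.request.urlopen(webList)
    # On success, response is the page we requested
except Exception as e:
    print('Fetch failed:', e)
    # Without a response there is nothing to parse, so stop here
    raise SystemExit(1)
html = response.read().decode('UTF-8')
# Read the content and decode it as UTF-8
html = html.expandtabs()
# Replace every tab with spaces
html = html.replace(' ', '')
# Remove all spaces
print(html)
# Print the page for a preview, which makes it easier to pick marker strings
lenth = len(html)
# Length of the document
html = html[html.find('neirong">', 0, lenth) + 9:]
# Drop everything up to and including the 9-character marker 'neirong">'
index = html.find('跟鞋。</p>', 0) + 3
index2 = html.find('眷戀着天空。</p>')
index3 = html.find('<p>“紅色聯合”的戰士們歡呼起來')
# Locate a few key positions to use as slice boundaries
htm = html[0:index2] + html[index3:index]
# Cut the chapter text out of the full string
htm = htm.replace('<p>', '    ')
htm = htm.replace('</p>', '\n')
# Turn each <p> into an indent and each </p> into a line break
writeFile = open('三體.txt', 'w', encoding='utf-8')
writeFile.write(htm)
writeFile.close()
# Write the text to the file as UTF-8 and close it
print('Done writing')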
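The same find, slice, and replace steps can be wrapped in a function and tried on a small string before pointing them at a live page. This is a sketch under stated assumptions: the function name extract_chapter, the closing '</div>' end marker, and the sample markup are all hypothetical, and the real page may be structured differently.

```python
def extract_chapter(html, start_marker='neirong">', end_marker='</div>'):
    """Return the text between the markers with <p> tags converted to plain text."""
    start = html.find(start_marker)
    if start == -1:
        raise ValueError('start marker not found')
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        raise ValueError('end marker not found')
    body = html[start:end]
    # Strip tabs and spaces first, so the indent added below survives
    body = body.expandtabs().replace(' ', '')
    # Each <p> becomes a four-space indent, each </p> a line break
    return body.replace('<p>', '    ').replace('</p>', '\n')


# Made-up sample page with two short paragraphs
sample = '<div class="neirong">\t<p>第一段。</p> <p>第二段。</p></div>'
chapter = extract_chapter(sample)
```

Raising ValueError when a marker is missing makes a changed page layout show up as a clear error rather than as a silently wrong slice.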
