A Veteran Walks You Through Scraping Qiushibaike Adult Edition in 30 Lines of Code

Original

2018-10-13 00:29

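I have been learning web scraping for about a month. Watching other people scrape this and that, I could not hold back any longer and finally wrote a crawler of my own, and it comes with a little treat.
Here we mainly use the requests library. I recommend running the code on Python 3.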

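import os
import re

import requests
from requests.exceptions import RequestException

# Directory where the images are saved; change it to suit your machine
SAVE_DIR = 'f:/qiushibaike/'
# Regex that captures the src attribute of every <img alt=... /> tag
IMG_PATTERN = re.compile('<img alt=.*? src="(.*?)".*? />', re.S)


def get_page_index(url):
    """Fetch a page and return its source decoded as GBK, or None on failure."""
    try:
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            # Skip stray bytes that are not valid GBK instead of crashing
            return response.content.decode('gbk', errors='ignore')
        return None
    except RequestException:
        print('Request failed:', url)
        return None


def download_img(html, page):
    """Download every image on one page, naming files by page and index."""
    for x, item in enumerate(IMG_PATTERN.findall(html)):
        print('Downloading....')
        try:
            img = requests.get(item, timeout=10)
        except RequestException:
            print('Request failed:', item)
            continue
        # Include the page number so later pages do not overwrite earlier files
        with open(SAVE_DIR + str(page) + '_' + str(x) + '.jpg', 'wb') as f:
            f.write(img.content)


def main():
    os.makedirs(SAVE_DIR, exist_ok=True)
    # Pages to crawl; only the first 20 for this example
    for j in range(1, 21):
        url = 'http://www.qiubaichengren.com/' + str(j) + '.html'
        html = get_page_index(url)
        if html is not None:
            download_img(html, j)


if __name__ == '__main__':
    main()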
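
To see what the regular expression in download_img actually captures, here is a small sketch that runs it on a made-up HTML snippet instead of the live site. The sample markup, the example.com addresses and the helper name extract_image_urls are all assumptions for illustration, not part of the original script. It also passes each match through urljoin, which is one possible way to cope with relative src values; whether the real pages use relative paths is not something the original code tells us.

```python
import re
from urllib.parse import urljoin

# Same pattern as in the crawler: capture the src of each <img alt=... /> tag
IMG_PATTERN = re.compile(r'<img alt=.*? src="(.*?)".*? />', re.S)


def extract_image_urls(html, base_url):
    """Return absolute image URLs found in html, in page order (hypothetical helper)."""
    return [urljoin(base_url, src) for src in IMG_PATTERN.findall(html)]


# Made-up markup standing in for a real page
sample_html = (
    '<div><img alt="first" src="http://img.example.com/a.jpg" width="100" />'
    '<img alt="second" src="/uploads/b.jpg" /></div>'
)

urls = extract_image_urls(sample_html, 'http://www.example.com/1.html')
for u in urls:
    print(u)
```

Trying the pattern on a fixed string like this makes it easier to adjust the regex when the page layout changes, without hitting the server on every attempt.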
