I've read a lot of blog posts by people who really know this area, and I realize how little I understand myself; I'm not yet fluent with the aiohttp library. After practicing by following other people's code, I still need to go through the official documentation to fill in the gaps.
Chinese documentation:
https://segmentfault.com/p/1210000013564725
Following someone else's code, I wrote my own aiohttp-based crawler.
Target site:
http://www.ivsky.com/tupian/ziranfengguang/
It simply scrapes photos from ivsky.com (天堂圖片網).
I won't walk through the logic; here is the code.
import os
import time

import aiohttp
import asyncio
from scrapy import Selector

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36'
}


# Fetch a page and return its text
async def fetch(session, url):
    async with session.get(url, headers=headers) as response:
        return await response.text(encoding='utf-8')


# Parse out every image URL on a listing page
async def url_parse(html):
    selector = Selector(text=html)
    url_list = selector.xpath('//ul[@class="ali"]//li//img/@src').extract()
    return url_list


# Download the images one by one
async def down_img(session, url_list):
    for each_url in url_list:  # fixed: the original iterated over an undefined name `img_list`
        print('Now collecting %s' % each_url)
        async with session.get(each_url, headers=headers) as response:
            img_response = await response.read()
            with open('./image/%s.jpg' % time.time(), 'wb') as file:
                file.write(img_response)


# Crawl one listing page from start to finish
async def start(url):
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, url)      # get the page's html
        url_list = await url_parse(html)      # parse the image urls out of it
        await down_img(session, url_list)     # download the images


if __name__ == '__main__':
    os.makedirs('./image', exist_ok=True)  # the output directory must exist before writing
    each_url = "http://www.ivsky.com/tupian/ziranfengguang/index_{page}.html"
    full_urllist = [each_url.format(page=i) for i in range(1, 20)]
    event_loop = asyncio.get_event_loop()
    tasks = [start(url) for url in full_urllist]
    event_loop.run_until_complete(asyncio.wait(tasks))  # wait for all tasks to finish
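One thing worth noting: down_img fetches a page's images one after another, so the pages run concurrently but the images within a page do not. Below is a minimal sketch of a concurrent variant, reusing the headers and imports from the listing above; the helper fetch_one and the limit of 5 are my own assumptions, not something from the original code.

# Sketch only: download one page's images concurrently, capped by a semaphore
# so we don't fire one request per image at the site all at once.
async def down_img_concurrent(session, url_list, limit=5):
    semaphore = asyncio.Semaphore(limit)

    async def fetch_one(index, each_url):  # hypothetical helper, not in the original
        async with semaphore:
            async with session.get(each_url, headers=headers) as response:
                data = await response.read()
        # time.time() alone can collide when downloads finish together,
        # so mix the index into the filename as well
        with open('./image/%s_%s.jpg' % (time.time(), index), 'wb') as file:
            file.write(data)

    await asyncio.gather(*(fetch_one(i, u) for i, u in enumerate(url_list)))

Swapping this in for down_img inside start would leave the rest of the script unchanged.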
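Also, passing bare coroutines to asyncio.wait was deprecated in Python 3.8 and removed in 3.11, so on a newer interpreter the __main__ block can be rewritten with asyncio.run and asyncio.gather. A sketch of the same entry point under that assumption:

async def main():
    each_url = "http://www.ivsky.com/tupian/ziranfengguang/index_{page}.html"
    # gather schedules all the page crawls and waits for them to finish
    await asyncio.gather(*(start(each_url.format(page=i)) for i in range(1, 20)))

if __name__ == '__main__':
    os.makedirs('./image', exist_ok=True)
    asyncio.run(main())  # creates, runs, and closes the event loop itself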
I still need to properly work through this part of the aiohttp library; today I'm just sharing the code.