Python Asynchronous Crawlers

Reference: a WeChat official account

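1、Defining a coroutine
  • (1) Use async to define a function
  • (2) Calling that function does not run its body; it returns a coroutine object
  • (3) Use the get_event_loop() method to obtain an event loop, loop
  • (4) Call the loop object's run_until_complete() method to register the coroutine with the event loop and start it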
import asyncio
async def execute(x):
    print('Number:', x)
coroutine = execute(1)
print('Coroutine:', coroutine)
print('After calling execute')
loop = asyncio.get_event_loop()
loop.run_until_complete(coroutine)
print('After calling loop')
# Coroutine: <coroutine object execute at 0x0000000002919E48>
# After calling execute
# Number: 1
# After calling loop
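
On Python 3.7 and later, steps (3) and (4) can also be expressed with asyncio.run(), which creates the loop, runs the coroutine, and closes the loop afterwards. The following is a minimal sketch under that assumption; the main() wrapper and the return value of execute() are added here for illustration and are not part of the example above.

```python
import asyncio


async def execute(x):
    # The body runs only once the event loop drives the coroutine.
    print('Number:', x)
    return x


def main():
    # Calling the async function only builds a coroutine object.
    coroutine = execute(1)
    print('Is coroutine:', asyncio.iscoroutine(coroutine))
    # asyncio.run() creates a new event loop, runs the coroutine to
    # completion, closes the loop, and returns the coroutine's result.
    return asyncio.run(coroutine)


if __name__ == '__main__':
    print('Result:', main())
```

Note that asyncio.run() cannot be called from code that is already running inside an event loop.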

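  • (5) Define a task object: call the loop object's create_task() method to wrap the coroutine object in a task. Printing the task at this point shows that it is in the pending state. We then hand the task to the event loop to be executed, and printing it again shows that its state has changed to finished.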

import asyncio
async def execute(x):
    print('Number:', x)
    return x
coroutine = execute(1)
print('Coroutine:', coroutine)
print('After calling execute')
loop = asyncio.get_event_loop()
task = loop.create_task(coroutine)
print('Task:', task)
loop.run_until_complete(task)
print('Task:', task)
print('After calling loop')
# Coroutine: <coroutine object execute at 0x0000000002919E48>
# After calling execute
# Task: <Task pending coro=<execute() running at E:/project/python基礎.py:4>>
# Number: 1
# Task: <Task finished coro=<execute() done, defined at E:/project/python基礎.py:4> result=1>
# After calling loop
  • (6) Another way to define a task object is asyncio's ensure_future() function, which also returns a task object
import asyncio
async def execute(x):
    print('Number:', x)
    return x
coroutine = execute(1)
print('Coroutine:', coroutine)
print('After calling execute')
task = asyncio.ensure_future(coroutine)
print('Task:', task)
loop = asyncio.get_event_loop()
loop.run_until_complete(task)
print('Task:', task)
print('After calling loop')
# Coroutine: <coroutine object execute at 0x0000000002929E48>
# After calling execute
# Task: <Task pending coro=<execute() running at E:/project/python基礎.py:2>>
# Number: 1
# Task: <Task finished coro=<execute() done, defined at E:/project/python基礎.py:2> result=1>
# After calling loop
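
The pending and finished states described in (5) and (6) can also be checked programmatically instead of by reading the printed representation. A small sketch, assuming Python 3.7+ (asyncio.create_task() and asyncio.run()); the main() function and the tuple it returns are illustrative choices, not part of the examples above.

```python
import asyncio


async def execute(x):
    await asyncio.sleep(0)  # yield to the event loop once
    return x


async def main():
    # Inside a running loop, asyncio.create_task() wraps the coroutine
    # in a Task and schedules it; it does not start running immediately.
    task = asyncio.create_task(execute(1))
    before = task.done()   # False: the task is still pending
    result = await task    # suspend main() until the task finishes
    after = task.done()    # True: the task has finished
    return before, result, after


if __name__ == '__main__':
    print(asyncio.run(main()))
```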
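2、Binding a callback
  • (1) Call add_done_callback() to bind a callback function to a task. We pass callback() to the wrapped task object, so that callback() is invoked once the task has finished. The task object itself is passed to callback() as its argument, and calling the task's result() method returns the coroutine's return value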
import asyncio
import requests
async def request():
    url = 'https://www.baidu.com'
    status = requests.get(url)
    return status
def callback(task):
    print('Status:', task.result())
coroutine = request()
task = asyncio.ensure_future(coroutine)
task.add_done_callback(callback)
print('Task:', task)
loop = asyncio.get_event_loop()
loop.run_until_complete(task)
print('Task:', task)
# Task: <Task pending coro=<request() running at E:/project/python基礎.py:4> cb=[callback() at E:/project/python基礎.py:8]>
# Status: <Response [200]>
# Task: <Task finished coro=<request() done, defined at E:/project/python基礎.py:4> result=<Response [200]>>
  • (2) Alternatively, skip the callback: once the task has finished running, call its result() method directly to get the result
import asyncio
import requests
async def request():
    url = 'https://www.baidu.com'
    status = requests.get(url)
    return status
coroutine = request()
task = asyncio.ensure_future(coroutine)
print('Task:', task)
loop = asyncio.get_event_loop()
loop.run_until_complete(task)
print('Task:', task)
print('Task Result:', task.result())
# Task: <Task pending coro=<request() running at E:/project/python基礎.py:3>>
# Task: <Task finished coro=<request() done, defined at E:/project/python基礎.py:3> result=<Response [200]>>
# Task Result: <Response [200]>
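
Both ways of reading a result, the done callback and a direct result() call, can be tried without network access. In the sketch below the request is simulated with asyncio.sleep(), and the 200 return value, the statuses list, and main() are placeholders invented for illustration.

```python
import asyncio


async def request():
    await asyncio.sleep(0.01)  # stand-in for a real network request
    return 200                 # hypothetical status code


async def main():
    statuses = []

    def callback(task):
        # The finished task is passed in; result() gives the return value.
        statuses.append(task.result())

    task = asyncio.create_task(request())
    task.add_done_callback(callback)
    await task
    # Give the loop one more iteration so any pending done callbacks have run.
    await asyncio.sleep(0)
    # Return what the callback collected and what result() reports directly.
    return statuses, task.result()


if __name__ == '__main__':
    print(asyncio.run(main()))
```

The callback style is useful when the code that creates the task is not the code that consumes the result; otherwise reading result() after the task finishes is simpler.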
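3、Running multiple tasks
  • Define a list of tasks and pass it to asyncio's wait() function. We use a for loop to create five tasks and collect them in a list, pass that list to asyncio.wait(), and register the result with the event loop, which starts all five tasks. Finally we print each task's result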
import asyncio
import requests
async def request():
    url = 'https://www.baidu.com'
    status = requests.get(url)
    return status
tasks = [asyncio.ensure_future(request()) for _ in range(5)]
print('Tasks:', tasks)
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))
for task in tasks:
    print('Task Result:', task.result())
# Tasks: [<Task pending coro=<request() running at E:/project/python基礎.py:3>>, <Task pending coro=<request() running at E:/project/python基礎.py:3>>, <Task pending coro=<request() running at E:/project/python基礎.py:3>>, <Task pending coro=<request() running at E:/project/python基礎.py:3>>, <Task pending coro=<request() running at E:/project/python基礎.py:3>>]
# Task Result: <Response [200]>
# Task Result: <Response [200]>
# Task Result: <Response [200]>
# Task Result: <Response [200]>
# Task Result: <Response [200]>
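
As an alternative to wait() followed by a loop over task.result(), asyncio.gather() returns the results directly, in the order the awaitables were passed in. This is a sketch with a simulated request; the request(i) signature and the sleep durations are assumptions made so the example runs offline.

```python
import asyncio


async def request(i):
    # Later requests finish sooner, to show that gather() orders results
    # by argument position rather than by completion time.
    await asyncio.sleep(0.05 * (3 - i))
    return i


async def main():
    # gather() runs the coroutines concurrently and returns a list of results.
    return await asyncio.gather(*(request(i) for i in range(3)))


if __name__ == '__main__':
    print('Results:', asyncio.run(main()))
```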
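4、Implementing coroutines
  • (1) await suspends a time-consuming wait and gives up control. When a running coroutine reaches an await, the event loop suspends that coroutine and switches to another one, until that one is suspended in turn or finishes. Note that the awaited operation must itself be non-blocking: requests.get() is a synchronous call, so in the example below get() never yields to the loop and the five requests still run one after another, as the output shows.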
import asyncio
import requests
import time
start = time.time()
async def get(url):
    return requests.get(url)
async def request():
    url = 'https://www.baidu.com'
    print('Waiting for', url)
    response = await get(url)
    print('Get response from', url, 'Result', response.status_code)
tasks = [asyncio.ensure_future(request()) for _ in range(5)]
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))
end = time.time()
print('Cost time:', end - start)
# Waiting for https://www.baidu.com
# Get response from https://www.baidu.com Result 200
# Waiting for https://www.baidu.com
# Get response from https://www.baidu.com Result 200
# Waiting for https://www.baidu.com
# Get response from https://www.baidu.com Result 200
# Waiting for https://www.baidu.com
# Get response from https://www.baidu.com Result 200
# Waiting for https://www.baidu.com
# Get response from https://www.baidu.com Result 200
# Cost time: 0.45502614974975586
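
The difference between a blocking call and a real suspension point can be measured without any network access. In this sketch time.sleep() stands in for a blocking call such as requests.get(), asyncio.sleep() stands in for a non-blocking wait, and run_in_executor() shows one way to move a blocking call onto a worker thread; all function names and the 0.2 second delay are illustrative assumptions.

```python
import asyncio
import time


async def blocking_job():
    # Blocks the whole event loop; no other coroutine can run meanwhile.
    time.sleep(0.2)


async def cooperative_job():
    # Suspends only this coroutine; the loop is free to run others.
    await asyncio.sleep(0.2)


async def offloaded_job():
    # Runs the blocking call in the default thread pool and awaits it.
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, time.sleep, 0.2)


async def timed(job, n=5):
    # Run n copies of job concurrently and return the elapsed seconds.
    start = time.perf_counter()
    await asyncio.gather(*(job() for _ in range(n)))
    return time.perf_counter() - start


if __name__ == '__main__':
    for job in (blocking_job, cooperative_job, offloaded_job):
        print(job.__name__, round(asyncio.run(timed(job)), 2))
```

With five jobs, the blocking version should take roughly five times the delay, while the other two should take roughly one delay in total.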
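5、Using aiohttp
  • aiohttp is a library that supports asynchronous requests. Combined with asyncio, it makes asynchronous HTTP requests straightforward to implement.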
import asyncio
import aiohttp
import time
start = time.time()
async def get(url):
    # async with closes the session and releases the response
    # even if the request raises an exception
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()
async def request():
    url = 'http://www.newsmth.net/nForum/#!mainpage'
    print('Waiting for', url)
    result = await get(url)
    print('Get response from', url, 'Result:', result)
tasks = [asyncio.ensure_future(request()) for _ in range(5)]
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))
end = time.time()
print('Cost time:', end - start)