Python Asynchronous IO, Concurrency, Single-Threaded Coroutines, and gevent

Reposted from https://www.liaoxuefeng.com/wiki/1016959663602400/1017959540289152

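Before studying the asynchronous IO model, let us first look at coroutines.

A coroutine is also known as a micro-thread or a fiber.

The concept of the coroutine was proposed long ago, but only in recent years has it become widely used in some languages (such as Lua).

A subroutine, also called a function, is invoked hierarchically in every language: A calls B, B calls C while it runs, C finishes and returns, then B finishes and returns, and finally A finishes.

Subroutine calls are therefore implemented with a stack, and a thread simply executes one subroutine.

A subroutine call always has one entry point and one return, and the calling order is well defined. Coroutine calls work differently.

A coroutine looks like a subroutine, but it can be suspended partway through, let another subroutine run, and then resume later at the point where it left off.

Note that suspending one subroutine to run another is not a function call; it is closer to a CPU interrupt. Take two subroutines, A and B: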

def A():
    print('1')
    print('2')
    print('3')

def B():
    print('x')
    print('y')
    print('z')

Suppose they are run as coroutines: A can be suspended at any point to run B, and B can in turn be suspended to resume A. The output might be:

1
2
x
y
3
z

But A never calls B, which is why coroutine calls are somewhat harder to follow than function calls.
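The interleaving above can be reproduced with plain generators. The sketch below is one possible illustration rather than the article's own code: the placement of `yield`, the `log` parameter, and the `run_round_robin` helper are assumptions chosen so that the result matches the sequence shown.

```python
def A(log):
    log.append('1')
    log.append('2')
    yield              # suspend A; the scheduler may now run B
    log.append('3')

def B(log):
    log.append('x')
    log.append('y')
    yield              # suspend B; the scheduler may now resume A
    log.append('z')

def run_round_robin(coroutines):
    """Resume each generator in turn until all of them have finished."""
    queue = list(coroutines)
    while queue:
        current = queue.pop(0)
        try:
            next(current)          # run until the next yield
        except StopIteration:
            continue               # finished: do not reschedule
        queue.append(current)      # suspended: put it back in line

def demo():
    log = []
    run_round_robin([A(log), B(log)])
    return log

if __name__ == '__main__':
    print('\n'.join(demo()))
```

Neither A nor B calls the other; the order comes entirely from where each one gives up control.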

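The way A and B run looks a bit like multithreading, but a defining trait of coroutines is that a single thread executes them. So what advantages do coroutines have over threads?

The biggest advantage is very high execution efficiency. Switching between subroutines is not a thread switch; the program itself controls it, so there is no thread-switching overhead. The more threads a threaded design would need, the more pronounced the coroutine's performance advantage becomes.

The second advantage is that no thread locking is needed. Because there is only one thread, there are no conflicting simultaneous writes to a variable; shared resources are managed without locks, by checking state alone, so execution is much more efficient than with multiple threads.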
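To illustrate the lock-free point, here is a small sketch (the `worker` and `run_all` names are hypothetical). Several generator-based coroutines update one shared counter; because control only changes hands at `yield`, each read-modify-write completes without interruption and no lock is needed.

```python
def worker(counter, steps):
    for _ in range(steps):
        current = counter['value']      # read
        counter['value'] = current + 1  # write; no switch can happen in between
        yield                           # the only point where control is handed over

def run_all(n_workers, steps):
    """Interleave all workers round-robin and return the final counter value."""
    counter = {'value': 0}
    active = [worker(counter, steps) for _ in range(n_workers)]
    while active:
        still_running = []
        for w in active:
            try:
                next(w)
                still_running.append(w)
            except StopIteration:
                pass                    # this worker has finished
        active = still_running
    return counter['value']

if __name__ == '__main__':
    print(run_all(10, 1000))
```

With preemptive threads the same read-then-write pair could be interrupted between the two statements, which is exactly the case a lock would have to protect.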

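Since coroutines run in a single thread, how can they use a multi-core CPU? The simplest approach is multiple processes plus coroutines, which makes full use of all cores while keeping the efficiency of coroutines, and can deliver very high performance.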
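One way the "processes plus coroutines" idea could look is sketched below, assuming Python 3.7+ and the async/await syntax covered later in this article. The function names and the squaring workload are made up for illustration; each worker process runs its own event loop over one batch of items.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

async def handle(item):
    await asyncio.sleep(0.01)   # stand-in for an IO operation
    return item * item

async def handle_batch(batch):
    # All items in one batch are processed concurrently in a single thread.
    return await asyncio.gather(*(handle(item) for item in batch))

def process_batch(batch):
    """Entry point for one worker process: run an event loop over its batch."""
    return asyncio.run(handle_batch(batch))

def run_on_all_cores(batches, max_workers=None):
    # Batches are spread over worker processes; each process runs coroutines.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_batch, batches))
```

When trying this out, call `run_on_all_cores([[1, 2], [3, 4]])` from inside an `if __name__ == '__main__':` block, since some platforms re-import the main module when starting worker processes.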

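Python supports coroutines through generators.

With a generator we can iterate in a for loop, and we can also call next() repeatedly to obtain the next value produced by the yield statement.

But Python's yield can do more than return a value: it can also receive an argument sent by the caller.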
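As a minimal sketch of that two-way channel (the `averager` name is hypothetical), the generator below receives numbers through `send()` and hands back the running average through the same `yield` expression.

```python
def averager():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average   # hand back the average, then wait for the next value
        total += value
        count += 1
        average = total / count

if __name__ == '__main__':
    avg = averager()
    next(avg)                   # advance to the first yield before sending values
    print(avg.send(10))
    print(avg.send(20))
```

The initial `next(avg)` plays the same role as `c.send(None)` in the producer-consumer example: a generator has to reach its first `yield` before it can receive a value.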

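Consider an example.

In the traditional producer-consumer model, one thread writes messages and another reads them, with locks controlling the queue and the waiting; a small mistake can cause a deadlock.

With coroutines, the producer produces a message and jumps straight to the consumer via yield; once the consumer has finished, control switches back to the producer to continue producing. This is highly efficient: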

def consumer():
    r = ''
    while True:
        n = yield r
        if not n:
            return
        print('[CONSUMER] Consuming %s...' % n)
        r = '200 OK'

def produce(c):
    c.send(None)
    n = 0
    while n < 5:
        n = n + 1
        print('[PRODUCER] Producing %s...' % n)
        r = c.send(n)
        print('[PRODUCER] Consumer return: %s' % r)
    c.close()

c = consumer()
produce(c)

Output:

[PRODUCER] Producing 1...
[CONSUMER] Consuming 1...
[PRODUCER] Consumer return: 200 OK
[PRODUCER] Producing 2...
[CONSUMER] Consuming 2...
[PRODUCER] Consumer return: 200 OK
[PRODUCER] Producing 3...
[CONSUMER] Consuming 3...
[PRODUCER] Consumer return: 200 OK
[PRODUCER] Producing 4...
[CONSUMER] Consuming 4...
[PRODUCER] Consumer return: 200 OK
[PRODUCER] Producing 5...
[CONSUMER] Consuming 5...
[PRODUCER] Consumer return: 200 OK

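Note that the consumer function is a generator. After a consumer is passed to produce:

First, c.send(None) is called to start the generator;
Then, each time something is produced, c.send(n) switches execution to consumer;
consumer receives the message through yield, processes it, and passes the result back through yield;
produce receives the result from consumer and goes on to produce the next message;
When produce decides to stop producing, it closes consumer with c.close(), and the whole process ends.

The whole flow is lock-free and runs in one thread. produce and consumer cooperate to complete the task, hence the name “coroutine” (cooperative routine), in contrast to the preemptive multitasking of threads.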
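For contrast, the thread-based model mentioned earlier might be sketched as follows. This is an illustration with hypothetical function names, using the standard library's `queue.Queue`, which does its locking internally; a `None` sentinel tells the consumer to stop.

```python
import queue
import threading

def produce(q, count):
    for n in range(1, count + 1):
        q.put(n)          # Queue handles the locking internally
    q.put(None)           # sentinel: tells the consumer to stop

def consume(q, results):
    while True:
        n = q.get()       # blocks until an item is available
        if n is None:
            return
        results.append(n)

def run_threaded(count=5):
    q = queue.Queue()
    results = []
    consumer = threading.Thread(target=consume, args=(q, results))
    producer = threading.Thread(target=produce, args=(q, count))
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return results

if __name__ == '__main__':
    print(run_threaded())
```

Unlike the coroutine version, the producer here gets no direct reply from the consumer; returning '200 OK' for each message would need a second queue or another synchronization primitive.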

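To finish, a remark from Donald Knuth sums up the nature of coroutines:

“Subroutines are special cases of coroutines.”

Note: the original examples targeted Python 2; check the documentation for Python 3 differences where needed.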

gevent

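Python provides basic coroutine support through yield, but that support is incomplete. The third-party library gevent offers much more complete coroutine support for Python.

gevent implements coroutines on top of greenlet. The basic idea is:

When a greenlet hits an IO operation, such as a network access, gevent automatically switches to another greenlet, and switches back at a suitable moment once the IO has completed. IO is slow and often leaves a program waiting; with gevent switching coroutines automatically, some greenlet is always running instead of waiting on IO.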
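The "switch away while waiting on IO" idea can be modelled with the standard library alone. The sketch below is a toy model, not how gevent is implemented: tasks are plain generators, IO is simulated by yielding a waiting time, and a hypothetical `run` scheduler with a simulated clock resumes whichever task's IO finishes first.

```python
import heapq

def fetch(name, io_time, log):
    log.append('%s: request sent' % name)
    yield io_time                      # "IO wait": give up control for io_time units
    log.append('%s: response received' % name)

def run(tasks):
    """Resume whichever task's simulated IO completes first."""
    ready = [(0, index, task) for index, task in enumerate(tasks)]
    heapq.heapify(ready)
    while ready:
        now, index, task = heapq.heappop(ready)
        try:
            io_time = next(task)       # run the task until it waits on IO
        except StopIteration:
            continue                   # task finished
        heapq.heappush(ready, (now + io_time, index, task))

def demo():
    log = []
    run([fetch('a', 3, log), fetch('b', 1, log)])
    return log

if __name__ == '__main__':
    print('\n'.join(demo()))
```

Task a starts first, but b's shorter wait finishes earlier, so b's response is handled before a's. No task ever blocks the others while it waits.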

Because switching happens automatically on IO operations, gevent has to modify parts of Python's standard library. This is done at startup with a monkey patch:

from gevent import monkey; monkey.patch_socket()
import gevent

def f(n):
    for i in range(n):
        print(gevent.getcurrent(), i)

g1 = gevent.spawn(f, 5)
g2 = gevent.spawn(f, 5)
g3 = gevent.spawn(f, 5)
g1.join()
g2.join()
g3.join()

Output:

<Greenlet at 0x10e49f550: f(5)> 0
<Greenlet at 0x10e49f550: f(5)> 1
<Greenlet at 0x10e49f550: f(5)> 2
<Greenlet at 0x10e49f550: f(5)> 3
<Greenlet at 0x10e49f550: f(5)> 4
<Greenlet at 0x10e49f910: f(5)> 0
<Greenlet at 0x10e49f910: f(5)> 1
<Greenlet at 0x10e49f910: f(5)> 2
<Greenlet at 0x10e49f910: f(5)> 3
<Greenlet at 0x10e49f910: f(5)> 4
<Greenlet at 0x10e49f4b0: f(5)> 0
<Greenlet at 0x10e49f4b0: f(5)> 1
<Greenlet at 0x10e49f4b0: f(5)> 2
<Greenlet at 0x10e49f4b0: f(5)> 3
<Greenlet at 0x10e49f4b0: f(5)> 4

As you can see, the three greenlets run one after another rather than alternating.

To make the greenlets alternate, hand over control with gevent.sleep():

def f(n):
    for i in range(n):
        print(gevent.getcurrent(), i)
        gevent.sleep(0)

Output:

<Greenlet at 0x10cd58550: f(5)> 0
<Greenlet at 0x10cd58910: f(5)> 0
<Greenlet at 0x10cd584b0: f(5)> 0
<Greenlet at 0x10cd58550: f(5)> 1
<Greenlet at 0x10cd584b0: f(5)> 1
<Greenlet at 0x10cd58910: f(5)> 1
<Greenlet at 0x10cd58550: f(5)> 2
<Greenlet at 0x10cd58910: f(5)> 2
<Greenlet at 0x10cd584b0: f(5)> 2
<Greenlet at 0x10cd58550: f(5)> 3
<Greenlet at 0x10cd584b0: f(5)> 3
<Greenlet at 0x10cd58910: f(5)> 3
<Greenlet at 0x10cd58550: f(5)> 4
<Greenlet at 0x10cd58910: f(5)> 4
<Greenlet at 0x10cd584b0: f(5)> 4

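The three greenlets now run alternately.

Change the loop count to 500000 so they run longer, then look in the operating system's process manager: there is only one thread.

Of course, real code does not use gevent.sleep() to switch coroutines; gevent switches automatically when execution reaches an IO operation, as in the following code: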

from gevent import monkey; monkey.patch_all()
import gevent
from urllib.request import urlopen  # Python 3 replacement for urllib2

def f(url):
    print('GET: %s' % url)
    resp = urlopen(url)
    data = resp.read()
    print('%d bytes received from %s.' % (len(data), url))

gevent.joinall([
        gevent.spawn(f, 'https://www.python.org/'),
        gevent.spawn(f, 'https://www.yahoo.com/'),
        gevent.spawn(f, 'https://github.com/'),
])

Output:

GET: https://www.python.org/
GET: https://www.yahoo.com/
GET: https://github.com/
45661 bytes received from https://www.python.org/.
14823 bytes received from https://github.com/.
304034 bytes received from https://www.yahoo.com/.

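The output shows that the three network operations ran concurrently and finished in a different order, yet only one thread was used.

Summary

gevent can deliver very high concurrent performance. The original article notes that gevent was only guaranteed to work on Unix/Linux and that installation and operation on Windows were not guaranteed; recent gevent releases also support Windows.

Because gevent's coroutines switch on IO, the most striking part is that the Web App code we write needs no gevent imports and no changes at all: simply deploying it on a WSGI server that supports gevent can yield a severalfold performance improvement. For deployment details, see the later section “Deploying the Web App” in the “Hands-on” chapter.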

asyncio

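asyncio is a standard library module introduced in Python 3.4 that provides built-in support for asynchronous IO.

The asyncio programming model is a message loop (event loop). We obtain a reference to an EventLoop from the asyncio module and hand it the coroutines to run, which gives us asynchronous IO.

A Hello world written with asyncio looks like this: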

import asyncio

@asyncio.coroutine
def hello():
    print("Hello world!")
    # Asynchronous call to asyncio.sleep(1):
    r = yield from asyncio.sleep(1)
    print("Hello again!")

# Get the EventLoop:
loop = asyncio.get_event_loop()
# Run the coroutine
loop.run_until_complete(hello())
loop.close()

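@asyncio.coroutine marks a generator as a coroutine, and we then hand this coroutine to the EventLoop to run. (The decorator was deprecated in Python 3.8 and removed in Python 3.11, so this generator-based form only runs on older interpreters.)

hello() first prints Hello world!. The yield from syntax then lets us conveniently call another generator. Because asyncio.sleep() is also a coroutine, the thread does not wait for asyncio.sleep(); it suspends hello() and goes on to the next iteration of the message loop. When asyncio.sleep() returns, the thread gets the return value from yield from (None here) and continues with the next statement.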
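The delegation performed by yield from can be seen with plain generators, independent of asyncio. In this small sketch (`inner`, `outer`, and `demo` are hypothetical names), values yielded by the inner generator pass straight through to the caller, values sent by the caller reach the inner generator, and the inner generator's return value becomes the value of the yield from expression.

```python
def inner():
    received = yield 'inner ready'   # value flows out to the outermost caller
    return received * 2              # becomes the value of `yield from inner()`

def outer():
    result = yield from inner()      # delegate until inner() returns
    yield 'outer got %s' % result

def demo():
    gen = outer()
    first = next(gen)        # runs into inner() up to its yield
    second = gen.send(21)    # the sent value is delivered to inner()
    return first, second

if __name__ == '__main__':
    print(demo())
```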

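Think of asyncio.sleep(1) as an IO operation that takes 1 second. During that time the main thread does not wait; it runs other runnable coroutines in the EventLoop, which is how concurrent execution is achieved.

Let's try wrapping two coroutines in Tasks: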

import threading
import asyncio

@asyncio.coroutine
def hello():
    print('Hello world! (%s)' % threading.current_thread())
    yield from asyncio.sleep(1)
    print('Hello again! (%s)' % threading.current_thread())

loop = asyncio.get_event_loop()
tasks = [hello(), hello()]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()

Observe the execution:

Hello world! (<_MainThread(MainThread, started 140735195337472)>)
Hello world! (<_MainThread(MainThread, started 140735195337472)>)
(pause of about 1 second)
Hello again! (<_MainThread(MainThread, started 140735195337472)>)
Hello again! (<_MainThread(MainThread, started 140735195337472)>)

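The printed thread names show that both coroutines are executed concurrently by the same thread.

If asyncio.sleep() is replaced with a real IO operation, multiple coroutines can likewise be executed concurrently by a single thread.

Let's use asyncio's asynchronous network connections to fetch the home pages of sina, sohu, and 163: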

import asyncio

@asyncio.coroutine
def wget(host):
    print('wget %s...' % host)
    connect = asyncio.open_connection(host, 80)
    reader, writer = yield from connect
    header = 'GET / HTTP/1.0\r\nHost: %s\r\n\r\n' % host
    writer.write(header.encode('utf-8'))
    yield from writer.drain()
    while True:
        line = yield from reader.readline()
        if line == b'\r\n':
            break
        print('%s header > %s' % (host, line.decode('utf-8').rstrip()))
    # Ignore the body, close the socket
    writer.close()

loop = asyncio.get_event_loop()
tasks = [wget(host) for host in ['www.sina.com.cn', 'www.sohu.com', 'www.163.com']]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()

The output looks like this:

wget www.sohu.com...
wget www.sina.com.cn...
wget www.163.com...
(wait for a while)
(prints the sohu headers)
www.sohu.com header > HTTP/1.1 200 OK
www.sohu.com header > Content-Type: text/html
...
(prints the sina headers)
www.sina.com.cn header > HTTP/1.1 200 OK
www.sina.com.cn header > Date: Wed, 20 May 2015 04:56:33 GMT
...
(prints the 163 headers)
www.163.com header > HTTP/1.0 302 Moved Temporarily
www.163.com header > Server: Cdn Cache Server V2.0
...

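So the three connections are completed concurrently by one thread using coroutines.

Summary

asyncio provides comprehensive support for asynchronous IO;

Asynchronous operations are performed inside a coroutine with yield from;

Multiple coroutines can be wrapped into a group of Tasks and executed concurrently.

Reference source code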

async_hello.py

async_wget.py

async/await

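The @asyncio.coroutine decorator provided by asyncio marks a generator as a coroutine, and inside the coroutine yield from calls another coroutine to perform an asynchronous operation.

To simplify asynchronous IO and make it easier to identify, Python 3.5 introduced the new async and await syntax, which makes coroutine code more concise and readable.

Note that async and await are new syntax for coroutines. Using them takes just two simple substitutions:

Replace @asyncio.coroutine with async;
Replace yield from with await.

Let's compare with the code from the previous section: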

@asyncio.coroutine
def hello():
    print("Hello world!")
    r = yield from asyncio.sleep(1)
    print("Hello again!")

Rewritten with the new syntax:

async def hello():
    print("Hello world!")
    r = await asyncio.sleep(1)
    print("Hello again!")

The rest of the code stays the same.
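On current Python versions the driver code can also be shortened. The sketch below assumes Python 3.7+, where asyncio.run() creates and closes the event loop, and uses asyncio.gather() to run several coroutines together; the `hello(name)` signature, the 0.2-second delay, and `timed_run` are illustrative choices, not part of the original example.

```python
import asyncio
import threading
import time

async def hello(name):
    thread_name = threading.current_thread().name
    await asyncio.sleep(0.2)            # stand-in for a 0.2-second IO operation
    return '%s done on %s' % (name, thread_name)

async def main():
    # gather() schedules the coroutines as tasks and waits for all of them.
    return await asyncio.gather(hello('first'), hello('second'), hello('third'))

def timed_run():
    start = time.perf_counter()
    results = asyncio.run(main())       # creates and closes the event loop
    return results, time.perf_counter() - start

if __name__ == '__main__':
    results, elapsed = timed_run()
    print(results, 'in %.2f s' % elapsed)
```

All three coroutines report the same thread, and the total time stays close to one sleep rather than three, since the waits overlap.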

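Summary

Starting with version 3.5, Python provides the new async and await syntax for asyncio;

Note that the new syntax works only on Python 3.5 and later; on 3.4 you still need the approach from the previous section.

Exercise

Rewrite the previous section's asynchronous fetch of the sina, sohu, and 163 home pages with the new syntax and run it.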
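One possible shape for a solution is sketched here, assuming Python 3.7+. It differs from the earlier version in a few labeled ways: `wget` takes an optional `port`, returns the header lines instead of printing them, and also stops at end of stream so that a connection closed early cannot loop forever.

```python
import asyncio

async def wget(host, port=80):
    reader, writer = await asyncio.open_connection(host, port)
    header = 'GET / HTTP/1.0\r\nHost: %s\r\n\r\n' % host
    writer.write(header.encode('utf-8'))
    await writer.drain()
    lines = []
    while True:
        line = await reader.readline()
        if line in (b'\r\n', b''):      # blank line ends the headers; b'' means EOF
            break
        lines.append(line.decode('utf-8').rstrip())
    writer.close()                      # ignore the body, close the socket
    await writer.wait_closed()
    return lines

async def main(hosts):
    # Fetch all hosts concurrently; results come back in the order of `hosts`.
    return await asyncio.gather(*(wget(host) for host in hosts))
```

To try it against the three sites, run `asyncio.run(main(['www.sina.com.cn', 'www.sohu.com', 'www.163.com']))` and print each returned list of header lines; the actual headers depend on the servers at the time of the request.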

Reference source code

async_hello2.py

async_wget2.py

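aiohttp

asyncio makes single-threaded concurrent IO possible. Used only on the client side, it does not show its full strength. On the server side, for example in a web server, HTTP connections are IO operations, so a single thread plus coroutines can serve many users with high concurrency.

asyncio implements protocols such as TCP, UDP, and SSL; aiohttp is an HTTP framework built on asyncio.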
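Before moving up to HTTP, the TCP layer that asyncio itself provides can be sketched with the standard library alone. This is an illustration under stated assumptions (Python 3.7+, hypothetical function names, a 0.1-second sleep standing in for slow work): one thread serves five clients at once, so the total time stays close to a single request's delay.

```python
import asyncio

async def handle_client(reader, writer):
    line = await reader.readline()          # wait for one request line
    await asyncio.sleep(0.1)                # stand-in for slow work such as a database query
    writer.write(b'echo: ' + line)
    await writer.drain()
    writer.close()

async def request(port, message):
    reader, writer = await asyncio.open_connection('127.0.0.1', port)
    writer.write(message + b'\n')
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    return reply

async def demo(n_clients=5):
    # Port 0 asks the operating system for any free port.
    server = await asyncio.start_server(handle_client, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    loop = asyncio.get_running_loop()
    start = loop.time()
    replies = await asyncio.gather(
        *(request(port, b'client %d' % i) for i in range(n_clients)))
    elapsed = loop.time() - start
    server.close()
    return replies, elapsed

if __name__ == '__main__':
    print(asyncio.run(demo()))
```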

First, install aiohttp:

pip install aiohttp

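Then write an HTTP server that handles the following URLs:

/ - the home page, which returns b'<h1>Index</h1>';
/hello/{name} - returns the text hello, %s! built from the URL parameter.

The code is as follows: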

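import asyncio

from aiohttp import web

async def index(request):
    await asyncio.sleep(0.5)
    return web.Response(body=b'<h1>Index</h1>', content_type='text/html')

async def hello(request):
    await asyncio.sleep(0.5)
    text = '<h1>hello, %s!</h1>' % request.match_info['name']
    return web.Response(body=text.encode('utf-8'), content_type='text/html')

async def init():
    app = web.Application()
    app.router.add_route('GET', '/', index)
    app.router.add_route('GET', '/hello/{name}', hello)
    # AppRunner and TCPSite replace the deprecated app.make_handler() call
    runner = web.AppRunner(app)
    await runner.setup()
    site = web.TCPSite(runner, '127.0.0.1', 8000)
    await site.start()
    print('Server started at http://127.0.0.1:8000...')
    return runner

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
runner = loop.run_until_complete(init())
loop.run_forever()

Note that the aiohttp initialization function init() is also a coroutine. web.AppRunner and web.TCPSite take the place of the web.Application(loop=loop) and app.make_handler() calls used with older aiohttp versions, which newer releases deprecate; the TCP server itself is still created through asyncio.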

Reference source code

aio_web.py

peter-jh
