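[Python Test Development] Using ProcessPoolExecutor or ThreadPoolExecutor for Multiprocessing or Multithreading

Concurrent programming can bring large performance gains to a program, especially when it spends much of its time waiting on I/O, and it is widely used in servers, web crawlers, performance testing, and more. In Python, concurrency can be implemented with multithreading, multiprocessing, or coroutines. This article focuses on multithreading and multiprocessing.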

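1. Some Important Concepts

Before getting into concurrent programming in Python, we need to understand a few important concepts: concurrency and parallelism, synchronous and asynchronous calls, blocking and non-blocking states, and process pools and thread pools.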

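1.1 Concurrent Execution and Parallel Execution

Parallelism: doing multiple things at literally the same instant, without interfering with each other. In Python this corresponds to multiprocessing, which can take advantage of multi-core processors; it is usually applied to CPU-heavy scenarios such as compute-intensive tasks.
Concurrency: making progress on multiple tasks during the same period of time, not necessarily at the same instant (the tasks can take turns on a single core). In Python this corresponds to multithreading or coroutines, and it is usually applied to I/O-heavy scenarios such as making network requests.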

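1.2 Synchronous Calls and Asynchronous Calls

Synchronous and asynchronous calls are two ways of submitting a task.

Synchronous call: submit a task, wait in place until it finishes and returns its result, and only then execute the next line of code. Tasks therefore run serially.
Asynchronous call: submit a task and, without waiting, go straight on to the next line of code. Tasks therefore run concurrently.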
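To make the difference concrete, here is a minimal sketch using concurrent.futures, with time.sleep() standing in for slow work (the task function and its delays are made up for illustration). Calling the function directly is synchronous; executor.submit() is asynchronous, because it returns a Future right away and the result is collected later with result().

```python
import concurrent.futures
import time

def task(name, delay):
    """Hypothetical slow task; time.sleep() simulates waiting on I/O."""
    time.sleep(delay)
    return name

# Synchronous call: the caller waits here until task() returns.
start = time.perf_counter()
sync_result = task('sync', 0.2)
sync_elapsed = time.perf_counter() - start

with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
    start = time.perf_counter()
    # Asynchronous call: submit() schedules the task and returns a Future at once.
    future = executor.submit(task, 'async', 0.2)
    submit_elapsed = time.perf_counter() - start
    # The caller could do other work here, then collect the result later.
    async_result = future.result()

print('sync call took {:.2f}s; submit() returned after {:.3f}s'.format(
    sync_elapsed, submit_elapsed))
```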

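1.3 Blocking and Non-Blocking States

Blocking and non-blocking describe the running state of a program.

Blocking: when the program hits an I/O operation and waits in place for it to finish, the program is in the blocked state.
Non-blocking: when the program is not waiting on I/O, it is in the running or ready state rather than the blocked state.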
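The same distinction also shows up at the level of individual calls: a blocking call waits, a non-blocking call returns immediately. A minimal sketch, assuming the standard-library queue.Queue as a stand-in for an I/O source (real I/O such as sockets has its own non-blocking modes); timed_get is a hypothetical helper name.

```python
import queue
import time

def timed_get(q, **kwargs):
    """Hypothetical helper: call q.get(**kwargs) and return (item, seconds spent)."""
    start = time.perf_counter()
    try:
        item = q.get(**kwargs)
    except queue.Empty:
        # Raised when no item arrived (non-blocking call, or the timeout expired).
        item = None
    return item, time.perf_counter() - start

q = queue.Queue()

# Non-blocking: returns at once; the queue is empty, so queue.Empty is raised.
nb_item, nb_elapsed = timed_get(q, block=False)

# Blocking: waits for an item; the timeout exists only so this example terminates.
b_item, b_elapsed = timed_get(q, block=True, timeout=0.2)

print('non-blocking waited {:.3f}s, blocking waited {:.3f}s'.format(nb_elapsed, b_elapsed))
```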

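1.4 Process Pools and Thread Pools

Process pools and thread pools are used to control the number of processes or threads.

If the number of processes or threads a server starts keeps growing with the number of concurrent clients, the server comes under enormous pressure. The concept of a "pool" is therefore used to cap the number of processes or threads the server starts.

Process pool: a "pool" that holds processes
Thread pool: a "pool" that holds threads

When the server receives a client request, it takes a thread or process from the pool to handle it, and once the request has been handled, it puts the thread or process back into the pool.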
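ThreadPoolExecutor works in this spirit: a fixed set of worker threads picks up tasks one after another. The sketch below (which_thread is a made-up task) runs 10 tasks on a pool of 2 threads and records which thread handled each one. Only the pool's own threads appear, because a thread that finishes a task goes back to waiting for the next one instead of exiting.

```python
import concurrent.futures
import threading

def which_thread(_):
    # Report the name of the pool thread that ran this task.
    return threading.current_thread().name

with concurrent.futures.ThreadPoolExecutor(max_workers=2,
                                           thread_name_prefix='worker') as executor:
    names = list(executor.map(which_thread, range(10)))

# 10 tasks, but at most 2 distinct thread names.
print(len(names), sorted(set(names)))
```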

2. Performance Comparison: Single Thread vs. Multiple Threads

First, here is single-threaded code that makes the network requests:

import requests
import time

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))

def download_all(sites):
    for site in sites:
        download_one(site)

if __name__ == '__main__':
    sites = [
        'https://golang.google.cn/',
        'https://www.python.org/',
        'http://www.php.net/',
        'https://www.javascript.com/',
        'http://mqtt.org/',
        'https://www.mysql.com/',
        'https://www.java.com/zh_CN/',
        'https://developers.google.cn/protocol-buffers/'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))

Output of the code above:

Read 7181 from https://golang.google.cn/
Read 48634 from https://www.python.org/
Read 62050 from http://www.php.net/
Read 32850 from https://www.javascript.com/
Read 17336 from http://mqtt.org/
Read 31275 from https://www.mysql.com/
Read 10454 from https://www.java.com/zh_CN/
Read 34218 from https://developers.google.cn/protocol-buffers/
Download 8 sites in 11.896329030999999 seconds

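So requesting these 8 sites took about 11.9 seconds in total. Now let's look at the multithreaded version.

The Python standard library provides the threading and multiprocessing modules for writing multithreaded and multiprocess code. Since Python 3.2, it has also provided the concurrent.futures module, which offers two classes: ThreadPoolExecutor and ProcessPoolExecutor. The code below uses ThreadPoolExecutor to implement multithreading.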

import concurrent.futures

import requests
import time

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))
    return {'url': url, 'content_length': len(resp.content)}

def download_all(sites_list):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(download_one, sites_list)

if __name__ == '__main__':
    sites = [
        'https://golang.google.cn/',
        'https://www.python.org/',
        'http://www.php.net/',
        'https://www.javascript.com/',
        'http://mqtt.org/',
        'https://www.mysql.com/',
        'https://www.java.com/zh_CN/',
        'https://developers.google.cn/protocol-buffers/'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))

Output of this code:

Read 650 from http://www.php.net/
Read 7181 from https://golang.google.cn/
Read 10454 from https://www.java.com/zh_CN/
Read 48634 from https://www.python.org/
Read 32850 from https://www.javascript.com/
Read 17336 from http://mqtt.org/
Read 31275 from https://www.mysql.com/
Read 34218 from https://developers.google.cn/protocol-buffers/
Download 8 sites in 1.8238722280000002 seconds

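The multithreaded program is clearly much faster than the single-threaded loop (about 1.8 seconds vs. 11.9 seconds). The main difference between the two versions is: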

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    executor.map(download_one, sites_list)

Here we create a thread pool with a total of 5 threads available. executor.map() works like Python's built-in map() function: it calls download_one() concurrently for each element of sites_list.
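Two properties of executor.map() are worth checking, shown here with made-up functions and time.sleep() instead of network requests: results come back in input order even when later inputs finish first, and an exception raised inside a worker is re-raised only when the loop reaches that result. The second point matters for the example above: because the iterator returned by executor.map() is never consumed there, an exception raised in download_one() would go unnoticed.

```python
import concurrent.futures
import time

def slow_square(x):
    # Larger inputs sleep less, so they finish first.
    time.sleep(0.05 * (4 - x))
    return x * x

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # map() still yields results in input order.
    squares = list(executor.map(slow_square, range(4)))

def may_fail(x):
    if x == 2:
        raise ValueError('bad input: {}'.format(x))
    return x

collected, error = [], None
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    try:
        # The worker's exception is re-raised when iteration reaches item 2.
        for value in executor.map(may_fail, range(4)):
            collected.append(value)
    except ValueError as exc:
        error = str(exc)

print(squares, collected, error)
```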

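In general, we should avoid writing programs whose thread count can grow without limit. Creating a huge number of threads can exhaust the server's resources and crash it; it is better to use a pre-initialized thread pool that caps the number of threads running at the same time.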

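Because of the Global Interpreter Lock (GIL), only one thread at a time can execute Python bytecode in CPython, so threads do not speed up pure computation. The GIL is released while a thread waits on a blocking operation, however, so Python threads are well suited to I/O and other blocking operations that need to run concurrently (such as waiting for network I/O or waiting for data from a database).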
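A quick way to see why threads still help with I/O despite the GIL: in the sketch below, time.sleep() stands in for a blocking network call (an assumption for illustration; like blocking I/O in CPython, it releases the GIL while waiting). Five 0.2-second waits take about 1 second in a loop but only about 0.2 seconds on a 5-thread pool, because the waits overlap. A CPU-bound function would not get this speedup from threads.

```python
import concurrent.futures
import time

def fake_io(delay):
    # Stand-in for a blocking I/O call; the GIL is released during the sleep.
    time.sleep(delay)
    return delay

start = time.perf_counter()
for _ in range(5):
    fake_io(0.2)
serial_elapsed = time.perf_counter() - start

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    list(executor.map(fake_io, [0.2] * 5))
threaded_elapsed = time.perf_counter() - start

print('serial {:.2f}s, threaded {:.2f}s'.format(serial_elapsed, threaded_elapsed))
```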

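For CPU-bound tasks, ProcessPoolExecutor is the better choice. It is used in much the same way as ThreadPoolExecutor: to switch the example above to processes, you essentially replace ThreadPoolExecutor with ProcessPoolExecutor, with the caveat that the function and its arguments must be picklable, since they are sent to worker processes. With ProcessPoolExecutor, the max_workers parameter can be omitted; it then defaults to the number of processors on the machine.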
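A minimal sketch of the CPU-bound case, with a made-up count_primes() function doing trial division. Two details that the thread version does not need: the worker function must be defined at module level so it can be pickled and sent to the worker processes, and the pool should be created under if __name__ == '__main__': so that child processes started with the spawn method (the default on Windows and macOS) do not re-run it.

```python
import concurrent.futures
import math

def count_primes(limit):
    """Hypothetical CPU-bound task: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        # math.isqrt() needs Python 3.8+.
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count

def count_all(limits):
    # Each call runs in a separate worker process, so it is not limited by the GIL.
    # max_workers is omitted, so it defaults to the number of processors.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        return list(executor.map(count_primes, limits))

if __name__ == '__main__':
    # The guard keeps spawned child processes from starting pools of their own.
    print(count_all([10, 100, 1000]))
```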

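3. Implementing Multithreading with the submit Method

The executor.submit() method can also achieve multithreaded execution, though it takes a bit more code. The download_all function in the example above can also be written as follows: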

def download_all(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        to_do = []
        for site in sites:
            future = executor.submit(download_one, site)
            to_do.append(future)

        for future in concurrent.futures.as_completed(to_do):
            future.result()

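This version needs two loops. The first loop calls executor.submit() for each site, which schedules download_one() to run and returns a Future object; each future is appended to to_do.

The second loop retrieves the result of each finished future with its result() method. as_completed(fs) takes an iterable of futures fs and returns an iterator that yields each future as it completes (finishes or is cancelled).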

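Note, however, that the order in which the futures finish is not necessarily the same as their order in the list. Which one finishes first depends on the system's scheduling and on how long each task takes.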
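The sketch below (work() and the delays are made up) makes the ordering visible. It also keeps a dictionary from each Future back to the argument that produced it, a pattern also used in the concurrent.futures documentation, so each result can be matched to its input even though as_completed() yields futures in completion order.

```python
import concurrent.futures
import time

def work(delay):
    time.sleep(delay)
    return delay

delays = [0.3, 0.1, 0.2]
completion_order, results = [], {}
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # Remember which argument each Future was created from.
    future_to_delay = {executor.submit(work, d): d for d in delays}
    for future in concurrent.futures.as_completed(future_to_delay):
        delay = future_to_delay[future]
        completion_order.append(delay)
        try:
            # result() re-raises any exception the task raised.
            results[delay] = future.result()
        except Exception as exc:
            results[delay] = exc

# Shortest wait finishes first, regardless of submission order.
print(completion_order)
```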

In general, executor.map() is recommended: it is simpler, and it returns results in the same order as the input arguments.

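4. Making Good Use of add_done_callback

Like a coroutine, a Future object can trigger another function as soon as its result is set. add_done_callback(fn) means that fn will be called, with the future as its only argument, once the future finishes (or is cancelled).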

import concurrent.futures
import requests
import time

def download_one(url):
    resp = requests.get(url)
    return {'url': url, 'content_length': len(resp.content)}

def parse(future):
    # The callback receives the finished Future; result() returns download_one()'s dict.
    res = future.result()
    print('Read {} from {}'.format(res['content_length'], res['url']))

def download_all(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        for site in sites:
            executor.submit(download_one, site).add_done_callback(parse)

if __name__ == '__main__':
    sites = [
        'https://golang.google.cn/',
        'https://www.python.org/',
        'http://www.php.net/',
        'https://www.javascript.com/',
        'http://mqtt.org/',
        'https://www.mysql.com/',
        'https://www.java.com/zh_CN/',
        'https://developers.google.cn/protocol-buffers/'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))

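Here, parse is the work to do once a future has finished, so it is registered with add_done_callback(). It receives the finished Future object and calls result() on it to get the dictionary returned by download_one().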
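Two details of add_done_callback() that the example relies on, shown in a small sketch (on_done and the tasks are made up): the callback receives the Future itself, not its return value, and if the future has already finished when the callback is added, the callback runs immediately in the calling thread. A callback registered while the task is still running is, in CPython, invoked by the worker thread that completes the task. Also, per the documentation, an exception raised inside a callback is logged and ignored, so if requests.get() failed in the example above, the error would surface only as a log message from parse().

```python
import concurrent.futures
import threading

events = []

def on_done(future):
    # The callback gets the finished Future; result() returns the task's value.
    events.append((future.result(), threading.current_thread().name))

with concurrent.futures.ThreadPoolExecutor(max_workers=1,
                                           thread_name_prefix='worker') as executor:
    # Case 1: the future is already done, so on_done runs right away in this thread.
    done = executor.submit(pow, 2, 10)
    done.result()  # block until the task has finished
    done.add_done_callback(on_done)

    # Case 2: on_done is registered while the task is still waiting on gate.
    gate = threading.Event()
    pending = executor.submit(lambda: 'late' if gate.wait(1) else 'timeout')
    pending.add_done_callback(on_done)
    gate.set()  # let the task finish; the worker thread then calls on_done

print(events)
```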

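5. Summary

This article described common concepts in concurrent programming: concurrency and parallelism, synchronous and asynchronous calls, blocking and non-blocking states, and process pools and thread pools. Used appropriately, multithreading can greatly improve a program's efficiency, particularly for I/O-bound work.

The usual recipe for writing a multiprocess or multithreaded program is to first write a function that performs a single operation, then write a multiprocess or multithreaded function that wraps it. executor.map() is the recommended way to implement multithreading or multiprocessing.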
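The recipe can be captured in a small helper. This is a hypothetical sketch (run_all and double are names made up for illustration), not an API from the article or the standard library: the single-operation function is written once, and the helper decides whether to fan it out over threads or processes using executor.map().

```python
import concurrent.futures

def run_all(func, items, use_processes=False, max_workers=None):
    """Hypothetical helper: apply a single-item function to every item concurrently.

    Threads suit I/O-bound work; processes suit CPU-bound work, in which case
    func and items must be picklable (func defined at module level).
    """
    executor_cls = (concurrent.futures.ProcessPoolExecutor if use_processes
                    else concurrent.futures.ThreadPoolExecutor)
    with executor_cls(max_workers=max_workers) as executor:
        # Consuming the iterator returns results in input order and surfaces exceptions.
        return list(executor.map(func, items))

def double(x):
    # The "single operation": handles exactly one item.
    return 2 * x

if __name__ == '__main__':
    print(run_all(double, [1, 2, 3]))
    print(run_all(double, [1, 2, 3], use_processes=True))
```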
