concurrent.futures is a high-level library that operates at the "task" level, meaning you no longer have to manage synchronization or the threads/processes themselves. A Future is an extension of the producer-consumer model: in that model, the producer cares neither about when the consumer finishes processing the data nor about the result of that processing. You simply create a thread/process pool with a given "max_workers" count, then submit tasks and collect the results. Another advantage over using the threading and multiprocessing modules directly for multi-threaded/multi-process work is that frequently creating and destroying threads or processes is very expensive; concurrent.futures maintains its own thread/process pool, trading space for time.
concurrent.futures provides two executor classes: concurrent.futures.ThreadPoolExecutor (thread pool), usually used for I/O-bound work, and concurrent.futures.ProcessPoolExecutor (process pool), usually used for CPU-bound work. The reason for this split is Python's GIL: multiple threads within one process can only use a single CPU core at a time, so CPU-bound code gains nothing from threads; I won't belabor that here. The two classes are used in exactly the same way.
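Because the two executors share one interface, switching between them is a one-line change. A minimal sketch of this (the helper run_with and the task function square are illustrative names, not part of the library):

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor


def square(x):
    # Must be a module-level function so ProcessPoolExecutor can pickle it.
    return x * x


def run_with(executor_cls, data):
    # The same code drives both pools: only the executor class differs.
    with executor_cls(max_workers=4) as executor:
        return list(executor.map(square, data))


if __name__ == '__main__':
    print(run_with(ThreadPoolExecutor, range(5)))   # threads: I/O-bound work
    print(run_with(ProcessPoolExecutor, range(5)))  # processes: CPU-bound work
```

Note the `if __name__ == '__main__':` guard: ProcessPoolExecutor spawns worker processes that re-import the module, so the entry point must be protected.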
The commonly used methods of ThreadPoolExecutor/ProcessPoolExecutor are:
1. When constructing a ThreadPoolExecutor/ProcessPoolExecutor instance, pass the max_workers argument to set the maximum number of threads (or processes) that can run concurrently in the pool.
2. submit(fn, *args, **kwargs) submits a task (a callable and its arguments) to the pool and returns a Future, a handle to the task (much like a file handle). Note that submit() does not block; it returns immediately.
3. done() reports whether the task has finished.
4. cancel() attempts to cancel a submitted task; if the task is already running in the pool, it cannot be cancelled.
5. result() retrieves the task's return value. Looking at the implementation, this method blocks until the result is available.
6. wait(fs, timeout=None, return_when=ALL_COMPLETED) takes three arguments: fs is the sequence of futures to wait on; timeout is the maximum time to wait, after which wait() returns even if some tasks have not finished; return_when is the condition for returning, defaulting to ALL_COMPLETED (return only after all tasks complete).
7. map(fn, *iterables, timeout=None, chunksize=1): the first argument fn is the function each task runs; the next arguments are iterables of inputs; timeout works like the timeout of wait(), but because map() returns the tasks' results, a TimeoutError is raised if timeout is shorter than the execution time.
8. as_completed(fs, timeout=None) yields each task's future one by one as it completes, rather than returning all results at once. Its docstring reads:
    An iterator over the given futures that yields each as it completes.

    Args:
        fs: The sequence of Futures (possibly created by different Executors) to
            iterate over.
        timeout: The maximum number of seconds to wait. If None, then there
            is no limit on the wait time.

    Returns:
        An iterator that yields the given Futures as they complete (finished or
        cancelled). If any given Futures are duplicated, they will be returned
        once.

    Raises:
        TimeoutError: If the entire result iterator could not be generated
            before the given timeout.
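Putting the methods above together, a small sketch (slow_double is a made-up task function, and 0.1s is an arbitrary delay) shows submit(), done(), wait() and as_completed() in action:

```python
import time
from concurrent.futures import ALL_COMPLETED, ThreadPoolExecutor, as_completed, wait


def slow_double(x):
    time.sleep(0.1)  # simulate a slow task
    return x * 2


with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() returns immediately with a Future handle for each task.
    futures = [executor.submit(slow_double, i) for i in range(4)]

    # done() is a non-blocking status check; this soon after submission
    # it will most likely still print False.
    print(futures[0].done())

    # wait() blocks until the return_when condition holds
    # (ALL_COMPLETED is the default).
    done, not_done = wait(futures, return_when=ALL_COMPLETED)
    print(len(done), len(not_done))  # 4 0

    # as_completed() yields futures in completion order, not submission
    # order, which is why collected results can come out unordered.
    results = [f.result() for f in as_completed(futures)]
    print(sorted(results))  # [0, 2, 4, 6]
```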
The following compares the efficiency of ThreadPoolExecutor and ProcessPoolExecutor on a CPU-bound workload:
import time
from concurrent.futures import ThreadPoolExecutor, as_completed, ProcessPoolExecutor


def get_fib(num):
    if num < 3:
        return 1
    return get_fib(num - 1) + get_fib(num - 2)


def run_thread_pool(workers, fib_num):
    start_time = time.time()
    with ThreadPoolExecutor(workers) as thread_executor:
        tasks = [thread_executor.submit(get_fib, num) for num in range(fib_num)]
        results = [task.result() for task in as_completed(tasks)]
        print(results)
    print("ThreadPoolExecutor spend time: {}s".format(time.time() - start_time))


def run_process_pool(workers, fib_num):
    start_time = time.time()
    with ProcessPoolExecutor(workers) as process_executor:
        tasks = [process_executor.submit(get_fib, num) for num in range(fib_num)]
        results = [task.result() for task in as_completed(tasks)]
        print(results)
    print("ProcessPoolExecutor spend time: {}s".format(time.time() - start_time))


if __name__ == '__main__':
    # run_thread_pool(6, 38)
    run_process_pool(6, 38)
The results are as follows:
[5, 2, 1, 1, 3, 1, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 28657, 17711, 121393, 75025, 46368, 317811, 196418, 514229, 1346269, 832040, 2178309, 3524578, 9227465, 5702887, 14930352, 24157817]
ThreadPoolExecutor spend time: 24.460843086242676s
[1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 144, 89, 377, 610, 987, 233, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 75025, 46368, 121393, 196418, 514229, 317811, 1346269, 832040, 2178309, 5702887, 3524578, 9227465, 14930352, 24157817]
ProcessPoolExecutor spend time: 15.908910274505615s