Processes and threads are concepts common to most programming languages, and they are essential for executing code concurrently, improving efficiency, and shortening run time.
A process is the smallest unit to which the operating system allocates resources; a thread is the smallest unit that the operating system schedules.
An application contains at least one process, and a process contains one or more threads, so threads are the finer-grained unit. Each process owns an independent block of memory while it executes, whereas the multiple threads of a single process share that process's memory.
1. Multiprocessing
Before writing multi-process (multiprocessing) programs in Python, it helps to understand some operating-system background.
Unix/Linux provides the fork() system call, which is special. An ordinary function returns once per call, but fork() is called once and returns twice: the operating system copies the current process (the parent) into a new process (the child), and then returns in both the parent and the child.
The child always receives 0, while the parent receives the child's PID. The reason is that a parent may fork many children, so it needs to record each child's ID, while a child can always obtain its parent's ID by calling getppid().
Python's os module wraps the common system calls, including fork, so creating a child process from Python is easy:
import os

print('Process (%s) start...' % os.getpid())
# Only works on Unix/Linux/Mac:
pid = os.fork()
if pid == 0:
    print('I am child process (%s) and my parent is %s.' % (os.getpid(), os.getppid()))
else:
    print('I (%s) just created a child process (%s).' % (os.getpid(), pid))
The output looks like this:
Process (876) start...
I (876) just created a child process (877).
I am child process (877) and my parent is 876.
Since Windows has no fork call, the code above cannot run on Windows.
With fork, a process that receives a new task can clone a child process to handle it. The classic Apache server works this way: the parent process listens on the port and forks a child for each incoming HTTP request.
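The fork-per-task pattern described above can be sketched as follows (Unix-only; the task list and worker function are illustrative, not part of any real server):

```python
import os

def handle_task(task):
    # Placeholder for real work, e.g. serving one request
    print('Child %d handling task %r' % (os.getpid(), task))

tasks = ['req-1', 'req-2', 'req-3']
children = []
for task in tasks:
    pid = os.fork()
    if pid == 0:
        # Child: do the work, then exit immediately so it never
        # continues the parent's loop
        handle_task(task)
        os._exit(0)
    children.append(pid)

# Parent: wait for every child it forked
for pid in children:
    os.waitpid(pid, 0)
print('Parent %d: all %d children finished' % (os.getpid(), len(children)))
```

A real server would loop on accept() instead of a fixed task list, forking once per connection.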
multiprocessing
Cross-platform multiprocessing in Python is provided by the multiprocessing package. Much like threading.Thread for threads, its Process class represents a process, and you create one Process object per child process:
# coding: utf-8
from multiprocessing import Process, current_process

def do_sth(arg1, arg2):
    print('Child process started (%d--%s)' % (current_process().pid, current_process().name))
    print('arg1 = %d, arg2 = %d' % (arg1, arg2))
    print('Child process finished (%d--%s)' % (current_process().pid, current_process().name))

if __name__ == '__main__':
    print('Parent process started (%d--%s)' % (current_process().pid, current_process().name))
    process = Process(target=do_sth, args=(5, 8))  # create a process from the Process class
    process.start()
    process.join()  # wait for the child to finish before the parent continues
    print('Parent process finished (%d--%s)' % (current_process().pid, current_process().name))
To create a child process, just pass a function and its arguments to a Process instance and call start(); this is even simpler than fork().
The join() method waits for the child process to finish before execution continues, and is typically used to synchronize processes.
Multiprocessing can also be implemented by subclassing Process:
from multiprocessing import Process, current_process

class MyProcess(Process):
    def __init__(self, name, args):
        super(MyProcess, self).__init__(name=name)  # pass name on, or it is ignored
        self.args = args

    def run(self):
        print('Child process started (%d--%s)' % (current_process().pid, current_process().name))
        print('arg1 = %d, arg2 = %d' % self.args)
        print('Child process finished (%d--%s)' % (current_process().pid, current_process().name))

if __name__ == '__main__':
    print('Parent process started (%d--%s)' % (current_process().pid, current_process().name))
    mp = MyProcess(name='myprocess', args=(5, 8))
    mp.start()
    mp.join()  # more reliable than sleeping for a fixed 2 seconds
    print('Parent process finished (%d--%s)' % (current_process().pid, current_process().name))
Pool
To launch a large number of child processes, use a process pool to create them in batches:
from multiprocessing import Pool
import time, random

def do_sth(i):
    print('Child process %d started' % i)
    start = time.time()
    time.sleep(random.random() * 10)
    end = time.time()
    print('Child process %d finished in %.2f seconds' % (i, end - start))

if __name__ == '__main__':
    print('Parent process started...')
    pp = Pool(3)  # the pool runs at most 3 processes at a time
    for i in range(1, 11):
        pp.apply_async(do_sth, args=(i,))
    pp.close()
    # The parent blocks here until all child processes have finished
    pp.join()
    print('Parent process finished')
Notes: calling join() on a Pool waits for all of its child processes to finish. You must call close() before join(), and after close() no new Process can be submitted to the pool.
2. Multithreading
threading
Python's built-in threading module creates and manages threads:
# coding: utf-8
import time
from threading import current_thread, Thread

print('The main (parent) thread is created and started automatically: %s' % current_thread().name)
# The main (parent) thread is created and started automatically: MainThread

print('Parent thread %s started' % current_thread().name)

def do_sth(arg1, arg2):
    print('Child thread %s started' % current_thread().name)
    time.sleep(20)
    print('arg1 = %d, arg2 = %d' % (arg1, arg2))
    print('Child thread %s finished' % current_thread().name)

# thread = Thread(target=do_sth, args=(5, 8), name='mythread')
thread = Thread(target=do_sth, args=(5, 8))
thread.start()
thread.join()  # more reliable than sleeping for a fixed 25 seconds
print('Parent thread %s finished' % current_thread().name)
Custom thread classes
Subclass threading.Thread to define your own thread class; in essence, this means overriding the run() method of Thread.
# coding: utf-8
from threading import Thread, current_thread
import time

print('Parent thread %s started' % current_thread().name)

class MyThread(Thread):
    def __init__(self, name, args):
        super(MyThread, self).__init__(name=name)
        self.args = args

    def run(self):
        print('Child thread %s started' % current_thread().name)
        time.sleep(20)
        print('arg1 = %d, arg2 = %d' % self.args)
        print('Child thread %s finished' % current_thread().name)

mt = MyThread(name='mythread', args=(5, 8))
mt.start()
mt.join()  # more reliable than sleeping for a fixed 25 seconds
print('Parent thread %s finished' % current_thread().name)
threadpool
The third-party threadpool package (installable with pip install threadpool) provides a simple thread-pool API:
# coding: utf-8
from threadpool import ThreadPool, makeRequests
import time, random

print('Parent thread started')

args_list = list(range(1, 11))

def do_sth(i):
    print('Child thread %d started' % i)
    start = time.time()
    time.sleep(random.random() * 10)
    end = time.time()
    print('Child thread %d finished in %.2f seconds' % (i, end - start))

tp = ThreadPool(3)
requests = makeRequests(do_sth, args_list)
for req in requests:
    tp.putRequest(req)
tp.wait()
print('Parent thread finished')
Semaphores
A Semaphore is essentially a built-in counter:
- each call to acquire() decrements the counter;
- each call to release() increments it;
- the counter never goes below 0: when it reaches 0, acquire() blocks the calling thread until another thread calls release().
import time
import threading

s1 = threading.Semaphore(5)  # a counter allowing at most 5 concurrent holders

def foo():
    s1.acquire()   # acquire the semaphore (counter - 1)
    time.sleep(2)  # sleep for 2 seconds
    print("ok", time.ctime())
    s1.release()   # release the semaphore (counter + 1)

for i in range(20):
    t1 = threading.Thread(target=foo, args=())  # create a thread
    t1.start()                                  # start it
# ---------- the same idea, written with a with statement ----------
from threading import Thread, Semaphore
import time, random

sem = Semaphore(3)  # at most 3 threads may run this section at any moment

class MyThread(Thread):
    def run(self):
        with sem:  # equivalent to sem.acquire() ... sem.release()
            print('%s acquired the resource' % self.name)
            time.sleep(random.random() * 10)

for i in range(10):
    MyThread().start()
3. The Global Interpreter Lock (GIL)
Before explaining the GIL, let us distinguish parallelism from concurrency:
- Parallelism: multiple CPUs execute multiple tasks at the same time; with two programs, both really run simultaneously on two different CPUs.
- Concurrency: a single CPU alternates between multiple tasks; with two programs and one CPU, the CPU switches back and forth between them rather than executing both at once. Because the CPU is so fast, it looks as though they run "at the same time"; the actual order depends on how the programs compete for time slices.
Both parallelism and concurrency are forms of multitasking, and both aim to raise CPU utilization.
Note that a single CPU core can never achieve parallelism: one core cannot run multiple programs at once, but it can alternate between them in allotted time slices (concurrency), just as one person cannot read two books simultaneously but can read the first for half a minute, then the second for half a minute, switching back and forth.
What is the GIL?
The global interpreter lock: every thread must acquire the GIL before executing, which guarantees that only one thread executes Python bytecode at any moment, i.e. only one thread uses a CPU at a time. In other words, Python multithreading is not truly simultaneous execution.
4. Inter-process communication
Python offers several ways for processes to communicate, chiefly Queue, Pipe, shared memory, and the Manager module.
A Queue supports communication among multiple processes; a Pipe connects exactly two.
Shared memory
from multiprocessing import Process, Value, Array
import ctypes

"""
Sharing a number and an array between processes
"""

def do_sth(num, arr):
    num.value = 1.8
    for i in range(len(arr)):
        arr[i] = -arr[i]

if __name__ == "__main__":
    # num = Value('d', 2.3) creates a shared double ('d') with initial value 2.3
    num = Value(ctypes.c_double, 2.3)
    arr = Array('i', range(1, 5))  # a shared array of integers ('i') initialized from range(1, 5)
    # arr = Array(ctypes.c_int, range(1, 5))
    p = Process(target=do_sth, args=(num, arr))
    p.start()
    p.join()  # block the parent until the child finishes
    print(num.value)
    print(arr[:])
# ------------------------------------------------
import multiprocessing

def square_list(mylist, result, square_sum):
    """
    function to square a given list
    """
    # append squares of mylist to result array
    for idx, num in enumerate(mylist):
        result[idx] = num * num
    # square_sum value
    square_sum.value = sum(result)
    # print result Array
    print("Result (in process p1): {}".format(result[:]))
    # print square_sum Value
    print("Sum of squares (in process p1): {}".format(square_sum.value))

if __name__ == "__main__":
    # input list
    mylist = [1, 2, 3, 4]
    # creating Array of int data type with space for 4 integers
    result = multiprocessing.Array('i', 4)
    # creating Value of int data type (defaults to 0)
    square_sum = multiprocessing.Value('i')
    # creating new process
    p1 = multiprocessing.Process(target=square_list, args=(mylist, result, square_sum))
    # starting process
    p1.start()
    # wait until process is finished
    p1.join()
    # print result array
    print("Result (in main program): {}".format(result[:]))
    # print square_sum Value
    print("Sum of squares (in main program): {}".format(square_sum.value))
The Manager module
Compared with raw shared memory, the Manager module is more flexible and supports many object types (lists, dicts, values, and more).
from multiprocessing import Process, Manager

def f(d, l):
    d[1] = 18
    d['2'] = 56
    l.reverse()

if __name__ == '__main__':
    manager = Manager()
    d = manager.dict()
    l = manager.list(range(5))
    p = Process(target=f, args=(d, l))
    p.start()
    p.join()
    print(d)
    print(l)
# --------------------------------------------------------
# sharing a string
from multiprocessing import Process, Manager
from ctypes import c_char_p

def greet(shareStr):
    shareStr.value = shareStr.value + ", World!"

if __name__ == '__main__':
    manager = Manager()
    shareStr = manager.Value(c_char_p, "Hello")  # a managed shared string
    process = Process(target=greet, args=(shareStr,))
    process.start()
    process.join()
    print(shareStr.value)
# -------------------------------------------------
import multiprocessing

def print_records(records):
    """
    function to print record (tuples) in records (list)
    """
    for record in records:
        print("Name: {0}\nScore: {1}\n".format(record[0], record[1]))

def insert_record(record, records):
    """
    function to add a new record to records (list)
    """
    records.append(record)
    print("New record added!\n")

if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        # creating a list in server process memory
        records = manager.list([('Sam', 10), ('Adam', 9), ('Kevin', 9)])
        # new record to be inserted in records
        new_record = ('Jeff', 8)
        # creating new processes
        p1 = multiprocessing.Process(target=insert_record, args=(new_record, records))
        p2 = multiprocessing.Process(target=print_records, args=(records,))
        # running process p1 to insert new record
        p1.start()
        p1.join()
        # running process p2 to print records
        p2.start()
        p2.join()
Output:

New record added!
Name: Sam
Score: 10
Name: Adam
Score: 9
Name: Kevin
Score: 9
Name: Jeff
Score: 8
Shared queues (Queue)
import multiprocessing

def square_list(mylist, q):
    """
    function to square a given list
    """
    # append squares of mylist to queue
    for num in mylist:
        print("put num %s to queue" % num)
        q.put(num * num)  # put the squared value into the queue

def print_queue(q):
    """
    function to print queue elements
    """
    print("Queue elements:")
    while not q.empty():
        print(q.get())
    print("Queue is now empty!")

if __name__ == "__main__":
    # input list
    mylist = [1, 2, 3, 4]
    # creating multiprocessing Queue
    q = multiprocessing.Queue()
    # creating new processes
    p1 = multiprocessing.Process(target=square_list, args=(mylist, q))
    p2 = multiprocessing.Process(target=print_queue, args=(q,))
    # running process p1 to square list
    p1.start()
    p1.join()
    # running process p2 to get queue elements
    p2.start()
    p2.join()
The Pipe mechanism
- A Pipe is typically used between two processes, with one process at each end of the pipe.
- Pipe() returns a pair (conn1, conn2) representing the two ends. Its duplex parameter defaults to True (full duplex); with duplex=False, conn1 can only receive and conn2 can only send.
- The send() and recv() methods send and receive messages respectively.
import multiprocessing

def sender(conn, msgs):
    """
    function to send messages to other end of pipe
    """
    for msg in msgs:
        conn.send(msg)
        print("Sent the message: {}".format(msg))
    conn.close()

def receiver(conn):
    """
    function to print the messages received from other end of pipe
    """
    while True:
        msg = conn.recv()
        if msg == "END":
            break
        print("Received the message: {}".format(msg))

if __name__ == "__main__":
    # messages to be sent
    msgs = ["hello", "hey", "hru?", "END"]
    # creating a pipe: returns a tuple of the two connection ends
    parent_conn, child_conn = multiprocessing.Pipe()
    # creating new processes
    p1 = multiprocessing.Process(target=sender, args=(parent_conn, msgs))
    p2 = multiprocessing.Process(target=receiver, args=(child_conn,))
    # running processes
    p1.start()
    p2.start()
    # wait until processes finish
    p1.join()
    p2.join()
References
https://www.cnblogs.com/luyuze95/p/11289143.html
https://www.geeksforgeeks.org/multithreading-python-set-1/