Processes and threads are concepts common to most programming languages, and they are essential for executing code concurrently, improving efficiency, and shortening run time.
A process is the smallest unit to which the operating system allocates resources; a thread is the smallest unit that the operating system schedules.
An application contains at least one process, and a process contains one or more threads, so threads are the finer-grained unit. Each process owns an independent block of memory while it executes, whereas the multiple threads of a single process share that process's memory.
1. Multiprocessing
Before writing multi-process (multiprocessing) programs in Python, it helps to understand some operating-system background.
Unix/Linux provides the fork() system call, which is special. An ordinary function returns once per call, but fork() is called once and returns twice: the operating system copies the current process (the parent) into a new process (the child), and then returns in both the parent and the child.
The child always receives 0, while the parent receives the child's PID. The reason is that a parent may fork many children, so it needs to record each child's ID, while a child can always obtain its parent's ID by calling getppid().
Python's os module wraps the common system calls, including fork, so creating a child process from Python is easy:
import os

print('Process (%s) start...' % os.getpid())
# Only works on Unix/Linux/Mac:
pid = os.fork()
if pid == 0:
    print('I am child process (%s) and my parent is %s.' % (os.getpid(), os.getppid()))
else:
    print('I (%s) just created a child process (%s).' % (os.getpid(), pid))
The output looks like this:
Process (876) start...
I (876) just created a child process (877).
I am child process (877) and my parent is 876.
Since Windows has no fork call, the code above cannot run on Windows.
With fork, a process that receives a new task can clone a child process to handle it. The classic Apache server works this way: the parent process listens on the port and forks a child for each incoming HTTP request.
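The fork-per-task pattern described above can be sketched as follows (Unix-only; the task list and worker function are illustrative, not part of any real server):

```python
import os

def handle_task(task):
    # Placeholder for real work, e.g. serving one request
    print('Child %d handling task %r' % (os.getpid(), task))

tasks = ['req-1', 'req-2', 'req-3']
children = []
for task in tasks:
    pid = os.fork()
    if pid == 0:
        # Child: do the work, then exit immediately so it never
        # continues the parent's loop
        handle_task(task)
        os._exit(0)
    children.append(pid)

# Parent: wait for every child it forked
for pid in children:
    os.waitpid(pid, 0)
print('Parent %d: all %d children finished' % (os.getpid(), len(children)))
```

A real server would loop on accept() instead of a fixed task list, forking once per connection.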
multiprocessing
Cross-platform multiprocessing in Python is provided by the multiprocessing package. Much like threading.Thread for threads, its Process class represents a process, and you create one Process object per child process:
# coding: utf-8
from multiprocessing import Process, current_process

def do_sth(arg1, arg2):
    print('Child process started (%d--%s)' % (current_process().pid, current_process().name))
    print('arg1 = %d, arg2 = %d' % (arg1, arg2))
    print('Child process finished (%d--%s)' % (current_process().pid, current_process().name))

if __name__ == '__main__':
    print('Parent process started (%d--%s)' % (current_process().pid, current_process().name))
    process = Process(target=do_sth, args=(5, 8))  # create a process from the Process class
    process.start()
    process.join()  # wait for the child to finish before the parent continues
    print('Parent process finished (%d--%s)' % (current_process().pid, current_process().name))
To create a child process, just pass a function and its arguments to a Process instance and call start(); this is even simpler than fork().
The join() method waits for the child process to finish before execution continues, and is typically used to synchronize processes.
Multiprocessing can also be implemented by subclassing Process:
from multiprocessing import Process, current_process

class MyProcess(Process):
    def __init__(self, name, args):
        super(MyProcess, self).__init__(name=name)  # pass name on, or it is ignored
        self.args = args

    def run(self):
        print('Child process started (%d--%s)' % (current_process().pid, current_process().name))
        print('arg1 = %d, arg2 = %d' % self.args)
        print('Child process finished (%d--%s)' % (current_process().pid, current_process().name))

if __name__ == '__main__':
    print('Parent process started (%d--%s)' % (current_process().pid, current_process().name))
    mp = MyProcess(name='myprocess', args=(5, 8))
    mp.start()
    mp.join()  # more reliable than sleeping for a fixed 2 seconds
    print('Parent process finished (%d--%s)' % (current_process().pid, current_process().name))
Pool
To launch a large number of child processes, use a process pool to create them in batches:
from multiprocessing import Pool
import time, random

def do_sth(i):
    print('Child process %d started' % i)
    start = time.time()
    time.sleep(random.random() * 10)
    end = time.time()
    print('Child process %d finished in %.2f seconds' % (i, end - start))

if __name__ == '__main__':
    print('Parent process started...')
    pp = Pool(3)  # the pool runs at most 3 processes at a time
    for i in range(1, 11):
        pp.apply_async(do_sth, args=(i,))
    pp.close()
    # The parent blocks here until all child processes have finished
    pp.join()
    print('Parent process finished')
Notes: calling join() on a Pool waits for all of its child processes to finish. You must call close() before join(), and after close() no new Process can be submitted to the pool.
2. Multithreading
threading
Python's built-in threading module creates and manages threads:
# coding: utf-8
import time
from threading import current_thread, Thread

print('The main (parent) thread is created and started automatically: %s' % current_thread().name)
# The main (parent) thread is created and started automatically: MainThread

print('Parent thread %s started' % current_thread().name)

def do_sth(arg1, arg2):
    print('Child thread %s started' % current_thread().name)
    time.sleep(20)
    print('arg1 = %d, arg2 = %d' % (arg1, arg2))
    print('Child thread %s finished' % current_thread().name)

# thread = Thread(target=do_sth, args=(5, 8), name='mythread')
thread = Thread(target=do_sth, args=(5, 8))
thread.start()
thread.join()  # more reliable than sleeping for a fixed 25 seconds
print('Parent thread %s finished' % current_thread().name)
Custom thread classes
Subclass threading.Thread to define your own thread class; in essence, this means overriding the run() method of Thread.
# coding: utf-8
from threading import Thread, current_thread
import time

print('Parent thread %s started' % current_thread().name)

class MyThread(Thread):
    def __init__(self, name, args):
        super(MyThread, self).__init__(name=name)
        self.args = args

    def run(self):
        print('Child thread %s started' % current_thread().name)
        time.sleep(20)
        print('arg1 = %d, arg2 = %d' % self.args)
        print('Child thread %s finished' % current_thread().name)

mt = MyThread(name='mythread', args=(5, 8))
mt.start()
mt.join()  # more reliable than sleeping for a fixed 25 seconds
print('Parent thread %s finished' % current_thread().name)
threadpool
The third-party threadpool package (installable with pip install threadpool) provides a simple thread-pool API:
# coding: utf-8
from threadpool import ThreadPool, makeRequests
import time, random

print('Parent thread started')

args_list = list(range(1, 11))

def do_sth(i):
    print('Child thread %d started' % i)
    start = time.time()
    time.sleep(random.random() * 10)
    end = time.time()
    print('Child thread %d finished in %.2f seconds' % (i, end - start))

tp = ThreadPool(3)
requests = makeRequests(do_sth, args_list)
for req in requests:
    tp.putRequest(req)
tp.wait()
print('Parent thread finished')
Semaphores
A Semaphore is essentially a built-in counter:
- each call to acquire() decrements the counter;
- each call to release() increments it;
- the counter never goes below 0: when it reaches 0, acquire() blocks the calling thread until another thread calls release().
import time
import threading

s1 = threading.Semaphore(5)  # a counter allowing at most 5 concurrent holders

def foo():
    s1.acquire()   # acquire the semaphore (counter - 1)
    time.sleep(2)  # sleep for 2 seconds
    print("ok", time.ctime())
    s1.release()   # release the semaphore (counter + 1)

for i in range(20):
    t1 = threading.Thread(target=foo, args=())  # create a thread
    t1.start()                                  # start it
# ---------- the same idea, written with a with statement ----------
from threading import Thread, Semaphore
import time, random

sem = Semaphore(3)  # at most 3 threads may run this section at any moment

class MyThread(Thread):
    def run(self):
        with sem:  # equivalent to sem.acquire() ... sem.release()
            print('%s acquired the resource' % self.name)
            time.sleep(random.random() * 10)

for i in range(10):
    MyThread().start()
3. The Global Interpreter Lock (GIL)
Before explaining the GIL, let us distinguish parallelism from concurrency:
- Parallelism: multiple CPUs execute multiple tasks at the same time; with two programs, both really run simultaneously on two different CPUs.
- Concurrency: a single CPU alternates between multiple tasks; with two programs and one CPU, the CPU switches back and forth between them rather than executing both at once. Because the CPU is so fast, it looks as though they run "at the same time"; the actual order depends on how the programs compete for time slices.
Both parallelism and concurrency are forms of multitasking, and both aim to raise CPU utilization.
Note that a single CPU core can never achieve parallelism: one core cannot run multiple programs at once, but it can alternate between them in allotted time slices (concurrency), just as one person cannot read two books simultaneously but can read the first for half a minute, then the second for half a minute, switching back and forth.
What is the GIL?
The global interpreter lock: every thread must acquire the GIL before executing, which guarantees that only one thread executes Python bytecode at any moment, i.e. only one thread uses a CPU at a time. In other words, Python multithreading is not truly simultaneous execution.
4. Inter-process communication
Python offers several ways for processes to communicate, chiefly Queue, Pipe, shared memory, and the Manager module.
A Queue supports communication among multiple processes; a Pipe connects exactly two.
Shared memory
from multiprocessing import Process, Value, Array
import ctypes

"""
Sharing a number and an array between processes
"""

def do_sth(num, arr):
    num.value = 1.8
    for i in range(len(arr)):
        arr[i] = -arr[i]

if __name__ == "__main__":
    # num = Value('d', 2.3) creates a shared double ('d') with initial value 2.3
    num = Value(ctypes.c_double, 2.3)
    arr = Array('i', range(1, 5))  # a shared array of integers ('i') initialized from range(1, 5)
    # arr = Array(ctypes.c_int, range(1, 5))
    p = Process(target=do_sth, args=(num, arr))
    p.start()
    p.join()  # block the parent until the child finishes
    print(num.value)
    print(arr[:])
# ------------------------------------------------
import multiprocessing

def square_list(mylist, result, square_sum):
    """
    function to square a given list
    """
    # append squares of mylist to result array
    for idx, num in enumerate(mylist):
        result[idx] = num * num
    # square_sum value
    square_sum.value = sum(result)
    # print result Array
    print("Result (in process p1): {}".format(result[:]))
    # print square_sum Value
    print("Sum of squares (in process p1): {}".format(square_sum.value))

if __name__ == "__main__":
    # input list
    mylist = [1, 2, 3, 4]
    # creating Array of int data type with space for 4 integers
    result = multiprocessing.Array('i', 4)
    # creating Value of int data type (defaults to 0)
    square_sum = multiprocessing.Value('i')
    # creating new process
    p1 = multiprocessing.Process(target=square_list, args=(mylist, result, square_sum))
    # starting process
    p1.start()
    # wait until process is finished
    p1.join()
    # print result array
    print("Result (in main program): {}".format(result[:]))
    # print square_sum Value
    print("Sum of squares (in main program): {}".format(square_sum.value))
The Manager module
Compared with raw shared memory, the Manager module is more flexible and supports many object types (lists, dicts, values, and more).
from multiprocessing import Process, Manager

def f(d, l):
    d[1] = 18
    d['2'] = 56
    l.reverse()

if __name__ == '__main__':
    manager = Manager()
    d = manager.dict()
    l = manager.list(range(5))
    p = Process(target=f, args=(d, l))
    p.start()
    p.join()
    print(d)
    print(l)
# --------------------------------------------------------
# sharing a string
from multiprocessing import Process, Manager
from ctypes import c_char_p

def greet(shareStr):
    shareStr.value = shareStr.value + ", World!"

if __name__ == '__main__':
    manager = Manager()
    shareStr = manager.Value(c_char_p, "Hello")  # a managed shared string
    process = Process(target=greet, args=(shareStr,))
    process.start()
    process.join()
    print(shareStr.value)
# -------------------------------------------------
import multiprocessing

def print_records(records):
    """
    function to print record (tuples) in records (list)
    """
    for record in records:
        print("Name: {0}\nScore: {1}\n".format(record[0], record[1]))

def insert_record(record, records):
    """
    function to add a new record to records (list)
    """
    records.append(record)
    print("New record added!\n")

if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        # creating a list in server process memory
        records = manager.list([('Sam', 10), ('Adam', 9), ('Kevin', 9)])
        # new record to be inserted in records
        new_record = ('Jeff', 8)
        # creating new processes
        p1 = multiprocessing.Process(target=insert_record, args=(new_record, records))
        p2 = multiprocessing.Process(target=print_records, args=(records,))
        # running process p1 to insert new record
        p1.start()
        p1.join()
        # running process p2 to print records
        p2.start()
        p2.join()
Output:

New record added!
Name: Sam
Score: 10
Name: Adam
Score: 9
Name: Kevin
Score: 9
Name: Jeff
Score: 8
Shared queues (Queue)
import multiprocessing

def square_list(mylist, q):
    """
    function to square a given list
    """
    # append squares of mylist to queue
    for num in mylist:
        print("put num %s to queue" % num)
        q.put(num * num)  # put the squared value into the queue

def print_queue(q):
    """
    function to print queue elements
    """
    print("Queue elements:")
    while not q.empty():
        print(q.get())
    print("Queue is now empty!")

if __name__ == "__main__":
    # input list
    mylist = [1, 2, 3, 4]
    # creating multiprocessing Queue
    q = multiprocessing.Queue()
    # creating new processes
    p1 = multiprocessing.Process(target=square_list, args=(mylist, q))
    p2 = multiprocessing.Process(target=print_queue, args=(q,))
    # running process p1 to square list
    p1.start()
    p1.join()
    # running process p2 to get queue elements
    p2.start()
    p2.join()
The Pipe mechanism
- A Pipe is typically used between two processes, with one process at each end of the pipe.
- Pipe() returns a pair (conn1, conn2) representing the two ends. Its duplex parameter defaults to True (full duplex); with duplex=False, conn1 can only receive and conn2 can only send.
- The send() and recv() methods send and receive messages respectively.
import multiprocessing

def sender(conn, msgs):
    """
    function to send messages to other end of pipe
    """
    for msg in msgs:
        conn.send(msg)
        print("Sent the message: {}".format(msg))
    conn.close()

def receiver(conn):
    """
    function to print the messages received from other end of pipe
    """
    while True:
        msg = conn.recv()
        if msg == "END":
            break
        print("Received the message: {}".format(msg))

if __name__ == "__main__":
    # messages to be sent
    msgs = ["hello", "hey", "hru?", "END"]
    # creating a pipe: returns a tuple of the two connection ends
    parent_conn, child_conn = multiprocessing.Pipe()
    # creating new processes
    p1 = multiprocessing.Process(target=sender, args=(parent_conn, msgs))
    p2 = multiprocessing.Process(target=receiver, args=(child_conn,))
    # running processes
    p1.start()
    p2.start()
    # wait until processes finish
    p1.join()
    p2.join()
References
https://www.cnblogs.com/luyuze95/p/11289143.html
https://www.geeksforgeeks.org/multithreading-python-set-1/