Having covered threads, we now look at creating and using multiple processes. Multiprocessing is better suited to CPU-bound work, and its API closely mirrors the threading API. The one thing to keep in mind: threads can share in-memory data directly, but by default a process cannot access another process's data. To share state between processes we use special structures (queues, arrays, and dicts) — note that these are not the ordinary built-in types, but versions defined specifically for multiprocessing.
For example, sharing data through a queue:
```python
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author: Alex Li
from multiprocessing import Process, Queue

def foo(i, arg):
    arg.put(i)
    print('say hi', i, arg.qsize())

if __name__ == "__main__":
    li = Queue(20)       # multiprocessing.Queue, not the thread-only queue.Queue
    for i in range(10):
        p = Process(target=foo, args=(i, li))
        p.start()
        p.join()         # joining inside the loop runs the children one by one
```

Output:

```
say hi 0 1
say hi 1 2
say hi 2 3
say hi 3 4
say hi 4 5
say hi 5 6
say hi 6 7
say hi 7 8
say hi 8 9
say hi 9 10
```
Example 2: sharing data through an Array. Note that an Array must be given a fixed type code and length when it is created.
```python
from multiprocessing import Process, Array

def foo(i, arg):
    arg[i] = i + 100
    for item in arg:
        print(item)
    print('================')

if __name__ == "__main__":
    li = Array('i', 10)   # 'i' = C signed int, length fixed at 10
    for i in range(10):
        p = Process(target=foo, args=(i, li))
        p.start()
```

Output (the interleaving varies from run to run, since the children are not joined):

```
0
0
0
0
0
0
0
107
0
0
================
0
0
0
0
0
0
0
107
108
0
================
0
101
0
0
0
0
0
107
108
0
================
0
101
0
0
0
0
106
107
108
0
================
0
101
0
0
0
105
106
107
108
0
================
... (rest omitted)
```
Example 3: sharing data between processes through a dict.
```python
from multiprocessing import Process, Manager

def foo(i, arg):
    arg[i] = i + 100
    print(arg.values())

if __name__ == "__main__":
    obj = Manager()
    li = obj.dict()      # a dict proxy living in the manager process
    for i in range(10):
        p = Process(target=foo, args=(i, li))
        p.start()
        p.join()         # wait for each child before the manager shuts down
```

Output:

```
[100]
[100, 101]
[100, 101, 102]
[100, 101, 102, 103]
[100, 101, 102, 103, 104]
[100, 101, 102, 103, 104, 105]
[100, 101, 102, 103, 104, 105, 106]
[100, 101, 102, 103, 104, 105, 106, 107]
[100, 101, 102, 103, 104, 105, 106, 107, 108]
[100, 101, 102, 103, 104, 105, 106, 107, 108, 109]
```
As with threads, when several processes modify the same shared value you need a lock, otherwise the result can be wrong.

For example:
```python
from multiprocessing import Process, Array
import time

def foo(i, lis):
    lis[0] = lis[0] - 1   # read-modify-write with no lock: a race condition
    time.sleep(1)
    print('say hi', lis[0])

if __name__ == "__main__":
    li = Array('i', 1)
    li[0] = 10
    for i in range(10):
        p = Process(target=foo, args=(i, li))
        p.start()
```

Output:

```
say hi 0
say hi 0
say hi 0
say hi 0
say hi 0
say hi 0
say hi 0
say hi 0
say hi 0
say hi 0
```
How do we fix this?

Two ways: either add a p.join() right after p.start(), which really does run the children one at a time, in order; or add a lock.
```python
from multiprocessing import Process, Array, RLock
import time

def foo(i, lis, lc):
    lc.acquire()
    lis[0] = lis[0] - 1
    time.sleep(1)
    print('say hi', lis[0])
    lc.release()

if __name__ == "__main__":
    li = Array('i', 1)
    li[0] = 10
    lock = RLock()
    for i in range(10):
        p = Process(target=foo, args=(i, li, lock))
        p.start()
```

Output:

```
say hi 9
say hi 8
say hi 7
say hi 6
say hi 5
say hi 4
say hi 3
say hi 2
say hi 1
say hi 0
```
Unlike the thread-pool case, Python already provides a complete process-pool module, so we can use it directly. The pool offers two submission methods, apply and apply_async: the former blocks, the latter is non-blocking.
For instance, in the example below I use apply_async, so all the tasks are submitted without blocking and run concurrently. Each worker process blocks for 5 seconds at time.sleep(5), but meanwhile the main process reaches pool.terminate(), which kills the pool immediately.
```python
from multiprocessing import Pool
import time

def f1(arg):
    print(arg, 'b')
    time.sleep(5)
    print(arg, 'a')

if __name__ == "__main__":
    pool = Pool(5)
    for i in range(30):
        # pool.apply(func=f1, args=(i,))      # blocking: tasks run one after another
        pool.apply_async(func=f1, args=(i,))  # non-blocking: tasks run concurrently
    # pool.close()       # stop accepting tasks, let the queued ones finish
    time.sleep(2)
    pool.terminate()     # kill the workers immediately
    pool.join()
```

Output:

```
0 b
1 b
2 b
3 b
4 b
```
If terminate() is replaced with close(), the pool instead waits for all of its submitted tasks to finish before shutting down.
```python
from multiprocessing import Pool
import time

def f1(arg):
    print(arg, 'b')
    time.sleep(5)
    print(arg, 'a')

if __name__ == "__main__":
    pool = Pool(5)
    for i in range(30):
        # pool.apply(func=f1, args=(i,))      # blocking: tasks run one after another
        pool.apply_async(func=f1, args=(i,))  # non-blocking: tasks run concurrently
    pool.close()         # stop accepting tasks, let the queued ones finish
    time.sleep(2)
    # pool.terminate()   # would kill the workers immediately
    pool.join()
```

Output:

```
0 b
1 b
2 b
3 b
4 b
0 a
5 b
1 a
6 b
2 a
7 b
3 a
8 b
4 a
9 b
5 a
10 b
6 a
11 b
7 a
8 a
12 b
13 b
9 a
14 b
10 a
15 b
11 a
16 b
13 a
12 a
18 b
17 b
14 a
19 b
15 a
20 b
16 a
21 b
17 a
18 a
22 b
23 b
19 a
24 b
20 a
25 b
21 a
26 b
22 a
27 b
23 a
28 b
24 a
29 b
25 a
26 a
27 a
28 a
29 a
```
Note that, as with threads, processes also support join(), which blocks the main process until all the child processes have finished.