CS61A lecture notes: parallel computing

Reference for these notes: http://composingprograms.com/pages/48-parallel-computing.html

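Python offers two approaches to parallel computing: threads (the threading module) and multiple processes (the multiprocessing module).

Parallel computing

Threads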

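Multithreading in Python is pseudo-parallel.
With threads, several "threads" of execution exist inside a single interpreter. Each thread runs its code independently of the others, although they all share the same data. However, the Python interpreter (CPython) only interprets code in one thread at a time, switching between them to give the illusion of parallelism.
To start a thread, create a Thread from the threading library, passing the target function and its arguments, then call start().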

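import threading

def thread_sum(a=1, b=2, c=3):
    # Report which thread is running and the sum of the arguments.
    print('thread_sum from', threading.current_thread().name, a+b+c)

def thread_test():
    # Run thread_sum(0, 1, 2) in a new thread...
    other = threading.Thread(target=thread_sum, args=(0,1,2))
    other.start()
    # ...and thread_sum() with its default arguments in the main thread.
    thread_sum()

thread_test()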

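Output (from a Python 2 run, where print displays a tuple; the two lines are interleaved because both threads print at once, and the order can vary between runs):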

(('thread_sum from', 'MainThread', 6)
'thread_sum from', 'Thread-1', 3)
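
The statement that threads share the same data can be checked with a small sketch. The names results and record are made up for this illustration and do not come from the notes; join() is used so that the main thread waits for the worker before reading the list.

```python
import threading

results = []  # hypothetical shared list, visible to every thread

def record(label):
    # Append the label together with the name of the thread running this call.
    results.append((label, threading.current_thread().name))

worker = threading.Thread(target=record, args=('from worker',))
worker.start()
record('from main')
worker.join()  # wait until the worker thread has finished

# Both entries end up in the same list because threads share memory.
labels = sorted(label for label, _ in results)
print(labels)
```

The order in which the two entries are appended is not fixed, which is why the labels are sorted before printing.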

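Multiprocessing

Unlike threading, multiprocessing lets a program spawn multiple interpreters, or processes, each of which can run code independently. These processes do not generally share data, so any shared state must be communicated between processes. On the other hand, processes execute in parallel according to the level of parallelism provided by the underlying operating system and hardware. Thus, if the CPU has multiple processor cores, Python processes can truly run in parallel.
For example: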

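import multiprocessing

def process_say_hello():
    print('hello from', multiprocessing.current_process().name)

def process_hello():
    # Start a second process, then say hello from this one as well.
    other = multiprocessing.Process(target=process_say_hello, args=())
    other.start()
    process_say_hello()

# The guard keeps a child process from re-running this call when the
# module is imported under the "spawn" start method (Windows, macOS).
if __name__ == '__main__':
    process_hello()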

Output:

('hello from', 'MainProcess')
('hello from', 'Process-1')

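The problem with shared state

With threads, data is shared, and the interpreter can switch between threads at any time. As a result, a value that the current thread is working with can be changed by another thread, as in the following example: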

import threading
from time import sleep

counter = [0]

def increment():
    count = counter[0]
    sleep(0) # try to force a switch to the other thread
    counter[0] = count + 1

other = threading.Thread(target=increment, args=())
other.start()
increment()
other.join() # wait for the other thread before reading the result
print('count is now: ', counter[0])

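Assume that when sleep is called, the current thread is suspended and the other thread starts running. Incrementing the counter here takes three steps: read counter[0], compute the new value, and write it back to counter[0]. One possible order of execution is: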

Thread 0                    Thread 1
read counter[0]: 0
                            read counter[0]: 0
calculate 0 + 1: 1
write 1 -> counter[0]
                            calculate 0 + 1: 1
                            write 1 -> counter[0]

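Both threads read 0, so the final value is 1 rather than 2. Mutable data shared between threads therefore has to be synchronized, that is, protected.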
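
The interleaving in the table can be replayed deterministically, without real threads, by writing out the steps of each increment by hand. This is only a sketch: run_schedule and the step names are hypothetical helpers, not part of the notes or of any library.

```python
def run_schedule(schedule):
    """Replay read/write steps of two simulated threads on a shared counter.

    schedule is a list of (thread_id, step) pairs, where step is
    'read' or 'write'; the calculation happens as part of 'write'.
    """
    counter = [0]
    local = {}  # each simulated thread's private copy of the value it read
    for thread_id, step in schedule:
        if step == 'read':
            local[thread_id] = counter[0]
        elif step == 'write':
            counter[0] = local[thread_id] + 1
    return counter[0]

# Same order as the table: both threads read before either one writes.
interleaved = run_schedule([(0, 'read'), (1, 'read'), (0, 'write'), (1, 'write')])
# One thread finishes its increment before the other starts.
sequential = run_schedule([(0, 'read'), (0, 'write'), (1, 'read'), (1, 'write')])
print(interleaved, sequential)  # prints: 1 2
```

Only the interleaved schedule loses an update, which is the situation synchronization is meant to rule out.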

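Synchronization

A Queue from the queue module is thread-safe and first-in, first-out, so it can be used to hand data from one thread to another safely. In the example below, the consumer thread is marked as a daemon thread, which means the program does not wait for it to finish before exiting. This lets us use an infinite loop in the consumer. However, we still need to make sure the main thread exits only after every item in the queue has been consumed. The consumer calls task_done to tell the queue that it has finished processing an item, and the main thread calls join, which waits until all items have been processed, so the program exits only after that.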

from queue import Queue
import threading

queue = Queue()

def synchronized_consume():
    while True:
        print('got an item:', queue.get())
        queue.task_done()

def synchronized_produce():
    consumer = threading.Thread(target=synchronized_consume, args=())
    consumer.daemon = True
    consumer.start()
    for i in range(10):
        queue.put(i)
    queue.join()

synchronized_produce()

Output:

('got an item:', 0)
('got an item:', 1)
('got an item:', 2)
('got an item:', 3)
('got an item:', 4)
('got an item:', 5)
('got an item:', 6)
('got an item:', 7)
('got an item:', 8)
('got an item:', 9)
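
As a variation on the queue example, the consumer can be stopped with a sentinel value and then joined, instead of being run as a daemon thread. This is a sketch only; the names DONE, items, consume, and received are assumptions made for this illustration.

```python
from queue import Queue
import threading

DONE = object()   # hypothetical sentinel marking the end of the stream
items = Queue()
received = []     # filled in by the consumer thread

def consume():
    while True:
        item = items.get()   # blocks until an item is available
        if item is DONE:
            return
        received.append(item)

consumer = threading.Thread(target=consume)
consumer.start()
for i in range(10):
    items.put(i)
items.put(DONE)
consumer.join()   # returns once the consumer has seen the sentinel

print(received)
```

Because the queue is first-in, first-out and there is a single consumer, the items arrive in the order they were put.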

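When a queue, the simplest way to synchronize, is not an option, we have to define the synchronization ourselves. A lock is the basic mechanism for this. The threading module provides the Lock class, which has methods to acquire and release the lock. Only one thread can hold the lock at a time; it must be released before another thread can acquire it.
The following example is from the article 詳解python中的Lock與RLock: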

import threading

count = 0
def print_time(threadName):
    global count
    c=0
    while(c<20):
        c+=1
        count+=1
        print("{0}: set count to {1}".format( threadName, count) )
try:
    threading.Thread( target=print_time, args=("Thread-1", ) ).start()
    threading.Thread( target=print_time, args=("Thread-2", ) ).start()
    threading.Thread( target=print_time, args=("Thread-3", ) ).start()
except Exception as e:
    print("Error: unable to start thread")

Without a lock, each thread adds 1 to the global variable count. The output is shown below; execution can switch to any thread at any moment.

Thread-1: set count to 1
Thread-2: set count to 2
Thread-2: set count to 3
Thread-3: set count to 4
Thread-2: set count to 5
Thread-1: set count to 6
Thread-3: set count to 7
Thread-2: set count to 8
Thread-1: set count to 9
...

Thread-2: set count to 33
Thread-3: set count to 34
Thread-2: set count to 35
Thread-3: set count to 36
Thread-1: set count to 37
Thread-2: set count to 38
Thread-3: set count to 39
Thread-3: set count to 40
Thread-3: set count to 41
Thread-1: set count to 42
Thread-1: set count to 43
Thread-3: set count to 44
Thread-1: set count to 45
...
Thread-3: set count to 52
Thread-1: set count to 53
Thread-1: set count to 54
Thread-1: set count to 55
Thread-1: set count to 56
Thread-3: set count to 57
Thread-3: set count to 58
Thread-1: set count to 59
Thread-3: set count to 60

With a lock:

import threading

count = 0
lock = threading.Lock()

def print_time(threadName):
    global count

    c=0
    # lock
    with lock:
        while(c<30):
            c+=1
            count+=1
            print("{0}: set count to {1}".format( threadName, count) )
try:
    threading.Thread( target=print_time, args=("Thread-1", ) ).start()
    threading.Thread( target=print_time, args=("Thread-2", ) ).start()
    threading.Thread( target=print_time, args=("Thread-3", ) ).start()
except Exception as e:
    print("Error: unable to start thread")

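Because of the lock, the next thread runs its loop only after the thread currently holding the lock releases it. The code does not call acquire() and release() directly but uses with instead. The two forms are equivalent; with is more concise and convenient, because the lock must always be released, otherwise the other threads wait forever (deadlock). Using with here guards against a forgotten release, just as it guards against forgetting to close a file after opening it. The output: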

Thread-1: set count to 1
Thread-1: set count to 2
Thread-1: set count to 3
Thread-1: set count to 4
Thread-1: set count to 5
Thread-1: set count to 6
Thread-1: set count to 7
Thread-1: set count to 8
Thread-1: set count to 9
Thread-1: set count to 10
Thread-1: set count to 11
Thread-1: set count to 12
Thread-1: set count to 13
Thread-1: set count to 14
Thread-1: set count to 15
Thread-1: set count to 16
Thread-1: set count to 17
Thread-1: set count to 18
Thread-1: set count to 19
Thread-1: set count to 20
Thread-1: set count to 21
Thread-1: set count to 22
Thread-1: set count to 23
Thread-1: set count to 24
Thread-1: set count to 25
Thread-1: set count to 26
Thread-1: set count to 27
Thread-1: set count to 28
Thread-1: set count to 29
Thread-1: set count to 30
Thread-2: set count to 31
Thread-2: set count to 32
Thread-2: set count to 33
Thread-2: set count to 34
Thread-2: set count to 35
Thread-2: set count to 36
Thread-2: set count to 37
Thread-2: set count to 38
Thread-2: set count to 39
Thread-2: set count to 40
Thread-2: set count to 41
Thread-2: set count to 42
Thread-2: set count to 43
Thread-2: set count to 44
Thread-2: set count to 45
Thread-2: set count to 46
Thread-2: set count to 47
Thread-2: set count to 48
Thread-2: set count to 49
Thread-2: set count to 50
Thread-2: set count to 51
Thread-2: set count to 52
Thread-2: set count to 53
Thread-2: set count to 54
Thread-2: set count to 55
Thread-2: set count to 56
Thread-2: set count to 57
Thread-2: set count to 58
Thread-2: set count to 59
Thread-2: set count to 60
Thread-3: set count to 61
Thread-3: set count to 62
Thread-3: set count to 63
Thread-3: set count to 64
Thread-3: set count to 65
Thread-3: set count to 66
Thread-3: set count to 67
Thread-3: set count to 68
Thread-3: set count to 69
Thread-3: set count to 70
Thread-3: set count to 71
Thread-3: set count to 72
Thread-3: set count to 73
Thread-3: set count to 74
Thread-3: set count to 75
Thread-3: set count to 76
Thread-3: set count to 77
Thread-3: set count to 78
Thread-3: set count to 79
Thread-3: set count to 80
Thread-3: set count to 81
Thread-3: set count to 82
Thread-3: set count to 83
Thread-3: set count to 84
Thread-3: set count to 85
Thread-3: set count to 86
Thread-3: set count to 87
Thread-3: set count to 88
Thread-3: set count to 89
Thread-3: set count to 90
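
For comparison, the with block corresponds roughly to calling acquire() and release() by hand, wrapped in try/finally so that the lock is released even if the body raises. The sketch below is not from the cited article; add_many, total, and the iteration counts are illustrative assumptions. Here the lock is held for each increment rather than for the whole loop, so the threads can still take turns.

```python
import threading

total = 0
lock = threading.Lock()

def add_many(n):
    global total
    for _ in range(n):
        lock.acquire()
        try:
            total += 1       # only one thread at a time runs this line
        finally:
            lock.release()   # always release, even if an error occurs

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(total)  # prints: 30000
```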

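Message passing

One way to rule out data being changed by another thread altogether is to avoid sharing data in the first place. Multiprocessing does this: each process has its own data, and processes share information by passing messages. The Pipe function in the multiprocessing module creates a communication channel between processes. The send method sends an object and the recv method receives one. Receiving is blocking: a call to recv does not return until an object is available. This is why the loop in process_consume below can be written as while True: it simply waits at recv until the next item arrives. The example passes False to Pipe to make the pipe one-way; pipe[0] is then the receiving end and pipe[1] the sending end.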

import multiprocessing
def process_consume(in_pipe):
    while True:
        item = in_pipe.recv()
        if item is None:
            return
        print('got an item:', item)

def process_produce():
    pipe = multiprocessing.Pipe(False)
    consumer = multiprocessing.Process(target=process_consume, args=(pipe[0],))
    consumer.start()
    for i in range(10):
        pipe[1].send(i)
    pipe[1].send(None) # done signal

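# The guard keeps the child process from re-running this call when the
# module is imported under the "spawn" start method (Windows, macOS).
if __name__ == '__main__':
    process_produce()

Output:
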
('got an item:', 0)
('got an item:', 1)
('got an item:', 2)
('got an item:', 3)
('got an item:', 4)
('got an item:', 5)
('got an item:', 6)
('got an item:', 7)
('got an item:', 8)
('got an item:', 9)

In the code above, a final None is sent as the signal that there are no more items.
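
To see which end of a one-way pipe is which, and how the None signal ends the loop, here is a minimal sketch. As an assumption made purely to keep it short and easy to run, a thread stands in for the consumer process; the names receiver, sender, and collected are illustrative.

```python
import multiprocessing
import threading

# Pipe(False) returns (receive-only end, send-only end).
receiver, sender = multiprocessing.Pipe(False)
collected = []

def consume(conn):
    while True:
        item = conn.recv()   # blocks until something is sent
        if item is None:     # None is the "done" signal, as above
            return
        collected.append(item)

# Assumption for this sketch: a thread stands in for the consumer process.
consumer = threading.Thread(target=consume, args=(receiver,))
consumer.start()
for i in range(3):
    sender.send(i)
sender.send(None)
consumer.join()

print(collected)  # prints: [0, 1, 2]
```

With a real multiprocessing.Process as the consumer, collected would live in the child process, so results would have to be sent back through another pipe or queue.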
