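CS 61A Lecture Notes: Parallel Computing

Reference for these notes: http://composingprograms.com/pages/48-parallel-computing.html

Python provides two ways to do parallel computing: threading and multiprocessing.

Parallel Computing

Threads

Multithreading in Python is pseudo-parallel.
With threading, multiple "threads" of execution exist within a single interpreter. Each thread executes code independently of the others, though they all share the same data. However, the Python interpreter (CPython, the main implementation) interprets code in only one thread at a time, switching between threads to give the illusion of parallelism.
To start a thread, create a Thread from the threading library, passing it a target function and its arguments.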

import threading
def thread_test():
    other = threading.Thread(target=thread_sum, args=(0,1,2))
    other.start()
    thread_sum()
def thread_sum(a=1, b=2, c=3):
    print('thread_sum from', threading.current_thread().name, a+b+c)
    
thread_test()

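Result (the two lines may appear in either order; Python 3.10 and later also append the target name to the thread name, as in 'Thread-1 (thread_sum)'):

thread_sum from MainThread 6
thread_sum from Thread-1 3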
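
As a small illustration of threads sharing the same data, the sketch below has three threads write into one dictionary and then waits for each of them with join before reading it. The names results and record_sum are made up for this example. Each thread writes to its own key, so no synchronization is needed in this particular case.

```python
import threading

results = {}  # a single dictionary, visible to every thread

def record_sum(key, a, b, c):
    # Hypothetical helper: store the sum under this thread's own key.
    results[key] = a + b + c

workers = [
    threading.Thread(target=record_sum, args=(i, i, i + 1, i + 2))
    for i in range(3)
]
for w in workers:
    w.start()
for w in workers:
    w.join()  # wait until the thread has finished

print(results)
```

Calling join matters here: without it, the main thread could print the dictionary before every worker has stored its value.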

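Multiprocessing

Unlike threading, multiprocessing lets a program spawn multiple interpreters, or processes, each of which runs code independently. These processes generally do not share data, so any shared state must be communicated between them. On the other hand, processes execute in parallel to whatever degree the underlying operating system and hardware allow. So if the CPU has multiple cores, Python processes can truly run in parallel.
For example: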

import multiprocessing

def process_hello():
    other = multiprocessing.Process(target=process_say_hello, args=())
    other.start()
    process_say_hello()

def process_say_hello():
    print('hello from', multiprocessing.current_process().name)

# The guard keeps a child process from re-running this call when it imports the module.
if __name__ == '__main__':
    process_hello()

Result:

hello from MainProcess
hello from Process-1

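The Problem with Shared State

With threads, data is shared, and the interpreter can switch between threads at any time. A value that the current thread is working with may therefore be changed by another thread, as in this example: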

import threading
from time import sleep

counter = [0]

def increment():
    count = counter[0]
    sleep(0) # try to force a switch to the other thread
    counter[0] = count + 1

other = threading.Thread(target=increment, args=())
other.start()
increment()
other.join()  # wait for the other thread before reading the final value
print('count is now: ', counter[0])

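Assume that when sleep is called, the current thread is suspended and the other thread starts running. Incrementing the counter in this code takes three operations: read counter[0], compute the new value, and write it back to counter[0]. One possible order of execution is: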

Thread 0                    Thread 1
read counter[0]: 0
                            read counter[0]: 0
calculate 0 + 1: 1
write 1 -> counter[0]
                            calculate 0 + 1: 1
                            write 1 -> counter[0]

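Both threads read 0, so both write 1, and one of the two increments is lost. Mutable data shared among threads therefore has to be synchronized, or protected.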
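
The interleaving in the table can be reproduced deterministically. The sketch below is not part of the original example: it swaps sleep(0) for a threading.Barrier (the name both_have_read is made up) so that neither thread writes until both have read, which forces the lost update on every run.

```python
import threading

counter = [0]
# Assumption for illustration: the barrier exists only to force the bad ordering.
both_have_read = threading.Barrier(2)

def increment():
    count = counter[0]        # read
    both_have_read.wait()     # block until the other thread has read too
    counter[0] = count + 1    # calculate and write

threads = [threading.Thread(target=increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print('count is now:', counter[0])  # prints 1, not 2
```

Two increments ran, yet the counter ends at 1, because each thread computed its result from the same stale value.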

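Synchronization

The simplest way to keep shared data consistent is to use a data structure that synchronizes its own operations. The Queue class from the queue module does this and guarantees first-in, first-out access. In the example below, the consumer thread is marked as a daemon thread, which means the program does not wait for that thread to finish before exiting. This lets us use an infinite loop in the consumer. We still have to make sure the main thread exits only after every item in the queue has been consumed. The consumer calls task_done to tell the queue that it has finished processing an item, and the main thread calls join, which waits until all items have been processed, so the program exits only after that.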

from queue import Queue
import threading

queue = Queue()

def synchronized_consume():
    while True:
        print('got an item:', queue.get())
        queue.task_done()

def synchronized_produce():
    consumer = threading.Thread(target=synchronized_consume, args=())
    consumer.daemon = True
    consumer.start()
    for i in range(10):
        queue.put(i)
    queue.join()

synchronized_produce()

Result:

got an item: 0
got an item: 1
got an item: 2
got an item: 3
got an item: 4
got an item: 5
got an item: 6
got an item: 7
got an item: 8
got an item: 9
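
The same task_done/join pattern extends to several consumers. The following is a rough sketch, not taken from the reference text: three daemon worker threads take numbers from one queue and put their squares on a second queue. The names tasks, results, and worker are invented for this illustration.

```python
from queue import Queue
import threading

tasks = Queue()    # work items going in
results = Queue()  # (n, n squared) pairs coming out

def worker():
    while True:
        n = tasks.get()            # blocks until an item is available
        results.put((n, n * n))
        tasks.task_done()          # mark this item as fully processed

for _ in range(3):
    threading.Thread(target=worker, daemon=True).start()

for n in range(5):
    tasks.put(n)
tasks.join()  # returns once task_done has been called for every item

squares = dict(results.get() for _ in range(5))
print(sorted(squares.items()))
```

Each worker puts its result before calling task_done, so all five results are already in the results queue by the time tasks.join() returns.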

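When a synchronized queue, the simplest approach shown above, does not fit the problem, we have to define the synchronization ourselves. A lock is a basic way to do this. The threading module provides a Lock class with methods to acquire and release the lock. Only one thread can hold the lock at a time; other threads can acquire it only after it has been released.
The following example comes from the article "详解python中的Lock与RLock" (a detailed look at Lock and RLock in Python).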

import threading

count = 0  # shared by all threads, with no lock protecting it

def print_time(threadName):
    global count
    c = 0
    while c < 20:
        c += 1
        count += 1
        print("{0}: set count to {1}".format(threadName, count))

try:
    threading.Thread(target=print_time, args=("Thread-1",)).start()
    threading.Thread(target=print_time, args=("Thread-2",)).start()
    threading.Thread(target=print_time, args=("Thread-3",)).start()
except Exception as e:
    print("Error: unable to start thread")

Without a lock, every thread adds 1 to the global variable count. The output below shows that execution can switch to any thread at any moment.

Thread-1: set count to 1
Thread-2: set count to 2
Thread-2: set count to 3
Thread-3: set count to 4
Thread-2: set count to 5
Thread-1: set count to 6
Thread-3: set count to 7
Thread-2: set count to 8
Thread-1: set count to 9
...

Thread-2: set count to 33
Thread-3: set count to 34
Thread-2: set count to 35
Thread-3: set count to 36
Thread-1: set count to 37
Thread-2: set count to 38
Thread-3: set count to 39
Thread-3: set count to 40
Thread-3: set count to 41
Thread-1: set count to 42
Thread-1: set count to 43
Thread-3: set count to 44
Thread-1: set count to 45
...
Thread-3: set count to 52
Thread-1: set count to 53
Thread-1: set count to 54
Thread-1: set count to 55
Thread-1: set count to 56
Thread-3: set count to 57
Thread-3: set count to 58
Thread-1: set count to 59
Thread-3: set count to 60

Adding a lock:

import threading

count = 0
lock = threading.Lock()

def print_time(threadName):
    global count
    c = 0
    # Hold the lock for the whole loop; it is released when the with block ends.
    with lock:
        while c < 30:
            c += 1
            count += 1
            print("{0}: set count to {1}".format(threadName, count))

try:
    threading.Thread(target=print_time, args=("Thread-1",)).start()
    threading.Thread(target=print_time, args=("Thread-2",)).start()
    threading.Thread(target=print_time, args=("Thread-3",)).start()
except Exception as e:
    print("Error: unable to start thread")

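Because of the lock, the next thread can run its loop only after the thread currently holding the lock releases it. The code does not call acquire() and release() directly; it uses a with statement instead. The two forms are equivalent, since with acquires the lock on entry and releases it on exit. The with form is shorter and safer, because a lock that is never released leaves the other threads waiting forever (a deadlock). This is the same reason with is used when opening a file: it guards against forgetting to close it. The output is: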

Thread-1: set count to 1
Thread-1: set count to 2
Thread-1: set count to 3
Thread-1: set count to 4
Thread-1: set count to 5
Thread-1: set count to 6
Thread-1: set count to 7
Thread-1: set count to 8
Thread-1: set count to 9
Thread-1: set count to 10
Thread-1: set count to 11
Thread-1: set count to 12
Thread-1: set count to 13
Thread-1: set count to 14
Thread-1: set count to 15
Thread-1: set count to 16
Thread-1: set count to 17
Thread-1: set count to 18
Thread-1: set count to 19
Thread-1: set count to 20
Thread-1: set count to 21
Thread-1: set count to 22
Thread-1: set count to 23
Thread-1: set count to 24
Thread-1: set count to 25
Thread-1: set count to 26
Thread-1: set count to 27
Thread-1: set count to 28
Thread-1: set count to 29
Thread-1: set count to 30
Thread-2: set count to 31
Thread-2: set count to 32
Thread-2: set count to 33
Thread-2: set count to 34
Thread-2: set count to 35
Thread-2: set count to 36
Thread-2: set count to 37
Thread-2: set count to 38
Thread-2: set count to 39
Thread-2: set count to 40
Thread-2: set count to 41
Thread-2: set count to 42
Thread-2: set count to 43
Thread-2: set count to 44
Thread-2: set count to 45
Thread-2: set count to 46
Thread-2: set count to 47
Thread-2: set count to 48
Thread-2: set count to 49
Thread-2: set count to 50
Thread-2: set count to 51
Thread-2: set count to 52
Thread-2: set count to 53
Thread-2: set count to 54
Thread-2: set count to 55
Thread-2: set count to 56
Thread-2: set count to 57
Thread-2: set count to 58
Thread-2: set count to 59
Thread-2: set count to 60
Thread-3: set count to 61
Thread-3: set count to 62
Thread-3: set count to 63
Thread-3: set count to 64
Thread-3: set count to 65
Thread-3: set count to 66
Thread-3: set count to 67
Thread-3: set count to 68
Thread-3: set count to 69
Thread-3: set count to 70
Thread-3: set count to 71
Thread-3: set count to 72
Thread-3: set count to 73
Thread-3: set count to 74
Thread-3: set count to 75
Thread-3: set count to 76
Thread-3: set count to 77
Thread-3: set count to 78
Thread-3: set count to 79
Thread-3: set count to 80
Thread-3: set count to 81
Thread-3: set count to 82
Thread-3: set count to 83
Thread-3: set count to 84
Thread-3: set count to 85
Thread-3: set count to 86
Thread-3: set count to 87
Thread-3: set count to 88
Thread-3: set count to 89
Thread-3: set count to 90
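
The example above holds the lock for the entire loop, so the three threads effectively run one after another. As a contrast, here is a minimal sketch of the explicit acquire()/release() form, written with try/finally so the lock is released even if the body raises, and locking only the increment itself. The function name add_many and the iteration count are arbitrary choices for this illustration.

```python
import threading

count = 0
lock = threading.Lock()

def add_many(times):
    global count
    for _ in range(times):
        lock.acquire()
        try:
            count += 1      # critical section: one thread at a time
        finally:
            lock.release()  # always runs, like leaving a with block

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(count)  # 30000
```

With the lock held only around count += 1, the threads may still interleave between iterations, but no increment is lost.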

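Message Passing

One way to rule out improper changes to data by another thread altogether is to avoid sharing the data in the first place. Multiprocessing does exactly this: each process has its own data, and whatever must be shared is passed between processes as messages. Pipe in the multiprocessing module provides a communication channel between processes. The send method sends an object and the recv method receives one. Receiving is blocking: recv does not return until an object is available. That is why process_consume in the code below can use an infinite loop.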

import multiprocessing

def process_consume(in_pipe):
    while True:
        item = in_pipe.recv()  # blocks until an item arrives
        if item is None:
            return
        print('got an item:', item)

def process_produce():
    pipe = multiprocessing.Pipe(False)  # one-way pipe: pipe[0] receives, pipe[1] sends
    consumer = multiprocessing.Process(target=process_consume, args=(pipe[0],))
    consumer.start()
    for i in range(10):
        pipe[1].send(i)
    pipe[1].send(None) # done signal

# The guard keeps the child process from re-running this call when it imports the module.
if __name__ == '__main__':
    process_produce()

Result:

got an item: 0
got an item: 1
got an item: 2
got an item: 3
got an item: 4
got an item: 5
got an item: 6
got an item: 7
got an item: 8
got an item: 9

In the code above, a final None is sent as the done signal.
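
To look at the pipe calls in isolation, the sketch below uses both ends of a one-way pipe inside a single process. This is an assumption made purely for brevity; in real use the two ends belong to different processes, as in the example above. It shows poll reporting whether an object is waiting, send and recv carrying an ordinary Python object, and recv raising EOFError once the sending end is closed, which can serve as an alternative to the None sentinel.

```python
import multiprocessing

# Pipe(False) returns (receiving end, sending end) of a one-way pipe.
recv_end, send_end = multiprocessing.Pipe(False)

before = recv_end.poll()       # False: nothing has been sent yet
send_end.send({'id': 1, 'tags': ['a', 'b']})
after = recv_end.poll(1)       # True: an object is waiting
item = recv_end.recv()         # returns at once, since data is available

send_end.close()
try:
    recv_end.recv()            # nothing left and the sender is closed
    closed = False
except EOFError:
    closed = True

print(before, after, item, closed)
```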
