Multithreading and Multiprocessing - A First Look

I. Introduction

In day-to-day work, when we use top to check CPU usage on a Linux server, we sometimes see a single process using more than 100% CPU. Why does that happen?

II. Examples

The test environment is CentOS 7.6 + Python 2.7.

1. How multithreading and multiprocessing show up in the operating system

Let's start with two examples, test1.py and test2.py, both of which run an infinite loop: test1.py uses two threads and test2.py uses two processes.
【test1.py】 -- multithreading

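import threading

def foo():
    # Busy loop that never sleeps: keeps one CPU core fully occupied
    while True:
        pass

# Two threads inside the same process, both running the busy loop
task1 = threading.Thread(target=foo)
task2 = threading.Thread(target=foo)

task1.start()
task2.start()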

Run python test1.py, then open another terminal and run top to check CPU usage (on a multi-core machine, press "1" in top to see the usage of each core).

【test2.py】 -- multiprocessing

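import multiprocessing

def foo():
    # Same busy loop as in test1.py
    while True:
        pass

# The __main__ guard is required on platforms that start children with "spawn"
if __name__ == '__main__':
    # Two separate child processes, each running the busy loop
    task1 = multiprocessing.Process(target=foo)
    task2 = multiprocessing.Process(target=foo)

    task1.start()
    task2.start()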

Kill the test1 process, run python test2.py, then open another terminal and run top to check CPU usage.

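These two examples show that test1 runs as a single process whose CPU usage goes above 100%, with that one process running on two CPU cores. test2 runs as two worker processes, each at 100% CPU usage, likewise spread across two cores.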
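The same distinction can also be checked from inside Python instead of through top, by having each worker report the process ID it runs in. The sketch below is an illustration added here, not one of the original test scripts; the helper names `pid_into_list`, `pid_into_queue`, `thread_pids` and `process_pids` are hypothetical. Threads should report the parent's PID, while each child process should report a PID of its own.

```python
import os
import threading
import multiprocessing


def pid_into_list(results):
    # Runs in a thread: records the PID of the process the thread belongs to
    results.append(os.getpid())


def pid_into_queue(queue):
    # Runs in a child process: sends this child's PID back to the parent
    queue.put(os.getpid())


def thread_pids():
    results = []
    threads = [threading.Thread(target=pid_into_list, args=(results,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def process_pids():
    queue = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=pid_into_queue, args=(queue,)) for _ in range(2)]
    for p in procs:
        p.start()
    # Read the results before joining so no child blocks on an unread pipe
    pids = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    return pids


if __name__ == '__main__':
    print('parent pid:   {}'.format(os.getpid()))
    print('thread pids:  {}'.format(thread_pids()))
    print('process pids: {}'.format(process_pids()))
```

The thread list should contain the parent's PID twice, and the process list two different PIDs, neither equal to the parent's.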

2. Running speed: single thread vs. multiple threads vs. multiple processes

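Next come three more examples, test3.py, test4.py and test5.py, each of which decrements a counter one hundred million times. test3.py uses a single thread, test4.py uses two threads and test5.py uses two processes.
【test3.py】 -- single thread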

import time

N = 100000000

def foo(n):
    # Count down from n to 0: pure CPU-bound work
    while n > 0:
        n -= 1

start = time.time()
foo(N)
end = time.time()
print('Time taken in seconds: {}'.format(round(end-start, 2)))

[root@34host ~]# python test3.py 
Time taken in seconds: 2.91

【test4.py】 -- multithreading

import time
import threading

N = 100000000

def foo(n):
    while n > 0:
        n -= 1

# Split the work in half; N // 2 stays an integer on both Python 2 and 3
task1 = threading.Thread(target=foo, args=(N // 2,))
task2 = threading.Thread(target=foo, args=(N // 2,))

start = time.time()
task1.start()
task2.start()
# Wait for both threads to finish before stopping the clock
task1.join()
task2.join()
end = time.time()
print('Time taken in seconds: {}'.format(round(end-start, 2)))

[root@34host ~]# python test4.py 
Time taken in seconds: 6.0

【test5.py】 -- multiprocessing

import time
import multiprocessing

N = 100000000

def foo(n):
    while n > 0:
        n -= 1

if __name__ == '__main__':
    # Each child process counts down half of N
    task1 = multiprocessing.Process(target=foo, args=(N // 2,))
    task2 = multiprocessing.Process(target=foo, args=(N // 2,))

    start = time.time()
    task1.start()
    task2.start()
    task1.join()
    task2.join()
    end = time.time()
    print('Time taken in seconds: {}'.format(round(end-start, 2)))

[root@34host ~]# python test5.py 
Time taken in seconds: 1.48

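The multithreaded version takes about twice as long as the single-threaded one (6.0 s vs. 2.91 s), while the multiprocess version takes about half as long (1.48 s). The whole point of using multiple threads was to make the program faster, yet it actually got slower.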
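test5.py hard-codes exactly two processes. Below is a sketch of the same idea generalized to any number of worker processes, assuming Python 3.3+ (for `with multiprocessing.Pool(...)`); the names `count_down`, `split` and `parallel_count_down` are hypothetical. Each worker counts down its own chunk and returns the number of iterations it performed, so the caller can check that the chunks add up to N.

```python
import multiprocessing
import time


def count_down(n):
    # Same CPU-bound loop as foo() in test3.py to test5.py
    remaining = n
    while remaining > 0:
        remaining -= 1
    # Report how many iterations this worker performed
    return n


def split(total, parts):
    # Divide total into `parts` integer chunks whose sizes differ by at most 1
    base, extra = divmod(total, parts)
    return [base + 1 if i < extra else base for i in range(parts)]


def parallel_count_down(total, workers=None):
    # Default to one worker process per CPU core
    workers = workers or multiprocessing.cpu_count()
    chunks = split(total, workers)
    with multiprocessing.Pool(processes=workers) as pool:
        # pool.map blocks until every chunk has been processed
        return sum(pool.map(count_down, chunks))


if __name__ == '__main__':
    N = 100000000
    start = time.time()
    done = parallel_count_down(N)
    print('Iterations: {}'.format(done))
    print('Time taken in seconds: {}'.format(round(time.time() - start, 2)))
```

Splitting with `divmod` keeps the total exact even when N is not divisible by the number of workers, instead of assuming an even split as `N // 2` does.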

3. Execution results: single thread vs. multiple threads vs. multiple processes

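Next come three more examples, test6.py, test7.py and test8.py, each of which increments a value ten million times and then prints it. test6.py uses a single thread, test7.py uses two threads and test8.py uses two processes.
【test6.py】 -- single thread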

import time

N = 10000000
sum = 0  # global counter (note: this shadows the built-in sum())

def foo(n):
    global sum
    for i in range(n):
        sum += 1

start = time.time()
foo(N)
end = time.time()

print('The value of sum: {}'.format(sum))
print('Time taken in seconds: {}'.format(round(end-start, 2)))

[root@34host ~]# python test6.py 
The value of sum: 10000000
Time taken in seconds: 1.26

【test7.py】 -- multithreading

import time
import threading

N = 10000000
sum = 0

def foo(n):
    global sum
    for i in range(n):
        # Unsynchronized read-modify-write on a global shared by both threads
        sum += 1

task1 = threading.Thread(target=foo, args=(N // 2,))
task2 = threading.Thread(target=foo, args=(N // 2,))

start = time.time()
task1.start()
task2.start()
task1.join()
task2.join()
end = time.time()

print('The value of sum: {}'.format(sum))
print('Time taken in seconds: {}'.format(round(end-start, 2)))

[root@34host ~]# python test7.py 
The value of sum: 7333348
Time taken in seconds: 1.76
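The lost increments in test7.py come from `sum += 1` not being a single step: CPython compiles it into separate load, add and store bytecode instructions, and a thread switch can happen between them, so both threads may read the same old value and one update gets overwritten. The sketch below uses the standard-library `dis` module to list the instructions that touch `sum`; it assumes Python 3.4+ for `dis.get_instructions`, the helper `accesses_to` is hypothetical, and the name of the add instruction (`INPLACE_ADD` in older versions, `BINARY_OP` in newer ones) varies between Python versions.

```python
import dis

sum = 0


def foo(n):
    global sum
    for i in range(n):
        sum += 1


def accesses_to(func, name):
    # Opcode names of the instructions in func that read or write the global `name`
    return [ins.opname for ins in dis.get_instructions(func) if ins.argval == name]


if __name__ == '__main__':
    # Full bytecode listing: sum is loaded, incremented and stored in separate steps
    dis.dis(foo)
    print(accesses_to(foo, 'sum'))
```

Because the load and the store are separate instructions, another thread can run between them, which is why test7.py prints less than 10000000.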

【test8.py】 -- multiprocessing

import time
import multiprocessing

N = 10000000
sum = 0

def foo(n):
    global sum
    for i in range(n):
        # Increments this child process's own copy of sum
        sum += 1

if __name__ == '__main__':
    task1 = multiprocessing.Process(target=foo, args=(N // 2,))
    task2 = multiprocessing.Process(target=foo, args=(N // 2,))

    start = time.time()
    task1.start()
    task2.start()
    task1.join()
    task2.join()
    end = time.time()

    print('The value of sum: {}'.format(sum))
    print('Time taken in seconds: {}'.format(round(end-start, 2)))

[root@34host ~]# python test8.py 
The value of sum: 0
Time taken in seconds: 0.57
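test8.py prints 0 because each child process works on its own copy of the module's globals; the increments happen in the children's copies and are never written back to the parent. A small sketch of that effect, with hypothetical names `counter`, `bump` and `run_child`: the child changes its global and reports the new value through a `multiprocessing.Queue`, while the parent's copy is left untouched.

```python
import multiprocessing

counter = 0


def bump(queue):
    global counter
    # Changes only the child's copy of counter
    counter += 100
    queue.put(counter)


def run_child():
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=bump, args=(queue,))
    p.start()
    child_value = queue.get()
    p.join()
    # (value seen inside the child, value still held by the parent)
    return child_value, counter


if __name__ == '__main__':
    print(run_child())
```

Running it should print `(100, 0)`: the child sees its own increment, the parent does not.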

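As you can see, the three versions produce three different results, and only the single-threaded test6.py prints the expected 10000000.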

I then modified test7.py and test8.py into test9.py and test10.py respectively.
【test9.py】 -- multithreading

import time
import threading

N = 10000000
sum = 0
lock = threading.Lock()

def foo(n):
    global sum
    for i in range(n):
        # Hold the lock for each increment so the read-modify-write cannot interleave
        with lock:
            sum += 1

task1 = threading.Thread(target=foo, args=(N // 2,))
task2 = threading.Thread(target=foo, args=(N // 2,))

start = time.time()
task1.start()
task2.start()
task1.join()
task2.join()
end = time.time()

print('The value of sum: {}'.format(sum))
print('Time taken in seconds: {}'.format(round(end-start, 2)))

[root@34host ~]# python test9.py 
The value of sum: 10000000
Time taken in seconds: 21.49

【test10.py】 -- multiprocessing

import time
import multiprocessing

N = 10000000

def foo(n, sum, lock):
    for i in range(n):
        # sum lives in shared memory; the lock serializes each increment
        with lock:
            sum.value += 1

if __name__ == '__main__':
    # 'i' is a C int stored in shared memory, starting at 0
    sum = multiprocessing.Value('i', 0)
    lock = multiprocessing.Lock()

    # Pass the shared objects explicitly so this also works with the spawn start method
    task1 = multiprocessing.Process(target=foo, args=(N // 2, sum, lock))
    task2 = multiprocessing.Process(target=foo, args=(N // 2, sum, lock))

    start = time.time()
    task1.start()
    task2.start()
    task1.join()
    task2.join()
    end = time.time()

    print('The value of sum: {}'.format(sum.value))
    print('Time taken in seconds: {}'.format(round(end-start, 2)))

[root@34host ~]# python test10.py 
The value of sum: 10000000
Time taken in seconds: 41.3

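The results are now all correct, but the runs take much longer than before, and the multiprocess version is even slower than the multithreaded one.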
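Compared with test7.py and test8.py, both locked versions now acquire and release a lock once per increment, ten million times in total, which plausibly accounts for most of the extra time. One common way to cut that cost, sketched here as an assumption rather than as the author's own follow-up, is to let each worker count in a local variable and touch the shared `Value` only once at the end. The sketch uses the lock built into `multiprocessing.Value` (`get_lock()`) instead of a separate `Lock`; `foo_batched` and `run` are hypothetical names.

```python
import multiprocessing
import time

N = 10000000


def foo_batched(n, total):
    # Count locally with no synchronization at all (stands in for the real work)
    local = 0
    for i in range(n):
        local += 1
    # Publish the partial result once, under the Value's built-in lock
    with total.get_lock():
        total.value += local


def run(n_total, workers=2):
    total = multiprocessing.Value('i', 0)
    chunk, extra = divmod(n_total, workers)
    procs = [multiprocessing.Process(target=foo_batched,
                                     args=(chunk + (1 if i < extra else 0), total))
             for i in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return total.value


if __name__ == '__main__':
    start = time.time()
    value = run(N)
    print('The value of sum: {}'.format(value))
    print('Time taken in seconds: {}'.format(round(time.time() - start, 2)))
```

Each worker now takes the lock once instead of millions of times, so the synchronization cost no longer grows with N; the result is still exact because every partial count is added under the lock.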

III. Questions

These examples left me with a lot of questions, which I will work through one at a time in later posts.

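What is the difference between multithreading and multiprocessing, and when should each one be used?
How do single-threaded, multithreaded and multiprocess programs compare in efficiency?
What should you watch out for when programming with multiple threads or multiple processes?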
