Python Multithreading Basics

Official reference documentation

https://docs.python.org/zh-cn/3.7/library/threading.html#module-threading

Creating child threads directly with Thread

import threading
import time


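def work(interval):
    # Report the current thread, sleep for `interval` seconds, then report completion
    name = threading.current_thread().name
    print(f"{name} start")
    time.sleep(interval)
    print(f"{name} end")


print("Main: ", threading.current_thread().name)
for i in range(5):
    # Start five child threads that sleep for 0, 1, 2, 3 and 4 seconds
    thread_instance = threading.Thread(target=work, args=(i,))
    thread_instance.start()

print("Main: end")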

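This starts six threads in total: the main thread, MainThread, and five child threads (named Thread-1 through Thread-5 on Python 3.7). The child threads sleep for 0 to 4 seconds, so they finish roughly one second apart.
We also observe that the main thread prints "Main: end" before most of the child threads have finished. In other words, the main thread does not wait for the child threads before running its last statement, which looks a little counterintuitive. The process itself only exits once every non-daemon thread has finished.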
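To see which threads exist at a given moment, the standard threading.enumerate() and Thread.is_alive() calls can help. The following is only a minimal sketch: the worker count and sleep intervals are made up for illustration, and it uses join (covered in the next section) simply to wait before the second check.

```python
import threading
import time


def work(interval):
    time.sleep(interval)


# Three short-lived workers; the intervals are illustrative assumptions.
workers = [threading.Thread(target=work, args=(0.1 * i,)) for i in range(1, 4)]
for t in workers:
    t.start()

# Names of every thread that is currently alive, including the main thread.
print("live threads:", [t.name for t in threading.enumerate()])

# Wait for all workers, then check their state again.
for t in workers:
    t.join()
alive_after_join = [t.is_alive() for t in workers]
print("alive after join:", alive_after_join)
```

After the joins, none of the workers is alive any more, while the main thread is still running.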

Making the main thread finish after the child threads

import threading
import time


def work(interval):
    name = threading.current_thread().name
    print(f"{name} start")
    time.sleep(interval)
    print(f"{name} end")


print("Main: ", threading.current_thread().name)
for i in range(5):
    thread_instance = threading.Thread(target=work, args=(i,))
    thread_instance.start()
    # Make the main thread wait for this child thread before continuing
    thread_instance.join()

print("Main: end")

About join

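If we time the previous example, it takes about 10 seconds (0 + 1 + 2 + 3 + 4), the same as running the five calls one after another without threads.
Multithreading seems to have lost its point, but this is really the result of using join incorrectly.

So what does join actually mean?
join blocks the calling thread (here, the main thread) until the thread whose join method was called has finished. Child threads that have already been started keep running in the meantime.
So in this example we only need to join the thread that runs longest.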

import threading
import time


now = time.time


def work(interval):
    name = threading.current_thread().name
    print(f"{name} start")
    time.sleep(interval)
    print(f"{name} end")


t1 = now()
print("Main: ", threading.current_thread().name)
for i in range(5):
    thread_instance = threading.Thread(target=work, args=(i,))
    thread_instance.start()
    # Join only the last thread, which sleeps longest (4 seconds)
    if i == 4:
        thread_instance.join()

print(f"Main: end, Time: {now() - t1}")

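Of course, this only works when we know which thread finishes first and which finishes last.
When we do not know, we need to join every thread after all of them have been started.

Consider this scenario: your crawler uses 10 threads to fetch 100 URLs, and the main thread must wait until every URL has been fetched before analyzing the data. join can hold the main thread back
until all 10 child threads have finished, after which the main thread carries on with the remaining work.
Since we cannot tell in advance which thread will finish last, joining every thread is the reasonable approach here: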

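thread_list = []
for _ in range(10):
    # xxx is a placeholder for your own target function and its arguments
    thread = threading.Thread(target=xxx, args=(xxx, xxx))
    thread.start()
    thread_list.append(thread)

# Join only after every thread has been started
for thread in thread_list:
    thread.join()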
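The crawler scenario above can be sketched in runnable form as follows. This is only an illustration under stated assumptions: the URLs are made up, fake_fetch merely sleeps instead of doing a real request, and each thread writes into its own result list so that no shared data needs protecting.

```python
import threading
import time

# Hypothetical input: 100 made-up URLs.
urls = [f"https://example.com/page/{n}" for n in range(100)]
# One private result list per thread, so threads never write to the same list.
results = [[] for _ in range(10)]


def fake_fetch(url):
    # Stand-in for a real HTTP request; it only sleeps briefly.
    time.sleep(0.001)
    return len(url)


def crawl(chunk, out):
    for url in chunk:
        out.append((url, fake_fetch(url)))


thread_list = []
for k in range(10):
    # Thread k handles every 10th URL, starting at index k.
    thread = threading.Thread(target=crawl, args=(urls[k::10], results[k]))
    thread.start()
    thread_list.append(thread)

# Block the main thread until all 10 workers are done.
for thread in thread_list:
    thread.join()

# Only now does the main thread analyze the collected data.
total = sum(len(part) for part in results)
print(f"fetched {total} pages")
```

Starting all threads first and joining them afterwards lets the fetches overlap; the order of the joins does not matter, because the loop only ends when the slowest thread has finished.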

Creating threads by subclassing

import threading
import time


class MyThread(threading.Thread):
    def __init__(self, interval):
        super().__init__()
        self.interval = interval

    def run(self):
        # run() is what start() executes in the new thread
        name = threading.current_thread().name
        print(f"{name} start")
        time.sleep(self.interval)
        print(f"{name} end")


print("Main: ", threading.current_thread().name)
for i in range(5):
    thread_instance = MyThread(i)
    thread_instance.start()
    # Join only the last thread, which sleeps longest
    if i == 4:
        thread_instance.join()
print("Main: end")

Both approaches produce the same result.

Daemon threads

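Threads can be marked as daemon threads. Marking a thread as a daemon means it is "not important": if the main thread and every other non-daemon thread have finished while the daemon thread is still running,
the daemon thread is terminated abruptly. In Python a thread can be made a daemon with the setDaemon method. Since Python 3.10 that method is deprecated in favor of setting the daemon attribute.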

import threading
import time


def work(interval):
    name = threading.current_thread().name
    print(f"{name} start")
    time.sleep(interval)
    print(f"{name} end")


thread_1 = threading.Thread(target=work, args=(1,))
thread_2 = threading.Thread(target=work, args=(5,))
# Mark thread_2 as a daemon before starting it
# (on Python 3.10+ the preferred spelling is thread_2.daemon = True)
thread_2.setDaemon(True)
thread_1.start()
thread_2.start()

# The process exits once thread_1 is done, so thread_2 never prints "end"
print("Main End.")
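As a complement to the example above, here is a minimal sketch that sets the flag through the daemon keyword argument of the Thread constructor and then inspects it. The thread names background and foreground and the sleep intervals are assumptions chosen for illustration.

```python
import threading
import time


def work(interval):
    time.sleep(interval)


# daemon=True in the constructor marks the thread as a daemon before it starts.
background = threading.Thread(target=work, args=(0.5,), daemon=True)
# Without the argument, a thread inherits the daemon flag of its creator.
foreground = threading.Thread(target=work, args=(0.05,))
background.start()
foreground.start()

# Wait only for the non-daemon thread.
foreground.join()
print("background is daemon:", background.daemon)
print("foreground is daemon:", foreground.daemon)
print("background still running:", background.is_alive())
# When the script ends here, the interpreter does not wait for `background`.
```

The daemon flag has to be set before start() is called; changing it on a running thread raises RuntimeError.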

Mutex locks

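Threads within one process share resources. Suppose a process has a global variable count used as a counter, and we start many threads that each add 1 to count.
Let's see what happens. The code is as follows: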

import threading
import time

count = 0


class MyThread(threading.Thread):
    def __init__(self):
        super().__init__()

    def run(self):
        global count
        # Read, pause, then write back: the pause widens the window for a race
        temp = count + 1
        time.sleep(0.001)
        count = temp


def main():
    threads = []
    for _ in range(1000):
        thread_ = MyThread()
        thread_.start()
        threads.append(thread_)

    for t in threads:
        t.join()

    print("Final count: ", count)


main()

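By common sense, the final value of count should be 1000. It is not; let's run it and see.
The output is as follows (the exact number varies from run to run):
Final count: 69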

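Why is that? count is shared, and each thread reads the current value of count when it executes temp = count + 1. Because some of these threads run concurrently,
several of them can read the same value of count, so some of the increments are lost and the final result comes out too small.

So when multiple threads read and modify the same data at the same time, the outcome is unpredictable. To avoid this we need to synchronize the threads,
which we can do by protecting the data with a lock. This is where threading.Lock comes in.

What does protecting data with a lock mean? Before operating on the data, a thread first acquires the lock. Any other thread that tries to acquire it while it is held cannot proceed and waits until the lock is released.
Only once the holder has released the lock can another thread acquire it, modify the data, and release it in turn. This guarantees that only one thread operates on the data at a time, so threads no longer read and modify the same data simultaneously,
and the final result is correct.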

import threading
import time

count = 0
lock = threading.Lock()


class MyThread(threading.Thread):
    def __init__(self):
        super(MyThread, self).__init__()

    def run(self):
        global count
        # Acquire the lock
        lock.acquire()
        temp = count + 1
        time.sleep(0.001)
        count = temp
        # Release the lock
        lock.release()


def main():
    threads = []
    for _ in range(1000):
        thread_ = MyThread()
        thread_.start()
        threads.append(thread_)

    for t in threads:
        t.join()

    print("Final count: ", count)


main()
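The acquire/release pair above can also be written with a with statement, which threading.Lock supports as a context manager. The sketch below is a smaller variant of the same counter, assuming 50 threads instead of 1000 only to keep the run short; the function name increment is made up for illustration.

```python
import threading
import time

count = 0
lock = threading.Lock()


def increment():
    global count
    # The with statement acquires the lock on entry and releases it on exit,
    # even if the body raises an exception.
    with lock:
        temp = count + 1
        time.sleep(0.001)
        count = temp


threads = [threading.Thread(target=increment) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("Final count:", count)
```

Because the read, the pause, and the write all happen while the lock is held, every increment takes effect. The price is that the protected section runs one thread at a time.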

About multithreading in Python

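Because of the GIL in CPython, only one thread can execute Python code at any given moment, whether on a single core or on multiple cores, so Python threads cannot take advantage of multi-core parallelism.
GIL stands for Global Interpreter Lock. It was originally designed with data safety in mind. With multiple threads, each thread runs as follows: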

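Acquire the GIL
Run the thread's code
Release the GIL

As this shows, a thread must hold the GIL before it can run. We can think of the GIL as a pass, and there is only one per Python process. A thread that cannot get the pass is not allowed to run.
As a result, even on a multi-core machine, only one of the threads in a Python process can be executing at any given moment.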

For I/O-bound tasks such as crawlers, though, this matters little. For CPU-bound tasks, the GIL means that multithreading may end up slower overall than a single thread.
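The I/O-bound case can be illustrated with a rough timing sketch. It assumes that time.sleep stands in for a blocking network call (like real blocking I/O, it releases the GIL while waiting); the function name io_task and the 0.2 second delay are made up for illustration, and actual timings depend on the machine.

```python
import threading
import time


def io_task(seconds):
    # Stand-in for a blocking network call; the GIL is released while sleeping.
    time.sleep(seconds)


# Run five tasks one after another.
start = time.perf_counter()
for _ in range(5):
    io_task(0.2)
sequential = time.perf_counter() - start

# Run the same five tasks in five threads.
start = time.perf_counter()
threads = [threading.Thread(target=io_task, args=(0.2,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

The waits overlap in the threaded run, so it finishes in roughly the time of a single task, while the sequential run takes about the sum of all five. A CPU-bound function would not benefit in the same way, since its threads would compete for the GIL.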
