Python: Locating and Destroying Threads

Background

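Even before I started the job I had a feeling something was off and that I would end up taking the blame. Sure enough, on my third day at work I broke something.

We have a rather impressive background program that loads modules dynamically and runs each one as a thread, which is how it implements plugins. When a module is updated, the program itself does not exit. It only stops the module's thread, updates the code and starts it again. Pretty slick.

So I wrote a module, ready to show off, and forgot to write its exit function. As a result every module update created a brand-new thread, and unless the program was restarted, the old threads just kept clinging to life.

That won't do. They had to be cleaned up somehow, or things would blow up sooner or later.

So how to clean them up? All I could think of was a two-step plan:

1. Find the thread IDs (tid) that need cleaning up;
2. Destroy them.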

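Finding the thread ID

As with any routine troubleshooting, I first used ps to look at the threads of the target process. Since the thread names had already been set with setName, the matching threads should normally show up. The following code simulates such a thread:

Multithreading, Python version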

#coding: utf8
import threading
import os
import time

def tt():
    info = threading.currentThread()
    while True:
        print 'pid: ', os.getpid()
        print info.name, info.ident
        time.sleep(3)

t1 = threading.Thread(target=tt)
t1.setName('OOOOOPPPPP')
t1.setDaemon(True)
t1.start()

t2 = threading.Thread(target=tt)
t2.setName('EEEEEEEEE')
t2.setDaemon(True)
t2.start()


t1.join()
t2.join()

Output:

root@10-46-33-56:~# python t.py
pid:  5613
OOOOOPPPPP 139693508122368
pid:  5613
EEEEEEEEE 139693497632512
...

Inside Python the thread names print exactly as we set them, but the ps output made me question everything:

root@10-46-33-56:~# ps -Tp 5613
  PID  SPID TTY          TIME CMD
 5613  5613 pts/2    00:00:00 python
 5613  5614 pts/2    00:00:00 python
 5613  5615 pts/2    00:00:00 python

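It should not look like this. I was a bit lost: had I been remembering it wrong all along? Let's test multithreading in another language:

Multithreading, C version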

#include <stdio.h>
#include <unistd.h>        /* getpid, sleep, syscall */
#include <sys/syscall.h>
#include <sys/prctl.h>
#include <pthread.h>

void *test(void *name)
{
    pid_t pid, tid;
    pid = getpid();
    tid = syscall(SYS_gettid);   /* kernel thread id, the SPID column of ps -T */
    char *tname = (char *)name;

    /* Set the thread name at the OS level */
    prctl(PR_SET_NAME, tname);

    while (1)
    {
        printf("pid: %d, thread_id: %d, t_name: %s\n", pid, tid, tname);
        sleep(3);
    }
    return NULL;
}

int main()
{
    pthread_t t1, t2;
    void *ret;
    pthread_create(&t1, NULL, test, (void *)"Love_test_1");
    pthread_create(&t2, NULL, test, (void *)"Love_test_2");
    pthread_join(t1, &ret);
    pthread_join(t2, &ret);
    return 0;
}

Output:

root@10-46-33-56:~# gcc t.c -lpthread && ./a.out
pid: 5575, thread_id: 5577, t_name: Love_test_2
pid: 5575, thread_id: 5576, t_name: Love_test_1
pid: 5575, thread_id: 5577, t_name: Love_test_2
pid: 5575, thread_id: 5576, t_name: Love_test_1
...

Verify again with ps:

root@10-46-33-56:~# ps -Tp 5575
  PID  SPID TTY          TIME CMD
 5575  5575 pts/2    00:00:00 a.out
 5575  5576 pts/2    00:00:00 Love_test_1
 5575  5577 pts/2    00:00:00 Love_test_2

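That's more like it. Thread names really can be seen through ps!

So why can't we see them for the Python program? Since the names are set through setName, let's look at its definition: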

[threading.py]
class Thread(_Verbose):
    ...
    @property
    def name(self):
        """A string used for identification purposes only.

        It has no semantics. Multiple threads may be given the same name. The
        initial name is set by the constructor.

        """
        assert self.__initialized, "Thread.__init__() not called"
        return self.__name
    def setName(self, name):
        self.name = name
    ...

So all setName does is set an attribute on the Thread object. Nothing is passed down to the operating system, so of course ps cannot see it.

It looks like there is no way to search for Python thread names from the outside through ps or /proc/, so the problem has to be solved from inside Python.
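One thing worth knowing if you are on a newer interpreter than the Python 2 used in this post: since Python 3.8 every started Thread carries a native_id attribute holding the kernel thread ID, which is the number ps -T prints in the SPID column. The sketch below is written for Python 3 (an assumption, not what the post's program runs), and the helper names thread_table and demo are made up for illustration. It pairs each Python-level thread name with that ID, so a thread found inside Python can still be cross-checked against ps.

```python
import threading


def thread_table():
    """Return (name, ident, native_id) for every live thread.

    ident is the Python-level thread identifier; native_id is the
    kernel thread id and is only available on Python 3.8 or later.
    """
    return [(t.name, t.ident, t.native_id) for t in threading.enumerate()]


def demo():
    """Start one named worker, take a snapshot, then let the worker finish."""
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, name="OOOOOPPPPP", daemon=True)
    worker.start()
    try:
        return thread_table()
    finally:
        stop.set()
        worker.join()


if __name__ == "__main__":
    for name, ident, native_id in demo():
        print(name, ident, native_id)
```

Comparing the last column with the SPID values from ps -Tp <pid> tells you which row of the ps output belongs to which Thread object.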

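So the question becomes: how do we get hold of every running thread from inside Python?

threading.enumerate solves this perfectly! Why?

Because the docstring of the function below says it clearly: it returns all live thread objects, excluding terminated ones and ones that have not been started yet.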

[threading.py]

def enumerate():
    """Return a list of all Thread objects currently alive.

    The list includes daemonic threads, dummy thread objects created by
    current_thread(), and the main thread. It excludes terminated threads and
    threads that have not yet been started.

    """
    with _active_limbo_lock:
        return _active.values() + _limbo.values()

Since what we get back are Thread objects, we can read all the information we need about each thread from them!

Here is the complete example:

#coding: utf8

import threading
import os
import time


def get_thread():
    pid = os.getpid()
    while True:
        ts = threading.enumerate()
        print '------- Running threads On Pid: %d -------' % pid
        for t in ts:
            print t.name, t.ident
        print
        time.sleep(1)
        
def tt():
    info = threading.currentThread()
    pid = os.getpid()
    while True:
        print 'pid: {}, tid: {}, tname: {}'.format(pid, info.ident, info.name)
        time.sleep(3)
        return

t1 = threading.Thread(target=tt)
t1.setName('Thread-test1')
t1.setDaemon(True)
t1.start()

t2 = threading.Thread(target=tt)
t2.setName('Thread-test2')
t2.setDaemon(True)
t2.start()

t3 = threading.Thread(target=get_thread)
t3.setName('Checker')
t3.setDaemon(True)
t3.start()

t1.join()
t2.join()
t3.join()

Output:

root@10-46-33-56:~# python t_show.py
pid: 6258, tid: 139907597162240, tname: Thread-test1
pid: 6258, tid: 139907586672384, tname: Thread-test2

------- Running threads On Pid: 6258 -------
MainThread 139907616806656
Thread-test1 139907597162240
Checker 139907576182528
Thread-test2 139907586672384

------- Running threads On Pid: 6258 -------
MainThread 139907616806656
Thread-test1 139907597162240
Checker 139907576182528
Thread-test2 139907586672384

------- Running threads On Pid: 6258 -------
MainThread 139907616806656
Thread-test1 139907597162240
Checker 139907576182528
Thread-test2 139907586672384

------- Running threads On Pid: 6258 -------
MainThread 139907616806656
Checker 139907576182528
...

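The code looks a bit long, but the logic is simple. Thread-test1 and Thread-test2 each print the current pid, thread id and thread name, then exit after 3 s, which simulates a thread finishing normally.

The Checker thread prints every live thread in the process once per second, using threading.enumerate.

You can clearly see Thread-test1 and Thread-test2 listed at first. Once the two of them exit, only MainThread and Checker itself are left.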
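In the real cleanup we would not print every thread, only pick out the leaked ones. A minimal sketch of that filtering step, written for Python 3 (the post's own listings are Python 2), with find_threads and demo as hypothetical helper names: it selects live threads by name prefix and shows that a thread drops out of the result once it has finished.

```python
import threading


def find_threads(name_prefix):
    """Return the live Thread objects whose name starts with name_prefix."""
    return [t for t in threading.enumerate() if t.name.startswith(name_prefix)]


def demo():
    """Start two named workers, look them up, stop them, look again."""
    stop = threading.Event()
    workers = [
        threading.Thread(target=stop.wait, name="Thread-test%d" % i, daemon=True)
        for i in (1, 2)
    ]
    for w in workers:
        w.start()
    found = sorted(t.name for t in find_threads("Thread-test"))

    stop.set()               # let both workers return normally
    for w in workers:
        w.join()
    leftover = find_threads("Thread-test")
    return found, leftover


if __name__ == "__main__":
    print(demo())
```

Matching on a prefix rather than an exact name is a design choice for the sketch: several stale copies of one module's thread would typically share the same name.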

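Destroying a specific thread

Now that we can get the name and the thread id, we can also take out a specific thread!

Suppose Thread-test2 has turned to the dark side and gone mad, and we need to stop it. Here is how:

Building on the code above, add or replace the following. (Judging by the output further down, where Thread-test1 prints a second time, the early return in tt was also dropped so that the workers keep looping.)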

import ctypes
import inspect

def _async_raise(tid, exctype):
    """raises the exception, performs cleanup if needed"""
    tid = ctypes.c_long(tid)
    if not inspect.isclass(exctype):
        exctype = type(exctype)
    # Ask the interpreter to raise exctype inside the thread with this ident
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, ctypes.py_object(exctype))
    if res == 0:
        raise ValueError("invalid thread id")
    elif res != 1:
        # More than one thread state was affected: revoke the request
        ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

def stop_thread(thread):
    _async_raise(thread.ident, SystemExit)

def get_thread():
    pid = os.getpid()
    while True:
        ts = threading.enumerate()
        print '------- Running threads On Pid: %d -------' % pid
        for t in ts:
            print t.name, t.ident, t.is_alive()
            if t.name == 'Thread-test2':
                print '%s: I am go dying. Please take care of yourself and drink more hot water!' % t.name
                stop_thread(t)
        print
        time.sleep(1)

Output:

root@10-46-33-56:~# python t_show.py
pid: 6362, tid: 139901682108160, tname: Thread-test1
pid: 6362, tid: 139901671618304, tname: Thread-test2
------- Running threads On Pid: 6362 -------
MainThread 139901706389248 True
Thread-test1 139901682108160 True
Checker 139901661128448 True
Thread-test2 139901671618304 True
Thread-test2: I am go dying. Please take care of yourself and drink more hot water!

------- Running threads On Pid: 6362 -------
MainThread 139901706389248 True
Thread-test1 139901682108160 True
Checker 139901661128448 True
Thread-test2 139901671618304 True
Thread-test2: I am go dying. Please take care of yourself and drink more hot water!

pid: 6362, tid: 139901682108160, tname: Thread-test1
------- Running threads On Pid: 6362 -------
MainThread 139901706389248 True
Thread-test1 139901682108160 True
Checker 139901661128448 True
// Thread-test2 is gone

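After all that, and despite how we treated Thread-test2, it still cares about us: drink more hot water.

PS: Hot water is good, but eight glasses are enough. Don't overdo it.

Back to the topic. The method above is extremely crude. Why?

Because of how it works: it uses a built-in CPython API (PyThreadState_SetAsyncExc) to raise an exception inside the target thread so that the thread unwinds and exits on its own. That exception can land between any two bytecode instructions, wherever the thread happens to be at the time.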
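To make the mechanism concrete, here is a small sketch written for Python 3 (an assumption; the post's listings are Python 2, and on Python 3.7 or later the thread id parameter is an unsigned long, hence c_ulong). The names async_raise and demo are made up for illustration. It shows that the injected SystemExit behaves like an ordinary exception inside the worker: except and finally blocks run, and the thread only notices it once the blocking sleep call has returned.

```python
import ctypes
import threading
import time


def async_raise(thread, exctype=SystemExit):
    """Ask the interpreter to raise exctype inside the given thread."""
    tid = ctypes.c_ulong(thread.ident)
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, ctypes.py_object(exctype))
    if res == 0:
        raise ValueError("invalid thread id")
    if res > 1:
        # More than one thread state was touched: revoke the request and give up.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, None)
        raise SystemError("PyThreadState_SetAsyncExc failed")


def demo():
    events = []
    started = threading.Event()

    def worker():
        try:
            started.set()
            while True:
                time.sleep(0.01)   # the exception is only seen after sleep() returns
        except SystemExit:
            events.append("caught SystemExit")
            raise
        finally:
            events.append("cleanup ran")

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    started.wait()
    async_raise(t)
    t.join(timeout=5)
    return t.is_alive(), events


if __name__ == "__main__":
    print(demo())
```

The flip side is that a worker which catches the exception and carries on, or which is stuck inside a long blocking call, will not stop when asked.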

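Why stopping a thread is so hard

Threads are designed for cooperative concurrency inside one process. They are the smallest unit of scheduling and share the process's resources, which is why there are so many locking mechanisms and so much state control.

Kill a thread by force and there is a good chance of unexpected bugs. Most importantly, the locks that thread was holding may never be released properly.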
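The lock problem is easy to reproduce. The sketch below is Python 3 (an assumption, as the post's code is Python 2) and uses the same ctypes call as stop_thread; kill_with_system_exit and demo are hypothetical names. A worker takes a lock with a bare acquire(), gets killed, and the lock stays held with nobody left to release it.

```python
import ctypes
import threading
import time


def kill_with_system_exit(thread):
    """Inject SystemExit into thread via PyThreadState_SetAsyncExc."""
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(SystemExit))
    if res != 1:
        raise RuntimeError("could not inject exception, result=%d" % res)


def demo():
    lock = threading.Lock()
    holding = threading.Event()

    def careless_worker():
        lock.acquire()             # bare acquire: no "with", no try/finally
        holding.set()
        while True:
            time.sleep(0.01)

    t = threading.Thread(target=careless_worker, daemon=True)
    t.start()
    holding.wait()                 # make sure the worker owns the lock
    kill_with_system_exit(t)
    t.join(timeout=5)

    # The thread is gone, but the lock it held was never released.
    still_locked = lock.locked()
    can_acquire = lock.acquire(timeout=0.2)
    return t.is_alive(), still_locked, can_acquire


if __name__ == "__main__":
    print(demo())
```

Had the worker used "with lock:", the exception would have released the lock on the way out, but code you are killing from the outside gives no such guarantee.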

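We cannot even kill a thread directly the way we kill a process with a signal. kill only does what we expect when it is aimed at a process. Aimed at a thread it clearly does not, because no matter which thread receives the fatal signal, the whole process exits!

And because of the GIL, many people assume that Python threads are something Python implements on its own rather than real threads, so surely Python can just destroy them directly?

In fact, Python threads are the genuine article!

What does that mean? Python threads are native threads created by the operating system through pthread. Python only uses the GIL to constrain them and to decide when a switch may happen, for example by giving up the GIL after a certain number of instructions have run. As for which thread wins it next, that is up to the operating system.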
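A note on the "certain number of instructions" rule: that describes Python 2, where sys.setcheckinterval controls it. From Python 3.2 on, the hand-over is based on time instead and is exposed through sys.getswitchinterval and sys.setswitchinterval. The sketch below assumes Python 3; the helper names are made up for illustration. It only reads the interval and changes it temporarily, it does not alter which thread the operating system picks next.

```python
import sys


def show_switch_interval():
    """Return the time in seconds after which a running thread is asked to drop the GIL."""
    return sys.getswitchinterval()


def with_switch_interval(seconds, func):
    """Run func() under a temporary switch interval, then restore the old value."""
    old = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        return func()
    finally:
        sys.setswitchinterval(old)


if __name__ == "__main__":
    print("current interval:", show_switch_interval())
    print("inside helper:", with_switch_interval(0.01, sys.getswitchinterval))
    print("restored:", show_switch_interval())
```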

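For plain native threads the system does have ways to terminate them, such as pthread_exit, pthread_kill or pthread_cancel. For details see: https://www.cnblogs.com/Creat...

Sadly, Python's threading module wraps none of these! Good grief, how annoying! Maybe its authors feel that threads deserve gentle treatment.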

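How to exit a thread gently

The answer to exiting a thread gently is almost a truism.

Either let it run to completion, or set a flag, have the thread check it regularly, and exit when told to.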
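The flag approach can be sketched with threading.Event. This is a minimal Python 3 illustration, not the background program's actual plugin interface; StoppableWorker and its stop method are hypothetical names. The worker checks the flag on every loop iteration, and the same wait call doubles as its sleep, so a stop request is noticed immediately instead of after the next full interval.

```python
import threading


class StoppableWorker(threading.Thread):
    """A thread that checks a stop flag on every loop iteration."""

    def __init__(self, name, interval=0.05):
        super().__init__(name=name, daemon=True)
        self._stop_requested = threading.Event()
        self._interval = interval
        self.iterations = 0

    def run(self):
        # wait() returns True as soon as stop() sets the flag,
        # and False when the interval simply elapses.
        while not self._stop_requested.wait(self._interval):
            self.iterations += 1   # stand-in for the module's real work

    def stop(self, timeout=None):
        """Set the flag, wait for the thread to finish, report success."""
        self._stop_requested.set()
        self.join(timeout)
        return not self.is_alive()


if __name__ == "__main__":
    worker = StoppableWorker("Thread-test2")
    worker.start()
    threading.Event().wait(0.3)    # let it run for a moment
    print("stopped:", worker.stop(timeout=2), "iterations:", worker.iterations)
```

A module written this way gives the host program an exit function to call before it reloads the code, which is exactly the piece that was missing in the story above.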

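Further reading

《如何正確的終止正在運行的子線程》 (How to correctly terminate a running child thread): https://www.cnblogs.com/Creat...
《不要粗暴的銷燬python線程》 (Don't destroy Python threads by brute force): http://xiaorui.cc/2017/02/22/...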
