Normal thread exit and resource reclamation

In a multithreaded program I have been working on recently, I noticed that after a thread calls pthread_exit() and terminates, the process's VSZ does not shrink; as more such threads exit, the VSZ keeps growing.

At first I assumed the program was leaking memory somewhere, but after checking every place that calls new I found no leak.

Analysis with pmap showed that, compared with a run of the process in which no threads had exited, there were a few extra memory mappings like the ones below; everything else was identical.

pmap 19661

...................

00007f80eeb5c000      4K -----    [ anon ]
00007f80eeb5d000   8192K rwx--    [ anon ]

................................

Inspecting those addresses in gdb showed nothing meaningful.

valgrind did not report any problem either:

valgrind --tool=memcheck --leak-check=full -v --track-origins=yes --log-file=val.log --track-fds=yes --time-stamp=yes --show-reachable=yes my_app

So I began to suspect that the exiting threads were not releasing their resources.
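A minimal reproducer along these lines shows the same pattern (the worker function and the loop count are only illustrative): the threads are created with default attributes, i.e. joinable, they terminate via pthread_exit(), and nobody ever joins them, so the VSZ reported by ps grows by roughly 8M for every thread that has exited.

#include <pthread.h>
#include <unistd.h>

/* Each worker does nothing and exits; nobody ever joins it. */
static void *worker(void *arg)
{
    (void)arg;
    pthread_exit(NULL);   /* the thread is gone, but its stack stays mapped */
}

int main(void)
{
    for (int i = 0; i < 100; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, worker, NULL) != 0)
            break;
        sleep(1);         /* give the thread time to exit, then watch VSZ grow */
    }
    pause();              /* keep the process alive for pmap / ps -o vsz */
    return 0;
}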

Searching online turned up some articles on chinaunix about releasing thread resources safely; the advice boils down to the following:

If a thread is joinable, the main thread (or whichever thread is in charge of reaping threads) must call pthread_join() on it to reclaim its resources.

If you do not want the reaping thread to block, and would rather have the system reclaim the thread's resources automatically, i.e. without calling pthread_join(), then the thread must be detached.

Whether a thread is joinable or detached is controlled with pthread_attr_setdetachstate().

My reaping thread has other work to do and cannot afford to block for long, and timing the call (printing timestamps before and after pthread_join()) showed that pthread_join() could still wait up to about 5 seconds even when the target thread had already exited, so in the end I went with pthread_exit() plus a detached thread rather than pthread_exit() plus pthread_join().
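In code the adopted approach looks roughly like the sketch below (the worker function is illustrative): the thread is marked detached with pthread_attr_setdetachstate() before it is created, so when it later calls pthread_exit() the library reclaims its stack and control block on its own and no pthread_join() is needed. A thread that is already running joinable can also be switched over with pthread_detach().

#include <pthread.h>
#include <stdio.h>

/* Illustrative worker; the real thread body does the actual work. */
static void *worker(void *arg)
{
    (void)arg;
    /* ... do the work ... */
    pthread_exit(NULL);   /* detached: resources are reclaimed automatically */
}

int main(void)
{
    pthread_attr_t attr;
    pthread_t tid;

    pthread_attr_init(&attr);
    /* Create the thread detached instead of joining it later. */
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

    if (pthread_create(&tid, &attr, worker, NULL) != 0)
        fprintf(stderr, "pthread_create failed\n");
    pthread_attr_destroy(&attr);

    /* The reaping thread never blocks in pthread_join(). */
    pthread_exit(NULL);   /* let main finish without tearing down the detached thread */
}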


Now back to why the leftover mappings are 4K and 8M.

First, download the glibc source; pthread_create.c is under the nptl directory.

__pthread_create_2_1()->ALLOCATE_STACK()->allocate_stack()

/* Allocate some anonymous memory.  If possible use the cache.  */

-->get_cached_stack()

Attach gdb to the running application:

(gdb) p stack_cache       ===========> make sure gdb can read libc's symbols here
$3 = {next = 0x7f80edb599c0, prev = 0x7f80ed3589c0}

(gdb) p sizeof(struct pthread)
$6 = 2304               ==================> well under one page

(gdb) p *(struct pthread *)0x7f80eeb5b700
$9 = {{header = {tcb = 0x7f80eeb5b700, dtv = 0x1ff1190, self = 0x7f80eeb5b700, multiple_threads = 1, gscope_flag = 0, sysinfo = 0, stack_guard = 16092494444486863360, pointer_guard = 1023798611218601545, 
      vgetcpu_cache = {0, 0}, private_futex = 128, rtld_must_xmm_save = 0, __private_tm = {0x0, 0x0, 0x0, 0x0, 0x0}, __unused2 = 0, rtld_savespace_sse = {{{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 
            0, 0}}, {{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}}, {{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}}, {{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}}, {{0, 0, 
            0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}}, {{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}}, {{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}}, {{0, 0, 0, 0}, {0, 0, 0, 
            0}, {0, 0, 0, 0}, {0, 0, 0, 0}}}, __padding = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}, __padding = {0x7f80eeb5b700, 0x1ff1190, 0x7f80eeb5b700, 0x1, 0x0, 0xdf54066f81962a00, 
      0xe3543699f391649, 0x0, 0x0, 0x80, 0x0 <repeats 14 times>}}, list = {next = 0x7f80efb5d9c0, prev = 0x7f8106d82230}, tid = 8395, pid = 19661, robust_prev = 0x7f80eeb5b9e0, robust_head = {
    list = 0x7f80eeb5b9e0, futex_offset = -32, list_op_pending = 0x0}, cleanup = 0x0, cleanup_jmp_buf = 0x7f80eeb5af30, cancelhandling = 2, flags = 0, specific_1stblock = {{seq = 1, data = 0x11bbbc0}, {
      seq = 0, data = 0x0} <repeats 31 times>}, specific = {0x7f80eeb5ba10, 0x0 <repeats 31 times>}, specific_used = true, report_events = true, user_stack = false, stopped_start = false, 
  parent_cancelhandling = 0, lock = 0, setxid_futex = 0, cpuclock_offset = 2940773369682629, joinid = 0x0, result = 0x0, schedparam = {__sched_priority = 0}, schedpolicy = 0, 
  start_routine = 0x69e32f <base::Thread::ThreadRunner(void*)>, arg = 0x11fa5b0, eventbuf = {eventmask = {event_bits = {0, 0}}, eventnum = TD_ALL_EVENTS, eventdata = 0x0}, nextevent = 0x0, exc = {
    exception_class = 0, exception_cleanup = 0, private_1 = 0, private_2 = 0}, stackblock = 0x7f80ee35b000, stackblock_size = 8392704, guardsize = 4096, reported_guardsize = 4096, tpp = 0x0, res = {
    retrans = 0, retry = 0, options = 0, nscount = 0, nsaddr_list = {{sin_family = 0, sin_port = 0, sin_addr = {s_addr = 0}, sin_zero = "\000\000\000\000\000\000\000"}, {sin_family = 0, sin_port = 0, 
        sin_addr = {s_addr = 0}, sin_zero = "\000\000\000\000\000\000\000"}, {sin_family = 0, sin_port = 0, sin_addr = {s_addr = 0}, sin_zero = "\000\000\000\000\000\000\000"}}, id = 0, dnsrch = {0x0, 
      0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, defdname = '\000' <repeats 255 times>, pfcode = 0, ndots = 0, nsort = 0, ipv6_unavail = 0, unused = 0, sort_list = {{addr = {s_addr = 0}, mask = 0}, {addr = {
          s_addr = 0}, mask = 0}, {addr = {s_addr = 0}, mask = 0}, {addr = {s_addr = 0}, mask = 0}, {addr = {s_addr = 0}, mask = 0}, {addr = {s_addr = 0}, mask = 0}, {addr = {s_addr = 0}, mask = 0}, {
        addr = {s_addr = 0}, mask = 0}, {addr = {s_addr = 0}, mask = 0}, {addr = {s_addr = 0}, mask = 0}}, qhook = 0, rhook = 0, res_h_errno = 0, _vcsock = 0, _flags = 0, _u = {
      pad = '\000' <repeats 51 times>, _ext = {nscount = 0, nsmap = {0, 0, 0}, nssocks = {0, 0, 0}, nscount6 = 0, nsinit = 0, nsaddrs = {0x0, 0x0, 0x0}, initstamp = 0}}}, end_padding = 0x7f80eeb5b700 ""}
(gdb) p ((struct pthread *)0x7f80eeb5b700)->stackblock_size
$10 = 8392704           ============> 8M + 4K

Comparing this with the pmap output: each exited thread leaves behind exactly one stackblock_size = 8392704-byte block, i.e. the 4K page plus the 8192K region. The 4K mapping with no permissions (-----) is the stack guard page (guardsize = 4096 in the dump above), and the 8192K rwx mapping is the thread's stack; the struct pthread control block itself (2304 bytes) sits at the very top of that stack block, which is why 0x7f80eeb5b700 + 2304 equals stackblock + stackblock_size = 0x7f80eeb5c000.
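Both numbers are simply the glibc defaults here: the thread stack size is taken from the stack rlimit (ulimit -s, commonly 8192K), and the guard is one page. If you want a thread to report its own figures, something like this sketch works (pthread_getattr_np() is a GNU extension, and how the guard page is accounted for in the reported stack size varies a little between glibc versions):

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    (void)arg;
    pthread_attr_t attr;
    size_t stacksize = 0, guardsize = 0;

    /* Query the attributes of the running thread itself. */
    if (pthread_getattr_np(pthread_self(), &attr) == 0) {
        pthread_attr_getstacksize(&attr, &stacksize);
        pthread_attr_getguardsize(&attr, &guardsize);
        /* With default settings this prints a stack of about 8M and a 4K guard. */
        printf("stack %zu bytes, guard %zu bytes\n", stacksize, guardsize);
        pthread_attr_destroy(&attr);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, worker, NULL);
    pthread_join(tid, NULL);
    return 0;
}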

Judging from the addresses, the leftover stack blocks sit right next to one another: each one is an anonymous mmap, and successive anonymous mappings are placed adjacently, so the stacks of successively created threads end up at contiguous addresses. allocate_stack() also maintains the stack_cache list printed above, which holds the blocks of already-reclaimed threads so they can be reused instead of being mapped afresh.

Where do these blocks get released, then? Take a look at pthread_join():

pthread_join()->__free_tcb()->__deallocate_stack()

This is why a joinable thread needs an explicit pthread_join(): until someone joins it, the exited thread's stack block is never handed back.
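To make the mechanism concrete, here is a heavily simplified toy version of the idea behind allocate_stack() and __deallocate_stack() (illustrative only, not the real glibc code): stack blocks are anonymous mmap()s, reclaimed blocks are parked on a cache list, and a new thread reuses a cached block of the right size before mapping a fresh one. The point the sketch mirrors is that the release path only runs when the thread is reclaimed: for a joinable thread that means only when pthread_join() is called, while for a detached thread the library runs it as part of the thread's own exit.

#define _GNU_SOURCE
/* Toy illustration of the stack cache idea; not the actual glibc code. */
#include <stddef.h>
#include <sys/mman.h>

struct cached_stack {
    struct cached_stack *next;
    size_t size;
};

static struct cached_stack *stack_cache_list;   /* stacks of reclaimed threads */

/* Get a stack block: reuse a cached one if the size matches, else mmap a new one. */
static void *toy_allocate_stack(size_t size)
{
    for (struct cached_stack **p = &stack_cache_list; *p != NULL; p = &(*p)->next) {
        if ((*p)->size == size) {
            struct cached_stack *hit = *p;
            *p = hit->next;                     /* unlink it from the cache */
            return hit;
        }
    }
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return mem == MAP_FAILED ? NULL : mem;
}

/* Give a stack block back: park it on the cache instead of unmapping it.
   (glibc additionally trims the cache once it grows past a limit.) */
static void toy_deallocate_stack(void *stack, size_t size)
{
    struct cached_stack *c = stack;
    c->size = size;
    c->next = stack_cache_list;
    stack_cache_list = c;
}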

