Debugging Native Process Memory Leaks on Android (Part 1)

Based on Android 5.0

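Android gives Java programs convenient memory-leak information and tools (such as MAT), which makes leaks easy to track down. For native processes written purely in C/C++, leaks are much harder to find. Traditional C/C++ programs can use valgrind or various code-checking tools. Fortunately, Google's bionic library offers a very handy API for this job: get_malloc_leak_info. With it, we can obtain backtraces that point to the places suspected of leaking memory.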

How the code works

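We can set the memory debug level (debug_level) with adb shell setprop libc.debug.malloc 1. Because the level is read while libc initializes in each process (see malloc_debug_init below), the property has to be set before the target process starts. The levels are explained in more detail in the comment in bionic/libc/bionic/malloc_debug_common.cpp: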

// Handle to shared library where actual memory allocation is implemented.
// This library is loaded and memory allocation calls are redirected there
// when libc.debug.malloc environment variable contains value other than
// zero:
// 1  - For memory leak detections.
// 5  - For filling allocated / freed memory with patterns defined by
//      CHK_SENTINEL_VALUE, and CHK_FILL_FREE macros.
// 10 - For adding pre-, and post- allocation stubs in order to detect
//      buffer overruns.
// Note that emulator's memory allocation instrumentation is not controlled by
// libc.debug.malloc value, but rather by emulator, started with -memcheck
// option. Note also, that if emulator has started with -memcheck option,
// emulator's instrumented memory allocation will take over value saved in
// libc.debug.malloc. In other words, if emulator has started with -memcheck
// option, libc.debug.malloc value is ignored.
// Actual functionality for debug levels 1-10 is implemented in
// libc_malloc_debug_leak.so, while functionality for emulator's instrumented
// allocations is implemented in libc_malloc_debug_qemu.so and can be run inside
// the emulator only.

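get_malloc_leak_info() is also defined in malloc_debug_common.cpp; read the source if you want to explore its implementation in full.

For each memory debug level (debug_level), malloc_dispatch_table points to a different set of allocation functions, so allocation and deallocation go through different function versions depending on the level.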
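The dispatch idea can be illustrated with a small table of function pointers. This is only a sketch under assumptions: SketchDispatch, select_dispatch, and the counting allocator are hypothetical names invented for illustration, not bionic's actual types, and only levels 0 and 1 are modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical, simplified stand-in for a malloc dispatch table.
struct SketchDispatch {
    void* (*malloc_fn)(size_t);
    void (*free_fn)(void*);
};

static size_t g_counted_mallocs = 0;  // bumped only by the "debug" variant

static void* plain_malloc(size_t n) { return std::malloc(n); }
static void plain_free(void* p) { std::free(p); }

// Debug variant: does extra bookkeeping, then forwards to the real allocator.
static void* counting_malloc(size_t n) {
    ++g_counted_mallocs;
    return std::malloc(n);
}

static const SketchDispatch kDefaultTable = {plain_malloc, plain_free};
static const SketchDispatch kLeakTable = {counting_malloc, plain_free};

// Every allocation in the sketch goes through this pointer.
static const SketchDispatch* g_active = &kDefaultTable;

// Pick the table for a debug level (assumption: only 0 and 1 are modelled).
void select_dispatch(int debug_level) {
    g_active = (debug_level == 1) ? &kLeakTable : &kDefaultTable;
}

void* sketch_malloc(size_t n) { return g_active->malloc_fn(n); }
void sketch_free(void* p) { g_active->free_fn(p); }
```

Callers only ever use sketch_malloc and sketch_free; switching the table changes which implementation runs without touching any call site.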

The detailed code path is as follows:

// Initializes memory allocation framework.
// This routine is called from __libc_init routines implemented
// in libc_init_static.c and libc_init_dynamic.c files.
extern "C" __LIBC_HIDDEN__ void malloc_debug_init() {
#if !defined(LIBC_STATIC)
  static pthread_once_t malloc_init_once_ctl = PTHREAD_ONCE_INIT;
  if (pthread_once(&malloc_init_once_ctl, malloc_init_impl)) {
    error_log("Unable to initialize malloc_debug component.");
  }
#endif  // !LIBC_STATIC
}

As the code comment says, the __libc_init routines (in libc_init_static.c and libc_init_dynamic.c) call malloc_debug_init() to initialize the framework, which in turn calls malloc_init_impl(). pthread_once guarantees that malloc_init_impl() runs only once per process.

malloc_init_impl() first opens a shared library, then resolves the malloc_debug_initialize symbol from it and calls it. When debug_level is 1, 5, or 10, it opens libc_malloc_debug_leak.so, where malloc_debug_initialize() is implemented in malloc_debug_check.cpp. When debug_level is 20, it opens libc_malloc_debug_qemu.so, where malloc_debug_initialize() is implemented in malloc_debug_qemu.cpp.

Next, it resolves a different set of memory functions (malloc/free/calloc/realloc/memalign) for each debug_level. For levels 1, 5, and 10, the various versions of malloc/free/calloc/realloc/memalign are implemented in bionic/libc/bionic/malloc_debug_leak.cpp and malloc_debug_check.cpp.
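The level-to-library mapping described above can be written down as a small function. This is a sketch only: debug_library_for_level is a hypothetical helper, and returning nullptr for every other value is an assumption. The real malloc_init_impl() also reads the property, opens the library, and resolves the symbols.

```cpp
#include <cassert>
#include <cstring>

// Returns the debug library the article associates with a debug level,
// or nullptr when no debug library would be loaded (assumption for the
// levels the article does not mention).
const char* debug_library_for_level(int debug_level) {
    switch (debug_level) {
        case 1:   // memory leak detection
        case 5:   // fill allocated / freed memory with patterns
        case 10:  // pre- and post-allocation stubs for overrun detection
            return "libc_malloc_debug_leak.so";
        case 20:  // emulator instrumentation
            return "libc_malloc_debug_qemu.so";
        default:
            return nullptr;
    }
}
```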

When debug_level is 1 (memory leak detection), the implementation records a backtrace for each allocation.

leak_malloc() is implemented as follows:

extern "C" void* leak_malloc(size_t bytes) {
    if (DebugCallsDisabled()) {
        return g_malloc_dispatch->malloc(bytes);
    }

    // allocate enough space infront of the allocation to store the pointer for
    // the alloc structure. This will making free'ing the structer really fast!

    // 1. allocate enough memory and include our header
    // 2. set the base pointer to be right after our header

    size_t size = bytes + sizeof(AllocationEntry);
    if (size < bytes) { // Overflow.
        errno = ENOMEM;
        return NULL;
    }

    void* base = g_malloc_dispatch->malloc(size);
    if (base != NULL) {
        ScopedPthreadMutexLocker locker(&g_hash_table->lock);

        uintptr_t backtrace[BACKTRACE_SIZE];
        size_t numEntries = GET_BACKTRACE(backtrace, BACKTRACE_SIZE);

        AllocationEntry* header = reinterpret_cast<AllocationEntry*>(base);
        header->entry = record_backtrace(backtrace, numEntries, bytes);
        header->guard = GUARD;

        // now increment base to point to after our header.
        // this should just work since our header is 8 bytes.
        base = reinterpret_cast<AllocationEntry*>(base) + 1;
    }

    return base;
}

extern bool g_backtrace_enabled;

#define GET_BACKTRACE(bt, depth) \
  (g_backtrace_enabled ? get_backtrace(bt, depth) : 0)

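This version of malloc allocates an extra block in front of the requested bytes to hold an AllocationEntry. Once the allocation succeeds, it declares a 32-element array (BACKTRACE_SIZE) for the call-stack addresses and uses the GET_BACKTRACE macro to save the current call stack, that is, the addresses of the calling functions, into backtrace. It then calls record_backtrace to record the stack and makes the entry member of the AllocationEntry point to the result. record_backtrace looks the stack up by hash value in the global call-stack table g_hash_table; if no match is found, it creates a new call-stack entry and adds it to the table. Finally, base is advanced past the header, and that address is returned as the allocated memory.

So this version of malloc additionally records call-stack information. By placing a header in front of the allocated block, it stores the entry that leads back to the call-stack record in the hash table.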
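The header-in-front technique is easy to try in isolation. The following is a minimal sketch, not bionic's code: SketchHeader, tracked_malloc, header_of, tracked_free, and the guard value are all hypothetical, and the entry pointer is left opaque instead of pointing at a real backtrace record.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Arbitrary marker chosen for this sketch (not bionic's GUARD value).
static const uint32_t kSketchGuard = 0xfeedfaceu;

struct SketchHeader {
    void* entry;     // would point at the shared backtrace record
    uint32_t guard;  // marks the header as written by tracked_malloc
};

// Allocate bytes plus a header and return the address just past the header.
void* tracked_malloc(size_t bytes, void* entry) {
    size_t size = bytes + sizeof(SketchHeader);
    if (size < bytes) {  // overflow
        errno = ENOMEM;
        return nullptr;
    }
    void* base = std::malloc(size);
    if (base == nullptr) return nullptr;
    SketchHeader* header = static_cast<SketchHeader*>(base);
    header->entry = entry;
    header->guard = kSketchGuard;
    return header + 1;
}

// Step back from the user pointer to the header stored in front of it.
SketchHeader* header_of(void* mem) {
    return static_cast<SketchHeader*>(mem) - 1;
}

// Free a block from tracked_malloc; returns false if the guard is wrong.
bool tracked_free(void* mem) {
    if (mem == nullptr) return true;
    SketchHeader* header = header_of(mem);
    if (header->guard != kSketchGuard) return false;
    header->guard = 0;
    std::free(header);
    return true;
}
```

Finding the bookkeeping data on free is a constant-time pointer subtraction, which is why the header is stored directly in front of the user block. The sketch assumes sizeof(SketchHeader) is a multiple of the alignment that callers need.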

Next comes record_backtrace. Before analyzing its code, look at the structures it uses (from malloc_debug_common.h):

#define HASHTABLE_SIZE      1543

// =============================================================================
// Structures
// =============================================================================

struct HashEntry {
    size_t slot;
    HashEntry* prev;
    HashEntry* next;
    size_t numEntries;
    // fields above "size" are NOT sent to the host
    size_t size;
    size_t allocations;
    uintptr_t backtrace[0];
};

struct HashTable {
    pthread_mutex_t lock;
    size_t count;
    HashEntry* slots[HASHTABLE_SIZE];
};

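Each process has one global variable, g_hash_table, which records the call stacks that ended in a malloc call. Its type is HashTable, which contains slots, an array of 1543 (HASHTABLE_SIZE) HashEntry pointers; the count member records how many HashEntry items the table currently holds. A hash value is computed from backtrace and numEntries, and taking it modulo HASHTABLE_SIZE gives the index of the HashEntry in the array, so the slot for a call stack can be found quickly from the stack itself.

The other structure is HashEntry. Because it has prev and next pointers, it also forms a doubly linked list: entries that map to the same slot are chained together, with each new entry inserted at the head of the chain. Its first member, slot, is its own index in the array, derived from the hash. Its last member, backtrace[0], holds the call-stack addresses; how many there are is recorded in numEntries. size is the size of the allocation, and allocations is the allocation count, that is, how many live allocations share the same call path and size.

The relationship between the two structures is shown in the following figure:

[Figure: relationship between HashTable and HashEntry]

When leak_malloc calls record_backtrace to record the stack, it first computes a hash from backtrace and numEntries, then takes it modulo HASHTABLE_SIZE to get the slot index in g_hash_table. It then checks whether a matching entry already exists, that is, a HashEntry with the same allocation size, the same call path, and the same number of recorded addresses. If one exists, it simply increments allocations on that entry. Otherwise it creates a new one: it allocates memory for the HashEntry structure and copies the call stack into its last member, backtrace. Finally, it increments the table's count.

In this way record_backtrace adds backtrace information to the global table: it either adds a new HashEntry or increments the allocations count of an existing one.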

static HashEntry* record_backtrace(uintptr_t* backtrace, size_t numEntries, size_t size) {
    size_t hash = get_hash(backtrace, numEntries);
    size_t slot = hash % HASHTABLE_SIZE;

    if (size & SIZE_FLAG_MASK) {
        debug_log("malloc_debug: allocation %zx exceeds bit width\n", size);
        abort();
    }

    if (gMallocLeakZygoteChild) {
        size |= SIZE_FLAG_ZYGOTE_CHILD;
    }

    HashEntry* entry = find_entry(g_hash_table, slot, backtrace, numEntries, size);

    if (entry != NULL) {
        entry->allocations++;
    } else {
        // create a new entry
        entry = static_cast<HashEntry*>(g_malloc_dispatch->malloc(sizeof(HashEntry) + numEntries*sizeof(uintptr_t)));
        if (!entry) {
            return NULL;
        }
        entry->allocations = 1;
        entry->slot = slot;
        entry->prev = NULL;
        entry->next = g_hash_table->slots[slot];
        entry->numEntries = numEntries;
        entry->size = size;

        memcpy(entry->backtrace, backtrace, numEntries * sizeof(uintptr_t));

        g_hash_table->slots[slot] = entry;

        if (entry->next != NULL) {
            entry->next->prev = entry;
        }

        // we just added an entry, increase the size of the hashtable
        g_hash_table->count++;
    }

    return entry;
}
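record_backtrace relies on two helpers, get_hash and find_entry, that are not shown in this article. The sketch below uses stand-ins for both to show how the lookup and the head insertion fit together; sketch_hash, sketch_find, sketch_record, and the fixed-size backtrace array are assumptions made for illustration, and the real helpers may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

static const size_t kSlots = 1543;  // same value as HASHTABLE_SIZE
static const size_t kMaxFrames = 32;

struct SketchEntry {
    SketchEntry* next;
    size_t numEntries;
    size_t size;
    size_t allocations;
    uintptr_t backtrace[kMaxFrames];  // fixed size keeps the sketch simple
};

struct SketchTable {
    size_t count;
    SketchEntry* slots[kSlots];
};

// Stand-in hash: mixes every frame address into one value.
size_t sketch_hash(const uintptr_t* bt, size_t n) {
    size_t h = 0;
    for (size_t i = 0; i < n; ++i) h = h * 33 + (bt[i] >> 2);
    return h;
}

// Stand-in lookup: walk the slot's chain for an identical stack and size.
SketchEntry* sketch_find(SketchTable* t, size_t slot, const uintptr_t* bt,
                         size_t n, size_t size) {
    for (SketchEntry* e = t->slots[slot]; e != nullptr; e = e->next) {
        if (e->size == size && e->numEntries == n &&
            std::memcmp(e->backtrace, bt, n * sizeof(uintptr_t)) == 0) {
            return e;
        }
    }
    return nullptr;
}

// Bump the count of a matching entry, or insert a new one at the chain head.
SketchEntry* sketch_record(SketchTable* t, const uintptr_t* bt, size_t n,
                           size_t size) {
    if (n > kMaxFrames) n = kMaxFrames;
    size_t slot = sketch_hash(bt, n) % kSlots;
    SketchEntry* e = sketch_find(t, slot, bt, n, size);
    if (e != nullptr) {
        e->allocations++;
        return e;
    }
    e = static_cast<SketchEntry*>(std::calloc(1, sizeof(SketchEntry)));
    if (e == nullptr) return nullptr;
    e->next = t->slots[slot];
    e->numEntries = n;
    e->size = size;
    e->allocations = 1;
    std::memcpy(e->backtrace, bt, n * sizeof(uintptr_t));
    t->slots[slot] = e;
    t->count++;
    return e;
}
```

Recording the same stack and size twice yields one entry with allocations equal to 2, while the same stack with a different size gets its own entry in the same chain.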

leak_free(), in turn, releases the corresponding call-stack entry in the global hash table:

extern "C" void leak_free(void* mem) {
  if (DebugCallsDisabled()) {
    return g_malloc_dispatch->free(mem);
  }

  if (mem == NULL) {
    return;
  }

  ScopedPthreadMutexLocker locker(&g_hash_table->lock);

  // check the guard to make sure it is valid
  AllocationEntry* header = to_header(mem);

  if (header->guard != GUARD) {
    // could be a memaligned block
    if (header->guard == MEMALIGN_GUARD) {
      // For memaligned blocks, header->entry points to the memory
      // allocated through leak_malloc.
      header = to_header(header->entry);
    }
  }

  if (header->guard == GUARD || is_valid_entry(header->entry)) {
    // decrement the allocations
    HashEntry* entry = header->entry;
    entry->allocations--;
    if (entry->allocations <= 0) {
      remove_entry(entry);
      g_malloc_dispatch->free(entry);
    }

    // now free the memory!
    g_malloc_dispatch->free(header);
  } else {
    debug_log("WARNING bad header guard: '0x%x'! and invalid entry: %p\n",
              header->guard, header->entry);
  }
}

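The function takes the address returned by malloc(). It first checks whether mem is NULL; if so, it returns without doing anything. Otherwise it retrieves the AllocationEntry header and validates its guard value, following MEMALIGN_GUARD to the underlying block for memaligned allocations; if the header turns out to be invalid, it only logs a warning. From a valid header it obtains entry, of type HashEntry*. It then decrements allocations; when this count reaches zero (the code tests <= 0), the entry is removed from the hash table and its memory is freed. Finally, regardless of the value of allocations, the block allocated by malloc() is freed.

Therefore, the entries that remain in the global table are the call stacks of malloc calls whose memory has been allocated but not yet freed.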
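The bookkeeping side of leak_free can be sketched as follows. remove_entry is not shown in this article, so chain_remove below is a guess at what unlinking from a doubly linked chain involves; ChainEntry, ChainTable, and the three functions are hypothetical names, and the table is kept tiny on purpose.

```cpp
#include <cassert>
#include <cstddef>

struct ChainEntry {
    size_t slot;
    ChainEntry* prev;
    ChainEntry* next;
    size_t allocations;
};

struct ChainTable {
    size_t count;
    ChainEntry* slots[8];  // tiny table for illustration only
};

// Insert at the head of the slot's chain, as record_backtrace does.
void chain_insert(ChainTable* t, ChainEntry* e) {
    e->prev = nullptr;
    e->next = t->slots[e->slot];
    if (e->next != nullptr) e->next->prev = e;
    t->slots[e->slot] = e;
    t->count++;
}

// Unlink an entry from its chain and fix up the slot head if needed.
void chain_remove(ChainTable* t, ChainEntry* e) {
    if (e->prev != nullptr) {
        e->prev->next = e->next;
    } else {
        t->slots[e->slot] = e->next;
    }
    if (e->next != nullptr) e->next->prev = e->prev;
    e->prev = nullptr;
    e->next = nullptr;
    t->count--;
}

// Drop one allocation; returns true when the entry was unlinked because
// its last live allocation was released.
bool chain_release(ChainTable* t, ChainEntry* e) {
    if (e->allocations > 0) e->allocations--;
    if (e->allocations == 0) {
        chain_remove(t, e);
        return true;
    }
    return false;
}
```

An entry therefore stays in the table exactly as long as at least one allocation made through that call path is still live.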

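So how do we obtain a process's malloc allocation state? That is the job of the API that bionic provides, get_malloc_leak_info(), which retrieves memory-leak information. Call stacks are recorded at allocation time and cleared at free time, so whatever remains is likely to be where memory is leaking.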

// Retrieve native heap information.
//
// "*info" is set to a buffer we allocate
// "*overallSize" is set to the size of the "info" buffer
// "*infoSize" is set to the size of a single entry
// "*totalMemory" is set to the sum of all allocations we're tracking; does
//   not include heap overhead
// "*backtraceSize" is set to the maximum number of entries in the back trace

// =============================================================================
// Exported for use by ddms.
// =============================================================================
extern "C" void get_malloc_leak_info(uint8_t** info, size_t* overallSize,
    size_t* infoSize, size_t* totalMemory, size_t* backtraceSize) {
  // Don't do anything if we have invalid arguments.
  if (info == NULL || overallSize == NULL || infoSize == NULL ||
    totalMemory == NULL || backtraceSize == NULL) {
    return;
  }
  *totalMemory = 0;

  ScopedPthreadMutexLocker locker(&g_hash_table.lock);
  if (g_hash_table.count == 0) {
    *info = NULL;
    *overallSize = 0;
    *infoSize = 0;
    *backtraceSize = 0;
    return;
  }

  HashEntry** list = static_cast<HashEntry**>(Malloc(malloc)(sizeof(void*) * g_hash_table.count));

  // Get the entries into an array to be sorted.
  size_t index = 0;
  for (size_t i = 0 ; i < HASHTABLE_SIZE ; ++i) {
    HashEntry* entry = g_hash_table.slots[i];
    while (entry != NULL) {
      list[index] = entry;
      *totalMemory = *totalMemory + ((entry->size & ~SIZE_FLAG_MASK) * entry->allocations);
      index++;
      entry = entry->next;
    }
  }

  // XXX: the protocol doesn't allow variable size for the stack trace (yet)
  *infoSize = (sizeof(size_t) * 2) + (sizeof(uintptr_t) * BACKTRACE_SIZE);
  *overallSize = *infoSize * g_hash_table.count;
  *backtraceSize = BACKTRACE_SIZE;

  // now get a byte array big enough for this
  *info = static_cast<uint8_t*>(Malloc(malloc)(*overallSize));
  if (*info == NULL) {
    *overallSize = 0;
    Malloc(free)(list);
    return;
  }

  qsort(list, g_hash_table.count, sizeof(void*), hash_entry_compare);

  uint8_t* head = *info;
  const size_t count = g_hash_table.count;
  for (size_t i = 0 ; i < count ; ++i) {
    HashEntry* entry = list[i];
    size_t entrySize = (sizeof(size_t) * 2) + (sizeof(uintptr_t) * entry->numEntries);
    if (entrySize < *infoSize) {
      // We're writing less than a full entry, clear out the rest.
      memset(head + entrySize, 0, *infoSize - entrySize);
    } else {
      // Make sure the amount we're copying doesn't exceed the limit.
      entrySize = *infoSize;
    }
    memcpy(head, &(entry->size), entrySize);
    head += *infoSize;
  }

  Malloc(free)(list);
}

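get_malloc_leak_info() takes five parameters, each the address of a variable that the call modifies. As the code comment says:
*info points to a single block of memory allocated inside the function; its size is *overallSize.
The block consists of a number of records, each *infoSize bytes long. A record has the same layout as HashEntry from the size member onward: first size, the size of the malloc allocation; then allocations, the number of allocations sharing the same call stack; and finally backtrace, with room for 32 (BACKTRACE_SIZE) pointer values. The block that *info points to therefore contains overallSize/infoSize records. Note that the backtrace array in HashEntry is allocated for the actual number of frames, whereas here every record reserves 32 slots, and unused trailing slots are set to 0.
*totalMemory is the total size of all memory currently tracked as allocated by malloc.
*backtraceSize, the last parameter, is 32 (BACKTRACE_SIZE).

The function first validates its arguments and checks whether the global table has any entries. It then allocates an array, list, with one pointer for each HashEntry in g_hash_table. While walking the hash table it fills in list and accumulates totalMemory, the amount of memory allocated but not yet freed, for the caller. Next it sets infoSize, overallSize, and backtraceSize, and allocates overallSize bytes for info. At this point list holds every HashEntry; the function sorts list, then iterates over it and copies each entry's size, allocations, and backtrace (at most 32 frames) into the memory that info points to. info is handed back to the caller, so calling get_malloc_leak_info() yields the process's malloc call stacks. Its counterpart, free_malloc_leak_info(), frees the memory that info points to.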
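The record layout described above can be decoded with a few memcpy calls. This sketch works on any buffer with that layout, so it can be tried off-device with synthetic data; LeakRecord and parse_leak_info are hypothetical names. It leaves the size field raw, although the listing suggests it may carry flag bits (SIZE_FLAG_MASK) that a real consumer would mask off.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct LeakRecord {
    size_t size;         // raw size field, flag bits not masked
    size_t allocations;  // live allocations sharing this call stack
    std::vector<uintptr_t> frames;  // trailing zero slots are dropped
};

// Decode a buffer laid out as consecutive records of infoSize bytes:
// { size_t size; size_t allocations; uintptr_t backtrace[backtraceSize]; }
std::vector<LeakRecord> parse_leak_info(const uint8_t* info, size_t overallSize,
                                        size_t infoSize, size_t backtraceSize) {
    std::vector<LeakRecord> out;
    const size_t needed =
        2 * sizeof(size_t) + backtraceSize * sizeof(uintptr_t);
    if (info == nullptr || infoSize < needed) return out;
    for (size_t off = 0; off + infoSize <= overallSize; off += infoSize) {
        const uint8_t* p = info + off;
        LeakRecord rec;
        std::memcpy(&rec.size, p, sizeof(size_t));
        std::memcpy(&rec.allocations, p + sizeof(size_t), sizeof(size_t));
        p += 2 * sizeof(size_t);
        for (size_t i = 0; i < backtraceSize; ++i) {
            uintptr_t frame;
            std::memcpy(&frame, p + i * sizeof(uintptr_t), sizeof(frame));
            if (frame == 0) break;  // zero padding marks the end of the stack
            rec.frames.push_back(frame);
        }
        out.push_back(rec);
    }
    return out;
}
```

On a device, the four arguments would come from a call to get_malloc_leak_info(&info, &overallSize, &infoSize, &totalMemory, &backtraceSize), and the buffer would be released with free_malloc_leak_info(info) after parsing.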

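Summary

When a program finishes running, all of its memory should generally have been freed, so we can call get_malloc_leak_info at that point to obtain the call-stack entries that were never released. In principle, these are the leaks. In practice, however, some memory may legitimately still be in use at the moment we call get_malloc_leak_info.
In addition, the process under inspection is sometimes a daemon that never exits, so some of its memory is meant to stay allocated for good. In that case we can compare snapshots taken in a known state: for example, call get_malloc_leak_info once when the process first reaches its idle state, then call it again after the process has performed some operations and returned to idle. Comparing the two results, the call-stack entries that are new in the second snapshot are the leak suspects.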
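The snapshot comparison can be sketched like this, assuming each snapshot has already been decoded into a map from (allocation size, backtrace) to allocation count. StackKey, Snapshot, and diff_snapshots are hypothetical names. Besides brand-new entries, the sketch also reports entries whose count grew, which goes slightly beyond the comparison described above but catches a call path that leaks repeatedly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Key: (allocation size, backtrace frames). Value: live allocation count.
typedef std::pair<size_t, std::vector<uintptr_t> > StackKey;
typedef std::map<StackKey, size_t> Snapshot;

// Returns the entries that are new in `after` or whose count grew since
// `before`; the mapped value is the increase in live allocations.
Snapshot diff_snapshots(const Snapshot& before, const Snapshot& after) {
    Snapshot grown;
    for (Snapshot::const_iterator it = after.begin(); it != after.end(); ++it) {
        Snapshot::const_iterator old = before.find(it->first);
        size_t old_count = (old == before.end()) ? 0 : old->second;
        if (it->second > old_count) {
            grown[it->first] = it->second - old_count;
        }
    }
    return grown;
}
```

Taking the first snapshot at idle and the second after returning to idle keeps long-lived but legitimate allocations out of the result, since they appear in both snapshots with the same count.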

Author: 斐然成章
