Inspecting the jemalloc Memory Layout with GDB

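While studying this topic I mainly followed @杏林小軒's jemalloc series:

jemalloc 3.6.0 Source Code Explained [0]: Basics

jemalloc 3.6.0 Source Code Explained [1]: Arena

jemalloc 3.6.0 Source Code Explained [2]: Chunk

jemalloc 3.6.0 Source Code Explained [3]: Run and bins

jemalloc 3.6.0 Source Code Explained [4]: Thread caches

jemalloc 3.6.0 Source Code Explained [5]: Allocation and its implementation

jemalloc 3.6.0 Source Code Explained [6]: Deallocation and its implementation

Some of that material is reproduced here for easier review. This article adds how to inspect jemalloc's memory layout on an Android device.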

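1. Concepts

 jemalloc partitions memory in the following order, from the top level down:

  1. Memory is managed by a number of arenas.
  2. An arena is split into chunks, which mainly hold the bookkeeping information.
  3. A chunk contains a number of runs, which are the basic unit for serving small allocations.
  4. A run is made of pages and is ultimately divided into a fixed number of regions.
  5. For small-size allocation requests, these regions are the user memory.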



2. Inspecting jemalloc memory with shadow

Shadow usage and configuration: Shadow

2.1 Inspecting arenas

Arena pointer type: (arena_t *)
Corresponding struct: struct arena_s
(gdb) jearenas 
index    address         bins    chunks    threads
------------------------------------------------------
0        0x7623a00140    36      17               
                                                  
1        0x7623a8fc00    36      4 
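There are two arenas in total. Each arena has 36 bins, but the chunk counts differ: 17 and 4, for 21 chunks altogether.
Note that by default the number of arenas is 4 times the number of CPU cores, whereas Android caps it at 2.

In fact, the arena pointers are stored in the je_arenas array, and shadow's jearenas command reads the arenas from that array: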
(gdb) p je_arenas
$1 = (arena_t **) 0x7623a87c00
(gdb) p *je_arenas@2
$2 = {0x7623a00140, 0x7623a8fc00}
The jearenas output above shows that the two arena pointers are
{0x7623a00140, 0x7623a8fc00}, of type (arena_t *).
The contents of an arena can be printed with the following command:
(gdb) p *((arena_t *)0x7623a00140)
$4 = {
  ind = 0, 
  nthreads = {10, 10}, 
  lock = {
    lock = {
      __private = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
    }, 
    witness = {
      name = 0x0, 
      rank = 0, 
      comp = 0x0, 
      link = {
        qre_next = 0x0, 
        qre_prev = 0x0
      }
    }
  }, 
  stats = {
  .....

2.2 Inspecting chunks

Chunk pointer type: (arena_chunk_t *)
struct arena_chunk_s {
	/*
	 * A pointer to the arena that owns the chunk is stored within the node.
	 * This field as a whole is used by chunks_rtree to support both
	 * ivsalloc() and core-based debugging.
	 */
	extent_node_t		node;

	/*
	 * True if memory could be backed by transparent huge pages.  This is
	 * only directly relevant to Linux, since it is the only supported
	 * platform on which jemalloc interacts with explicit transparent huge
	 * page controls.
	 */
	bool			hugepage;

	/*
	 * Map of pages within chunk that keeps track of free/large/small.  The
	 * first map_bias entries are omitted, since the chunk header does not
	 * need to be tracked in the map.  This omission saves a header page
	 * for common chunk sizes (e.g. 4 MiB).
	 */
	arena_chunk_map_bits_t	map_bits[1]; /* Dynamically sized. */
};
[Figure: chunk memory layout]

List the chunks:

(gdb) jechunks 
addr            arena           no_runs
-------------------------------------------
0x7604800000    0x7623a00140    4      
0x7604a00000    0x7623a00140    13     
0x7605600000    0x7623a00140    58     
0x7605800000    0x7623a8fc00    12     
0x7605a00000    0x7623a00140    99     
0x7606000000    0x7623a00140    42     
0x7606200000    0x7623a00140    73     
0x7607800000    0x7623a8fc00    103    
0x7607a00000    0x7623a00140    69     
0x7607c00000    0x7623a8fc00    53     
0x7608400000    0x7623a00140    91     
0x7613400000    0x7623a00140    94     
0x7613600000    0x7623a00140    37     
0x7613800000    0x7623a00140    31     
0x7613a00000    0x7623a00140    23     
0x7613c00000    0x7623a00140    33     
0x7613e00000    0x7623a00140    22     
0x7614000000    0x7623a00140    7      
0x7615e00000    0x7623a00140    171    
0x7618a00000    0x7623a8fc00    158    
0x7623800000    0x7623a00140    174 
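There are 21 chunks in total, which matches the count reported by jearenas above.
Besides the address of each chunk (arena_chunk_t *) and its owning arena, the listing shows the number of runs in each chunk.
Inspect one chunk: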
(gdb) p *(arena_chunk_t*)0x7604800000
$12 = {
  node = {
    en_arena = 0x7623a00140, /* arena */
    en_addr = 0x7604800000, 
    en_size = 2097152, /* Total region size. */
    en_sn = 16, 
    en_zeroed = true, 
    en_committed = true, 
    en_achunk = true, 
    en_prof_tctx = 0x0, 
    rd = {
      rd_link = {
        qre_next = 0x0, 
        qre_prev = 0x0
      }
    }, 
    cc_link = {
      qre_next = 0x0, 
      qre_prev = 0x0
    }, 
    {
      szsnad_link = {
        rbn_left = 0x7623800000, 
        rbn_right_red = 0x7604a00000
      }, 
      ql_link = {
        qre_next = 0x7623800000, 
        qre_prev = 0x7604a00000
      }
    }, 
    ad_link = {
      rbn_left = 0x0, 
      rbn_right_red = 0x0
    }
  }, 
  hugepage = true, 
  map_bits = {{
      bits = 2015216
    }}
}
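This chunk belongs to the arena at 0x7623a00140, and its size is 2 MiB (en_size = 2097152).

Chunks are known to be stored in the je_chunks_rtree data structure.
I have not yet worked out how to traverse that structure; I will add it here once I do.

Inspect a single chunk in detail: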
(gdb) jechunk 0x7623800000
This chunk belongs to the arena at 0x7623a00140.

addr            info                  size       usage  
------------------------------------------------------------
0x7623800000    headers               0xd000     -      
0x762380d000    small run (0x1c00)    0x7000     4/4    
0x7623814000    small run (0xc00)     0x3000     4/4    
0x7623817000    small run (0x50)      0x5000     256/256
0x762381c000    small run (0x20)      0x1000     128/128
0x762381d000    small run (0x1000)    0x1000     1/1    
0x762381e000    small run (0x1000)    0x1000     1/1    
0x762381f000    small run (0x1000)    0x1000     1/1    
0x7623820000    small run (0x200)     0x1000     8/8    
0x7623821000    small run (0x50)      0x5000     256/256
0x7623826000    small run (0x20)      0x1000     128/128
...
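The map_bits array in arena_chunk_t records the state of every page in the chunk after offset 0xd000 (je_map_bias pages, 13 here).
For example, the first run is a small run of size 0x7000, i.e. 7 pages, and those 7 pages are divided evenly into 4 regions of 0x1c00 bytes each.

Each run spans a whole number of pages and is divided into N equally sized regions, so the run size is always a multiple of the page size.
Inspect a run: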

(gdb) jerun 0x762380d000
*    status    address         preview         
---------------------------------------------------
0    used      0x762380d000    0000007623812400
1    used      0x762380ec00    0000000000000000
2    used      0x7623810800    0000007613505000
3    used      0x7623812400    0000007613506c00
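This run is the first run of the current chunk. It contains 4 regions, all in use, and the start address of each region is listed.
Now look at the state of every page of this first run through the chunk's map_bits: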
(gdb) p /t ((arena_chunk_t*)0x7623800000)->map_bits[0]
$155 = {
  bits = 1111100001
}
(gdb) p /t ((arena_chunk_t*)0x7623800000)->map_bits[1]
$156 = {
  bits = 10001111100001
}
(gdb) p /t ((arena_chunk_t*)0x7623800000)->map_bits[2]
$157 = {
  bits = 100001111100001
}
(gdb) p /t ((arena_chunk_t*)0x7623800000)->map_bits[3]
$158 = {
  bits = 110001111100001
}
(gdb) p /t ((arena_chunk_t*)0x7623800000)->map_bits[4]
$159 = {
  bits = 1000001111100001
}
(gdb) p /t ((arena_chunk_t*)0x7623800000)->map_bits[5]
$160 = {
  bits = 1010001111100001
}
(gdb) p /t ((arena_chunk_t*)0x7623800000)->map_bits[6]
$161 = {
  bits = 1100001111100001
}
From struct arena_chunk_map_bits_s we know that bit [0] of bits says whether the page is allocated, and bit [1] says whether it belongs to a large run.
Pages 0 to 6 are all allocated (bit [0] = 1) and belong to a small run (bit [1] = 0).
Let's find a large run to verify this:
(gdb) jechunk 0x7623800000
This chunk belongs to the arena at 0x7623a00140.

addr            info                  size       usage  
------------------------------------------------------------
0x7623800000    headers               0xd000     -      
0x762380d000    small run (0x1c00)    0x7000     4/4    
0x7623814000    small run (0xc00)     0x3000     4/4    
...
0x7623883000    small run (0x40)      0x1000     64/64  
0x7623884000    large run             0x7000     -      
0x762388b000    small run (0xc00)     0x3000     4/4  

(gdb) p  (0x7623884000-0x762380d000)/4096
$163 = 119
(gdb) p /t ((arena_chunk_t*)0x7623800000)->map_bits[118]
$164 = {
  bits = 10000001
}
(gdb) p /t ((arena_chunk_t*)0x7623800000)->map_bits[119]
$165 = {
  bits = 1111111111100011
}
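As expected, bit [1] is 1 in map_bits[119], the first page of the large run at 0x7623884000, whereas map_bits[118], the single page of the preceding small run, has bit [1] = 0.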
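The index arithmetic used in the gdb session above can be written down as a small helper. This is a minimal sketch, assuming 4 KiB pages and the 2 MiB chunk size reported by je_chunksize on this device; the function names are made up for illustration and are not jemalloc APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for this device: 4 KiB pages, 2 MiB chunks. */
#define PAGE_SIZE_BYTES  UINT64_C(0x1000)
#define CHUNK_SIZE_BYTES UINT64_C(0x200000)

/* Chunks are aligned to the chunk size, so masking the low bits
 * of any address inside a chunk yields the chunk base. */
uint64_t chunk_base_of(uint64_t addr)
{
    return addr & ~(CHUNK_SIZE_BYTES - 1);
}

/* Index into map_bits[] of the page that contains addr.
 * map_bias is the number of header pages (je_map_bias, 13 here);
 * header pages have no map_bits entry. */
uint64_t map_bits_index(uint64_t addr, uint64_t map_bias)
{
    uint64_t page_in_chunk = (addr - chunk_base_of(addr)) / PAGE_SIZE_BYTES;
    assert(page_in_chunk >= map_bias);
    return page_in_chunk - map_bias;
}
```

With the addresses from the session, map_bits_index(0x7623884000, 13) gives 119, the same index that was computed by hand.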
The layout of map_bits is documented in arena.h:
/* Each element of the chunk map corresponds to one page within the chunk. */
struct arena_chunk_map_bits_s {
	/*
	 * Run address (or size) and various flags are stored together.  The bit
	 * layout looks like (assuming 32-bit system):
	 *
	 *   ???????? ???????? ???nnnnn nnndumla
	 *
	 * ? : Unallocated: Run address for first/last pages, unset for internal
	 *                  pages.
	 *     Small: Run page offset.
	 *     Large: Run page count for first page, unset for trailing pages.
	 * n : binind for small size class, BININD_INVALID for large size class.
	 * d : dirty?
	 * u : unzeroed?
	 * m : decommitted?
	 * l : large?
	 * a : allocated?
	 *
	 * Following are example bit patterns for the three types of runs.
	 *
	 * p : run page offset
	 * s : run size
	 * n : binind for size class; large objects set these to BININD_INVALID
	 * x : don't care
	 * - : 0
	 * + : 1
	 * [DUMLA] : bit set
	 * [dumla] : bit unset
	 *
	 *   Unallocated (clean):
	 *     ssssssss ssssssss sss+++++ +++dum-a
	 *     xxxxxxxx xxxxxxxx xxxxxxxx xxx-Uxxx
	 *     ssssssss ssssssss sss+++++ +++dUm-a
	 *
	 *   Unallocated (dirty):
	 *     ssssssss ssssssss sss+++++ +++D-m-a
	 *     xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
	 *     ssssssss ssssssss sss+++++ +++D-m-a
	 *
	 *   Small:
	 *     pppppppp pppppppp pppnnnnn nnnd---A
	 *     pppppppp pppppppp pppnnnnn nnn----A
	 *     pppppppp pppppppp pppnnnnn nnnd---A
	 *
	 *   Large:
	 *     ssssssss ssssssss sss+++++ +++D--LA
	 *     xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
	 *     -------- -------- ---+++++ +++D--LA
	 *
	 *   Large (sampled, size <= LARGE_MINCLASS):
	 *     ssssssss ssssssss sssnnnnn nnnD--LA
	 *     xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
	 *     -------- -------- ---+++++ +++D--LA
	 *
	 *   Large (not sampled, size == LARGE_MINCLASS):
	 *     ssssssss ssssssss sss+++++ +++D--LA
	 *     xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
	 *     -------- -------- ---+++++ +++D--LA
	 */
	size_t				bits;
#define	CHUNK_MAP_ALLOCATED	((size_t)0x01U)
#define	CHUNK_MAP_LARGE		((size_t)0x02U)
#define	CHUNK_MAP_STATE_MASK	((size_t)0x3U)

#define	CHUNK_MAP_DECOMMITTED	((size_t)0x04U)
#define	CHUNK_MAP_UNZEROED	((size_t)0x08U)
#define	CHUNK_MAP_DIRTY		((size_t)0x10U)
#define	CHUNK_MAP_FLAGS_MASK	((size_t)0x1cU)

#define	CHUNK_MAP_BININD_SHIFT	5
#define	BININD_INVALID		((size_t)0xffU)
#define	CHUNK_MAP_BININD_MASK	(BININD_INVALID << CHUNK_MAP_BININD_SHIFT)
#define	CHUNK_MAP_BININD_INVALID CHUNK_MAP_BININD_MASK

#define	CHUNK_MAP_RUNIND_SHIFT	(CHUNK_MAP_BININD_SHIFT + 8)
#define	CHUNK_MAP_SIZE_SHIFT	(CHUNK_MAP_RUNIND_SHIFT - LG_PAGE)
#define	CHUNK_MAP_SIZE_MASK						\
    (~(CHUNK_MAP_BININD_MASK | CHUNK_MAP_FLAGS_MASK | CHUNK_MAP_STATE_MASK))
};
So, for this large run:
0x7623884000    large run             0x7000     -      
It spans 7 pages, which correspond to map_bits[119] through map_bits[125]. Check the state of its last page as well:
(gdb) p /t ((arena_chunk_t*)0x7623800000)->map_bits[125]
$169 = {
  bits = 1111111100011
}
(gdb) p  ((arena_chunk_t*)0x7623800000)->map_bits[119]->bits >> 13
$179 = 7
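The map_bits entry of the first page also records the size of the large run: 7 * PAGE_SIZE (the bits from bit 13 upward hold the page count, 7).

The entry of the last page has no size bits set, so both entries match the documented layout.

The first run in the chunk, at 0x762380d000, is a small run: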
(gdb) p  ((arena_chunk_t*)0x7623800000)->map_bits[0]->bits >> 13
$172 = 0
(gdb) p  ((arena_chunk_t*)0x7623800000)->map_bits[1]->bits >> 13
$173 = 1
(gdb) p  ((arena_chunk_t*)0x7623800000)->map_bits[2]->bits >> 13
$174 = 2
(gdb) p  ((arena_chunk_t*)0x7623800000)->map_bits[3]->bits >> 13
$175 = 3
(gdb) p  ((arena_chunk_t*)0x7623800000)->map_bits[4]->bits >> 13
$176 = 4
(gdb) p  ((arena_chunk_t*)0x7623800000)->map_bits[5]->bits >> 13
$177 = 5
(gdb) p  ((arena_chunk_t*)0x7623800000)->map_bits[6]->bits >> 13
$178 = 6
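This confirms that, for a small run, the bits above the flag and binind fields (bit 13 and up; the upper 19 bits in the 32-bit layout shown in the comment) hold the offset of the page within its run.

For a small run, bits [5..12] (8 bits) of each page's map_bits entry hold the index of the corresponding bin.
0x762380d000    small run (0x1c00)    0x7000     4/4    
This run has 7 pages, whose state is held in the first 7 elements of map_bits. The entry of the run's first page yields the bin index (the double shift below works only because the page offset of the first page is 0; (bits >> 5) & 0xff is the general form):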
(gdb) p   ((arena_chunk_t*)0x7623800000)->map_bits[0]->bits << 19 >> 24
$194 = 31

(gdb) jebins 
arena @ 0x7623a00140
index    addr            size      runcur      
---------------------------------------------------
0        0x7623a00ac0    0x8       0x7623801c28
1        0x7623a00b68    0x10      0x7613408348
2        0x7623a00c10    0x20      0x7615e03b48
3        0x7623a00cb8    0x30      0x76134084c8
4        0x7623a00d60    0x40      0x7623803548
...
30       0x7623a01e70    0x1800    0x76084044a8
31       0x7623a01f18    0x1c00    0x7608401808
32       0x7623a01fc0    0x2000    0x7608403f68
33       0x7623a02068    0x2800    0x7606207568
34       0x7623a02110    0x3000    0x7607a07f88
35       0x7623a021b8    -         - 
Bin 31 does indeed have a region size of 0x1c00.
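The manual shifts used above can be collected into one decoder. The sketch below is an illustration only: the constants are copied from the arena.h excerpt, LG_PAGE = 12 is assumed, and page_info_t and decode_map_bits are hypothetical names that do not exist in jemalloc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants as in the arena.h excerpt; LG_PAGE = 12 is assumed. */
#define CHUNK_MAP_ALLOCATED    UINT64_C(0x01)
#define CHUNK_MAP_LARGE        UINT64_C(0x02)
#define CHUNK_MAP_BININD_SHIFT 5
#define BININD_INVALID         UINT64_C(0xff)
#define CHUNK_MAP_RUNIND_SHIFT (CHUNK_MAP_BININD_SHIFT + 8)

/* Hypothetical decoded view of one map_bits entry. */
typedef struct {
    bool     allocated;
    bool     large;
    unsigned binind;  /* 0xff (BININD_INVALID) for large runs */
    uint64_t upper;   /* small: page offset in run; large, first page: page count */
} page_info_t;

page_info_t decode_map_bits(uint64_t bits)
{
    page_info_t info;
    info.allocated = (bits & CHUNK_MAP_ALLOCATED) != 0;
    info.large = (bits & CHUNK_MAP_LARGE) != 0;
    /* Mask after shifting so the page offset above bit 12 cannot leak in. */
    info.binind = (unsigned)((bits >> CHUNK_MAP_BININD_SHIFT) & BININD_INVALID);
    info.upper = bits >> CHUNK_MAP_RUNIND_SHIFT;
    return info;
}
```

Feeding it the values printed by gdb reproduces the observations: 0x3e1 (map_bits[0]) decodes to a small page of bin 31 at offset 0, and 0xffe3 (map_bits[119]) decodes to the first page of a 7-page large run.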

2.3 Inspecting runs

Run pointer type: (arena_run_t *)
struct arena_run_s {
	/* Index of bin this run is associated with. */
	szind_t		binind;

	/* Number of free regions in run. */
	unsigned	nfree;

	/* Per region allocated/deallocated bitmap. */
	bitmap_t	bitmap[BITMAP_GROUPS_MAX];
};
List all runs:

(gdb) jeruns 
*       run_addr        run_size    region_size    no_regions    no_free
----------------------------------------------------------------------------
1       0x7604902000    0x31000     -              -             -      
2       0x7604933000    0x31000     -              -             -      
3       0x7604964000    0x31000     -              -             -      
4       0x7604995000    0x31000     -              -             -      
5       0x7604a0d000    0x31000     -              -             -      
6       0x7604a4f000    0x5000      0x1400         4             0      
7       0x7604a54000    0x3000      0x600          8             0      
8       0x7604a57000    0x5000      0x1400         4             0      
9       0x7604a5c000    0x5000      0x1400         4             3      
10      0x7604a77000    0x1000      0x800          2             0      
11      0x7604a78000    0x31000     -              -             -      
12      0x7604b0e000    0xb000      -              -             -      
13      0x7604b1a000    0x3000      0x1800         2             1      
14      0x7604b33000    0x3000      0x3000         1             0      
15      0x7604b36000    0x3000      0x3000         1             0      
16      0x7604b5f000    0xb000      -              -             -      
17      0x7604b6b000    0x1000      0x1000         1             0      
18      0x760560d000    0x29000     -              -             -      
19      0x7605636000    0x29000     -              -             -      
20      0x760565f000    0x29000     -              -             -      
21      0x7605688000    0x5000      0x280          32            29     
22      0x7605690000    0x8000      -              -             -      
23      0x7605698000    0x5000      0xa00          8             6      
24      0x76056b1000    0x3000      0x3000         1             0      
25      0x76056b4000    0x3000      0x3000         1             0      
26      0x76056b7000    0x3000      0x3000         1             0      
....
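For each run the listing shows its address, its size, the size of each region, the number of regions, and the number of free regions. Large runs show "-" in the region columns.
Runs are stored contiguously inside chunks: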

(gdb) p /x 0x1400*4
$18 = 0x5000

(gdb) p /x 0x7604a4f000+0x5000
$20 = 0x7604a54000

(gdb) jechunk 0x7604a54000
This chunk belongs to the arena at 0x7623a00140.

addr            info                  size       usage
----------------------------------------------------------
0x7604a00000    headers               0xd000     -    
0x7604a0d000    large run             0x31000    -    
0x7604a3e000    unused range          0x73000    -    
0x7604a4f000    small run (0x1400)    0x5000     4/4  
0x7604a54000    small run (0x600)     0x3000     8/8  
0x7604a57000    small run (0x1400)    0x5000     4/4  
0x7604a5c000    small run (0x1400)    0x5000     1/4  
0x7604a61000    unused range          0x20000    -    
0x7604a77000    small run (0x800)     0x1000     2/2  
0x7604a78000    large run             0x31000    -    
0x7604aa9000    unused range          0xc7000    -    
0x7604b0e000    large run             0xb000     -    
0x7604b19000    unused range          0x17000    -    
0x7604b1a000    small run (0x1800)    0x3000     1/2  
0x7604b1d000    unused range          0x1c000    -    
0x7604b33000    small run (0x3000)    0x3000     1/1  
0x7604b36000    small run (0x3000)    0x3000     1/1  
0x7604b39000    unused range          0x2c000    -    
0x7604b5f000    large run             0xb000     -    
0x7604b6a000    unused range          0x17000    -    
0x7604b6b000    small run (0x1000)    0x1000     1/1  
0x7604b6c000    unused range          0x96000    -    
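Within a chunk, every run is a multiple of the page size, and runs and unused ranges follow one another without gaps.
What is certain is that the runs are obtained from the chunk, using the chunk and its member arena_chunk_map_bits_t map_bits[1]
to enumerate the runs of each chunk.
Here sizeof(arena_chunk_t) = 128, while the header size is 0xd000 (13 pages = je_map_bias):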
(gdb) p sizeof(arena_chunk_t)
$94 = 128

(gdb) p je_map_bias 
$95 = 13
The data above shows that the runs follow immediately after the header, and that the header size is fixed: je_map_bias * PAGE_SIZE.

je_map_bias is computed from the chunk size je_chunksize (je_chunk_npages * PAGE_SIZE) and from arena_chunk_t.
On a phone running Android 8.0:
(gdb) p /x je_chunksize
$117 = 0x200000
(gdb) p je_map_bias 
$118 = 13
Every chunk is 2 MiB and holds 1 to N runs, so the largest possible run is:
(gdb) p  je_chunksize-je_map_bias*4096
$123 = 2043904

(gdb) p je_arena_maxrun
$121 = 2043904 
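
The same arithmetic as a short sketch, assuming the values printed above (2 MiB chunks, 13 header pages, 4 KiB pages). The helper names are hypothetical; jemalloc itself exposes the result as je_arena_maxrun.

```c
#include <assert.h>
#include <stddef.h>

/* Header size in bytes: map_bias pages at the start of every chunk. */
size_t chunk_header_size(size_t map_bias, size_t page_size)
{
    return map_bias * page_size;
}

/* Largest run that fits in one chunk: everything after the header. */
size_t max_run_size(size_t chunk_size, size_t map_bias, size_t page_size)
{
    size_t header = chunk_header_size(map_bias, page_size);
    assert(header < chunk_size);
    return chunk_size - header;
}
```

max_run_size(0x200000, 13, 4096) yields 2043904, which agrees with je_arena_maxrun.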

[Figure: relationship between a chunk and its runs]

Inspect the bitmap of one run:

(gdb) jeruns 
*       run_addr        run_size    region_size    no_regions    no_free
----------------------------------------------------------------------------
1       0x7604902000    0x31000     -              -             -      
...
32      0x76056c4000    0x1000      0x80           32            5      
...

(gdb) p &((arena_run_t*)0x76056c4000)->bitmap
$261 = (bitmap_t (*)[8]) 0x76056c4008
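Its bitmap array has only 8 elements, while the run has 32 regions. That is not a contradiction as long as each bitmap_t element is a machine word holding one bit per region, in which case a single element already covers all 32 regions.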

(gdb) jerun 0x76056c4000
*     status    address         preview         
----------------------------------------------------
0     used      0x76056c4000    0000003800000000
1     used      0x76056c4080    0000002200000000
2     used      0x76056c4100    0000441400000000
3     used      0x76056c4180    ff917754ff917701
4     used      0x76056c4200    ff917754ff917754
5     used      0x76056c4280    0e00000050221c13
6     used      0x76056c4300    0c00000000000000
7     used      0x76056c4380    ff917754ff917754
8     used      0x76056c4400    ff917754ff917701
9     used      0x76056c4480    0000223300000000
10    used      0x76056c4500    0000261400000000
11    used      0x76056c4580    00001d1600000000
12    used      0x76056c4600    0000457000000000
13    used      0x76056c4680    0020670962405c06
14    used      0x76056c4700    00003a4e00000000
15    used      0x76056c4780    0000002200000000
16    used      0x76056c4800    0000002200000000
17    used      0x76056c4880    0000003400000000
18    used      0x76056c4900    0000003400000000
19    used      0x76056c4980    0000002700000000
20    used      0x76056c4a00    0000003000000000
21    used      0x76056c4a80    00002d8000000000
22    used      0x76056c4b00    000064d900000000
23    used      0x76056c4b80    0000002200000000
24    used      0x76056c4c00    0000441400000000
25    free      0x76056c4c80    0000000006000002
26    free      0x76056c4d00    0000000006000002
27    free      0x76056c4d80    0000000006000002
28    free      0x76056c4e00    0000000006000002
29    free      0x76056c4e80    0000000006000002
30    used      0x76056c4f00    0000000006000002
31    used      0x76056c4f80    0000000006000002
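
To make the bitmap idea concrete, here is a toy one-word bitmap with one bit per region. It is only a sketch: the type and function names are invented, and the encoding (bit set means "in use") is an assumption chosen for readability, not necessarily the encoding jemalloc's bitmap_t uses.

```c
#include <assert.h>
#include <stdint.h>

/* Toy bitmap: bit i set means region i is in use. */
typedef struct {
    uint64_t used;   /* one bit per region */
    unsigned nregs;  /* number of regions, at most 64 in this sketch */
} toy_run_bitmap_t;

void toy_bitmap_set(toy_run_bitmap_t *bm, unsigned i)
{
    assert(i < bm->nregs);
    bm->used |= UINT64_C(1) << i;
}

int toy_bitmap_get(const toy_run_bitmap_t *bm, unsigned i)
{
    assert(i < bm->nregs);
    return (int)((bm->used >> i) & 1);
}

/* Count regions whose bit is clear, i.e. free regions. */
unsigned toy_bitmap_count_free(const toy_run_bitmap_t *bm)
{
    unsigned nfree = 0;
    for (unsigned i = 0; i < bm->nregs; i++) {
        if (!toy_bitmap_get(bm, i))
            nfree++;
    }
    return nfree;
}
```

Marking regions 0 to 24 and 30 to 31 as used, as in the jerun output above, leaves 5 free regions, matching the no_free column for this run.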

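jerun shows the first region starting right at the beginning of the run, so where is the run header? A likely explanation, judging from the runcur values printed by jebins (for example 0x7623801c28, which lies inside the 0xd000-byte header of chunk 0x7623800000), is that in this jemalloc version the arena_run_t metadata is kept in the chunk header rather than at the start of the run. This is an inference from the data and has not been checked against the source.

The run layout in the Android 8.0 code also differs from the diagram in the referenced series: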


2.4 Inspecting regions

(gdb) jeruns 
*       run_addr        run_size    region_size    no_regions    no_free
----------------------------------------------------------------------------
1       0x7604902000    0x31000     -              -             -      
2       0x7604933000    0x31000     -              -             -      
3       0x7604964000    0x31000     -              -             -      
4       0x7604995000    0x31000     -              -             -      
5       0x7604a0d000    0x31000     -              -             -      
6       0x7604a4f000    0x5000      0x1400         4             0
...

(gdb) jerun 0x7604a4f000
*    status    address         preview         
---------------------------------------------------
0    used      0x7604a4f000    00000076268e01b8
1    used      0x7604a50400    00000076268e01b8
2    used      0x7604a51800    00000076268e01b8
3    used      0x7604a52c00    00000076268e01b8
The run has 4 regions in total, all in use, and each region is 0x1400 bytes.
(gdb) jeregions 0x1400
*     run_addr        reg_size    run_size    usage
-------------------------------------------------------
1     0x7604a4f000    5120        0x5000      4/4  
2     0x7604a57000    5120        0x5000      4/4  
3     0x7604a5c000    5120        0x5000      1/4  
4     0x76057bd000    5120        0x5000      2/4  
5     0x7605877000    5120        0x5000      1/4  
6     0x7605b2a000    5120        0x5000      2/4  
7     0x7605b34000    5120        0x5000      4/4  
8     0x7605b39000    5120        0x5000      3/4  
9     0x7605be7000    5120        0x5000      2/4  
10    0x7605bec000    5120        0x5000      4/4  
11    0x7606187000    5120        0x5000      4/4  
12    0x76062fd000    5120        0x5000      4/4  
13    0x7606302000    5120        0x5000      4/4  
14    0x76078bd000    5120        0x5000      4/4  
15    0x76079da000    5120        0x5000      4/4  
16    0x7607b60000    5120        0x5000      1/4  
17    0x7607dea000    5120        0x5000      3/4  
18    0x7613c84000    5120        0x5000      4/4  
19    0x7613fb3000    5120        0x5000      4/4  
20    0x7623973000    5120        0x5000      4/4  

(gdb) p /x 5120
$213 = 0x1400
There are 20 runs whose region size is 0x1400, and the first one is the run we just inspected.
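The region addresses printed by jerun follow directly from the run address and the region size. The sketch below assumes, as the jerun output suggests, that region 0 starts exactly at the run address; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Address of region `index`, assuming region 0 starts at run_addr. */
uint64_t region_addr(uint64_t run_addr, uint64_t reg_size, unsigned index)
{
    return run_addr + (uint64_t)index * reg_size;
}

/* Index of the region containing addr, or -1 if addr is outside the run.
 * Interior pointers map to the region they fall into. */
int region_index(uint64_t run_addr, uint64_t reg_size, unsigned nregs,
                 uint64_t addr)
{
    assert(reg_size > 0);
    if (addr < run_addr)
        return -1;
    uint64_t index = (addr - run_addr) / reg_size;
    return index < nregs ? (int)index : -1;
}
```

For the run at 0x7604a4f000 with 0x1400-byte regions, region_addr(..., 3) is 0x7604a52c00, the last address in the jerun listing.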

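2.5 Inspecting bins

A run carries out allocations, while a bin schedules them. The concept is similar to a bin in dlmalloc, but bins in jemalloc are more complex. Put simply, a bin can be seen as a warehouse of non-full runs: it tracks the usage of all non-full runs of one size class in the current arena. When an allocation request arrives, the arena looks up the bin for the matching size class, finds a run that can serve the request, and the run then hands out a region. Since only small-region allocations need runs, bins exist only for small size classes.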
struct arena_bin_s {
    malloc_mutex_t         lock;    
    arena_run_t            *runcur;
    arena_run_tree_t       runs;
    malloc_bin_stats_t     stats;
};
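  • lock: This lock differs from the arena's internal lock; it mainly protects the current run. Allocating and freeing the runs themselves still relies on the arena lock. Normally, acquiring the bin lock presupposes holding the arena lock, but not the other way round.

  • runcur: The run currently used for allocation, normally the non-full run with the lowest address. A bin has only one current run at any time.

  • runs: A red-black tree recording all non-full runs of this bin's size class in the current arena. Since allocation goes through the current run, the tree effectively serves as the supply of current runs.

  • stats: Statistics.

The passage above is quoted from @杏林小軒's blog.


List the bins: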
(gdb) jebins
arena @ 0x7623a00140
index    addr            size      runcur      
---------------------------------------------------
0        0x7623a00ac0    0x8       0x7623801c28
1        0x7623a00b68    0x10      0x7613408348
2        0x7623a00c10    0x20      0x7615e03b48
3        0x7623a00cb8    0x30      0x76134084c8
4        0x7623a00d60    0x40      0x7623803548
5        0x7623a00e08    0x50      0x7608408168
6        0x7623a00eb0    0x60      0x7607a088e8
7        0x7623a00f58    0x70      0x7607a071a8
8        0x7623a01000    0x80      0x7613c04328
9        0x7623a010a8    0xa0      0x762380b2e8
10       0x7623a01150    0xc0      0x7608405348
11       0x7623a011f8    0xe0      0x762380c368
12       0x7623a012a0    0x100     0x7608408888
13       0x7623a01348    0x140     0x761340b0a8
14       0x7623a013f0    0x180     0x7623802468
15       0x7623a01498    0x1c0     0x7623804b08
16       0x7623a01540    0x200     0x76084017a8
17       0x7623a015e8    0x280     0x7607a09368
18       0x7623a01690    0x300     0x7615e06428
19       0x7623a01738    0x380     0x7615e03de8
20       0x7623a017e0    -         -           
21       0x7623a01888    0x500     0x7623806cc8
22       0x7623a01930    -         -           
23       0x7623a019d8    0x700     0x7623802888
24       0x7623a01a80    0x800     0x7613409de8
25       0x7623a01b28    0xa00     0x7623807388
26       0x7623a01bd0    0xc00     0x7605a03248
27       0x7623a01c78    0xe00     0x760560b588
28       0x7623a01d20    0x1000    0x7613407028
29       0x7623a01dc8    0x1400    0x7607a08f48
30       0x7623a01e70    0x1800    0x76084044a8
31       0x7623a01f18    0x1c00    0x7608401808
32       0x7623a01fc0    0x2000    0x7608403f68
33       0x7623a02068    0x2800    0x7606207568
34       0x7623a02110    0x3000    0x7607a07f88
35       0x7623a021b8    -         -           

arena @ 0x7623a8fc00
index    addr            size      runcur      
---------------------------------------------------
0        0x7623a90580    0x8       0x7618a01868
1        0x7623a90628    0x10      0x7618a09008
2        0x7623a906d0    0x20      0x7618a04e08
3        0x7623a90778    0x30      0x7607804268
4        0x7623a90820    0x40      0x7607c0c8a8
5        0x7623a908c8    0x50      0x760780ac28
6        0x7623a90970    0x60      0x7618a021c8
7        0x7623a90a18    0x70      0x7618a022e8
8        0x7623a90ac0    0x80      0x7607807b08
9        0x7623a90b68    0xa0      0x7618a02588
10       0x7623a90c10    0xc0      0x7607c0c4e8
11       0x7623a90cb8    0xe0      0x7618a015c8
12       0x7623a90d60    0x100     0x7618a02048
13       0x7623a90e08    0x140     0x7607807568
14       0x7623a90eb0    0x180     0x7607802528
15       0x7623a90f58    0x1c0     0x7618a0a868
16       0x7623a91000    0x200     0x7618a03008
17       0x7623a910a8    0x280     0x7618a035a8
18       0x7623a91150    0x300     0x7607c0c068
19       0x7623a911f8    0x380     0x7607c0c608
20       0x7623a912a0    0x400     0x7607c0aa48
21       0x7623a91348    0x500     0x7618a05528
22       0x7623a913f0    -         -           
23       0x7623a91498    0x700     0x7607805b88
24       0x7623a91540    0x800     0x76078035a8
25       0x7623a915e8    0xa00     0x76058015c8
26       0x7623a91690    -         -           
27       0x7623a91738    0xe00     0x7618a09068
28       0x7623a917e0    -         -           
29       0x7623a91888    0x1400    0x7607c0c308
30       0x7623a91930    0x1800    0x7607c0a808
31       0x7623a919d8    0x1c00    0x76078063c8
32       0x7623a91a80    -         -           
33       0x7623a91b28    0x2800    0x7607804628
34       0x7623a91bd0    -         -           
35       0x7623a91c78    -         -           
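Each arena has 36 bins, corresponding to 36 size classes.
In practice each bin is associated with multiple runs: some are full, some are not yet full, and some are still in use.

The region size within a run is fixed and equals the size class of its bin.
For example, bin[31] serves the 0x1c00 size class, so every run associated with it has a region size of 0x1c00.

In addition, the table je_size2index_tab maps a memory size to its index in bins: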
(gdb) p /d je_size2index_tab 
$286 = {0, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 
  15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 17 <repeats 16 times>, 18 <repeats 16 times>, 19 <repeats 16 times>, 20 <repeats 16 times>, 21 <repeats 32 times>, 
  22 <repeats 32 times>, 23 <repeats 32 times>, 24 <repeats 32 times>, 25 <repeats 64 times>, 26 <repeats 64 times>, 27 <repeats 64 times>, 28 <repeats 64 times>}
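Bins are organized into groups. Apart from bin 0, every 4 adjacent bins form one group, and the spacing between adjacent bins doubles from one group to the next (in the jebins listing the first two groups both use a spacing of 0x10; the doubling starts from the third group). For example:
Suppose group N has four bins bin1, bin2, bin3, bin4 whose sizes are spaced deltaA apart.
Then group N+1 has four bins bin5, bin6, bin7, bin8 whose sizes are spaced deltaA * 2 apart.

The groups are:
{0}, {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16}, ...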
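The grouping can be turned into a formula for the size of each bin. The function below is a reconstruction fitted to the jebins listing in this article (8-byte bin 0, then groups of four), not code taken from jemalloc; the name bin_size and the bound of 36 bins are assumptions based on this device's output.

```c
#include <assert.h>
#include <stddef.h>

/* Region size of bin `index`, reconstructed from the jebins output. */
size_t bin_size(unsigned index)
{
    assert(index < 36);                  /* 36 bins per arena in this listing */
    if (index == 0)
        return 8;

    unsigned group = (index - 1) / 4;    /* 0 for bins 1..4, 1 for bins 5..8, ... */
    unsigned pos = (index - 1) % 4 + 1;  /* position 1..4 inside the group */

    if (group == 0)
        return (size_t)16 * pos;         /* 0x10, 0x20, 0x30, 0x40 */

    size_t base = (size_t)64 << (group - 1);   /* last size of the previous group */
    size_t delta = (size_t)16 << (group - 1);  /* spacing doubles per group */
    return base + pos * delta;
}
```

For instance, bin_size(9) is 0xa0 and bin_size(31) is 0x1c00, in line with the listing.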
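Because the smallest bin size is 8 bytes, the table is indexed in 8-byte steps, and the bin index for a requested size is computed as:
bin_idx = je_size2index_tab[(size - 1) >> 3]
Because bins cover ranges of different widths, the index of each bin must be repeated (bin_range_size / 8) times in je_size2index_tab for the lookup to return the right bin. For example: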
(gdb) jebins 
arena @ 0x7623a00140
index    addr            size      runcur      
---------------------------------------------------
0        0x7623a00ac0    0x8       0x7623801c28
1        0x7623a00b68    0x10      0x7613408348
2        0x7623a00c10    0x20      0x7615e03b48
3        0x7623a00cb8    0x30      0x76134084c8
4        0x7623a00d60    0x40      0x7623803548
5        0x7623a00e08    0x50      0x7608408168
6        0x7623a00eb0    0x60      0x7607a088e8
7        0x7623a00f58    0x70      0x7607a071a8
8        0x7623a01000    0x80      0x7613c04328
9        0x7623a010a8    0xa0      0x762380b2e8
10       0x7623a01150    0xc0      0x7608405348
11       0x7623a011f8    0xe0      0x762380c368
12       0x7623a012a0    0x100     0x7608408888
13       0x7623a01348    0x140     0x761340b0a8
14       0x7623a013f0    0x180     0x7623802468
15       0x7623a01498    0x1c0     0x7623804b08
In group 1, bin 1 covers sizes in (0x8, 0x10]. The range is only 8 bytes wide, so index 1 appears once in je_size2index_tab.
In group 1, bin 2 covers sizes in (0x10, 0x20]. The range is 16 bytes wide, so index 2 appears 16/8 = 2 times.
In group 3, bin 9 covers sizes in (0x80, 0xa0]. The range is 32 bytes wide, so index 9 appears 32/8 = 4 times.
And so on.
This lets us quickly find the right bin for a requested size:
For example, arenas[idx]->bins[je_size2index_tab[(size - 1) >> 3]] is the bin matching size, and a region is then allocated from one of the runs attached to that bin.
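The lookup can be tried out with the first 32 entries of the table printed above, which cover sizes up to 256 bytes. This is a sketch under that restriction: size2index_prefix and size_to_bin_index are illustrative names, and larger sizes would need the rest of the table.

```c
#include <assert.h>
#include <stddef.h>

/* First 32 entries of je_size2index_tab as printed by gdb (sizes 1..256). */
static const unsigned char size2index_prefix[32] = {
    0, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8,
    9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12
};

/* Bin index for a request of `size` bytes, or -1 if the size is
 * beyond the part of the table copied here. */
int size_to_bin_index(size_t size)
{
    assert(size > 0);
    size_t slot = (size - 1) >> 3;  /* one table slot per 8 bytes */
    if (slot >= sizeof(size2index_prefix))
        return -1;
    return size2index_prefix[slot];
}
```

A 129-byte request lands in slot 16 and therefore in bin 9 (0xa0), the smallest size class that can hold it.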

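2.6 tcache

TLS/TSD is another allocation technique aimed at multithreaded workloads; jemalloc calls it tcache. tcache addresses contention for the heap between different threads on the same CPU core. It reduces interference between threads by giving each thread its own allocation area, but this obviously increases overall memory consumption. To limit that side effect, jemalloc designs tcache as a bookkeeping structure: it stores only pointers to external regions, and the region objects still live in their runs. In other words, once a region is recorded in a tcache, it is already allocated from the run's point of view.

The contents of tcache are as follows: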

struct tcache_s {
    ql_elm(tcache_t) link;        
    uint64_t           prof_accumbytes;
    arena_t            *arena;        
    unsigned           ev_cnt;        
    unsigned           next_gc_bin;    
    tcache_bin_t      tbins[1];    
};

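  • link: List node used to link together all tcaches under the same arena.

  • prof_accumbytes: Related to memory profiling.

  • arena: Pointer to the arena this tcache belongs to.

  • ev_cnt: A periodic counter internal to the tcache. Each allocation or deallocation through the tcache increments ev_cnt. When the period elapses, jemalloc runs an incremental GC, which trims surplus regions from the tcache and releases them. This does not return memory to the system, but it frees regions for other, hungrier threads to allocate.

  • next_gc_bin: Index of the bin to be collected next. tcache GC cleans one bin per period.

  • tbins: The array of tcache bins, likewise appended after the tcache.

Like arena bins, tcache has tcache_bin_t and tcache_bin_info_t. tcache_bin_t plays a role similar to an arena bin but is simpler. Strictly speaking, a tcache bin does no allocation scheduling; it only keeps records. Internally it uses a stack of pointers to regions in external arena runs. Once a region is cached in tbins, no other thread can use it, even though it may sit in the same arena run as regions recorded in other threads' tcaches.

The tcache bin structure is as follows: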

struct tcache_bin_s {
    tcache_bin_stats_t tstats;
    int                   low_water;
    unsigned             lg_fill_div;
    unsigned             ncached;
    void                  **avail;
};
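  • tstats: Internal statistics of the tcache bin.

  • low_water: The lowest fill level reached in the tcache bin between two GC passes. It determines how many regions the next GC tries to release: 3/4 of the low-water value.

  • lg_fill_div: Divisor used on tcache refill. When the tcache bin runs empty, it asks the arena run for a refill. The refill does not fill the bin completely; it fills the maximum capacity divided by 2^lg_fill_div. Like low_water, this value is dynamic, and the two work together to keep the tcache at a reasonable fill level.

  • ncached: Number of regions currently cached; also the index of the stack top.

  • avail: The stack of region pointers, called the avail-stack.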
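
The interplay of ncached, avail, and low_water can be sketched with a toy stack. This is a simplified illustration under stated assumptions, not jemalloc's implementation: the toy_ names are invented, the capacity is stored in the bin itself rather than in tcache_bin_info_t, and the refill and flush paths to the arena are left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Toy tcache bin: a stack of region pointers. */
typedef struct {
    int      low_water;   /* lowest ncached seen since the last GC pass */
    unsigned ncached;     /* number of cached pointers; also the stack top */
    unsigned ncached_max; /* capacity of avail */
    void   **avail;       /* stack of region pointers */
} toy_tbin_t;

/* Pop a cached region; NULL means the caller must refill from the arena. */
void *toy_tbin_alloc(toy_tbin_t *tbin)
{
    if (tbin->ncached == 0)
        return NULL;
    tbin->ncached--;
    if ((int)tbin->ncached < tbin->low_water)
        tbin->low_water = (int)tbin->ncached;
    return tbin->avail[tbin->ncached];
}

/* Push a freed region; returns 0 if the bin is full and must be flushed. */
int toy_tbin_dalloc(toy_tbin_t *tbin, void *ptr)
{
    if (tbin->ncached == tbin->ncached_max)
        return 0;
    tbin->avail[tbin->ncached++] = ptr;
    return 1;
}

/* Number of regions a GC pass would try to release: 3/4 of low_water. */
unsigned toy_tbin_gc_flush_count(const toy_tbin_t *tbin)
{
    if (tbin->low_water <= 0)
        return 0;
    unsigned lw = (unsigned)tbin->low_water;
    return lw - (lw >> 2);
}
```

The stack discipline means the most recently freed region is handed out first, which tends to keep recently touched memory in use.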

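tcache_bin_info_t holds the static information of a tcache bin; it stores only the maximum capacity of the bin. That value is determined at tcache boot from nregs of the corresponding arena bin: normally twice nregs, but no more than TCACHE_NSLOTS_SMALL_MAX. The default limit is 200, but Android lowers it considerably: at most 8 for small bins and 16 for large bins.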


struct tcache_bin_info_s {
    unsigned    ncached_max;
};
[Figure: tcache layout]


I have not yet managed to debug tcache, and so far I cannot inspect it from gdb. To be investigated.


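3. jemalloc allocation and deallocation

I have yet to study allocation and deallocation in jemalloc. The allocation path can be roughly inferred from the jemalloc architecture diagram:

1. Try the tcache first: take a cached free region from the tcache bin that matches the requested size.

2. Otherwise pick an arena, find the bin for the size, choose a run attached to that bin, and allocate a suitable region from that run.


If any step along the way fails, an arena is chosen and a new chunk, run, and region are allocated from it.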



