Linux Kernel Source Analysis: slab.c

slab.c is from Linux kernel 2.4.22; the file is released under the GNU GPL.
I. Background:



  1. The slab concept:

    • Motivation: a running operating system constantly creates, uses, and
      frees large numbers of identical objects, so optimizing how these
      repeated objects are produced can greatly improve efficiency.
    • It also mitigates the memory waste caused by the buddy system.
    • First proposed by a Sun engineer in 1994 and first deployed in SunOS 5.4.

      
  2. The basic idea of the slab algorithm:

    Allocation:
    if (the corresponding cache has a free slot)
        use that slot; no re-initialization is needed;
    else {
        allocate memory;
        construct the object;
    }
    Release:
    mark the slot free in the cache; do not run the destructor;
    When memory runs short:
    find object slots that are not in use;
    run destructors on some of those objects as required;
    release the space they occupied;
  3. Caches: each object type is placed in its own cache.

  4. Slabs: each slab is an integral number of pages in size (with an upper limit).

  5. Coloring: offsetting objects so their addresses satisfy the hardware's
     alignment requirements, which can greatly improve hardware cache
     utilization and efficiency.

    
  6. The two slab management modes:

    • on-slab, for small objects (smaller than 1/8 page): the slab management
      structure is stored inside the slab itself.
    • off-slab, for large objects (1/8 page or larger): the slab management
      structure is allocated separately, from the cache_slabp general caches.
      According to the paper by slab's inventor, slab is not well suited to
      large objects.
  7. The key slab operations:

    • cache creation kmem_cache_create and destruction kmem_cache_destroy
    • cache shrinking kmem_cache_shrink and growing kmem_cache_grow
    • object allocation kmem_cache_alloc and release kmem_cache_free
    • kernel memory allocation kmalloc and release kfree

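The allocation/release pseudocode in item 2 can be sketched as a toy user-space object cache. This is purely illustrative (the names `cache_alloc`/`cache_free` and the free-list layout are made up for the example, not kernel code):

```c
#include <assert.h>
#include <stdlib.h>

/* A toy object cache: freed objects are kept on a free list and handed
 * back without re-running the constructor. */
struct obj { struct obj *next_free; int data; };

static struct obj *free_list = NULL;
static int ctor_calls = 0; /* counts how often we pay for construction */

static struct obj *cache_alloc(void) {
    if (free_list) {                   /* free slot: reuse, skip the ctor */
        struct obj *o = free_list;
        free_list = o->next_free;
        return o;
    }
    struct obj *o = malloc(sizeof *o); /* otherwise allocate + construct */
    ctor_calls++;
    o->data = 0;
    return o;
}

static void cache_free(struct obj *o) {
    o->next_free = free_list;          /* mark free; no destructor runs */
    free_list = o;
}
```

Allocating, freeing, and allocating again returns the same object with the constructor run only once, which is exactly the saving the slab design is after.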
      
II. Key data structures:



  1. typedef unsigned int kmem_bufctl_t: the type used by the in-slab
     management structure.

  2. The cache_sizes table: for each power-of-two object size it stores two
     pointers (one DMA, one non-DMA) to the corresponding general cache
     descriptor.

  3. Lists: the most important are the three lists in the cache management
     structure, holding the fully used, the partially used, and the completely
     unused slabs.

  4. Structs: see the code analysis below.
    
    




III. Code analysis:



In the original formatting, colors distinguished the kinds of text: red for
code comments, mauve for preprocessor material, blue for C keywords and
function definitions, green for macro definitions, black for code, gray for
output strings, and dark blue for the author's annotations. In this plain-text
version, the annotations simply follow the code they describe.
/*
* linux/mm/slab.c
* Written by Mark Hemment, 1996/97.
* ([email protected])
*
* kmem_cache_destroy() + some cleanup - 1999 Andrea Arcangeli
*
* Major cleanup, different bufctl logic, per-cpu arrays
* (c) 2000 Manfred Spraul
*
The above is copyright information.
* An implementation of the Slab Allocator as described in outline in;
* UNIX Internals: The New Frontiers by Uresh Vahalia
* Pub: Prentice Hall ISBN 0-13-101908-2
A book describing the slab allocator.
* or with a little more detail in;
* The Slab Allocator: An Object-Caching Kernel Memory Allocator
* Jeff Bonwick (Sun Microsystems).
* Presented at: USENIX Summer 1994 Technical Conference

Jeff Bonwick first presented the slab (object-cache) concept at the 1994 USENIX Summer Technical Conference (www.usenix.org).

*
*
* The memory is organized in caches, one cache for each object type.
* (e.g. inode_cache, dentry_cache, buffer_head, vm_area_struct)
* Each cache consists out of many slabs (they are small (usually one
* page long) and always contiguous), and each slab contains multiple
* initialized objects.
*
Comment summary: memory is organized into caches, one per object type, e.g.
inode_cache, dentry_cache, buffer_head, vm_area_struct. Each cache consists of
many slabs (small, usually one page, and always contiguous), and each slab
contains multiple initialized objects.

* Each cache can only support one memory type (GFP_DMA, GFP_HIGHMEM,
* normal). If you need a special memory type, then must create a new
* cache for that memory type.
*
Comment summary: each cache supports only one memory type (GFP_DMA, GFP_HIGHMEM, or normal; these are macros in include/linux/mm.h). A special memory type requires a cache of its own.
* In order to reduce fragmentation, the slabs are sorted in 3 groups:
* full slabs with 0 free objects
* partial slabs
* empty slabs with no allocated objects
*
Comment summary: to reduce fragmentation, the slabs are sorted into 3 groups:
full slabs, with 0 free objects
partial slabs
empty slabs, with no allocated objects

* If partial slabs exist, then new allocations come from these slabs,
* otherwise from empty slabs or new slabs are allocated.
*
Comment summary: if partial slabs exist, new allocations come from them; otherwise empty slabs are used or new slabs are allocated.
* kmem_cache_destroy() CAN CRASH if you try to allocate from the cache
* during kmem_cache_destroy(). The caller must prevent concurrent allocs.
*
Comment summary: allocating from a cache while kmem_cache_destroy() is running on it can crash; the caller must prevent such concurrent allocations.
* On SMP systems, each cache has a short per-cpu head array, most allocs
* and frees go into that array, and if that array overflows, then 1/2
* of the entries in the array are given back into the global cache.
* This reduces the number of spinlock operations.
*
Comment summary: on symmetric multiprocessor systems, each cache has a short
per-CPU head array, and most allocations and frees go through it. If the array
overflows, half of its entries are given back to the global cache. This
reduces the number of spinlock operations.

* The c_cpuarray may not be read with enabled local interrupts.
*
Comment summary: c_cpuarray must not be read while local interrupts are enabled.
* SMP synchronization:
* constructors and destructors are called without any locking.
* Several members in kmem_cache_t and slab_t never change, they
* are accessed without any locking.
* The per-cpu arrays are never accessed from the wrong cpu, no locking.
* The non-constant members are protected with a per-cache irq spinlock.
*
Comment summary: SMP synchronization:
constructors and destructors are called without any locking;
several members of kmem_cache_t and slab_t never change and are accessed
without locking; the per-CPU arrays are never accessed from the wrong CPU, so
they need no locking either; the non-constant members are protected by a
per-cache irq spinlock.

* Further notes from the original documentation:
*
Comment summary: further notes from the original documentation:
* 11 April '97. Started multi-threading - markhe
* The global cache-chain is protected by the semaphore 'cache_chain_sem'.
* The sem is only needed when accessing/extending the cache-chain, which
* can never happen inside an interrupt (kmem_cache_create(),
* kmem_cache_shrink() and kmem_cache_reap()).
*
Comment summary: on 11 April 1997 markhe started the multi-threading work.
The global cache chain is protected by the semaphore cache_chain_sem. The
semaphore is needed only when accessing or extending the cache chain, which
can never happen inside an interrupt (kmem_cache_create(),
kmem_cache_shrink(), kmem_cache_reap()).

* To prevent kmem_cache_shrink() trying to shrink a 'growing' cache (which
* maybe be sleeping and therefore not holding the semaphore/lock), the
* growing field is used. This also prevents reaping from a cache.
*
Comment summary: to keep kmem_cache_shrink() from shrinking a cache that is
still growing (the grower may be sleeping, and therefore not holding the
semaphore/lock), the growing field is used; it also keeps the cache from
being reaped.

* At present, each engine can be growing a cache. This should be blocked.
*
Comment summary: at present every engine can be growing a cache; this should be blocked in the future.
*/


#include <linux/config.h> the autoconf.h generated at build time
#include <linux/slab.h> this subsystem's own header
#include <linux/interrupt.h> interrupt handling
#include <linux/init.h> initialization support
#include <linux/compiler.h> compiler-related definitions
#include <linux/seq_file.h> sequential-file operations
#include <asm/uaccess.h> user-space memory access

/*
* DEBUG - 1 for kmem_cache_create() to honour; SLAB_DEBUG_INITIAL,
* SLAB_RED_ZONE & SLAB_POISON.
* 0 for faster, smaller code (especially in the critical paths).
*
Comment summary: if the macro DEBUG is 1, kmem_cache_create() honours SLAB_DEBUG_INITIAL, SLAB_RED_ZONE and SLAB_POISON.
* STATS - 1 to collect stats for /proc/slabinfo.
* 0 for faster, smaller code (especially in the critical paths).
*
Comment summary: if STATS is 1, statistics are collected for /proc/slabinfo; 0 gives faster, smaller code (especially on the critical paths).
* FORCED_DEBUG - 1 enables SLAB_RED_ZONE and SLAB_POISON (if possible)
*/
Comment summary: if FORCED_DEBUG is 1, SLAB_RED_ZONE is enabled, and SLAB_POISON where possible.

#ifdef CONFIG_DEBUG_SLAB preprocessing: if CONFIG_DEBUG_SLAB is defined, the following three macros are 1, otherwise 0
#define DEBUG 1
#define STATS 1
#define FORCED_DEBUG 1
#else
#define DEBUG 0
#define STATS 0
#define FORCED_DEBUG 0
#endif

/*
* Parameters for kmem_cache_reap
*/
Comment summary: the parameters for cache reaping.
#define REAP_SCANLEN 10
#define REAP_PERFECT 10

/* Shouldn't this be in a header file somewhere? */
#define BYTES_PER_WORD sizeof(void *)

/* Legal flag mask for kmem_cache_create(). */ the legal flag bits for kmem_cache_create()
#if DEBUG conditional compilation: in debug mode
# define CREATE_MASK (SLAB_DEBUG_INITIAL | SLAB_RED_ZONE | \
SLAB_POISON | SLAB_HWCACHE_ALIGN | \
SLAB_NO_REAP | SLAB_CACHE_DMA | \
SLAB_MUST_HWCACHE_ALIGN)
#else in non-debug mode
# define CREATE_MASK (SLAB_HWCACHE_ALIGN | SLAB_NO_REAP | \
SLAB_CACHE_DMA | SLAB_MUST_HWCACHE_ALIGN)
#endif

/*
* kmem_bufctl_t:
*
* Bufctl's are used for linking objs within a slab
* linked offsets.
*
Comment summary: bufctls are used to link the objects within a slab.
* This implementation relies on "struct page" for locating the cache &
* slab an object belongs to.
Comment summary: this implementation relies on struct page to locate the cache and the slab that an object belongs to.
* This allows the bufctl structure to be small (one int), but limits
* the number of objects a slab (not a cache) can contain when off-slab
* bufctls are used. The limit is the size of the largest general cache
* that does not use off-slab slabs.
Comment summary: this lets the bufctl structure stay small (one int), but it
limits the number of objects a slab (not a cache) can contain when off-slab
bufctls are used. The limit is the size of the largest general cache that
does not itself use off-slab slabs.

* For 32bit archs with 4 kB pages, is this 56.
Comment summary: for 32-bit architectures with 4 kB pages, this limit is 56.
* This is not serious, as it is only for large objects, when it is unwise
* to have too many per slab.
Comment summary: this limit is not serious, since it applies only to large
objects, and it is unwise to put many large objects in one slab anyway.

* Note: This limit can be raised by introducing a general cache whose size
* is less than 512 (PAGE_SIZE<<3), but greater than 256.
*/
Comment summary: the limit can be raised by introducing a general cache whose size is less than 512 (PAGE_SIZE<<3) but greater than 256.

#define BUFCTL_END 0xffffFFFF defines the macro BUFCTL_END, the free-list terminator
#define SLAB_LIMIT 0xffffFFFE defines the macro SLAB_LIMIT
typedef unsigned int kmem_bufctl_t;

defines the type kmem_bufctl_t, which is really an unsigned int
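The bufctl array can be pictured as a linked free list stored as indices rather than pointers. A small user-space sketch (BUFCTL_END mirrors the kernel's value; the array size and helper names are made up for illustration):

```c
#include <assert.h>

typedef unsigned int kmem_bufctl_t;
#define BUFCTL_END 0xffffFFFF
#define NUM_OBJS 4

/* bufctl[i] holds the index of the free object after object i;
 * free_idx plays the role of slabp->free. */
static kmem_bufctl_t bufctl[NUM_OBJS];
static kmem_bufctl_t free_idx;

static void slab_init(void) {
    for (unsigned int i = 0; i < NUM_OBJS - 1; i++)
        bufctl[i] = i + 1;            /* each free object points to the next */
    bufctl[NUM_OBJS - 1] = BUFCTL_END;
    free_idx = 0;
}

static kmem_bufctl_t obj_alloc(void) { /* pop the head of the free list */
    kmem_bufctl_t i = free_idx;
    free_idx = bufctl[i];
    return i;
}

static void obj_free(kmem_bufctl_t i) { /* push back onto the free list */
    bufctl[i] = free_idx;
    free_idx = i;
}
```

Because the "next" links are array indices of fixed-size objects, one unsigned int per object is all the bookkeeping a slab needs.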



/* Max number of objs-per-slab for caches which use off-slab slabs.
* Needed to avoid a possible looping condition in kmem_cache_grow().
*/
Comment summary: the maximum number of objects per slab for caches that use
off-slab slab management; needed to avoid a possible looping condition in
kmem_cache_grow().

static unsigned long offslab_limit; defines offslab_limit as an unsigned long

/*
* slab_t
*
* Manages the objs in a slab. Placed either at the beginning of mem allocated
* for a slab, or allocated from an general cache.
* Slabs are chained into three list: fully used, partial, fully free slabs.
*
Comment summary: this manages the objects in a slab. It is placed either at
the beginning of the memory allocated for the slab, or allocated from a
general cache. Slabs are chained into three lists: fully used, partially
used, and fully free.
*/

typedef struct slab_s {
struct list_head list;
unsigned long colouroff;
void *s_mem; /* including colour offset */ the colour offset is already included
unsigned int inuse; /* num of objs active in slab */ the number of objects in use in this slab
kmem_bufctl_t free; the index of the first free object in the slab, counted from s_mem
} slab_t; the slab descriptor, a list node

#define slab_bufctl(slabp) \
((kmem_bufctl_t *)(((slab_t*)slabp)+1))
the macro slab_bufctl: the kmem_bufctl_t array starts immediately after the slab_t header

/*
* cpucache_t
*
* Per cpu structures
* The limit is stored in the per-cpu structure to reduce the data cache
* footprint.
*/
Comment summary: per-CPU structures; the limit is stored in the per-CPU structure to reduce the data-cache footprint.

typedef struct cpucache_s {
unsigned int avail; the number of entries currently available
unsigned int limit; the capacity limit
} cpucache_t; defines the cpucache structure

#define cc_entry(cpucache) \
((void **)(((cpucache_t*)(cpucache))+1))
the macro cc_entry: the array of object pointers starts immediately after the cpucache_t header
#define cc_data(cachep) \
((cachep)->cpudata[smp_processor_id()])
the macro cc_data: the per-CPU cache belonging to the CPU we are currently running on
/*
* kmem_cache_t
*
* manages a cache.
*/


#define CACHE_NAMELEN 20 /* max name length for a slab cache */ the longest slab cache name is 20 characters

struct kmem_cache_s {
/* 1) each alloc & free */ on every allocation and free, the full and partial slabs are tried first, then the free ones
/* full, partial first, then free */
struct list_head slabs_full;
struct list_head slabs_partial;
struct list_head slabs_free; the three state lists mentioned earlier
unsigned int objsize; the object size
unsigned int flags; /* constant flags */ the attribute flags
Possible attribute flags include:
SLAB_POISON: mark uninitialized space by filling it with 0xA5 (10100101).
SLAB_RED_ZONE: place "red zones" around objects. A special marker at the start and end of each object records its state:
RED_MAGIC1 (0x5A2CF071) means active, RED_MAGIC2 (0x170FC2A5) means inactive.
The red zone becomes active when the object is allocated, and inactive when free objects are initialized and when object space is reclaimed.
Red zones catch overruns: they draw a boundary that must not be crossed.
SLAB_NO_REAP: do not shrink this cache automatically, even under memory pressure.
SLAB_HWCACHE_ALIGN: align objects to the hardware cache line.
CFLGS_OFF_SLAB: off-slab mode (used when handling large objects).
The flags above are defined in include/linux/slab.h.

unsigned int num; /* # of objs per slab */ the number of objects in each slab
spinlock_t spinlock; the spinlock
#ifdef CONFIG_SMP if SMP is configured
unsigned int batchcount; a batch count is also defined
#endif

/* 2) slab additions /removals */ growing and shrinking the slabs
/* order of pgs per slab (2^n) */
unsigned int gfporder; the number of pages per slab, as a power of two

/* force GFP flags, e.g. GFP_DMA */
unsigned int gfpflags; the priority flags used when requesting pages, defined in include/linux/mm.h

size_t colour; /* cache colouring range */ the colouring range
unsigned int colour_off; /* colour offset */ the colour offset
unsigned int colour_next; /* cache colouring */ the next colour to use
kmem_cache_t *slabp_cache; in off-slab mode, points to the general cache that slab management structures are allocated from
unsigned int growing; set while the cache is growing, so that it is not shrunk at the same time
unsigned int dflags; /* dynamic flags */ flags that change dynamically

/* constructor func */
void (*ctor)(void *, kmem_cache_t *, unsigned long); the constructor

/* de-constructor func */
void (*dtor)(void *, kmem_cache_t *, unsigned long); the destructor

unsigned long failures; a failure counter

/* 3) cache creation/removal */ cache creation and removal
char name[CACHE_NAMELEN]; the cache's name (as shown in /proc/slabinfo)
struct list_head next; link to the next cache structure
#ifdef CONFIG_SMP preprocessing: on symmetric multiprocessors
/* 4) per-cpu data */
cpucache_t *cpudata[NR_CPUS];
one cpucache_t pointer per CPU (NR_CPUS is a macro defined in include/linux/threads.h)

#endif
#if STATS preprocessing: if statistics are being kept
unsigned long num_active; the number of active objects
unsigned long num_allocations; the number of allocations
unsigned long high_mark; the high-water mark of active objects
unsigned long grown; how often the cache grew
unsigned long reaped; how often it was reaped
unsigned long errors; the error count
#ifdef CONFIG_SMP preprocessing: if SMP is configured
atomic_t allochit; atomic counter of allocation hits
atomic_t allocmiss; atomic counter of allocation misses
atomic_t freehit; atomic counter of free hits
atomic_t freemiss; atomic counter of free misses
#endif
#endif
};

/* internal c_flags */
#define CFLGS_OFF_SLAB 0x010000UL /* slab management in own cache */ the slab management lives in its own cache
#define CFLGS_OPTIMIZE 0x020000UL /* optimized slab lookup */ optimized slab lookup

/* c_dflags (dynamic flags). Need to hold the spinlock to access this member */ dynamic flags; the spinlock must be held to access this member
#define DFLGS_GROWN 0x000001UL /* don't reap a recently grown */

#define OFF_SLAB(x) ((x)->flags & CFLGS_OFF_SLAB) tests whether the cache is in off-slab mode
#define OPTIMIZE(x) ((x)->flags & CFLGS_OPTIMIZE) tests whether the optimized mode is set
#define GROWN(x) ((x)->dlags & DFLGS_GROWN) tests the recently-grown flag (note the dlags typo in the source; the macro is never used, so it still compiles)

#if STATS preprocessing: if statistics are being kept
#define STATS_INC_ACTIVE(x) ((x)->num_active++) active count +1
#define STATS_DEC_ACTIVE(x) ((x)->num_active--) active count -1
#define STATS_INC_ALLOCED(x) ((x)->num_allocations++) allocation count +1
#define STATS_INC_GROWN(x) ((x)->grown++) grow count +1
#define STATS_INC_REAPED(x) ((x)->reaped++) reap count +1
#define STATS_SET_HIGH(x) do { if ((x)->num_active > (x)->high_mark) \
(x)->high_mark = (x)->num_active; \
} while (0)
update the high-water mark
#define STATS_INC_ERR(x) ((x)->errors++) error count +1
#else preprocessing: if statistics are not kept, these are all no-ops
#define STATS_INC_ACTIVE(x) do { } while (0)
#define STATS_DEC_ACTIVE(x) do { } while (0)
#define STATS_INC_ALLOCED(x) do { } while (0)
#define STATS_INC_GROWN(x) do { } while (0)
#define STATS_INC_REAPED(x) do { } while (0)
#define STATS_SET_HIGH(x) do { } while (0)
#define STATS_INC_ERR(x) do { } while (0)
#endif

#if STATS && defined(CONFIG_SMP) preprocessing: if statistics are kept and this is an SMP build
#define STATS_INC_ALLOCHIT(x) atomic_inc(&(x)->allochit) atomically count an allocation hit
#define STATS_INC_ALLOCMISS(x) atomic_inc(&(x)->allocmiss) atomically count an allocation miss
#define STATS_INC_FREEHIT(x) atomic_inc(&(x)->freehit) atomically count a free hit
#define STATS_INC_FREEMISS(x) atomic_inc(&(x)->freemiss) atomically count a free miss
#else preprocessing: otherwise these are all no-ops
#define STATS_INC_ALLOCHIT(x) do { } while (0)
#define STATS_INC_ALLOCMISS(x) do { } while (0)
#define STATS_INC_FREEHIT(x) do { } while (0)
#define STATS_INC_FREEMISS(x) do { } while (0)
#endif

#if DEBUG preprocessing: in debug mode
/* Magic nums for obj red zoning.
* Placed in the first word before and the first word after an obj.
*/
the magic numbers for object red zoning (mentioned above)
#define RED_MAGIC1 0x5A2CF071UL /* when obj is active */
#define RED_MAGIC2 0x170FC2A5UL /* when obj is inactive */

/* ...and for poisoning */ markers for uninitialized (poisoned) space
#define POISON_BYTE 0x5a /* byte value for poisoning */ 01011010 as the fill byte
#define POISON_END 0xa5 /* end-byte of poisoning */ 10100101 as the end marker
Extra background: why 0xA5 is used to fill uninitialized regions.
One could also fill with 0xFF or 0x00, but 0xA5 makes accidental shorts between adjacent data lines detectable.
Suppose data lines D0..D7 exist and D1 and D2 are shorted together:
with a 0x00 fill, D7..D0 read 00000000 either way;
with a 0xFF fill, D7..D0 read 11111111 either way;
with a 0xA5 fill, D7..D0 should read 10100101, so the short corrupts the pattern,
and hardware failures or accidental errors become very easy to detect.
Reference: Software-Based Memory Testing, 1997, Michael Barr, http://www.netrino.com/Articles/MemoryTesting/paper.html

#endif
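The poison pattern can be exercised in user space. This sketch mirrors the logic of kmem_poison_obj/kmem_check_poison_obj, but omits the red-zone adjustment; the buffer size and helper names are chosen for the example:

```c
#include <assert.h>
#include <string.h>

#define POISON_BYTE 0x5a
#define POISON_END  0xa5

/* Fill a buffer with the poison pattern: POISON_BYTE everywhere,
 * POISON_END as the final byte. */
static void poison_obj(unsigned char *addr, int size) {
    memset(addr, POISON_BYTE, size);
    addr[size - 1] = POISON_END;
}

/* Return 1 if the pattern was disturbed (the object was written while
 * it should have been free), 0 if it is intact. */
static int check_poison_obj(unsigned char *addr, int size) {
    unsigned char *end = memchr(addr, POISON_END, size);
    if (end != addr + size - 1)
        return 1; /* corrupted */
    return 0;     /* intact */
}
```

Note that memchr finds the *first* 0xa5 byte, so a stray 0xa5 written into the middle of a free object is caught just as a clobbered end marker is.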

/* maximum size of an obj (in 2^order pages) */ the largest order of pages an object may occupy
#define MAX_OBJ_ORDER 5 /* 32 pages */ 2 to the 5th power is 32

/*
* Do not go above this order unless 0 objects fit into the slab.
*/
The slab order (pages per slab = 2^order) is capped: at most order 2 (4 pages) on large-memory machines, order 1 (2 pages) otherwise, unless not even one object would fit.
#define BREAK_GFP_ORDER_HI 2
#define BREAK_GFP_ORDER_LO 1
static int slab_break_gfp_order = BREAK_GFP_ORDER_LO; initially the low cap: order 1, i.e. 2 pages

/*
* Absolute limit for the gfp order
the hard upper limit on the gfp order is 5, i.e. 2^5 = 32 pages
*/

#define MAX_GFP_ORDER 5 /* 32 pages */


/* Macros for storing/retrieving the cachep and or slab from the
* global 'mem_map'. These are used to find the slab an obj belongs to.
* With kfree(), these are used to find the cache which an obj belongs to.
*/
Comment summary: the macros below store/retrieve the cachep and slab pointers
in the global mem_map. They are used to find the slab an object belongs to,
and, in kfree(), the cache it belongs to.

#define SET_PAGE_CACHE(pg,x) ((pg)->list.next = (struct list_head *)(x))
#define GET_PAGE_CACHE(pg) ((kmem_cache_t *)(pg)->list.next)
#define SET_PAGE_SLAB(pg,x) ((pg)->list.prev = (struct list_head *)(x))
#define GET_PAGE_SLAB(pg) ((slab_t *)(pg)->list.prev)
These macros work by reusing the page's list pointers to carry the information.
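The trick of stashing unrelated pointers in a page's list fields can be sketched in user space. The struct definitions here are minimal stand-ins for the kernel's, kept only rich enough to show the cast-based storage:

```c
#include <assert.h>

/* Minimal stand-ins for the kernel structures involved. */
struct list_head { struct list_head *next, *prev; };
struct page { struct list_head list; };
struct kmem_cache { int id; };
struct slab { int id; };

/* While a page belongs to a slab it sits on no list, so its two list
 * pointers are free to carry the owning cache and slab instead. */
#define SET_PAGE_CACHE(pg, x) ((pg)->list.next = (struct list_head *)(x))
#define GET_PAGE_CACHE(pg)    ((struct kmem_cache *)(pg)->list.next)
#define SET_PAGE_SLAB(pg, x)  ((pg)->list.prev = (struct list_head *)(x))
#define GET_PAGE_SLAB(pg)     ((struct slab *)(pg)->list.prev)
```

This avoids any extra per-page storage: given only an object's address, virt_to_page() yields the page, and the page yields both owners.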
/* Size description struct for general caches. */ the descriptor for the general (sized) caches
typedef struct cache_sizes {
size_t cs_size; the cache's object size
kmem_cache_t *cs_cachep; points to the kmem_cache_s descriptor of the general cache of this size
kmem_cache_t *cs_dmacachep; the same, but for the cache that handles DMA-able blocks
} cache_sizes_t;

static cache_sizes_t cache_sizes[] = { the table of general cache sizes
#if PAGE_SIZE == 4096 preprocessing: if the page size is 4096
{ 32, NULL, NULL},
#endif
{ 64, NULL, NULL},
{ 128, NULL, NULL},
{ 256, NULL, NULL},
{ 512, NULL, NULL},
{ 1024, NULL, NULL},
{ 2048, NULL, NULL},
{ 4096, NULL, NULL},
{ 8192, NULL, NULL},
{ 16384, NULL, NULL},
{ 32768, NULL, NULL},
{ 65536, NULL, NULL},
{131072, NULL, NULL},
{ 0, NULL, NULL}
}; the NULL slots are placeholders for cs_cachep and cs_dmacachep

/* internal cache of cache description objs */ the internal cache that holds the cache descriptors themselves
static kmem_cache_t cache_cache = {
slabs_full: LIST_HEAD_INIT(cache_cache.slabs_full),
slabs_partial: LIST_HEAD_INIT(cache_cache.slabs_partial),
slabs_free: LIST_HEAD_INIT(cache_cache.slabs_free), the three state lists
objsize: sizeof (kmem_cache_t), the object size
flags: SLAB_NO_REAP, flag: never reap automatically
spinlock: SPIN_LOCK_UNLOCKED, the spinlock starts out unlocked
colour_off: L1_CACHE_BYTES, the colour offset is the L1 cache line size
name: "kmem_cache" , the name
};

/* Guard access to the cache-chain. */
static struct semaphore cache_chain_sem; the semaphore protecting the cache chain

/* Place maintainer for reaping. */ the roving pointer used by the reaper
static kmem_cache_t *clock_searchp = &cache_cache;

#define cache_chain (cache_cache.next) the macro naming the cache chain

#ifdef CONFIG_SMP preprocessing: on symmetric multiprocessors
/*
* chicken and egg problem: delay the per-cpu array allocation
* until the general caches are up.
*/
Comment summary: a chicken-and-egg problem: the per-CPU array allocation is delayed until the general caches are up.
static int g_cpucache_up; records whether the general caches are ready

static void enable_cpucache (kmem_cache_t *cachep) ; enable one CPU's cache
static void enable_all_cpucaches (void) ; enable all CPUs' caches
#endif

/* Cal the num objs, wastage, and bytes left over for a given slab size. */
This function computes, for a given slab size, the number of objects it holds, the wasted space, and the bytes left over.
static void kmem_cache_estimate (unsigned long gfporder, size_t size,
int flags, size_t *left_over, unsigned int *num)

{
int i;
size_t wastage = PAGE_SIZE<<gfporder;
size_t extra = 0;
size_t base = 0;

if (!(flags & CFLGS_OFF_SLAB)) {
base = sizeof (slab_t);
extra = sizeof (kmem_bufctl_t);
}
i = 0;
while (i*size + L1_CACHE_ALIGN(base+i*extra) <= wastage)
i++;
if (i > 0)
i--;

if (i > SLAB_LIMIT)
i = SLAB_LIMIT;

*num = i;
wastage -= i*size;
wastage -= L1_CACHE_ALIGN(base+i*extra);
*left_over = wastage; the wasted space that was computed
}
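The sizing loop above can be replayed in user space. The constants below are assumptions chosen for illustration (4 kB pages, 32-byte L1 lines, a 32-byte slab_t header, 4-byte bufctls), not values taken from any particular build:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u                          /* assumed page size */
#define L1_CACHE_ALIGN(x) (((x) + 31u) & ~31u)   /* assumed 32-byte lines */
#define SLAB_T_SIZE 32u                          /* assumed sizeof(slab_t) */
#define BUFCTL_SIZE 4u                           /* sizeof(kmem_bufctl_t) */

/* On-slab version of kmem_cache_estimate: find the largest count i such
 * that i objects plus the L1-aligned management area fit in the slab. */
static void estimate(unsigned gfporder, size_t size,
                     size_t *left_over, unsigned *num) {
    size_t wastage = PAGE_SIZE << gfporder;
    unsigned i = 0;
    while (i * size + L1_CACHE_ALIGN(SLAB_T_SIZE + i * BUFCTL_SIZE) <= wastage)
        i++;
    if (i > 0)
        i--;                      /* the loop overshoots by one */
    *num = i;
    wastage -= i * size;
    wastage -= L1_CACHE_ALIGN(SLAB_T_SIZE + i * BUFCTL_SIZE);
    *left_over = wastage;
}
```

With 100-byte objects in a single 4096-byte page, 39 objects fit and 4 bytes are left over for colouring under these assumptions.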

/* Initialisation - setup the `cache' cache. */
This function initializes the cache of caches.
void __init kmem_cache_init(void)
{
size_t left_over;

init_MUTEX(&cache_chain_sem);
INIT_LIST_HEAD(&cache_chain);

kmem_cache_estimate(0, cache_cache.objsize, 0,
&left_over, &cache_cache.num);
if (!cache_cache.num)
BUG();

cache_cache.colour = left_over/cache_cache.colour_off;
cache_cache.colour_next = 0;
}


/* Initialisation - setup remaining internal and general caches.
* Called after the gfp() functions have been enabled, and before smp_init().
*/
This initializes the cache_sizes table: it sets up the remaining internal and general caches. It is called after the gfp() (get free page) functions have been enabled, and before smp_init() (symmetric multiprocessor initialization).

void __init kmem_cache_sizes_init(void)
{
cache_sizes_t *sizes = cache_sizes;
char name[20]; clearly questionable: CACHE_NAMELEN was defined above, yet it is not used here!
Presumably an artifact of uncoordinated development; a future change to the name lengths could easily break this.

/*
* Fragmentation resistance on low memory - only use bigger
* page orders on machines with more than 32MB of memory.
*/
To resist fragmentation on low-memory machines, larger page orders (2^n pages) are used only when the machine has more than 32 MB of memory.
if (num_physpages > (32 << 20) >> PAGE_SHIFT)
slab_break_gfp_order = BREAK_GFP_ORDER_HI;
do {
/* For performance, all the general caches are L1 aligned.
* This should be particularly beneficial on SMP boxes, as it
* eliminates "false sharing".
* Note for systems short on memory removing the alignment will
* allow tighter packing of the smaller caches. */
Comment summary: for performance, all the general caches are L1-aligned. This
is particularly beneficial on SMP boxes, since the alignment eliminates
"false sharing". On memory-starved systems, removing the alignment would let
the smaller caches pack more tightly.

snprintf(name, sizeof (name), "size-%Zd" ,sizes->cs_size); build the name shown in /proc/slabinfo
if (!(sizes->cs_cachep =
kmem_cache_create(name, sizes->cs_size,
0, SLAB_HWCACHE_ALIGN, NULL, NULL))) {
BUG();
} if creating the cache fails, BUG() is called.

/* Inc off-slab bufctl limit until the ceiling is hit. */ raise the off-slab bufctl limit until the ceiling is hit
if (!(OFF_SLAB(sizes->cs_cachep))) {
offslab_limit = sizes->cs_size-sizeof (slab_t);
offslab_limit /= 2; this is actually wrong: it should read offslab_limit /= sizeof(kmem_bufctl_t);
dividing by 2 means the ceiling is never reached. The problem is fixed in the 2.6 kernels.
Reference: http://www.cs.helsinki.fi/linux/linux-kernel/2001-17/1193.html

}
snprintf(name, sizeof (name), "size-%Zd(DMA)" ,sizes->cs_size); build the DMA cache's name
sizes->cs_dmacachep = kmem_cache_create(name, sizes->cs_size, 0,
SLAB_CACHE_DMA|SLAB_HWCACHE_ALIGN, NULL, NULL);
if (!sizes->cs_dmacachep)
BUG();
sizes++;
} while (sizes->cs_size);
}
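The effect of that off-slab-limit bug is easy to see numerically. Assuming a 32-byte slab_t and a 4-byte kmem_bufctl_t (illustrative values for a 32-bit build), the buggy divisor doubles the computed object limit:

```c
#include <assert.h>

#define SLAB_T_SIZE 32u   /* assumed sizeof(slab_t) */
#define BUFCTL_SIZE 4u    /* sizeof(kmem_bufctl_t) on a 32-bit build */

/* The real limit is how many kmem_bufctl_t entries fit in a general-cache
 * object of cs_size bytes after the slab_t header. */
static unsigned limit_buggy(unsigned cs_size) {
    return (cs_size - SLAB_T_SIZE) / 2;           /* what the 2.4 code does */
}
static unsigned limit_fixed(unsigned cs_size) {
    return (cs_size - SLAB_T_SIZE) / BUFCTL_SIZE; /* what it should do */
}
```

For a 512-byte general cache the buggy formula permits 240 objects where only 120 bufctl slots actually fit, which is why the ceiling check can never trigger as intended.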

int __init kmem_cpucache_init(void)
{
#ifdef CONFIG_SMP preprocessing: on multiprocessors
g_cpucache_up = 1; mark the general caches as up
enable_all_cpucaches(); enable the caches of all CPUs. This looks suspicious: is it safe without being atomic, and should the flag really be set before the enabling runs?
#endif
return 0;
}

__initcall(kmem_cpucache_init) ;

/* Interface to system's page allocator. No need to hold the cache-lock.
*/
The interface to the system's page allocator; there is no need to hold the cache lock.
static inline void * kmem_getpages (kmem_cache_t *cachep, unsigned long flags)
{
void *addr;

/*
* If we requested dmaable memory, we will get it. Even if we
* did not request dmaable memory, we might get it, but that
* would be relatively rare and ignorable.
*/
Comment summary: if we requested DMA-able memory, we will get it. Even if we
did not request it, we might get DMA-able memory anyway, but that is
relatively rare and can be ignored.

flags |= cachep->gfpflags;
addr = (void*) __get_free_pages(flags, cachep->gfporder);
/* Assume that now we have the pages no one else can legally
* messes with the 'struct page's.
* However vm_scan() might try to test the structure to see if
* it is a named-page or buffer-page. The members it tests are
* of no interest here.....
*/
Comment summary: from here on we hold pages that no one else may legally
touch. vm_scan() might still probe the struct page to see whether it is a
named page or a buffer page, but the members it tests are of no interest
here.

return addr;
}

/* Interface to system's page release. */
The interface for releasing pages back to the system.
static inline void kmem_freepages (kmem_cache_t *cachep, void *addr)
{
unsigned long i = (1<<cachep->gfporder);
struct page *page = virt_to_page(addr);

/* free_pages() does not clear the type bit - we do that.
* The pages have been unlinked from their cache-slab,
* but their 'struct page's might be accessed in
* vm_scan(). Shouldn't be a worry.
*/
Comment summary: free_pages() does not clear the type bit, so we do it by
hand. The pages have been unlinked from their slab, but their struct page may
still be accessed by vm_scan(); that is nothing to worry about.

while (i--) {
PageClearSlab(page); clear the slab bit
page++;
}
free_pages((unsigned long)addr, cachep->gfporder); release the pages
}

#if DEBUG conditional compilation: in debug mode
static inline void kmem_poison_obj (kmem_cache_t *cachep, void *addr)
{
int size = cachep->objsize;
if (cachep->flags & SLAB_RED_ZONE) {
addr += BYTES_PER_WORD;
size -= 2*BYTES_PER_WORD;
} leave room for the red zones
memset(addr, POISON_BYTE, size); fill the uninitialized region with POISON_BYTE
*(unsigned char *)(addr+size-1) = POISON_END; write the end marker
}

static inline int kmem_check_poison_obj (kmem_cache_t *cachep, void *addr) check an object in poisoned (uninitialized) space
{
int size = cachep->objsize;
void *end;
if (cachep->flags & SLAB_RED_ZONE) {
addr += BYTES_PER_WORD;
size -= 2*BYTES_PER_WORD;
} the red zones
end = memchr(addr, POISON_END, size);
if (end != (addr+size-1))
return 1; error return
return 0; normal return
}
#endif

/* Destroy all the objs in a slab, and release the mem back to the system.
* Before calling the slab must have been unlinked from the cache.
* The cache-lock is not held/needed.
*/
Comment summary: destroy all the objects in a slab and release the memory
back to the system. Before this is called the slab must have been unlinked
from the cache. The cache lock is neither held nor needed.

static void kmem_slab_destroy (kmem_cache_t *cachep, slab_t *slabp)
{
if (cachep->dtor
#if DEBUG
|| cachep->flags & (SLAB_POISON | SLAB_RED_ZONE) in debug mode, poisoning and red zones are handled too
#endif
) {
int i;
for (i = 0; i < cachep->num; i++) {
void* objp = slabp->s_mem+cachep->objsize*i;
#if DEBUG
if (cachep->flags & SLAB_RED_ZONE) {
if (*((unsigned long*)(objp)) != RED_MAGIC1)
BUG();
if (*((unsigned long*)(objp + cachep->objsize
-BYTES_PER_WORD)) != RED_MAGIC1)
BUG(); if a red-zone boundary is wrong, report a bug
objp += BYTES_PER_WORD;
}
#endif
if (cachep->dtor)
(cachep->dtor)(objp, cachep, 0); run the destructor
#if DEBUG
if (cachep->flags & SLAB_RED_ZONE) {
objp -= BYTES_PER_WORD; step back one word
}
if ((cachep->flags & SLAB_POISON) &&
kmem_check_poison_obj(cachep, objp)) check the poisoned region; report a bug on any problem
BUG();
#endif
}
}

kmem_freepages(cachep, slabp->s_mem-slabp->colouroff); release the pages
if (OFF_SLAB(cachep)) in off-slab mode, also release the management structure
kmem_cache_free(cachep->slabp_cache, slabp);
}

/**
* kmem_cache_create - Create a cache.
* @name: A string which is used in /proc/slabinfo to identify this cache.
* @size: The size of objects to be created in this cache.
* @offset: The offset to use within the page.
* @flags: SLAB flags
* @ctor: A constructor for the objects.
* @dtor: A destructor for the objects.
*
* Returns a ptr to the cache on success, NULL on failure.
* Cannot be called within a int, but can be interrupted.
* The @ctor is run when new pages are allocated by the cache
* and the @dtor is run before the pages are handed back.
* The flags are
*
* %SLAB_POISON - Poison the slab with a known test pattern (a5a5a5a5)
* to catch references to uninitialised memory.
*
* %SLAB_RED_ZONE - Insert `Red' zones around the allocated memory to check
* for buffer overruns.
*
* %SLAB_NO_REAP - Don't automatically reap this cache when we're under
* memory pressure.
*
* %SLAB_HWCACHE_ALIGN - Align the objects in this cache to a hardware
* cacheline. This can be beneficial if you're counting cycles as closely
* as davem.
*/

kmem_cache_t *
kmem_cache_create (const char *name, size_t size, size_t offset,
unsigned long flags, void (*ctor)(void*, kmem_cache_t *, unsigned long),
void (*dtor)(void*, kmem_cache_t *, unsigned long))

{
const char *func_nm = KERN_ERR "kmem_create: " ;
size_t left_over, align, slab_size;
kmem_cache_t *cachep = NULL;

/*
* Sanity checks... these are all serious usage bugs.
*/
Sanity checks.
if ((!name) ||
((strlen(name) >= CACHE_NAMELEN - 1)) ||
in_interrupt() ||
(size < BYTES_PER_WORD) ||
(size > (1<<MAX_OBJ_ORDER)*PAGE_SIZE) ||
(dtor && !ctor) ||
(offset < 0 || offset > size))
BUG();

#if DEBUG 條件編譯
if ((flags & SLAB_DEBUG_INITIAL) && !ctor) {
/* No constructor, but inital state check requested */ no constructor, yet an initial-state check was requested
printk("%sNo con, but init state check requested - %s\n" , func_nm, name);
flags &= ~SLAB_DEBUG_INITIAL;
}

if ((flags & SLAB_POISON) && ctor) { poisoning was requested even though a constructor was given
/* request for poisoning, but we can't do that with a constructor */
printk("%sPoisoning requested, but con given - %s\n" , func_nm, name);
flags &= ~SLAB_POISON;
}
#if FORCED_DEBUG
if ((size < (PAGE_SIZE>>3)) && !(flags & SLAB_MUST_HWCACHE_ALIGN))
/*
* do not red zone large object, causes severe
* fragmentation.
*/
large objects are not red-zoned, since that would cause severe fragmentation
flags |= SLAB_RED_ZONE;
if (!ctor)
flags |= SLAB_POISON;
#endif
#endif

/*
* Always checks flags, a caller might be expecting debug
* support which isn't available.
*/

BUG_ON(flags & ~CREATE_MASK);

/* Get cache's description obj. */ kmem_cache_alloc is called to allocate a descriptor object from cache_cache
cachep = (kmem_cache_t *) kmem_cache_alloc(&cache_cache, SLAB_KERNEL);
if (!cachep)
goto opps;
memset(cachep, 0, sizeof (kmem_cache_t)); zero the newly allocated descriptor

/* Check that size is in terms of words. This is needed to avoid
* unaligned accesses for some archs when redzoning is used, and makes
* sure any on-slab bufctl's are also correctly aligned.
*/
Comment summary: check that the size is in whole words. On some architectures
this is needed to avoid unaligned accesses when red zoning is used, and it
also makes sure any on-slab bufctls are correctly aligned.

if (size & (BYTES_PER_WORD-1)) {
size += (BYTES_PER_WORD-1);
size &= ~(BYTES_PER_WORD-1);
printk("%sForcing size word alignment - %s\n" , func_nm, name);
}

#if DEBUG
if (flags & SLAB_RED_ZONE) {
/*
* There is no point trying to honour cache alignment
* when redzoning.
*/

flags &= ~SLAB_HWCACHE_ALIGN;
size += 2*BYTES_PER_WORD; /* words for redzone */ the words for the red zones
}
#endif
align = BYTES_PER_WORD;
if (flags & SLAB_HWCACHE_ALIGN) if hardware alignment was requested, align to the CPU's L1 cache line size, otherwise to the word size
align = L1_CACHE_BYTES;

/* Determine if the slab management is 'on' or 'off' slab. */
if (size >= (PAGE_SIZE>>3)) decide between on-slab and off-slab
/*
* Size is large, assume best to place the slab management obj
* off-slab (should allow better packing of objs).
*/
if the object is large (512 bytes or more), off-slab mode is used
flags |= CFLGS_OFF_SLAB;

if (flags & SLAB_HWCACHE_ALIGN) {
/* Need to adjust size so that objs are cache aligned. */
/* Small obj size, can get at least two per cache line. */
/* FIXME: only power of 2 supported, was better */
adjust the object size so that objects align with the cache
while (size < align/2)
align /= 2;
size = (size+align-1)&(~(align-1));
}
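The word- and cache-line alignment above uses the standard round-up-to-a-power-of-two idiom. A small sketch (8-byte words and 32-byte lines are assumed here, as is the helper name):

```c
#include <assert.h>
#include <stddef.h>

#define BYTES_PER_WORD 8u   /* assumed sizeof(void *) */
#define L1_CACHE_BYTES 32u  /* assumed L1 line size */

/* Round size up to a multiple of align (align must be a power of 2):
 * adding align-1 crosses the next boundary, masking drops the remainder. */
static size_t round_up(size_t size, size_t align) {
    return (size + align - 1) & ~(align - 1);
}
```

kmem_cache_create additionally halves `align` while the object is smaller than half a line, so that at least two small objects share one cache line instead of each wasting most of it.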

/* Cal size (in pages) of slabs, and the num of objs per slab.
* This could be made much more intelligent. For now, try to avoid
* using high page-orders for slabs. When the gfp() funcs are more
* friendly towards high-order requests, this should be changed.
*/
Compute the slab size in pages and the number of objects per slab.
do {
unsigned int break_flag = 0;
cal_wastage:
kmem_cache_estimate(cachep->gfporder, size, flags,
&left_over, &cachep->num);
compute the wastage: left_over receives the leftover space, cachep->num the number of objects a slab can hold

if (break_flag)
break ;
if (cachep->gfporder >= MAX_GFP_ORDER) if the slab is already huge (32*4096 = 128 KB), leave the loop
break ;
if (!cachep->num) if not even one object fits at this order, try the next order
goto next;
if (flags & CFLGS_OFF_SLAB && cachep->num > offslab_limit) { over the off-slab limit: drop back one order, recompute the wastage, then leave
/* Oops, this num of objs will cause problems. */
cachep->gfporder--;
break_flag++;
goto cal_wastage;
}

/*
* Large num of objs is good, but v. large slabs are currently
* bad for the gfp()s.
*/
many objects per slab is good, but very large slabs are currently bad for the gfp() functions
if (cachep->gfporder >= slab_break_gfp_order)
break ;

if ((left_over*8) <= (PAGE_SIZE<<cachep->gfporder))
waste control: if the slab barely exceeds the object size, almost an object's worth of space can be wasted, so the slab is grown to hold more objects;
once the internal fragmentation is at most 1/8 of the slab, it stops growing

break ; /* Acceptable internal fragmentation. */
next:
cachep->gfporder++;
} while (1);

if (!cachep->num) { if nothing fits, report the failure, free the descriptor, and return
printk("kmem_cache_create: couldn't create cache %s.\n" , name);
kmem_cache_free(&cache_cache, cachep);
cachep = NULL;
goto opps;
}
slab_size = L1_CACHE_ALIGN(cachep->num*sizeof (kmem_bufctl_t)+sizeof (slab_t)); the total size of the slab's management data (L1-cache aligned)

/*
* If the slab has been placed off-slab, and we have enough space then
* move it on-slab. This is at the expense of any extra colouring.
*/
use on-slab whenever possible (at the cost of some extra colouring space)
if (flags & CFLGS_OFF_SLAB && left_over >= slab_size) {
flags &= ~CFLGS_OFF_SLAB;
left_over -= slab_size;
}

/* Offset must be a multiple of the alignment. */ round offset to a suitable aligned value
offset += (align-1);
offset &= ~(align-1);
if (!offset)
offset = L1_CACHE_BYTES; if no offset was given, use the L1 cache line size
cachep->colour_off = offset; the colour offset
cachep->colour = left_over/offset; the number of distinct colours
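Slab colouring simply cycles the starting offset of each new slab's objects through the leftover space. With an assumed 200 bytes left over and a 32-byte colour offset (illustrative values), successive slabs start their objects at these offsets:

```c
#include <assert.h>

#define COLOUR_OFF 32u   /* assumed colour_off (L1 line size) */
#define LEFT_OVER 200u   /* assumed leftover bytes in the slab */

static unsigned colour = LEFT_OVER / COLOUR_OFF; /* 6 distinct colours */
static unsigned colour_next = 0;

/* Mirrors what kmem_cache_grow() does for each new slab: take the next
 * colour, scale it by the offset, and wrap around. */
static unsigned next_slab_offset(void) {
    unsigned off = colour_next * COLOUR_OFF;
    colour_next++;
    if (colour_next >= colour)
        colour_next = 0;
    return off;
}
```

Because each slab's objects start on a different cache-line boundary, objects from different slabs compete for different L1 lines instead of all mapping to the same ones.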

/* init remaining fields */ initialize the remaining fields
if (!cachep->gfporder && !(flags & CFLGS_OFF_SLAB))
flags |= CFLGS_OPTIMIZE;

cachep->flags = flags; set the flags
cachep->gfpflags = 0;
if (flags & SLAB_CACHE_DMA)
cachep->gfpflags |= GFP_DMA;
spin_lock_init(&cachep->spinlock); initialize the lock
cachep->objsize = size; set the size
INIT_LIST_HEAD(&cachep->slabs_full);
INIT_LIST_HEAD(&cachep->slabs_partial);
INIT_LIST_HEAD(&cachep->slabs_free); initialize the three lists

if (flags & CFLGS_OFF_SLAB)
cachep->slabp_cache = kmem_find_general_cachep(slab_size,0); point at the general cache matching slab_size
cachep->ctor = ctor; the constructor
cachep->dtor = dtor; the destructor
/* Copy name over so we don't have problems with unloaded modules */
strcpy(cachep->name, name); the name is copied so that nothing breaks after a module is unloaded

#ifdef CONFIG_SMP conditional compilation: on SMP, if the general caches are up, enable this cache's per-CPU caches
if (g_cpucache_up)
enable_cpucache(cachep);
#endif
/* Need the semaphore to access the chain. */
down(&cache_chain_sem); take the semaphore so the cache chain may be accessed
{
struct list_head *p;

list_for_each(p, &cache_chain) {
kmem_cache_t *pc = list_entry(p, kmem_cache_t, next);

/* The name field is constant - no lock needed. */ if the name already exists on the chain, report a bug
if (!strcmp(pc->name, name))
BUG();
}
}

/* There is no reason to lock our new cache before we
* link it in - no one knows about it yet...
*/
there is no need to lock the new cache before linking it in: no one can know about it yet
list_add(&cachep->next, &cache_chain);
up(&cache_chain_sem); release the semaphore
opps:
return cachep; return the pointer to the newly created cache
}


#if DEBUG conditional compilation
/*
* This check if the kmem_cache_t pointer is chained in the cache_cache
* list. -arca
*/
checks whether the kmem_cache_t pointer is linked into the cache chain
static int is_chained_kmem_cache(kmem_cache_t * cachep)
{
struct list_head *p;
int ret = 0;

/* Find the cache in the chain of caches. */
down(&cache_chain_sem);
list_for_each(p, &cache_chain) {
if (p == &cachep->next) {
ret = 1;
break ;
}
}
up(&cache_chain_sem);

return ret;
}
#else when debugging is off, the check is defined away as the constant 1
#define is_chained_kmem_cache(x) 1
#endif

#ifdef CONFIG_SMP conditional compilation: on symmetric multiprocessors
/*
* Waits for all CPUs to execute func().
*/
run a function on every CPU and wait for all of them
static void smp_call_function_all_cpus(void (*func) (void *arg), void *arg)
{
local_irq_disable(); disable interrupts
func(arg); run the function locally
local_irq_enable(); re-enable interrupts

if (smp_call_function(func, arg, 1, 1)) if the cross-CPU call fails, report a bug
BUG();
}
typedef struct ccupdate_struct_s
{
kmem_cache_t *cachep;
cpucache_t *new[NR_CPUS];
} ccupdate_struct_t;

static void do_ccupdate_local(void *info)
{
ccupdate_struct_t *new = (ccupdate_struct_t *)info;
cpucache_t *old = cc_data(new->cachep);

cc_data(new->cachep) = new->new[smp_processor_id()];
new->new[smp_processor_id()] = old;
} the local CPU swaps in its new cpucache; the old one is handed back through the info structure

static void free_block (kmem_cache_t* cachep, void** objpp, int len) ;

static void drain_cpu_caches(kmem_cache_t *cachep)
{ drain the per-CPU caches
ccupdate_struct_t new;
int i;

memset(&new.new,0,sizeof (new.new));

new.cachep = cachep;

down(&cache_chain_sem);
smp_call_function_all_cpus(do_ccupdate_local, (void *)&new);

for (i = 0; i < smp_num_cpus; i++) {
cpucache_t* ccold = new.new[cpu_logical_map(i)];
if (!ccold || (ccold->avail == 0))
continue ;
local_irq_disable();
free_block(cachep, cc_entry(ccold), ccold->avail);
local_irq_enable();
ccold->avail = 0;
}
smp_call_function_all_cpus(do_ccupdate_local, (void *)&new);
up(&cache_chain_sem);
}

#else On a uniprocessor build, define this as a no-op.
#define drain_cpu_caches(cachep) do { } while (0)
#endif

/*
* Called with the &cachep->spinlock held, returns number of slabs released
*/
Must be called with the spinlock held; returns the number of slabs released.
static int __kmem_cache_shrink_locked(kmem_cache_t *cachep)
{
slab_t *slabp;
int ret = 0;

/* If the cache is growing, stop shrinking. */ While the cache is growing it must not be shrunk.
while (!cachep->growing) {
struct list_head *p;

p = cachep->slabs_free.prev;
if (p == &cachep->slabs_free)
break ;

slabp = list_entry(cachep->slabs_free.prev, slab_t, list);
#if DEBUG
if (slabp->inuse)
BUG();
#endif
list_del(&slabp->list); Walk the free list from the tail, unlinking each slab.

spin_unlock_irq(&cachep->spinlock); Release the spinlock and re-enable interrupts,
kmem_slab_destroy(cachep, slabp); destroy the slab, returning its pages,
ret++;
spin_lock_irq(&cachep->spinlock); then re-take the lock.
}
return ret;
}

static int __kmem_cache_shrink(kmem_cache_t *cachep) The shrink operation
{
int ret;

drain_cpu_caches(cachep);

spin_lock_irq(&cachep->spinlock); take the spinlock
__kmem_cache_shrink_locked(cachep); release the free slabs
ret = !list_empty(&cachep->slabs_full) ||
!list_empty(&cachep->slabs_partial); ret is 1 if any full or partial slabs remain, 0 otherwise
spin_unlock_irq(&cachep->spinlock); release the lock
return ret;
}

/**
* kmem_cache_shrink - Shrink a cache.
* @cachep: The cache to shrink.
*
* Releases as many slabs as possible for a cache.
* Returns number of pages released.
*/
The shrink operation: release as many slabs as possible, returning the number of pages freed.
int kmem_cache_shrink(kmem_cache_t *cachep)
{
int ret;

if (!cachep || in_interrupt() || !is_chained_kmem_cache(cachep)) BUG() if cachep is NULL, we are in interrupt context, or the cache is not on the cache chain
BUG();

spin_lock_irq(&cachep->spinlock); take the spinlock (disabling interrupts)
ret = __kmem_cache_shrink_locked(cachep); shrink
spin_unlock_irq(&cachep->spinlock); release the spinlock

return ret << cachep->gfporder;
}

/**
* kmem_cache_destroy - delete a cache
* @cachep: the cache to destroy
*
Comment gist: kmem_cache_destroy deletes a cache; cachep is the cache to be destroyed.
* Remove a kmem_cache_t object from the slab cache.
* Returns 0 on success.
*
Returns 0 on success.
* It is expected this function will be called by a module when it is
* unloaded. This will remove the cache completely, and avoid a duplicate
* cache being allocated each time a module is loaded and unloaded, if the
* module doesn't have persistent in-kernel storage across loads and unloads.
*
This function is expected to be called when a module is unloaded. It removes the cache completely,
avoiding a duplicate cache being allocated on every load/unload cycle of a module that has no persistent in-kernel storage.

* The cache must be empty before calling this function.
*
The cache must be empty before this function is called.
* The caller must guarantee that noone will allocate memory from the cache
* during the kmem_cache_destroy().
*/
The caller must guarantee that no one allocates from the cache while kmem_cache_destroy() is running.
int kmem_cache_destroy (kmem_cache_t * cachep)
{
if (!cachep || in_interrupt() || cachep->growing) BUG() if cachep is NULL, we are in interrupt context, or the cache is still growing
BUG();

/* Find the cache in the chain of caches. */
down(&cache_chain_sem); Take the semaphore so the cache chain can be modified.
/* the chain is never empty, cache_cache is never destroyed */
if (clock_searchp == cachep) If the reaper's clock hand points at this cache, advance it to the next cache
clock_searchp = list_entry(cachep->next.next,
kmem_cache_t, next);
list_del(&cachep->next);
up(&cache_chain_sem);

if (__kmem_cache_shrink(cachep)) { If the cache cannot be shrunk to empty, put it back on the chain and fail
printk(KERN_ERR "kmem_cache_destroy: Can't free all objects %p\n" ,
cachep);
down(&cache_chain_sem);
list_add(&cachep->next,&cache_chain);
up(&cache_chain_sem);
return 1;
}
#ifdef CONFIG_SMP Conditional compilation for the SMP case
{
int i;
for (i = 0; i < NR_CPUS; i++)
kfree(cachep->cpudata[i]); free each CPU's cpucache array
}
#endif
kmem_cache_free(&cache_cache, cachep); free the cache descriptor itself back to cache_cache

return 0;
}

/* Get the memory for a slab management obj. */ Allocate memory for the slab management structure
static inline slab_t * kmem_cache_slabmgmt (kmem_cache_t *cachep,
void *objp, int colour_off, int local_flags)
{
slab_t *slabp;

if (OFF_SLAB(cachep)) {
/* Slab management obj is off-slab. */ In off-slab mode the descriptor comes from a separate cache
slabp = kmem_cache_alloc(cachep->slabp_cache, local_flags);
if (!slabp) error handling
return NULL;
} else { on-slab mode
/* FIXME: change to
slabp = objp
* if you enable OPTIMIZE
*/

slabp = objp+colour_off; skip past the colouring area
colour_off += L1_CACHE_ALIGN(cachep->num *
sizeof (kmem_bufctl_t) + sizeof (slab_t)); advance the offset past the descriptor and the bufctl array
}
slabp->inuse = 0; no objects in use yet
slabp->colouroff = colour_off; record the colour offset
slabp->s_mem = objp+colour_off; address of the first object in the slab

return slabp;
}

static inline void kmem_cache_init_objs (kmem_cache_t * cachep,
slab_t * slabp, unsigned long ctor_flags)

{
int i;

for (i = 0; i < cachep->num; i++) { run the constructor on every object in the slab
void* objp = slabp->s_mem+cachep->objsize*i;
#if DEBUG

if (cachep->flags & SLAB_RED_ZONE) {
*((unsigned long*)(objp)) = RED_MAGIC1;
*((unsigned long*)(objp + cachep->objsize -
BYTES_PER_WORD)) = RED_MAGIC1;
objp += BYTES_PER_WORD;
}
#endif

/*
* Constructors are not allowed to allocate memory from
* the same cache which they are a constructor for.
* Otherwise, deadlock. They must also be threaded.
*/
A constructor must not allocate from the same cache it constructs for, or it would deadlock. Constructors must also be thread-safe.
if (cachep->ctor)
cachep->ctor(objp, cachep, ctor_flags);
#if DEBUG
if (cachep->flags & SLAB_RED_ZONE)
objp -= BYTES_PER_WORD;
if (cachep->flags & SLAB_POISON)
/* need to poison the objs */
kmem_poison_obj(cachep, objp);
if (cachep->flags & SLAB_RED_ZONE) {
if (*((unsigned long*)(objp)) != RED_MAGIC1)
BUG();
if (*((unsigned long*)(objp + cachep->objsize -
BYTES_PER_WORD)) != RED_MAGIC1)
BUG();
}
#endif
slab_bufctl(slabp)[i] = i+1;
}
slab_bufctl(slabp)[i-1] = BUFCTL_END; mark the end of the free list (0xffffffff)
slabp->free = 0; the first free object is index 0 in the object array
}

/*
* Grow (by 1) the number of slabs within a cache. This is called by
* kmem_cache_alloc() when there are no active objs left in a cache.
*/
Grow a cache by one slab. Called by kmem_cache_alloc() when no free objects are left in the cache.
static int kmem_cache_grow (kmem_cache_t * cachep, int flags)
{
slab_t *slabp;
struct page *page;
void *objp;
size_t offset;
unsigned int i, local_flags;
unsigned long ctor_flags;
unsigned long save_flags;

/* Be lazy and only check for valid flags here,
* keeping it out of the critical path in kmem_cache_alloc().
*/
Only the invalid flags are checked here; the full sanity checks are done in kmem_cache_alloc().
if (flags & ~(SLAB_DMA|SLAB_LEVEL_MASK|SLAB_NO_GROW))
BUG();
if (flags & SLAB_NO_GROW) if growing is forbidden, return immediately
return 0;

/*
* The test for missing atomic flag is performed here, rather than
* the more obvious place, simply to reduce the critical path length
* in kmem_cache_alloc(). If a caller is seriously mis-behaving they
* will eventually be caught here (where it matters).
*/
The check for a missing atomic flag is done here rather than in the more obvious place,
simply to keep the critical path in kmem_cache_alloc() short: a caller in interrupt
context that did not pass SLAB_ATOMIC is misbehaving and is caught here, where it matters.

if (in_interrupt() && (flags & SLAB_LEVEL_MASK) != SLAB_ATOMIC)
BUG();

ctor_flags = SLAB_CTOR_CONSTRUCTOR;
local_flags = (flags & SLAB_LEVEL_MASK);
if (local_flags == SLAB_ATOMIC)
/*
* Not allowed to sleep. Need to tell a constructor about
* this - it might need to know...
*/
Tell the constructor that it must not sleep.
ctor_flags |= SLAB_CTOR_ATOMIC;

/* About to mess with non-constant members - lock. */ Take the lock and save the interrupt state
spin_lock_irqsave(&cachep->spinlock, save_flags);

/* Get colour for the slab, and cal the next value. */ Get the slab's colour and compute the next one
offset = cachep->colour_next;
cachep->colour_next++;
if (cachep->colour_next >= cachep->colour) if the counter reached the maximum
cachep->colour_next = 0; wrap around to the start
offset *= cachep->colour_off; convert the colour index into a byte offset (index * bytes per colour step)
cachep->dflags |= DFLGS_GROWN; set the dynamically-grown flag

cachep->growing++; mark the cache as growing
spin_unlock_irqrestore(&cachep->spinlock, save_flags); release the lock and restore interrupts

/* A series of memory allocations for a new slab.
* Neither the cache-chain semaphore, or cache-lock, are
* held, but the incrementing c_growing prevents this
* cache from being reaped or shrunk.
* Note: The cache could be selected in for reaping in
* kmem_cache_reap(), but when the final test is made the
* growing value will be seen.
*/
Comment gist: a series of memory allocations for the new slab. Neither the cache-chain
semaphore nor the cache lock is held; instead, the incremented c_growing count prevents the
cache from being reaped or shrunk. Note that kmem_cache_reap() may still select this cache,
but its final test will see the growing count and leave it alone.


/* Get mem for the objs. */
if (!(objp = kmem_getpages(cachep, flags))) allocate pages for the objects
goto failed; error handling

/* Get slab management. */ Get the slab management structure
if (!(slabp = kmem_cache_slabmgmt(cachep, objp, offset, local_flags)))
goto opps1; error handling

/* Nasty!!!!!! I hope this is OK. */
i = 1 << cachep->gfporder; i = number of pages in a slab
page = virt_to_page(objp); page points to the page holding this object
do {
SET_PAGE_CACHE(page, cachep); record the page's cache, i.e. page->list.next = (struct list_head *)cachep
SET_PAGE_SLAB(page, slabp); record the page's slab, i.e. page->list.prev = (struct list_head *)slabp
PageSetSlab(page); mark the page as belonging to a slab
page++; next page
} while (--i); loop until every page of the slab has been set up

kmem_cache_init_objs(cachep, slabp, ctor_flags); initialize the objects

spin_lock_irqsave(&cachep->spinlock, save_flags); take the lock, saving the interrupt state
cachep->growing--; the cache is no longer growing

/* Make slab active. */
list_add_tail(&slabp->list, &cachep->slabs_free); add the new slab to the tail of the cache's free-slab list
STATS_INC_GROWN(cachep); update the grown statistic
cachep->failures = 0; reset the failure count

spin_unlock_irqrestore(&cachep->spinlock, save_flags); release the lock and restore the interrupt state
return 1; success
opps1:
kmem_freepages(cachep, objp); free the pages
failed:
spin_lock_irqsave(&cachep->spinlock, save_flags);
cachep->growing--; no longer growing
spin_unlock_irqrestore(&cachep->spinlock, save_flags);
return 0;
}

/*
* Perform extra freeing checks:
* - detect double free
* - detect bad pointers.
* Called with the cache-lock held.
*/
Extra checks on free: detect double frees and bad pointers. Called with the cache lock held.

#if DEBUG Conditional compilation: debug mode only
static int kmem_extra_free_checks (kmem_cache_t * cachep,
slab_t *slabp, void * objp)

{
int i;
unsigned int objnr = (objp-slabp->s_mem)/cachep->objsize;

if (objnr >= cachep->num)
BUG();
if (objp != slabp->s_mem + objnr*cachep->objsize)
BUG();

/* Check slab's freelist to see if this obj is there. */ check whether the object is already on the free list (a double free)
for (i = slabp->free; i != BUFCTL_END; i = slab_bufctl(slabp)[i]) {
if (i == objnr)
BUG();
}
return 0;
}
#endif

static inline void kmem_cache_alloc_head(kmem_cache_t *cachep, int flags)
{
if (flags & SLAB_DMA) {
if (!(cachep->gfpflags & GFP_DMA)) BUG() if SLAB_DMA was requested but the cache does not allocate DMA pages
BUG();
} else {
if (cachep->gfpflags & GFP_DMA) BUG() if SLAB_DMA was not requested but the cache allocates DMA pages
BUG();
}
}

static inline void * kmem_cache_alloc_one_tail (kmem_cache_t *cachep,
slab_t *slabp)
{
void *objp;

STATS_INC_ALLOCED(cachep);
STATS_INC_ACTIVE(cachep);
STATS_SET_HIGH(cachep); statistics (no-ops unless STATS is 1)

/* get obj pointer */
slabp->inuse++; one more active object in the slab
objp = slabp->s_mem + slabp->free*cachep->objsize;
objp points to the first free object in the slab (= base address + free index * object size)

slabp->free=slab_bufctl(slabp)[slabp->free]; free becomes the index of the next free object in the object array

if (unlikely(slabp->free == BUFCTL_END)) { the slab is now full; move it to the full list
list_del(&slabp->list); remove it from its current list
list_add(&slabp->list, &cachep->slabs_full); put it on the full list
}
#if DEBUG Conditional compilation: debug mode
if (cachep->flags & SLAB_POISON) with poisoning enabled
if (kmem_check_poison_obj(cachep, objp)) BUG() if the poison pattern has been disturbed
BUG();
if (cachep->flags & SLAB_RED_ZONE) { with red zones enabled
/* Set alloc red-zone, and check old one. */
if (xchg((unsigned long *)objp, RED_MAGIC2) != BUG() if the leading guard word is wrong
RED_MAGIC1)
BUG();
if (xchg((unsigned long *)(objp+cachep->objsize -
BYTES_PER_WORD), RED_MAGIC2) != RED_MAGIC1) BUG() if the trailing guard word is wrong
BUG();
objp += BYTES_PER_WORD;
}
#endif
return objp; return the address of the newly allocated object
}

/*
* Returns a ptr to an obj in the given cache.
* caller must guarantee synchronization
* #define for the goto optimization 8-)
*/
Return a pointer to an object in the given cache.
The caller must guarantee synchronization.
A #define is used so that the goto can be optimized.

#define kmem_cache_alloc_one(cachep) \
({ \
struct list_head * slabs_partial, * entry; \
slab_t *slabp; \
\
slabs_partial = &(cachep)->slabs_partial; \
entry = slabs_partial->next; \
if (unlikely(entry == slabs_partial)) { \
struct list_head * slabs_free; \
slabs_free = &(cachep)->slabs_free; \
entry = slabs_free->next; \
if (unlikely(entry == slabs_free)) \
goto alloc_new_slab; \
list_del(entry); \
list_add(entry, slabs_partial); \
} \
\
slabp = list_entry(entry, slab_t, list); \
kmem_cache_alloc_one_tail(cachep, slabp); \
})


#ifdef CONFIG_SMP Conditional compilation: SMP support, allocate a batch of objects for the per-CPU cache
void* kmem_cache_alloc_batch(kmem_cache_t* cachep, cpucache_t* cc, int flags)
{
int batchcount = cachep->batchcount;

spin_lock(&cachep->spinlock);
while (batchcount--) {
struct list_head * slabs_partial, * entry;
slab_t *slabp;
/* Get slab alloc is to come from. */
slabs_partial = &(cachep)->slabs_partial;
entry = slabs_partial->next;
if (unlikely(entry == slabs_partial)) {
struct list_head * slabs_free;
slabs_free = &(cachep)->slabs_free;
entry = slabs_free->next;
if (unlikely(entry == slabs_free))
break ;
list_del(entry);
list_add(entry, slabs_partial);
}

slabp = list_entry(entry, slab_t, list);
cc_entry(cc)[cc->avail++] =
kmem_cache_alloc_one_tail(cachep, slabp);
}
spin_unlock(&cachep->spinlock);

if (cc->avail)
return cc_entry(cc)[--cc->avail];
return NULL;
}
#endif

static inline void * __kmem_cache_alloc (kmem_cache_t *cachep, int flags)
{
unsigned long save_flags;
void* objp;

kmem_cache_alloc_head(cachep, flags); check that the request is sane
try_again:
local_irq_save(save_flags); save the interrupt state and disable interrupts
#ifdef CONFIG_SMP Conditional compilation: the SMP case
{
cpucache_t *cc = cc_data(cachep); cc points to this CPU's cache for cachep

if (cc) { a per-CPU cache exists
if (cc->avail) { it has objects available
STATS_INC_ALLOCHIT(cachep); count the hit
objp = cc_entry(cc)[--cc->avail]; take an object from the per-CPU cache, decrementing the available count
} else { the per-CPU cache is empty
STATS_INC_ALLOCMISS(cachep); count the miss
objp = kmem_cache_alloc_batch(cachep,cc,flags); refill the per-CPU cache in a batch and take one object
if (!objp)
goto alloc_new_slab_nolock; nothing obtained: grow a new slab without taking the lock
}
} else { no per-CPU cache
spin_lock(&cachep->spinlock); lock
objp = kmem_cache_alloc_one(cachep); allocate one object from the cache
spin_unlock(&cachep->spinlock); unlock
}
}
#else
objp = kmem_cache_alloc_one(cachep); uniprocessor: allocate one object directly
#endif
local_irq_restore(save_flags); restore interrupts
return objp; return the allocated object
alloc_new_slab:
#ifdef CONFIG_SMP Conditional compilation: on SMP, drop the spinlock first
spin_unlock(&cachep->spinlock);
alloc_new_slab_nolock:
#endif
local_irq_restore(save_flags); restore interrupts
if (kmem_cache_grow(cachep, flags)) if a slab was successfully added to the cache
/* Someone may have stolen our objs. Doesn't matter, we'll
* just come back here again.
*/
Someone else may have taken the objects we just grew. That is fine: we simply come back here
goto try_again; and try the allocation again
return NULL; if no object could be allocated and no new slab could be grown, return NULL
}

/*
* Release an obj back to its cache. If the obj has a constructed
* state, it should be in this state _before_ it is released.
* - caller is responsible for the synchronization
*/


#if DEBUG
# define CHECK_NR(pg) \
do { \
if (!VALID_PAGE(pg)) { \
printk(KERN_ERR "kfree: out of range ptr %lxh.\n" , \
(unsigned long)objp); \
BUG(); \
} \
} while (0)
# define CHECK_PAGE(page) \
do { \
CHECK_NR(page); \
if (!PageSlab(page)) { \
printk(KERN_ERR "kfree: bad ptr %lxh.\n" , \
(unsigned long)objp); \
BUG(); \
} \
} while (0)

#else
# define CHECK_PAGE(pg) do { } while (0)
#endif
The do{}while(0) form guarantees that the macro keeps the same single-statement semantics wherever it is invoked.
Reference: http://www.rtems.com/rtems/maillistArchives/rtems-users/2001/august/msg00056.html

static inline void kmem_cache_free_one(kmem_cache_t *cachep, void *objp)
{
slab_t* slabp;

CHECK_PAGE(virt_to_page(objp)); check the page the object lives in
/* reduces memory footprint
*
if (OPTIMIZE(cachep))
slabp = (void*)((unsigned long)objp&(~(PAGE_SIZE-1)));
else
*/

slabp = GET_PAGE_SLAB(virt_to_page(objp)); find the slab the object belongs to

#if DEBUG
if (cachep->flags & SLAB_DEBUG_INITIAL)
/* Need to call the slab's constructor so the
* caller can perform a verify of its state (debugging).
* Called without the cache-lock held.
*/

cachep->ctor(objp, cachep, SLAB_CTOR_CONSTRUCTOR|SLAB_CTOR_VERIFY);

if (cachep->flags & SLAB_RED_ZONE) {
objp -= BYTES_PER_WORD;
if (xchg((unsigned long *)objp, RED_MAGIC1) != RED_MAGIC2)
/* Either write before start, or a double free. */
BUG();
if (xchg((unsigned long *)(objp+cachep->objsize -
BYTES_PER_WORD), RED_MAGIC1) != RED_MAGIC2)
/* Either write past end, or a double free. */
BUG();
}
if (cachep->flags & SLAB_POISON)
kmem_poison_obj(cachep, objp);
if (kmem_extra_free_checks(cachep, slabp, objp))
return ;
#endif
{
unsigned int objnr = (objp-slabp->s_mem)/cachep->objsize;

slab_bufctl(slabp)[objnr] = slabp->free;
slabp->free = objnr; the freed object's index becomes the new head of the slab's free list
}
STATS_DEC_ACTIVE(cachep); one fewer active object

/* fixup slab chains */ fix up the slab lists
{
int inuse = slabp->inuse;
if (unlikely(!--slabp->inuse)) {
/* Was partial or full, now empty. */ the slab held its last in-use object; move it to the free list
list_del(&slabp->list);
list_add(&slabp->list, &cachep->slabs_free);
} else if (unlikely(inuse == cachep->num)) {
/* Was full. */ the slab was full, now partial
list_del(&slabp->list);
list_add(&slabp->list, &cachep->slabs_partial);
}
}
}

#ifdef CONFIG_SMP Conditional compilation for SMP
static inline void __free_block (kmem_cache_t* cachep,
void** objpp, int len)

{
for ( ; len > 0; len--, objpp++)
kmem_cache_free_one(cachep, *objpp); free the objects one by one until len is exhausted
}

static void free_block (kmem_cache_t* cachep, void** objpp, int len)
{
spin_lock(&cachep->spinlock);
__free_block(cachep, objpp, len); take the lock, call __free_block(), release the lock
spin_unlock(&cachep->spinlock);
}
#endif

/*
* __kmem_cache_free
* called with disabled ints
*/

static inline void __kmem_cache_free (kmem_cache_t *cachep, void* objp)
{
#ifdef CONFIG_SMP Conditional compilation: SMP support; free into the per-CPU cache, with hit/miss counting (only when STATS is 1).
cpucache_t *cc = cc_data(cachep);

CHECK_PAGE(virt_to_page(objp));
if (cc) {
int batchcount;
if (cc->avail < cc->limit) {
STATS_INC_FREEHIT(cachep);
cc_entry(cc)[cc->avail++] = objp;
return ;
}
STATS_INC_FREEMISS(cachep);
batchcount = cachep->batchcount;
cc->avail -= batchcount;
free_block(cachep,
&cc_entry(cc)[cc->avail],batchcount);
cc_entry(cc)[cc->avail++] = objp;
return ;
} else {
free_block(cachep, &objp, 1);
}
#else
kmem_cache_free_one(cachep, objp); single CPU: free directly
#endif
}

/**
* kmem_cache_alloc - Allocate an object
* @cachep: The cache to allocate from.
* @flags: See kmalloc().
*
* Allocate an object from this cache. The flags are only relevant
* if the cache has no available objects.
*/
Allocate one object from the cache.
void * kmem_cache_alloc (kmem_cache_t *cachep, int flags)
{
return __kmem_cache_alloc(cachep, flags);
}

/**
* kmalloc - allocate memory
Allocate kernel memory.
* @size: how many bytes of memory are required.
The size to allocate.
* @flags: the type of memory to allocate.
The type of allocation.
*
* kmalloc is the normal method of allocating memory
* in the kernel.
*
* The @flags argument may be one of:
*
* %GFP_USER - Allocate memory on behalf of user. May sleep.
Allocate on behalf of a user; the allocation may sleep.
*
* %GFP_KERNEL - Allocate normal kernel ram. May sleep.
Allocate ordinary kernel memory; the allocation may sleep.
*
* %GFP_ATOMIC - Allocation will not sleep. Use inside interrupt handlers.
The allocation will not sleep; use inside interrupt handlers.
*
* Additionally, the %GFP_DMA flag may be set to indicate the memory
* must be suitable for DMA. This can mean different things on different
* platforms. For example, on i386, it means that the memory must come
* from the first 16MB.
*/
GFP_DMA requests memory suitable for DMA. The meaning is platform dependent; on i386, for instance, the memory must come from the first 16MB.
void * kmalloc (size_t size, int flags)
{
cache_sizes_t *csizep = cache_sizes;

for (; csizep->cs_size; csizep++) {
if (size > csizep->cs_size) walk the size table to find the first general cache large enough
continue ;
return __kmem_cache_alloc(flags & GFP_DMA ? choose the DMA or the normal cache depending on GFP_DMA
csizep->cs_dmacachep : csizep->cs_cachep, flags);
}
return NULL; return NULL on failure
}

/**
* kmem_cache_free - Deallocate an object
* @cachep: The cache the allocation was from.
* @objp: The previously allocated object.
*
* Free an object which was previously allocated from this
* cache.
*/
Free one object back to its cache.
void kmem_cache_free (kmem_cache_t *cachep, void *objp)
{
unsigned long flags;
#if DEBUG Conditional compilation
CHECK_PAGE(virt_to_page(objp));
if (cachep != GET_PAGE_CACHE(virt_to_page(objp))) BUG() if the object's page does not belong to this cache
BUG();
#endif

local_irq_save(flags);
__kmem_cache_free(cachep, objp); save the interrupt state, free, restore the interrupt state
local_irq_restore(flags);
}

/**
* kfree - free previously allocated memory
* @objp: pointer returned by kmalloc.
*
* Don't free memory not originally allocated by kmalloc()
* or you will run into trouble.
*/
Free previously allocated memory. Do not free memory that was not allocated by kmalloc(), or you will run into trouble.
void kfree (const void *objp)
{
kmem_cache_t *c;
unsigned long flags;

if (!objp) NULL cannot be freed
return ;
local_irq_save(flags); save the interrupt state
CHECK_PAGE(virt_to_page(objp)); check the page the object lives in
c = GET_PAGE_CACHE(virt_to_page(objp)); c points to the cache that owns the object's page
__kmem_cache_free(c, (void*)objp); free the object
local_irq_restore(flags); restore the interrupt state
}

unsigned int kmem_cache_size(kmem_cache_t *cachep)
{
#if DEBUG Conditional compilation: debug mode
if (cachep->flags & SLAB_RED_ZONE)
return (cachep->objsize - 2*BYTES_PER_WORD); with red zones, subtract the two guard words at the edges
#endif
return cachep->objsize; return the cache's object size
}

kmem_cache_t * kmem_find_general_cachep (size_t size, int gfpflags)
{ find the matching general cache
cache_sizes_t *csizep = cache_sizes;

/* This function could be moved to the header file, and
* made inline so consumers can quickly determine what
* cache pointer they require.
*/
The comment suggests moving this function into a header file and making it inline.
for ( ; csizep->cs_size; csizep++) {
if (size > csizep->cs_size)
continue ;
break ;
}
return (gfpflags & GFP_DMA) ? csizep->cs_dmacachep : csizep->cs_cachep; return the DMA or the normal cache depending on whether GFP_DMA was requested
}

#ifdef CONFIG_SMP Conditional compilation: SMP support:

/* called with cache_chain_sem acquired. */ Must be called with cache_chain_sem held
static int kmem_tune_cpucache (kmem_cache_t* cachep, int limit, int batchcount)
{ tune the per-CPU caches
ccupdate_struct_t new;
int i;

/*
* These are admin-provided, so we are more graceful.
*/

if (limit < 0)
return -EINVAL;
if (batchcount < 0)
return -EINVAL;
if (batchcount > limit)
return -EINVAL;
if (limit != 0 && !batchcount)
return -EINVAL;

memset(&new.new,0,sizeof (new.new)); zero the cpucache update structure
if (limit) {
for (i = 0; i< smp_num_cpus; i++) { for every CPU, allocate limit pointers plus the cpucache header from kernel memory
cpucache_t* ccnew;

ccnew = kmalloc(sizeof (void*)*limit+
sizeof (cpucache_t), GFP_KERNEL);
if (!ccnew)
goto oom; error handling
ccnew->limit = limit; set the limit
ccnew->avail = 0; nothing available yet
new.new[cpu_logical_map(i)] = ccnew; store it at the CPU's logical slot
}
}
new.cachep = cachep; point at the cache being tuned
spin_lock_irq(&cachep->spinlock); lock
cachep->batchcount = batchcount; set the batch count
spin_unlock_irq(&cachep->spinlock); unlock

smp_call_function_all_cpus(do_ccupdate_local, (void *)&new); swap in the new cpucaches on every CPU

for (i = 0; i < smp_num_cpus; i++) { free each CPU's old cpucache
cpucache_t* ccold = new.new[cpu_logical_map(i)];
if (!ccold)
continue ;
local_irq_disable();
free_block(cachep, cc_entry(ccold), ccold->avail);
local_irq_enable();
kfree(ccold);
}
return 0;
oom:
for (i--; i >= 0; i--)
kfree(new.new[cpu_logical_map(i)]); free the cpucaches allocated before the failure
return -ENOMEM; out of memory
}

static void enable_cpucache (kmem_cache_t *cachep)
{
int err;
int limit;

/* FIXME: optimize */ pick a limit based on the object size
if (cachep->objsize > PAGE_SIZE)
return ;
if (cachep->objsize > 1024)
limit = 60;
else if (cachep->objsize > 256)
limit = 124;
else
limit = 252;

err = kmem_tune_cpucache(cachep, limit, limit/2);
if (err)
printk(KERN_ERR "enable_cpucache failed for %s, error %d.\n" ,
cachep->name, -err);
} Enable the per-CPU cache.

static void enable_all_cpucaches (void)
{
struct list_head * p;

down(&cache_chain_sem); take the semaphore guarding the cache chain

p = &cache_cache.next;
do {
kmem_cache_t* cachep = list_entry(p, kmem_cache_t, next);

enable_cpucache(cachep); enable each one in turn
p = cachep->next.next;
} while (p != &cache_cache.next);

up(&cache_chain_sem); release the semaphore
}
#endif

/**
* kmem_cache_reap - Reclaim memory from caches.
* @gfp_mask: the type of memory required.
*
* Called from do_try_to_free_pages() and __alloc_pages()
*/
Reclaim memory from the caches; called from do_try_to_free_pages() and __alloc_pages().
int kmem_cache_reap (int gfp_mask)
{
slab_t *slabp;
kmem_cache_t *searchp;
kmem_cache_t *best_cachep;
unsigned int best_pages;
unsigned int best_len;
unsigned int scan;
int ret = 0;

if (gfp_mask & __GFP_WAIT) if the caller may wait
down(&cache_chain_sem); take the semaphore, possibly sleeping
else
if (down_trylock(&cache_chain_sem)) otherwise try to take it without blocking
return 0;

scan = REAP_SCANLEN; (scan length: 10)
best_len = 0;
best_pages = 0;
best_cachep = NULL;
searchp = clock_searchp; start from the cache that last yielded memory
do {
unsigned int pages;
struct list_head * p;
unsigned int full_free;

/* It's safe to test this without holding the cache-lock. */
if (searchp->flags & SLAB_NO_REAP) skip caches that must not be reaped
goto next;
spin_lock_irq(&searchp->spinlock); lock
if (searchp->growing) if the cache is growing, skip it (unlocking on the way)
goto next_unlock;
if (searchp->dflags & DFLGS_GROWN) {
searchp->dflags &= ~DFLGS_GROWN;
goto next_unlock; if the cache grew recently, clear the flag and skip it this round (unlocking on the way)
}
#ifdef CONFIG_SMP Conditional compilation: the SMP case
{
cpucache_t *cc = cc_data(searchp);
if (cc && cc->avail) { if the per-CPU cache has objects available, free them back to the slabs
__free_block(searchp, cc_entry(cc), cc->avail);
cc->avail = 0;
}
}
#endif

full_free = 0;
p = searchp->slabs_free.next; walk the cache's free-slab list
while (p != &searchp->slabs_free) {
slabp = list_entry(p, slab_t, list);
#if DEBUG Conditional compilation: a free slab with objects in use is a bug
if (slabp->inuse)
BUG();
#endif
full_free++;
p = p->next;
} count the free slabs (into full_free)

/*
* Try to avoid slabs with constructors and/or
* more than one page per slab (as it can be difficult
* to get high orders from gfp()).
*/

pages = full_free * (1<<searchp->gfporder); pages = number of pages held by all the free slabs
if (searchp->ctor) discount by 20% if the cache has a constructor
pages = (pages*4+1)/5;
if (searchp->gfporder) and by another 20% if a slab spans more than one page
pages = (pages*4+1)/5;
if (pages > best_pages) { if this beats the best candidate so far, remember it
best_cachep = searchp;
best_len = full_free;
best_pages = pages;
if (pages >= REAP_PERFECT) { if at least REAP_PERFECT (10) pages, advance the clock hand and jump straight to perfect
clock_searchp = list_entry(searchp->next.next,
kmem_cache_t,next);
goto perfect;
}
}
next_unlock:
spin_unlock_irq(&searchp->spinlock); unlock
next:
searchp = list_entry(searchp->next.next,kmem_cache_t,next); move to the next cache on the chain
} while (--scan && searchp != clock_searchp); stop after the scan limit or once every cache has been seen

clock_searchp = searchp;

if (!best_cachep) nothing reapable was found
/* couldn't find anything to reap */
goto out; bail out

spin_lock_irq(&best_cachep->spinlock); lock
perfect:
/* free only 50% of the free slabs */
best_len = (best_len + 1)/2; only half of the free slabs are considered releasable
for (scan = 0; scan < best_len; scan++) {
struct list_head *p;

if (best_cachep->growing) if the cache started growing, stop
break ;
p = best_cachep->slabs_free.prev;
if (p == &best_cachep->slabs_free) if we reached the list head, stop
break ;
slabp = list_entry(p,slab_t,list);
#if DEBUG Conditional compilation
if (slabp->inuse) a slab on the free list with objects in use is a bug
BUG();
#endif
list_del(&slabp->list); unlink the slab
STATS_INC_REAPED(best_cachep); count the reap

/* Safe to drop the lock. The slab is no longer linked to the
* cache.
*/
The slab is no longer linked into the cache, so the lock can safely be dropped
spin_unlock_irq(&best_cachep->spinlock); unlock
kmem_slab_destroy(best_cachep, slabp); destroy the slab, returning its memory to the system
spin_lock_irq(&best_cachep->spinlock); re-lock
}
spin_unlock_irq(&best_cachep->spinlock); unlock
ret = scan * (1 << best_cachep->gfporder); return the number of pages released
out:
up(&cache_chain_sem); release the cache-chain semaphore
return ret;
}

#ifdef CONFIG_PROC_FS What follows supports /proc/slabinfo.
static void *s_start(struct seq_file *m, loff_t *pos)
{
loff_t n = *pos;
struct list_head *p;

down(&cache_chain_sem);
if (!n)
return (void *)1;
p = &cache_cache.next;
while (--n) {
p = p->next;
if (p == &cache_cache.next)
return NULL;
}
return list_entry(p, kmem_cache_t, next);
}

static void *s_next(struct seq_file *m, void *p, loff_t *pos)
{
kmem_cache_t *cachep = p;
++*pos;
if (p == (void *)1)
return &cache_cache;
cachep = list_entry(cachep->next.next, kmem_cache_t, next);
return cachep == &cache_cache ? NULL : cachep;
}

static void s_stop(struct seq_file *m, void *p)
{
up(&cache_chain_sem);
}

static int s_show(struct seq_file *m, void *p)
{
kmem_cache_t *cachep = p;
struct list_head *q;
slab_t *slabp;
unsigned long active_objs;
unsigned long num_objs;
unsigned long active_slabs = 0;
unsigned long num_slabs;
const char *name;

if (p == (void*)1) {
/*
* Output format version, so at least we can change it
* without _too_ many complaints.
*/

seq_puts(m, "slabinfo - version: 1.1"
#if STATS
" (statistics)"
#endif
#ifdef CONFIG_SMP
" (SMP)"
#endif
"/n" );
return 0;
}

spin_lock_irq(&cachep->spinlock);
active_objs = 0;
num_slabs = 0;
list_for_each(q,&cachep->slabs_full) {
slabp = list_entry(q, slab_t, list);
if (slabp->inuse != cachep->num)
BUG();
active_objs += cachep->num;
active_slabs++;
}
list_for_each(q,&cachep->slabs_partial) {
slabp = list_entry(q, slab_t, list);
if (slabp->inuse == cachep->num || !slabp->inuse)
BUG();
active_objs += slabp->inuse;
active_slabs++;
}
list_for_each(q,&cachep->slabs_free) {
slabp = list_entry(q, slab_t, list);
if (slabp->inuse)
BUG();
num_slabs++;
}
num_slabs+=active_slabs;
num_objs = num_slabs*cachep->num;

name = cachep->name;
{
char tmp;
mm_segment_t old_fs;
old_fs = get_fs();
set_fs(KERNEL_DS);
if (__get_user(tmp, name))
name = "broken" ;
set_fs(old_fs);
}

seq_printf(m, "%-17s %6lu %6lu %6u %4lu %4lu %4u" ,
name, active_objs, num_objs, cachep->objsize,
active_slabs, num_slabs, (1<<cachep->gfporder));

#if STATS
{
unsigned long errors = cachep->errors;
unsigned long high = cachep->high_mark;
unsigned long grown = cachep->grown;
unsigned long reaped = cachep->reaped;
unsigned long allocs = cachep->num_allocations;

seq_printf(m, " : %6lu %7lu %5lu %4lu %4lu" ,
high, allocs, grown, reaped, errors);
}
#endif
#ifdef CONFIG_SMP
{
cpucache_t *cc = cc_data(cachep);
unsigned int batchcount = cachep->batchcount;
unsigned int limit;

if (cc)
limit = cc->limit;
else
limit = 0;
seq_printf(m, " : %4u %4u" ,
limit, batchcount);
}
#endif
#if STATS && defined(CONFIG_SMP)
{
unsigned long allochit = atomic_read(&cachep->allochit);
unsigned long allocmiss = atomic_read(&cachep->allocmiss);
unsigned long freehit = atomic_read(&cachep->freehit);
unsigned long freemiss = atomic_read(&cachep->freemiss);
seq_printf(m, " : %6lu %6lu %6lu %6lu" ,
allochit, allocmiss, freehit, freemiss);
}
#endif
spin_unlock_irq(&cachep->spinlock);
seq_putc(m, '\n');
return 0;
}

/**
* slabinfo_op - iterator that generates /proc/slabinfo
*
* Output layout:
* cache-name
* num-active-objs
* total-objs
* object size
* num-active-slabs
* total-slabs
* num-pages-per-slab
* + further values on SMP and with statistics enabled
*/


struct seq_operations slabinfo_op = {
start: s_start,
next: s_next,
stop: s_stop,
show: s_show
};

#define MAX_SLABINFO_WRITE 128
/**
* slabinfo_write - SMP tuning for the slab allocator
* @file: unused
* @buffer: user buffer
* @count: data len
* @data: unused
*/

ssize_t slabinfo_write(struct file *file, const char *buffer,
size_t count, loff_t *ppos)

{
#ifdef CONFIG_SMP
char kbuf[MAX_SLABINFO_WRITE+1], *tmp;
int limit, batchcount, res;
struct list_head *p;

if (count > MAX_SLABINFO_WRITE)
return -EINVAL;
if (copy_from_user(&kbuf, buffer, count))
return -EFAULT;
kbuf[MAX_SLABINFO_WRITE] = '\0';

tmp = strchr(kbuf, ' ');
if (!tmp)
return -EINVAL;
*tmp = '\0';
tmp++;
limit = simple_strtol(tmp, &tmp, 10);
while (*tmp == ' ')
tmp++;
batchcount = simple_strtol(tmp, &tmp, 10);

/* Find the cache in the chain of caches. */
down(&cache_chain_sem);
res = -EINVAL;
list_for_each(p,&cache_chain) {
kmem_cache_t *cachep = list_entry(p, kmem_cache_t, next);

if (!strcmp(cachep->name, kbuf)) {
res = kmem_tune_cpucache(cachep, limit, batchcount);
break ;
}
}
up(&cache_chain_sem);
if (res >= 0)
res = count;
return res;
#else
return -EINVAL;
#endif
}
#endif