KFENCE Source Code Analysis [Repost]

Reposted from: https://www.cnblogs.com/pengdonglin137/p/16342898.html

Reference

Author

[email protected]

Kernel version

linux-5.14

Implementation Analysis

KFENCE (Kernel Electric Fence) is a low-overhead memory error detection mechanism introduced into the Linux kernel. Because its overhead is low it can be enabled on running production systems; for the same reason its detection capability is weaker than KASAN's.

  • KFENCE is a sampling-based, low-overhead memory safety error detection technique. It can detect three kinds of memory errors: use-after-free (UAF), invalid free and out-of-bounds (OOB). It currently supports x86 and ARM64, and it hooks into the slab and slub allocators.

  • KFENCE's design philosophy: given a long enough total running time, KFENCE can detect bugs on code paths that test programs in non-production environments cannot exercise sufficiently. Deploying KFENCE at scale is a quick way to accumulate that total running time.

  • Each object managed by KFENCE is placed at either the left or the right edge of its own dedicated page. The pages immediately adjacent to that page on both sides are called guard pages; their mappings are put into a protected state (the Present bit of the PTE is cleared), so any access to a guard page triggers a page fault, and KFENCE parses and reports the error from the page-fault handler.
    [image]

  • Allocation of an object from the KFENCE pool is driven by a sampling interval, which can be changed via the kernel boot parameter kfence.sample_interval. After one sampling interval has elapsed, the next object allocated from slab or slub comes from the KFENCE pool; another full sampling interval must then pass before slab/slub can allocate from the KFENCE pool again.

  • Because a static key is used, the check can be compiled away, so the performance of the slab/slub allocation fast path is unaffected whether or not KFENCE is enabled.

  • The KFENCE pool has a fixed size; once it is exhausted, no further allocations can be served from it. With the default kernel configuration the pool is 2MB, which provides up to 255 objects, each backed by its own page.
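
As a quick sanity check, the 2MB figure follows directly from the size formula quoted later in kfence_alloc_pool(); evaluated here for the default configuration (a sketch of the arithmetic, not new code):

  /* With the default CONFIG_KFENCE_NUM_OBJECTS = 255 and PAGE_SIZE = 4 KiB:
   *   (255 + 1) * 2 * 4096 = 2 MiB
   * i.e. one data page plus one guard page per object, plus one extra
   * guard-page pair at the very beginning of the pool.
   */
  #define KFENCE_POOL_SIZE ((CONFIG_KFENCE_NUM_OBJECTS + 1) * 2 * PAGE_SIZE)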

Initialization

Block diagram of the KFENCE memory pool:
[image]

The data regions are the ones handed out to allocation requests; the fence regions are used to detect out-of-bounds accesses. The elements of the metadata array correspond one-to-one with the data regions and describe them.
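
Given this layout (and the init loop in kfence_init_pool() below), mapping a metadata index to its data page is plain address arithmetic; a minimal sketch consistent with the layout described above (the helper name is illustrative, the real code uses metadata_to_pageaddr()/addr_to_metadata()):

  /* Pages 0 and 1 of the pool are guard pages, then data and guard pages
   * alternate: object i lives in page 2*i + 2 of the pool, and its
   * right-hand guard page is page 2*i + 3.
   */
  static unsigned long object_page_addr(unsigned long pool_start, int i)
  {
          return pool_start + (2 + 2 * i) * PAGE_SIZE;
  }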

  start_kernel
  -> mm_init
  -> kfence_alloc_pool
  // Release the free pages held by memblock to the buddy allocator. Memory that memblock handed out and that has
  // not been freed yet will therefore never appear in the buddy system; it still has struct page structures backing it, though.
  -> mem_init
  -> kfence_init
  • kfence_alloc_pool [mm/kfence/core.c]
  void __init kfence_alloc_pool(void)
  {
  // If the sampling interval is 0, kfence is not initialized. The interval is set via the kernel config option CONFIG_KFENCE_SAMPLE_INTERVAL or the boot parameter kfence.sample_interval
  if (!kfence_sample_interval)
  return;
   
  // Allocate the kfence pool, of size ((CONFIG_KFENCE_NUM_OBJECTS + 1) * 2 * PAGE_SIZE), aligned to PAGE_SIZE.
  // CONFIG_KFENCE_NUM_OBJECTS ranges from 1 to 65535.
  __kfence_pool = memblock_alloc(KFENCE_POOL_SIZE, PAGE_SIZE);
  }

At this point the buddy allocator is not usable yet, so the memory given to KFENCE lies outside the buddy system and is not managed by it; there is therefore no risk of the buddy allocator handing it out to someone else.

  • kfence_init
  void __init kfence_init(void)
  {
  /* If the sampling interval is 0, kfence stays disabled */
  if (!kfence_sample_interval)
  return;
   
  // Initialize the kfence pool
  kfence_init_pool();
   
  // kfence is now ready to work
  WRITE_ONCE(kfence_enabled, true);
  /*
  Queue the work item that periodically re-opens the kfence pool for allocation; the delay here is 0, i.e. open it immediately. See toggle_allocation_gate below.
  */
  queue_delayed_work(system_unbound_wq, &kfence_timer, 0);
   
  pr_info("initialized - using %lu bytes for %d objects at 0x%p-0x%p\n", KFENCE_POOL_SIZE,
  CONFIG_KFENCE_NUM_OBJECTS, (void *)__kfence_pool,
  (void *)(__kfence_pool + KFENCE_POOL_SIZE));
  }
  • kfence_init_pool [kfence_init -> kfence_init_pool]
  static bool __init kfence_init_pool(void)
  {
  unsigned long addr = (unsigned long)__kfence_pool;
  struct page *pages;
  int i;
   
  /* On x86, check that __kfence_pool is mapped to physical memory */
  arch_kfence_init_pool();
   
  /* Get the struct page corresponding to the start address of the kfence pool */
  pages = virt_to_page(addr);
   
  for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) {
  if (!i || (i % 2)) // skip page 0 and all odd-numbered pages
  continue;
  /* 1. Set the slab flag in the struct page of every even page: kmem_cache_free checks whether the page backing
  the virtual address has the slab flag set, and refuses to free the object otherwise.
  2. When freeing with kfree, this flag guarantees that the path slab_free -> __slab_free -> kfence_free is taken.
  */
  __SetPageSlab(&pages[i]);
  }
   
  // Clear the Present bit in the PTEs of the first two pages, so that any CPU access to them triggers a page fault
  for (i = 0; i < 2; i++) {
  kfence_protect(addr);
  addr += PAGE_SIZE;
  }
   
  // kfence_metadata is an array of struct kfence_metadata with CONFIG_KFENCE_NUM_OBJECTS elements.
  // As can be seen here, each element of the array manages one object.
  for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) {
  struct kfence_metadata *meta = &kfence_metadata[i];
   
  /* Initialize metadata. */
  INIT_LIST_HEAD(&meta->list);
  raw_spin_lock_init(&meta->lock);
  meta->state = KFENCE_OBJECT_UNUSED; // the object's initial state is UNUSED
  meta->addr = addr; /* start address of the 4KB page holding the object */
  list_add_tail(&meta->list, &kfence_freelist); // add it to the global freelist
   
  // Invalidate the page-table mapping of the 4KB page that follows the object's page, to catch out-of-bounds accesses
  kfence_protect(addr + PAGE_SIZE);
   
  addr += 2 * PAGE_SIZE;
  }
   
  // The earlier memblock_alloc registered this region with kmemleak; delete that record here to avoid conflicts with later kfence_alloc calls
  kmemleak_free(__kfence_pool);
   
  return true;
  }
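
kfence_protect()/kfence_unprotect() used above simply toggle the Present bit of the 4K PTE that maps the given address. For reference, a sketch of the x86 arch helper they wrap (paraphrased from the upstream arch/x86 header; treat the details as approximate):

  /* Sketch of the x86 helper behind kfence_protect()/kfence_unprotect(). */
  static inline bool kfence_protect_page(unsigned long addr, bool protect)
  {
          unsigned int level;
          pte_t *pte = lookup_address(addr, &level); /* find the 4K PTE */

          if (WARN_ON(!pte || level != PG_LEVEL_4K))
                  return false;

          if (protect)
                  set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
          else
                  set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));

          /* Flush only this CPU's TLB, avoiding IPIs from the fault path. */
          preempt_disable();
          flush_tlb_one_kernel(addr);
          preempt_enable();
          return true;
  }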

Periodically re-opening the KFENCE pool

kfence_init also queues a delayed work item, kfence_timer, which periodically re-enables allocation from the KFENCE pool. It is implemented as follows:

  • toggle_allocation_gate
  /*
  * Set up delayed work, which will enable and disable the static key. We need to
  * use a work queue (rather than a simple timer), since enabling and disabling a
  * static key cannot be done from an interrupt.
  *
  * Note: Toggling a static branch currently causes IPIs, and here we'll end up
  * with a total of 2 IPIs to all CPUs. If this ends up a problem in future (with
  * more aggressive sampling intervals), we could get away with a variant that
  * avoids IPIs, at the cost of not immediately capturing allocations if the
  * instructions remain cached.
  */
  static struct delayed_work kfence_timer;
  static void toggle_allocation_gate(struct work_struct *work)
  {
  if (!READ_ONCE(kfence_enabled))
  return;
   
  // Periodically reset kfence_allocation_gate to 0. It acts as the gate of the kfence pool: 0 means open, non-zero means closed.
  // This guarantees that at most one allocation per interval is served from the kfence pool.
  atomic_set(&kfence_allocation_gate, 0);
  // A static key is used for performance: checking the value of kfence_allocation_gate on every allocation would be comparatively expensive
  #ifdef CONFIG_KFENCE_STATIC_KEYS
  /* Enable the static key and wait for an allocation from the kfence pool */
  static_branch_enable(&kfence_allocation_key);
   
  if (sysctl_hung_task_timeout_secs) { // minimum time before the kernel emits a hung-task warning, typically 120 seconds
  /*
  * If allocations are infrequent the wait could become very long. The timeout is therefore set to half the
  hung-task warning time, so the kernel does not warn about this task sitting in the D state for too long.
   
  The wait is woken up by either:
  1. someone allocating from kfence, which sets kfence_allocation_gate to 1 and wakes up the tasks blocked on allocation_wait, or
  2. the timeout expiring
  */
  wait_event_idle_timeout(allocation_wait, atomic_read(&kfence_allocation_gate),
  sysctl_hung_task_timeout_secs * HZ / 2);
  } else {
  /* If the hung-task timeout is 0 (i.e. unlimited), we can safely wait forever until someone allocates from kfence,
  sets kfence_allocation_gate to 1 and wakes up the tasks blocked on allocation_wait.
  */
  wait_event_idle(allocation_wait, atomic_read(&kfence_allocation_gate));
  }
   
  /* Disable the static key again, guaranteeing that __kfence_alloc is no longer entered */
  static_branch_disable(&kfence_allocation_key);
  #endif
  // Wait kfence_sample_interval milliseconds, then open the kfence pool again
  queue_delayed_work(system_unbound_wq, &kfence_timer,
  msecs_to_jiffies(kfence_sample_interval));
  }
  static DECLARE_DELAYED_WORK(kfence_timer, toggle_allocation_gate);

Memory allocation

Block diagram:
[image]

  • Entry point 1:
  kmalloc
  -> kmem_cache_alloc_trace
  -> slab_alloc
  -> return
  -> __kmalloc
  -> slab_alloc
  -> return
  • Entry point 2
  kmem_cache_alloc
  -> slab_alloc

Both paths above eventually reach slab_alloc:

  slab_alloc
  -> slab_alloc_node
  -> kfence_alloc
  -> if kfence_alloc returns NULL, fall back to the regular slub allocation path
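
For orientation, the hook sits at the very top of slab_alloc_node(); a simplified sketch of how it is wired into mm/slub.c (paraphrased, not a verbatim quote):

  /* Simplified sketch of the kfence hook in slab_alloc_node() (mm/slub.c). */
  static __always_inline void *slab_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
                                               int node, unsigned long addr, size_t orig_size)
  {
          void *object = kfence_alloc(s, orig_size, gfpflags);

          if (unlikely(object))
                  return object; /* served from the kfence pool */

          /* ... otherwise continue with the normal slub fast/slow path ... */
  }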
  • kfence_alloc
  static __always_inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
  {
  #ifdef CONFIG_KFENCE_STATIC_KEYS
  /* If CONFIG_KFENCE_STATIC_KEYS is enabled, take this optimized branch */
  if (static_branch_unlikely(&kfence_allocation_key))
  #else
  /* Plain check; its overhead is higher than the static-key branch */
  if (unlikely(!atomic_read(&kfence_allocation_gate)))
  #endif
  return __kfence_alloc(s, size, flags);
  return NULL;
  }
  • __kfence_alloc
  void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
  {
  /*
  The kfence pool currently only serves objects no larger than one page
  */
  if (size > PAGE_SIZE)
  return NULL;
   
  /*
  * Allocations that must come from DMA, DMA32 or HIGHMEM are not supported, because the memory attributes of the
  kfence pool may not satisfy such requirements; for example DMA usually requires uncached memory, which the kfence
  pool cannot guarantee.
  */
  if ((flags & GFP_ZONEMASK) ||
  (s->flags & (SLAB_CACHE_DMA | SLAB_CACHE_DMA32)))
  return NULL;
   
  /*
  The check below ensures that only one allocator gets through. Once it has entered, the kfence pool is closed
  again, and until the next time the gate opens every other allocator can only get NULL and therefore falls back
  to the regular slub allocator.
  */
  if (atomic_read(&kfence_allocation_gate) || atomic_inc_return(&kfence_allocation_gate) > 1)
  return NULL;
  #ifdef CONFIG_KFENCE_STATIC_KEYS
  /*
  * If any task is blocked on allocation_wait, queue a work item to wake it up (see the irq_work sketch at the end of this section)
  */
  if (waitqueue_active(&allocation_wait)) {
  /*
  * Calling wake_up() here may deadlock when allocations happen
  * from within timer code. Use an irq_work to defer it.
  */
  irq_work_queue(&wake_up_kfence_timer_work);
  }
  #endif
  // Check whether kfence is enabled
  if (!READ_ONCE(kfence_enabled))
  return NULL;
   
  // Allocate an object from the kfence pool
  return kfence_guarded_alloc(s, size, flags);
  }
  • kfence_guarded_alloc [kfence_alloc -> __kfence_alloc -> kfence_guarded_alloc]
  static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t gfp)
  {
  struct kfence_metadata *meta = NULL;
  unsigned long flags;
  struct page *page;
  void *addr;
   
  // Check whether the kfence pool still has a free page
  if (!list_empty(&kfence_freelist)) {
  // Get the kfence_metadata structure describing the free page
  meta = list_entry(kfence_freelist.next, struct kfence_metadata, list);
  list_del_init(&meta->list);
  }
   
  // If meta is NULL, the kfence pool has been exhausted; the allocation has to be served by the regular slub allocator.
  if (!meta)
  return NULL;
   
  // Get the virtual start address of the free page described by meta
  meta->addr = metadata_to_pageaddr(meta);
  /* If the page is in the FREED state, restore the Present bit in its PTE so the CPU can access it again without faulting.
   
  Why check for FREED here? kfence_init_pool sets the initial state to KFENCE_OBJECT_UNUSED, meaning the page has
  never been used, and init never calls kfence_protect on it, so an UNUSED page needs no kfence_unprotect.
   
  Only once a page has been allocated and then freed is it set to FREED and kfence_protect called on it, in order to
  catch use-after-free. Such a page must of course be kfence_unprotect-ed the next time it is allocated.
  */
  if (meta->state == KFENCE_OBJECT_FREED)
  kfence_unprotect(meta->addr);
   
  /*
  * Note: for allocations made before RNG initialization, will always
  * return zero. We still benefit from enabling KFENCE as early as
  * possible, even when the RNG is not yet available, as this will allow
  * KFENCE to detect bugs due to earlier allocations. The only downside
  * is that the out-of-bounds accesses detected are deterministic for
  * such allocations.
  For allocations made before the RNG is initialized, prandom_u32_max(2) returns 0 and the object is placed at the
  start (left edge) of the page; once the RNG works, the object may randomly be placed at the right edge instead.
  */
  if (prandom_u32_max(2)) {
  /* Allocate on the "right" side, re-calculate address. */
  meta->addr += PAGE_SIZE - size;
  meta->addr = ALIGN_DOWN(meta->addr, cache->align);
  }
   
  // start address of the object
  addr = (void *)meta->addr;
   
  /*
  This function does several things:
  1. records the current task's call stack in meta->alloc_track, i.e. the allocation stack
  2. records the current task's pid there as well
  3. sets meta->state to KFENCE_OBJECT_ALLOCATED, marking the page described by meta as allocated
  */
  metadata_update_state(meta, KFENCE_OBJECT_ALLOCATED);
  /* Record the current kmem_cache in meta */
  WRITE_ONCE(meta->cache, cache);
  /* Record the size of the object */
  meta->size = size;
  /* Fill everything in the page outside the size bytes used by the object with an address-derived canary pattern,
  so that out-of-bounds writes within the page can be detected at free time
  */
  for_each_canary(meta, set_canary_byte);
   
  /* Get the struct page backing this page */
  page = virt_to_page(meta->addr);
  /* Record the owning kmem_cache in the page; it is needed later at free time */
  page->slab_cache = cache;
  /* A kfence page holds only one object, so set objects to 1 */
  if (IS_ENABLED(CONFIG_SLUB))
  page->objects = 1;
  // For the slab allocator, s_mem records the address of the first object
  if (IS_ENABLED(CONFIG_SLAB))
  page->s_mem = addr;
   
  /* Memory initialization. */
   
  /*
  * We check slab_want_init_on_alloc() ourselves, rather than letting
  * SL*B do the initialization, as otherwise we might overwrite KFENCE's
  * redzone.
  */
  if (unlikely(slab_want_init_on_alloc(gfp, cache))) // returns true e.g. when __GFP_ZERO is set
  memzero_explicit(addr, size); // zero the region used by the object
  if (cache->ctor) // if the cache has a constructor
  cache->ctor(addr);
   
  /* KFENCE_COUNTER_ALLOCATED is the number of kfence objects currently allocated; it is decremented on free */
  atomic_long_inc(&counters[KFENCE_COUNTER_ALLOCATED]);
  /* KFENCE_COUNTER_ALLOCS is the total number of allocations ever served from the kfence pool; monotonically increasing */
  atomic_long_inc(&counters[KFENCE_COUNTER_ALLOCS]);
   
  return addr;
  }
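
The wake_up_kfence_timer_work queued by __kfence_alloc above is a tiny irq_work whose only job is to wake the delayed work sleeping on allocation_wait; a hedged sketch of what it looks like in mm/kfence/core.c (reproduced from memory of the upstream source):

  /* Sketch: irq_work used by __kfence_alloc() to wake toggle_allocation_gate(). */
  static void wake_up_kfence_timer(struct irq_work *work)
  {
          wake_up(&allocation_wait);
  }
  static DEFINE_IRQ_WORK(wake_up_kfence_timer_work, wake_up_kfence_timer);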

Freeing memory

  • Path 1:
  kfree
  -> slab_free
  -> slab_free_hook
  -> do_slab_free
  -> __slab_free
  -> kfence_free
  • Path 2
  kmem_cache_free
  -> slab_free

When memory is freed, kfence_free is eventually reached:

  • kfence_free
  static __always_inline __must_check bool kfence_free(void *addr)
  {
  // Check whether the virtual address being freed lies within the kfence pool's address range
  if (!is_kfence_address(addr))
  return false;
  __kfence_free(addr);
  return true;
  }
  • __kfence_free
  void __kfence_free(void *addr)
  {
  /*
  The meta corresponding to an object can be derived from its address: the offset of addr from the start of the
  kfence pool yields an index into the kfence_metadata array, whose entry is the meta.
  */
  struct kfence_metadata *meta = addr_to_metadata((unsigned long)addr);
   
  /*
  * If meta's kmem_cache has SLAB_TYPESAFE_BY_RCU, the object cannot be freed immediately; it is freed
  asynchronously once an RCU grace period has passed. rcu_guarded_free then simply calls kfence_guarded_free.
  */
  if (unlikely(meta->cache && (meta->cache->flags & SLAB_TYPESAFE_BY_RCU)))
  call_rcu(&meta->rcu_head, rcu_guarded_free);
  else
  kfence_guarded_free(addr, meta, false);
  }
  • kfence_guarded_free [kfence_free -> __kfence_free -> kfence_guarded_free]
  static void kfence_guarded_free(void *addr, struct kfence_metadata *meta, bool zombie)
  {
  struct kcsan_scoped_access assert_page_exclusive;
  unsigned long flags;
   
  raw_spin_lock_irqsave(&meta->lock, flags);
   
  // If meta is not in the ALLOCATED state, or the address does not match, this is a double free or a free whose
  // address differs from the one obtained at allocation time
  if (meta->state != KFENCE_OBJECT_ALLOCATED || meta->addr != (unsigned long)addr) {
  /* Invalid or double-free, bail out. */
  atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]); // bump the count of memory bugs detected by kfence
  kfence_report_error((unsigned long)addr, false, NULL, meta,
  KFENCE_ERROR_INVALID_FREE);
  raw_spin_unlock_irqrestore(&meta->lock, flags);
  return;
  }
   
  /* If an OOB error was detected in the page-fault handler, unprotected_page holds the faulting address */
  if (meta->unprotected_page) {
  // zero the whole page containing the OOB address
  memzero_explicit((void *)ALIGN_DOWN(meta->unprotected_page, PAGE_SIZE), PAGE_SIZE);
  // re-protect the page containing the OOB address; the page-fault handler unprotected it at the end so that execution could continue
  kfence_protect(meta->unprotected_page);
  meta->unprotected_page = 0;
  }
   
  /* Check whether the canary pattern in the unused area of the object's page has changed, which indicates an OOB
  write inside the page. for_each_canary first checks the pattern to the left of the object and reports the first
  mismatching byte, then checks the pattern to the right of the object, likewise reporting only the first mismatch.
  */
  for_each_canary(meta, check_canary_byte);
   
  /*
  * Clear memory if init-on-free is set. While we protect the page, the
  * data is still there, and after a use-after-free is detected, we
  * unprotect the page, so the data is still accessible.
  */
  if (!zombie && unlikely(slab_want_init_on_free(meta->cache)))
  memzero_explicit(addr, meta->size);
   
  /* This function:
  1. saves the current task's call stack in meta->free_track, i.e. the free stack
  2. records the current task's pid there as well
  3. sets meta->state to KFENCE_OBJECT_FREED, marking the page as free again
  */
  metadata_update_state(meta, KFENCE_OBJECT_FREED);
   
  raw_spin_unlock_irqrestore(&meta->lock, flags);
   
  /* Protect the page again, so that use-after-free accesses are caught */
  kfence_protect((unsigned long)addr);
   
  if (!zombie) {
  /* Put meta back on the freelist */
  list_add_tail(&meta->list, &kfence_freelist);
   
  // decrement KFENCE_COUNTER_ALLOCATED, the number of kfence objects currently allocated
  atomic_long_dec(&counters[KFENCE_COUNTER_ALLOCATED]);
  // increment KFENCE_COUNTER_FREES, the monotonically increasing count of objects freed back to the kfence pool
  atomic_long_inc(&counters[KFENCE_COUNTER_FREES]);
  } else {
  /* When a kmem_cache is destroyed, its objects that have not been freed yet are counted in KFENCE_COUNTER_ZOMBIES.
  A zombie object is also free, but it can never be allocated again.
  */
  atomic_long_inc(&counters[KFENCE_COUNTER_ZOMBIES]);
  }
  }
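
Two small helpers used throughout the free and page-fault paths are is_kfence_address() and addr_to_metadata(); hedged sketches of both (based on the upstream include/linux/kfence.h and mm/kfence/core.c, details approximate):

  /* True if addr falls inside the kfence pool. */
  static __always_inline bool is_kfence_address(const void *addr)
  {
          /* The __kfence_pool check handles addr == NULL while the pool is NULL. */
          return unlikely((unsigned long)((char *)addr - __kfence_pool) < KFENCE_POOL_SIZE &&
                          __kfence_pool);
  }

  /* Map an address inside the pool back to its kfence_metadata entry. */
  static inline struct kfence_metadata *addr_to_metadata(unsigned long addr)
  {
          long index;

          if (!is_kfence_address((void *)addr))
                  return NULL;

          /*
           * Addresses in the guard pages at the edges of the pool can yield an
           * out-of-range index, which ends up reported as an "invalid access".
           */
          index = (addr - (unsigned long)__kfence_pool) / (PAGE_SIZE * 2) - 1;
          if (index < 0 || index >= CONFIG_KFENCE_NUM_OBJECTS)
                  return NULL;

          return &kfence_metadata[index];
  }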

Checking the canary (pattern) regions

  • for_each_canary [kfence_free -> __kfence_free -> kfence_guarded_free -> for_each_canary]
  /* __always_inline this to ensure we won't do an indirect call to fn. */
  static __always_inline void for_each_canary(const struct kfence_metadata *meta, bool (*fn)(u8 *))
  {
  const unsigned long pageaddr = ALIGN_DOWN(meta->addr, PAGE_SIZE);
  unsigned long addr;
   
  /* Check the canary region to the left of the object within its page */
  for (addr = pageaddr; addr < meta->addr; addr++) {
  if (!fn((u8 *)addr)) // on a mismatch, a kfence error report is printed and false is returned
  break;
  }
   
  /* Check the canary region to the right of the object within its page */
  for (addr = meta->addr + meta->size; addr < pageaddr + PAGE_SIZE; addr++) {
  if (!fn((u8 *)addr)) // on a mismatch, a kfence error report is printed and false is returned
  break;
  }
  }
  • check_canary_byte [kfence_free -> __kfence_free -> kfence_guarded_free -> for_each_canary -> check_canary_byte ]
  /* Check canary byte at @addr. */
  static inline bool check_canary_byte(u8 *addr)
  {
  if (likely(*addr == KFENCE_CANARY_PATTERN(addr)))
  return true;
   
  // If a byte in the unused area of the page no longer matches the expected pattern, an out-of-bounds write happened
  // inside the page itself; this kind of OOB does not trigger a page fault.
  // Increment KFENCE_COUNTER_BUGS, the count of memory bugs detected by kfence.
  atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]);
  kfence_report_error((unsigned long)addr, false, NULL, addr_to_metadata((unsigned long)addr),
  KFENCE_ERROR_CORRUPTION);
  return false;
  }
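
The pattern each canary byte is compared against is derived from the byte's own address, so a stray copy of the pattern elsewhere cannot accidentally match; a sketch of the helpers as defined in mm/kfence/kfence.h (reproduced from memory, hedged):

  /* Canary byte: 0xaa XOR'ed with the low 3 bits of the byte's address. */
  #define KFENCE_CANARY_PATTERN(addr) ((u8)0xaa ^ (u8)((unsigned long)(addr) & 0x7))

  /* Passed to for_each_canary() at allocation time to fill the unused space. */
  static inline bool set_canary_byte(u8 *addr)
  {
          *addr = KFENCE_CANARY_PATTERN(addr);
          return true;
  }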

kmem_cache destruction

  kmem_cache_destroy
  -> shutdown_cache
  -> kfence_shutdown_cache
  • kfence_shutdown_cache
  void kfence_shutdown_cache(struct kmem_cache *s)
  {
  unsigned long flags;
  struct kfence_metadata *meta;
  int i;
   
  for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) {
  bool in_use;
   
  meta = &kfence_metadata[i];
   
  /* Skip metas that do not belong to the given kmem_cache, or whose state is not ALLOCATED
  */
  if (READ_ONCE(meta->cache) != s ||
  READ_ONCE(meta->state) != KFENCE_OBJECT_ALLOCATED)
  continue;
   
  raw_spin_lock_irqsave(&meta->lock, flags);
  in_use = meta->cache == s && meta->state == KFENCE_OBJECT_ALLOCATED;
  raw_spin_unlock_irqrestore(&meta->lock, flags);
   
  if (in_use) {
  /*
  * This cache still has allocations, and we should not
  * release them back into the freelist so they can still
  * safely be used and retain the kernel's default
  * behaviour of keeping the allocations alive (leak the
  * cache); however, they effectively become "zombie
  * allocations" as the KFENCE objects are the only ones
  * still in use and the owning cache is being destroyed.
  *
  * We mark them freed, so that any subsequent use shows
  * more useful error messages that will include stack
  * traces of the user of the object, the original
  * allocation, and caller to shutdown_cache().
  */
  kfence_guarded_free((void *)meta->addr, meta, /*zombie=*/true);
  // With zombie set to true, the freed meta is not put back on kfence_freelist, so it can never be handed out again.
  // A zombie object is also free, but it can no longer be allocated.
  }
  }
   
  for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) {
  meta = &kfence_metadata[i];
   
  /* See above. */
  if (READ_ONCE(meta->cache) != s || READ_ONCE(meta->state) != KFENCE_OBJECT_FREED)
  continue;
   
  raw_spin_lock_irqsave(&meta->lock, flags);
  // Clear meta->cache, so that /sys/kernel/debug/kfence/objects shows which objects are zombies
  if (meta->cache == s && meta->state == KFENCE_OBJECT_FREED)
  meta->cache = NULL;
  raw_spin_unlock_irqrestore(&meta->lock, flags);
  }
  }

Page fault handling

  • When an out-of-bounds access touches a protected guard page, a page fault occurs.
    [image]

  • A use-after-free, i.e. accessing an object after it has been freed and before it is allocated again, also causes a page fault, because the page holding the free object was protected at free time.
    [image]

Path:

  handle_page_fault
  -> do_kern_addr_fault
  -> bad_area_nosemaphore
  -> __bad_area_nosemaphore
  -> kernelmode_fixup_or_oops
  -> page_fault_oops
  -> kfence_handle_page_fault
  • kfence_handle_page_fault
  /*
  addr is the faulting address
  is_write indicates whether the access was a write
  regs holds the CPU register context at the time of the fault
  */
  bool kfence_handle_page_fault(unsigned long addr, bool is_write, struct pt_regs *regs)
  {
  /*
  Compute the page index of the faulting address within the kfence pool
  */
  const int page_index = (addr - (unsigned long)__kfence_pool) / PAGE_SIZE;
  struct kfence_metadata *to_report = NULL;
  enum kfence_error_type error_type;
  unsigned long flags;
   
  // Is the address within the kfence pool's range?
  if (!is_kfence_address((void *)addr))
  return false;
   
  // Check whether kfence has been disabled; writing 0 to /sys/module/kfence/parameters/sample_interval turns it off
  if (!READ_ONCE(kfence_enabled)) /* If disabled at runtime ... */
  return kfence_unprotect(addr); /* ... unprotect and proceed. */
   
  // increment KFENCE_COUNTER_BUGS, the number of memory errors detected
  atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]);
   
  if (page_index % 2) {
  /*
  A fault on an odd-numbered page of the kfence pool means an out-of-bounds access: the odd pages were all protected
  at initialization time
  */
   
  /* This is a redzone, report a buffer overflow. */
  struct kfence_metadata *meta;
  int distance = 0;
   
  // Get the meta of the page to the left of the faulting address (odd pages never hold objects).
  meta = addr_to_metadata(addr - PAGE_SIZE);
  if (meta && READ_ONCE(meta->state) == KFENCE_OBJECT_ALLOCATED) { // is the page to the left allocated?
  to_report = meta;
  /* Data race ok; distance calculation approximate.
  Distance between the faulting address and the end of the allocated object on the left
  */
  distance = addr - data_race(meta->addr + meta->size);
  }
   
  // Check the meta of the page to the right of the faulting address
  meta = addr_to_metadata(addr + PAGE_SIZE);
  if (meta && READ_ONCE(meta->state) == KFENCE_OBJECT_ALLOCATED) { // is the page to the right allocated?
  /* Data race ok; distance calculation approximate.
  If to_report is NULL, the page on the left is not allocated, so the page on the right must hold the object that
  overflowed. If the left page is allocated as well, compare the distance from the faulting address to the start of
  the right-hand object with the distance computed for the left-hand object; the closer of the two is the object
  that overflowed.
  */
  if (!to_report || distance > data_race(meta->addr) - addr)
  to_report = meta;
  }
   
  // If neither the left nor the right page is allocated, kfence cannot tell what happened; it could be a UAF or an OOB
  if (!to_report)
  goto out;
   
  raw_spin_lock_irqsave(&to_report->lock, flags);
  // record the faulting address
  to_report->unprotected_page = addr;
  // the error type detected by kfence is an out-of-bounds access
  error_type = KFENCE_ERROR_OOB;
   
  /*
  * If the object was freed before we took the look we can still
  * report this as an OOB -- the report will simply show the
  * stacktrace of the free as well.
  */
  } else {
  // A fault on an even-numbered page means a UAF: the only way it can happen is that the object was freed and then
  // accessed again without being reallocated. As seen earlier, even pages are only protected after they are freed.
  to_report = addr_to_metadata(addr);
  if (!to_report)
  goto out;
   
  raw_spin_lock_irqsave(&to_report->lock, flags);
  // kfence detected a use-after-free access
  error_type = KFENCE_ERROR_UAF;
  /*
  * We may race with __kfence_alloc(), and it is possible that a
  * freed object may be reallocated. We simply report this as a
  * use-after-free, with the stack trace showing the place where
  * the object was re-allocated.
  */
  }
   
  out:
  if (to_report) {
  // report the detected memory access error
  kfence_report_error(addr, is_write, regs, to_report, error_type);
  raw_spin_unlock_irqrestore(&to_report->lock, flags);
  } else {
  /* Neither the page to the left nor the page to the right of the faulting address is allocated.
  This may be a UAF or OOB access, but we can't be sure. */
  kfence_report_error(addr, is_write, regs, NULL, KFENCE_ERROR_INVALID);
  }
   
  // Reaching this point means kfence does not want to bring the system down, so unprotect the faulting page and let the system keep running
  return kfence_unprotect(addr); /* Unprotect and let access proceed. */
  }
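
For context, on x86 the handler above is reached from the kernel-address fault path shown at the top of this section; a simplified sketch of the call site in arch/x86/mm/fault.c (hedged, only the KFENCE-related lines shown), where only not-present faults are handed to KFENCE:

  /* Simplified sketch of the x86 call site (arch/x86/mm/fault.c). */
  static void page_fault_oops(struct pt_regs *regs, unsigned long error_code,
                              unsigned long address)
  {
          /* ... */
          /* Only not-present faults should be handled by KFENCE. */
          if (!(error_code & X86_PF_PROT) &&
              kfence_handle_page_fault(address, error_code & X86_PF_WRITE, regs))
                  return;
          /* ... otherwise continue towards the oops ... */
  }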

Error reporting

When an erroneous memory access is detected, kfence_report_error is called to print the report.

The errors fall into the following categories:

  1. OOB detected in the page-fault handler, i.e. an access to a protected guard page: KFENCE_ERROR_OOB
    [image]

  2. OOB detected at free time, i.e. a write into the unused (canary) area of the object's own page: KFENCE_ERROR_CORRUPTION
    [image]

  3. UAF detected in the page-fault handler, i.e. an access to the page of an already freed object: KFENCE_ERROR_UAF
    [image]

  4. Invalid free detected at free time: a double free, or a free whose address differs from the one returned at allocation: KFENCE_ERROR_INVALID_FREE

  5. A memory access error detected in the page-fault handler that kfence cannot classify, for example a fault on a guard page whose neighbouring pages are both unallocated: KFENCE_ERROR_INVALID
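
These correspond to the kfence_error_type enum; a sketch of its definition (as found in mm/kfence/kfence.h, reproduced from memory):

  enum kfence_error_type {
          KFENCE_ERROR_OOB,          /* Detected an out-of-bounds access. */
          KFENCE_ERROR_UAF,          /* Detected a use-after-free access. */
          KFENCE_ERROR_CORRUPTION,   /* Detected a memory corruption on free. */
          KFENCE_ERROR_INVALID,      /* Invalid access of unknown type. */
          KFENCE_ERROR_INVALID_FREE, /* Invalid free. */
  };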

  • kfence_report_error
  /*
  address: the address that caused the memory problem
  is_write: whether the access was a write
  regs: the CPU context at the time of the page fault
  meta: the meta associated with the offending address; for an OOB into a protected guard page this is the meta of
        the object whose access overflowed
  type: the type of the memory problem
  */
   
  void kfence_report_error(unsigned long address, bool is_write, struct pt_regs *regs,
  const struct kfence_metadata *meta, enum kfence_error_type type)
  {
  unsigned long stack_entries[KFENCE_STACK_DEPTH] = { 0 };
  const ptrdiff_t object_index = meta ? meta - kfence_metadata : -1;
  int num_stack_entries;
  int skipnr = 0;
   
  /*
  If regs is non-NULL, we got here from a page fault; the stack derived from regs needs nothing skipped (skipnr is 0),
  because regs captures the stack exactly as it was when the exception happened.
   
  If regs is NULL, we got here from a free operation. The recorded call stack then inevitably contains kfence, slab
  and kmem_cache internals, which are of no help when analysing the problem; what matters is who called them, i.e.
  where the free was issued. Those internal frames are skipped so that the useful part of the stack is printed and
  the developer's time is saved, hence skipnr is non-zero.
  */
  if (regs) {
  /* Derive the call stack at the time of the exception from pt_regs and store it in stack_entries, up to a depth of 64 */
  num_stack_entries = stack_trace_save_regs(regs, stack_entries, KFENCE_STACK_DEPTH, 0);
  } else {
  /* Without pt_regs, record the current call stack, dropping its first entry (stack_trace_save itself) */
  num_stack_entries = stack_trace_save(stack_entries, KFENCE_STACK_DEPTH, 1);
  /* Parse the call stack to locate, as closely as possible, the caller's own code that caused the problem, skipping
  kfence, slab, kfree, kmem_cache and kmalloc related frames; this makes the report easier to act on
  */
  skipnr = get_stack_skipnr(stack_entries, num_stack_entries, &type);
  }
   
  /* Require non-NULL meta, except if KFENCE_ERROR_INVALID. */
  if (WARN_ON(type != KFENCE_ERROR_INVALID && !meta))
  return;
   
  if (meta)
  lockdep_assert_held(&meta->lock);
  /*
  * Because we may generate reports in printk-unfriendly parts of the
  * kernel, such as scheduler code, the use of printk() could deadlock.
  * Until such time that all printing code here is safe in all parts of
  * the kernel, accept the risk, and just get our message out (given the
  * system might already behave unpredictably due to the memory error).
  * As such, also disable lockdep to hide warnings, and avoid disabling
  * lockdep for the rest of the kernel.
  */
  lockdep_off();
   
  pr_err("==================================================================\n");
  /* Print report header. */
  switch (type) {
  case KFENCE_ERROR_OOB: { // OOB caused by accessing a protected guard page
   
  // If the faulting address is below the object's address, the guard page immediately to the left of the object's
  // page was accessed; otherwise it was the guard page immediately to the right
  const bool left_of_object = address < meta->addr;
   
  pr_err("BUG: KFENCE: out-of-bounds %s in %pS\n\n", get_access_type(is_write),
  (void *)stack_entries[skipnr]);
   
  // Print the access type, the faulting address, its byte offset from the object, whether it lies in the left or the right guard page, and the object index
  pr_err("Out-of-bounds %s at 0x%p (%luB %s of kfence-#%td):\n",
  get_access_type(is_write), (void *)address,
  left_of_object ? meta->addr - address : address - meta->addr,
  left_of_object ? "left" : "right", object_index);
  break;
  }
  case KFENCE_ERROR_UAF: // the object was freed and then accessed without being allocated again
  pr_err("BUG: KFENCE: use-after-free %s in %pS\n\n", get_access_type(is_write),
  (void *)stack_entries[skipnr]);
  pr_err("Use-after-free %s at 0x%p (in kfence-#%td):\n",
  get_access_type(is_write), (void *)address, object_index);
  break;
  case KFENCE_ERROR_CORRUPTION: // the canary pattern in the unused area of the object's page was corrupted; also an OOB
  pr_err("BUG: KFENCE: memory corruption in %pS\n\n", (void *)stack_entries[skipnr]);
  pr_err("Corrupted memory at 0x%p ", (void *)address); // the address where the pattern mismatch was found
  print_diff_canary(address, 16, meta); // show the match information for up to 16 bytes starting at the mismatching address
  pr_cont(" (in kfence-#%td):\n", object_index); // the object index
  break;
  case KFENCE_ERROR_INVALID: // an error detected in the page-fault handler that cannot be classified
  pr_err("BUG: KFENCE: invalid %s in %pS\n\n", get_access_type(is_write),
  (void *)stack_entries[skipnr]);
  pr_err("Invalid %s at 0x%p:\n", get_access_type(is_write),
  (void *)address);
  break;
  case KFENCE_ERROR_INVALID_FREE: // a double free, or a free address different from the allocated address, detected via kfence_free
  pr_err("BUG: KFENCE: invalid free in %pS\n\n", (void *)stack_entries[skipnr]);
  pr_err("Invalid free of 0x%p (in kfence-#%td):\n", (void *)address,
  object_index);
  break;
  }
   
  /* Print the call stack of the memory error; skipnr skips mm-internal frames that do not help with the analysis */
  stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr, 0);
   
  if (meta) {
  pr_err("\n");
  /*
  1. prints the meta's state, the object's address range, its kmem_cache and the task pid
  2. prints the call stack at which the object was allocated
  3. if the meta is in the freed state, also prints the call stack of the free and the pid of the freeing task
  */
  kfence_print_object(NULL, meta);
  }
   
  /* Print report footer. */
  pr_err("\n");
  if (no_hash_pointers && regs) // no_hash_pointers can be set to 1 via the boot parameter of the same name
  show_regs(regs); // dump the CPU registers and the call stack at the time of the page fault
  else
  dump_stack_print_info(KERN_ERR); // only brief debug information
  trace_error_report_end(ERROR_DETECTOR_KFENCE, address);
  pr_err("==================================================================\n");
   
  lockdep_on();
   
  if (panic_on_warn) // setting /proc/sys/kernel/panic_on_warn to 1 makes the system panic here
  panic("panic_on_warn set ...\n");
   
  /* We encountered a memory safety error, taint the kernel!
  With the boot parameter 'panic_on_taint=0x20', adding a taint of type TAINT_BAD_PAGE causes a panic.
  */
  add_taint(TAINT_BAD_PAGE, LOCKDEP_STILL_OK);
  }
  • get_stack_skipnr [kfence_report_error -> get_stack_skipnr ]

Skips the mm-internal functions at the top of the call stack.

  /*
  * Get the number of stack entries to skip to get out of MM internals. @type is
  * optional, and if set to NULL, assumes an allocation or free stack.
  */
  static int get_stack_skipnr(const unsigned long stack_entries[], int num_entries,
  const enum kfence_error_type *type)
  {
  char buf[64];
  int skipnr, fallback = 0;
   
  if (type) {
  /* Depending on error type, find different stack entries. */
  switch (*type) {
  case KFENCE_ERROR_UAF:
  case KFENCE_ERROR_OOB:
  case KFENCE_ERROR_INVALID:
  /*
  * kfence_handle_page_fault() may be called with pt_regs
  * set to NULL; in that case we'll simply show the full
  * stack trace.
  */
  return 0;
  case KFENCE_ERROR_CORRUPTION:
  case KFENCE_ERROR_INVALID_FREE:
  break;
  }
  }
   
  for (skipnr = 0; skipnr < num_entries; skipnr++) {
  int len = scnprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skipnr]);
   
  if (str_has_prefix(buf, ARCH_FUNC_PREFIX "kfence_") ||
  str_has_prefix(buf, ARCH_FUNC_PREFIX "__kfence_") ||
  !strncmp(buf, ARCH_FUNC_PREFIX "__slab_free", len)) {
  /*
  * In case of tail calls from any of the below
  * to any of the above.
  */
  fallback = skipnr + 1;
  }
   
  /* Also the *_bulk() variants by only checking prefixes. */
  if (str_has_prefix(buf, ARCH_FUNC_PREFIX "kfree") ||
  str_has_prefix(buf, ARCH_FUNC_PREFIX "kmem_cache_free") ||
  str_has_prefix(buf, ARCH_FUNC_PREFIX "__kmalloc") ||
  str_has_prefix(buf, ARCH_FUNC_PREFIX "kmem_cache_alloc"))
  goto found;
  }
  if (fallback < num_entries)
  return fallback;
  found:
  skipnr++;
  return skipnr < num_entries ? skipnr : 0;
  }
  • print_diff_canary [kfence_report_error -> print_diff_canary]
  /*
  * Show bytes at @addr that are different from the expected canary values, up to
  * @max_bytes.
   
  address: the address where the pattern mismatch was found; it may lie in the left or the right canary region, which
           can be told by comparing it with meta->addr (see the figure below)
  bytes_to_show: at most how many bytes of match information to print
  meta: the meta of the page containing the canary regions
  */
  static void print_diff_canary(unsigned long address, size_t bytes_to_show,
  const struct kfence_metadata *meta)
  {
  const unsigned long show_until_addr = address + bytes_to_show;
  const u8 *cur, *end;
   
  /* Compute the end address without running past the canary region: for the left canary region, print at most up to
  meta->addr - 1; for the right canary region, at most up to the start of the right guard page minus 1 */
  end = (const u8 *)(address < meta->addr ? min(show_until_addr, meta->addr)
  : min(show_until_addr, PAGE_ALIGN(address)));
   
  pr_cont("[");
  for (cur = (const u8 *)address; cur < end; cur++) {
  if (*cur == KFENCE_CANARY_PATTERN(cur))
  pr_cont(" ."); // 對於pattern一致的地址,輸出 '.'
  else if (no_hash_pointers) // 可以通過啓動參數no_hash_pointers來設置爲1
  pr_cont(" 0x%02x", *cur);
  else /* Do not leak kernel memory in non-debug builds. */
  pr_cont(" !"); // 對於pattern不一致的地址,輸出 '!'
  }
  pr_cont(" ]");
  }

[image]

Analysing the memory error reports

OOB errors

  • OOB caused by reading the left guard page: KFENCE_ERROR_OOB

Example:

  size = kmalloc_cache_alignment(size);
  buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_LEFT);
  expect.addr = buf - 1;
  READ_ONCE(*expect.addr);
  KUNIT_EXPECT_TRUE(test, report_matches(&expect));
  test_free(buf);

log:

  ==================================================================
  BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xad/0x1f2 [kfence_test]
   
  # kernel stack when the fault was triggered
  Out-of-bounds read at 0x000000008e1b5d12 (1B left of kfence-#109):
  test_out_of_bounds_read+0xad/0x1f2 [kfence_test]
  kunit_try_run_case+0x51/0x80
  kunit_generic_run_threadfn_adapter+0x16/0x30
  kthread+0x11a/0x140
  ret_from_fork+0x22/0x30
   
  # call stack that allocated the object
  kfence-#109 [0x00000000753194ac-0x000000000d237ced, size=32, cache=kmalloc-32] allocated by task 35779:
  test_alloc+0xe9/0x36f [kfence_test]
  test_out_of_bounds_read+0x86/0x1f2 [kfence_test]
  kunit_try_run_case+0x51/0x80
  kunit_generic_run_threadfn_adapter+0x16/0x30
  kthread+0x11a/0x140
  ret_from_fork+0x22/0x30
   
  CPU: 5 PID: 35779 Comm: kunit_try_catch Kdump: loaded Not tainted 5.14.0+ #4
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  ==================================================================
  • OOB caused by reading the right guard page: KFENCE_ERROR_OOB

Example:

  size = kmalloc_cache_alignment(size);
  buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_RIGHT);
  expect.addr = buf + size;
  READ_ONCE(*expect.addr);
  KUNIT_EXPECT_TRUE(test, report_matches(&expect));
  test_free(buf);

log:

  ==================================================================
  BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0x14a/0x1f2 [kfence_test]
   
  # call stack that triggered the fault
  Out-of-bounds read at 0x0000000002d76451 (32B right of kfence-#111):
  test_out_of_bounds_read+0x14a/0x1f2 [kfence_test]
  kunit_try_run_case+0x51/0x80
  kunit_generic_run_threadfn_adapter+0x16/0x30
  kthread+0x11a/0x140
  ret_from_fork+0x22/0x30
   
  # call stack that allocated the object
  kfence-#111 [0x00000000432dce97-0x000000008d6138c3, size=32, cache=kmalloc-32] allocated by task 35779:
  test_alloc+0xe9/0x36f [kfence_test]
  test_out_of_bounds_read+0x140/0x1f2 [kfence_test]
  kunit_try_run_case+0x51/0x80
  kunit_generic_run_threadfn_adapter+0x16/0x30
  kthread+0x11a/0x140
  ret_from_fork+0x22/0x30
   
  CPU: 5 PID: 35779 Comm: kunit_try_catch Kdump: loaded Tainted: G B 5.14.0+ #4
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  ==================================================================
  • OOB caused by writing the left guard page: KFENCE_ERROR_OOB

Example:

  buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_LEFT);
  expect.addr = buf - 1;
  WRITE_ONCE(*expect.addr, 42);

log:

  ==================================================================
  BUG: KFENCE: out-of-bounds write in test_out_of_bounds_write+0x7a/0x116 [kfence_test]
   
  # call stack that triggered the fault
  Out-of-bounds write at 0x000000003f50719f (1B left of kfence-#134):
  test_out_of_bounds_write+0x7a/0x116 [kfence_test]
  kunit_try_run_case+0x51/0x80
  kunit_generic_run_threadfn_adapter+0x16/0x30
  kthread+0x11a/0x140
  ret_from_fork+0x22/0x30
   
  # call stack that allocated the object
  kfence-#134 [0x0000000080436418-0x0000000052b079df, size=32, cache=kmalloc-32] allocated by task 35781:
  test_alloc+0xe9/0x36f [kfence_test]
  test_out_of_bounds_write+0x65/0x116 [kfence_test]
  kunit_try_run_case+0x51/0x80
  kunit_generic_run_threadfn_adapter+0x16/0x30
  kthread+0x11a/0x140
  ret_from_fork+0x22/0x30
   
  CPU: 5 PID: 35781 Comm: kunit_try_catch Kdump: loaded Tainted: G B 5.14.0+ #4
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  ==================================================================

UAF

KFENCE_ERROR_UAF

Example:

  expect.addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
  test_free(expect.addr);
  READ_ONCE(*expect.addr);

log:

  ==================================================================
  BUG: KFENCE: use-after-free read in test_use_after_free_read+0x89/0x10b [kfence_test]
   
  # call stack that triggered the UAF
  Use-after-free read at 0x0000000067fb284c (in kfence-#152):
  test_use_after_free_read+0x89/0x10b [kfence_test]
  kunit_try_run_case+0x51/0x80
  kunit_generic_run_threadfn_adapter+0x16/0x30
  kthread+0x11a/0x140
  ret_from_fork+0x22/0x30
   
  # call stack that allocated the object
  kfence-#152 [0x0000000067fb284c-0x00000000cd45daeb, size=32, cache=kmalloc-32] allocated by task 35783:
  test_alloc+0xe9/0x36f [kfence_test]
  test_use_after_free_read+0x63/0x10b [kfence_test]
  kunit_try_run_case+0x51/0x80
  kunit_generic_run_threadfn_adapter+0x16/0x30
  kthread+0x11a/0x140
  ret_from_fork+0x22/0x30
   
  # call stack that freed the object
  freed by task 35783:
  test_use_after_free_read+0x85/0x10b [kfence_test]
  kunit_try_run_case+0x51/0x80
  kunit_generic_run_threadfn_adapter+0x16/0x30
  kthread+0x11a/0x140
  ret_from_fork+0x22/0x30
   
  CPU: 7 PID: 35783 Comm: kunit_try_catch Kdump: loaded Tainted: G B 5.14.0+ #4
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  ==================================================================

Canary (pattern) corruption

  • Corruption of the right canary region: KFENCE_ERROR_CORRUPTION

Example:

  buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_LEFT);
  expect.addr = buf + size;
  WRITE_ONCE(*expect.addr, 42);
  test_free(buf);

log:

  ==================================================================
  BUG: KFENCE: memory corruption in test_corruption+0x9c/0x1cb [kfence_test]
   
  # prints the mismatching address and up to 16 bytes to its right (not beyond the right canary region); '!' means mismatch, '.' means match
  Corrupted memory at 0x000000003b880c36 [ ! . . . . . . . . . . . . . . . ] (in kfence-#139):
  test_corruption+0x9c/0x1cb [kfence_test]
  kunit_try_run_case+0x51/0x80
  kunit_generic_run_threadfn_adapter+0x16/0x30
  kthread+0x11a/0x140
  ret_from_fork+0x22/0x30
   
  # call stack that allocated the object
  kfence-#139 [0x0000000084320c94-0x00000000ebf5c6c5, size=32, cache=kmalloc-32] allocated by task 35789:
  test_alloc+0xe9/0x36f [kfence_test]
  test_corruption+0x72/0x1cb [kfence_test]
  kunit_try_run_case+0x51/0x80
  kunit_generic_run_threadfn_adapter+0x16/0x30
  kthread+0x11a/0x140
  ret_from_fork+0x22/0x30
   
  CPU: 5 PID: 35789 Comm: kunit_try_catch Kdump: loaded Tainted: G B 5.14.0+ #4
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  ==================================================================
  • Corruption of the left canary region: KFENCE_ERROR_CORRUPTION

Example:

  buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_RIGHT);
  expect.addr = buf - 1;
  WRITE_ONCE(*expect.addr, 42);
  test_free(buf);

log:

  ==================================================================
  BUG: KFENCE: memory corruption in test_corruption+0x14e/0x1cb [kfence_test]
   
  # prints the mismatching address and up to 16 bytes to its right (not beyond the left canary region); '!' means mismatch, '.' means match
  Corrupted memory at 0x00000000d7861e9d [ ! ] (in kfence-#155):
  test_corruption+0x14e/0x1cb [kfence_test]
  kunit_try_run_case+0x51/0x80
  kunit_generic_run_threadfn_adapter+0x16/0x30
  kthread+0x11a/0x140
  ret_from_fork+0x22/0x30
   
  kfence-#155 [0x000000009acdf655-0x00000000008cbfb7, size=32, cache=kmalloc-32] allocated by task 35789:
  test_alloc+0xe9/0x36f [kfence_test]
  test_corruption+0x124/0x1cb [kfence_test]
  kunit_try_run_case+0x51/0x80
  kunit_generic_run_threadfn_adapter+0x16/0x30
  kthread+0x11a/0x140
  ret_from_fork+0x22/0x30
   
  CPU: 5 PID: 35789 Comm: kunit_try_catch Kdump: loaded Tainted: G B 5.14.0+ #4
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  ==================================================================

Invalid frees

  • Double free: KFENCE_ERROR_INVALID_FREE

Example:

  expect.addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
  test_free(expect.addr);
  test_free(expect.addr); /* Double-free. */

log:

  ==================================================================
  BUG: KFENCE: invalid free in test_double_free+0x9a/0x124 [kfence_test]
   
  # call stack of the double free
  Invalid free of 0x000000007fb6a8f8 (in kfence-#136):
  test_double_free+0x9a/0x124 [kfence_test]
  kunit_try_run_case+0x51/0x80
  kunit_generic_run_threadfn_adapter+0x16/0x30
  kthread+0x11a/0x140
  ret_from_fork+0x22/0x30
   
  # call stack that allocated the object
  kfence-#136 [0x000000007fb6a8f8-0x00000000d967e9cd, size=32, cache=test] allocated by task 35786:
  test_alloc+0xdf/0x36f [kfence_test]
  test_double_free+0x63/0x124 [kfence_test]
  kunit_try_run_case+0x51/0x80
  kunit_generic_run_threadfn_adapter+0x16/0x30
  kthread+0x11a/0x140
  ret_from_fork+0x22/0x30
   
  # call stack that freed the object
  freed by task 35786:
  test_double_free+0x7b/0x124 [kfence_test]
  kunit_try_run_case+0x51/0x80
  kunit_generic_run_threadfn_adapter+0x16/0x30
  kthread+0x11a/0x140
  ret_from_fork+0x22/0x30
   
  CPU: 5 PID: 35786 Comm: kunit_try_catch Kdump: loaded Tainted: G B 5.14.0+ #4
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  ==================================================================
  • The freed address differs from the allocated address: KFENCE_ERROR_INVALID_FREE

Example:

  buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
  expect.addr = buf + 1; /* Free on invalid address. */
  test_free(expect.addr); /* Invalid address free. */
  test_free(buf); /* No error. */

log:

  ==================================================================
  BUG: KFENCE: invalid free in test_invalid_addr_free+0x8b/0x12b [kfence_test]
   
  Invalid free of 0x0000000000b3e82d (in kfence-#124):
  test_invalid_addr_free+0x8b/0x12b [kfence_test]
  kunit_try_run_case+0x51/0x80
  kunit_generic_run_threadfn_adapter+0x16/0x30
  kthread+0x11a/0x140
  ret_from_fork+0x22/0x30
   
  kfence-#124 [0x000000002aecf77f-0x0000000046ff045a, size=32, cache=kmalloc-32] allocated by task 35787:
  test_alloc+0xe9/0x36f [kfence_test]
  test_invalid_addr_free+0x65/0x12b [kfence_test]
  kunit_try_run_case+0x51/0x80
  kunit_generic_run_threadfn_adapter+0x16/0x30
  kthread+0x11a/0x140
  ret_from_fork+0x22/0x30
   
  CPU: 5 PID: 35787 Comm: kunit_try_catch Kdump: loaded Tainted: G B 5.14.0+ #4
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  ==================================================================

Other unclassifiable memory errors

For example, a fault on a guard page whose left and right neighbouring pages are both unallocated: KFENCE_ERROR_INVALID

Example:

  READ_ONCE(__kfence_pool[10]);

log:

  ==================================================================
  BUG: KFENCE: invalid read in test_invalid_access+0x48/0xd0 [kfence_test]
   
  Invalid read at 0x0000000023713263:
  test_invalid_access+0x48/0xd0 [kfence_test]
  kunit_try_run_case+0x51/0x80
  kunit_generic_run_threadfn_adapter+0x16/0x30
  kthread+0x11a/0x140
  ret_from_fork+0x22/0x30
   
  CPU: 5 PID: 35936 Comm: kunit_try_catch Kdump: loaded Tainted: G B 5.14.0+ #4
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  ==================================================================

debugfs nodes

/sys/kernel/debug/kfence provides two nodes for inspecting kfence state: objects and stats.

The stats node

  # cat stats
  enabled: 1
  currently allocated: 47
  total allocations: 2416
  total frees: 2369
  zombie allocations: 0
  total bugs: 21

Meaning

  Name                  Meaning
  enabled               whether kfence is currently enabled; it can be enabled via the boot parameter and turned off at runtime via the module parameter
  currently allocated   how many objects of the kfence pool are currently allocated
  total allocations     total number of object allocations served from the kfence pool, monotonically increasing
  total frees           total number of object frees in the kfence pool, monotonically increasing
  zombie allocations    number of still-unfreed kfence objects whose owning kmem_cache has been destroyed
  total bugs            number of memory errors detected by kfence

Implementation

  static int stats_show(struct seq_file *seq, void *v)
  {
  int i;
   
  seq_printf(seq, "enabled: %i\n", READ_ONCE(kfence_enabled));
  for (i = 0; i < KFENCE_COUNTER_COUNT; i++)
  seq_printf(seq, "%s: %ld\n", counter_names[i], atomic_long_read(&counters[i]));
   
  return 0;
  }
  DEFINE_SHOW_ATTRIBUTE(stats);

The counters it uses are defined as follows:

  /* Statistics counters for debugfs. */
  enum kfence_counter_id {
  KFENCE_COUNTER_ALLOCATED,
  KFENCE_COUNTER_ALLOCS,
  KFENCE_COUNTER_FREES,
  KFENCE_COUNTER_ZOMBIES,
  KFENCE_COUNTER_BUGS,
  KFENCE_COUNTER_COUNT,
  };
  static atomic_long_t counters[KFENCE_COUNTER_COUNT];
  static const char *const counter_names[] = {
  [KFENCE_COUNTER_ALLOCATED] = "currently allocated",
  [KFENCE_COUNTER_ALLOCS] = "total allocations",
  [KFENCE_COUNTER_FREES] = "total frees",
  [KFENCE_COUNTER_ZOMBIES] = "zombie allocations",
  [KFENCE_COUNTER_BUGS] = "total bugs",
  };

The objects node

Prints the information of every kfence meta: its current state and the recorded call stacks.

  # cat objects
  kfence-#0 [0xffff89c43b202000-0xffff89c43b202067, size=104, cache=kmalloc-128] allocated by task 8:
  set_kthread_struct+0x30/0x40
  kthread+0x2e/0x140
  ret_from_fork+0x22/0x30
  ---------------------------------
  kfence-#1 [0xffff89c43b204000-0xffff89c43b20400f, size=16, cache=kmalloc-16] allocated by task 1:
  __smpboot_create_thread.part.9+0x3c/0x120
  smpboot_create_threads+0x67/0x90
  cpuhp_invoke_callback+0x105/0x400
  cpuhp_invoke_callback_range+0x40/0x80
  _cpu_up+0xd8/0x1e0
  cpu_up+0x85/0x90
  bringup_nonboot_cpus+0x4f/0x60
  smp_init+0x26/0x74
  kernel_init_freeable+0x10e/0x246
  kernel_init+0x16/0x120
  ret_from_fork+0x22/0x30
  ---------------------------------
  ...
  kfence-#40 [0xffff89c43b252dc0-0xffff89c43b252fff, size=576, cache=inode_cache] allocated by task 531:
  alloc_inode+0x87/0xa0
  new_inode_pseudo+0xb/0x50
  create_pipe_files+0x32/0x200
  __do_pipe_flags+0x2c/0xd0
  do_pipe2+0x2d/0xb0
  __x64_sys_pipe+0x10/0x20
  do_syscall_64+0x3a/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
   
  freed by task 531:
  destroy_inode+0x3b/0x70
  __dentry_kill+0xc5/0x150
  __fput+0xd9/0x230
  task_work_run+0x74/0xb0
  exit_to_user_mode_prepare+0x191/0x1a0
  syscall_exit_to_user_mode+0x19/0x30
  do_syscall_64+0x46/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
  ...
  ---------------------------------
  kfence-#254 unused
  ---------------------------------

Meaning

  • For objects that are allocated and not yet freed, only the allocation stack is shown.
  • For objects currently in the freed state, both the allocation stack and the free stack are shown. Zombie objects also count as free.
  • Objects that have never been allocated are shown as unused.
  • Zombie objects are free but can no longer be allocated, since their kmem_cache has been destroyed; their cache is therefore shown as <destroyed>.

Implementation

  static int show_object(struct seq_file *seq, void *v)
  {
  struct kfence_metadata *meta = &kfence_metadata[(long)v - 1];
  unsigned long flags;
   
  raw_spin_lock_irqsave(&meta->lock, flags);
  kfence_print_object(seq, meta);
  raw_spin_unlock_irqrestore(&meta->lock, flags);
  seq_puts(seq, "---------------------------------\n");
   
  return 0;
  }
  • kfence_print_object
  void kfence_print_object(struct seq_file *seq, const struct kfence_metadata *meta)
  {
  const int size = abs(meta->size);
  const unsigned long start = meta->addr;
  const struct kmem_cache *const cache = meta->cache;
   
  lockdep_assert_held(&meta->lock);
   
  if (meta->state == KFENCE_OBJECT_UNUSED) { // a meta that has never been used
  seq_con_printf(seq, "kfence-#%td unused\n", meta - kfence_metadata);
  return;
  }
   
  seq_con_printf(seq,
  "kfence-#%td [0x%p-0x%p"
  ", size=%d, cache=%s] allocated by task %d:\n",
  meta - kfence_metadata, (void *)start, (void *)(start + size - 1), size,
  (cache && cache->name) ? cache->name : "<destroyed>", meta->alloc_track.pid);
  kfence_print_stack(seq, meta, true); // print the call stack at which the object described by meta was allocated
   
  if (meta->state == KFENCE_OBJECT_FREED) { // if the object described by meta has been freed
  seq_con_printf(seq, "\nfreed by task %d:\n", meta->free_track.pid);
  kfence_print_stack(seq, meta, false); // print the call stack at which the object described by meta was freed
  }
  }

Test framework

KFENCE ships with test cases, in mm/kfence/kfence_test.c.

  static int __init kfence_test_init(void)
  {
  /* Walk all kernel tracepoints and attach a hook function to the tracepoint named "console" */
  for_each_kernel_tracepoint(register_tracepoints, NULL);
   
  /* Run the test cases */
  return __kunit_test_suites_init(kfence_test_suites);
  }
  • register_tracepoints
  static void register_tracepoints(struct tracepoint *tp, void *ignore)
  {
  check_trace_callback_type_console(probe_console);
  if (!strcmp(tp->name, "console"))
  WARN_ON(tracepoint_probe_register(tp, probe_console, NULL));
  }

When kfence_report_error prints an error report, the "console" tracepoint fires and probe_console is called back. probe_console filters the lines printed by kfence_report_error and records them in observed, which is later compared against the expected error type; a successful comparison means the test passed.

  • probe_console

Filters the error lines printed by kfence_report_error and records them in observed, for comparison against the expected error type:

  /* Probe for console output: obtains observed lines of interest. */
  static void probe_console(void *ignore, const char *buf, size_t len)
  {
  unsigned long flags;
  int nlines;
   
  spin_lock_irqsave(&observed.lock, flags);
  nlines = observed.nlines;
   
  if (strnstr(buf, "BUG: KFENCE: ", len) && strnstr(buf, "test_", len)) {
  /*
  * KFENCE report and related to the test.
  *
  * The provided @buf is not NUL-terminated; copy no more than
  * @len bytes and let strscpy() add the missing NUL-terminator.
  */
  strscpy(observed.lines[0], buf, min(len + 1, sizeof(observed.lines[0])));
  nlines = 1;
  } else if (nlines == 1 && (strnstr(buf, "at 0x", len) || strnstr(buf, "of 0x", len))) {
  strscpy(observed.lines[nlines++], buf, min(len + 1, sizeof(observed.lines[0])));
  }
   
  WRITE_ONCE(observed.nlines, nlines); /* Publish new nlines. */
  spin_unlock_irqrestore(&observed.lock, flags);
  }
  • kfence_test_suites

It lists the concrete test cases:

  #define KFENCE_KUNIT_CASE(test_name) \
  { .run_case = test_name, .name = #test_name }, \
  { .run_case = test_name, .name = #test_name "-memcache" }
   
  static struct kunit_case kfence_test_cases[] = {
  KFENCE_KUNIT_CASE(test_out_of_bounds_read),
  KFENCE_KUNIT_CASE(test_out_of_bounds_write),
  KFENCE_KUNIT_CASE(test_use_after_free_read),
  KFENCE_KUNIT_CASE(test_double_free),
  KFENCE_KUNIT_CASE(test_invalid_addr_free),
  KFENCE_KUNIT_CASE(test_corruption),
  KFENCE_KUNIT_CASE(test_free_bulk),
  KFENCE_KUNIT_CASE(test_init_on_free),
  KUNIT_CASE(test_kmalloc_aligned_oob_read),
  KUNIT_CASE(test_kmalloc_aligned_oob_write),
  KUNIT_CASE(test_shrink_memcache),
  KUNIT_CASE(test_memcache_ctor),
  KUNIT_CASE(test_invalid_access),
  KUNIT_CASE(test_gfpzero),
  KUNIT_CASE(test_memcache_typesafe_by_rcu),
  KUNIT_CASE(test_krealloc),
  KUNIT_CASE(test_memcache_alloc_bulk),
  {},
  };
   
  static struct kunit_suite kfence_test_suite = {
  .name = "kfence",
  .test_cases = kfence_test_cases,
  .init = test_init,
  .exit = test_exit,
  };
  static struct kunit_suite *kfence_test_suites[] = { &kfence_test_suite, NULL };

Taking test_out_of_bounds_read as an example:

  static void test_out_of_bounds_read(struct kunit *test)
  {
  size_t size = 32;
  struct expect_report expect = { // the expected outcome
  .type = KFENCE_ERROR_OOB, // the expected error type
  .fn = test_out_of_bounds_read, // the function expected to cause the error
  .is_write = false, // the expected access direction, a read in this case
  };
  char *buf;
   
  setup_test_cache(test, size, 0, NULL);
   
  /*
  * If we don't have our own cache, adjust based on alignment, so that we
  * actually access guard pages on either side.
  */
  if (!test_cache)
  size = kmalloc_cache_alignment(size);
   
  /* Test both sides. */
   
  // Allocate from kfence so that an access to the left guard page can be constructed; returns the object's start address
  buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_LEFT);
  expect.addr = buf - 1; // the address where the OOB is expected; buf - 1 is the last byte of the left guard page
  READ_ONCE(*expect.addr); // trigger the OOB fault
  KUNIT_EXPECT_TRUE(test, report_matches(&expect)); // report_matches compares the actual error with the expected one
  test_free(buf);
   
  // Allocate from kfence so that an access to the right guard page can be constructed; returns the object's start address
  buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_RIGHT);
  expect.addr = buf + size; // the address expected to fault; buf + size is the first byte of the right guard page
  READ_ONCE(*expect.addr); // trigger the OOB fault
  KUNIT_EXPECT_TRUE(test, report_matches(&expect)); // check the result
  test_free(buf);
  }
  • report_matches
  static bool report_matches(const struct expect_report *r)
  {
  bool ret = false;
  unsigned long flags;
  typeof(observed.lines) expect;
  const char *end;
  char *cur;
   
  /* Doubled-checked locking. */
  if (!report_available())
  return false;
   
  /* Generate expected report contents. */
   
  /* Title */
  cur = expect[0];
  end = &expect[0][sizeof(expect[0]) - 1];
  switch (r->type) {
  case KFENCE_ERROR_OOB:
  cur += scnprintf(cur, end - cur, "BUG: KFENCE: out-of-bounds %s",
  get_access_type(r));
  break;
  case KFENCE_ERROR_UAF:
  cur += scnprintf(cur, end - cur, "BUG: KFENCE: use-after-free %s",
  get_access_type(r));
  break;
  case KFENCE_ERROR_CORRUPTION:
  cur += scnprintf(cur, end - cur, "BUG: KFENCE: memory corruption");
  break;
  case KFENCE_ERROR_INVALID:
  cur += scnprintf(cur, end - cur, "BUG: KFENCE: invalid %s",
  get_access_type(r));
  break;
  case KFENCE_ERROR_INVALID_FREE:
  cur += scnprintf(cur, end - cur, "BUG: KFENCE: invalid free");
  break;
  }
   
  scnprintf(cur, end - cur, " in %pS", r->fn);
  /* The exact offset won't match, remove it; also strip module name. */
  cur = strchr(expect[0], '+');
  if (cur)
  *cur = '\0';
   
  /* Access information */
  cur = expect[1];
  end = &expect[1][sizeof(expect[1]) - 1];
   
  switch (r->type) {
  case KFENCE_ERROR_OOB:
  cur += scnprintf(cur, end - cur, "Out-of-bounds %s at", get_access_type(r));
  break;
  case KFENCE_ERROR_UAF:
  cur += scnprintf(cur, end - cur, "Use-after-free %s at", get_access_type(r));
  break;
  case KFENCE_ERROR_CORRUPTION:
  cur += scnprintf(cur, end - cur, "Corrupted memory at");
  break;
  case KFENCE_ERROR_INVALID:
  cur += scnprintf(cur, end - cur, "Invalid %s at", get_access_type(r));
  break;
  case KFENCE_ERROR_INVALID_FREE:
  cur += scnprintf(cur, end - cur, "Invalid free of");
  break;
  }
   
  cur += scnprintf(cur, end - cur, " 0x%p", (void *)r->addr);
   
  spin_lock_irqsave(&observed.lock, flags);
  if (!report_available())
  goto out; /* A new report is being captured. */
   
  /* Finally match expected output to what we actually observed. */
  ret = strstr(observed.lines[0], expect[0]) && strstr(observed.lines[1], expect[1]);
  out:
  spin_unlock_irqrestore(&observed.lock, flags);
  return ret;
  }

The end.
