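The previous part analysed how QEMU models memory. Three kinds of addresses appear throughout: gpa (guest physical address), hva (host virtual address) and gva (guest virtual address). How does QEMU represent them?
First, QEMU lays all RAM fragments (the RAMBlocks created from MemoryRegions) out flat, chains them together and keeps them in the ram_list list, which is used to address RAM. An address in this flat RAM space is a ram_addr_t (RAM here covers ROM addresses as well as ordinary RAM).
An hva is represented as a uint8_t pointer, a gpa as hwaddr, and a gva as target_ulong.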
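The flat ram_addr_t space can be pictured with a small sketch. This is not QEMU's code: ram_block_sketch and host_to_ram_addr are hypothetical names, and the block layout is reduced to three fields. It only illustrates the idea that each block owns a contiguous slice of one flat address space, so a host pointer can be turned back into a flat address by finding the block that contains it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ram_addr_sketch_t;    /* stand-in for ram_addr_t */

/* Hypothetical, simplified stand-in for a RAMBlock. */
struct ram_block_sketch {
    uint8_t *host;                /* hva of the first byte of the block */
    ram_addr_sketch_t offset;     /* start of the block in the flat space */
    size_t length;                /* block size in bytes */
};

/*
 * Find the block containing ptr and convert ptr to a flat address.
 * Returns 1 and stores the result in *out on success, 0 otherwise.
 */
static int host_to_ram_addr(const struct ram_block_sketch *blocks, size_t n,
                            const uint8_t *ptr, ram_addr_sketch_t *out)
{
    assert(blocks != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        uintptr_t base = (uintptr_t)blocks[i].host;
        uintptr_t p = (uintptr_t)ptr;
        if (p >= base && p - base < blocks[i].length) {
            *out = blocks[i].offset + (p - base);
            return 1;
        }
    }
    return 0;
}
```

The function qemu_ram_addr_from_host_nofail used in the listing that follows plays a comparable role in QEMU; the linear scan here is only the simplest way to show the mapping.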
With that in place, let's look at the MMU implementation of QEMU's x86 CPU.
864 /* NOTE: this function can trigger an exception */
865 /* NOTE2: the returned address is not exactly the physical address: it
866 * is actually a ram_addr_t (in system mode; the user mode emulation
867 * version of this function returns a guest virtual address).
868 */
869 tb_page_addr_t get_page_addr_code(CPUArchState *env, target_ulong addr)
870 {
871 int mmu_idx, index, pd;
872 void *p;
873 MemoryRegion *mr;
874 CPUState *cpu = ENV_GET_CPU(env);
875 CPUIOTLBEntry *iotlbentry;
876 hwaddr physaddr;
877
878 index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
879 mmu_idx = cpu_mmu_index(env, true);
880 if (unlikely(env->tlb_table[mmu_idx][index].addr_code !=
881 (addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK)))) {
882 if (!VICTIM_TLB_HIT(addr_read, addr)) {
883 tlb_fill(ENV_GET_CPU(env), addr, 0, MMU_INST_FETCH, mmu_idx, 0);
884 }
885 }
886 iotlbentry = &env->iotlb[mmu_idx][index];
887 pd = iotlbentry->addr & ~TARGET_PAGE_MASK;
888 mr = iotlb_to_region(cpu, pd, iotlbentry->attrs);
889 if (memory_region_is_unassigned(mr)) {
890 qemu_mutex_lock_iothread();
891 if (memory_region_request_mmio_ptr(mr, addr)) {
892 qemu_mutex_unlock_iothread();
893 /* A MemoryRegion is potentially added so re-run the
894 * get_page_addr_code.
895 */
896 return get_page_addr_code(env, addr);
897 }
898 qemu_mutex_unlock_iothread();
899
900 /* Give the new-style cpu_transaction_failed() hook first chance
901 * to handle this.
902 * This is not the ideal place to detect and generate CPU
903 * exceptions for instruction fetch failure (for instance
904 * we don't know the length of the access that the CPU would
905 * use, and it would be better to go ahead and try the access
906 * and use the MemTXResult it produced). However it is the
907 * simplest place we have currently available for the check.
908 */
909 physaddr = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
910 cpu_transaction_failed(cpu, physaddr, addr, 0, MMU_INST_FETCH, mmu_idx,
911 iotlbentry->attrs, MEMTX_DECODE_ERROR, 0);
912
913 cpu_unassigned_access(cpu, addr, false, true, 0, 4);
914 /* The CPU's unassigned access hook might have longjumped out
915 * with an exception. If it didn't (or there was no hook) then
916 * we can't proceed further.
917 */
918 report_bad_exec(cpu, addr);
919 exit(1);
920 }
921 p = (void *)((uintptr_t)addr + env->tlb_table[mmu_idx][index].addend);
922 return qemu_ram_addr_from_host_nofail(p);
923 }
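Translating a virtual address to a physical address works in one of two modes: real-address mode and virtual-address mode. In virtual-address mode the virtual address has to be translated to a physical address; in real-address mode gpa = gva. Translation relies on page tables held in memory, and a TLB caches the results to speed the process up.
The emulated x86 CPU also goes through the TLB in real-address mode (whether real hardware behaves this way or QEMU simplifies things is not investigated here).
For background on how a TLB works, see https://blog.csdn.net/leishangwen/article/details/27190959. What is emulated here is a direct-mapped TLB. Because this does not need to be transparent to the operating system, the emulated CPU only has to implement the hardware interface of the TLB.
A lookup takes part of the virtual address as an index into the TLB table, for example bits 13-18 (6 bits, 64 entries), and compares the remaining high bits, bits 19-31 in that example, to decide whether the entry hits. The auxiliary bits also contain a flag saying whether the entry is valid; an invalid entry counts as a miss, and on a miss the address is translated and the TLB is filled. In QEMU the index is computed at line 878 as (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1), so the actual widths depend on TARGET_PAGE_BITS and CPU_TLB_BITS.
Memory accesses show locality, but the TLB is of limited size, so an entry may be evicted and then accessed again right away. To handle this, QEMU adds a second table called tlb_v_table that caches some of the evicted entries. This table is not indexed by virtual address; a lookup scans the whole table. The v stands for victim.
So the TLB only exists to speed translation up; the real job of the MMU is still to find the gpa that corresponds to a gva.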
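The lookup described above can be sketched as a direct-mapped table plus a small, linearly scanned victim table. This is a simplified illustration, not QEMU's implementation: every name is hypothetical, the page size and table sizes are assumed values, and swapping the entries on a victim hit is a design choice of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_BITS  12                      /* assumed 4 KiB pages */
#define SK_PAGE_MASK  (~(((uint64_t)1 << SK_PAGE_BITS) - 1))
#define SK_TLB_SIZE   256                     /* assumed; power of two */
#define SK_VTLB_SIZE  8
#define SK_INVALID    UINT64_MAX              /* never equals an aligned tag */

struct tlb_entry_sketch {
    uint64_t tag;       /* page-aligned virtual address, or SK_INVALID */
    uint64_t paddr;     /* page-aligned physical address */
};

struct tlb_sketch {
    struct tlb_entry_sketch main[SK_TLB_SIZE];
    struct tlb_entry_sketch victim[SK_VTLB_SIZE];
    unsigned next_victim;                     /* round-robin eviction slot */
};

static void tlb_reset(struct tlb_sketch *t)
{
    for (unsigned i = 0; i < SK_TLB_SIZE; i++) {
        t->main[i].tag = SK_INVALID;
    }
    for (unsigned i = 0; i < SK_VTLB_SIZE; i++) {
        t->victim[i].tag = SK_INVALID;
    }
    t->next_victim = 0;
}

static unsigned tlb_index(uint64_t vaddr)
{
    return (unsigned)((vaddr >> SK_PAGE_BITS) & (SK_TLB_SIZE - 1));
}

/* Install a translation; the entry it displaces moves to the victim table. */
static void tlb_install(struct tlb_sketch *t, uint64_t vaddr, uint64_t paddr)
{
    unsigned idx = tlb_index(vaddr);
    t->victim[t->next_victim++ % SK_VTLB_SIZE] = t->main[idx];
    t->main[idx].tag = vaddr & SK_PAGE_MASK;
    t->main[idx].paddr = paddr & SK_PAGE_MASK;
}

/* Returns 1 on a hit and stores the physical address in *paddr, else 0. */
static int tlb_lookup(struct tlb_sketch *t, uint64_t vaddr, uint64_t *paddr)
{
    unsigned idx = tlb_index(vaddr);
    uint64_t tag = vaddr & SK_PAGE_MASK;
    assert(paddr != NULL);
    if (t->main[idx].tag != tag) {
        int found = 0;
        /* The victim table is not indexed: scan every entry. */
        for (unsigned i = 0; i < SK_VTLB_SIZE; i++) {
            if (t->victim[i].tag == tag) {
                struct tlb_entry_sketch tmp = t->main[idx];
                t->main[idx] = t->victim[i];
                t->victim[i] = tmp;
                found = 1;
                break;
            }
        }
        if (!found) {
            return 0;   /* caller would walk the page tables and install */
        }
    }
    *paddr = t->main[idx].paddr | (vaddr & ~SK_PAGE_MASK);
    return 1;
}
```

Two virtual pages whose index bits collide evict each other from the main table, but both stay reachable as long as the displaced one is still in the victim table.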
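Next, the relevant QEMU data structures.
CPUArchState holds the state of the current CPU, such as the registers and the TLB. The parts that matter for analysing the MMU are the following: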
/* use a fully associative victim tlb of 8 entries */
#define CPU_VTLB_SIZE 8
#if HOST_LONG_BITS == 32 && TARGET_LONG_BITS == 32
#define CPU_TLB_ENTRY_BITS 4
#else
#define CPU_TLB_ENTRY_BITS 5
#endif
/* TCG_TARGET_TLB_DISPLACEMENT_BITS is used in CPU_TLB_BITS to ensure that
* the TLB is not unnecessarily small, but still small enough for the
* TLB lookup instruction sequence used by the TCG target.
*
* TCG will have to generate an operand as large as the distance between
* env and the tlb_table[NB_MMU_MODES - 1][0].addend. For simplicity,
* the TCG targets just round everything up to the next power of two, and
* count bits. This works because: 1) the size of each TLB is a largish
* power of two, 2) and because the limit of the displacement is really close
* to a power of two, 3) the offset of tlb_table[0][0] inside env is smaller
* than the size of a TLB.
*
* For example, the maximum displacement 0xFFF0 on PPC and MIPS, but TCG
* just says "the displacement is 16 bits". TCG_TARGET_TLB_DISPLACEMENT_BITS
* then ensures that tlb_table at least 0x8000 bytes large ("not unnecessarily
* small": 2^15). The operand then will come up smaller than 0xFFF0 without
* any particular care, because the TLB for a single MMU mode is larger than
* 0x10000-0xFFF0=16 bytes. In the end, the maximum value of the operand
* could be something like 0xC000 (the offset of the last TLB table) plus
* 0x18 (the offset of the addend field in each TLB entry) plus the offset
* of tlb_table inside env (which is non-trivial but not huge).
*/
#define CPU_TLB_BITS \
MIN(8, \
TCG_TARGET_TLB_DISPLACEMENT_BITS - CPU_TLB_ENTRY_BITS - \
(NB_MMU_MODES <= 1 ? 0 : \
NB_MMU_MODES <= 2 ? 1 : \
NB_MMU_MODES <= 4 ? 2 : \
NB_MMU_MODES <= 8 ? 3 : 4))
#define CPU_TLB_SIZE (1 << CPU_TLB_BITS)
typedef struct CPUTLBEntry {
/* bit TARGET_LONG_BITS to TARGET_PAGE_BITS : virtual address
bit TARGET_PAGE_BITS-1..4 : Nonzero for accesses that should not
go directly to ram.
bit 3 : indicates that the entry is invalid
bit 2..0 : zero
*/
union {
struct {
target_ulong addr_read;
target_ulong addr_write;
target_ulong addr_code;
/* Addend to virtual address to get host address. IO accesses
use the corresponding iotlb value. */
uintptr_t addend;
};
/* padding to get a power of two size */
uint8_t dummy[1 << CPU_TLB_ENTRY_BITS];
};
} CPUTLBEntry;
QEMU_BUILD_BUG_ON(sizeof(CPUTLBEntry) != (1 << CPU_TLB_ENTRY_BITS));
/* The IOTLB is not accessed directly inline by generated TCG code,
* so the CPUIOTLBEntry layout is not as critical as that of the
* CPUTLBEntry. (This is also why we don't want to combine the two
* structs into one.)
*/
typedef struct CPUIOTLBEntry {
hwaddr addr;
MemTxAttrs attrs;
} CPUIOTLBEntry;
#define CPU_COMMON_TLB \
/* The meaning of the MMU modes is defined in the target code. */ \
CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE]; \
CPUTLBEntry tlb_v_table[NB_MMU_MODES][CPU_VTLB_SIZE]; \
CPUIOTLBEntry iotlb[NB_MMU_MODES][CPU_TLB_SIZE]; \
CPUIOTLBEntry iotlb_v[NB_MMU_MODES][CPU_VTLB_SIZE]; \
size_t tlb_flush_count; \
target_ulong tlb_flush_addr; \
target_ulong tlb_flush_mask; \
target_ulong vtlb_index;
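tlb_table is the TLB itself.
tlb_v_table is the victim table for tlb_table.
iotlb is used together with tlb_table.
iotlb_v is likewise the victim table for iotlb.
tlb_table is mainly used for address translation; iotlb mainly helps QEMU get from a gpa to whatever backs it on the host side (not only emulated RAM but also ROM and MMIO).
Both tlb_table and iotlb are two-dimensional arrays. The first dimension is selected by the MMU mode the CPU is working in (mmu_idx); we do not analyse the different modes and only look at the standard one.
With this background, the MMU code is fairly easy to follow.
Line 880 looks the virtual address up in tlb_table. On a miss, line 882 searches tlb_v_table. If that misses as well, line 883 calls tlb_fill to fill the TLB.
Lines 886-896 handle the case that needs MMIO.
Lines 900-919 are the error path, which we do not analyse.
Finally, line 922 uses qemu_ram_addr_from_host_nofail to obtain the corresponding ram_addr_t.
The actual address translation is implemented in tlb_fill. Today we only analyse real mode.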
void tlb_fill(CPUState *cs, target_ulong addr, int size,
MMUAccessType access_type, int mmu_idx, uintptr_t retaddr)
{
int ret;
ret = x86_cpu_handle_mmu_fault(cs, addr, size, access_type, mmu_idx);
if (ret) {
X86CPU *cpu = X86_CPU(cs);
CPUX86State *env = &cpu->env;
raise_exception_err_ra(env, cs->exception_index, env->error_code, retaddr);
}
}
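The translation itself is done by x86_cpu_handle_mmu_fault; if it fails, an exception is raised. On success the function fills the TLB directly and the caller then looks the address up in the TLB again, which is why tlb_fill has no return value.
The parameters of x86_cpu_handle_mmu_fault are: cs, the CPU state; addr, the virtual address to translate; size, the size of the access (line 883 passes 0 for an instruction fetch); access_type, the kind of access that triggered the MMU; and mmu_idx, which selects the current MMU mode.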
160 /* return value:
161 * -1 = cannot handle fault
162 * 0 = nothing more to do
163 * 1 = generate PF fault
164 */
165 int x86_cpu_handle_mmu_fault(CPUState *cs, vaddr addr, int size,
166 int is_write1, int mmu_idx)
167 {
168 X86CPU *cpu = X86_CPU(cs);
169 CPUX86State *env = &cpu->env;
170 uint64_t ptep, pte;
171 int32_t a20_mask;
172 target_ulong pde_addr, pte_addr;
173 int error_code = 0;
174 int is_dirty, prot, page_size, is_write, is_user;
175 hwaddr paddr;
176 uint64_t rsvd_mask = PG_HI_RSVD_MASK;
177 uint32_t page_offset;
178 target_ulong vaddr;
179
180 is_user = mmu_idx == MMU_USER_IDX;
......
185 is_write = is_write1 & 1;
186
187 a20_mask = x86_get_a20_mask(env);
188 if (!(env->cr[0] & CR0_PG_MASK)) {
189 pte = addr;
190 #ifdef TARGET_X86_64
191 if (!(env->hflags & HF_LMA_MASK)) {
192 /* Without long mode we can only address 32bits in real mode */
193 pte = (uint32_t)pte;
194 }
195 #endif
196 prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
197 page_size = 4096;
198 goto do_mapping;
199 }
200
......
440 do_mapping:
441 pte = pte & a20_mask;
442
443 /* align to page_size */
444 pte &= PG_ADDRESS_MASK & ~(page_size - 1);
445
446 /* Even if 4MB pages, we map only one 4KB page in the cache to
447 avoid filling it too fast */
448 vaddr = addr & TARGET_PAGE_MASK;
449 page_offset = vaddr & (page_size - 1);
450 paddr = pte + page_offset;
451
452 assert(prot & (1 << is_write1));
453 tlb_set_page_with_attrs(cs, vaddr, paddr, cpu_get_mem_attrs(env),
454 prot, mmu_idx, page_size);
455 return 0;
456 do_fault_rsvd:
457 error_code |= PG_ERROR_RSVD_MASK;
458 do_fault_protect:
459 error_code |= PG_ERROR_P_MASK;
460 do_fault:
461 error_code |= (is_write << PG_ERROR_W_BIT);
462 if (is_user)
463 error_code |= PG_ERROR_U_MASK;
464 if (is_write1 == 2 &&
465 (((env->efer & MSR_EFER_NXE) &&
466 (env->cr[4] & CR4_PAE_MASK)) ||
467 (env->cr[4] & CR4_SMEP_MASK)))
468 error_code |= PG_ERROR_I_D_MASK;
469 if (env->intercept_exceptions & (1 << EXCP0E_PAGE)) {
470 /* cr2 is not modified in case of exceptions */
471 x86_stq_phys(cs,
472 env->vm_vmcb + offsetof(struct vmcb, control.exit_info_2),
473 addr);
474 } else {
475 env->cr[2] = addr;
476 }
477 env->error_code = error_code;
478 cs->exception_index = EXCP0E_PAGE;
479 return 1;
480 }
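Lines 187-198: x86_get_a20_mask returns the mask that emulates the A20 gate; when A20 is disabled, bit 20 of the address is forced to zero, and the mask is applied at line 441. If CR0_PG_MASK is not set in cr0, paging is off (this includes real mode), so the address is mapped directly: pte is simply addr, the page gets read, write and execute permission, page_size is 4096, and execution jumps to do_mapping.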
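The arithmetic of the no-paging path is small enough to sketch on its own. The following is an illustration for the 32-bit case only, with hypothetical names (a20_mask_sketch, map_without_paging); it mirrors the steps at lines 189 and 441-444 but is not the QEMU function.

```c
#include <stdint.h>

#define SK_PAGE_SIZE UINT32_C(4096)

/* All ones when the A20 gate is enabled, bit 20 cleared when it is not. */
static uint32_t a20_mask_sketch(int a20_enabled)
{
    return a20_enabled ? UINT32_C(0xffffffff) : ~(UINT32_C(1) << 20);
}

/*
 * With paging off the linear address is used as the physical address.
 * Apply the A20 mask, then align down to the page that gets cached.
 */
static uint32_t map_without_paging(uint32_t linear, int a20_enabled)
{
    uint32_t pte = linear & a20_mask_sketch(a20_enabled);
    return pte & ~(SK_PAGE_SIZE - 1);
}
```

With A20 disabled, an address just above 1 MiB wraps around to low memory, which is the behaviour the mask exists to reproduce.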
do_mapping then calls tlb_set_page_with_attrs to fill the TLB.
/* Add a new TLB entry. At most one entry for a given virtual address
 * is permitted. Only a single TARGET_PAGE_SIZE region is mapped, the
 * supplied size is only used by tlb_flush_page.
 *
 * Called from TCG-generated code, which is under an RCU read-side
 * critical section.
 */
void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
                             hwaddr paddr, MemTxAttrs attrs, int prot,
                             int mmu_idx, target_ulong size)
{
    CPUArchState *env = cpu->env_ptr;
    MemoryRegionSection *section;
    unsigned int index;
    target_ulong address;
    target_ulong code_address;
    uintptr_t addend;
    CPUTLBEntry *te, *tv, tn;
    hwaddr iotlb, xlat, sz;
    unsigned vidx = env->vtlb_index++ % CPU_VTLB_SIZE;
    int asidx = cpu_asidx_from_attrs(cpu, attrs);

    assert_cpu_is_self(cpu);
    assert(size >= TARGET_PAGE_SIZE);
    if (size != TARGET_PAGE_SIZE) {
        tlb_add_large_page(env, vaddr, size);
    }

    sz = size;
    section = address_space_translate_for_iotlb(cpu, asidx, paddr, &xlat, &sz);
    assert(sz >= TARGET_PAGE_SIZE);

    tlb_debug("vaddr=" TARGET_FMT_lx " paddr=0x" TARGET_FMT_plx
              " prot=%x idx=%d\n",
              vaddr, paddr, prot, mmu_idx);

    address = vaddr;
    if (!memory_region_is_ram(section->mr) && !memory_region_is_romd(section->mr)) {
        /* IO memory case */
        address |= TLB_MMIO;
        addend = 0;
    } else {
        /* TLB_MMIO for rom/romd handled below */
        addend = (uintptr_t)memory_region_get_ram_ptr(section->mr) + xlat;
    }

    code_address = address;
    iotlb = memory_region_section_get_iotlb(cpu, section, vaddr, paddr, xlat,
                                            prot, &address);

    index = (vaddr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
    te = &env->tlb_table[mmu_idx][index];
    /* do not discard the translation in te, evict it into a victim tlb */
    tv = &env->tlb_v_table[mmu_idx][vidx];

    /* addr_write can race with tlb_reset_dirty_range */
    copy_tlb_helper(tv, te, true);

    env->iotlb_v[mmu_idx][vidx] = env->iotlb[mmu_idx][index];

    /* refill the tlb */
    env->iotlb[mmu_idx][index].addr = iotlb - vaddr;
    env->iotlb[mmu_idx][index].attrs = attrs;

    /* Now calculate the new entry */
    tn.addend = addend - vaddr;
    if (prot & PAGE_READ) {
        tn.addr_read = address;
    } else {
        tn.addr_read = -1;
    }

    if (prot & PAGE_EXEC) {
        tn.addr_code = code_address;
    } else {
        tn.addr_code = -1;
    }

    tn.addr_write = -1;
    if (prot & PAGE_WRITE) {
        if ((memory_region_is_ram(section->mr) && section->readonly)
            || memory_region_is_romd(section->mr)) {
            /* Write access calls the I/O callback. */
            tn.addr_write = address | TLB_MMIO;
        } else if (memory_region_is_ram(section->mr)
                   && cpu_physical_memory_is_clean(
                       memory_region_get_ram_addr(section->mr) + xlat)) {
            tn.addr_write = address | TLB_NOTDIRTY;
        } else {
            tn.addr_write = address;
        }
        if (prot & PAGE_WRITE_INV) {
            tn.addr_write |= TLB_INVALID_MASK;
        }
    }

    /* Pairs with flag setting in tlb_reset_dirty_range */
    copy_tlb_helper(te, &tn, true);
    /* atomic_mb_set(&te->addr_write, write_address); */
}
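To understand this function we first need to talk about tlb_table and iotlb. As the names suggest, tlb_table handles address translation and the accesses that can go straight to host memory (reads and writes of RAM, and reads of ROM/romd), which is why CPUTLBEntry contains addr_read, addr_write and addr_code: they decide whether a read, a write or an instruction fetch may access the hva directly. iotlb is responsible for the accesses that cannot, namely MMIO and writes to ROM/romd.
The addend in CPUTLBEntry is used to compute the hva: gva + CPUTLBEntry->addend = hva (the function stores addend - vaddr, and line 921 of get_page_addr_code adds it back to the address).
The addr field of CPUIOTLBEntry has two parts. The low bits, CPUIOTLBEntry->addr & ~TARGET_PAGE_MASK, identify the MemoryRegionSection (see lines 887-888 of get_page_addr_code).
When the address belongs to a defined RAM or ROM region, (CPUIOTLBEntry->addr & TARGET_PAGE_MASK) + gva = ram_addr_t.
When the CPUIOTLBEntry describes MMIO, the page part of CPUIOTLBEntry->addr matters much less; finding the MemoryRegionSection is what makes the access possible.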
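The two stored values can be checked with a little modular arithmetic. The sketch below uses hypothetical helper names and assumes 4 KiB pages and 64-bit unsigned integers; it follows the pattern of tn.addend = addend - vaddr and iotlb - vaddr in the listing above, where vaddr is page-aligned, so the low bits of the iotlb value survive the subtraction.

```c
#include <stdint.h>

#define SK_PAGE_MASK (~UINT64_C(0xfff))    /* assumed 4 KiB pages */

/* Fill time: remember the distance from the guest page to the host page. */
static uint64_t make_addend(uint64_t host_page, uint64_t guest_vpage)
{
    return host_page - guest_vpage;        /* wraps modulo 2^64 if needed */
}

/* Access time: one addition turns a gva into an hva. */
static uint64_t gva_to_hva(uint64_t gva, uint64_t addend)
{
    return gva + addend;
}

/* Fill time: page-aligned base with a section index in the low bits. */
static uint64_t make_iotlb_addr(uint64_t base, unsigned section_index,
                                uint64_t guest_vpage)
{
    return (base | section_index) - guest_vpage;
}

/* Low bits select the section. */
static unsigned iotlb_section_index(uint64_t iotlb_addr)
{
    return (unsigned)(iotlb_addr & ~SK_PAGE_MASK);
}

/* Page part plus the gva gives the address inside the backing region. */
static uint64_t iotlb_target(uint64_t iotlb_addr, uint64_t gva)
{
    return (iotlb_addr & SK_PAGE_MASK) + gva;
}
```

Storing the difference rather than the host address itself means the fast path needs no subtraction of the page base at access time, only one add.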
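A few more words on the three address fields.
CPUTLBEntry->addr_read is compared against the address on a memory read. If the page is not readable the value is -1. If TLB_MMIO is set, the access goes through iotlb. If it is not an MMIO address and the page is readable, the entry can be used to locate the hva.
CPUTLBEntry->addr_code is used for the TLB comparison on instruction fetch and to locate the hva.
CPUTLBEntry->addr_write is compared against the address on a memory write. If the page is not writable the value is -1. If TLB_MMIO is set, the access goes through iotlb. If it is not an MMIO address and the page is writable, the entry can be used to locate the hva.
With this in mind, the code above becomes clear at a glance.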
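How one of these fields steers an access can be sketched as a three-way decision. The flag values below follow the bit layout given in the CPUTLBEntry comment (bit 3 marks an invalid entry, higher bits below the page boundary mark accesses that should not go directly to RAM), but the exact constants, the 4 KiB page size and all names are assumptions of this sketch.

```c
#include <stdint.h>

#define SK_PAGE_MASK    (~UINT64_C(0xfff))    /* assumed 4 KiB pages */
#define SK_TLB_INVALID  (UINT64_C(1) << 3)    /* assumed flag position */
#define SK_TLB_NOTDIRTY (UINT64_C(1) << 4)    /* assumed flag position */
#define SK_TLB_MMIO     (UINT64_C(1) << 5)    /* assumed flag position */

enum access_path {
    PATH_MISS,      /* no usable entry: fill the TLB, then retry */
    PATH_IO,        /* flag bits set: go through iotlb */
    PATH_DIRECT     /* plain RAM: hva = gva + addend */
};

/* tlb_addr is addr_read, addr_write or addr_code of the indexed entry. */
static enum access_path classify_access(uint64_t tlb_addr, uint64_t gva)
{
    /* An entry of -1 or one with the invalid bit set can never match. */
    if ((gva & SK_PAGE_MASK) != (tlb_addr & (SK_PAGE_MASK | SK_TLB_INVALID))) {
        return PATH_MISS;
    }
    /* Any remaining low bit means the access must take the slow path. */
    if (tlb_addr & ~SK_PAGE_MASK) {
        return PATH_IO;
    }
    return PATH_DIRECT;
}
```

Keeping the flags in the low bits of the page-aligned address lets a single comparison reject both "wrong page" and "not permitted", because -1 is never a page-aligned value.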