QEMU memory model (6): MMU implementation (1), real mode

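The previous posts analyzed how QEMU models memory. Three kinds of address are involved: gpa (guest physical address), hva (host virtual address) and gva (guest virtual address). How does QEMU represent them?

First, QEMU lays all RAM fragments (the RAMBlocks created from MemoryRegions) out flat, chains them together, and keeps them in the ram_list list, which is used to address RAM. An address in this flat RAM space is a ram_addr_t (RAM here covers ROM addresses as well). An hva is held in a uint8_t * pointer, a gpa is a hwaddr, and a gva is a target_ulong.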

Now let's look at QEMU's MMU implementation for the x86 CPU.

864 /* NOTE: this function can trigger an exception */
865 /* NOTE2: the returned address is not exactly the physical address: it
866  * is actually a ram_addr_t (in system mode; the user mode emulation
867  * version of this function returns a guest virtual address).
868  */
869 tb_page_addr_t get_page_addr_code(CPUArchState *env, target_ulong addr)
870 {
871     int mmu_idx, index, pd;
872     void *p;
873     MemoryRegion *mr;
874     CPUState *cpu = ENV_GET_CPU(env);
875     CPUIOTLBEntry *iotlbentry;
876     hwaddr physaddr;
877
878     index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
879     mmu_idx = cpu_mmu_index(env, true);
880     if (unlikely(env->tlb_table[mmu_idx][index].addr_code !=
881                  (addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK)))) {
882         if (!VICTIM_TLB_HIT(addr_read, addr)) {
883             tlb_fill(ENV_GET_CPU(env), addr, 0, MMU_INST_FETCH, mmu_idx, 0);
884         }
885     }
886     iotlbentry = &env->iotlb[mmu_idx][index];
887     pd = iotlbentry->addr & ~TARGET_PAGE_MASK;
888     mr = iotlb_to_region(cpu, pd, iotlbentry->attrs);
889     if (memory_region_is_unassigned(mr)) {
890         qemu_mutex_lock_iothread();
891         if (memory_region_request_mmio_ptr(mr, addr)) {
892             qemu_mutex_unlock_iothread();
893             /* A MemoryRegion is potentially added so re-run the
894              * get_page_addr_code.
895              */
896             return get_page_addr_code(env, addr);
897         }
898         qemu_mutex_unlock_iothread();
899
900         /* Give the new-style cpu_transaction_failed() hook first chance
901          * to handle this.
902          * This is not the ideal place to detect and generate CPU
903          * exceptions for instruction fetch failure (for instance
904          * we don't know the length of the access that the CPU would
905          * use, and it would be better to go ahead and try the access
906          * and use the MemTXResult it produced). However it is the
907          * simplest place we have currently available for the check.
908          */
909         physaddr = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
910         cpu_transaction_failed(cpu, physaddr, addr, 0, MMU_INST_FETCH, mmu_idx,
911                                iotlbentry->attrs, MEMTX_DECODE_ERROR, 0);
912
913         cpu_unassigned_access(cpu, addr, false, true, 0, 4);
914         /* The CPU's unassigned access hook might have longjumped out
915          * with an exception. If it didn't (or there was no hook) then
916          * we can't proceed further.
917          */
918         report_bad_exec(cpu, addr);
919         exit(1);
920     }
921     p = (void *)((uintptr_t)addr + env->tlb_table[mmu_idx][index].addend);
922     return qemu_ram_addr_from_host_nofail(p);
923 }

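Translating a virtual address to a physical address works in one of two modes: real-address mode and virtual-address mode. In virtual-address mode the virtual address has to be translated to a physical one; in real-address mode gpa = gva. Virtual-address translation walks page tables held in memory, and a TLB caches the results to speed this up.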

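The emulated x86 CPU also goes through the TLB for lookups in real-address mode (whether real hardware behaves this way or QEMU simplified things is not investigated here).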

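For background on TLBs, see https://blog.csdn.net/leishangwen/article/details/27190959 (how a TLB works). What QEMU emulates here is a direct-mapped TLB. Because nothing here needs to be transparent to the operating system, the emulated CPU only has to support the TLB's hardware interface.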

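Taking a 64-entry TLB as an example, a lookup uses bits 13-18 of the virtual address (6 bits, 64 entries) as the index into the TLB and compares bits 19-31 with the stored entry to decide whether it hits. The QEMU code above follows the same scheme with its own sizes: the index is (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1), and the page-aligned address is compared with the stored entry. The auxiliary bits of an entry also include one that marks the entry as invalid; an invalid entry counts as a miss too, and on a miss the address has to be translated and the TLB filled.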
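The index-and-compare step can be sketched in a few lines of C. This is a minimal illustration, not QEMU's code: the sk_ names, the 256-entry size and the position of the invalid flag are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative sizes only; QEMU derives its own from the target and host. */
#define SK_PAGE_BITS 12
#define SK_PAGE_MASK (~(((uint64_t)1 << SK_PAGE_BITS) - 1))
#define SK_TLB_BITS  8
#define SK_TLB_SIZE  ((size_t)1 << SK_TLB_BITS)
#define SK_INVALID   ((uint64_t)1 << 3) /* hypothetical "invalid" flag bit */

typedef struct {
    uint64_t tag; /* page-aligned virtual address plus flag bits */
} SkTlbEntry;

static SkTlbEntry sk_tlb[SK_TLB_SIZE];

/* Direct mapping: the low bits of the page number pick exactly one slot. */
static size_t sk_tlb_index(uint64_t vaddr)
{
    return (size_t)(vaddr >> SK_PAGE_BITS) & (SK_TLB_SIZE - 1);
}

/* Mark every slot invalid so that no address can hit. */
static void sk_tlb_flush(void)
{
    for (size_t i = 0; i < SK_TLB_SIZE; i++) {
        sk_tlb[i].tag = SK_INVALID;
    }
}

/* Record the page containing vaddr, replacing whatever used the slot. */
static void sk_tlb_insert(uint64_t vaddr)
{
    size_t idx = sk_tlb_index(vaddr);
    assert(idx < SK_TLB_SIZE);
    sk_tlb[idx].tag = vaddr & SK_PAGE_MASK;
}

/* Hit only if the slot is valid and holds the same page number. */
static bool sk_tlb_hit(uint64_t vaddr)
{
    uint64_t tag = sk_tlb[sk_tlb_index(vaddr)].tag;
    return (tag & (SK_PAGE_MASK | SK_INVALID)) == (vaddr & SK_PAGE_MASK);
}
```

Two addresses whose page numbers agree in the low index bits share a slot, so inserting one displaces the other; that is the weakness the victim table described next is meant to soften.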

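The TLB has a limited size, so an entry may be evicted and then needed again right away, which the locality of memory accesses makes quite likely. To handle this, QEMU adds a second table, tlb_v_table, which caches some of the evicted entries. This table is not indexed by virtual address; the cached translation is found by scanning the whole table. The v stands for victim.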
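A victim table can be sketched as follows. This is a simplified model under stated assumptions, not QEMU's implementation: the vt_ names, the payload field and the swap-on-hit policy are made up for illustration, and only the sizes (8 victim entries, round-robin replacement) echo the definitions quoted in this post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VT_PAGE_MASK   (~(uint64_t)0xfff)
#define VT_MAIN_SIZE   256
#define VT_VICTIM_SIZE 8
#define VT_EMPTY       UINT64_MAX /* never equals a page-aligned address */

typedef struct {
    uint64_t page;    /* page-aligned virtual address, or VT_EMPTY */
    uint64_t payload; /* stand-in for the cached translation */
} VtEntry;

typedef struct {
    VtEntry main_tlb[VT_MAIN_SIZE];
    VtEntry victim[VT_VICTIM_SIZE];
    unsigned next_victim; /* round-robin replacement cursor */
} VtTlb;

static void vt_init(VtTlb *t)
{
    for (size_t i = 0; i < VT_MAIN_SIZE; i++) t->main_tlb[i].page = VT_EMPTY;
    for (size_t i = 0; i < VT_VICTIM_SIZE; i++) t->victim[i].page = VT_EMPTY;
    t->next_victim = 0;
}

static size_t vt_index(uint64_t vaddr)
{
    return (size_t)(vaddr >> 12) & (VT_MAIN_SIZE - 1);
}

/* Install a translation; the displaced entry moves to the victim table. */
static void vt_install(VtTlb *t, uint64_t vaddr, uint64_t payload)
{
    VtEntry *slot = &t->main_tlb[vt_index(vaddr)];
    t->victim[t->next_victim++ % VT_VICTIM_SIZE] = *slot;
    slot->page = vaddr & VT_PAGE_MASK;
    slot->payload = payload;
}

/* Try the direct-mapped slot first, then scan the victim table. */
static bool vt_lookup(VtTlb *t, uint64_t vaddr, uint64_t *payload)
{
    uint64_t page = vaddr & VT_PAGE_MASK;
    VtEntry *slot = &t->main_tlb[vt_index(vaddr)];

    assert(payload != NULL);
    if (slot->page == page) {
        *payload = slot->payload;
        return true;
    }
    for (size_t i = 0; i < VT_VICTIM_SIZE; i++) {
        if (t->victim[i].page == page) {
            /* Swap so the entry is back in the fast table next time. */
            VtEntry tmp = *slot;
            *slot = t->victim[i];
            t->victim[i] = tmp;
            *payload = slot->payload;
            return true;
        }
    }
    return false; /* a real MMU would now translate and refill */
}
```

Because the victim table is tiny, a linear scan is cheap, and it only runs after the direct-mapped slot has already missed.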

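So the TLB only exists to speed up address translation; the real job of the MMU is still to find the gpa that corresponds to a gva.

Now for the QEMU data structures involved.

CPUArchState holds the current state of the CPU, such as the registers and the TLB contents. The parts that matter most for analyzing the MMU are the following.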

/* use a fully associative victim tlb of 8 entries */
#define CPU_VTLB_SIZE 8
 
#if HOST_LONG_BITS == 32 && TARGET_LONG_BITS == 32
#define CPU_TLB_ENTRY_BITS 4
#else
#define CPU_TLB_ENTRY_BITS 5
#endif
 
/* TCG_TARGET_TLB_DISPLACEMENT_BITS is used in CPU_TLB_BITS to ensure that
 * the TLB is not unnecessarily small, but still small enough for the
 * TLB lookup instruction sequence used by the TCG target.
 *
 * TCG will have to generate an operand as large as the distance between
 * env and the tlb_table[NB_MMU_MODES - 1][0].addend.  For simplicity,
 * the TCG targets just round everything up to the next power of two, and
 * count bits.  This works because: 1) the size of each TLB is a largish
 * power of two, 2) and because the limit of the displacement is really close
 * to a power of two, 3) the offset of tlb_table[0][0] inside env is smaller
 * than the size of a TLB.
 *
 * For example, the maximum displacement 0xFFF0 on PPC and MIPS, but TCG
 * just says "the displacement is 16 bits".  TCG_TARGET_TLB_DISPLACEMENT_BITS
 * then ensures that tlb_table at least 0x8000 bytes large ("not unnecessarily
 * small": 2^15).  The operand then will come up smaller than 0xFFF0 without
 * any particular care, because the TLB for a single MMU mode is larger than
 * 0x10000-0xFFF0=16 bytes.  In the end, the maximum value of the operand
 * could be something like 0xC000 (the offset of the last TLB table) plus
 * 0x18 (the offset of the addend field in each TLB entry) plus the offset
 * of tlb_table inside env (which is non-trivial but not huge).
 */
#define CPU_TLB_BITS                                             \
    MIN(8,                                                       \
        TCG_TARGET_TLB_DISPLACEMENT_BITS - CPU_TLB_ENTRY_BITS -  \
        (NB_MMU_MODES <= 1 ? 0 :                                 \
         NB_MMU_MODES <= 2 ? 1 :                                 \
         NB_MMU_MODES <= 4 ? 2 :                                 \
         NB_MMU_MODES <= 8 ? 3 : 4))
 
#define CPU_TLB_SIZE (1 << CPU_TLB_BITS)
 
typedef struct CPUTLBEntry {
    /* bit TARGET_LONG_BITS to TARGET_PAGE_BITS : virtual address
       bit TARGET_PAGE_BITS-1..4  : Nonzero for accesses that should not
                                    go directly to ram.
       bit 3                      : indicates that the entry is invalid
       bit 2..0                   : zero
    */
    union {
        struct {
            target_ulong addr_read;
            target_ulong addr_write;
            target_ulong addr_code;
            /* Addend to virtual address to get host address.  IO accesses
               use the corresponding iotlb value.  */
            uintptr_t addend;
        };
        /* padding to get a power of two size */
        uint8_t dummy[1 << CPU_TLB_ENTRY_BITS];
    };
} CPUTLBEntry;
 
QEMU_BUILD_BUG_ON(sizeof(CPUTLBEntry) != (1 << CPU_TLB_ENTRY_BITS));
 
/* The IOTLB is not accessed directly inline by generated TCG code,
 * so the CPUIOTLBEntry layout is not as critical as that of the
 * CPUTLBEntry. (This is also why we don't want to combine the two
 * structs into one.)
 */
typedef struct CPUIOTLBEntry {
    hwaddr addr;
    MemTxAttrs attrs;
} CPUIOTLBEntry;
 
#define CPU_COMMON_TLB \
    /* The meaning of the MMU modes is defined in the target code. */   \
    CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];                  \
    CPUTLBEntry tlb_v_table[NB_MMU_MODES][CPU_VTLB_SIZE];               \
    CPUIOTLBEntry iotlb[NB_MMU_MODES][CPU_TLB_SIZE];                    \
    CPUIOTLBEntry iotlb_v[NB_MMU_MODES][CPU_VTLB_SIZE];                 \
    size_t tlb_flush_count;                                             \
    target_ulong tlb_flush_addr;                                        \
    target_ulong tlb_flush_mask;                                        \
    target_ulong vtlb_index;

tlb_table is the TLB itself.
tlb_v_table is the victim table for tlb_table.
iotlb is used together with tlb_table.
iotlb_v is likewise the victim table for iotlb.

tlb_table mainly performs address translation, while iotlb mainly helps QEMU with the gpa→hva step (hva here covers not only emulated RAM but also ROM and MMIO).

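Both tlb_table and iotlb are two-dimensional arrays. The first dimension depends on the CPU's operating mode (the MMU mode). The different modes are not analyzed here, only the standard one.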

With this background, analyzing the MMU is fairly straightforward.

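Line 880 looks the virtual address up in tlb_table. If it is not there, line 882 searches tlb_v_table. If that misses as well, line 883 calls tlb_fill to fill the TLB.

Lines 886-896 handle the case where MMIO is involved.

Lines 900-919 handle the error case and are not analyzed here.

Finally, line 922 uses qemu_ram_addr_from_host_nofail to obtain the corresponding ram_addr_t.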

The actual address translation is implemented in tlb_fill. This post only covers real mode.

void tlb_fill(CPUState *cs, target_ulong addr, int size,
              MMUAccessType access_type, int mmu_idx, uintptr_t retaddr)
{
    int ret;

    ret = x86_cpu_handle_mmu_fault(cs, addr, size, access_type, mmu_idx);
    if (ret) {
        X86CPU *cpu = X86_CPU(cs);
        CPUX86State *env = &cpu->env;

        raise_exception_err_ra(env, cs->exception_index, env->error_code, retaddr);
    }
}

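The translation itself is done by x86_cpu_handle_mmu_fault; if it fails, an exception is raised. On success the function fills the TLB directly and the caller then looks the address up in the TLB again, which is why tlb_fill returns nothing.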

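The parameters of x86_cpu_handle_mmu_fault are: cs, the CPU state; addr, the virtual address to translate; size, the size of the access; access_type, the kind of access that triggered the MMU (it arrives as is_write1 in the definition below); and mmu_idx, which selects the current MMU mode.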

160 /* return value:
161  * -1 = cannot handle fault
162  * 0  = nothing more to do
163  * 1  = generate PF fault
164  */
165 int x86_cpu_handle_mmu_fault(CPUState *cs, vaddr addr, int size,
166                              int is_write1, int mmu_idx)
167 {
168     X86CPU *cpu = X86_CPU(cs);
169     CPUX86State *env = &cpu->env;
170     uint64_t ptep, pte;
171     int32_t a20_mask;
172     target_ulong pde_addr, pte_addr;
173     int error_code = 0;
174     int is_dirty, prot, page_size, is_write, is_user;
175     hwaddr paddr;
176     uint64_t rsvd_mask = PG_HI_RSVD_MASK;
177     uint32_t page_offset;
178     target_ulong vaddr;
179
180     is_user = mmu_idx == MMU_USER_IDX;

......

185     is_write = is_write1 & 1;
186
187     a20_mask = x86_get_a20_mask(env);
188     if (!(env->cr[0] & CR0_PG_MASK)) {
189         pte = addr;
190 #ifdef TARGET_X86_64
191         if (!(env->hflags & HF_LMA_MASK)) {
192             /* Without long mode we can only address 32bits in real mode */
193             pte = (uint32_t)pte;
194         }
195 #endif
196         prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
197         page_size = 4096;
198         goto do_mapping;
199     }
200

......

440  do_mapping:
441     pte = pte & a20_mask;
442
443     /* align to page_size */
444     pte &= PG_ADDRESS_MASK & ~(page_size - 1);
445
446     /* Even if 4MB pages, we map only one 4KB page in the cache to
447        avoid filling it too fast */
448     vaddr = addr & TARGET_PAGE_MASK;
449     page_offset = vaddr & (page_size - 1);
450     paddr = pte + page_offset;
451
452     assert(prot & (1 << is_write1));
453     tlb_set_page_with_attrs(cs, vaddr, paddr, cpu_get_mem_attrs(env),
454                             prot, mmu_idx, page_size);
455     return 0;
456  do_fault_rsvd:
457     error_code |= PG_ERROR_RSVD_MASK;
458  do_fault_protect:
459     error_code |= PG_ERROR_P_MASK;
460  do_fault:
461     error_code |= (is_write << PG_ERROR_W_BIT);
462     if (is_user)
463         error_code |= PG_ERROR_U_MASK;
464     if (is_write1 == 2 &&
465         (((env->efer & MSR_EFER_NXE) &&
466           (env->cr[4] & CR4_PAE_MASK)) ||
467          (env->cr[4] & CR4_SMEP_MASK)))
468         error_code |= PG_ERROR_I_D_MASK;
469     if (env->intercept_exceptions & (1 << EXCP0E_PAGE)) {
470         /* cr2 is not modified in case of exceptions */
471         x86_stq_phys(cs,
472                  env->vm_vmcb + offsetof(struct vmcb, control.exit_info_2),
473                  addr);
474     } else {
475         env->cr[2] = addr;
476     }
477     env->error_code = error_code;
478     cs->exception_index = EXCP0E_PAGE;
479     return 1;
480 }

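Line 187 fetches the A20 mask, which is applied later at do_mapping: while the A20 gate is closed, address bit 20 is forced to zero, so addresses wrap around at 1 MiB as they did with the 20-bit address bus of the 8086. Line 188 checks CR0_PG_MASK in cr0. If paging is not enabled, which is always the case in real mode, no page-table walk is needed: lines 189-198 map the address directly and jump to the code after do_mapping.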

do_mapping then calls tlb_set_page_with_attrs to fill the TLB.
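The arithmetic of the paging-disabled path is small enough to try in isolation. The sketch below mirrors lines 189-197 and 441-450 for a 4 KiB page; the rm_ names are hypothetical, and the exact value of the A20 mask (all ones when the gate is open, bit 20 cleared when it is closed) is an assumption about what x86_get_a20_mask returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RM_PAGE_SIZE 4096u
#define RM_A20_BIT   ((uint64_t)1 << 20)

/* Assumed mask: all ones with A20 enabled, bit 20 cleared otherwise. */
static uint64_t rm_a20_mask(bool a20_enabled)
{
    return a20_enabled ? ~(uint64_t)0 : ~RM_A20_BIT;
}

/*
 * Physical page that backs `addr` when paging is off: the address maps
 * to itself, truncated to 32 bits outside long mode, filtered through
 * the A20 mask and aligned down to the page size.
 */
static uint64_t rm_page_paddr(uint64_t addr, bool a20_enabled, bool long_mode)
{
    uint64_t pte = addr;

    assert((RM_PAGE_SIZE & (RM_PAGE_SIZE - 1)) == 0); /* power of two */
    if (!long_mode) {
        pte = (uint32_t)pte;
    }
    pte &= rm_a20_mask(a20_enabled);
    pte &= ~(uint64_t)(RM_PAGE_SIZE - 1);
    return pte;
}
```

For instance, with A20 closed the address 0x10FFEF lands on physical page 0xF000 rather than 0x10F000, which is the 1 MiB wraparound mentioned above.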

/* Add a new TLB entry. At most one entry for a given virtual address
 * is permitted. Only a single TARGET_PAGE_SIZE region is mapped, the
 * supplied size is only used by tlb_flush_page.
 *
 * Called from TCG-generated code, which is under an RCU read-side
 * critical section.
 */
void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
                             hwaddr paddr, MemTxAttrs attrs, int prot,
                             int mmu_idx, target_ulong size)
{
    CPUArchState *env = cpu->env_ptr;
    MemoryRegionSection *section;
    unsigned int index;
    target_ulong address;
    target_ulong code_address;
    uintptr_t addend;
    CPUTLBEntry *te, *tv, tn;
    hwaddr iotlb, xlat, sz;
    unsigned vidx = env->vtlb_index++ % CPU_VTLB_SIZE;
    int asidx = cpu_asidx_from_attrs(cpu, attrs);

    assert_cpu_is_self(cpu);
    assert(size >= TARGET_PAGE_SIZE);
    if (size != TARGET_PAGE_SIZE) {
        tlb_add_large_page(env, vaddr, size);
    }

    sz = size;
    section = address_space_translate_for_iotlb(cpu, asidx, paddr, &xlat, &sz);
    assert(sz >= TARGET_PAGE_SIZE);

    tlb_debug("vaddr=" TARGET_FMT_lx " paddr=0x" TARGET_FMT_plx
              " prot=%x idx=%d\n",
              vaddr, paddr, prot, mmu_idx);

    address = vaddr;
    if (!memory_region_is_ram(section->mr) && !memory_region_is_romd(section->mr)) {
        /* IO memory case */
        address |= TLB_MMIO;
        addend = 0;
    } else {
        /* TLB_MMIO for rom/romd handled below */
        addend = (uintptr_t)memory_region_get_ram_ptr(section->mr) + xlat;
    }

    code_address = address;
    iotlb = memory_region_section_get_iotlb(cpu, section, vaddr, paddr, xlat,
                                            prot, &address);

    index = (vaddr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
    te = &env->tlb_table[mmu_idx][index];
    /* do not discard the translation in te, evict it into a victim tlb */
    tv = &env->tlb_v_table[mmu_idx][vidx];

    /* addr_write can race with tlb_reset_dirty_range */
    copy_tlb_helper(tv, te, true);

    env->iotlb_v[mmu_idx][vidx] = env->iotlb[mmu_idx][index];

    /* refill the tlb */
    env->iotlb[mmu_idx][index].addr = iotlb - vaddr;
    env->iotlb[mmu_idx][index].attrs = attrs;

    /* Now calculate the new entry */
    tn.addend = addend - vaddr;
    if (prot & PAGE_READ) {
        tn.addr_read = address;
    } else {
        tn.addr_read = -1;
    }

    if (prot & PAGE_EXEC) {
        tn.addr_code = code_address;
    } else {
        tn.addr_code = -1;
    }

    tn.addr_write = -1;
    if (prot & PAGE_WRITE) {
        if ((memory_region_is_ram(section->mr) && section->readonly)
            || memory_region_is_romd(section->mr)) {
            /* Write access calls the I/O callback.  */
            tn.addr_write = address | TLB_MMIO;
        } else if (memory_region_is_ram(section->mr)
                   && cpu_physical_memory_is_clean(
                        memory_region_get_ram_addr(section->mr) + xlat)) {
            tn.addr_write = address | TLB_NOTDIRTY;
        } else {
            tn.addr_write = address;
        }
        if (prot & PAGE_WRITE_INV) {
            tn.addr_write |= TLB_INVALID_MASK;
        }
    }

    /* Pairs with flag setting in tlb_reset_dirty_range */
    copy_tlb_helper(te, &tn, true);
    /* atomic_mb_set(&te->addr_write, write_address); */
}

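To understand this function we need to look more closely at tlb_table and iotlb. As the names suggest, tlb_table is used for TLB translation and for reads and writes of RAM-type memory (direct access through the hva), plus reads of ROM-type (romd) memory. That is why CPUTLBEntry contains addr_read, addr_write and addr_code: they tell whether a read, a write or an instruction fetch may access the hva directly. iotlb, in contrast, is responsible for the ROM and MMIO accesses that cannot go straight to the hva.
The addend in CPUTLBEntry is used to compute the hva: gva + CPUTLBEntry->addend = hva. tlb_set_page_with_attrs stores the host address of the page minus vaddr, so adding the full gva gives the host address of the byte being accessed, as in line 921 of get_page_addr_code.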
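The addend trick relies on nothing more than wrapping unsigned arithmetic. A minimal sketch, where the ad_ names and the static buffer standing in for a host-backed guest page are assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for one page of guest RAM living in host memory. */
static uint8_t ad_guest_page[4096];

/*
 * Store "host page address minus guest page address". Unsigned
 * arithmetic wraps, so this is fine even when the guest address is
 * numerically larger than the host pointer.
 */
static uintptr_t ad_make_addend(const uint8_t *host_page, uint64_t vaddr_page)
{
    assert((vaddr_page & 0xfff) == 0); /* expects a page-aligned gva */
    return (uintptr_t)host_page - (uintptr_t)vaddr_page;
}

/* hva = gva + addend: one addition on the fast path, no masking. */
static uint8_t *ad_host_ptr(uint64_t gva, uintptr_t addend)
{
    return (uint8_t *)((uintptr_t)gva + addend);
}
```

Folding the subtraction into the stored value means the fast path does not have to isolate the page offset first; the full guest address is simply added.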

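CPUIOTLBEntry->addr has two parts. The low bits, CPUIOTLBEntry->addr & ~TARGET_PAGE_MASK, identify the MemoryRegionSection (this is the pd value computed at line 887).
When the address falls in defined RAM or ROM, (CPUIOTLBEntry->addr & TARGET_PAGE_MASK) + gva = ram_addr_t.
When the entry describes MMIO, the page part of CPUIOTLBEntry->addr matters much less: locating the MemoryRegionSection is what allows the access to be carried out.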
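The two-part encoding can be reproduced with plain integers. The sketch below follows the pattern visible in the quoted code (iotlb - vaddr when storing, & ~TARGET_PAGE_MASK and & TARGET_PAGE_MASK when reading); the io_ names and the sample numbers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define IO_PAGE_MASK (~(uint64_t)0xfff)

/*
 * Pack a page-aligned target base and a small section index into one
 * value, then subtract the page-aligned gva. Since the gva is page
 * aligned, the subtraction leaves the low (section) bits untouched.
 */
static uint64_t io_pack(uint64_t page_base, unsigned section_idx,
                        uint64_t vaddr_page)
{
    assert((page_base & ~IO_PAGE_MASK) == 0);
    assert((vaddr_page & ~IO_PAGE_MASK) == 0);
    assert(section_idx < 0x1000);
    return (page_base | section_idx) - vaddr_page;
}

/* Low bits: which section the page belongs to. */
static unsigned io_section(uint64_t entry)
{
    return (unsigned)(entry & ~IO_PAGE_MASK);
}

/* High bits plus the full gva: the address inside the target space. */
static uint64_t io_target(uint64_t entry, uint64_t gva)
{
    return (entry & IO_PAGE_MASK) + gva;
}
```

The same wrapping argument as for addend applies: the stored value may look like a huge number, yet adding the gva back recovers the intended address.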

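CPUTLBEntry->addr_read is the field compared against when memory is read. If the page is not readable the value is -1. If TLB_MMIO is set, the access goes through iotlb.
If the address is not MMIO and the page is readable, the entry can be used to locate the hva.

CPUTLBEntry->addr_code is used for the TLB comparison on instruction fetch and for locating the hva.

CPUTLBEntry->addr_write is the field compared against when memory is written. If the page is not writable the value is -1. If TLB_MMIO is set, the access goes through iotlb.
If the address is not MMIO and the page is writable, the entry can be used to locate the hva.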
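How one of these fields steers an access can be sketched as a three-way classification. This is a simplified model, not QEMU's fast path: the acc_ names and the flag bit positions are assumptions, chosen so that -1 behaves like an entry with the invalid bit set.

```c
#include <assert.h>
#include <stdint.h>

#define ACC_PAGE_MASK (~(uint64_t)0xfff)
#define ACC_INVALID   ((uint64_t)1 << 3) /* hypothetical flag position */
#define ACC_MMIO      ((uint64_t)1 << 6) /* hypothetical flag position */

typedef enum {
    ACC_MISS,   /* no usable entry: translate and refill */
    ACC_IO,     /* entry matches but needs the slow I/O path */
    ACC_DIRECT  /* entry matches: use gva + addend */
} AccPath;

/* Decide what to do with an access given addr_read/addr_write/addr_code. */
static AccPath acc_classify(uint64_t tlb_addr, uint64_t gva)
{
    /* -1 has every bit set, including ACC_INVALID, so it never matches. */
    if ((tlb_addr & (ACC_PAGE_MASK | ACC_INVALID)) != (gva & ACC_PAGE_MASK)) {
        return ACC_MISS;
    }
    /* Any remaining low flag bit means "do not touch host RAM directly". */
    if (tlb_addr & ~ACC_PAGE_MASK) {
        return ACC_IO;
    }
    assert((tlb_addr & ~ACC_PAGE_MASK) == 0);
    return ACC_DIRECT;
}
```

Keeping the flags in the low bits of the address lets a single comparison cover both "wrong page" and "not accessible", which is why a forbidden access is encoded as -1 rather than with a separate permission field.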

With all of this in mind, the code above becomes easy to follow.