(Technical Analysis) KVM Virtualization Principles

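VMCS Structure
The VMCS is a data structure kept in memory. It holds the register contents of a virtual CPU together with the control information for that virtual CPU; each VMCS corresponds to one virtual CPU.
Before a VMCS can be used, it must be bound to a physical CPU. At any given moment the binding is one-to-one: a physical CPU can be bound to only one VMCS, and a VMCS can be bound to only one physical CPU. Over time a VMCS may be bound to different physical CPUs. For example, a VMCS can first be bound to physical CPU1, be unbound at some point, and then be rebound to physical CPU2. This change of binding is called VMCS migration.
VT-x provides two instructions for binding and unbinding a VMCS.
VMPTRLD <VMCS address>: binds the specified VMCS to the physical CPU that executes the instruction.
VMCLEAR <VMCS address>: unbinds the specified VMCS from the physical CPU that executes the instruction. The instruction writes any VMCS data cached in the physical CPU back to memory, which guarantees that the values in memory are up to date when the VMCS is later bound to a new physical CPU.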
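
The binding rules above can be pictured with a small user-space model. This is only a sketch: real code executes VMPTRLD and VMCLEAR in VMX root operation, and the names `struct vmcs_model`, `model_vmptrld`, and `model_vmclear` are invented here for illustration. The model enforces the one-to-one rule in software, which is an assumption of the sketch rather than a description of what the hardware checks.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4
#define NO_CPU  (-1)

/* Toy stand-in for a VMCS: it only records which CPU it is bound to. */
struct vmcs_model {
    int bound_cpu; /* NO_CPU while the VMCS is unbound */
};

/* For each modeled CPU, the VMCS currently bound to it (NULL if none). */
static struct vmcs_model *current_vmcs[NR_CPUS];

/* Models VMCLEAR: "write back" cached state and drop the binding. */
static void model_vmclear(struct vmcs_model *v)
{
    if (v->bound_cpu != NO_CPU) {
        current_vmcs[v->bound_cpu] = NULL;
        v->bound_cpu = NO_CPU;
    }
}

/*
 * Models VMPTRLD executed on `cpu`. Returns 0 on success, or -1 when the
 * one-to-one rule would be violated (the VMCS is bound elsewhere, or the
 * CPU already holds a different VMCS).
 */
static int model_vmptrld(int cpu, struct vmcs_model *v)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (v->bound_cpu != NO_CPU && v->bound_cpu != cpu)
        return -1;
    if (current_vmcs[cpu] != NULL && current_vmcs[cpu] != v)
        return -1;
    current_vmcs[cpu] = v;
    v->bound_cpu = cpu;
    return 0;
}
```

Migrating a VMCS from CPU1 to CPU2 in this model means calling `model_vmclear` first and `model_vmptrld(2, ...)` afterwards; a direct rebind is rejected.
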
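VT-x defines the format and contents of the VMCS. It specifies a memory region of at most 4 KB that must be 4 KB aligned. The layout of the VMCS region is as follows:
Offset 0 holds the VMCS revision identifier, which gives the version of the VMCS data format.
Offset 4 holds the VMX-abort indicator. A VMX abort occurs when a VM exit cannot complete successfully; the CPU stores the reason for the abort here to help with debugging.
Offset 8 is the start of the VMCS data area. Its format is CPU-specific: different CPU models may use different formats, and the VMCS revision identifier determines which format is in use.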
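
The region layout just described can be written down as a C structure. This is a minimal sketch; the type and function names are hypothetical, and the revision identifier is passed in as a parameter (on real hardware it is reported by the IA32_VMX_BASIC capability MSR).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VMCS_REGION_SIZE 4096u

/* Layout of a VMCS region as described in the text (names are illustrative). */
struct vmcs_region {
    uint32_t revision_id;                /* offset 0: VMCS revision identifier */
    uint32_t abort_indicator;            /* offset 4: VMX-abort indicator */
    uint8_t  data[VMCS_REGION_SIZE - 8]; /* offset 8: CPU-specific data area */
};

/* Compile-time checks of the offsets and of the total size. */
_Static_assert(offsetof(struct vmcs_region, abort_indicator) == 4, "abort indicator at 4");
_Static_assert(offsetof(struct vmcs_region, data) == 8, "data area at 8");
_Static_assert(sizeof(struct vmcs_region) == VMCS_REGION_SIZE, "region is 4 KB");

/* Returns 1 if `addr` satisfies the 4 KB alignment requirement. */
static int vmcs_addr_is_aligned(uint64_t addr)
{
    return (addr & (VMCS_REGION_SIZE - 1)) == 0;
}

/* Zeroes the region and stores the revision identifier at offset 0. */
static void vmcs_region_init(struct vmcs_region *r, uint32_t revision_id)
{
    assert(r != NULL);
    memset(r, 0, sizeof(*r));
    r->revision_id = revision_id;
}
```

The data area is deliberately an opaque byte array: software is expected to reach its contents through VMREAD and VMWRITE, not through direct memory access.
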
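Most of the VMCS information is stored in the VMCS data area. VT-x provides two instructions for accessing it.
VMREAD <index>: reads the VMCS field specified by the index.
VMWRITE <index> <data>: writes the VMCS field specified by the index.
VT-x assigns an index (the field encoding) to every field in the VMCS data area, so these two instructions can access each field directly without knowing the in-memory layout.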
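
The index passed to VMREAD and VMWRITE is itself structured: bit 0 selects the high half of a 64-bit field, bits 9:1 are an index within the group, bits 11:10 give the field type, and bits 14:13 give the field width. The decoder below is a sketch of that encoding; the struct and enum names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

enum vmcs_width { W16 = 0, W64 = 1, W32 = 2, WNATURAL = 3 };
enum vmcs_type  { T_CONTROL = 0, T_EXIT_INFO = 1, T_GUEST = 2, T_HOST = 3 };

struct vmcs_field_info {
    unsigned high;         /* bit 0: 1 = high 32 bits of a 64-bit field */
    unsigned index;        /* bits 9:1: index within the type/width group */
    enum vmcs_type type;   /* bits 11:10 */
    enum vmcs_width width; /* bits 14:13 */
};

/* Splits a VMCS field encoding into its components. */
static struct vmcs_field_info decode_vmcs_field(uint32_t enc)
{
    struct vmcs_field_info f;

    /* Bit 12 and bits 31:15 are reserved and expected to be zero. */
    assert((enc & 0xFFFF9000u) == 0);
    f.high  = enc & 1u;
    f.index = (enc >> 1) & 0x1FFu;
    f.type  = (enum vmcs_type)((enc >> 10) & 3u);
    f.width = (enum vmcs_width)((enc >> 13) & 3u);
    return f;
}
```

Decoding `0x0000681e` (GUEST_RIP in the listing further down) yields a natural-width guest-state field with index 15, which explains why the constants in each group are spaced two apart.
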
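Specifically, the VMCS data area contains the following six categories of information.

  1. guest-state area: holds the CPU state while the guest is running, that is, in non-root mode. On VM exit the CPU saves its current state into the guest-state area; on VM entry the CPU restores its state from the guest-state area.
  2. host-state area: holds the CPU state while the VMM is running, that is, in root mode. On VM exit the CPU restores its state from this area.
  3. VM-entry control fields: control processor behavior on VM entry.
  4. VM-execution control fields: control processor behavior in VMX non-root mode. Typically they select which conditions cause a VM exit, and they also enable certain VMX virtualization features such as APIC virtualization and EPT.
  5. VM-exit control fields: control processor behavior on VM exit.
  6. VM-exit information fields: report the reason for a VM exit and its details. The VMM uses this information to decide how to manage and control the VM. The VM-exit information fields are read-only.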
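
As a rough illustration of how the guest-state and host-state areas are used, the following toy model switches a tiny register set on VM entry and VM exit. It is a sketch under simplifying assumptions: only three registers are modeled, and all names are hypothetical. Note that the host-state area is never written by the transition itself; the VMM fills it in beforehand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A deliberately tiny register set for the model. */
struct cpu_state {
    uint64_t rip, rsp, cr3;
};

/* Only the two state areas of the VMCS are modeled here. */
struct vmcs_areas {
    struct cpu_state guest_state;
    struct cpu_state host_state;
};

/* VM entry: the CPU loads its state from the guest-state area. */
static void model_vm_entry(struct cpu_state *cpu, const struct vmcs_areas *v)
{
    assert(cpu != NULL && v != NULL);
    *cpu = v->guest_state;
}

/*
 * VM exit: the CPU saves the running guest state into the guest-state
 * area, then loads its state from the host-state area.
 */
static void model_vm_exit(struct cpu_state *cpu, struct vmcs_areas *v)
{
    assert(cpu != NULL && v != NULL);
    v->guest_state = *cpu;
    *cpu = v->host_state;
}
```
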

        Detailed analysis of the individual VMCS fields:

    VM-execution control fields
    VIRTUAL_PROCESSOR_ID = 0x00000000, /* valid when SECONDARY_EXEC_ENABLE_VPID is 1; holds the 16-bit VPID */
    POSTED_INTR_NV = 0x00000002, /* valid when PIN_BASED_POSTED_INTR is 1 */
    IO_BITMAP_A = 0x00002000, /* takes effect when CPU_BASED_USE_IO_BITMAPS is enabled */
    IO_BITMAP_A_HIGH = 0x00002001,
    IO_BITMAP_B = 0x00002002,
    IO_BITMAP_B_HIGH = 0x00002003,
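    /* Valid when CPU_BASED_USE_MSR_BITMAPS is 1. When a bit is 1, an access to the corresponding MSR causes a VM exit. The MSR bitmap region is 4 KB, split into four 1 KB parts:
    read bitmap for low MSRs: covers MSRs 00000000H to 00001FFFH and controls read accesses;
    read bitmap for high MSRs: covers MSRs C0000000H to C0001FFFH and controls read accesses;
    write bitmap for low MSRs: covers MSRs 00000000H to 00001FFFH and controls write accesses;
    write bitmap for high MSRs: covers MSRs C0000000H to C0001FFFH and controls write accesses.
    When a bit in the MSR bitmap is 0, an access to the corresponding MSR does not cause a VM exit.
    */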
    MSR_BITMAP = 0x00002004,
    MSR_BITMAP_HIGH = 0x00002005,
    EXCUTIVE_VMCSP = 0x0000200c,
    EXCUTIVE_VMCSP_HIGH = 0x0000200d,
    /* When CPU_BASED_USE_TSC_OFFSETING is 1, this field supplies a 64-bit offset. When RDTSC, RDTSCP, or RDMSR
    is used to read the TSC, the value returned is TSC + TSC offset.
    */
    TSC_OFFSET = 0x00002010,
    TSC_OFFSET_HIGH = 0x00002011,
    /* Valid when CPU_BASED_TPR_SHADOW is 1; must hold the physical address of a 4 KB page */
    VIRTUAL_APIC_PAGE_ADDR = 0x00002012, /* Virtual-APIC address (full) */
    VIRTUAL_APIC_PAGE_ADDR_HIGH = 0x00002013, /* Virtual-APIC address (high) */
    /* Valid when SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES is 1; must hold the physical address
    of a 4 KB page
    */
    APIC_ACCESS_ADDR = 0x00002014, /* APIC-access address (full) */
    APIC_ACCESS_ADDR_HIGH = 0x00002015, /* APIC-access address (high) */
    POSTED_INTR_DESC_ADDR = 0x00002016,
    POSTED_INTR_DESC_ADDR_HIGH = 0x00002017,
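
The MSR bitmap layout described in the listing above can be turned into a lookup routine. The sketch below assumes the 4 KB page is available as a byte array; the function name is hypothetical. MSRs outside the two covered ranges are treated as always exiting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_BITMAP_SIZE 4096

/*
 * Returns 1 if an access to `msr` causes a VM exit according to the
 * 4 KB MSR bitmap, 0 otherwise. Layout (byte offsets):
 *   0    read bitmap,  MSRs 00000000H..00001FFFH
 *   1024 read bitmap,  MSRs C0000000H..C0001FFFH
 *   2048 write bitmap, MSRs 00000000H..00001FFFH
 *   3072 write bitmap, MSRs C0000000H..C0001FFFH
 */
static int msr_access_exits(const uint8_t *bitmap, uint32_t msr, int is_write)
{
    uint32_t base;
    uint32_t bit;

    assert(bitmap != NULL);
    if (msr <= 0x1FFFu) {
        base = 0;
        bit = msr;
    } else if (msr >= 0xC0000000u && msr <= 0xC0001FFFu) {
        base = 1024;
        bit = msr - 0xC0000000u;
    } else {
        return 1; /* not covered by the bitmap: always exits */
    }
    if (is_write)
        base += 2048;
    return (bitmap[base + bit / 8] >> (bit % 8)) & 1;
}
```

For example, intercepting only writes to MSR C0000080H means setting bit 0 of byte 3088 (3072 + 0x80 / 8) and leaving the matching read bit clear.
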

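    /* When SECONDARY_EXEC_ENABLE_EPT is 1, guest-physical addresses are translated to final host-physical addresses.
    bits 2:0 give the memory type of the EPT paging structures (UC or WB); bits 5:3 give the EPT page-walk length minus 1, so add 1 to get the actual number of levels;
    bit 6 = 1 enables the accessed and dirty flags in EPT entries (bits 8 and 9 of an EPT entry), and the processor then updates these two flags;
    bits N-1:12 give the physical address of the EPT PML4 table.
    The EPT page tables are located through the dedicated EPT pointer (EPTP). EPT translates addresses with the same table-walk mechanism that the guest page tables use.
    */
    EPT_POINTER = 0x0000201a, /* EPT pointer (EPTP; full) */
    EPT_POINTER_HIGH = 0x0000201b, /* EPT pointer (EPTP; high) */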
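
The EPTP bit layout above can be assembled with a few shifts. This is a minimal sketch; `make_eptp` and `eptp_levels` are hypothetical helper names, and the memory-type values used are 0 for UC and 6 for WB.

```c
#include <assert.h>
#include <stdint.h>

#define EPT_MT_UC      0u           /* uncacheable */
#define EPT_MT_WB      6u           /* write-back */
#define EPTP_AD_ENABLE (1ull << 6)  /* accessed/dirty flags enabled */

/* Builds an EPTP value from the physical address of the EPT PML4 table. */
static uint64_t make_eptp(uint64_t pml4_phys, unsigned mem_type,
                          unsigned levels, int enable_ad)
{
    uint64_t eptp;

    assert((pml4_phys & 0xFFFull) == 0); /* table must be 4 KB aligned */
    assert(levels >= 1 && levels <= 8);
    eptp = pml4_phys;                           /* bits N-1:12 */
    eptp |= mem_type & 7u;                      /* bits 2:0 */
    eptp |= (uint64_t)((levels - 1) & 7u) << 3; /* bits 5:3: levels minus 1 */
    if (enable_ad)
        eptp |= EPTP_AD_ENABLE;                 /* bit 6 */
    return eptp;
}

/* Recovers the number of page-walk levels from an EPTP value. */
static unsigned eptp_levels(uint64_t eptp)
{
    return (unsigned)((eptp >> 3) & 7u) + 1;
}
```

A four-level, write-back EPT hierarchy rooted at physical address 0x1000 with accessed/dirty flags enabled gives `0x105E`.
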

    /* Valid when SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY is 1. Controls whether issuing an EOI
    causes a VM exit; when the bit for a vector is 1, a VM exit occurs.
    */
    EOI_EXIT_BITMAP0 = 0x0000201c, /* vectors 0H to 3FH */
    EOI_EXIT_BITMAP0_HIGH = 0x0000201d,
    EOI_EXIT_BITMAP1 = 0x0000201e, /* vectors 40H to 7FH */
    EOI_EXIT_BITMAP1_HIGH = 0x0000201f,
    EOI_EXIT_BITMAP2 = 0x00002020, /* vectors 80H to BFH */
    EOI_EXIT_BITMAP2_HIGH = 0x00002021,
    EOI_EXIT_BITMAP3 = 0x00002022, /* vectors C0H to FFH */
    EOI_EXIT_BITMAP3_HIGH = 0x00002023,
    /* VMCS Shadowing Bitmap Addresses */
    VMREAD_BITMAP = 0x00002026,
    VMWRITE_BITMAP = 0x00002028,
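
For the four EOI-exit bitmaps listed above, mapping a vector to its bitmap and bit position is a division and a remainder. The sketch below keeps the four 64-bit fields in an array; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marks `vector` so that an EOI for it causes a VM exit. */
static void eoi_exit_set(uint64_t eoi_exit_bitmap[4], uint8_t vector)
{
    assert(eoi_exit_bitmap != NULL);
    /* vector / 64 selects EOI_EXIT_BITMAP0..3, vector % 64 selects the bit */
    eoi_exit_bitmap[vector / 64] |= 1ull << (vector % 64);
}

/* Returns 1 if an EOI for `vector` causes a VM exit. */
static int eoi_causes_exit(const uint64_t eoi_exit_bitmap[4], uint8_t vector)
{
    assert(eoi_exit_bitmap != NULL);
    return (int)((eoi_exit_bitmap[vector / 64] >> (vector % 64)) & 1u);
}
```
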

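    /* bit 0 = 1: an external interrupt causes a VM exit; bits 2:1: reserved, fixed to 1;
    bit 3 = 1: an NMI causes a VM exit; bit 4: reserved, fixed to 1;
    bit 5 = 1: enables virtual NMIs; bit 6 = 1: enables the VMX-preemption timer;
    bit 7 = 1: enables posted-interrupt processing for virtual interrupts;
    bits 31:8: reserved, fixed to 0
    */
    PIN_BASED_VM_EXEC_CONTROL = 0x00004000, /* Pin-based VM-execution controls */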

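    /* bit 0: reserved, fixed to 0; bit 1: reserved, fixed to 1;
    bit 2 = 1: VM exit when IF = 1 and interrupts are not otherwise blocked (interrupt-window exiting); bit 3 = 1: reads of the TSC return the TSC value plus the offset;
    bits 6:4: reserved, fixed to 1; bit 7 = 1: HLT causes a VM exit; bit 8: reserved, fixed to 1;
    bit 9 = 1: INVLPG causes a VM exit; bit 10 = 1: MWAIT causes a VM exit;
    bit 11 = 1: RDPMC causes a VM exit; bit 12 = 1: RDTSC causes a VM exit; bits 14:13: reserved, fixed to 1;
    bit 15 = 1: writes to CR3 cause a VM exit; bit 16 = 1: reads of CR3 cause a VM exit; bits 18:17: reserved, fixed to 0;
    bit 19 = 1: writes to CR8 cause a VM exit; bit 20 = 1: reads of CR8 cause a VM exit; bit 21 = 1: uses the virtual-APIC page to virtualize the local APIC (TPR shadow);
    bit 22 = 1: VM exit when the virtual-NMI window opens; bit 23 = 1: reads and writes of debug registers cause a VM exit;
    bit 24 = 1: IN/OUT and INS/OUTS instructions cause a VM exit; bit 25 = 1: enables the I/O bitmaps; bit 26: reserved, fixed to 1;
    bit 27 = 1: enables the monitor trap flag (MTF) debugging feature; bit 28 = 1: enables the MSR bitmaps; bit 29 = 1: MONITOR causes a VM exit;
    bit 30 = 1: PAUSE causes a VM exit; bit 31 = 1: the secondary processor-based VM-execution controls field is in effect
    */
    CPU_BASED_VM_EXEC_CONTROL = 0x00004002, /* Primary processor-based VM-execution controls */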
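
The "fixed to 1" and "fixed to 0" bits in the control words above do not have to be hard-coded. Each control field has a capability MSR whose low 32 bits mark the controls that must be 1 and whose high 32 bits mark the controls that may be 1. The sketch below applies that rule to a desired setting; the capability value is passed in as a plain parameter (a real VMM would read it with RDMSR), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Forces the mandatory bits on and strips the bits the CPU does not allow. */
static uint32_t adjust_vmx_controls(uint32_t desired, uint64_t cap_msr)
{
    uint32_t must_be_one = (uint32_t)cap_msr;         /* low half */
    uint32_t may_be_one  = (uint32_t)(cap_msr >> 32); /* high half */

    /* A control that must be 1 is necessarily allowed to be 1. */
    assert((must_be_one & ~may_be_one) == 0);
    return (desired | must_be_one) & may_be_one;
}

/* Returns the requested controls that this CPU cannot provide. */
static uint32_t unsupported_vmx_controls(uint32_t desired, uint64_t cap_msr)
{
    uint32_t may_be_one = (uint32_t)(cap_msr >> 32);

    return desired & ~may_be_one;
}
```

A caller would typically treat a non-zero result from `unsupported_vmx_controls` as "feature missing" and either fall back or refuse to start the guest.
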

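    /* EXCEPTION_BITMAP is a 32-bit value in which each bit corresponds to one exception vector. When an exception occurs in VMX non-root operation, the processor checks the corresponding bit: if it is 1, a VM exit occurs; if it is 0, the exception is delivered through the guest IDT to the guest's handler. A triple fault always causes a VM exit directly. */
    EXCEPTION_BITMAP = 0x00004004, /* Exception bitmap; controls exception exits */
    PAGE_FAULT_ERROR_CODE_MASK = 0x00004006,
    PAGE_FAULT_ERROR_CODE_MATCH = 0x00004008,
    /* maximum value is 4 */
    CR3_TARGET_COUNT = 0x0000400a,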
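
The exception bitmap check can be sketched as a single function. The listing does not spell out how PAGE_FAULT_ERROR_CODE_MASK and PAGE_FAULT_ERROR_CODE_MATCH take part, so the page-fault branch below follows the architectural rule as I understand it: for vector 14, bit 14 of the bitmap applies as-is when the masked error code equals the match value, and its meaning is inverted otherwise. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PF_VECTOR 14u

/* Returns 1 if the exception causes a VM exit, 0 if the guest handles it. */
static int exception_causes_exit(uint32_t exception_bitmap, unsigned vector,
                                 uint32_t pfec, uint32_t pfec_mask,
                                 uint32_t pfec_match)
{
    int bit;
    int matches;

    assert(vector < 32);
    bit = (int)((exception_bitmap >> vector) & 1u);
    if (vector != PF_VECTOR)
        return bit;

    /* Page faults are filtered by the error-code mask and match fields. */
    matches = (pfec & pfec_mask) == pfec_match;
    return matches ? bit : !bit;
}
```

With both mask and match left at 0 the comparison always succeeds, so bit 14 behaves like every other bit in the bitmap.
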

/* Valid when CPU_BASED_TPR_SHADOW is 1. Supplies the interrupt-priority threshold; a VM exit occurs when the virtual TPR falls below it. */
TPR_THRESHOLD = 0x0000401c,
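
The TPR threshold comparison is small enough to show in full. In this sketch the threshold occupies bits 3:0 of the field and is compared with the priority class in bits 7:4 of the virtual TPR; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the virtual TPR has dropped below the threshold. */
static int tpr_below_threshold(uint8_t vtpr, uint32_t threshold)
{
    /* Only bits 3:0 of the threshold are meaningful in this sketch. */
    assert((threshold & ~0xFu) == 0);
    return ((vtpr >> 4) & 0xFu) < threshold;
}
```
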

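/* bit 0 = 1: virtualizes accesses to the APIC-access page; bit 1 = 1: enables EPT; bit 2 = 1: accesses to GDTR, LDTR, IDTR, or TR cause a VM exit;
bit 3 = 0: RDTSCP raises #UD; bit 4 = 1: virtualizes accesses to the x2APIC MSRs; bit 5 = 1: enables VPID;
bit 6 = 1: WBINVD causes a VM exit; bit 7 = 1: the guest may run in unpaged protected mode or real mode (unrestricted guest);
bit 8 = 1: virtualizes accesses to registers in the virtual-APIC page; bit 9 = 1: enables virtual-interrupt delivery;
bit 10 = 1: a series of PAUSE instructions can cause a VM exit (PAUSE-loop exiting); bit 11 = 1: RDRAND causes a VM exit;
bit 12 = 0: INVPCID raises #UD; bit 13 = 1: VMFUNC can be executed in VMX non-root operation;
bits 31:14: reserved in this listing, fixed to 0 */
SECONDARY_VM_EXEC_CONTROL = 0x0000401e, /* Secondary processor-based VM-execution controls */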
PLE_GAP = 0x00004020,
PLE_WINDOW = 0x00004022,

/* A bit set to 1 is owned by the host; a bit set to 0 may be set by the guest */
CR0_GUEST_HOST_MASK = 0x00006000, /* speeds up guest writes to CR0 */
CR4_GUEST_HOST_MASK = 0x00006002,
CR0_READ_SHADOW = 0x00006004, /* speeds up guest reads of CR0 */
CR4_READ_SHADOW = 0x00006006,
CR3_TARGET_VALUE0 = 0x00006008,
CR3_TARGET_VALUE1 = 0x0000600a,
CR3_TARGET_VALUE2 = 0x0000600c,
CR3_TARGET_VALUE3 = 0x0000600e,
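
The interplay of the guest/host mask and the read shadow can be sketched as follows. Host-owned bits (mask bit 1) are read from the shadow, and a guest write that tries to give such a bit a value different from the shadow causes a VM exit; guest-owned bits go straight to the real register. The function names are hypothetical, and the same logic applies to CR4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value the guest sees when it reads the control register. */
static uint64_t guest_visible_cr(uint64_t mask, uint64_t read_shadow,
                                 uint64_t real_cr)
{
    return (read_shadow & mask) | (real_cr & ~mask);
}

/*
 * Models a guest write of `new_value`. Returns 1 if the write causes a
 * VM exit (real_cr is left untouched); otherwise updates only the
 * guest-owned bits of *real_cr and returns 0.
 */
static int model_guest_cr_write(uint64_t mask, uint64_t read_shadow,
                                uint64_t *real_cr, uint64_t new_value)
{
    assert(real_cr != NULL);
    if (((new_value ^ read_shadow) & mask) != 0)
        return 1; /* a host-owned bit would differ from the shadow */
    *real_cr = (*real_cr & mask) | (new_value & ~mask);
    return 0;
}
```

This is why the listing describes the two fields as accelerating guest reads and writes: accesses that touch only guest-owned bits, or that leave host-owned bits as the guest already sees them, complete without involving the VMM.
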

VM-entry control fields
VM_ENTRY_MSR_LOAD_ADDR = 0x0000200a,
VM_ENTRY_MSR_LOAD_ADDR_HIGH = 0x0000200b,

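/* bits 1:0: reserved, fixed to 1; bit 2 = 1: loads the debug registers; bits 8:3: reserved, fixed to 1;
  bit 9 = 1: enters IA-32e mode; bit 10 = 1: enters SMM; bit 11 = 1: returns to the executive monitor and deactivates the SMM dual-monitor treatment;
  bit 12: reserved, fixed to 1; bit 13 = 1: loads IA32_PERF_GLOBAL_CTRL; bit 14 = 1: loads IA32_PAT;
  bit 15 = 1: loads IA32_EFER; bits 31:16: reserved, fixed to 0 */

VM_ENTRY_CONTROLS = 0x00004012, /* VM-entry controls; the allowed settings are constrained by MSR_IA32_VMX_ENTRY_CTLS */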
VM_ENTRY_MSR_LOAD_COUNT = 0x00004014,

/* bits 7:0: interrupt or exception vector;
  bits 10:8: interruption type:
    0: External interrupt
    1: Reserved
    2: Non-maskable interrupt (NMI)
    3: Hardware exception
    4: Software interrupt
    5: Privileged software exception
    6: Software exception
    7: Other event
 bit 11 = 1: an error code is to be delivered; bits 30:12: reserved;
 bit 31 = 1: the VM_ENTRY_INTR_INFO_FIELD field is valid */

VM_ENTRY_INTR_INFO_FIELD = 0x00004016, /* event-injection control field */
VM_ENTRY_EXCEPTION_ERROR_CODE = 0x00004018, /* VM-entry exception error code */
VM_ENTRY_INSTRUCTION_LEN = 0x0000401a, /* VM-entry instruction length */
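
Injecting an event means composing a value for VM_ENTRY_INTR_INFO_FIELD from the bit layout above. The following sketch builds that value; the enum and function names are hypothetical, and writing the result into the VMCS (plus the error code, where one applies) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Interruption types, bits 10:8 of the field (value 1 is reserved). */
enum intr_type {
    INTR_TYPE_EXT_INTR    = 0,
    INTR_TYPE_NMI         = 2,
    INTR_TYPE_HW_EXC      = 3,
    INTR_TYPE_SW_INTR     = 4,
    INTR_TYPE_PRIV_SW_EXC = 5,
    INTR_TYPE_SW_EXC      = 6,
    INTR_TYPE_OTHER       = 7
};

/* Builds a valid interruption-information value for event injection. */
static uint32_t make_intr_info(uint8_t vector, enum intr_type type,
                               int deliver_error_code)
{
    uint32_t info;

    assert((unsigned)type <= 7 && (unsigned)type != 1);
    info = vector;                        /* bits 7:0 */
    info |= ((uint32_t)type & 7u) << 8;   /* bits 10:8 */
    if (deliver_error_code)
        info |= 1u << 11;                 /* bit 11 */
    info |= 1u << 31;                     /* bit 31: field is valid */
    return info;
}
```

Injecting a general-protection fault (vector 13, hardware exception, with error code) produces `0x80000B0D`.
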

VM-exit control fields
VM_EXIT_MSR_STORE_ADDR = 0x00002006,
VM_EXIT_MSR_STORE_ADDR_HIGH = 0x00002007,
VM_EXIT_MSR_LOAD_ADDR = 0x00002008,
VM_EXIT_MSR_LOAD_ADDR_HIGH = 0x00002009,
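/* bits 1:0: reserved, fixed to 1; bit 2 = 1: saves the debug registers; bits 8:3: reserved, fixed to 1; bit 9 = 1: the host returns to IA-32e mode;
bits 11:10: reserved, fixed to 1; bit 12 = 1: loads IA32_PERF_GLOBAL_CTRL; bits 14:13: reserved, fixed to 1;
bit 15 = 1: on VM exit the processor acknowledges the interrupt controller and reads the interrupt vector; bits 17:16: reserved, fixed to 1;
bit 18 = 1: saves IA32_PAT; bit 19 = 1: loads IA32_PAT; bit 20 = 1: saves IA32_EFER; bit 21 = 1: loads IA32_EFER;
bit 22 = 1: saves the VMX-preemption timer value on VM exit; bits 31:23: reserved, fixed to 0
*/
VM_EXIT_CONTROLS = 0x0000400c, /* VM-exit controls */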
VM_EXIT_MSR_STORE_COUNT = 0x0000400e,
VM_EXIT_MSR_LOAD_COUNT = 0x00004010,

VM-exit information fields
VM_INSTRUCTION_ERROR = 0x00004400, /* VMX instruction failure */
/* basic information */
/* Guest-physical address: holds the GPA when an EPT violation or EPT misconfiguration causes the VM exit */
GUEST_PHYSICAL_ADDRESS = 0x00002400,
GUEST_PHYSICAL_ADDRESS_HIGH= 0x00002401,
VM_EXIT_REASON = 0x00004402, /* Exit reason */
EXIT_QUALIFICATION = 0x00006400, /* additional detail on the cause of the VM exit; the format depends on the instruction or event */
GUEST_LINEAR_ADDRESS = 0x0000640a, /* linear address associated with certain events that cause a VM exit */
/* direct vectored events */
VM_EXIT_INTR_INFO = 0x00004404, /* VM-exit interruption information (the event that caused the exit) */
VM_EXIT_INTR_ERROR_CODE = 0x00004406,
/* indirect vectored events (events being delivered through the IDT) */
IDT_VECTORING_INFO_FIELD = 0x00004408,
IDT_VECTORING_ERROR_CODE = 0x0000440a,
/* instruction information */
VM_EXIT_INSTRUCTION_LEN = 0x0000440c,
VMX_INSTRUCTION_INFO = 0x0000440e,
/* end of VM-exit information fields */
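
A VMM typically starts handling a VM exit by splitting VM_EXIT_REASON into its basic reason (bits 15:0) and the VM-entry failure flag (bit 31). The sketch below shows that step, along with reading the vector from VM_EXIT_INTR_INFO. The function names are hypothetical, and only a handful of basic exit reasons are defined.

```c
#include <assert.h>
#include <stdint.h>

#define EXIT_REASON_EXCEPTION_NMI      0
#define EXIT_REASON_EXTERNAL_INTERRUPT 1
#define EXIT_REASON_HLT                12
#define EXIT_REASON_EPT_VIOLATION      48
#define EXIT_REASON_EPT_MISCONFIG      49

/* Bits 15:0 of VM_EXIT_REASON: the basic exit reason. */
static unsigned basic_exit_reason(uint32_t exit_reason)
{
    return exit_reason & 0xFFFFu;
}

/* Bit 31 of VM_EXIT_REASON: set when the exit reports a failed VM entry. */
static int vm_entry_failed(uint32_t exit_reason)
{
    return (int)((exit_reason >> 31) & 1u);
}

/* Returns 1 if GUEST_PHYSICAL_ADDRESS is meaningful for this exit. */
static int exit_has_guest_physical_address(uint32_t exit_reason)
{
    unsigned basic = basic_exit_reason(exit_reason);

    return basic == EXIT_REASON_EPT_VIOLATION ||
           basic == EXIT_REASON_EPT_MISCONFIG;
}

/* Extracts the vector (bits 7:0) from VM_EXIT_INTR_INFO. */
static unsigned exit_intr_vector(uint32_t intr_info)
{
    assert((intr_info >> 31) & 1u); /* only meaningful when the valid bit is set */
    return intr_info & 0xFFu;
}
```
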
/* start of guest-state area fields */
GUEST_DR7 = 0x0000681a, /* debug register */
GUEST_RSP = 0x0000681c, /* stack pointer */
GUEST_RIP = 0x0000681e, /* instruction pointer */
GUEST_RFLAGS = 0x00006820, /* flags register */
/* control registers */
GUEST_CR0 = 0x00006800,
GUEST_CR3 = 0x00006802,
GUEST_CR4 = 0x00006804,
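/* Six data/code segment registers (ES, CS, SS, DS, FS, GS) and two system segment registers
(LDTR and TR).
Each segment register has four fields, one for each part of the register:
selector: 16 bits; base: 64 bits on 64-bit systems, otherwise 32 bits;
limit: 32 bits; access rights: 32 bits
Format of the access-rights field:
bits 3:0: segment type; bit 4: 0 = system, 1 = code/data; bits 6:5: descriptor privilege level (DPL);
bit 7: 0 = not present, 1 = present; bits 11:8: reserved; bit 12: available for use by system software;
bit 13: the L flag in IA-32e mode, reserved in legacy mode; bit 14: default operand size, 0 = 16-bit, 1 = 32-bit;
bit 15: limit granularity, 0 = 1 byte, 1 = 4 KB; bit 16: 0 = usable, 1 = unusable; bits 31:17: reserved
*/
/* ES */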
GUEST_ES_SELECTOR = 0x00000800,
GUEST_ES_LIMIT = 0x00004800,
GUEST_ES_AR_BYTES = 0x00004814,
GUEST_ES_BASE = 0x00006806,
/* CS */
GUEST_CS_SELECTOR = 0x00000802,
GUEST_CS_LIMIT = 0x00004802,
GUEST_CS_AR_BYTES = 0x00004816,
GUEST_CS_BASE = 0x00006808,
/* SS */
GUEST_SS_SELECTOR = 0x00000804,
GUEST_SS_LIMIT = 0x00004804,
GUEST_SS_AR_BYTES = 0x00004818,
GUEST_SS_BASE = 0x0000680a,
/* DS */
GUEST_DS_SELECTOR = 0x00000806,
GUEST_DS_LIMIT = 0x00004806,
GUEST_DS_AR_BYTES = 0x0000481a,
GUEST_DS_BASE = 0x0000680c,
/* FS */
GUEST_FS_SELECTOR = 0x00000808,
GUEST_FS_LIMIT = 0x00004808,
GUEST_FS_AR_BYTES = 0x0000481c,
GUEST_FS_BASE = 0x0000680e,
/* GS */
GUEST_GS_SELECTOR = 0x0000080a,
GUEST_GS_LIMIT = 0x0000480a,
GUEST_GS_AR_BYTES = 0x0000481e,
GUEST_GS_BASE = 0x00006810,
/* LDTR, the local descriptor table register; loaded by the LLDT instruction */
GUEST_LDTR_SELECTOR = 0x0000080c,
GUEST_LDTR_LIMIT = 0x0000480c,
GUEST_LDTR_AR_BYTES = 0x00004820,
GUEST_LDTR_BASE = 0x00006812,
/* TR, the task register */
GUEST_TR_SELECTOR = 0x0000080e,
GUEST_TR_LIMIT = 0x0000480e,
GUEST_TR_AR_BYTES = 0x00004822,
GUEST_TR_BASE = 0x00006814,
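/* Two descriptor-table registers, GDTR and IDTR, each with two fields: base gives the base address of the descriptor table; limit gives its length. GDTR is the global descriptor table register; the LGDT instruction loads the GDT base address and limit into it. */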
GUEST_GDTR_LIMIT = 0x00004810,
GUEST_GDTR_BASE = 0x00006816,
/* IDTR, the interrupt descriptor table register */
GUEST_IDTR_LIMIT = 0x00004812,
GUEST_IDTR_BASE = 0x00006818,
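
The access-rights format described for the segment registers above can be packed and unpacked with plain bit operations. This is a sketch with hypothetical names; it covers only the bits named in the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Unpacked view of a VMCS segment access-rights field. */
struct seg_ar {
    unsigned type;     /* bits 3:0 */
    unsigned s;        /* bit 4: 0 = system, 1 = code/data */
    unsigned dpl;      /* bits 6:5 */
    unsigned present;  /* bit 7 */
    unsigned avl;      /* bit 12 */
    unsigned l;        /* bit 13 */
    unsigned db;       /* bit 14 */
    unsigned g;        /* bit 15 */
    unsigned unusable; /* bit 16 */
};

/* Packs the individual flags into the 32-bit access-rights value. */
static uint32_t pack_seg_ar(const struct seg_ar *a)
{
    assert(a->type < 16 && a->dpl < 4);
    return (a->type & 0xFu) | ((a->s & 1u) << 4) | ((a->dpl & 3u) << 5) |
           ((a->present & 1u) << 7) | ((a->avl & 1u) << 12) |
           ((a->l & 1u) << 13) | ((a->db & 1u) << 14) |
           ((a->g & 1u) << 15) | ((a->unusable & 1u) << 16);
}

/* Splits a 32-bit access-rights value into its flags. */
static struct seg_ar unpack_seg_ar(uint32_t ar)
{
    struct seg_ar a;

    a.type     = ar & 0xFu;
    a.s        = (ar >> 4) & 1u;
    a.dpl      = (ar >> 5) & 3u;
    a.present  = (ar >> 7) & 1u;
    a.avl      = (ar >> 12) & 1u;
    a.l        = (ar >> 13) & 1u;
    a.db       = (ar >> 14) & 1u;
    a.g        = (ar >> 15) & 1u;
    a.unusable = (ar >> 16) & 1u;
    return a;
}
```

As a worked value, a present ring-0 code segment of type 0xB with L = 1 and G = 1 packs to `0xA09B`, and an unusable segment is simply `0x10000`.
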
/* MSRs */
GUEST_IA32_DEBUGCTL = 0x00002802,
GUEST_IA32_DEBUGCTL_HIGH = 0x00002803,
GUEST_IA32_PAT = 0x00002804,
GUEST_IA32_PAT_HIGH = 0x00002805,
GUEST_IA32_EFER = 0x00002806,
GUEST_IA32_EFER_HIGH = 0x00002807,
GUEST_IA32_PERF_GLOBAL_CTRL = 0x00002808,
GUEST_IA32_PERF_GLOBAL_CTRL_HIGH= 0x00002809,
GUEST_SYSENTER_CS = 0x0000482A,
GUEST_SYSENTER_ESP = 0x00006824,
GUEST_SYSENTER_EIP = 0x00006826,
Non-register fields
GUEST_INTR_STATUS = 0x00000810, /* status of virtual interrupts */
VMCS_LINK_POINTER = 0x00002800,
VMCS_LINK_POINTER_HIGH = 0x00002801,
GUEST_PDPTR0 = 0x0000280a, /* used when EPT is enabled and the guest uses PAE paging */
GUEST_PDPTR0_HIGH = 0x0000280b,
GUEST_PDPTR1 = 0x0000280c,
GUEST_PDPTR1_HIGH = 0x0000280d,
GUEST_PDPTR2 = 0x0000280e,
GUEST_PDPTR2_HIGH = 0x0000280f,
GUEST_PDPTR3 = 0x00002810,
GUEST_PDPTR3_HIGH = 0x00002811,
GUEST_ACTIVITY_STATE = 0x00004826, /* activity state of the virtual processor across VM entry and VM exit */
GUEST_INTERRUPTIBILITY_INFO = 0x00004824, /* interruptibility state of the current virtual processor */
VMX_PREEMPTION_TIMER_VALUE = 0x0000482E,
GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822, /* pending debug exceptions */

host-state area fields
HOST_RSP = 0x00006c14, /* stack pointer */
HOST_RIP = 0x00006c16, /* instruction pointer */
/* control registers */
HOST_CR0 = 0x00006c00,
HOST_CR3 = 0x00006c02,
HOST_CR4 = 0x00006c04,
/* segment selectors */
HOST_ES_SELECTOR = 0x00000c00,
HOST_CS_SELECTOR = 0x00000c02,
HOST_SS_SELECTOR = 0x00000c04,
HOST_DS_SELECTOR = 0x00000c06,
HOST_FS_SELECTOR = 0x00000c08,
HOST_GS_SELECTOR = 0x00000c0a,
HOST_TR_SELECTOR = 0x00000c0c,
/* segment and descriptor-table base addresses */
HOST_FS_BASE = 0x00006c06,
HOST_GS_BASE = 0x00006c08,
HOST_TR_BASE = 0x00006c0a,
HOST_GDTR_BASE = 0x00006c0c,
HOST_IDTR_BASE = 0x00006c0e,
/* MSRs */
HOST_IA32_PAT = 0x00002c00,
HOST_IA32_PAT_HIGH = 0x00002c01,
HOST_IA32_EFER = 0x00002c02,
HOST_IA32_EFER_HIGH = 0x00002c03,
HOST_IA32_PERF_GLOBAL_CTRL = 0x00002c04,
HOST_IA32_PERF_GLOBAL_CTRL_HIGH = 0x00002c05,
HOST_IA32_SYSENTER_CS = 0x00004c00,
HOST_IA32_SYSENTER_ESP = 0x00006c10,
HOST_IA32_SYSENTER_EIP = 0x00006c12,
