CPU Virtualization Series, Part 1: CPU Virtualization on the x86 Architecture

This article is excerpted from the book 《深度探索Linux系統虛擬化:原理與實現》 (*Deep Exploration of Linux System Virtualization: Principles and Implementation*) by 王柏生 and 謝廣軍. It covers four topics:

- the basic concepts of CPU virtualization;
- the obstacles the x86 architecture poses to virtualization;
- VMX, the hardware extension Intel implemented to support CPU virtualization;
- the complete life cycle of a virtual CPU under VMX, from Host mode into Guest mode and back to Host mode.

https://item.jd.com/12742101.html

![](https://static001.geekbang.org/infoq/ca/ca7990e93dacf97a8d62d49216b533a9.jpeg)

In their 1974 paper "Formal Requirements for Virtualizable Third Generation Architectures", Gerald J. Popek and Robert P. Goldberg proposed three conditions for virtualization:

**1) Equivalence.** The VMM must give the virtual machine an environment that is essentially identical to the physical machine. A virtual machine running in this environment behaves as it would on the physical machine. Minor differences may arise from resource contention or VMM intervention. For example, a virtual machine's I/O and networking may be somewhat slower than on a dedicated physical machine, because the host throttles them or because several virtual machines share the same resources.

**2) Efficiency.** Instructions executed by the virtual machine must not suffer a significant performance loss compared with running on the physical machine. This criterion requires that the vast majority of the virtual machine's instructions run directly on the physical CPU without VMM intervention. By this standard, running an ARM system on x86 through QEMU is not virtualization but emulation.

**3) Resource control.** The VMM has full control over system resources. It allocates and coordinates the host's resources among the virtual machines, and a virtual machine must never be able to take control of the host's resources.

## 1 Trap-and-Emulate Model

A typical way to satisfy the three conditions proposed by Popek and Goldberg is the trap-and-emulate model.

A processor generally has two operating modes: system mode and user mode. Correspondingly, CPU instructions are divided into privileged and non-privileged instructions. Privileged instructions can run only in system mode; executing one in user mode triggers a processor exception. The operating system runs its kernel in system mode, because the kernel manages system resources and must execute privileged instructions. Ordinary user programs run in user mode.

Under the trap-and-emulate model, the virtual machine's user programs still run in user mode. The virtual machine's kernel is also deprivileged to user mode, an approach called ring compression. Non-privileged instructions in the virtual machine run directly on the processor, which meets the efficiency criterion: most instructions execute without VMM intervention.

A privileged instruction in the virtual machine, however, is executed in user mode. The processor therefore raises an exception and traps into the VMM, which then accesses system resources on the virtual machine's behalf; this is the "emulate" step. This also meets the resource-control criterion. Because the virtual machine cannot execute privileged instructions directly, it cannot modify the host's resources and damage the host environment.

## 2 Obstacles to Virtualizing the x86 Architecture

Popek and Goldberg classify as sensitive any instruction that modifies system resources, or whose behavior differs depending on the mode in which it runs. In a virtualized environment, the VMM must intercept these sensitive instructions. In an architecture that supports virtualization, every sensitive instruction is also privileged. Executing one at a non-privileged level therefore makes the CPU raise an exception and enter the VMM's exception handler, which lets the VMM control the VM's access to sensitive resources.

The x86 architecture does not meet this criterion. Not all of its sensitive instructions are privileged; some run in non-privileged mode without raising an exception, so the VMM cannot intercept them.

Take clearing IF (Interrupt Flag) in the FLAGS register as an example:

1. pushf pushes FLAGS onto the stack.
2. The IF bit of the saved value on top of the stack is cleared.
3. popf restores FLAGS from the stack.

If the virtual machine kernel is not running in ring 0, the x86 CPU raises no exception. It simply leaves IF unchanged when executing popf, so the virtual machine's attempt to disable interrupts silently has no effect.

One proposed solution was paravirtualization, that is, modifying the Guest's code, but this violates the transparency requirement of virtualization. Binary translation came later, in static and dynamic forms. Static translation scans the entire executable before it runs, translates the sensitive instructions, and produces a new file. However, it must be done in advance, and it cannot handle instructions whose side effects only appear at run time.

Dynamic translation addresses this by rewriting the binary at run time, one code block at a time. It has been adopted by many VMMs and, with optimization, performs very well.

## 3 VMX

These software approaches do work around the problems of virtualizing x86, but they add overhead and make VMMs far more complex to implement. Intel therefore set out to solve the problem in hardware.

Intel did not simply turn the non-privileged sensitive instructions into privileged ones, because not every privileged operation needs to be intercepted. A typical example is the cr3 register. Every time the operating system kernel switches processes, it writes cr3 to point to the page table of the process about to run.

When shadow page tables are used to map GVAs (guest virtual addresses) to HPAs (host physical addresses), the VMM must intercept every Guest write to cr3 and point cr3 at the shadow page table instead. Once hardware EPT (Extended Page Tables) support is enabled, cr3 no longer needs to point to a shadow page table; it keeps pointing to the page table of the Guest process. The VMM then no longer needs to intercept these writes. In other words, although writing cr3 is a privileged operation, it does not need to trap into the VMM.

To support virtualization, Intel developed VT (Virtualization Technology), which adds the Virtual-Machine Extensions (VMX) to the CPU. Once VMX is enabled, the CPU provides two modes of operation, VMX Root Mode and VMX non-Root Mode, each of which supports ring 0 through ring 3.

The VMM runs in VMX Root Mode, which differs from ordinary operation only in that it supports VMX. The VM runs in VMX non-Root Mode. The Guest no longer needs ring compression, and the Guest kernel can run directly in ring 0 of VMX non-Root Mode, as shown in Figure 1.

![](https://static001.geekbang.org/infoq/25/25dcc620fb86bcd94f80be620f1a7eef.jpeg)

Figure 1 VMX operating modes

A VMM in VMX Root Mode switches to VMX non-Root Mode by executing VMLAUNCH, one of the virtualization instructions the CPU provides. Because this amounts to entering the Guest, the transition is usually called a VM entry.

When the Guest executes a sensitive instruction, such as certain I/O operations, the CPU traps and switches from VMX non-Root Mode back to VMX Root Mode. Because this amounts to leaving the VM, it is called a VM exit. The VMM then emulates the Guest's operation.

Compared with running the Guest kernel in user mode (ring 1 to ring 3), a VMX-capable CPU differs in three ways:

1) In Guest mode, a system call from Guest user space traps directly into the Guest-mode kernel rather than into the Host-mode kernel.

2) The VMM must control system resources, so an external interrupt received while the CPU is in Guest mode makes the CPU exit to Host mode. The Host kernel handles the interrupt, and the CPU then switches back into Guest mode. To improve I/O efficiency, Intel supports device passthrough, in which the Guest does not need to incur a VM exit. The "Device Virtualization" chapter discusses this special mode.

3) A privileged instruction executed in Guest mode no longer necessarily causes a VM exit. Only sensitive instructions that need VMM intervention trap from Guest mode into Host mode. Some privileged operations, such as writing cr3 with EPT enabled, do not need the VMM at all.

A single CPU runs multiple tasks by time-sharing: each task has its own context, which the scheduler switches when it schedules. In the same way, a physical CPU in a virtualized environment "plays several roles", time-sharing between the Host and the Guests and switching modes as needed. Each mode therefore also needs to save its own context.

For this purpose VMX defines a data structure that holds the context: the VMCS (Virtual Machine Control Structure). Each virtual CPU of a Guest has its own VMCS instance, and the physical CPU runs whichever Guest's VMCS it has loaded, as shown in Figure 2.

![](https://static001.geekbang.org/infoq/aa/aa901a938124a7477c572ca5e1159257.jpeg)

Figure 2 Switching between multiple Guests

The VMCS holds two main categories of data:

- state, covering both Host state and Guest state;
- controls that govern how the Guest behaves while it runs.

Specifically:

1) Guest-state area, which holds the virtual machine's state. On a VM exit, the CPU saves the Guest's state into this area; on a VM entry, it loads this state into the physical CPU. Both happen automatically in hardware, with no VMM code involved.

2) Host-state area, which holds the host's state. The VMM fills in this area in advance with VMWRITE; the CPU does not save Host state into it on VM entry. On a VM exit, the CPU automatically loads the Host state from this area into the physical CPU.

3) VM-exit information fields. When a VM exit occurs, the VMM needs to know its cause before it can respond appropriately with the corresponding emulation. The CPU therefore automatically records the reason for the exit in this area for the VMM to use.

4) VM-execution control fields. The fields in this area control parts of the virtual machine's run-time behavior, for example whether a Guest access to the cr3 register triggers a VM exit.

Similarly, the VM-entry control fields and VM-exit control fields govern behavior on VM entry and VM exit. The VMCS contains many other areas with different functions, which we will not list here; readers can consult the Intel manuals as needed.

When a VCPU is created, the KVM module allocates a VMCS for it. Each time a CPU is about to switch into Guest mode, KVM points that CPU's current-VMCS pointer at the VMCS instance of the Guest it is about to enter:

```c
/* commit 6aa8b732ca01c3d7a54e93f4d701b8aabbe60fb7
 * [PATCH] kvm: userspace interface
 * linux.git/drivers/kvm/vmx.c
 */

static struct kvm_vcpu *vmx_vcpu_load(struct kvm_vcpu *vcpu)
{
	u64 phys_addr = __pa(vcpu->vmcs);
	int cpu;

	cpu = get_cpu();
	…
	/* Only reload if this CPU's current VMCS is not already this VCPU's */
	if (per_cpu(current_vmcs, cpu) != vcpu->vmcs) {
		…
		per_cpu(current_vmcs, cpu) = vcpu->vmcs;
		/* VMPTRLD makes the VMCS at phys_addr current on this CPU */
		asm volatile (ASM_VMX_VMPTRLD_RAX "; setna %0"
			      : "=g"(error) : "a"(&phys_addr), "m"(phys_addr)
			      : "cc");
		…
	}
	…
}
```

The CPU does not save and restore all state automatically, because efficiency also matters. Take the cr2 register as an example; it is not part of the VMCS Guest-state area. Most of the time, the Host does not change cr2 between a Guest exit and the next entry into that Guest, and writing cr2 is relatively expensive. Updating cr2 on every VM entry would therefore only waste CPU cycles. It is more sensible to leave such state to the VMM and let software manage it.

## 4 VCPU Life Cycle

The VMM represents each virtual processor (VCPU) with a thread. While the Guest runs, each VCPU cycles through the states shown in Figure 3.

![](https://static001.geekbang.org/infoq/8e/8e3df2ab4e9df32dbb59c72d8c8eaa05.jpeg)

Figure 3 VCPU life cycle

1. Once user space is ready, the VCPU thread issues the KVM_RUN ioctl to the KVM module in the kernel. This tells the module that the user-space work is done and it can switch into Guest mode to run the Guest.
2. After entering kernel mode, the KVM module executes the CPU's virtualization instructions to switch into Guest mode: VMLAUNCH if this is the Guest's first run, VMRESUME otherwise. The Host state the CPU will need on the next VM exit comes from the Host-state area of the VMCS, which KVM has set up in advance; state the CPU does not handle automatically is saved by KVM. The CPU then loads the Guest state stored in the VMCS into the physical CPU, and KVM restores any state the CPU does not restore automatically.
3. The physical CPU enters Guest mode and executes Guest instructions. A sensitive instruction, or another VM-exit event such as an external interrupt, makes the CPU switch from Guest mode back to ring 0 of Host mode and enter the KVM module in the Host kernel. During this switch, the CPU first saves its current state (the Guest state) into the Guest-state area of the VMCS. It then loads the Host state stored in the VMCS into the physical CPU. Likewise, state the CPU does not save automatically is saved by the KVM module.
4. The KVM module, in kernel mode, reads the exit reason from the VMCS and tries to handle the exit in the kernel. If it can, the VCPU does not need to switch to Host user mode and returns directly to the Guest after handling. Such an exit is called a lightweight VM exit.
5. If the KVM module cannot handle the exit in kernel mode, the VCPU performs another context switch, from Host kernel mode to Host user mode, and the user-space part of the VMM handles the exit. When it has finished, it requests entry into Guest mode again. Steps 1 to 5 repeat for as long as the virtual machine runs.

The following is the code with which KVM enters and exits the Guest:

```c
/* commit 6aa8b732ca01c3d7a54e93f4d701b8aabbe60fb7
 * [PATCH] kvm: userspace interface
 * linux.git/drivers/kvm/vmx.c
 */

static int vmx_vcpu_run(struct kvm_vcpu *vcpu, …)
{
	u8 fail;
	u16 fs_sel, gs_sel, ldt_sel;
	int fs_gs_ldt_reload_needed;

again:
	…
		/* Enter guest mode */
		/* VMLAUNCH on the first entry, VMRESUME on later entries */
		"jne launched \n\t"
		ASM_VMX_VMLAUNCH "\n\t"
		"jmp kvm_vmx_return \n\t"
		"launched: " ASM_VMX_VMRESUME "\n\t"
		".globl kvm_vmx_return \n\t"
		"kvm_vmx_return: "
		/* Save guest registers, load host registers, keep flags */
	…
		/* Nonzero: exit handled in the kernel, so re-enter the guest */
		if (kvm_handle_exit(kvm_run, vcpu)) {
			…
			goto again;
		}
	}
	return 0;
}
```

On a Guest exit, the KVM module first calls kvm_handle_exit to try to handle the exit in kernel space. kvm_handle_exit returns 1 in two cases:

- kernel space handled the exit successfully;
- the exit was caused by an external event, such as an external interrupt, that does not require switching to Host user space.

Otherwise it returns 0, meaning the user-space part of the VMM must handle the exit. One example is a peripheral request that a device emulated in user space must serve.

When kvm_handle_exit returns 1, the code above jumps straight to the label again, and the flow re-enters the Guest. When it returns 0, vmx_vcpu_run returns and the CPU goes back from kernel space to user space. Taking kvmtool as an example, the relevant code is:

```c
/* commit 8d20223edc81c6b199842b36fcd5b0aa1b8d3456
 * Dump KVM_EXIT_IO details
 * kvmtool.git/kvm.c
 */

int main(int argc, char *argv[])
{
	…
	for (;;) {
		/* Run the guest; returns when an exit needs user space */
		kvm__run(kvm);

		switch (kvm->kvm_run->exit_reason) {
		case KVM_EXIT_IO:
			…
		}
	…
}
```

As the code shows, kvmtool issues its request to enter the Guest inside an infinite for loop. After control returns from KVM kernel space to user space, kvmtool handles the Guest's request there, for example by calling an emulated device to serve an I/O request. It then starts the next iteration of the loop and asks the KVM module to enter the Guest again.

![](https://static001.geekbang.org/infoq/ec/ec266841fac439ef402ca60c7e6666d9.jpeg)

**About the authors:**

**王柏生**

Senior technical expert who has worked at the Institute of Software of the Chinese Academy of Sciences, Red Flag Linux, and Baidu, where he is currently a principal architect. He has spent many years working on operating systems, virtualization, distributed systems, cloud computing, and autonomous driving, and has extensive hands-on experience.

Author of the best-selling book 《深度探索Linux操作系統》 (*Deep Exploration of the Linux Operating System*, published 2013).

**謝廣軍**

PhD in computer science from the Department of Computer Science, Nankai University.

Senior technical expert with many years of experience in the IT industry. He is currently a deputy general manager of Baidu AI Cloud, responsible for the R&D of cloud computing products. For many years he has worked on operating systems, virtualization, distributed systems, big data, and cloud computing, and has rich practical experience.

*This article is published with the authorization of the publisher. For more on virtualization technology, we recommend the book 《深度探索Linux系統虛擬化:原理與實現》.
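The Popek and Goldberg criterion from section 2 (every sensitive instruction must also be privileged) can be checked mechanically. The following is a minimal C sketch that models a few instructions as sensitive and/or privileged, then tests that criterion. The instruction table is a simplified assumption for illustration: only the popf and cr3 cases come from the text, and `add` stands in for an ordinary non-sensitive instruction. The names `struct insn`, `x86_sample`, `is_classically_virtualizable` and `trap_and_emulate_sees` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of an instruction (illustration only). */
struct insn {
    const char *name;
    bool sensitive;   /* modifies system resources or is mode-dependent */
    bool privileged;  /* faults when executed outside ring 0 */
};

/* Simplified sample table (assumption); popf and mov to cr3 come from the text. */
static const struct insn x86_sample[] = {
    { "add",        false, false },
    { "mov to cr3", true,  true  },
    { "popf",       true,  false }, /* outside ring 0 it silently leaves IF unchanged */
};

/* Popek-Goldberg condition: every sensitive instruction must be privileged. */
static bool is_classically_virtualizable(const struct insn *set, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (set[i].sensitive && !set[i].privileged)
            return false;
    return true;
}

/* Under ring compression the VMM only gets control when an instruction traps,
 * i.e. when it is privileged; a sensitive, unprivileged one slips past it. */
static bool trap_and_emulate_sees(const struct insn *in)
{
    return in->privileged;
}
```

With the full three-entry table, `is_classically_virtualizable` returns false because popf is sensitive but not privileged. This is exactly why plain trap-and-emulate cannot intercept the guest's attempt to clear IF.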
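The VMCS areas described in section 3 divide the work of VM entry and VM exit. The following deliberately tiny C model shows who writes what:

- VM entry only loads Guest state.
- VM exit saves Guest state and the exit reason, then loads Host state from the Host-state area that the VMM prepared beforehand.

It assumes a CPU with just three registers. A real VMCS has many more fields, its layout is implementation-specific, and software accesses it only through VMREAD/VMWRITE. The names `struct regs_sim`, `struct vmcs_sim`, `vm_entry_sim` and `vm_exit_sim` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, drastically simplified register file. */
struct regs_sim { uint64_t rip, rsp, cr3; };

/* Hypothetical VMCS model: only the areas discussed in the text. */
struct vmcs_sim {
    struct regs_sim guest_state;  /* Guest-state area */
    struct regs_sim host_state;   /* Host-state area, written by the VMM in advance */
    uint32_t exit_reason;         /* VM-exit information field */
};

/* VM entry: load Guest state from the VMCS. Host state is NOT saved here. */
static void vm_entry_sim(struct regs_sim *cpu, const struct vmcs_sim *vmcs)
{
    *cpu = vmcs->guest_state;
}

/* VM exit: save Guest state and the exit reason, then load Host state. */
static void vm_exit_sim(struct regs_sim *cpu, struct vmcs_sim *vmcs, uint32_t reason)
{
    vmcs->guest_state = *cpu;
    vmcs->exit_reason = reason;
    *cpu = vmcs->host_state;
}
```

After `vm_exit_sim`, the VMM can inspect `exit_reason` and the saved Guest registers to decide how to emulate the operation. This mirrors the role of the VM-exit information fields.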
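The vmx_vcpu_load excerpt in section 3 executes VMPTRLD only when the CPU's current VMCS differs from the VCPU's. This user-space C sketch reproduces just that caching logic, with VMPTRLD replaced by a counter. `struct fake_vmcs`, `struct fake_vcpu`, `NR_CPUS_SIM`, `current_vmcs`, `vmptrld_count` and `vcpu_load_sim` are hypothetical stand-ins, not KVM's API.

```c
#include <assert.h>

#define NR_CPUS_SIM 4  /* hypothetical number of physical CPUs */

struct fake_vmcs { int id; };                 /* stand-in for a hardware VMCS region */
struct fake_vcpu { struct fake_vmcs *vmcs; }; /* each VCPU owns one VMCS */

/* Per-CPU pointer to the currently loaded VMCS, like per_cpu(current_vmcs, cpu). */
static struct fake_vmcs *current_vmcs[NR_CPUS_SIM];
static unsigned vmptrld_count;  /* number of (simulated) VMPTRLDs issued */

/* Make vcpu's VMCS current on cpu, skipping the reload if it already is. */
static void vcpu_load_sim(struct fake_vcpu *vcpu, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SIM);
    if (current_vmcs[cpu] != vcpu->vmcs) {
        current_vmcs[cpu] = vcpu->vmcs;
        vmptrld_count++;  /* the real code executes VMPTRLD at this point */
    }
}
```

Loading the same VCPU twice on one CPU costs a single simulated VMPTRLD. Switching to another VCPU, or moving to a CPU that has not loaded this VMCS yet, costs another.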
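The VCPU life cycle in section 4 (vmx_vcpu_run, kvm_handle_exit and kvmtool's for loop) can be simulated in user space. This minimal sketch assumes a scripted sequence of exit reasons in place of real hardware. The inner loop plays the kernel side: VMLAUNCH on the first entry, VMRESUME afterwards, and lightweight exits re-enter the guest directly. The outer loop plays the user-space VMM. All names (`enum exit_reason_sim`, `struct vcpu_sim`, `handle_exit_in_kernel`, `vcpu_run_sim`, `run_vm_sim`) are hypothetical and are not KVM or kvmtool APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical exit reasons for the simulation. */
enum exit_reason_sim { EXIT_EXTERNAL_INTR, EXIT_IO, EXIT_HLT };

struct vcpu_sim {
    bool launched;                       /* has the simulated VMLAUNCH happened? */
    const enum exit_reason_sim *script;  /* exits the "guest" will produce, in order */
    size_t pos, len;
    unsigned launches, resumes, user_exits;
};

/* Mimics kvm_handle_exit's convention: 1 = handled in kernel, 0 = needs user space. */
static int handle_exit_in_kernel(enum exit_reason_sim r)
{
    return r == EXIT_EXTERNAL_INTR;  /* host handles the interrupt, then re-enters */
}

/* Kernel side of KVM_RUN: keep entering the guest until an exit needs user space. */
static enum exit_reason_sim vcpu_run_sim(struct vcpu_sim *v)
{
    for (;;) {
        if (!v->launched) {
            v->launched = true;
            v->launches++;   /* first entry: VMLAUNCH */
        } else {
            v->resumes++;    /* later entries: VMRESUME */
        }
        assert(v->pos < v->len);  /* the script must end with an exit to user space */
        enum exit_reason_sim r = v->script[v->pos++];  /* guest runs, then VM exit */
        if (!handle_exit_in_kernel(r))
            return r;        /* heavyweight exit: return to user space */
        /* lightweight exit: loop and re-enter the guest, like "goto again" */
    }
}

/* User-space side, modeled on kvmtool's for (;;) loop around kvm__run(). */
static void run_vm_sim(struct vcpu_sim *v)
{
    while (v->pos < v->len) {
        enum exit_reason_sim r = vcpu_run_sim(v);
        v->user_exits++;
        if (r == EXIT_HLT)
            break;           /* end the simulation when the guest halts */
        /* EXIT_IO: an emulated device would serve the request here */
    }
}
```

With the script `{INTR, IO, INTR, INTR, IO, HLT}`, the simulation performs one VMLAUNCH and five VMRESUMEs, but returns to user space only three times. Handling the interrupt exits inside the kernel loop is what saves the extra kernel/user context switch described in step 5.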