Meltdown Paper Translation (annotated)


Abstract

    The security of computer systems fundamentally relies on memory isolation, e.g., kernel address ranges are marked as non-accessible and are protected from user access. In this paper, we present Meltdown. Meltdown exploits side effects of out-of-order execution on modern processors to read arbitrary kernel-memory locations including personal data and passwords. Out-of-order execution is an indispensable performance feature and present in a wide range of modern processors. The attack is independent of the operating system, and it does not rely on any software vulnerabilities. Meltdown breaks all security assumptions given by address space isolation as well as paravirtualized environments and, thus, every security mechanism building upon this foundation. On affected systems, Meltdown enables an adversary to read memory of other processes or virtual machines in the cloud without any permissions or privileges, affecting millions of customers and virtually every user of a personal computer. We show that the KAISER defense mechanism for KASLR [8] has the important (but inadvertent) side effect of impeding Meltdown. We stress that KAISER must be deployed immediately to prevent large-scale exploitation of this severe information leakage.

    Memory isolation is fundamental to computer-system security: kernel address ranges are marked as protected, and a user-mode read or write of a kernel address raises an exception, blocking the access. In this paper we describe in detail the hardware vulnerability called Meltdown. Meltdown exploits side effects of out-of-order execution on modern processors to let a user-mode program read kernel-space data, including personal data and passwords. Because it improves performance, out-of-order execution is widely adopted in modern processors. The attack is independent of the operating system and does not rely on any software vulnerability. Meltdown ruthlessly shatters the security guarantees provided by address-space isolation (paravirtualized environments included), so every security mechanism built on that foundation is no longer secure. On affected systems, Meltdown enables an attacker to read the data of other processes, or of other virtual machines on a cloud server, without the corresponding permissions. This paper also shows that KAISER, a defense originally developed to protect KASLR against side-channel attacks, has the important (but inadvertent) side effect of impeding Meltdown. We therefore stress that KAISER must be deployed immediately to prevent large-scale, severe information leakage.

1. Introduction

    One of the central security features of today’s operating systems is memory isolation. Operating systems ensure that user applications cannot access each other’s memories and prevent user applications from reading or writing kernel memory. This isolation is a cornerstone of our computing environments and allows running multiple applications on personal devices or executing processes of multiple users on a single machine in the cloud.

    One of the central security features of today's operating systems is memory isolation: the operating system must ensure that user applications cannot access each other's memory, and it must also block user applications from accessing kernel space. On a personal device, many processes run concurrently and must be isolated from one another. In a cloud environment, the processes of multiple tenants (virtual machines) sharing the same physical host coexist, and no tenant's process may be allowed to read another tenant's data. This memory isolation is therefore a cornerstone of our computing environments.

    On modern processors, the isolation between the kernel and user processes is typically realized by a supervisor bit of the processor that defines whether a memory page of the kernel can be accessed or not. The basic idea is that this bit can only be set when entering kernel code and it is cleared when switching to user processes. This hardware feature allows operating systems to map the kernel into the address space of every process and to have very efficient transitions from the user process to the kernel, e.g., for interrupt handling. Consequently, in practice, there is no change of the memory mapping when switching from a user process to the kernel.

    On modern processors, the isolation between kernel and user address spaces is typically implemented by a single bit in a processor control register (the supervisor bit, which indicates the processor's current mode); this bit determines whether kernel-space memory pages may be accessed. The basic idea is that the bit is set to 1 only while kernel code executes, and is cleared on switching to a user process. With this hardware support, the operating system can map the kernel address space into every process. A user process frequently needs to switch from user space into kernel space — for example, when it requests a kernel service via a system call, or when an interrupt arrives in user mode and the CPU must enter the kernel to run the interrupt handler and service the asynchronous device event. Given how frequent user-to-kernel transitions are, system performance is unaffected precisely because the address space does not need to be switched during the transition.

    In this work, we present Meltdown1. Meltdown is a novel attack that allows overcoming memory isolation completely by providing a simple way for any user process to read the entire kernel memory of the machine it executes on, including all physical memory mapped in the kernel region. Meltdown does not exploit any software vulnerability, i.e., it works on all major operating systems. Instead, Meltdown exploits side-channel information available on most modern processors, e.g., modern Intel microarchitectures since 2010 and potentially on other CPUs of other vendors.

    While side-channel attacks typically require very specific knowledge about the target application and are tailored to only leak information about its secrets, Meltdown allows an adversary who can run code on the vulnerable processor to obtain a dump of the entire kernel address space, including any mapped physical memory. The root cause of the simplicity and strength of Meltdown are side effects caused by out-of-order execution.

    In this work we present a novel attack exploiting the Meltdown vulnerability. With it, any user process can defeat the operating system's address-space isolation and read kernel-space data by a simple method, including all physical memory mapped into the kernel address space. Meltdown does not exploit any software vulnerability, which means it works against every major operating system. Instead, it exploits side-channel information available on most modern processors (e.g., Intel microarchitectures since 2010; CPUs of other vendors may harbor the same problem). An ordinary side-channel attack requires detailed knowledge of the attack target and an attack method tailored to that knowledge in order to extract secret data. The Meltdown attack is different: it can dump the data of the entire kernel address space (including all physical memory mapped into it). The root cause of Meltdown's great strength is that it exploits the side effects of out-of-order execution.

    Out-of-order execution is an important performance feature of today’s processors in order to overcome latencies of busy execution units, e.g., a memory fetch unit needs to wait for data arrival from memory. Instead of stalling the execution, modern processors run operations out-of-order i.e., they look ahead and schedule subsequent operations to idle execution units of the processor. However, such operations often have unwanted side-effects, e.g., timing differences [28, 35, 11] can leak information from both sequential and out-of-order execution.

    Sometimes a CPU execution unit must wait for an operation's result, for example a load of memory data into a register. To improve performance, the CPU does not stall; instead it uses out-of-order execution, continuing with subsequent instructions and dispatching them to idle execution units. However, such operations often have unwanted side effects, and through these side effects of executed instructions — for example, timing differences [28, 35, 11] — we can steal information.

    From a security perspective, one observation is particularly significant: Out-of-order; vulnerable CPUs allow an unprivileged process to load data from a privileged (kernel or physical) address into a temporary CPU register. Moreover, the CPU even performs further computations based on this register value, e.g., access to an array based on the register value. The processor ensures correct program execution, by simply discarding the results of the memory lookups (e.g., the modified register states), if it turns out that an instruction should not have been executed. Hence, on the architectural level (e.g., the abstract definition of how the processor should perform computations), no security problem arises.

    While performance improves, from a security perspective there is a problem, and the key point is this: under out-of-order execution, a vulnerable CPU allows an unprivileged process to read data from an address requiring privileged access and load it into a temporary register. The CPU may even perform further computations based on that temporary register's value, for example using it to index an array. Of course, the CPU eventually detects the illegal address access and discards the results of the computation (e.g., the modified register values). Although the instructions after the faulting one were executed early, the CPU ultimately saves the day and cleans up their results, so it appears as if nothing ever happened. This guarantees that, from the perspective of the CPU architecture (i.e., the abstract definition of how the processor should perform computations), no security problem arises.

    However, we observed that out-of-order memory lookups influence the cache, which in turn can be detected through the cache side channel. As a result, an attacker can dump the entire kernel memory by reading privileged memory in an out-of-order execution stream, and transmit the data from this elusive state via a microarchitectural covert channel (e.g., Flush+Reload) to the outside world. On the receiving end of the covert channel, the register value is reconstructed. Hence, on the microarchitectural level (e.g., the actual hardware implementation), there is an exploitable security problem.

    However, we can observe the effect of out-of-order execution on the cache, and launch an attack based on the side-channel information the cache provides. Concretely, the attack works like this: the attacker exploits the CPU's out-of-order execution to read a memory address requiring privileged access and load it into a temporary register, and the program then uses the value held in that register to influence the cache state. The attacker then builds a covert channel (e.g., Flush+Reload) to transmit the data out; at the receiving end of the covert channel, the register value is reconstructed. Hence, at the CPU microarchitectural level (which corresponds to the actual hardware implementation), there is indeed an exploitable security problem.

    Meltdown breaks all security assumptions given by the CPU’s memory isolation capabilities. We evaluated the attack on modern desktop machines and laptops, as well as servers in the cloud. Meltdown allows an unprivileged process to read data mapped in the kernel address space, including the entire physical memory on Linux and OS X, and a large fraction of the physical memory on Windows. This may include physical memory of other processes, the kernel, and in case of kernel-sharing sandbox solutions (e.g., Docker, LXC) or Xen in paravirtualization mode, memory of the kernel (or hypervisor), and other co-located instances. While the performance heavily depends on the specific machine, e.g., processor speed, TLB and cache sizes, and DRAM speed, we can dump kernel and physical memory with up to 503KB/s. Hence, an enormous number of systems are affected.

    The memory isolation the CPU works so painstakingly to enforce is broken by Meltdown with ease. We ran the attack against modern desktops, laptops, and cloud servers, and found that on systems such as Linux and OS X, Meltdown lets a user process dump all of physical memory (since the entire physical memory is mapped into the kernel address space). On Windows, Meltdown lets a user process dump a large fraction of physical memory. This physical memory may include the data of other processes or of the kernel. With kernel-sharing sandbox solutions (e.g., Docker, LXC) or Xen in paravirtualization mode, the dumped physical memory also includes the data of the kernel (i.e., the hypervisor) and of other guest OSes. Depending on the system (e.g., processor speed, TLB and cache sizes, and DRAM speed), memory can be dumped at up to 503 KB/s. The impact of Meltdown is therefore very broad.

    The countermeasure KAISER [8], originally developed to prevent side-channel attacks targeting KASLR, inadvertently protects against Meltdown as well. Our evaluation shows that KAISER prevents Meltdown to a large extent. Consequently, we stress that it is of utmost importance to deploy KAISER on all operating systems immediately. Fortunately, during a responsible disclosure window, the three major operating systems (Windows, Linux, and OS X) implemented variants of KAISER and will roll out these patches in the near future.

    The countermeasure is KAISER [8]. KAISER was originally developed to prevent side-channel attacks against KASLR, but it inadvertently mitigates the Meltdown vulnerability as well. Our evaluation shows that KAISER prevents Meltdown to a large extent; we therefore strongly recommend deploying KAISER on all operating systems immediately. Fortunately, the three major operating systems (Windows, Linux, and OS X) have already implemented KAISER variants and will roll out these patches in the near future.

    Meltdown is distinct from the Spectre Attacks [19] in several ways, notably that Spectre requires tailoring to the victim process’s software environment, but applies more broadly to CPUs and is not mitigated by KAISER.

    Meltdown differs from the Spectre attacks [19] in several ways. The most notable difference is that launching a Spectre attack requires knowledge of the victim process's software environment and a concrete attack method tailored to that knowledge. On the other hand, the Spectre vulnerability exists on a broader range of CPUs, and KAISER is ineffective against Spectre.

Contributions. The contributions of this work are:

    1. We describe out-of-order execution as a new, extremely powerful, software-based side channel.

    2. We show how out-of-order execution can be combined with a microarchitectural covert channel to transfer the data from an elusive state to a receiver on the outside.

    3. We present an end-to-end attack combining out-of-order execution with exception handlers or TSX, to read arbitrary physical memory without any permissions or privileges, on laptops, desktop machines, and on public cloud machines.

    4. We evaluate the performance of Meltdown and the effects of KAISER on it.

The contributions of this work include:

    1. We are the first to show that out-of-order execution can be used as a side channel for attacks, and an extremely powerful one.

    2. We show how out-of-order execution can be combined with a microarchitectural covert channel to transfer data out and leak information.

    3. We present an end-to-end attack method exploiting out-of-order execution (combined with exception handling or TSX). With it, we read arbitrary physical memory, without any permissions, on laptops, desktops, and cloud servers.

    4. We evaluate the performance of Meltdown and the effect of KAISER on it.

    Outline. The remainder of this paper is structured as follows: In Section 2, we describe the fundamental problem which is introduced with out-of-order execution. In Section 3, we provide a toy example illustrating the side channel Meltdown exploits. In Section 4, we describe the building blocks of the full Meltdown attack. In Section 5, we present the Meltdown attack. In Section 6, we evaluate the performance of the Meltdown attack on several different systems. In Section 7, we discuss the effects of the software-based KAISER countermeasure and propose solutions in hardware. In Section 8, we discuss related work and conclude our work in Section 9.

    Outline: the remainder of this paper is structured as follows. In Section 2 we describe the fundamental problem introduced by out-of-order execution. In Section 3 we provide a simple example illustrating the side channel Meltdown exploits. In Section 4 we describe the building blocks of the full Meltdown attack. In Section 5 we show how the Meltdown attack is carried out. In Section 6 we evaluate the performance of the Meltdown attack on several different systems. In Section 7 we discuss software and hardware countermeasures against Meltdown: the software solution is the KAISER mechanism, and we also propose suggestions for hardware solutions. In Section 8 we discuss related work, and in Section 9 we give our conclusions.

2. Background

    In this section, we provide background on out-of-order execution, address translation, and cache attacks.

    In this section we cover background on out-of-order execution, address translation, and cache attacks.

2.1 Out-of-order execution

Out-of-order execution is an optimization technique that allows to maximize the utilization of all execution units of a CPU core as exhaustive as possible. Instead of processing instructions strictly in the sequential program order, the CPU executes them as soon as all required resources are available. While the execution unit of the current operation is occupied, other execution units can run ahead. Hence, instructions can be run in parallel as long as their results follow the architectural definition.

Out-of-order execution is an optimization technique that utilizes the execution units of a CPU core as fully as possible. Unlike an in-order CPU, a CPU supporting out-of-order execution need not execute code strictly in program order: as soon as the resources an instruction needs are available (not occupied), the instruction enters an execution unit and runs. If the execution unit needed by the current instruction is occupied, other instructions can run ahead of it (provided the units they need are idle). Thus, under out-of-order execution, instructions run in parallel as long as the results conform to the architectural definition.

In practice, CPUs supporting out-of-order execution support running operations speculatively to the extent that the processor’s out-of-order logic processes instructions before the CPU is certain whether the instruction will be needed and committed. In this paper, we refer to speculative execution in a more restricted meaning, where it refers to an instruction sequence following a branch, and use the term out-of-order execution to refer to any way of getting an operation executed before the processor has committed the results of all prior instructions.

In practice, a CPU's out-of-order execution is bound together with speculative execution: when the CPU cannot yet determine whether the next instruction will definitely need to execute, it predicts, and completes out-of-order execution according to the prediction. In this paper, speculative execution is treated as a restricted concept referring specifically to the execution of the instruction sequence following a branch, while the term out-of-order execution refers to any way of executing an operation before the processor has committed the results of all preceding instructions.

In 1967, Tomasulo [33] developed an algorithm that enabled dynamic scheduling of instructions to allow out-of-order execution. Tomasulo [33] introduced a unified reservation station that allows a CPU to use a data value as it has been computed instead of storing it to a register and re-reading it. The reservation station renames registers to allow instructions that operate on the same physical registers to use the last logical one to solve read-after-write (RAW), write-after-read (WAR) and write-after-write (WAW) hazards. Furthermore, the reservation unit connects all execution units via a common data bus (CDB). If an operand is not available, the reservation unit can listen on the CDB until it is available and then directly begin the execution of the instruction.

In 1967, Tomasulo designed an algorithm [33] that implements dynamic scheduling of instructions, thereby allowing out-of-order execution. Tomasulo [33] designed a unified reservation station for the CPU's execution units. Previously, an execution unit had to read operands from registers and write results back to registers; now, with the reservation station, an execution unit can use it to read operands and to hold results. Consider a concrete read-after-write (RAW) example:

R2 <- R1 + R3

R4 <- R2 + R3

The first instruction computes R1 + R3 and stores the result in R2; the second instruction's computation depends on the value of R2. Without a reservation station, the second instruction could execute only after the first instruction's result had been committed to register R2, because its operand must be loaded from R2. With a reservation station, register R2 can be renamed — call the renamed register R2.rename. The first instruction then stores its result in R2.rename after executing, without committing the final result to R2, and the second instruction fetches its operand directly from R2.rename and executes, resolving the hazard caused by the RAW dependency. WAR and WAW are handled similarly and are not repeated here. (Translator's note: this passage expands on the original sentence to make the reservation station easier to understand.) Furthermore, the reservation station is connected to all execution units via a common data bus (CDB). If an operand is not yet ready, an execution unit can listen on the CDB, and as soon as the operand is obtained, the unit immediately begins executing the instruction.

[Figure 1: Simplified illustration of a single core of the Intel microarchitecture: front end, out-of-order execution engine (back end), and memory subsystem]

On the Intel architecture, the pipeline consists of the front-end, the execution engine (back-end) and the memory subsystem [14]. x86 instructions are fetched by the front-end from the memory and decoded to microoperations (μOPs) which are continuously sent to the execution engine. Out-of-order execution is implemented within the execution engine as illustrated in Figure 1. The Reorder Buffer is responsible for register allocation, register renaming and retiring. Additionally, other optimizations like move elimination or the recognition of zeroing idioms are directly handled by the reorder buffer. The μOPs are forwarded to the Unified Reservation Station that queues the operations on exit ports that are connected to Execution Units. Each execution unit can perform different tasks like ALU operations, AES operations, address generation units (AGU) or memory loads and stores. AGUs as well as load and store execution units are directly connected to the memory subsystem to process its requests.

In the Intel architecture, the pipeline consists of the front end, the execution engine (back end), and the memory subsystem [14]. The front end fetches x86 instructions from memory and decodes them into micro-operations (μOPs), which are continuously sent to the execution engine. Out-of-order execution is implemented inside the execution engine, as shown in Figure 1 above. The Reorder Buffer is responsible for register allocation, register renaming, and committing results to the software-visible registers (a process also called retirement). The reorder buffer additionally handles other optimizations directly, such as move elimination and the recognition of zeroing idioms. The μOPs are forwarded to the unified reservation station, where they queue at output ports that connect directly to the execution units. Each execution unit can perform different tasks, such as ALU operations, AES operations, address generation (AGU), memory loads, and memory stores. The AGUs and the load and store execution units connect directly to the memory subsystem to process memory requests.

Since CPUs usually do not run linear instruction streams, they have branch prediction units that are used to obtain an educated guess of which instruction will be executed next. Branch predictors try to determine which direction of a branch will be taken before its condition is actually evaluated. Instructions that lie on that path and do not have any dependencies can be executed in advance and their results immediately used if the prediction was correct. If the prediction was incorrect, the reorder buffer allows to rollback by clearing the reorder buffer and re-initializing the unified reservation station.

Since the CPU does not always run a linear instruction stream, it has a branch prediction unit. This unit records the outcomes of past branches and uses them to guess the next instruction likely to be executed, determining the branch direction before the actual condition is evaluated. If the instructions on the predicted path have no dependencies, they can be executed in advance; if the prediction was correct, their results can be used immediately. If the prediction was incorrect, the reorder buffer rolls back the results, which is done concretely by clearing the reorder buffer and re-initializing the unified reservation station.

Various approaches to predict the branch exist: With static branch prediction [12], the outcome of the branch is solely based on the instruction itself. Dynamic branch prediction [2] gathers statistics at run-time to predict the outcome. One-level branch prediction uses a 1-bit or 2-bit counter to record the last outcome of the branch [21]. Modern processors often use two-level adaptive predictors [36] that remember the history of the last n outcomes allow to predict regularly recurring patterns. More recently, ideas to use neural branch prediction [34, 18, 32] have been picked up and integrated into CPU architectures [3].

Various approaches to branch prediction exist. With static branch prediction [12], the predicted branch outcome is based solely on the instruction itself. Dynamic branch prediction [2] gathers statistics at run time to predict the outcome. One-level branch prediction uses a 1-bit or 2-bit counter to record recent branch outcomes [21]. Modern processors often use two-level adaptive predictors [36], which remember the history of the last n outcomes and use that history to recognize regularly recurring branch patterns. More recently, ideas from neural branch prediction [34, 18, 32] have been picked up and integrated into CPU architectures [3].

2.2 Address space

To isolate processes from each other, CPUs support virtual address spaces where virtual addresses are translated to physical addresses. A virtual address space is divided into a set of pages that can be individually mapped to physical memory through a multi-level page translation table. The translation tables define the actual virtual to physical mapping and also protection properties that are used to enforce privilege checks, such as readable, writable, executable and user-accessible. The currently used translation table that is held in a special CPU register. On each context switch, the operating system updates this register with the next process’ translation table address in order to implement per process virtual address spaces. Because of that, each process can only reference data that belongs to its own virtual address space. Each virtual address space itself is split into a user and a kernel part. While the user address space can be accessed by the running application, the kernel address space can only be accessed if the CPU is running in privileged mode. This is enforced by the operating system disabling the user accessible property of the corresponding translation tables. The kernel address space does not only have memory mapped for the kernel’s own usage, but it also needs to perform operations on user pages, e.g., filling them with data. Consequently, the entire physical memory is typically mapped in the kernel. On Linux and OS X, this is done via a direct-physical map, i.e., the entire physical memory is directly mapped to a pre-defined virtual address (cf. Figure 2).

[Figure 2: The entire physical memory is mapped into the kernel address space via the direct-physical map at a fixed offset]

To isolate processes from one another, the CPU supports virtual address spaces; but since the CPU issues physical addresses on the bus, the virtual addresses used by a program must be translated into physical addresses. A virtual address space is divided into pages, which are mapped to physical pages through multi-level page tables. Besides the virtual-to-physical mapping, the page tables also define protection properties, such as readable, writable, executable, and user-accessible. The currently used page table is held in a special CPU register (on x86 this register is cr3; on ARM it is the TTBR family of registers). On every context switch, the operating system updates this register with the next process's page-table address, thereby switching the process's virtual address space. As a result, each process can access only data belonging to its own virtual address space. Each process's virtual address space is itself split into a user part and a kernel part. When a process runs in user mode it can access only the user part; only in kernel mode (the CPU's privileged mode) can the kernel part be accessed. The operating system disables the user-accessible property in the page-table entries for the kernel part, thereby forbidding user-mode access to kernel space. The kernel address space not only establishes mappings for the kernel's own memory (e.g., the kernel's text and data segments) but must also operate on user pages, for example to fill them with data. Consequently, the entire physical memory of the system is typically mapped into the kernel address space. On Linux and OS X this is done via a direct-physical map, i.e., the entire physical memory is directly mapped at a predefined virtual address (see Figure 2 above).

Instead of a direct-physical map, Windows maintains a multiple so-called paged pools, non-paged pools, and the system cache. These pools are virtual memory regions in the kernel address space mapping physical pages to virtual addresses which are either required to remain in the memory (non-paged pool) or can be removed from the memory because a copy is already stored on the disk (paged pool). The system cache further contains mappings of all file-backed pages. Combined, these memory pools will typically map a large fraction of the physical memory into the kernel address space of every process.

Instead of a direct-physical map, Windows maintains multiple so-called paged pools, non-paged pools, and the system cache. These pools are virtual memory regions in the kernel address space that map physical pages which either must remain resident in memory (non-paged pool) or can be removed from memory because a copy is already stored on disk (paged pool). The system cache additionally contains mappings of all file-backed pages. Combined, these memory pools typically map a large fraction of physical memory into the kernel address space of every process.

The exploitation of memory corruption bugs often requires the knowledge of addresses of specific data. In order to impede such attacks, address space layout randomization (ASLR) has been introduced as well as nonexecutable stacks and stack canaries. In order to protect the kernel, KASLR randomizes the offsets where drivers are located on every boot, making attacks harder as they now require to guess the location of kernel data structures. However, side-channel attacks allow to detect the exact location of kernel data structures [9, 13, 17] or derandomize ASLR in JavaScript [6]. A combination of a software bug and the knowledge of these addresses can lead to privileged code execution.

Exploiting a memory-corruption bug (a bug that allows memory contents to be modified, causing a crash) usually requires knowing the address of specific data (since that is the address whose contents must be modified). To impede such attacks, address space layout randomization (ASLR), non-executable stacks, and stack canaries were introduced. To protect the kernel, KASLR places drivers at a randomized offset at every boot, which makes attacks harder because the attacker must now guess the location of kernel data structures. However, an attacker can use side-channel attacks to obtain the exact location of kernel data structures [9, 13, 17], or derandomize ASLR from JavaScript [6]. Combining a software bug with knowledge of these addresses can lead to privileged code execution.

2.3 Cache attacks

In order to speed-up memory accesses and address translation, the CPU contains small memory buffers, called caches, that store frequently used data. CPU caches hide slow memory access latencies by buffering frequently used data in smaller and faster internal memory. Modern CPUs have multiple levels of caches that are either private to its cores or shared among them. Address space translation tables are also stored in memory and are also cached in the regular caches.

To speed up memory accesses and address translation, the CPU contains small internal memory buffers, called caches, that hold recently and frequently used data; the CPU cache thus hides the access latency of the slower underlying memory. Modern CPUs have multiple levels of cache, either private to a particular CPU core or shared among several cores. The page tables of an address space are stored in memory, and they too are cached (in the TLB as well as in the regular data caches).

Cache side-channel attacks exploit timing differences that are introduced by the caches. Different cache attack techniques have been proposed and demonstrated in the past, including Evict+Time [28], Prime+Probe [28, 29], and Flush+Reload [35]. Flush+Reload attacks work on a single cache line granularity. These attacks exploit the shared, inclusive last-level cache. An attacker frequently flushes a targeted memory location using the clflush instruction. By measuring the time it takes to reload the data, the attacker determines whether data was loaded into the cache by another process in the meantime. The Flush+Reload attack has been used for attacks on various computations, e.g., cryptographic algorithms [35, 16, 1], web server function calls [37], user input [11, 23, 31], and kernel addressing information [9].

A cache side-channel attack exploits the timing differences introduced by caches: when accessing memory, data that is already cached is returned very quickly, while uncached data is slow to access, and the attack steals data from exactly this timing gap. A variety of cache attack techniques have been proposed and demonstrated, including Evict+Time [28], Prime+Probe [28, 29], and Flush+Reload [35]. Flush+Reload works at single-cache-line granularity and primarily exploits the shared, inclusive last-level cache. The attacker repeatedly flushes the target memory location from the cache using the clflush instruction, then reads the target memory and measures how long the data takes to load. From this timing, the attacker learns whether another process loaded the data into the cache in the meantime. Flush+Reload has been used to attack a variety of computations, e.g., cryptographic algorithms [35, 16, 1], web server function calls [37], user input [11, 23, 31], and kernel addressing information [9].

A special use case are covert channels. Here the attacker controls both, the part that induces the side effect, and the part that measures the side effect. This can be used to leak information from one security domain to another, while bypassing any boundaries existing on the architectural level or above. Both Prime+Probe and Flush+Reload have been used in high-performance covert channels [24, 26, 10].

A special use case of cache side channels is building a covert channel. In this scenario the attacker controls both the sending and the receiving end of the channel: the attacker's program triggers the cache side effect and also measures that side effect. By this means, information can bypass boundary checks at the architectural level or above and leak from one security domain to the outside world. Both Prime+Probe and Flush+Reload have been used to build high-performance covert channels [24, 26, 10].

3. A Toy Example

In this section, we start with a toy example, a simple code snippet, to illustrate that out-of-order execution can change the microarchitectural state in a way that leaks information. However, despite its simplicity, it is used as a basis for Section 4 and Section 5, where we show how this change in state can be exploited for an attack.

    In this section we give a simple example and explain how executing the example code on an out-of-order CPU changes the CPU's microarchitectural state and leaks information. Despite its simplicity, it serves as the basis for Sections 4 and 5, where we show the Meltdown attack in concrete detail.

Listing 1 shows a simple code snippet first raising an (unhandled) exception and then accessing an array. The property of an exception is that the control flow does not continue with the code after the exception, but jumps to an exception handler in the operating system. Regardless of whether this exception is raised due to a memory access, e.g., by accessing an invalid address, or due to any other CPU exception, e.g., a division by zero, the control flow continues in the kernel and not with the next user space instruction.

raise_exception();

// the line below is never reached

access(probe_array[data * 4096]);

The listing above shows a simple code snippet: it first raises an exception (which we do not handle) and then accesses the probe_array array. The exception causes control flow not to continue with the code after it, but to jump to an exception handler in the operating system. Whether the exception is raised by a memory access (e.g., accessing an invalid address) or by any other kind of CPU exception (e.g., a division by zero), control flow transfers into the kernel and continues there, rather than remaining in user space and executing the access to probe_array.

Thus, our toy example cannot access the array in theory, as the exception immediately traps to the kernel and terminates the application. However, due to the out-of-order execution, the CPU might have already executed the following instructions as there is no dependency on the exception. This is illustrated in Figure 3. Due to the exception, the instructions executed out of order are not retired and, thus, never have architectural effects.

[Figure 3: Because of the exception, the instructions executed out of order are never retired and thus have no architectural effect]

Thus, in theory, our example code cannot access the probe_array array — after all, the exception immediately traps into the kernel and terminates the application. But because of out-of-order execution, the CPU may already have executed the instructions following the faulting one; note that those instructions have no dependency on the faulting instruction. This is shown in Figure 3 above. Although the instructions after the faulting one were executed, the exception means they are never committed (note: "instruction retire" and "instruction commit" mean the same thing — making an instruction's execution results visible in software-visible registers or memory; this translation uses "commit" or leaves the term untranslated), so from the CPU architecture's point of view there is no problem at all (that is, at the ISA level a software engineer cannot see that these instructions ever executed).

Although the instructions executed out of order do not have any visible architectural effect on registers or memory, they have microarchitectural side effects. During the out-of-order execution, the referenced memory is fetched into a register and is also stored in the cache. If the out-of-order execution has to be discarded, the register and memory contents are never committed. Nevertheless, the cached memory contents are kept in the cache. We can leverage a microarchitectural side-channel attack such as Flush+Reload [35], which detects whether a specific memory location is cached, to make this microarchitectural state visible. There are other side channels as well which also detect whether a specific memory location is cached, including Prime+Probe [28, 24, 26], Evict+ Reload [23], or Flush+Flush [10]. However, as Flush+ Reload is the most accurate known cache side channel and is simple to implement, we do not consider any other side channel for this example.

Although program order was violated and instructions that should never have executed did run on the CPU, nothing these instructions produced can actually be observed in registers or memory (that is, there is no architectural effect). From the perspective of the CPU microarchitecture, however, there are real side effects. During out-of-order execution, loading a memory value into a register also stores that value in the cache. If the out-of-order results must be discarded, neither the register nor the memory values are committed — but the cached contents are not discarded and remain in the cache. At this point we can use a microarchitectural side-channel attack such as Flush+Reload [35], which detects whether a specific memory location is cached, to make this microarchitectural state visible to the user. There are other methods for detecting whether a memory location is cached, including Prime+Probe [28, 24, 26], Evict+Reload [23], and Flush+Flush [10]. But since Flush+Reload is the most accurate known cache side channel and is very simple to implement, we focus on Flush+Reload in this paper.

Based on the value of data in this toy example, a different part of the cache is accessed when executing the memory access out of order. As data is multiplied by 4096, data accesses to probe array are scattered over the array with a distance of 4 kB (assuming an 1 B data type for probe array). Thus, there is an injective mapping from the value of data to a memory page, i.e., there are no two different values of data which result in an access to the same page. Consequently, if a cache line of a page is cached, we know the value of data. The spreading over different pages eliminates false positives due to the prefetcher, as the prefetcher cannot access data across page boundaries [14].

Returning to the example code in the listing above: probe_array is an array organized in 4 KB blocks, and varying the value of the data variable walks through the array in 4 KB strides. If, during out-of-order execution, the 4 KB block of probe_array selected by data is accessed, the data of the corresponding page (the 4 KB block within probe_array) is loaded into the cache. Therefore, by scanning the cache status of each page in probe_array, the program can deduce the value of data (values of data correspond one-to-one to pages of probe_array). On Intel processors the prefetcher does not cross page boundaries [14], so the cache states of the pages are completely independent; spreading the cache probes across separate pages is precisely what eliminates false positives from the prefetcher.

Figure 4 shows the result of a Flush+Reload measurement iterating over all pages, after executing the out-of-order snippet with data = 84. Although the array access should not have happened due to the exception, we can clearly see that the index which would have been accessed is cached. Iterating over all pages (e.g., in the exception handler) shows only a cache hit for page 84. This shows that even instructions which are never actually executed, change the microarchitectural state of the CPU. Section 4 modifies this toy example to not read a value, but to leak an inaccessible secret.

[Figure 4: Flush+Reload access times for all 256 pages of probe_array; only page 84 shows a cache hit]

Figure 4 above plots, for each page of probe_array, the access time obtained by iterating over the pages with Flush+Reload. The x-axis is the page index (256 pages in total) and the y-axis is the access time: on a cache miss the access takes over 400 cycles, on a cache hit under about 200 cycles — a clearly distinguishable difference. The figure shows that although, due to the exception, the access to probe_array should never have happened, page data = 84 is plainly a cache hit. This demonstrates that under out-of-order execution, instructions that should never have executed also affect the CPU's microarchitectural state. In the following sections we modify the example code to steal secret data.

4. Building Blocks of the Attack

The toy example in Section 3 illustrated that side-effects of out-of-order execution can modify the microarchitectural state to leak information. While the code snippet reveals the data value passed to a cache-side channel, we want to show how this technique can be leveraged to leak otherwise inaccessible secrets. In this section, we want to generalize and discuss the necessary building blocks to exploit out-of-order execution for an attack.

In the previous section, a simple code snippet showed that the side effects of out-of-order execution can modify the microarchitectural state and thereby leak information. In that snippet we saw the value of the data variable being passed into a cache side channel; below we describe in detail how this technique can be leveraged to leak otherwise protected data. In this section we generalize and discuss the building blocks needed to exploit out-of-order execution for an attack.

The adversary targets a secret value that is kept somewhere in physical memory. Note that register contents are also stored in memory upon context switches, i.e., they are also stored in physical memory. As described in Section 2.2, the address space of every process typically includes the entire user space, as well as the entire kernel space, which typically also has all physical memory (inuse) mapped. However, these memory regions are only accessible in privileged mode (cf. Section 2.2).

The attacker's target is a secret value stored somewhere in physical memory. Note that register contents are also saved to physical memory on context switches. As described in Section 2.2, the address space of every process typically includes the entire user address space as well as the entire kernel address space (into which all in-use physical memory is mapped). So although the kernel-space mappings are present in the process's address space, these memory regions are accessible only in privileged mode (cf. Section 2.2).

In this work, we demonstrate leaking secrets by bypassing the privileged-mode isolation, giving an attacker full read access to the entire kernel space including any physical memory mapped, including the physical memory of any other process and the kernel. Note that Kocher et al. [19] pursue an orthogonal approach, called Spectre Attacks, which trick speculative executed instructions into leaking information that the victim process is authorized to access. As a result, Spectre Attacks lack the privilege escalation aspect of Meltdown and require tailoring to the victim process’s software environment, but apply more broadly to CPUs that support speculative execution and are not stopped by KAISER.

In this work we bypass the address-space isolation mechanism, giving the attacker full read access to the entire kernel space, including the direct-physical map; and through the direct map, the attacker can access the physical memory of any other process and of the kernel. Note that Kocher et al. [19] pursue an orthogonal approach, called the Spectre attacks, which leak a target process's secrets via speculative execution. Spectre attacks therefore lack the privilege-escalation aspect of Meltdown and must be tailored to the target process's software environment. On the other hand, Spectre affects more CPUs (any CPU supporting speculative execution is affected), and KAISER cannot stop Spectre attacks.

The full Meltdown attack consists of two building blocks, as illustrated in Figure 5. The first building block of Meltdown is to make the CPU execute one or more instructions that would never occur in the executed path. In the toy example (cf. Section 3), this is an access to an array, which would normally never be executed, as the previous instruction always raises an exception. We call such an instruction, which is executed out of order, leaving measurable side effects, a transient instruction. Furthermore, we call any sequence of instructions containing at least one transient instruction a transient instruction sequence.

(圖5:Meltdown攻擊的兩個組件概覽)

完整的meltdown攻擊由兩個組件構成,如上圖所示。第一個組件是使CPU執行一個或多個在正常路徑中永遠不會執行的指令。在第三章中的簡單示例代碼中,對數組的訪問指令按理說是不會執行,因爲前面的指令總是觸發異常。我們稱這種指令爲瞬態指令(transient instruction),瞬態指令在亂序執行的時候被CPU執行(正常情況下不會執行),留下可測量的副作用。此外,我們把任何包含至少一個瞬態指令的指令序列稱爲瞬態指令序列。

In order to leverage transient instructions for an attack, the transient instruction sequence must utilize a secret value that an attacker wants to leak. Section 4.1 describes building blocks to run a transient instruction sequence with a dependency on a secret value.

爲了利用瞬態指令來完成攻擊,瞬態指令序列必須使用攻擊者想要泄露的祕密值。第4.1節將描述如何構建依賴於祕密值的瞬態指令序列。

The second building block of Meltdown is to transfer the microarchitectural side effect of the transient instruction sequence to an architectural state to further process the leaked secret. Thus, Section 4.2 describes building blocks to transfer a microarchitectural side effect to an architectural state using a covert channel.

Meltdown的第二個組件主要用來檢測在瞬態指令序列執行完畢之後,在CPU微架構上產生的side effect。並將其轉換成軟件可以感知的CPU體系結構的狀態,從而將數據泄露出來。因此,在4.2節中描述的第二個組件主要是使用隱蔽信道來把CPU微架構的副作用轉換成CPU architectural state。

1、執行瞬態指令(executing transient instructions)

The first building block of Meltdown is the execution of transient instructions. Transient instructions basically occur all the time, as the CPU continuously runs ahead of the current instruction to minimize the experienced latency and thus maximize the performance (cf. Section 2.1). Transient instructions introduce an exploitable side channel if their operation depends on a secret value. We focus on addresses that are mapped within the attacker’s process, i.e., the user-accessible user space addresses as well as the user-inaccessible kernel space addresses. Note that attacks targeting code that is executed within the context (i.e., address space) of another process are possible [19], but out of scope in this work, since all physical memory (including the memory of other processes) can be read through the kernel address space anyway.

Meltdown的第一個組件是執行瞬態指令。其實瞬態指令時時刻刻都在發生,因爲CPU在執行當前指令之外,往往會提前執行其後的那些指令,從而最小化延遲、最大限度地提高CPU性能(參見第2.1節的描述)。如果瞬態指令的操作依賴於一個受保護的值,那麼它就引入了一個可利用的側信道。本文主要關注攻擊者進程地址空間中映射的地址,即用戶態可訪問的用戶空間地址,以及用戶態不可訪問的內核空間地址。攻擊在其他進程上下文(即地址空間)中執行的代碼也是可能的[19],不過不在本文討論範圍之內,畢竟攻擊者進程可以通過內核地址空間讀取系統中所有物理內存,而其他進程的數據也就保存在物理內存的某個地址上。

Accessing user-inaccessible pages, such as kernel pages, triggers an exception which generally terminates the application. If the attacker targets a secret at a user inaccessible address, the attacker has to cope with this exception. We propose two approaches: With exception handling, we catch the exception effectively occurring after executing the transient instruction sequence, and with exception suppression, we prevent the exception from occurring at all and instead redirect the control flow after executing the transient instruction sequence. We discuss these approaches in detail in the following.

運行於用戶態時訪問特權頁面(例如內核頁面)會觸發一個異常,該異常通常會終止應用程序。如果攻擊者的目標是一個用戶態不可訪問地址中保存的數據,那麼攻擊者必須處理這個異常。我們提出兩種方法:一種是異常處理(exception handling),即設置異常處理函數,在瞬態指令序列執行完畢、異常實際發生之後捕獲它;另一種是異常抑制(exception suppression),即阻止異常的發生,並在瞬態指令序列執行後重定向控制流。下面我們將詳細討論這兩種方法。

Exception handling. A trivial approach is to fork the attacking application before accessing the invalid memory location that terminates the process, and only access the invalid memory location in the child process. The CPU executes the transient instruction sequence in the child process before crashing. The parent process can then recover the secret by observing the microarchitectural state, e.g., through a side-channel.

程序自己定義異常處理函數。

一個簡單的方法是在訪問內核地址(這個操作會觸發異常並中止程序的執行)之前進行fork的操作,並只在子進程中訪問內核地址,觸發異常。在子進程crash之前,CPU已經執行了瞬態指令序列。在父進程中可以通過觀察CPU微架構狀態來盜取內核空間的數據。

It is also possible to install a signal handler that will be executed if a certain exception occurs, in this specific case a segmentation fault. This allows the attacker to issue the instruction sequence and prevent the application from crashing, reducing the overhead as no new process has to be created.

當然,你也可以設置信號處理函數。異常觸發後將執行該信號處理函數(在這個場景下,異常是segmentation fault)。這種方法的好處是應用程序不會crash,不需要創建新進程,開銷比較小。

Exception suppression.

這種方法和Transactional memory相關,有興趣的同學可以自行閱讀原文。

2、構建隱蔽通道(building covert channel)

The second building block of Meltdown is the transfer of the microarchitectural state, which was changed by the transient instruction sequence, into an architectural state (cf. Figure 5). The transient instruction sequence can be seen as the sending end of a microarchitectural covert channel. The receiving end of the covert channel receives the microarchitectural state change and deduces the secret from the state. Note that the receiver is not part of the transient instruction sequence and can be a different thread or even a different process e.g., the parent process in the fork-and-crash approach.

第二個Meltdown組件主要是用來把執行瞬態指令序列後CPU微架構狀態變化的信息轉換成相應的體系結構狀態(參考上圖)。瞬態指令序列可以認爲是微架構隱蔽通道的發送端,通道的接收端用來接收微架構狀態的變化信息,從這些狀態變化中推導出被保護的數據。需要注意的是:接收端並不是瞬態指令序列的一部分,可以來自其他的線程甚至是其他的進程。例如上節我們使用fork的那個例子中,瞬態指令序列在子進程中,而接收端位於父進程中。

We leverage techniques from cache attacks, as the cache state is a microarchitectural state which can be reliably transferred into an architectural state using various techniques [28, 35, 10]. Specifically, we use Flush+Reload [35], as it allows to build a fast and low-noise covert channel. Thus, depending on the secret value, the transient instruction sequence (cf. Section 4.1) performs a regular memory access, e.g., as it does in the toy example (cf. Section 3).

我們可以利用緩存攻擊(cache attack)技術:緩存狀態是微架構狀態之一,可以使用各種技術[28, 35, 10]將其穩定地轉換成CPU體系結構狀態。具體來說,我們使用Flush+Reload技術[35],因爲該技術允許建立一個快速的、低噪聲的隱蔽通道。因此,瞬態指令序列(參見第4.1節)會根據祕密數據的值執行一次常規的內存訪問,就像第3節給出的簡單示例程序所做的那樣。

After the transient instruction sequence accessed an accessible address, i.e., this is the sender of the covert channel, the address is cached for subsequent accesses. The receiver can then monitor whether the address has been loaded into the cache by measuring the access time to the address. Thus, the sender can transmit a ‘1’-bit by accessing an address which is loaded into the monitored cache, and a ‘0’-bit by not accessing such an address.

在隱蔽通道的發送端,瞬態指令序列會訪問一個普通內存地址,從而導致該地址的數據被加載到了cache(爲了加速後續訪問)。然後,接收端可以通過測量內存地址的訪問時間來監視數據是否已加載到緩存中。因此,發送端可以通過訪問內存地址(會加載到cache中)傳遞bit 1的信息,或者通過不訪問內存地址(不會加載到cache中)來發送bit 0信息。而接收端可以通過監視cache的信息來接收這個bit 0或者bit 1的信息。

Using multiple different cache lines, as in our toy example in Section 3, allows to transmit multiple bits at once. For every of the 256 different byte values, the sender accesses a different cache line. By performing a Flush+Reload attack on all of the 256 possible cache lines, the receiver can recover a full byte instead of just one bit. However, since the Flush+Reload attack takes much longer (typically several hundred cycles) than the transient instruction sequence, transmitting only a single bit at once is more efficient. The attacker can simply do that by shifting and masking the secret value accordingly.

使用一個cacheline可以傳遞一個bit,如果使用多個不同的cacheline(類似我們在第3章中的簡單示例代碼一樣),就可以同時傳輸多個比特。一個Byte(8-bit)有256個不同的值,針對每一個值,發送端都會訪問不同的緩存行,這樣通過對所有256個可能的緩存行進行Flush+Reload攻擊,接收端可以恢復一個完整字節而不是一個bit。不過,由於Flush+Reload攻擊所花費的時間比執行瞬態指令序列要長得多(通常是幾百個cycle),所以只傳輸一個bit是更有效的。攻擊者可以通過shift和mask來完成保密數據逐個bit的盜取。

Note that the covert channel is not limited to microarchitectural states which rely on the cache. Any microarchitectural state which can be influenced by an instruction (sequence) and is observable through a side channel can be used to build the sending end of a covert channel. The sender could, for example, issue an instruction (sequence) which occupies a certain execution port such as the ALU to send a ‘1’-bit. The receiver measures the latency when executing an instruction (sequence) on the same execution port. A high latency implies that the sender sends a ‘1’-bit, whereas a low latency implies that sender sends a ‘0’-bit. The advantage of the Flush+ Reload cache covert channel is the noise resistance and the high transmission rate [10]. Furthermore, the leakage can be observed from any CPU core [35], i.e., rescheduling events do not significantly affect the covert channel.

需要注意的是:隱蔽信道並非總是依賴於緩存。只要CPU微架構狀態會被瞬態指令序列影響,並且可以通過side channel觀察這個狀態的改變,那麼該微架構狀態就可以用來構建隱蔽通道的發送端。例如,發送端可以執行一條指令(該指令會佔用相關執行單元(如ALU)的端口),來發送一個“1”這個bit。接收端可以在同一個執行單元端口上執行指令,同時測量時間延遲。高延遲意味着發送方發送一個“1”位,而低延遲意味着發送方發送一個“0”位。Flush+ Reload隱蔽通道的優點是抗噪聲和高傳輸速率[ 10 ]。此外,我們可以從任何cpu core上觀察到數據泄漏[ 35 ],即調度事件並不會顯著影響隱蔽信道。

五、熔斷(Meltdown)

In this section, we present Meltdown, a powerful attack allowing to read arbitrary physical memory from an unprivileged user program, comprised of the building blocks presented in Section 4. First, we discuss the attack setting to emphasize the wide applicability of this attack. Second, we present an attack overview, showing how Meltdown can be mounted on both Windows and Linux on personal computers as well as in the cloud. Finally, we discuss a concrete implementation of Meltdown allowing to dump kernel memory with up to 503 KB/s.

在這一章我們將向您展示Meltdown的威力:通過一個普通的非特權用戶程序讀取系統中任意位置的物理內存,整個攻擊由第4章描述的兩個組件構成。首先,我們討論攻擊設定,從中可以看出Meltdown這種攻擊具有非常廣泛的適用性。其次,我們對Meltdown攻擊進行概述,並展示它如何攻擊安裝Windows和Linux的個人計算機以及雲端的虛擬機。最後,我們討論一個具體的實現,該實現允許以最高503 KB/s的速度dump內核內存。

Attack setting.

In our attack, we consider personal computers and virtual machines in the cloud. In the attack scenario, the attacker has arbitrary unprivileged code execution on the attacked system, i.e., the attacker can run any code with the privileges of a normal user. However, the attacker has no physical access to the machine. Further, we assume that the system is fully protected with state-of-the-art software-based defenses such as ASLR and KASLR as well as CPU features like SMAP, SMEP, NX, and PXN. Most importantly, we assume a completely bug-free operating system, thus, no software vulnerability exists that can be exploited to gain kernel privileges or leak information. The attacker targets secret user data, e.g., passwords and private keys, or any other valuable information.

攻擊設定如下:

我們考慮個人計算機和雲服務器上的虛擬機兩種攻擊場景。在攻擊場景中,攻擊者可以在被攻擊系統上執行任意非特權代碼,也就是說攻擊者只能以普通用戶的權限運行代碼,並且無法物理接觸機器。進一步,我們假設被攻擊系統已經部署了最先進的基於軟件的防禦措施(例如ASLR和KASLR),CPU也具備SMAP、SMEP、NX和PXN等特性。最重要的是,我們假設操作系統完全沒有bug,即不存在可被利用來獲取內核權限或泄露信息的軟件漏洞。攻擊者的目標是用戶的祕密數據,例如密碼和私鑰,或任何其他有價值的信息。

1、概述

Meltdown combines the two building blocks discussed in Section 4. First, an attacker makes the CPU execute a transient instruction sequence which uses an inaccessible secret value stored somewhere in physical memory (cf. Section 4.1). The transient instruction sequence acts as the transmitter of a covert channel (cf. Section 4.2), ultimately leaking the secret value to the attacker.

Meltdown組合了第4章中討論的兩個組件。首先,攻擊者讓CPU執行一個瞬態指令序列,該指令序列會使用保存在物理內存某處、本來不可訪問的祕密數據(參見第4.1節)。瞬態指令序列充當隱蔽通道的發送端(參見第4.2節),最終將祕密數據泄漏給攻擊者。

Meltdown consists of 3 steps:

Step 1 The content of an attacker-chosen memory location,which is inaccessible to the attacker, is loaded into a register.

Step 2 A transient instruction accesses a cache line based on the secret content of the register.

Step 3 The attacker uses Flush+Reload to determine the accessed cache line and hence the secret stored at the chosen memory location.

By repeating these steps for different memory locations, the attacker can dump the kernel memory, including the entire physical memory.

Meltdown攻擊包括3個步驟:

步驟1:攻擊者訪問祕密數據所在的內存位置(該內存是攻擊者沒有權限訪問的),並加載到一個寄存器中。

步驟2:瞬態指令基於寄存器中保存的祕密數據內容訪問cache line。

步驟3:攻擊者使用Flush+Reload來確定在步驟2中訪問的cache line,從而恢復在步驟1中讀取的祕密數據。

在不同的內存地址上不斷重複上面的步驟,攻擊者可以dump整個內核地址空間的數據,這也就包括了整個物理內存。

Listing 2 shows the basic implementation of the transient instruction sequence and the sending part of the covert channel, using x86 assembly instructions. Note that this part of the attack could also be implemented entirely in higher level languages like C. In the following, we will discuss each step of Meltdown and the corresponding code line in Listing 2.

代碼列表2:Meltdown的核心指令序列(x86彙編):

1 ; rcx = kernel address, rbx = probe array
2 xor rax, rax
3 retry:
4 mov al, byte [rcx]
5 shl rax, 0xc
6 jz retry
7 mov rbx, qword [rbx + rax]

上面的代碼列表2顯示了瞬態指令序列和隱蔽通道發送部分的基本實現(使用x86彙編指令)。需要注意的是:這部分攻擊代碼也完全可以用C這樣的高級語言來實現。在下文中,我們將討論Meltdown的每個步驟以及代碼列表2中對應的代碼行。

Step 1: Reading the secret. To load data from the main memory into a register, the data in the main memory is referenced using a virtual address. In parallel to translating a virtual address into a physical address, the CPU also checks the permission bits of the virtual address, i.e., whether this virtual address is user accessible or only accessible by the kernel. As already discussed in Section 2.2, this hardware-based isolation through a permission bit is considered secure and recommended by the hardware vendors. Hence, modern operating systems always map the entire kernel into the virtual address space of every user process.

步驟1:讀內存中的祕密數據。爲了將數據從主存儲器加載到寄存器中,我們使用虛擬地址來訪問主存中的數據。在將虛擬地址轉換爲物理地址的同時,CPU還會檢查虛擬地址的權限位:這個虛擬地址可否被用戶態訪問,還是只能在內核態中訪問。正如在第2.2節中已經討論過的那樣,我們都認爲這個基於硬件的地址空間隔離是安全的,並且硬件廠商也推薦使用這種隔離方法。因此,現代操作系統總是將整個內核地址空間映射到每個用戶進程的虛擬地址空間。

As a consequence, all kernel addresses lead to a valid physical address when translating them, and the CPU can access the content of such addresses. The only difference to accessing a user space address is that the CPU raises an exception as the current permission level does not allow to access such an address. Hence, the user space cannot simply read the contents of such an address. However, Meltdown exploits the out-of-order execution of modern CPUs, which still executes instructions in the small time window between the illegal memory access and the raising of the exception.

訪問內核地址空間的時候,只要創建了虛擬地址的映射(即可以通過頁表翻譯出一個有效的物理地址),CPU都可以訪問這些地址的內容。和訪問用戶地址空間唯一不同是會進行權限檢查,由於當前CPU權限級別不夠而訪問內核空間地址的時候會觸發異常。因此,用戶空間不能簡單地通過讀取內核地址的內容來獲得祕密數據。然而,亂序執行的特性允許CPU在一個很小的時間窗口內(從執行了非法內存訪問的指令到觸發異常),仍然會繼續執行指令。Meltdown就是利用了亂序執行的特性完成了攻擊。

In line 4 of Listing 2, we load the byte value located at the target kernel address, stored in the RCX register, into the least significant byte of the RAX register represented by AL. As explained in more detail in Section 2.1, the MOV instruction is fetched by the core, decoded into μOPs, allocated, and sent to the reorder buffer. There, architectural registers (e.g., RAX and RCX in Listing 2) are mapped to underlying physical registers enabling out-of-order execution. Trying to utilize the pipeline as much as possible, subsequent instructions (lines 5-7) are already decoded and allocated as μOPs as well. The μOPs are further sent to the reservation station holding the μOPs while they wait to be executed by the corresponding execution unit. The execution of a μOP can be delayed if execution units are already used to their corresponding capacity or operand values have not been calculated yet.

在上面代碼列表中的第4行,我們訪問了位於內核地址空間的memory(地址保存在RCX寄存器),獲取了一個字節的數據,保存在AL寄存器(即RAX寄存器的8個LSB比特)。根據2.1節中的描述,MOV指令由CPU core取指,解碼成μOPs,分配併發送到重排序緩衝區。在那裏,architectural register(軟件可見的寄存器,例如RAX和RCX)會被映射成底層的物理寄存器以便實現亂序執行。爲了儘可能地利用流水線,隨後的指令(5-7行的代碼)也已經解碼並分配爲μOPs。這些μOPs會進一步送到保留站(暫存μOPs),在保留站中,μOPs會等待相應的執行單元空閒:如果執行單元準備好,該μOPs會立刻執行;如果執行單元已經達到了容量的上限(例如有3個加法器,那麼可以同時進行3個加法運算,第四個加法μOPs就需要等待了)或μOPs的操作數值尚未計算出來,μOPs則被延遲執行。

When the kernel address is loaded in line 4, it is likely that the CPU already issued the subsequent instructions as part of the out-of-order execution, and that their corresponding μOPs wait in the reservation station for the content of the kernel address to arrive. As soon as the fetched data is observed on the common data bus, the μOPs can begin their execution.

當在程序第4行加載內核地址到寄存器的時候,由於亂序執行,很可能CPU已經把後續指令發射出去,並且它們相應的μOPs會在保留站中等待內核地址的內容到來。一旦在公共數據總線上觀察到所獲取的內核地址數據,這些μOPs就會立刻開始執行。

When the μOPs finish their execution, they retire in order, and, thus, their results are committed to the architectural state. During the retirement, any interrupts and exception that occurred during the execution of the instruction are handled. Thus, if the MOV instruction that loads the kernel address is retired, the exception is registered and the pipeline is flushed to eliminate all results of subsequent instructions which were executed out of order. However, there is a race condition between raising this exception and our attack step 2 which we describe below.

當μOPs執行完畢後,它們就按順序進行retire(這個術語叫做retire,很難翻譯,這裏就不翻譯了,但是和commit是一個意思),因此,μOPs的結果會被提交併體現在體系結構狀態上。在提交過程中,在執行指令期間發生的任何中斷和異常都會被處理。因此,在提交MOV指令的時候發現該指令操作的是內核地址,這時候會觸發異常。這時候CPU流水線會執行flush操作,由於亂序執行而提前執行的那些指令(Mov指令之後)結果會被清掉。然而,在觸發這個異常和我們執行的攻擊步驟2之間有一個競爭條件(race condition),我們在下面描述。

As reported by Gruss et al. [9], prefetching kernel addresses sometimes succeeds. We found that prefetching the kernel address can slightly improve the performance of the attack on some systems.

根據Gruss等人的研究[9],預取內核地址有時會成功。我們發現:預取內核地址可以略微提升某些系統上的攻擊性能。

Step 2: Transmitting the secret. The instruction sequence from step 1 which is executed out of order has to be chosen in a way that it becomes a transient instruction sequence. If this transient instruction sequence is executed before the MOV instruction is retired (i.e., raises the exception), and the transient instruction sequence performed computations based on the secret, it can be utilized to transmit the secret to the attacker.

步驟2:傳送祕密數據

步驟1中亂序執行的指令序列必須經過精心選擇,使其成爲瞬態指令序列。如果該瞬態指令序列在MOV指令retire(即觸發異常)之前被執行,並且基於祕密數據進行了計算,那麼它就可以被用來向攻擊者傳遞祕密數據。

As already discussed, we utilize cache attacks that allow to build fast and low-noise covert channel using the CPU’s cache. Thus, the transient instruction sequence has to encode the secret into the microarchitectural cache state, similarly to the toy example in Section 3.

正如之前已經討論過的,我們利用緩存攻擊,即利用CPU的高速緩存建立快速和低噪聲的隱蔽通道。因此,瞬態指令序列必須把祕密數據編碼到微架構的緩存狀態中,這個過程類似於第三節中的簡單示例程序。

We allocate a probe array in memory and ensure that no part of this array is cached. To transmit the secret, the transient instruction sequence contains an indirect memory access to an address which is calculated based on the secret (inaccessible) value. In line 5 of Listing 2 the secret value from step 1 is multiplied by the page size, i.e., 4 KB. The multiplication of the secret ensures that accesses to the array have a large spatial distance to each other. This prevents the hardware prefetcher from loading adjacent memory locations into the cache as well. Here, we read a single byte at once, hence our probe array is 256×4096 bytes, assuming 4KB pages.

我們在內存中分配一個探測數組,並確保該數組的所有內存都沒有被cached。爲了傳遞祕密數據,瞬態指令序列包含對探測數組的間接內存訪問,具體的訪問地址是基於那個祕密數據的(該祕密數據是用戶態不可訪問的)。具體可以參考上面列表中的第5行代碼:第1步獲取的祕密數據會乘以頁面大小,即4 KB(代碼使用了移位操作,是一樣的意思)。這個乘法操作確保了對數組的訪問具有較大的空間距離。這可以防止硬件prefetcher把相鄰存儲單元的數據加載到緩存中。在這示例中,由於一次只讀出一個字節,所以我們的探測數組是256×4096字節(假設頁面大小是4KB)。

Note that in the out-of-order execution we have a noise-bias towards register value ‘0’. We discuss the reasons for this in Section 5.2. However, for this reason, we introduce a retry-logic into the transient instruction sequence. In case we read a ‘0’, we try to read the secret again (step 1). In line 7, the multiplied secret is added to the base address of the probe array, forming the target address of the covert channel. This address is read to cache the corresponding cache line. Consequently, our transient instruction sequence affects the cache state based on the secret value that was read in step 1.

注意:在亂序執行中,我們對寄存器值“0”有一個噪聲偏置(noise-bias)。我們在第5.2節討論了具體的原因。正是由於這個原因,我們在瞬態指令序列中引入了重試邏輯。如果我們讀到了“0”值,我們試着重新讀這個祕密數據(第1步)。在代碼的第7行中,將祕密數據乘以4096並累加到探測數組的基地址中,從而形成隱蔽信道的目標地址。讀取該目標地址可以將數據加載到對應的cacheline中。因此,瞬態指令序列根據第1步中讀取的祕密數據修改了探測數組對應的緩存狀態。

Since the transient instruction sequence in step 2 races against raising the exception, reducing the runtime of step 2 can significantly improve the performance of the attack. For instance, taking care that the address translation for the probe array is cached in the TLB increases the attack performance on some systems.

由於步驟2中的瞬態指令序列需要和異常的觸發相競爭,因此減少步驟2的運行時間可以顯著提高攻擊的性能。例如:確保探測數組的地址翻譯已經預先緩存在TLB中,可以提高某些系統上的攻擊性能。

Step 3: Receiving the secret. In step 3, the attacker recovers the secret value (step 1) by leveraging a microarchitectural side-channel attack (i.e., the receiving end of a microarchitectural covert channel) that transfers the cache state (step 2) back into an architectural state. As discussed in Section 4.2, Meltdown relies on Flush+Reload to transfer the cache state into an architectural state.

步驟3:接收祕密數據。

在步驟3中,攻擊者利用微架構側信道攻擊(即微架構隱蔽信道的接收端)將cache state轉換成了軟件可以感知的體系結構狀態(architectural state),從而恢復了祕密數據。正如第4.2節中所討論的,meltdown依賴於Flush+Reload來將緩存狀態轉換爲CPU體系結構狀態。

When the transient instruction sequence of step 2 is executed, exactly one cache line of the probe array is cached. The position of the cached cache line within the probe array depends only on the secret which is read in step 1. Thus, the attacker iterates over all 256 pages of the probe array and measures the access time for every first cache line (i.e., offset) on the page. The number of the page containing the cached cache line corresponds directly to the secret value.

執行步驟2中的瞬態指令序列時,探測數組中恰好只有一個cacheline被加載。該cacheline在探測數組中的位置僅取決於步驟1中讀取的祕密數據。因此,攻擊者遍歷探測數組的全部256個頁面,測量每個頁面第一個cacheline(即偏移)的訪問時間,被加載進緩存的cacheline所在的頁面編號就直接對應着祕密數據的數值。

Dumping the entire physical memory. By repeating all 3 steps of Meltdown, the attacker can dump the entire memory by iterating over all different addresses. However, as the memory access to the kernel address raises an exception that terminates the program, we use one of the methods described in Section 4.1 to handle or suppress the exception.

Dump整個物理內存:

通過重複上面的3個步驟,同時修改不同的攻擊地址,攻擊者可以dump所有內存。但是,由於對內核地址的內存訪問引發了一個終止程序的異常,所以我們使用第4.1節中描述的方法來處理或抑制這個異常。

As all major operating systems also typically map the entire physical memory into the kernel address space (cf. Section 2.2) in every user process, Meltdown is not only limited to reading kernel memory but it is capable of reading the entire physical memory of the target machine.

在目前所有的主流操作系統中,我們通常會把整個物理內存映射到內核地址空間(參見第2.2節),而每個用戶進程中又包括內核地址空間部分。因此Meltdown不僅能讀取內核地址空間的內存值,而且能夠讀取整個系統的物理內存。

2、優化和限制(optimizations and limitations)

The case of 0. If the exception is triggered while trying to read from an inaccessible kernel address, the register where the data should be stored, appears to be zeroed out. This is reasonable because if the exception is unhandled, the user space application is terminated, and the value from the inaccessible kernel address could be observed in the register contents stored in the core dump of the crashed process. The direct solution to fix this problem is to zero out the corresponding registers. If the zeroing out of the register is faster than the execution of the subsequent instruction (line 5 in Listing 2), the attacker may read a false value in the third step. To prevent the transient instruction sequence from continuing with a wrong value, i.e., ‘0’, Meltdown retries reading the address until it encounters a value different from ‘0’ (line 6). As the transient instruction sequence terminates after the exception is raised, there is no cache access if the secret value is 0. Thus, Meltdown assumes that the secret value is indeed ‘0’ if there is no cache hit at all.

讀出數值是0的場景。

根據前面的描述,在指令commit階段,當檢測到用戶態訪問內核地址的時候,除了觸發異常,CPU還會清除指令的操作結果,也就是說AL寄存器會被清零。如果瞬態指令序列在和異常的競爭中失敗了(寄存器清零早於上面程序列表中第5行代碼的執行),那麼很可能從內核地址讀出的並非其真實值,而是清零後的數值。對寄存器清零也是合理的,因爲如果異常沒有被處理,用戶空間的應用程序會終止,該進程的core dump文件中會保留寄存器的內容;如果不清零,那麼內核空間的數據就可能通過core dump文件泄露出去,清零可以修正這個問題,保證內核空間數據的安全。爲了防止瞬態指令序列繼續使用錯誤的“0”值,Meltdown會重讀該地址,直到讀出非“0”值(第6行代碼)。

你可能會問:如果祕密數據就是0怎麼辦?其實當異常觸發後,瞬態指令序列終止執行,如果祕密數據確實等於0,則不存在任何cacheline被加載。因此,meltdown在進行探測數據cacheline掃描過程中,如果沒有任何cacheline命中,那麼祕密數據實際上就是“0”。

The loop is terminated by either the read value not being ‘0’ or by the raised exception of the invalid memory access. Note that this loop does not slow down the attack measurably, since, in either case, the processor runs ahead of the illegal memory access, regardless of whether ahead is a loop or ahead is a linear control flow. In either case, the time until the control flow returned from exception handling or exception suppression remains the same with and without this loop. Thus, capturing read ‘0’s beforehand and recovering early from a lost race condition vastly increases the reading speed.

無論是讀出非“0”數值,還是無效地址訪問觸發了異常,代碼中的循環都會終止。注意,這個循環不會明顯降低攻擊的性能,因爲在上面兩種情況中,CPU都會提前運行非法內存訪問指令之後的那些指令,而CPU並不關心後面是一個循環控制流還是一個線性控制流。無論哪一種情況,從異常處理(或者異常抑制)返回的時間都是一樣的,和有沒有這個循環無關。因此,提前捕獲讀出的“0”值,儘早從競爭失敗中恢復,可以大大提高讀取速度。

Single-bit transmission :

In the attack description in Section 5.1, the attacker transmitted 8 bits through the covert channel at once and performed 2^8 = 256 Flush+Reload measurements to recover the secret. However, there is a clear trade-off between running more transient instruction sequences and performing more Flush+Reload measurements. The attacker could transmit an arbitrary number of bits in a single transmission through the covert channel, by reading more bits using a MOV instruction for a larger data value. Furthermore, the attacker could mask bits using additional instructions in the transient instruction sequence. We found the number of additional instructions in the transient instruction sequence to have a negligible influence on the performance of the attack.

單個bit數據的發送:

在第5.1節的描述中,攻擊者通過隱蔽通道一次可以傳輸8個bit,接收端執行2^8=256次Flush+Reload測量來恢復祕密數據。不過,我們需要在運行更多的瞬態指令序列和執行更多的Flush+Reload測量之間進行權衡。攻擊者可以通過隱蔽通道在一次傳輸中發送任意比特的數據,當然這需要使用MOV指令去讀取更多bit的祕密數據。此外,攻擊者可以在瞬態指令序列中增加mask的操作(這樣可以傳送更少的bit,從而減少接收端Flush+Reload的次數)。我們發現在瞬態指令序列中增加的指令數對攻擊的性能影響是微不足道的。

The performance bottleneck in the generic attack description above is indeed, the time spent on Flush+Reload measurements. In fact, with this implementation, almost the entire time will be spent on Flush+Reload measurements. By transmitting only a single bit, we can omit all but one Flush+Reload measurement, i.e., the measurement on cache line 1. If the transmitted bit was a ‘1’, then we observe a cache hit on cache line 1. Otherwise, we observe no cache hit on cache line 1.

上面描述的Meltdown攻擊的性能瓶頸其實就是花在Flush+Reload測量上的時間。實際上,在這種實現中,幾乎全部時間都花在了Flush+Reload測量上。如果一次只傳輸一個bit,那麼除了對cacheline 1的一次Flush+Reload測量,其餘的測量都可以省去。如果傳輸的bit是“1”,我們會在cacheline 1上觀察到cache hit;否則觀察不到cache hit。

Transmitting only a single bit at once also has drawbacks. As described above, our side channel has a bias towards a secret value of ‘0’. If we read and transmit multiple bits at once, the likelihood that all bits are ‘0’ may quite small for actual user data. The likelihood that a single bit is ‘0’ is typically close to 50 %. Hence, the number of bits read and transmitted at once is a tradeoff between some implicit error-reduction and the overall transmission rate of the covert channel.

一次只傳輸一個比特也有缺點。如上所述,我們的側通道更偏向於“0”值。如果我們一次讀取多個比特的祕密數據併發送出去,那麼所有bit都是“0”的可能性應該說是相當小。單個bit等於“0”的可能性通常接近50%。因此,一次傳輸的比特數是需要在隱蔽信道的總傳輸速率和減少差錯之間進行平衡。

However, since the error rates are quite small in either case, our evaluation (cf. Section 6) is based on the single-bit transmission mechanics.

不過,由於兩種情況下的錯誤率都很小,因此我們的評估(參見第6節)是基於單比特傳輸機制的。

Exception Suppression using Intel TSX.

和Intel的TSX相關,暫時沒有興趣瞭解。

Dealing with KASLR.

In 2013, kernel address space layout randomization (KASLR) had been introduced to the Linux kernel (starting from version 3.14 [4]) allowing to randomize the location of the kernel code at boot time. However, only as recently as May 2017, KASLR had been enabled by default in version 4.12 [27]. With KASLR also the direct-physical map is randomized and, thus, not fixed at a certain address such that the attacker is required to obtain the randomized offset before mounting the Meltdown attack. However, the randomization is limited to 40 bit.

處理KASLR。

2013年,內核地址空間佈局隨機化(KASLR)被合併到Linux內核中(從3.14版開始[4]),這個特性允許在開機的時候把內核代碼加載到一個隨機化的地址上。直到最近(2017年5月)的4.12版內核中,KASLR纔被默認啓用[27]。啓用KASLR之後,直接映射部分的地址也是隨機的,並非固定在某個地址上。因此,在發起Meltdown攻擊之前,攻擊者需要先獲得這個隨機偏移值。不過,隨機化僅限於40 bit。

Thus, if we assume a setup of the target machine with 8GB of RAM, it is sufficient to test the address space for addresses in 8GB steps. This allows to cover the search space of 40 bit with only 128 tests in the worst case. If the attacker can successfully obtain a value from a tested address, the attacker can proceed dumping the entire memory from that location. This allows to mount Meltdown on a system despite being protected by KASLR within seconds.

因此,假設目標機有8GB內存,我們可以使用8GB的步長來探測地址空間。即便在最壞的情況下,也只需128次測試就可以覆蓋40-bit的搜索空間。攻擊者一旦能從某個測試地址成功讀出數值,就可以繼續從該位置dump整個內存。因此,儘管系統受到KASLR的保護,利用Meltdown攻擊者仍可以在幾秒鐘內完成攻擊。

六、評估(Evaluation)

In this section, we evaluate Meltdown and the performance of our proof-of-concept implementation. Section 6.1 discusses the information which Meltdown can leak, and Section 6.2 evaluates the performance of Meltdown, including countermeasures. Finally, we discuss limitations for AMD and ARM in Section 6.4.

在本章中,我們將評估meltdown的影響以及我們POC(proof-of-concept)實現的性能。6.1節討論了meltdown可能泄漏的信息,6.2節評估了meltdown的性能和對策。最後在6.4節中,我們討論了在AMD和ARM處理器上meltdown的侷限性。

Table 1 shows a list of configurations on which we successfully reproduced Meltdown. For the evaluation of Meltdown, we used both laptops as well as desktop PCs with Intel Core CPUs. For the cloud setup, we tested Meltdown in virtual machines running on Intel Xeon CPUs hosted in the Amazon Elastic Compute Cloud as well as on DigitalOcean. Note that for ethical reasons we did not use Meltdown on addresses referring to physical memory of other tenants.

(表1:成功復現Meltdown攻擊的實驗環境列表)

在上面表1列出的所有配置中,我們都成功地復現了Meltdown攻擊。我們在使用英特爾Core CPU的筆記本電腦和臺式機上進行了Meltdown的評估。對於雲端環境,我們在Amazon Elastic Compute Cloud和DigitalOcean上運行於英特爾Xeon處理器的虛擬機中測試了Meltdown。需要說明的是:出於道德原因,我們沒有對指向其他租戶物理內存的地址使用Meltdown。

1、各種環境下的信息泄露(Information Leakage and Environments)

We evaluated Meltdown on both Linux (cf. Section 6.1.1) and Windows 10 (cf. Section 6.1.3). On both operating systems, Meltdown can successfully leak kernel memory. Furthermore, we also evaluated the effect of the KAISER patches on Meltdown on Linux, to show that KAISER prevents the leakage of kernel memory (cf. Section 6.1.2). Finally, we discuss the information leakage when running inside containers such as Docker (cf. Section 6.1.4).

我們在Linux(參見第6.1.1)和Windows 10(參見第6.1.3)這兩個操作系統上評估了meltdown漏洞,結果表明它們都可以成功地泄漏內核信息。此外,我們還測試了KAISER補丁在Linux上的效果,結果表明KAISER補丁可以防止內核信息泄漏(參見第6.1.2)。最後,我們討論了在容器環境下(例如Docker)的信息泄漏(參見第6.1.4)。

(1)Linux

We successfully evaluated Meltdown on multiple versions of the Linux kernel, from 2.6.32 to 4.13.0. On all these versions of the Linux kernel, the kernel address space is also mapped into the user address space. Thus, all kernel addresses are also mapped into the address space of user space applications, but any access is prevented due to the permission settings for these addresses. As Meltdown bypasses these permission settings, an attacker can leak the complete kernel memory if the virtual address of the kernel base is known. Since all major operating systems also map the entire physical memory into the kernel address space (cf. Section 2.2), all physical memory can also be read.

我們成功地在多個版本的Linux內核(從2.6.32到4.13.0)上評估了Meltdown。在所有這些版本的Linux內核中,內核地址空間都映射到了用戶進程地址空間中,但由於權限設置,任何來自用戶空間的內核數據訪問都會被阻止。Meltdown可以繞過這些權限設置,因此只要知道內核基地址的虛擬地址,攻擊者就可以泄露完整的內核內存。由於所有主流操作系統都將整個物理內存映射到內核地址空間(參見第2.2節),因此利用Meltdown也可以讀取所有物理內存的數據。

Before kernel 4.12, kernel address space layout randomization (KASLR) was not active by default [30]. If KASLR is active, Meltdown can still be used to find the kernel by searching through the address space (cf. Section 5.2). An attacker can also simply de-randomize the direct-physical map by iterating through the virtual address space. Without KASLR, the direct-physical map starts at address 0xffff 8800 0000 0000 and linearly maps the entire physical memory. On such systems, an attacker can use Meltdown to dump the entire physical memory, simply by reading from virtual addresses starting at 0xffff 8800 0000 0000.

在4.12內核之前,內核地址空間佈局隨機化(KASLR)不是默認啓用的[ 30 ]。如果啓用了KASLR,meltdown仍然可以通過搜索地址空間的方法找到內核的映射位置(具體參見5.2節)。攻擊者也可以簡單地通過遍歷虛擬地址空間,對direct-physical map進行去隨機化。在沒有KASLR的情況下,Linux內核會在0xffff 8800 0000 0000開始的線性地址區域內映射整個物理內存。在這樣的系統中,攻擊者可以用meltdown輕鬆dump整個物理內存,因爲攻擊者已經清楚地知道物理內存的虛擬地址是從0xffff 8800 0000 0000開始的。

On newer systems, where KASLR is active by default, the randomization of the direct-physical map is limited to 40 bit. It is even further limited due to the linearity of the mapping. Assuming that the target system has at least 8GB of physical memory, the attacker can test addresses in steps of 8 GB, resulting in a maximum of 128 memory locations to test. Starting from one discovered location, the attacker can again dump the entire physical memory.

在較新的Linux系統中,KASLR是默認啓用的,direct-physical map的隨機化範圍被限制在40 bit以內,而且由於映射的線性特性,實際範圍還會進一步受限。假設目標系統至少有8GB物理內存,攻擊者可以按照8GB的步長來測試候選地址,最多只需測試128個內存位置。一旦找到一個有效位置,攻擊者就可以再次dump整個物理內存。

Hence, for the evaluation, we can assume that the randomization is either disabled, or the offset was already retrieved in a pre-computation step.

因此,在本節的meltdown評估中,我們可以假設隨機化是禁用的,或者隨機偏移量已經在預計算步驟中獲取。

(2)打了KAISER補丁的Linux系統(Linux with KAISER patch)

The KAISER patch by Gruss et al. [8] implements a stronger isolation between kernel and user space.

KAISER does not map any kernel memory in the user space, except for some parts required by the x86 architecture (e.g., interrupt handlers). Thus, there is no valid mapping to either kernel memory or physical memory (via the direct-physical map) in the user space, and such addresses can therefore not be resolved. Consequently, Meltdown cannot leak any kernel or physical memory except for the few memory locations which have to be mapped in user space.

Gruss等人發佈的KAISER補丁[ 8 ]實現了內核和用戶空間之間更強的隔離。除了x86架構所需的某些部分(如中斷處理程序)之外,KAISER根本不把任何內核內存映射到用戶空間中。因此,在用戶空間中既沒有指向內核內存的有效映射,也沒有指向物理內存(經由direct-physical map)的有效映射,這些地址也就無法被解析。所以,除了少數必須映射到用戶空間的內存位置外,meltdown不能泄漏任何內核或物理內存。

We verified that KAISER indeed prevents Meltdown, and there is no leakage of any kernel or physical memory.

我們驗證了KAISER確實解決了meltdown漏洞,沒有任何內核或物理內存數據的泄漏。

Furthermore, if KASLR is active, and the few remaining memory locations are randomized, finding these memory locations is not trivial due to their small size of several kilobytes. Section 7.2 discusses the implications of these mapped memory locations from a security perspective.

此外,如果啓用了KASLR,並且這少數殘留的內存映射位置也被隨機化,那麼由於這段區域只有幾KB大小,要找到這些內存位置並非易事。7.2節從安全的角度討論了這些映射內存位置的影響。

(3)Windows

沒有興趣瞭解,略過。

(4)容器

We evaluated Meltdown running in containers sharing a kernel, including Docker, LXC, and OpenVZ, and found that the attack can be mounted without any restrictions. Running Meltdown inside a container allows to leak information not only from the underlying kernel, but also from all other containers running on the same physical host.

我們評估了在共享內核的容器環境下運行的meltdown,包括Docker、LXC和OpenVZ,結果發現攻擊可以不受任何限制地發起。在容器中執行meltdown攻擊不僅可以泄漏底層內核的信息,還可以泄漏同一物理主機上所有其他容器的信息。

The commonality of most container solutions is that every container uses the same kernel, i.e., the kernel is shared among all containers. Thus, every container has a valid mapping of the entire physical memory through the direct-physical map of the shared kernel. Furthermore, Meltdown cannot be blocked in containers, as it uses only memory accesses. Especially with Intel TSX, only unprivileged instructions are executed without even trapping into the kernel.

大多數容器解決方案的共同點是使用相同的內核,即內核在所有容器之間共享。因此,每個容器都通過共享內核的direct-physical map擁有對整個物理內存的有效映射。由於meltdown只涉及內存訪問,在容器中並不能阻止這種攻擊。特別是在使用英特爾TSX的情況下,攻擊只執行非特權指令,甚至不會陷入內核。

Thus, the isolation of containers sharing a kernel can be fully broken using Meltdown. This is especially critical for cheaper hosting providers where users are not separated through fully virtualized machines, but only through containers. We verified that our attack works in such a setup, by successfully leaking memory contents from a container of a different user under our control.

因此,共享內核的容器隔離可以被meltdown完全攻破。對於那些提供廉價主機託管服務的提供商來說,這個問題尤其嚴重,因爲在那種情況下,用戶之間不是通過完全虛擬化的虛擬機隔離,而只是通過容器隔離。我們驗證了在這樣的環境中meltdown的確有效:我們成功地從另一個用戶的容器(在我們的控制之下)中泄漏出了內存內容。

2、meltdown性能

To evaluate the performance of Meltdown, we leaked known values from kernel memory. This allows us to not only determine how fast an attacker can leak memory, but also the error rate, i.e., how many byte errors to expect. We achieved average reading rates of up to 503KB/s with an error rate as low as 0.02% when using exception suppression. For the performance evaluation, we focused on the Intel Core i7-6700K as it supports Intel TSX, to get a fair performance comparison between exception handling and exception suppression.

爲了評估meltdown的性能,我們從內核內存中泄漏已知的數值。這使我們不僅能夠確定攻擊者盜取內存數據的速度,還可以確定錯誤率(即預期有多少字節出錯)。在使用異常抑制(需要TSX支持)的情況下,我們實現了高達503KB/s的平均讀取速度,而錯誤率低至0.02%。對於性能評估,我們選擇了英特爾Core i7-6700K處理器,因爲它支持TSX,這樣我們可以在同一個CPU上公平地比較異常處理和異常抑制兩種方法下meltdown的性能。

For all tests, we use Flush+Reload as a covert channel to leak the memory as described in Section 5. We evaluated the performance of both exception handling and exception suppression (cf. Section 4.1). For exception handling, we used signal handlers, and if the CPU supported it, we also used exception suppression using Intel TSX. An extensive evaluation of exception suppression using conditional branches was done by Kocher et al. [19] and is thus omitted in this paper for the sake of brevity.

對於所有的測試,我們使用Flush+Reload作爲一個隱蔽通道來泄漏內存信息,具體可以參考第5章的描述。我們評估了異常處理和異常抑制這兩種方法下meltdown的性能(參見第4.1節)。對於異常處理,我們設置信號處理函數。如果CPU支持,我們也可以利用英特爾TSX來完成異常抑制。使用條件分支來完成異常抑制的評估是由Kocher等人完成的[ 19 ]。爲了簡潔起見,本文省略了這部分的內容。

(1)異常處理

Exception handling is the more universal implementation, as it does not depend on any CPU extension and can thus be used without any restrictions. The only requirement for exception handling is operating system support to catch segmentation faults and continue operation afterwards. This is the case for all modern operating systems, even though the specific implementation differs between the operating systems. On Linux, we used signals, whereas, on Windows, we relied on the Structured Exception Handler.

異常處理的方法更通用,因爲它不依賴於任何CPU擴展特性,從而可以不受限制地在各種處理器上使用。異常處理的唯一要求是操作系統支持捕捉segmentation fault並在其後繼續執行。所有現代操作系統都支持這一點,儘管具體實現有所不同:在Linux上我們使用信號(signal),而在Windows上我們依賴Structured Exception Handler。

With exception handling, we achieved average reading speeds of 123KB/s when leaking 12MB of kernel memory. Out of the 12MB kernel data, only 0.03% were read incorrectly. Thus, with an error rate of 0.03%, the channel capacity is 122KB/s.

在使用異常處理的情況下,我們以123KB/s的平均速度泄漏了12MB的內核數據,其中只有0.03%的字節讀取錯誤。因此,在0.03%的錯誤率下,信道容量是122KB/s。

(2)異常抑制

和Intel處理器相關,忽略之。

3、Meltdown實戰

這個小節展示了幾個具體的meltdown攻擊效果,忽略之。

4、AMD和ARM處理器上的限制

We also tried to reproduce the Meltdown bug on several ARM and AMD CPUs. However, we did not manage to successfully leak kernel memory with the attack described in Section 5, neither on ARM nor on AMD. The reasons for this can be manifold. First of all, our implementation might simply be too slow and a more optimized version might succeed. For instance, a more shallow out-of-order execution pipeline could tip the race condition towards not leaking the data. Similarly, if the processor lacks certain features, e.g., no re-order buffer, our current implementation might not be able to leak data. However, for both ARM and AMD, the toy example as described in Section 3 works reliably, indicating that out-of-order execution generally occurs and instructions past illegal memory accesses are also performed.

我們還試圖在幾款ARM和AMD CPU上重現meltdown漏洞。然而,無論是在ARM還是AMD處理器上,我們都沒有成功地使用第5章中描述的攻擊方法盜取到內核內存。造成這種情況的原因可能是多方面的。首先,我們的實現可能太慢,一個更優化的版本可能會成功。例如,一個更淺的亂序執行流水線可能會讓競態條件向不泄漏數據的方向傾斜。類似地,如果處理器缺少某些特性,例如沒有重排序緩衝區(re-order buffer),那麼我們當前的實現可能根本無法泄漏數據。不過,對於ARM和AMD處理器來說,第3章中描述的簡單示例仍然可以可靠地工作,這表明在這些CPU上也發生了亂序執行,即非法內存訪問之後的指令也被執行了。

七、對策

In this section, we discuss countermeasures against the Meltdown attack. At first, as the issue is rooted in the hardware itself, we want to discuss possible microcode updates and general changes in the hardware design. Second, we want to discuss the KAISER countermeasure that has been developed to mitigate side-channel attacks against KASLR which inadvertently also protects against Meltdown.

在這一章中,我們討論針對meltdown攻擊的對策。首先,由於這個問題源於硬件本身,我們討論可能的微碼更新以及硬件設計上的整體修改。其次,我們討論KAISER對策:它最初是爲了緩解針對KASLR的側信道攻擊而開發的,但無意中也能防禦meltdown。

1、硬件策略

Meltdown bypasses the hardware-enforced isolation of security domains. There is no software vulnerability involved in Meltdown. Hence any software patch (e.g., KAISER [8]) will leave small amounts of memory exposed (cf. Section 7.2). There is no documentation whether such a fix requires the development of completely new hardware, or can be fixed using a microcode update.

Meltdown繞過了硬件強制的安全域隔離,並不涉及任何軟件漏洞。因此,任何軟件補丁(例如KAISER[8])都會留下少量暴露的內存區域(參見第7.2節)。至於修復這個問題是需要開發全新的硬件,還是可以通過微碼更新來解決,目前沒有公開資料說明。

As Meltdown exploits out-of-order execution, a trivial countermeasure would be to completely disable out-of-order execution. However, the performance impacts would be devastating, as the parallelism of modern CPUs could not be leveraged anymore. Thus, this is not a viable solution.

由於meltdown利用了亂序執行,一個簡單的對策就是完全禁用亂序執行。不過這對性能的影響將是毀滅性的,因爲我們將無法再利用現代CPU的並行性了。因此,這個解決方案不可行。

Meltdown is some form of race condition between the fetch of a memory address and the corresponding permission check for this address. Serializing the permission check and the register fetch can prevent Meltdown, as the memory address is never fetched if the permission check fails. However, this involves a significant overhead to every memory fetch, as the memory fetch has to stall until the permission check is completed.

Meltdown是“獲取內存地址數據”和“權限檢查”之間的一種競態條件(race condition)。將權限檢查和寄存器加載串行化可以防止meltdown,即在權限檢查失敗時,受保護的內存數據根本不會被加載。然而,這給每一次內存訪問都增加了很大的開銷,因爲在權限檢查完成之前,內存訪問只能stall。

A more realistic solution would be to introduce a hard split of user space and kernel space. This could be enabled optionally by modern kernels using a new hard split bit in a CPU control register, e.g., CR4. If the hard split bit is set, the kernel has to reside in the upper half of the address space, and the user space has to reside in the lower half of the address space. With this hard split, a memory fetch can immediately identify whether such a fetch of the destination would violate a security boundary, as the privilege level can be directly derived from the virtual address without any further lookups. We expect the performance impacts of such a solution to be minimal. Furthermore, the backwards compatibility is ensured, since the hard-split bit is not set by default and the kernel only sets it if it supports the hard-split feature.

一個更現實的解決方案是在用戶空間和內核空間之間引入硬性分割(hard split)。現代內核可以選擇性地通過CPU控制寄存器(例如CR4)中新增的一個hard-split bit來啓用這個特性。如果該bit被設置,內核必須位於地址空間的上半部分,而用戶空間必須位於地址空間的下半部分。有了這種硬性分割,一次內存讀取是否越過安全邊界可以被立刻識別,因爲特權級別可以直接從虛擬地址推導出來,而不需要任何進一步的查找。我們預計這種解決方案對性能的影響很小。此外,向後兼容性也得到保證:默認情況下hard-split bit不設置,僅當內核支持hard-split特性時纔會設置它。

Note that these countermeasures only prevent Meltdown, and not the class of Spectre attacks described by Kocher et al. [19]. Likewise, several countermeasures presented by Kocher et al. [19] have no effect on Meltdown. We stress that it is important to deploy countermeasures against both attacks.

請注意,這些對策只能防止meltdown,對Kocher等人發現的幽靈攻擊無效[ 19 ]。同樣,由Kocher等人提出的解決spectre漏洞[ 19 ] 的對策,對meltdown也沒有效果。我們這裏再次強調一下:針對這兩種攻擊部署相關的對策是非常重要的。

2、KAISER

As hardware is not as easy to patch, there is a need for software workarounds until new hardware can be deployed. Gruss et al. [8] proposed KAISER, a kernel modification to not have the kernel mapped in the user space. This modification was intended to prevent side-channel attacks breaking KASLR [13, 9, 17]. However, it also prevents Meltdown, as it ensures that there is no valid mapping to kernel space or physical memory available in user space. KAISER will be available in the upcoming releases of the Linux kernel under the name kernel page-table isolation (KPTI) [25]. The patch will also be backported to older Linux kernel versions. A similar patch was also introduced in Microsoft Windows 10 Build 17035 [15]. Also, Mac OS X and iOS have similar features [22].

硬件不像軟件那樣容易打補丁,因此在新的硬件部署之前,需要軟件層面的臨時解決方案。Gruss等人[ 8 ]提出了KAISER方案,該方案修改內核,使內核不再映射到用戶空間中。這個補丁原本是爲了防止攻破KASLR的側信道攻擊[ 13, 9, 17 ]。然而,由於它確保了用戶空間中沒有指向內核空間或物理內存的有效映射,KAISER也能防禦meltdown。KAISER將以KPTI(kernel page-table isolation)爲名出現在即將發佈的Linux內核中[ 25 ],該補丁也將被移植到舊的Linux內核版本。微軟在Windows 10 Build 17035中也引入了類似的補丁[ 15 ]。另外,Mac OS X和iOS也有類似的功能[ 22 ]。

Although KAISER provides basic protection against Meltdown, it still has some limitations. Due to the design of the x86 architecture, several privileged memory locations are required to be mapped in user space [8]. This leaves a residual attack surface for Meltdown, i.e., these memory locations can still be read from user space. Even though these memory locations do not contain any secrets, such as credentials, they might still contain pointers. Leaking one pointer can be enough to again break KASLR, as the randomization can be calculated from the pointer value.

雖然KAISER提供了針對meltdown的基本防護,但它仍然有一些侷限性。由於x86架構的設計,有若干特權內存位置必須映射到用戶空間中[ 8 ],這爲meltdown留下了殘餘的攻擊面,即這些內存位置仍然可以從用戶空間讀取。即使這些內存位置不包含任何機密數據(例如憑證),它們仍然可能包含指針。泄漏一個指針就足以再次攻破KASLR,因爲隨機偏移可以根據指針的值計算出來。

Still, KAISER is the best short-time solution currently available and should therefore be deployed on all systems immediately. Even with Meltdown, KAISER can avoid having any kernel pointers on memory locations that are mapped in the user space which would leak information about the randomized offsets. This would require trampoline locations for every kernel pointer, i.e., the interrupt handler would not call into kernel code directly, but through a trampoline function. The trampoline function must only be mapped in the kernel. It must be randomized with a different offset than the remaining kernel. Consequently, an attacker can only leak pointers to the trampoline code, but not the randomized offsets of the remaining kernel. Such trampoline code is required for every kernel memory that still has to be mapped in user space and contains kernel addresses. This approach is a trade-off between performance and security which has to be assessed in future work.

不過,KAISER仍然是目前最好的短期解決方案,應該立即部署到所有系統上。即便存在meltdown,KAISER也可以避免在映射到用戶空間的內存位置上保存任何內核指針,從而避免泄露隨機偏移的信息。爲此,每一個內核指針都需要對應的trampoline位置,例如:中斷不會直接調用內核代碼,而是通過trampoline函數。trampoline函數只映射在內核空間,且必須使用與內核其餘部分不同的隨機偏移。這樣,攻擊者只能泄漏指向trampoline代碼的指針,而無法得到內核其餘部分的隨機偏移。對於每一段仍然必須映射到用戶空間、且包含內核地址的內核內存,都需要這樣的trampoline代碼。這種方法是性能和安全性之間的折衷,需要在今後的工作中進一步評估。

八、討論

Meltdown fundamentally changes our perspective on the security of hardware optimizations that manipulate the state of microarchitectural elements. The fact that hardware optimizations can change the state of microarchitectural elements, and thereby imperil secure software implementations, is known since more than 20 years [20]. Both industry and the scientific community so far accepted this as a necessary evil for efficient computing. Today it is considered a bug when a cryptographic algorithm is not protected against the microarchitectural leakage introduced by the hardware optimizations. Meltdown changes the situation entirely. Meltdown shifts the granularity from a comparably low spatial and temporal granularity, e.g., 64-bytes every few hundred cycles for cache attacks, to an arbitrary granularity, allowing an attacker to read every single bit. This is nothing any (cryptographic) algorithm can protect itself against. KAISER is a short-term software fix, but the problem we uncovered is much more significant.

通過操縱微架構單元的狀態,硬件優化可以提升CPU性能,meltdown從根本上改變了我們對這類優化安全性的看法。硬件優化可以改變微架構單元的狀態,從而危及安全軟件的實現,這個事實20多年前就已爲人所知[ 20 ]。但到目前爲止,工業界和學術界都把它當作高效計算所必需付出的代價。如今,如果一個加密算法沒有針對硬件優化引入的微架構泄漏做防護,會被認爲是一個bug。而meltdown徹底改變了現狀:原來的攻擊在空間和時間粒度上相對較粗,例如緩存攻擊的粒度是每幾百個週期64字節;有了meltdown,粒度變成任意的,攻擊者可以讀取每一個比特。這是任何(加密)算法都無法自我防護的。KAISER是一個短期的軟件修復方案,但我們揭示的問題遠比這重要。

We expect several more performance optimizations in modern CPUs which affect the microarchitectural state in some way, not even necessarily through the cache. Thus, hardware which is designed to provide certain security guarantees, e.g., CPUs running untrusted code, require a redesign to avoid Meltdown- and Spectre-like attacks. Meltdown also shows that even error-free software, which is explicitly written to thwart side-channel attacks, is not secure if the design of the underlying hardware is not taken into account.

我們預計現代CPU中還會有更多以某種方式影響微架構狀態的性能優化,而且未必是通過緩存。因此,那些被設計爲提供特定安全保證的硬件(例如運行不受信任代碼的CPU)需要重新設計,以避免meltdown和spectre這類攻擊。meltdown還表明:即便軟件沒有錯誤,而且是專門爲抵禦側信道攻擊而編寫的,如果不考慮底層硬件的設計,它也是不安全的。

With the integration of KAISER into all major operating systems, an important step has already been done to prevent Meltdown. KAISER is also the first step of a paradigm change in operating systems. Instead of always mapping everything into the address space, mapping only the minimally required memory locations appears to be a first step in reducing the attack surface. However, it might not be enough, and an even stronger isolation may be required. In this case, we can trade flexibility for performance and security, by e.g., forcing a certain virtual memory layout for every operating system. As most modern operating system already use basically the same memory layout, this might be a promising approach.

隨着KAISER集成到所有主流操作系統中,我們在防止meltdown上已經邁出了重要的一步。KAISER也是操作系統設計範式轉變的第一步:不再總是把所有東西都映射到地址空間中,而是只映射最少的必要內存位置,這是減少攻擊面的第一步。然而,這可能還不夠,也許需要更強的隔離。在這種情況下,我們可以用靈活性換取性能和安全性,例如,強制每個操作系統使用特定的虛擬內存佈局。由於大多數現代操作系統已經使用了基本相同的內存佈局,這可能是一種很有前途的方法。

Meltdown also heavily affects cloud providers, especially if the guests are not fully virtualized. For performance reasons, many hosting or cloud providers do not have an abstraction layer for virtual memory. In such environments, which typically use containers, such as Docker or OpenVZ, the kernel is shared among all guests. Thus, the isolation between guests can simply be circumvented with Meltdown, fully exposing the data of all other guests on the same host. For these providers, changing their infrastructure to full virtualization or using software workarounds such as KAISER would both increase the costs significantly.

Meltdown也嚴重影響了雲服務提供商,特別是在客戶機沒有完全虛擬化的場景中。出於性能方面的原因,許多託管或雲服務提供商沒有虛擬內存的抽象層。在這樣的環境中(通常使用容器,如Docker或OpenVZ),內核在所有客戶之間共享。因此,客戶之間的隔離可以被meltdown輕易繞過,同一主機上所有其他客戶的數據都會完全暴露。對於這些提供商來說,無論是把基礎設施改造成完全虛擬化,還是使用KAISER這樣的軟件解決方案,都會顯著增加成本。

Even if Meltdown is fixed, Spectre [19] will remain an issue. Spectre [19] and Meltdown need different defenses. Specifically mitigating only one of them will leave the security of the entire system at risk. We expect that Meltdown and Spectre open a new field of research to investigate in what extent performance optimizations change the microarchitectural state, how this state can be translated into an architectural state, and how such attacks can be prevented.

即使meltdown被修復了,spectre[ 19 ]仍然是一個問題。spectre和meltdown需要不同的防禦策略,只針對其中一個進行緩解,整個系統的安全仍然處於風險之中。我們預計meltdown和spectre會打開一個新的研究領域:研究性能優化在多大程度上改變了微架構狀態、這些微架構狀態如何被轉化爲體系結構狀態,以及如何防止這類攻擊。

九、結論

In this paper, we presented Meltdown, a novel software-based side-channel attack exploiting out-of-order execution on modern processors to read arbitrary kernel- and physical-memory locations from an unprivileged user space program. Without requiring any software vulnerability and independent of the operating system, Meltdown enables an adversary to read sensitive data of other processes or virtual machines in the cloud with up to 503KB/s, affecting millions of devices. We showed that the countermeasure KAISER [8], originally proposed to protect from side-channel attacks against KASLR, inadvertently impedes Meltdown as well. We stress that KAISER needs to be deployed on every operating system as a short-term workaround, until Meltdown is fixed in hardware, to prevent large-scale exploitation of Meltdown.

在本文中,我們介紹了meltdown,一種新型的基於軟件的側信道攻擊,它利用現代處理器上的亂序執行,使非特權的用戶空間程序可以讀取任意內核內存和物理內存。meltdown不需要利用任何軟件漏洞,也和具體操作系統無關;利用它,攻擊者可以以高達503KB/s的速度讀取其他進程或雲端虛擬機的敏感數據,影響了數以百萬計的設備。我們展示了KAISER[ 8 ](最初是爲了防止針對KASLR的側信道攻擊而提出的)無意中也能阻礙meltdown。我們強調:在硬件層面修復meltdown之前,需要在每一個操作系統上部署KAISER作爲短期的解決方案,以防止meltdown被大規模利用。

十、致謝

We would like to thank Anders Fogh for fruitful discussions at BlackHat USA 2016 and BlackHat Europe 2016, which ultimately led to the discovery of Meltdown. Fogh [5] already suspected that it might be possible to abuse speculative execution in order to read kernel memory in user mode but his experiments were not successful. We would also like to thank Jann Horn for comments on an early draft. Jann disclosed the issue to Intel in June. The subsequent activity around the KAISER patch was the reason we started investigating this issue. Furthermore, we would like Intel, ARM, Qualcomm, and Microsoft for feedback on an early draft.

我們感謝Anders Fogh在BlackHat USA 2016和BlackHat Europe 2016上富有成果的討論,這些討論最終促成了meltdown的發現。Fogh[ 5 ]早已懷疑可以濫用推測執行在用戶模式下讀取內核內存,但他的實驗並不成功。我們也要感謝Jann Horn對早期草稿的意見。Jann在6月份向Intel披露了這個問題,隨後圍繞KAISER補丁的活動促使我們開始調查這個問題。此外,我們也感謝英特爾、ARM、高通和微軟對早期草稿的反饋。

We would also like to thank Intel for awarding us with a bug bounty for the responsible disclosure process, and their professional handling of this issue through communicating a clear timeline and connecting all involved researchers. Furthermore, we would also thank ARM for their fast response upon disclosing the issue.

我們也要感謝英特爾公司在負責任披露(responsible disclosure)過程中給予我們漏洞獎勵,並感謝他們對這個問題的專業處理:溝通了明確的時間表,並聯繫了所有相關的研究人員。此外,我們還要感謝ARM在問題披露後的快速響應。

This work was supported in part by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 681402).

歐洲研究委員會(ERC)根據歐盟Horizon 2020科研創新計劃(編號:681402)對本項工作有一定的支持。

參考文獻:

[1] BENGER, N., VAN DE POL, J., SMART, N. P., AND YAROM, Y. “Ooh Aah... Just a Little Bit”: A small amount of side channel can go a long way. In CHES’14 (2014).

[2] CHENG, C.-C. The schemes and performances of dynamic branch predictors. Berkeley Wireless Research Center, Tech. Rep (2000).

[3] DEVIES, A. M. AMD Takes Computing to a New Horizon with Ryzen™ Processors, 2016.

[4] EDGE, J. Kernel address space layout randomization, 2013.

[5] FOGH, A. Negative Result: Reading Kernel Memory From User Mode, 2017.

[6] GRAS, B., RAZAVI, K., BOSMAN, E., BOS, H., AND GIUFFRIDA, C. ASLR on the Line: Practical Cache Attacks on the MMU. In NDSS (2017).

[7] GRUSS, D., LETTNER, J., SCHUSTER, F., OHRIMENKO, O., HALLER, I., AND COSTA, M. Strong and Efficient Cache Side-Channel Protection using Hardware Transactional Memory. In USENIX Security Symposium (2017).

[8] GRUSS, D., LIPP, M., SCHWARZ, M., FELLNER, R., MAURICE, C., AND MANGARD, S. KASLR is Dead: Long Live KASLR. In International Symposium on Engineering Secure Software and Systems (2017), Springer, pp. 161–176.

[9] GRUSS, D., MAURICE, C., FOGH, A., LIPP, M., AND MANGARD, S. Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR. In CCS (2016).

[10] GRUSS, D., MAURICE, C., WAGNER, K., AND MANGARD, S. Flush+Flush: A Fast and Stealthy Cache Attack. In DIMVA (2016).

[11] GRUSS, D., SPREITZER, R., AND MANGARD, S. Cache Template Attacks: Automating Attacks on Inclusive Last-Level Caches. In USENIX Security Symposium (2015).

[12] HENNESSY, J. L., AND PATTERSON, D. A. Computer architecture: a quantitative approach. Elsevier, 2011.

[13] HUND, R., WILLEMS, C., AND HOLZ, T. Practical Timing Side Channel Attacks against Kernel Space ASLR. In S&P (2013).

[14] INTEL. Intel® 64 and IA-32 Architectures Optimization Reference Manual, 2014.

[15] IONESCU, A. Windows 17035 Kernel ASLR/VA Isolation In Practice (like Linux KAISER)., 2017.

[16] IRAZOQUI, G., INCI, M. S., EISENBARTH, T., AND SUNAR, B. Wait a minute! A fast, Cross-VM attack on AES. In RAID’14 (2014).

[17] JANG, Y., LEE, S., AND KIM, T. Breaking Kernel Address Space Layout Randomization with Intel TSX. In CCS (2016).

[18] JIMÉNEZ, D. A., AND LIN, C. Dynamic branch prediction with perceptrons. In High-Performance Computer Architecture, 2001. HPCA. The Seventh International Symposium on (2001), IEEE, pp. 197–206.

[19] KOCHER, P., GENKIN, D., GRUSS, D., HAAS, W., HAMBURG, M., LIPP, M., MANGARD, S., PRESCHER, T., SCHWARZ, M., AND YAROM, Y. Spectre Attacks: Exploiting Speculative Execution.

[20] KOCHER, P. C. Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems. In CRYPTO (1996).

[21] LEE, B., MALISHEVSKY, A., BECK, D., SCHMID, A., AND LANDRY, E. Dynamic branch prediction. Oregon State University.

[22] LEVIN, J. Mac OS X and IOS Internals: To the Apple's Core. John Wiley & Sons, 2012.

[23] LIPP, M., GRUSS, D., SPREITZER, R., MAURICE, C., AND MANGARD, S. ARMageddon: Cache Attacks on Mobile Devices. In USENIX Security Symposium (2016).

[24] LIU, F., YAROM, Y., GE, Q., HEISER, G., AND LEE, R. B. Last-Level Cache Side-Channel Attacks are Practical. In IEEE Symposium on Security and Privacy – SP (2015), IEEE Computer Society, pp. 605–622.

[25] LWN. The current state of kernel page-table isolation, Dec. 2017.

[26] MAURICE, C., WEBER, M., SCHWARZ, M., GINER, L., GRUSS, D., ALBERTO BOANO, C., MANGARD, S., AND RÖMER, K. Hello from the Other Side: SSH over Robust Cache Covert Channels in the Cloud. In NDSS (2017).

[27] MOLNAR, I. x86: Enable KASLR by default, 2017.

[28] OSVIK, D. A., SHAMIR, A., AND TROMER, E. Cache Attacks and Countermeasures: the Case of AES. In CT-RSA (2006).

[29] PERCIVAL, C. Cache missing for fun and profit. In Proceedings of BSDCan (2005).

[30] PHORONIX. Linux 4.12 To Enable KASLR By Default, 2017.

[31] SCHWARZ, M., LIPP, M., GRUSS, D., WEISER, S., MAURICE, C., SPREITZER, R., AND MANGARD, S. KeyDrown: Eliminating Software-Based Keystroke Timing Side-Channel Attacks. In NDSS’18 (2018).

[32] TERAN, E., WANG, Z., AND JIMÉNEZ, D. A. Perceptron learning for reuse prediction. In Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on (2016), IEEE, pp. 1–12.

[33] TOMASULO, R. M. An efficient algorithm for exploiting multiple arithmetic units. IBM Journal of research and Development 11, 1 (1967), 25–33.

[34] VINTAN, L. N., AND IRIDON, M. Towards a high performance neural branch predictor. In Neural Networks, 1999. IJCNN’99. International Joint Conference on (1999), vol. 2, IEEE, pp. 868–873.

[35] YAROM, Y., AND FALKNER, K. Flush+Reload: a High Resolution, Low Noise, L3 Cache Side-Channel Attack. In USENIX Security Symposium (2014).

[36] YEH, T.-Y., AND PATT, Y. N. Two-level adaptive training branch prediction. In Proceedings of the 24th annual international symposium on Microarchitecture (1991), ACM, pp. 51–61.

[37] ZHANG, Y., JUELS, A., REITER, M. K., AND RISTENPART, T. Cross-Tenant Side-Channel Attacks in PaaS Clouds. In CCS’14 (2014).


原創翻譯文章,轉發請註明出處。蝸窩科技
