Intel Developer's Manual, Volume 3, Chapter 8: Multiple-Processor Management, Section 8.1

8.1 LOCKED ATOMIC OPERATIONS

The 32-bit IA-32 processors support locked atomic operations on locations in system memory. These operations are typically used to manage shared data structures (such as semaphores, segment descriptors, system segments, or page tables) in which two or more processors may try simultaneously to modify the same field or flag. The processor uses three interdependent mechanisms for carrying out locked atomic operations:

• Guaranteed atomic operations

• Bus locking, using the LOCK# signal and the LOCK instruction prefix

• Cache coherency protocols that ensure that atomic operations can be carried out on cached data structures (cache lock); this mechanism is present in the Pentium 4, Intel Xeon, and P6 family processors

These mechanisms are interdependent in the following ways. Certain basic memory transactions (such as reading or writing a byte in system memory) are always guaranteed to be handled atomically. That is, once started, the processor guarantees that the operation will be completed before another processor or bus agent is allowed access to the memory location.

The processor also supports bus locking for performing selected memory operations (such as a read-modify-write operation in a shared area of memory) that typically need to be handled atomically, but are not automatically handled this way. Because frequently used memory locations are often cached in a processor's L1 or L2 caches, atomic operations can often be carried out inside a processor's caches without asserting the bus lock. Here the processor's cache coherency protocols ensure that other processors that are caching the same memory locations are managed properly while atomic operations are performed on cached memory locations.

 

NOTE

Where there are contested lock accesses, software may need to implement algorithms that ensure fair access to resources in order to prevent lock starvation. The hardware provides no resource that guarantees fairness to participating agents. It is the responsibility of software to manage the fairness of semaphores and exclusive locking functions.
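One common software answer to the fairness problem in the note above is a ticket lock, which serves waiters in arrival order. The following is a minimal sketch using C11 `<stdatomic.h>`; the type and function names (`ticket_lock`, `ticket_lock_acquire`, `ticket_lock_release`) are hypothetical and do not come from the manual, and the sketch spins without any back-off.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical FIFO ticket lock: each waiter takes a ticket and is
 * admitted in ticket order, so no waiter can be starved by later ones. */
typedef struct {
    _Atomic uint32_t next_ticket;  /* next ticket to hand out */
    _Atomic uint32_t now_serving;  /* ticket currently allowed to enter */
} ticket_lock;

/* Take a ticket with an atomic read-modify-write, then wait for our turn.
 * Returns the ticket number that was drawn. */
uint32_t ticket_lock_acquire(ticket_lock *l)
{
    uint32_t my_ticket = atomic_fetch_add(&l->next_ticket, 1u);
    while (atomic_load(&l->now_serving) != my_ticket) {
        /* spin until the previous holder releases the lock */
    }
    return my_ticket;
}

/* Admit the next waiter in line. */
void ticket_lock_release(ticket_lock *l)
{
    atomic_fetch_add(&l->now_serving, 1u);
}
```

A zero-initialized `ticket_lock` starts unlocked. The ordering guarantee comes entirely from software here, which matches the note: the hardware only makes the individual ticket draw atomic.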

The mechanisms for handling locked atomic operations have evolved with the complexity of IA-32 processors. More recent IA-32 processors (such as the Pentium 4, Intel Xeon, and P6 family processors) and Intel 64 provide a more refined locking mechanism than earlier processors. These mechanisms are described in the following sections.

8.1.1 Guaranteed Atomic Operations

The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will always be carried out atomically:

• Reading or writing a byte

• Reading or writing a word aligned on a 16-bit boundary

• Reading or writing a doubleword aligned on a 32-bit boundary

The Pentium processor (and newer processors since) guarantees that the following additional memory operations will always be carried out atomically:

• Reading or writing a quadword aligned on a 64-bit boundary

• 16-bit accesses to uncached memory locations that fit within a 32-bit data bus

The P6 family processors (and newer processors since) guarantee that the following additional memory operation will always be carried out atomically:

• Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line

Accesses to cacheable memory that are split across cache lines and page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors. The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and should be avoided.
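The rules in this section can be restated as a small predicate, which may help when checking whether a plain load or store needs extra locking. This is only a sketch: the names (`cpu_family`, `guaranteed_atomic`) are hypothetical, the 64-byte cache line is an assumption (line size is model-specific), the memory is assumed to be cacheable, and the Pentium rule for 16-bit uncached accesses is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generations; CPU_P6 stands for "P6 family and newer". */
typedef enum { CPU_INTEL486, CPU_PENTIUM, CPU_P6 } cpu_family;

/* Assumed cache line size in bytes; the real value is model-specific. */
#define ASSUMED_CACHE_LINE 64u

static bool is_aligned(uintptr_t addr, size_t size)
{
    return (addr % size) == 0;
}

static bool within_cache_line(uintptr_t addr, size_t size)
{
    return (addr / ASSUMED_CACHE_LINE) ==
           ((addr + size - 1) / ASSUMED_CACHE_LINE);
}

/* True if a plain read or write of `size` bytes at `addr` (cacheable
 * memory) falls under the guarantees listed in Section 8.1.1. */
bool guaranteed_atomic(cpu_family cpu, uintptr_t addr, size_t size)
{
    if (size != 1 && size != 2 && size != 4 && size != 8)
        return false;                     /* not covered by the lists */
    if (size == 1)
        return true;                      /* bytes: Intel486 and later */
    if (size <= 4 && is_aligned(addr, size))
        return true;                      /* aligned word or doubleword */
    if (size == 8 && is_aligned(addr, 8))
        return cpu >= CPU_PENTIUM;        /* aligned quadword */
    /* Unaligned 16-, 32-, or 64-bit access: P6 and later, and only
     * when the access does not cross a cache line. */
    return cpu == CPU_P6 && within_cache_line(addr, size);
}
```

For example, a 4-byte access at address 0x103E crosses the assumed line boundary at 0x1040, so the predicate reports no guarantee for any generation.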

An x87 instruction or an SSE instruction that accesses data larger than a quadword may be implemented using multiple memory accesses. If such an instruction stores to memory, some of the accesses may complete (writing to memory) while another causes the operation to fault for architectural reasons (e.g. due to a page-table entry that is marked "not present"). In this case, the effects of the completed accesses may be visible to software even though the overall instruction caused a fault. If TLB invalidation has been delayed (see Section 4.10.4.4), such page faults may occur even if all accesses are to the same page.

8.1.2 Bus Locking

Intel 64 and IA-32 processors provide a LOCK# signal that is asserted automatically during certain critical memory operations to lock the system bus or equivalent link. While this output signal is asserted, requests from other processors or bus agents for control of the bus are blocked. Software can specify other occasions when the LOCK semantics are to be followed by prepending the LOCK prefix to an instruction.

In the case of the Intel386, Intel486, and Pentium processors, explicitly locked instructions will result in the assertion of the LOCK# signal. It is the responsibility of the hardware designer to make the LOCK# signal available in system hardware to control memory accesses among processors.

For the P6 and more recent processor families, if the memory area being accessed is cached internally in the processor, the LOCK# signal is generally not asserted; instead, locking is only applied to the processor's caches (see Section 8.1.4, "Effects of a LOCK Operation on Internal Processor Caches").

8.1.2.1 Automatic Locking

The operations on which the processor automatically follows the LOCK semantics are as follows:

• When executing an XCHG instruction that references memory.

• When setting the B (busy) flag of a TSS descriptor. The processor tests and sets the busy flag in the type field of the TSS descriptor when switching to a task. To ensure that two processors do not switch to the same task simultaneously, the processor follows the LOCK semantics while testing and setting this flag.

• When updating segment descriptors. When loading a segment descriptor, the processor will set the accessed flag in the segment descriptor if the flag is clear. During this operation, the processor follows the LOCK semantics so that the descriptor will not be modified by another processor while it is being updated. For this action to be effective, operating-system procedures that update descriptors should use the following steps:

- Use a locked operation to modify the access-rights byte to indicate that the segment descriptor is not present, and specify a value for the type field that indicates that the descriptor is being updated.

- Update the fields of the segment descriptor. (This operation may require several memory accesses; therefore, locked operations cannot be used.)

- Use a locked operation to modify the access-rights byte to indicate that the segment descriptor is valid and present.

• The Intel386 processor always updates the accessed flag in the segment descriptor, whether it is clear or not. The Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors only update this flag if it is not already set.

• When updating page-directory and page-table entries. When updating page-directory and page-table entries, the processor uses locked cycles to set the accessed and dirty flag in the page-directory and page-table entries.

• Acknowledging interrupts. After an interrupt request, an interrupt controller may use the data bus to send the interrupt's vector to the processor. The processor follows the LOCK semantics during this time to ensure that no other data appears on the data bus while the vector is being transmitted.
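The three-step procedure given above for operating-system code that updates a segment descriptor can be sketched as follows. This is a simplified model, not real descriptor-table code: the struct `descriptor_model`, the function `descriptor_update`, and the marker value `ACCESS_BEING_UPDATED` are hypothetical, and the type value an operating system uses to mean "being updated" is its own choice. The sketch assumes C11 `<stdatomic.h>`, where `atomic_exchange` serves as the locked operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define ACCESS_PRESENT       0x80u  /* P flag in the access-rights byte */
#define ACCESS_BEING_UPDATED 0x0Fu  /* hypothetical: P=0 plus a marker type */

/* Simplified model of an 8-byte segment descriptor. */
typedef struct {
    uint16_t limit_low;
    uint16_t base_low;
    uint8_t  base_mid;
    _Atomic uint8_t access;         /* access-rights byte */
    uint8_t  limit_high_flags;
    uint8_t  base_high;
} descriptor_model;

void descriptor_update(descriptor_model *d, uint32_t base, uint32_t limit,
                       uint8_t final_access)
{
    /* Step 1: locked operation marks the descriptor not present and
     * in the middle of an update. */
    atomic_exchange(&d->access, (uint8_t)ACCESS_BEING_UPDATED);

    /* Step 2: ordinary stores; several accesses, so no single lock. */
    d->limit_low = (uint16_t)(limit & 0xFFFFu);
    d->base_low  = (uint16_t)(base & 0xFFFFu);
    d->base_mid  = (uint8_t)((base >> 16) & 0xFFu);
    d->limit_high_flags =
        (uint8_t)((d->limit_high_flags & 0xF0u) | ((limit >> 16) & 0x0Fu));
    d->base_high = (uint8_t)(base >> 24);

    /* Step 3: locked operation publishes the descriptor as valid and
     * present. */
    atomic_exchange(&d->access, (uint8_t)(final_access | ACCESS_PRESENT));
}
```

Because the present flag is cleared first and set last, another processor that loads the descriptor mid-update sees it as not present rather than as a half-written entry.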

8.1.2.2 Software Controlled Bus Locking

To explicitly force the LOCK semantics, software can use the LOCK prefix with the following instructions when they are used to modify a memory location. An invalid-opcode exception (#UD) is generated when the LOCK prefix is used with any other instruction or when no write operation is made to memory (that is, when the destination operand is in a register).

• The bit test and modify instructions (BTS, BTR, and BTC).

• The exchange instructions (XADD, CMPXCHG, and CMPXCHG8B).

• The LOCK prefix is automatically assumed for the XCHG instruction.

• The following single-operand arithmetic and logical instructions: INC, DEC, NOT, and NEG.

• The following two-operand arithmetic and logical instructions: ADD, ADC, SUB, SBB, AND, OR, and XOR.
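From C, these locked instructions are usually reached through C11 `<stdatomic.h>` rather than written by hand. The sketch below pairs a few atomic operations with the instruction families in the list above. The function names and the variable `flags_word` are hypothetical, and the instruction named in each comment is an assumption about what common x86 compilers tend to emit, not something the C standard or the manual guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared word, zero-initialized. */
static _Atomic uint32_t flags_word;

/* Bit test and set; typically a LOCK OR or LOCK BTS.
 * Returns the previous state of the bit. */
bool test_and_set_bit(unsigned bit)
{
    uint32_t mask = UINT32_C(1) << bit;
    return (atomic_fetch_or(&flags_word, mask) & mask) != 0;
}

/* Compare and exchange; typically LOCK CMPXCHG.
 * Returns true if the word held `expected` and now holds `desired`. */
bool cas_word(uint32_t expected, uint32_t desired)
{
    return atomic_compare_exchange_strong(&flags_word, &expected, desired);
}

/* Exchange and add; typically LOCK XADD. Returns the previous value. */
uint32_t add_word(uint32_t delta)
{
    return atomic_fetch_add(&flags_word, delta);
}

/* Exchange; typically XCHG, for which LOCK is implied.
 * Returns the previous value. */
uint32_t swap_word(uint32_t value)
{
    return atomic_exchange(&flags_word, value);
}
```

All four functions modify a memory operand, which is the case in which the LOCK prefix is permitted.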

A locked instruction is guaranteed to lock only the area of memory defined by the destination operand, but may be interpreted by the system as a lock for a larger memory area.

Software should access semaphores (shared memory used for signalling between multiple processors) using identical addresses and operand lengths. For example, if one processor accesses a semaphore using a word access, other processors should not access the semaphore using a byte access.
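One way to follow the advice on identical addresses and operand lengths is to hide the semaphore behind a small set of accessor functions that all use the same width. The sketch below is a hypothetical binary semaphore (`spin_sem`) in C11 that is only ever touched as a single 32-bit object; the names are illustrative and not taken from the manual.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical binary semaphore. Every access goes through the functions
 * below, so all processors use the same address and the same 32-bit
 * operand length. */
typedef struct {
    _Atomic uint32_t word;   /* 0 = free, 1 = held */
} spin_sem;

/* Single attempt: atomically swap in 1 and succeed if it was 0. */
bool spin_sem_try_acquire(spin_sem *s)
{
    return atomic_exchange(&s->word, 1u) == 0u;
}

/* Spin until the semaphore is obtained. */
void spin_sem_acquire(spin_sem *s)
{
    while (!spin_sem_try_acquire(s)) {
        /* busy-wait; a real implementation would add back-off */
    }
}

/* Release with a 32-bit store to the same location. */
void spin_sem_release(spin_sem *s)
{
    atomic_store(&s->word, 0u);
}
```

Code that wrote to one byte of `word` directly, bypassing these functions, would be the mixed-width access the text warns against.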

NOTE

Do not implement semaphores using the WC memory type. Do not perform non-temporal stores to a cache line containing a location used to implement a semaphore.

The integrity of a bus lock is not affected by the alignment of the memory field. The LOCK semantics are followed for as many bus cycles as necessary to update the entire operand. However, it is recommended that locked accesses be aligned on their natural boundaries for better system performance:

• Any boundary for an 8-bit access (locked or otherwise).

• 16-bit boundary for locked word accesses.

• 32-bit boundary for locked doubleword accesses.

• 64-bit boundary for locked quadword accesses.

Locked operations are atomic with respect to all other memory operations and all externally visible events. Only instruction fetch and page table accesses can pass locked instructions. Locked instructions can be used to synchronize data written by one processor and read by another processor.
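The last sentence above describes a common publish-and-consume pattern: one processor writes data, then performs a locked operation on a flag, and the other processor reads the data only after it observes the flag. A minimal C11 sketch follows, with hypothetical names (`payload`, `ready`, `publish`, `try_consume`). It relies on the default sequentially consistent ordering of `atomic_exchange` and `atomic_load`; that the exchange becomes an implicitly locked XCHG on x86 is an assumption about typical compilers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t payload;          /* ordinary data written by the producer */
static _Atomic uint32_t ready;    /* flag updated with a locked operation */

/* Producer: write the data first, then set the flag atomically. */
void publish(uint32_t value)
{
    payload = value;
    atomic_exchange(&ready, 1u);
}

/* Consumer: read the data only after the flag has been observed set.
 * Returns false if nothing has been published yet. */
bool try_consume(uint32_t *out)
{
    if (atomic_load(&ready) == 0u)
        return false;
    *out = payload;
    return true;
}
```

The ordering matters: setting the flag before writing `payload` would let the consumer read stale data.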

For the P6 family processors, locked operations serialize all outstanding load and store operations (that is, wait for them to complete). This rule is also true for the Pentium 4 and Intel Xeon processors, with one exception. Load operations that reference weakly ordered memory types (such as the WC memory type) may not be serialized.

Locked instructions should not be used to ensure that data written can be fetched as instructions.

NOTE

The locked instructions for the current versions of the Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors allow data written to be fetched as instructions. However, Intel recommends that developers who require the use of self-modifying code use a different synchronizing mechanism, described in the following sections.

8.1.3 Handling Self- and Cross-Modifying Code

The act of a processor writing data into a currently executing code segment with the intent of executing that data as code is called self-modifying code. IA-32 processors exhibit model-specific behavior when executing self-modified code, depending upon how far ahead of the current execution pointer the code has been modified.

As processor microarchitectures become more complex and start to speculatively execute code ahead of the retirement point (as in P6 and more recent processor families), the rules regarding which code should execute, pre- or post-modification, become blurred. To write self-modifying code and ensure that it is compliant with current and future versions of the IA-32 architectures, use one of the following coding options:

(* OPTION 1 *)

Store modified code (as data) into code segment;

Jump to new code or an intermediate location;

Execute new code;

(* OPTION 2 *)

Store modified code (as data) into code segment;

Execute a serializing instruction; (* For example, CPUID instruction *)

Execute new code;

The use of one of these options is not required for programs intended to run on the Pentium or Intel486 processors, but is recommended to ensure compatibility with the P6 and more recent processor families.

Self-modifying code will execute at a lower level of performance than non-self-modifying or normal code. The degree of the performance deterioration will depend upon the frequency of modification and specific characteristics of the code.

The act of one processor writing data into the currently executing code segment of a second processor with the intent of having the second processor execute that data as code is called cross-modifying code. As with self-modifying code, IA-32 processors exhibit model-specific behavior when executing cross-modifying code, depending upon how far ahead of the executing processor's current execution pointer the code has been modified.

To write cross-modifying code and ensure that it is compliant with current and future versions of the IA-32 architecture, the following processor synchronization algorithm must be implemented:

(* Action of Modifying Processor *)

Memory_Flag ← 0; (* Set Memory_Flag to value other than 1 *)

Store modified code (as data) into code segment;

Memory_Flag ← 1;

(* Action of Executing Processor *)

WHILE (Memory_Flag ≠ 1)

    Wait for code to update;

ELIHW;

Execute serializing instruction; (* For example, CPUID instruction *)

Begin executing modified code;
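The flag handshake in the pseudocode above can be modeled in portable C to make the ordering of the steps concrete. The sketch below is only a model and does not modify machine code: a function pointer (`code_slot`) stands in for the code bytes, all names are hypothetical, and `serialize_placeholder` marks the point where real cross-modifying code would execute a serializing instruction such as CPUID. The fence inside it is not a substitute for that instruction.

```c
#include <assert.h>
#include <stdatomic.h>

typedef int (*code_fn)(void);

static int old_code(void) { return 1; }
static int new_code(void) { return 2; }

/* Stand-in for the code bytes being modified (hypothetical model). */
static code_fn code_slot = old_code;

/* Memory_Flag from the pseudocode. */
static _Atomic int memory_flag;

/* Placeholder for "Execute serializing instruction". On real hardware
 * this would be CPUID or similar; here it is only a memory fence. */
static void serialize_placeholder(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Action of the modifying processor. */
void modifying_processor(void)
{
    atomic_store(&memory_flag, 0);   /* any value other than 1 */
    code_slot = new_code;            /* "store modified code" */
    atomic_store(&memory_flag, 1);   /* signal that the update is done */
}

/* Action of the executing processor. Returns the result of the new code. */
int executing_processor(void)
{
    while (atomic_load(&memory_flag) != 1) {
        /* wait for code to update */
    }
    serialize_placeholder();
    return code_slot();              /* begin executing modified code */
}
```

In the model, `executing_processor` cannot reach `code_slot()` until the flag has been set, which happens only after the new "code" has been stored.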

(The use of this option is not required for programs intended to run on the Intel486 processor, but is recommended to ensure compatibility with the Pentium 4, Intel Xeon, P6 family, and Pentium processors.)

Like self-modifying code, cross-modifying code will execute at a lower level of performance than non-cross-modifying (normal) code, depending upon the frequency of modification and specific characteristics of the code.

The restrictions on self-modifying code and cross-modifying code also apply to the Intel 64 architecture.

8.1.4 Effects of a LOCK Operation on Internal Processor Caches

For the Intel486 and Pentium processors, the LOCK# signal is always asserted on the bus during a LOCK operation, even if the area of memory being locked is cached in the processor.

For the P6 and more recent processor families, if the area of memory being locked during a LOCK operation is cached in the processor that is performing the LOCK operation as write-back memory and is completely contained in a cache line, the processor may not assert the LOCK# signal on the bus. Instead, it will modify the memory location internally and allow its cache coherency mechanism to ensure that the operation is carried out atomically. This operation is called "cache locking." The cache coherency mechanism automatically prevents two or more processors that have cached the same area of memory from simultaneously modifying data in that area.
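Since cache locking applies only when the locked operand is completely contained in one cache line, software can arrange for that by aligning frequently locked variables. The sketch below shows one way to do so in C11. The 64-byte line size is an assumption (it is model-specific), and the names `line_aligned_counter`, `within_one_line`, and `counter_increment` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes; the real value is model-specific. */
#define ASSUMED_CACHE_LINE 64

/* A counter placed at the start of its own (assumed) cache line, so a
 * locked update of it never straddles two lines. */
typedef struct {
    _Alignas(ASSUMED_CACHE_LINE) _Atomic uint64_t value;
} line_aligned_counter;

/* True if the `size` bytes starting at `p` lie inside one assumed line. */
bool within_one_line(const void *p, size_t size)
{
    uintptr_t addr = (uintptr_t)p;
    return (addr / ASSUMED_CACHE_LINE) ==
           ((addr + size - 1) / ASSUMED_CACHE_LINE);
}

/* Locked increment; returns the previous value. */
uint64_t counter_increment(line_aligned_counter *c)
{
    return atomic_fetch_add(&c->value, 1u);
}
```

Whether a given locked access actually uses cache locking also depends on the memory type being write-back, which this sketch cannot check from user code.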
