An Analysis of the Linux Kernel cmpxchg Function

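I recently came across the cmpxchg code in the Linux kernel and could not make sense of the implementation. After reading up on inline assembly and the Intel developer manuals, it gradually became clear, so I am writing it down for other developers who are as puzzled as I was. The principle behind the atomic operation that cmpxchg implements is well known: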

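cmpxchg(void* ptr, int old, int new): if the value stored at ptr equals old, new is written to the memory at ptr. In either case the value that was at ptr before the operation is returned, so the caller can tell whether the swap happened, and the whole operation is atomic. On Intel platforms it is implemented with lock cmpxchg. My own understanding is that the lock prefix locks the memory bus, so another thread that tries to access the memory at ptr is blocked until the instruction completes (on modern processors this is usually achieved by locking the cache line rather than the whole bus).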
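The semantics described above can be sketched with the GCC/Clang builtin `__sync_val_compare_and_swap`, which also returns the value that was in memory before the operation. The helper names `cas_u32` and `cas_succeeded` are made up for illustration and are not kernel APIs; this is only a minimal sketch, assuming a compiler that provides the `__sync` builtins.

```c
#include <stdint.h>

/* Hypothetical helper (not a kernel API): atomically performs
 *   if (*ptr == old_val) *ptr = new_val;
 * and returns the value *ptr held before the operation. */
static inline uint32_t cas_u32(volatile uint32_t *ptr,
                               uint32_t old_val, uint32_t new_val)
{
    return __sync_val_compare_and_swap(ptr, old_val, new_val);
}

/* The swap happened exactly when the returned value equals old_val. */
static inline int cas_succeeded(volatile uint32_t *ptr,
                                uint32_t old_val, uint32_t new_val)
{
    return cas_u32(ptr, old_val, new_val) == old_val;
}
```

Comparing the return value with the expected old value is how a caller typically detects success.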

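Now let's look at the cmpxchg implementation in the Linux kernel. (I found this version online; I could not find the corresponding header on my own machine, but it is reportedly in include/asm-i386/cmpxchg.h.)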

/* TODO: You should use modern GCC atomic instruction builtins instead of this. */
#include <stdint.h>
/* GNU C statement expression: the macro evaluates to __ret,
 * the value *ptr held before the instruction executed. */
#define cmpxchg( ptr, _old, _new ) ({ \
  volatile uint32_t *__ptr = (volatile uint32_t *)(ptr);   \
  uint32_t __ret;                                     \
  asm volatile( "lock; cmpxchgl %2,%1"           \
    : "=a" (__ret), "+m" (*__ptr)                \
    : "r" (_new), "0" (_old)                     \
    : "memory"                                   \
  );                                             \
  __ret;                                         \
})
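To show how such a primitive is typically used, here is a hedged sketch of a lock-free increment built on a retry loop. The function names `cmpxchg_u32` and `atomic_inc_u32` are hypothetical; the wrapper uses the same instruction as the macro above on x86 and falls back to a compiler builtin elsewhere so that the sketch still builds on other targets.

```c
#include <stdint.h>

/* Illustrative wrapper (hypothetical name): same instruction as the macro
 * above on x86, a compiler builtin on other architectures. */
static inline uint32_t cmpxchg_u32(volatile uint32_t *ptr,
                                   uint32_t old_val, uint32_t new_val)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t ret;
    __asm__ __volatile__("lock; cmpxchgl %2,%1"
                         : "=a"(ret), "+m"(*ptr)
                         : "r"(new_val), "0"(old_val)
                         : "memory", "cc");
    return ret;
#else
    return __sync_val_compare_and_swap(ptr, old_val, new_val);
#endif
}

/* Lock-free increment: retry until no other thread changed *counter
 * between our read and our compare-and-exchange. Returns the new value. */
static inline uint32_t atomic_inc_u32(volatile uint32_t *counter)
{
    uint32_t seen, prev;
    do {
        seen = *counter;                              /* snapshot the current value */
        prev = cmpxchg_u32(counter, seen, seen + 1);  /* try to publish seen + 1 */
    } while (prev != seen);                           /* someone else won; retry */
    return seen + 1;
}
```

The loop only exits when the value returned by the compare-and-exchange equals the snapshot, which is the signal that the write actually took place.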

The key is understanding the inline assembly. The format of inline assembly in C (GCC extended asm) is:

asm ( assembler template
    : output operands                   (optional)
    : input operands                    (optional)
    : clobbered registers list          (optional)
    );

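The output operands and input operands specify the parameters. They are listed from left to right, separated by ',', and numbered starting from 0, outputs first and inputs after them. Taking the cmpxchg assembly as an example, (__ret) is operand 0, (*__ptr) is operand 1, (_new) is operand 2 and (_old) is operand 3. So "%2" in the assembly template refers to _new, and "%1" refers to (*__ptr).

"=a" says that the result is written to __ret and that the eax register must be used, so the final step that stores the result is effectively mov eax, ret (eax ==> __ret). "+m" (*__ptr) is a memory operand that is both read and written. "r" (_new) means the value of _new is loaded into a general-purpose register for use.

Pay attention to "0" (_old) in cmpxchg; this is the part that confused me. It is a matching constraint: it says that (_old) uses the same register or memory location as operand 0, that is, (_old) is stored in the same place as operand 0. In cmpxchg this means _old and __ret share a register, and since __ret uses eax, _old is placed in eax as well.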
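Operand numbering and the matching constraint are easier to see in a smaller example. The following is a minimal sketch with a hypothetical function `add_u32` (not taken from the kernel) that adds two integers with a single addl instruction on x86, and falls back to plain C on other architectures.

```c
#include <stdint.h>

/* Hypothetical example: returns a + b.
 * Operand numbering: %0 = result (output), %1 = a (input), %2 = b (input).
 * The "0" constraint ties a to the same register as operand 0, so that
 * register already holds a when addl runs and holds a + b afterwards. */
static inline uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t result;
    __asm__("addl %2, %0"
            : "=r"(result)          /* operand 0: any general-purpose register */
            : "0"(a), "r"(b)        /* operand 1 shares operand 0's register */
            : "cc");                /* addl modifies the flags */
    return result;
#else
    return a + b;                   /* plain C fallback on non-x86 targets */
#endif
}
```

This is the same trick the cmpxchg macro uses: the input value is preloaded into the register that will later carry the output.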

With that understood, let's look at cmpxchgl. The Intel developer manual says:

Opcode:           0F B1 /r
Instruction:      CMPXCHG r/m32, r32
Op/En:            MR
64-Bit Mode:      Valid
Compat/Leg Mode:  Valid*
Description:      Compare EAX with r/m32. If equal, ZF is set and r32 is loaded into r/m32. Else, clear ZF and load r/m32 into EAX.

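In other words:

Compare eax with the destination operand (the first operand). If they are equal, the ZF flag is set and the value of the source operand (the second operand) is written to the destination operand; otherwise ZF is cleared and the value of the destination operand is loaded into eax. Note that the manual uses Intel operand order (destination first), whereas the inline assembly uses AT&T syntax, in which the source (%2) comes first and the destination (%1) second.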
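The description above can be written out as a small, non-atomic C model. This is only an illustration of the documented behaviour; the struct `cpu_model` and the function `model_cmpxchg` are invented for this sketch and do not correspond to any real API.

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy model of the state involved; not real CPU state. */
struct cpu_model {
    uint32_t eax;  /* accumulator holding the expected value */
    bool zf;       /* zero flag */
};

/* Non-atomic model of "CMPXCHG r/m32, r32":
 * dest plays the role of r/m32, src the role of r32. */
static inline void model_cmpxchg(struct cpu_model *cpu,
                                 uint32_t *dest, uint32_t src)
{
    if (cpu->eax == *dest) {
        cpu->zf = true;    /* equal: set ZF ... */
        *dest = src;       /* ... and store the source into the destination */
    } else {
        cpu->zf = false;   /* not equal: clear ZF ... */
        cpu->eax = *dest;  /* ... and load the destination into eax */
    }
}
```

The real instruction performs these steps as one indivisible operation when it carries the lock prefix; the model only captures the data flow.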

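Applying this to cmpxchg gives:

Compare _old with (*__ptr). If they are equal, the ZF flag is set and the value of _new is written to (*__ptr); otherwise ZF is cleared and the value of (*__ptr) is loaded into eax, the register that held _old. Either way, eax is then returned through __ret.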

Clearly, this matches our understanding of cmpxchg.
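The code's own TODO comment recommends the modern GCC atomic builtins. As a hedged sketch of what that could look like, `__atomic_compare_exchange_n` mirrors the two outcomes closely: its boolean result plays the role of ZF, and on failure it writes the current memory value back into the expected variable, much as the instruction loads it into eax. The wrapper name `try_cmpxchg_u32` is chosen here for illustration, and sequentially consistent ordering is an assumption of this sketch.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative wrapper: returns true if *ptr equalled *expected and was
 * replaced by desired. On failure, *expected is updated with the value
 * currently stored in *ptr. */
static inline bool try_cmpxchg_u32(uint32_t *ptr, uint32_t *expected,
                                   uint32_t desired)
{
    return __atomic_compare_exchange_n(ptr, expected, desired,
                                       false,             /* strong variant */
                                       __ATOMIC_SEQ_CST,  /* ordering on success */
                                       __ATOMIC_SEQ_CST); /* ordering on failure */
}
```

Because the failure case refreshes the expected value, a retry loop built on this form does not need to re-read the memory location by hand.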

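Also: the Intel developer manual says that the lock prefix gives the CPU exclusive use of shared memory while the instruction executes.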

Original article:

http://blog.csdn.net/penngrove/article/details/44175387
