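How the Compiler Handles External Variables

I have long been interested in locks. In multi-core programming a lock is something you both hate and love. You hate it because using a lock reduces concurrency and therefore performance. You love it because sometimes there is simply no way around it. How to implement a high-performance lock is an interesting question in its own right; when I get the chance, I will write about lock implementation.

Today's question came to me while I was reading spinlock code. It is not specific to spinlocks, so the examples below use a mutex instead.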

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

extern int counter;
extern pthread_mutex_t counter_mutex;

void add_counter(void)
{
    pthread_mutex_lock(&counter_mutex);
    ++counter;
    pthread_mutex_unlock(&counter_mutex);
}

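counter is protected by counter_mutex. Any update to counter must be made while holding counter_mutex; only then is the update guaranteed to be correct. A lock implementation also generally needs memory barrier instructions to restrain the CPU's out-of-order execution. Without a barrier, the CPU could let the update to counter take effect after the unlock. That is not today's topic either.

A memory barrier instruction only guarantees that the CPU completes the memory operations issued before the barrier. But what if the compiler has placed counter in a register? Suppose counter is read before counter_mutex is acquired. The compiler may well load counter into a register before the lock. Then, once the lock is held, will ++counter operate directly on that register, since the value was already loaded? Consider the following code: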
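To make the role of counter_mutex concrete, here is a minimal sketch in which several threads run the same locked increment. The definitions of counter and counter_mutex, and the names worker, run_counter_demo and MAX_THREADS, are assumptions made for illustration; the examples in this post only declare the two variables as extern.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical definitions; the post itself only declares these as extern. */
int counter = 0;
pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;

enum { MAX_THREADS = 16 };  /* arbitrary limit chosen for this sketch */

/* Each worker increments the shared counter `iterations` times under the lock. */
static void *worker(void *arg)
{
    int iterations = *(int *)arg;

    for (int i = 0; i < iterations; ++i) {
        pthread_mutex_lock(&counter_mutex);
        ++counter;
        pthread_mutex_unlock(&counter_mutex);
    }
    return NULL;
}

/* Starts nthreads workers, waits for them, and returns the final counter
 * value, or -1 if the arguments are invalid or a thread could not be created. */
int run_counter_demo(int nthreads, int iterations)
{
    pthread_t threads[MAX_THREADS];
    int started = 0;
    int ok = 1;

    if (nthreads < 1 || nthreads > MAX_THREADS || iterations < 0)
        return -1;

    counter = 0;
    for (int i = 0; i < nthreads; ++i) {
        if (pthread_create(&threads[i], NULL, worker, &iterations) != 0) {
            ok = 0;
            break;
        }
        ++started;
    }
    /* Joining every started thread makes their updates visible here. */
    for (int i = 0; i < started; ++i)
        pthread_join(threads[i], NULL);

    return ok ? counter : -1;
}
```

With the lock in place, run_counter_demo(4, 100000) should return 400000. If the lock and unlock calls were removed, increments from different threads could overwrite each other and the total would typically come up short.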

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

extern int counter;
extern pthread_mutex_t counter_mutex;

#define ASM_SEPERATOR __asm__ __volatile__ ("nop")

void add_counter(void)
{
    /* The code below reads counter, so counter gets loaded into a register */
    int t = counter;
    ASM_SEPERATOR;
    printf("counter is %d\n", counter);
    ASM_SEPERATOR;

    /* counter was already loaded into a register above; will the update below operate directly on that register? */
    pthread_mutex_lock(&counter_mutex);
    ASM_SEPERATOR;
    ++counter;
    ASM_SEPERATOR;
    pthread_mutex_unlock(&counter_mutex);
}

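When this occurred to me, a chill went down my spine, because code like this surely exists in our projects. Reading a protected resource before taking the lock is perfectly ordinary. If that earlier read leaves the resource cached in a register, doesn't the lock become useless? Does even a read have to be done under the lock in this situation? If that were true, a huge amount of code would be buggy and the problems would have surfaced long ago. So this usage pattern should be fine.

Let's look at the disassembly: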

00000000 <add_counter>:
extern pthread_mutex_t counter_mutex;
#define ASM_SEPERATOR __asm__ __volatile__ ("nop")
void add_counter(void)
{
   0:  55                     push %ebp
   1:  89 e5                  mov %esp,%ebp
   3:  83 ec 28               sub $0x28,%esp
int t = counter;
   6:  a1 00 00 00 00         mov 0x0,%eax
   b:  89 45 f4               mov %eax,-0xc(%ebp)
ASM_SEPERATOR;
   e:  90                     nop
printf("counter is %d\n", counter);
   f:  8b 15 00 00 00 00      mov 0x0,%edx
  15:  b8 00 00 00 00         mov $0x0,%eax
  1a:  89 54 24 04            mov %edx,0x4(%esp)
  1e:  89 04 24               mov %eax,(%esp)
  21:  e8 fc ff ff ff         call 22
ASM_SEPERATOR;
  26:  90                     nop
  27:  c7 04 24 00 00 00 00   movl $0x0,(%esp)
  2e:  e8 fc ff ff ff         call 2f
pthread_mutex_lock(&counter_mutex);
  33:  90                     nop
ASM_SEPERATOR;
  34:  a1 00 00 00 00         mov 0x0,%eax
  39:  83 c0 01               add $0x1,%eax
  3c:  a3 00 00 00 00         mov %eax,0x0
++counter;
  41:  90                     nop
ASM_SEPERATOR;
  42:  c7 04 24 00 00 00 00   movl $0x0,(%esp)
  49:  e8 fc ff ff ff         call 4a
pthread_mutex_unlock(&counter_mutex);
  4e:  c9                     leave
  4f:  c3                     ret

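The instructions at offsets 6 and b assign counter to t; at that point counter has been loaded into eax. The instructions at offsets 34 through 3c are ++counter (from the lock call onward, the interleaved source lines in the listing trail the instructions they belong to). They show that the increment loads counter from memory into a register again and then adds one, rather than reusing the value that was in eax earlier.

The listing above was built without any optimization option. Here is the result with -O2: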

Disassembly of section .text:

00000000 <add_counter>:
   0:  55                     push %ebp
   1:  89 e5                  mov %esp,%ebp
   3:  83 ec 18               sub $0x18,%esp
   6:  90                     nop
   7:  a1 00 00 00 00         mov 0x0,%eax
   c:  c7 04 24 00 00 00 00   movl $0x0,(%esp)
  13:  89 44 24 04            mov %eax,0x4(%esp)
  17:  e8 fc ff ff ff         call 18
  1c:  90                     nop
  1d:  c7 04 24 00 00 00 00   movl $0x0,(%esp)
  24:  e8 fc ff ff ff         call 25
  29:  90                     nop
  2a:  83 05 00 00 00 00 01   addl $0x1,0x0
  31:  90                     nop
  32:  c7 04 24 00 00 00 00   movl $0x0,(%esp)
  39:  e8 fc ff ff ff         call 3a
  3e:  c9                     leave
  3f:  c3                     ret

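With -O2 the unused variable t is optimized away, so no load is emitted for t = counter: the first nop at offset 6 has nothing before it. The mov at offset 7 loads counter into eax as the argument for printf. For ++counter the compiler does not bother with a register at all and adds 1 directly to memory at offset 2a (x86 supports an add with a memory operand).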

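Judging by the assembly, my worry was unfounded. Even if counter was loaded into a register before the lock, the increment reads it from memory again instead of using that register. Why does the compiler generate this code? Because of the lock? Is there, say, some instruction inside the lock API that makes the compiler do this? I don't think so. That would place an unreasonable demand on the compiler, which does not look inside the functions it calls when compiling this file. Here the function comes from the pthread library, but often the definition of a called function is not available at compile time at all. So that guess must be wrong. That leaves one reasonable explanation: counter is an external variable (not defined inside this function). Precisely because the compiler cannot see what a called function does, it has to assume that such a call may modify counter, so it loads counter from memory again after the call instead of reusing a register.

This time let's not use a global variable at all and pass the counter in as a parameter instead: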
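A small sketch of why the reload is necessary: a function the compiler cannot see into is allowed to write the global, so a value read before the call may be stale afterwards. The names shared_value, bump_shared, leave_shared and change_across_call are hypothetical, and the indirect call through a function pointer merely stands in for a call into another translation unit or library.

```c
/* Hypothetical global standing in for the post's `counter`. */
int shared_value = 0;

/* A callee that modifies the global. */
void bump_shared(void)
{
    ++shared_value;
}

/* A callee that leaves the global alone. */
void leave_shared(void)
{
}

/* Reads the global, makes a call whose target is unknown inside this
 * function, and reads the global again. Because the callee may have written
 * shared_value, the second read cannot reuse the first one.
 * Returns how much the global changed across the call. */
int change_across_call(void (*callee)(void))
{
    int before = shared_value;
    callee();
    int after = shared_value;
    return after - before;
}
```

change_across_call(bump_shared) returns 1 and change_across_call(leave_shared) returns 0. From inside change_across_call the two cases are indistinguishable, which is the position the compiler is in when it sees a call such as pthread_mutex_lock.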

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

#define ASM_SEPERATOR __asm__ __volatile__ ("nop")

void add_counter(int *counter)
{
    int t = *counter;
    ASM_SEPERATOR;
    printf("counter is %d %d\n", t, *counter);
    ASM_SEPERATOR;
    ++*counter;
    ASM_SEPERATOR;
    printf("counter is %d\n", *counter);
}

Disassembly output:

00000000 <add_counter>:
   0:  55                     push %ebp
   1:  89 e5                  mov %esp,%ebp
   3:  53                     push %ebx
   4:  83 ec 14               sub $0x14,%esp
   7:  8b 5d 08               mov 0x8(%ebp),%ebx
   a:  8b 03                  mov (%ebx),%eax
   c:  90                     nop
   d:  89 44 24 08            mov %eax,0x8(%esp)
  11:  89 44 24 04            mov %eax,0x4(%esp)
  15:  c7 04 24 00 00 00 00   movl $0x0,(%esp)
  1c:  e8 fc ff ff ff         call 1d
  21:  90                     nop
  22:  90                     nop
  23:  8b 03                  mov (%ebx),%eax
  25:  83 c0 01               add $0x1,%eax
  28:  89 03                  mov %eax,(%ebx)
  2a:  90                     nop
  2b:  89 44 24 04            mov %eax,0x4(%esp)
  2f:  c7 04 24 12 00 00 00   movl $0x12,(%esp)
  36:  e8 fc ff ff ff         call 37
  3b:  83 c4 14               add $0x14,%esp
  3e:  5b                     pop %ebx
  3f:  5d                     pop %ebp
  40:  c3                     ret

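The instructions at offsets 23 through 28 are the increment ++*counter. As with the global counter, the value is read from memory into a register, incremented, and then stored back to memory. It is worth noting where the compiler does reuse a register: the single load at offset a supplies both t and *counter to the first printf (offsets d and 11), even though a nop separator sits between them, and the second printf receives the incremented value still held in eax (offset 2b). The reload at offset 23 comes right after the call to printf.

So the conclusion is this: when the compiler handles an external variable, it loads the variable from memory again after any call to a function it cannot see into, because that call might have changed it, and only then uses the value.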
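When a fresh read is wanted at a point where there is no function call, GCC-style compilers offer an explicit compiler barrier. The sketch below is an assumption-laden illustration, not something taken from the listings above: COMPILER_BARRIER and sum_of_two_reads are hypothetical names, and the "memory" clobber is GCC/Clang extended-asm syntax. Note that the ASM_SEPERATOR macro used in this post has no "memory" clobber, so it only marks a position in the output and is not such a barrier.

```c
/* GCC/Clang extended asm: an empty instruction with a "memory" clobber.
 * It emits no machine code, but tells the compiler that memory may have
 * been read or written at this point. It is a compiler barrier only and
 * does not order anything at the CPU level. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Reads *value twice with a compiler barrier in between, so the second
 * read is not simply assumed to equal the first. Returns the sum. */
int sum_of_two_reads(const int *value)
{
    int first = *value;
    COMPILER_BARRIER();
    int second = *value;
    return first + second;
}
```

In a single-threaded call the two reads see the same value, so sum_of_two_reads(&v) with v equal to 21 returns 42; the difference from the version without the barrier shows up only in the generated code, where the compiler would otherwise be free to load *value once.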
