Preface

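As is well known, the C++11 standard requires the initialization of local static variables to be thread-safe, so a thread-safe singleton class is easy to write: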

class Foo
{
public:
    static Foo *getInstance()
    {
        static Foo s_instance;
        return &s_instance;
    }
private:
    Foo() {}
};

The C++ standard describes this as follows (section 6.7 of the draft standard):

such a variable is initialized the first time control passes through its declaration; such a variable is considered initialized upon the completion of its initialization. If the initialization exits by throwing an exception, the initialization is not complete, so it will be tried again the next time control enters the declaration. If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization. If control re-enters the declaration recursively while the variable is being initialized, the behavior is undefined.
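The guarantee can be checked with a small experiment. The sketch below is not from the original article: Counted, g_constructions and countConstructions are hypothetical names chosen for illustration. Several threads call getInstance() at the same time, and the constructor is expected to run exactly once.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical counter: how many times the constructor has run.
static std::atomic<int> g_constructions{0};

class Counted
{
public:
    static Counted *getInstance()
    {
        static Counted s_instance;   // thread-safe initialization since C++11
        return &s_instance;
    }
private:
    Counted() { g_constructions.fetch_add(1); }
};

// Starts `threads` threads that all call getInstance() and returns the
// number of constructor calls observed after they have finished.
int countConstructions(int threads)
{
    std::vector<std::thread> pool;
    std::vector<Counted *> seen(threads, nullptr);
    for (int i = 0; i < threads; ++i)
        pool.emplace_back([&seen, i] { seen[i] = Counted::getInstance(); });
    for (auto &t : pool)
        t.join();
    for (Counted *p : seen)
        assert(p == Counted::getInstance());   // every thread saw the same object
    return g_constructions.load();
}
```

Calling countConstructions(8) should report a single construction no matter how the threads are scheduled. With GCC the program may need to be built with -pthread.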

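Analysis

The standard places the following requirements on the initialization of a local static variable:

1. The variable is initialized the first time execution reaches its declaration.
2. If initialization exits by throwing an exception, it counts as not completed, and it is attempted again the next time execution reaches the same declaration.
3. If the current thread reaches the declaration while another thread is initializing the variable, the current thread blocks until that initialization completes.
4. If initialization re-enters the same declaration recursively, the behavior is undefined.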
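The second requirement above (retry after an exception) can be observed with a short experiment. This is a sketch with hypothetical names (Flaky, g_attempts, attemptsUntilInitialized), not code from the article: the constructor throws on its first attempt only.

```cpp
#include <cassert>
#include <stdexcept>

static int g_attempts = 0;   // hypothetical: counts constructor attempts

struct Flaky
{
    Flaky()
    {
        if (++g_attempts == 1)
            throw std::runtime_error("first attempt fails");
    }
};

Flaky *getFlaky()
{
    static Flaky s_instance;   // retried on the next call if the constructor threw
    return &s_instance;
}

// Calls getFlaky() until it succeeds, calls it once more, and returns the
// total number of constructor attempts that were made.
int attemptsUntilInitialized()
{
    try {
        getFlaky();   // first attempt: the constructor throws
    } catch (const std::runtime_error &) {
        // initialization did not complete, so it will be tried again
    }
    getFlaky();       // second attempt succeeds
    getFlaky();       // already initialized: the constructor is not run again
    return g_attempts;
}
```

The function returns 2: one failed attempt, one successful attempt, and no further attempt once the variable counts as initialized.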

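If point 4 is unclear, consider the following code, in which the constructor of Bar calls getInstance() again while s_instance is still being initialized: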

class Bar
{
public:
    static Bar *getInstance()
    {
        static Bar s_instance;
        return &s_instance;
    }
private:
    Bar()
    {
        getInstance();
    }
};

GCC's implementation

Taking GCC 7.3.0 as an example, let us analyze how GCC implements the standard's requirements.

Foo::getInstance()

After compiling with GCC, we use gdb to disassemble the Foo::getInstance() function from the beginning of the article:

Dump of assembler code for function Foo::getInstance():
   0x00005555555546ea <+0>:     push   %rbp
   0x00005555555546eb <+1>:     mov    %rsp,%rbp
=> 0x00005555555546ee <+4>:     movzbl 0x20092b(%rip),%eax        # 0x555555755020 <_ZGVZN3Foo11getInstanceEvE10s_instance>
   0x00005555555546f5 <+11>:    test   %al,%al
   0x00005555555546f7 <+13>:    sete   %al
   0x00005555555546fa <+16>:    test   %al,%al
   0x00005555555546fc <+18>:    je     0x55555555472b <Foo::getInstance()+65>
   0x00005555555546fe <+20>:    lea    0x20091b(%rip),%rdi        # 0x555555755020 <_ZGVZN3Foo11getInstanceEvE10s_instance>
   0x0000555555554705 <+27>:    callq  0x5555555545b0 <__cxa_guard_acquire@plt>
   0x000055555555470a <+32>:    test   %eax,%eax
   0x000055555555470c <+34>:    setne  %al
   0x000055555555470f <+37>:    test   %al,%al
   0x0000555555554711 <+39>:    je     0x55555555472b <Foo::getInstance()+65>
   0x0000555555554713 <+41>:    lea    0x2008fe(%rip),%rdi        # 0x555555755018 <_ZZN3Foo11getInstanceEvE10s_instance>
   0x000055555555471a <+48>:    callq  0x555555554734 <Foo::Foo()>
   0x000055555555471f <+53>:    lea    0x2008fa(%rip),%rdi        # 0x555555755020 <_ZGVZN3Foo11getInstanceEvE10s_instance>
   0x0000555555554726 <+60>:    callq  0x5555555545a0 <__cxa_guard_release@plt>
   0x000055555555472b <+65>:    lea    0x2008e6(%rip),%rax        # 0x555555755018 <_ZZN3Foo11getInstanceEvE10s_instance>
   0x0000555555554732 <+72>:    pop    %rbp
   0x0000555555554733 <+73>:    retq   
End of assembler dump.

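The symbol _ZGVZN3Foo11getInstanceEvE10s_instance, which appears at +4, +20 and +53, is demangled by c++filt to guard variable for Foo::getInstance()::s_instance, while _ZZN3Foo11getInstanceEvE10s_instance, which appears at +41 and +65, is Foo::getInstance()::s_instance. The latter is the local static variable s_instance itself. The former, as its name says, is a guard flag that records the initialization state of the local static variable.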

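+4 ~ +18

Test the first byte of the guard variable. If it is 0, s_instance has not been initialized and execution continues at +20; otherwise s_instance is already initialized and execution jumps to +65.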

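+20 ~ +27

Call __cxa_guard_acquire with the address of the guard variable as its argument.

+32 ~ +39

Test the return value. If it is 0, s_instance is already initialized and execution jumps to +65; otherwise s_instance is not yet initialized and execution continues at +41.

+41 ~ +48

Initialize s_instance by calling Foo::Foo() on its address.

+53 ~ +60

Call __cxa_guard_release with the address of the guard variable as its argument.

+65 ~ +73

Return the address of s_instance.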
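The control flow of the disassembly can be written out by hand. The following is only a sketch, assuming libstdc++ on x86-64 Linux, where cxxabi.h exposes abi::__guard and the __cxa_guard_* functions. Widget, s_guard, s_storage, s_object and getWidget are hypothetical names, and the call to __cxa_guard_abort has no counterpart in the dump above.

```cpp
#include <cxxabi.h>   // abi::__guard and the __cxa_guard_* functions (libstdc++)
#include <cassert>
#include <new>

struct Widget          // hypothetical stand-in for Foo
{
    int value;
    Widget() : value(42) {}
};

static abi::__guard s_guard;                                     // plays the role of the guard variable
alignas(Widget) static unsigned char s_storage[sizeof(Widget)];  // raw storage for the object
static Widget *s_object;                                         // set once construction has succeeded

Widget *getWidget()
{
    // +4 ~ +18: fast path, look at the first byte of the guard.
    if (__atomic_load_n(reinterpret_cast<char *>(&s_guard), __ATOMIC_ACQUIRE) == 0)
    {
        // +20 ~ +39: a non-zero result means this thread must initialize.
        if (abi::__cxa_guard_acquire(&s_guard))
        {
            try
            {
                s_object = new (s_storage) Widget();   // +41 ~ +48
            }
            catch (...)
            {
                abi::__cxa_guard_abort(&s_guard);      // reset the guard so a later call retries
                throw;
            }
            abi::__cxa_guard_release(&s_guard);        // +53 ~ +60
        }
    }
    return s_object;                                    // +65 ~ +73
}
```

Repeated calls to getWidget() return the same pointer, and only the first call reaches the constructor. In real code the plain static local is preferable; this version exists only to make the compiler's work visible.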

__cxa_guard_acquire

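Let us look at what __cxa_guard_acquire actually does. Its code lives in gcc-7-7.3.0/gcc-7.3.0/libstdc++-v3/libsupc++/guard.cc. The function is implemented differently for different platforms. My machine supports threads and the futex system call, so the code for other configurations has been removed from the listing below: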

int __cxa_guard_acquire (__guard *g)
{
    // If the target can reorder loads, we need to insert a read memory
    // barrier so that accesses to the guarded variable happen after the
    // guard test.

    // 1
    if (_GLIBCXX_GUARD_TEST_AND_ACQUIRE (g))
        return 0;

    // If __atomic_* and futex syscall are supported, don't use any global
    // mutex.

    // 2
    if (__gthread_active_p ())
    {
        int *gi = (int *) (void *) g;

        // 3
        const int guard_bit = _GLIBCXX_GUARD_BIT;
        const int pending_bit = _GLIBCXX_GUARD_PENDING_BIT;
        const int waiting_bit = _GLIBCXX_GUARD_WAITING_BIT;

        while (1)
        {
            // 4
            int expected(0);
            if (__atomic_compare_exchange_n(gi, &expected, pending_bit, false,
                                            __ATOMIC_ACQ_REL,
                                            __ATOMIC_ACQUIRE))
            {
                // This thread should do the initialization.
                return 1;
            }

            // 5
            if (expected == guard_bit)
            {
                // Already initialized.
                return 0;
            }

            // 6
            if (expected == pending_bit)
            {
                // Use acquire here.

                // 7
                int newv = expected | waiting_bit;

                // 8
                if (!__atomic_compare_exchange_n(gi, &expected, newv, false,
                                                 __ATOMIC_ACQ_REL,
                                                 __ATOMIC_ACQUIRE))
                {
                    // 9
                    if (expected == guard_bit)
                    {
                        // Make a thread that failed to set the
                        // waiting bit exit the function earlier,
                        // if it detects that another thread has
                        // successfully finished initialising.
                        return 0;
                    }

                    // 10
                    if (expected == 0)
                        continue;
                }

                // 11
                expected = newv;
            }

            // 12
            syscall (SYS_futex, gi, _GLIBCXX_FUTEX_WAIT, expected, 0);
        }
    }

    return acquire (g);
}

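1. First test the guard variable. If it is already set, s_instance has been initialized and no further initialization is needed, so return 0 right away.
2. Check whether the program is multi-threaded. Without threads there is no need for extra work to guarantee thread safety, and the function falls through to acquire(g) at the end.
3. guard_bit means that s_instance has been initialized successfully; pending_bit means that s_instance is being initialized; waiting_bit means that other threads are waiting for the initialization of s_instance.
4. An atomic compare-and-exchange tests whether the guard variable is 0. If it is, pending_bit is written to the guard variable, the current thread becomes responsible for initializing s_instance, and 1 is returned. If it is not, the current value of the guard variable is stored in expected.
5. Test whether expected equals guard_bit. If so, s_instance is fully initialized and needs no further initialization, so return 0.
6. Test whether expected equals pending_bit. If so, s_instance is being initialized and no other thread is waiting for it yet.
7. Set newv to pending_bit | waiting_bit, meaning that s_instance is being initialized and a thread is waiting for the initialization.
8. A second atomic compare-and-exchange tests whether the guard variable still equals pending_bit. If it does, no other thread has modified the guard variable and pending_bit | waiting_bit is written to it. If it does not, another thread has modified the guard variable and further checks are needed.
9. If expected equals guard_bit, s_instance has been initialized successfully in the meantime and needs no further initialization, so return 0.
10. If expected equals 0, the initialization of s_instance has failed, so go back to 4 and start over.
11. This point is reached either when the exchange in 8 succeeded or when it failed because another waiting thread had already stored pending_bit | waiting_bit. In both cases expected is set to pending_bit | waiting_bit, meaning that s_instance is being initialized and a thread (the current one) is waiting for the initialization.
12. If the if branch at 6 was not taken, expected already equals pending_bit | waiting_bit; if it was taken, 11 has set expected to pending_bit | waiting_bit. Either way, s_instance is being initialized and at least one thread is waiting. The futex system call then checks once more whether the guard variable has changed. If it has, the call returns immediately and the loop restarts at 4. If it still equals pending_bit | waiting_bit, the current thread is suspended until it is woken, after which the loop also restarts at 4.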

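In short, __cxa_guard_acquire returns either 0 or 1: 0 means that s_instance is already initialized, 1 means that the calling thread must initialize it. The call may suspend the current thread, which happens while another thread is initializing s_instance.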
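The "check again, then sleep" behavior of the futex call in step 12 can be tried in isolation. The sketch below is Linux-only and uses the raw system call in the same way the listing does; futexWaitSeesChange and demoStaleExpectedValue are hypothetical helper names, and the integer values are arbitrary placeholders rather than GCC's real bit patterns.

```cpp
#include <cassert>
#include <cerrno>
#include <linux/futex.h>   // FUTEX_WAIT
#include <sys/syscall.h>   // SYS_futex
#include <unistd.h>        // syscall

// Returns true if FUTEX_WAIT refused to sleep because *word no longer
// holds `expected`; the kernel reports this case as EAGAIN.
bool futexWaitSeesChange(int *word, int expected)
{
    long rc = syscall(SYS_futex, word, FUTEX_WAIT, expected, nullptr);
    return rc == -1 && errno == EAGAIN;
}

// Models a waiter whose `expected` value went stale: the word was changed
// (for example by the initializing thread) before the waiter called futex.
bool demoStaleExpectedValue()
{
    int word = 1;    // placeholder for the value stored after initialization
    int stale = 2;   // placeholder for the value the waiter read earlier
    return futexWaitSeesChange(&word, stale);
}
```

Because the comparison and the decision to sleep happen atomically inside the kernel, a waiter cannot miss a wake-up that is issued between its last read of the guard variable and the futex call; it simply returns and re-examines the guard.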

__cxa_guard_release

Because __cxa_guard_acquire may suspend threads, __cxa_guard_release has to wake them up once s_instance has been initialized.

void __cxa_guard_release (__guard *g) throw ()
{
    // If __atomic_* and futex syscall are supported, don't use any global
    // mutex.

    // 1
    if (__gthread_active_p ())
    {
        int *gi = (int *) (void *) g;
        const int guard_bit = _GLIBCXX_GUARD_BIT;
        const int waiting_bit = _GLIBCXX_GUARD_WAITING_BIT;

        // 2
        int old = __atomic_exchange_n (gi, guard_bit, __ATOMIC_ACQ_REL);

        // 3
        if ((old & waiting_bit) != 0)
            syscall (SYS_futex, gi, _GLIBCXX_FUTEX_WAKE, INT_MAX);
        return;
    }

    set_init_in_progress_flag(g, 0);
    _GLIBCXX_GUARD_SET_AND_RELEASE (g);
}

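1. Check whether the program is multi-threaded.
2. Atomically set the guard variable to guard_bit and fetch its previous value.
3. If the previous value contains waiting_bit, some thread is suspended (or is about to call futex in order to suspend itself), so futex is called to wake the suspended threads.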

__cxa_guard_abort

Because the initialization of s_instance may fail (this example does not show it), there is also a __cxa_guard_abort function.

void __cxa_guard_abort (__guard *g) throw ()
{
    // If __atomic_* and futex syscall are supported, don't use any global
    // mutex.
    if (__gthread_active_p ())
    {
        int *gi = (int *) (void *) g;
        const int waiting_bit = _GLIBCXX_GUARD_WAITING_BIT;
        int old = __atomic_exchange_n (gi, 0, __ATOMIC_ACQ_REL);

        if ((old & waiting_bit) != 0)
            syscall (SYS_futex, gi, _GLIBCXX_FUTEX_WAKE, INT_MAX);
        return;
    }

    set_init_in_progress_flag(g, 0);
}

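It is almost identical to __cxa_guard_release. The difference is that it resets the guard variable to 0 instead of setting guard_bit, so that a later call can attempt the initialization again.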
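How acquire, release and abort cooperate can be modeled in portable C++. The sketch below is a teaching model, not GCC's code: it swaps the atomic guard word and futex for a std::mutex and a std::condition_variable, so it does not have GCC's lock-free fast path. MiniGuard and runMiniGuardDemo are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class MiniGuard
{
    enum State { Uninitialized, Pending, Done };
public:
    // Returns true if the caller must run the initializer, false if the
    // object is already initialized. Blocks while another thread initializes.
    bool acquire()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_changed.wait(lock, [this] { return m_state != Pending; });
        if (m_state == Done)
            return false;
        m_state = Pending;
        return true;
    }
    // Initialization succeeded: publish Done and wake the waiters.
    void release() { finish(Done); }
    // Initialization failed: go back to Uninitialized so a waiter can retry.
    void abort() { finish(Uninitialized); }
private:
    void finish(State s)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_state = s;
        }
        m_changed.notify_all();
    }
    std::mutex m_mutex;
    std::condition_variable m_changed;
    State m_state = Uninitialized;
};

// Sends `threads` threads through one guard. The first initialization
// attempt is made to fail. Returns the number of successful initializations.
int runMiniGuardDemo(int threads)
{
    MiniGuard guard;
    std::atomic<int> attempts{0};
    std::atomic<int> successes{0};
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i)
        pool.emplace_back([&] {
            if (!guard.acquire())
                return;                  // another thread finished the job
            if (attempts.fetch_add(1) == 0) {
                guard.abort();           // simulate a constructor that throws
            } else {
                successes.fetch_add(1);
                guard.release();
            }
        });
    for (auto &t : pool)
        t.join();
    return successes.load();
}
```

With at least two threads, exactly one initialization succeeds: the first attempt is aborted, one of the remaining threads retries and releases the guard, and every other thread sees the Done state.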

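Recursive initialization

Because the C++11 standard makes recursive entry during initialization undefined behavior, GCC 7.3.0 handles it differently depending on whether the program is multi-threaded. In a multi-threaded program nothing special is done and a deadlock results; in a single-threaded program an exception is thrown, as the acquire() helper shows: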

// acquire() is a helper function used to acquire guard if thread support is
// not compiled in or is compiled in but not enabled at run-time.
static int
acquire(__guard *g)
{
    // Quit if the object is already initialized.
    if (_GLIBCXX_GUARD_TEST(g))
        return 0;

    if (init_in_progress_flag(g))
        throw_recursive_init_exception();

    set_init_in_progress_flag(g, 1);
    return 1;
}

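Summary

Having seen how intricate GCC's implementation is, my personal takeaway is not to reinvent the wheel to make a singleton class thread-safe. Matching GCC's efficiency is hard, and the convenience that the C++11 standard provides is good enough.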
