Lock-Free Programming

References:
http://preshing.com/20120612/an-introduction-to-lock-free-programming/
http://blog.csdn.net/lifesider/article/details/6582338
http://blog.poxiao.me/p/spinlock-implementation-in-cpp11/
http://www.infoq.com/cn/news/2014/11/cpp-lock-free-programming
http://people.csail.mit.edu/bushl2/rpi/project_web/page5.html
http://15418.courses.cs.cmu.edu/spring2013/article/46
http://www.cl.cam.ac.uk/research/srg/netos/projects/archive/lock-free/
http://www.liblfds.org/
http://www.cppblog.com/woaidongmao/archive/2009/05/02/81663.html
http://www.sogou.com/labs/report_source/4-2.pdf
https://www.infoq.com/news/2014/10/cpp-lock-free-programming
http://coolshell.cn/articles/8239.html (coolshell, in Chinese)

I have collected some articles first and will write a summary once I have finished studying them.

Some basic concepts

deadlock & livelock

https://en.wikipedia.org/wiki/Deadlock
https://en.wikipedia.org/wiki/Deadlock#Livelock


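Both processes must acquire a resource before they can continue. Process P1 needs resource R1 and already holds R2; process P2 needs R2 and already holds R1. This is a deadlock: neither process can ever make progress.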

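Livelock is different: process 1 could use the resource but yields so that process 2 can go first, and process 2 could use the resource but yields so that process 1 can go first. Both stay active and keep deferring to each other, so neither ever uses the resource.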

Here is another vivid analogy:

http://blog.csdn.net/java2000_net/article/details/4061983
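The deadlock scenario above corresponds to two threads that take two mutexes in opposite order. The following is only a minimal sketch, not taken from the referenced articles: the names r1, r2, use_both, and run_two_workers are hypothetical. It shows one common way to sidestep the problem in C++11: std::lock acquires both mutexes with a deadlock-avoidance algorithm, so the order in which each thread names them no longer matters.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical resources R1 and R2, each guarded by its own mutex.
std::mutex r1;
std::mutex r2;
int shared_counter = 0;

// Acquires both mutexes together. std::lock uses a deadlock-avoidance
// algorithm, so callers may pass the mutexes in either order.
void use_both(std::mutex& first, std::mutex& second, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::lock(first, second);
        // adopt_lock: the guards take over mutexes that are already locked.
        std::lock_guard<std::mutex> g1(first, std::adopt_lock);
        std::lock_guard<std::mutex> g2(second, std::adopt_lock);
        ++shared_counter;
    }
}

// P1 names R2 then R1, P2 names R1 then R2: the order that could deadlock
// if each thread locked the mutexes one at a time with separate lock() calls.
int run_two_workers(int iterations) {
    shared_counter = 0;
    std::thread p1(use_both, std::ref(r2), std::ref(r1), iterations);
    std::thread p2(use_both, std::ref(r1), std::ref(r2), iterations);
    p1.join();
    p2.join();
    return shared_counter;
}
```

Calling run_two_workers(1000) should return 2000, because every increment happens while both mutexes are held.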

RMW (Read-Modify-Write)

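std::atomic<int>::fetch_add in C++11 is an RMW operation. Note, however, that C++11 does not guarantee a lock-free implementation on every platform, so it is best to call std::atomic<>::is_lock_free() to check whether the RMW operations you rely on are actually lock-free on your platform.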

atomic RMWs are a necessary part of lock-free programming even on single-processor systems. Without atomicity, a thread could be interrupted halfway through the transaction, possibly leading to an inconsistent state.

Atomic RMW operations are a prerequisite for lock-free programming.
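As a rough illustration of the two points above, the sketch below increments one counter from several threads with fetch_add and separately queries is_lock_free(). The helper names count_with_fetch_add and atomic_int_is_lock_free are made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: several threads bump one counter with an atomic RMW.
int count_with_fetch_add(int num_threads, int increments_per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Read, add, and write back as one indivisible step.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // join() makes every increment visible to this thread
    }
    return counter.load();
}

// Reports whether std::atomic<int> is lock-free on the current platform.
// The answer is platform-dependent, which is why it is worth checking.
bool atomic_int_is_lock_free() {
    std::atomic<int> probe{0};
    return probe.is_lock_free();
}
```

With a plain int and ++counter the total could come up short, since two threads may read the same old value; with fetch_add no increment is lost.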

CAS (Compare-And-Swap Loops)

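CAS is probably the most widely discussed RMW operation. The Win32 platform provides a family of functions for CAS, such as _InterlockedCompareExchange. A CAS is usually placed in a loop that retries until it succeeds, and this pattern typically has three steps, as in the following push operation: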

void LockFreeQueue::push(Node* newHead)
{
    for (;;)
    {
        // 1. Copy a shared variable (m_Head) to a local.
        Node* oldHead = m_Head;

        // 2. Do some speculative work, not yet visible to other threads.
        newHead->next = oldHead;

        // 3. Next, attempt to publish our changes to the shared variable.
        // If the shared variable hasn't changed, the CAS succeeds and we return.
        // Otherwise, repeat.
        if (_InterlockedCompareExchange(&m_Head, newHead, oldHead) == oldHead)
            return;
    }
}

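Step 1: copy the data shared between threads (in this example, the list head m_Head) into a local variable.
Step 2: do some speculative work (point the new head's next at the local copy of the old head). At this point the change is not yet visible to other threads.
Step 3: publish the change to the shared data. If the shared data has not changed in the meantime, the CAS succeeds and the function returns; otherwise the loop runs again.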
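The same three steps can also be written with portable std::atomic instead of the Win32 intrinsic. This is a minimal sketch under that assumption; Node, LockFreeStack, and push_two_and_read_top are illustrative names, and the memory orders are one reasonable choice rather than the only one.

```cpp
#include <atomic>
#include <cassert>

struct Node {
    int value;
    Node* next;
};

class LockFreeStack {
public:
    void push(Node* new_head) {
        // 1. Copy the shared head to a local.
        Node* old_head = head_.load(std::memory_order_relaxed);
        do {
            // 2. Speculative work, not yet visible to other threads.
            new_head->next = old_head;
            // 3. Try to publish. On failure, compare_exchange_weak writes the
            //    current head into old_head, so the loop redoes step 2.
        } while (!head_.compare_exchange_weak(old_head, new_head,
                                              std::memory_order_release,
                                              std::memory_order_relaxed));
    }

    Node* top() const { return head_.load(std::memory_order_acquire); }

private:
    std::atomic<Node*> head_{nullptr};
};

// Hypothetical single-threaded usage: the last node pushed ends up on top.
int push_two_and_read_top() {
    Node a{1, nullptr};
    Node b{2, nullptr};
    LockFreeStack stack;
    stack.push(&a);
    stack.push(&b);
    return stack.top()->value;
}
```

Because compare_exchange_weak refreshes old_head on failure, step 1 does not need to be repeated explicitly inside the loop.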

When writing a CAS loop like this, take care to avoid the ABA problem.

ABA problem

https://en.wikipedia.org/wiki/ABA_problem

Wikipedia explains it very clearly, so I only quote it here.

Example : John is waiting in his car at a red traffic light with his children. His children start fighting with each other while waiting, and he leans back to scold them. Once their fighting stops, John checks the light again and notices that it’s still red. However, while he was focusing on his children, the light had changed to green, and then back again. John doesn’t think the light ever changed, but the people waiting behind him are very mad and honking their horns now.
In this scenario, the ‘A’ state is when the traffic light is red, and the ‘B’ state is when it’s green. Originally, the traffic light starts in ‘A’ state. If John looked at the light he would have noticed the change. But he only looked when the light was red (state ‘A’). There is no way to tell if the light turned green during the time of no observation.

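This is a fairly vivid example from Wikipedia. John and his children are waiting in the car at a red light. The children start fighting, so John turns around to quiet them down. When they have settled, he checks the light again and sees that it is still red. In fact, while he was turned around he missed a full change of the light: red -> green -> red. That is the ABA problem.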

(In this example the ABA problem does little harm: John only has to wait a bit longer until the light turns green again.)

An example where ABA causes real trouble: a lock-free stack

  /* Naive lock-free stack which suffers from ABA problem.*/
  class Stack {
    std::atomic<Obj*> top_ptr;
    //
    // Pops the top object and returns a pointer to it.
    //
    Obj* Pop() {
      while(1) {
        Obj* ret_ptr = top_ptr;
        if (!ret_ptr) return nullptr;
        // For simplicity, suppose that we can ensure that this dereference is safe
        // (i.e., that no other thread has popped the stack in the meantime).
        Obj* next_ptr = ret_ptr->next;
        // If the top node is still ret, then assume no one has changed the stack.
        // (That statement is not always true because of the ABA problem)
        // Atomically replace top with next.
        if (top_ptr.compare_exchange_weak(ret_ptr, next_ptr)) {
          return ret_ptr;
        }
        // The stack has changed, start over.
      }
    }
    //
    // Pushes the object specified by obj_ptr to stack.
    //
    void Push(Obj* obj_ptr) {
      while(1) {
        Obj* next_ptr = top_ptr;
        obj_ptr->next = next_ptr;
        // If the top node is still next, then assume no one has changed the stack.
        // (That statement is not always true because of the ABA problem)
        // Atomically replace top with obj.
        if (top_ptr.compare_exchange_weak(next_ptr, obj_ptr)) {
          return;
        }
        // The stack has changed, start over.
      }
    }
  };

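Once you understand CAS, Pop and Push above should be easy to follow. Clearly, an ABA problem can arise in the window between reading top_ptr and calling compare_exchange_weak. For example: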

Suppose the stack, from top to bottom, is top → A → B → C.
Thread 1 starts a pop:

ret = A;
next = B;

Then thread 1 is interrupted before it executes compare_exchange_weak...

  { // Thread 2 runs a pop:
    ret = A;
    next = B;
    compare_exchange_weak(A, B)  // succeeds, top = B
    return A;
  } // The stack is now top → B → C
  { // Thread 2 runs another pop:
    ret = B;
    next = C;
    compare_exchange_weak(B, C)  // succeeds, top = C
    return B;
  } // The stack is now top → C
  delete B; // B is deleted
  { // Thread 2 pushes A back onto the stack:
    A->next = C;
    compare_exchange_weak(C, A)  // succeeds, top = A
  }

The stack is now: top → A → C

Thread 1 then resumes. It still believes the stack is unchanged and goes on to execute:

compare_exchange_weak(A, B)

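Here is the problem: thread 1 does not know that B has been deleted, so it changes the stack to top → B → C.

In other words, it has put a dangling pointer on the stack. The next pop that uses this node results in undefined behavior.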
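The interleaving above can be replayed deterministically on a single thread. The sketch below is only an illustration under a few assumptions: the retired flag stands in for delete so the demo itself never touches freed memory, compare_exchange_strong is used so that no step fails spuriously, and the names Obj and aba_leaves_retired_node_on_top are hypothetical.

```cpp
#include <atomic>
#include <cassert>

struct Obj {
    char name;
    Obj* next;
    bool retired;  // stands in for "deleted" so the demo avoids a real use-after-free
};

// Replays the two-thread schedule step by step on one thread.
// Returns true if the final CAS succeeds and leaves the retired node on top.
bool aba_leaves_retired_node_on_top() {
    Obj c{'C', nullptr, false};
    Obj b{'B', &c, false};
    Obj a{'A', &b, false};
    std::atomic<Obj*> top{&a};                   // top -> A -> B -> C

    // Thread 1 begins Pop, then is suspended before its CAS.
    Obj* ret = top.load();                       // A
    Obj* next = ret->next;                       // B

    // Thread 2: pop A, pop B, "delete" B, push A back.
    Obj* expected = &a;
    top.compare_exchange_strong(expected, &b);   // top -> B -> C
    expected = &b;
    top.compare_exchange_strong(expected, &c);   // top -> C
    b.retired = true;                            // stands in for delete B
    a.next = &c;
    expected = &c;
    top.compare_exchange_strong(expected, &a);   // top -> A -> C

    // Thread 1 resumes. top is A again, so the CAS cannot tell that the
    // stack changed underneath it, and it installs the stale pointer to B.
    bool swapped = top.compare_exchange_strong(ret, next);
    return swapped && top.load() == &b && top.load()->retired;
}
```

The function returns true: the CAS compares only the pointer value, so it succeeds even though the node it installs has already been retired.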

std::atomic::compare_exchange_weak()

https://www.codeproject.com/articles/808305/understand-std-atomic-compare-exchange-weak-in-cpl

bool compare_exchange_weak (T& expected, T desired, ..);
bool compare_exchange_strong (T& expected, T desired, ..);

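If the expected value equals the value the object actually holds, the call returns success and writes the desired value to memory. Otherwise, expected is overwritten with the value actually in memory and the call returns failure. This holds in the vast majority of cases, with one exception: the weak version of CAS may return failure even when the value in memory equals the expected value. In that case the desired value is not written to memory. This is called a spurious failure.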

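Spurious failures occur because on some platforms CAS is implemented as a sequence of instructions (typically load-linked/store-conditional), unlike the single instruction used on x86. On those platforms a context switch, another thread loading the same memory address, and similar events can all cause the CAS to fail. The failure is called spurious because it happens not because the stored value differs from the expected value, but because of timing. The strong version of CAS behaves differently: it handles this case internally and prevents spurious failures.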

Because of spurious failures, the weak version is usually used in a loop.

C++11 § 29.6.5
A consequence of spurious failure is that nearly all uses of weak compare-and-exchange will be in a loop.
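A small sketch of how the two versions are typically chosen: the weak form inside a retry loop, where a spurious failure just costs one more iteration, and the strong form for a one-shot attempt with no loop. The helper names fetch_multiply and try_claim are made up for this illustration.

```cpp
#include <atomic>
#include <cassert>

// Atomically multiplies `value` by `factor` with a CAS loop and returns the
// previous value. compare_exchange_weak is fine here: on any failure, real
// or spurious, `expected` is refreshed and the desired value is recomputed.
int fetch_multiply(std::atomic<int>& value, int factor) {
    int expected = value.load();
    while (!value.compare_exchange_weak(expected, expected * factor)) {
        // retry with the refreshed `expected`
    }
    return expected;
}

// One-shot attempt: claim the flag only if nobody has claimed it yet.
// There is no loop, so a spurious failure would be reported as "already
// claimed"; compare_exchange_strong rules that out.
bool try_claim(std::atomic<bool>& claimed) {
    bool expected = false;
    return claimed.compare_exchange_strong(expected, true);
}
```

For example, with a std::atomic<int> holding 6, fetch_multiply(v, 7) returns 6 and leaves 42 in v; the first try_claim on a false flag returns true and every later call returns false.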
