[Concurrency] Linus: Why Object Reference Counts Must Be Atomic

Linus is ranting again. This time the target is the currently fashionable topic of parallelism, and he states flatly that parallel computing is a waste of everyone's time ("The whole 'let's parallelize' thing is a huge waste of everybody's time."), the gist being that out-of-order cores, bigger caches, and lower power matter more. I don't intend to argue the merits of parallelism head-on here (far too grand a topic), because in another post Linus used object reference counting to illustrate just how complex concurrency really is, and that example is the subject of this article.

Before Linus replied, someone had raised the question of reference-count atomicity for objects that need locking:

Since it is being accessed in a multi-threaded way, via multiple access paths, generally it needs its own mutex — otherwise, reference counting would not be required to be atomic and a lock of a higher-level object would suffice.

Linus disagrees:

The problem with reference counts is that you often need to take them *before* you take the lock that protects the object data.

The thing is, you have two different cases:

- object *reference*

- object data

and they have completely different locking.

Object data locking is generally per-object. Well, unless you don’t have huge scalability issues, in which case you may have some external bigger lock (extreme case: one single global lock).

But object *referencing* is mostly about finding the object (and removing/freeing it). Is it on a hash chain? Is it in a tree? Linked list? When the reference count goes down to zero, it’s not the object data that you need to protect (the object is not used by anything else, so there’s nothing to protect!), it’s the ways to find the object you need to protect.

And the lock for the lookup operation cannot be in the object, because - by definition - you don’t know what the object is! You’re trying to look it up, after all.

So generally you have a lock that protects the lookup operation some way, and the reference count needs to be atomic with respect to that lock.

(Translator's note: the lookup lock belongs to the lookup structure, not to the object whose count is being changed, so it cannot by itself protect that object's reference count; Linus explains below why the object itself cannot simply be locked while its reference count changes.)
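A minimal C11/pthreads sketch of this arrangement, assuming invented names throughout (`obj`, `table_lock`, `obj_get`, `obj_put`; the "hash chain" is reduced to one linked list for brevity). The increment happens atomically under whatever pinned the object; the final decrement must coordinate with the lookup lock, because what disappears at zero is the way to *find* the object:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct obj {
    atomic_int refcount;
    pthread_mutex_t data_lock;  /* per-object lock: protects the data only */
    struct obj *next;           /* chain link: protected by table_lock */
    int key;
};

/* One lock protecting the lookup structure (a single-bucket "hash chain"). */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct obj *table_head;

/* Create an object already visible in the lookup structure, refcount 1. */
struct obj *obj_new(int key)
{
    struct obj *o = calloc(1, sizeof *o);
    atomic_init(&o->refcount, 1);
    pthread_mutex_init(&o->data_lock, NULL);
    o->key = key;
    pthread_mutex_lock(&table_lock);
    o->next = table_head;
    table_head = o;
    pthread_mutex_unlock(&table_lock);
    return o;
}

/* Find an object and take a reference while the lookup lock pins it. */
struct obj *obj_get(int key)
{
    pthread_mutex_lock(&table_lock);
    for (struct obj *o = table_head; o; o = o->next) {
        if (o->key == key) {
            atomic_fetch_add(&o->refcount, 1);
            pthread_mutex_unlock(&table_lock);
            return o;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return NULL;
}

/* Drop a reference.  The final put holds the lookup lock while it
 * unlinks, because at zero the thing being torn down is the way to
 * find the object, not the object's data.  (The kernel optimizes this
 * pattern with atomic_dec_and_lock(), which avoids taking the lock
 * unless the count is about to hit zero.) */
void obj_put(struct obj *o)
{
    pthread_mutex_lock(&table_lock);
    if (atomic_fetch_sub(&o->refcount, 1) == 1) {
        struct obj **pp = &table_head;
        while (*pp != o)
            pp = &(*pp)->next;
        *pp = o->next;              /* unlink from the chain */
        pthread_mutex_unlock(&table_lock);
        pthread_mutex_destroy(&o->data_lock);
        free(o);
        return;
    }
    pthread_mutex_unlock(&table_lock);
}
```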

And yes, that lock may well be sufficient, and now you’re back to non-atomic reference counts. But you usually don’t have just one way to look things up: you might have pointers from other objects (and that pointer is protected by the object locking of the other object), but there may be multiple such objects that point to this (which is why you have a reference count in the first place!)

See what happens? There is no longer one single lock for lookup. Imagine walking a graph of objects, where objects have pointers to each other. Each pointer implies a reference to an object, but as you walk the graph, you have to release the lock from the source object, so you have to take a new reference to the object you are now entering.

And in order to avoid deadlocks, you can not in the general case take the lock of the new object first - you have to release the lock on the source object, because otherwise (in a complex graph), how do you avoid simple ABBA deadlock?

(Translator's note: ABBA deadlock: thread 1 takes the lock on A and then waits for B, while thread 2 takes the lock on B and then waits for A; neither can ever acquire the lock the other holds.)

So atomic reference counts fix that. They work because when you move from object A to object B, you can do this:

(a) you have a reference count to A, and you can lock A

(b) once object A is locked, the pointer from A to B is stable, and you know you have a reference to B (because of that pointer from A to B)

(c) but you cannot take the object lock for B (ABBA deadlock) while holding the lock on A

(d) increment the atomic reference count on B

(e) now you can drop the lock on A (you’re “exiting” A)

(f) your reference count means that B cannot go away from under you despite unlocking A, so now you can lock B.

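Steps (a)-(f) amount to a hand-over-hand traversal. A minimal C11/pthreads sketch of one step of the walk, with invented names (`node`, `walk_step`); the caller is assumed to already hold a reference to `a`, as in step (a):

```c
#include <pthread.h>
#include <stdatomic.h>

struct node {
    atomic_int refcount;
    pthread_mutex_t lock;   /* per-object lock: protects ->next and the data */
    struct node *next;      /* edge to a neighbouring node in the graph */
};

/* Move from `a` to its neighbour.  Returns the neighbour, referenced
 * and locked, or NULL if there is none.  The caller keeps its own
 * reference to `a` and drops it separately. */
struct node *walk_step(struct node *a)
{
    pthread_mutex_lock(&a->lock);        /* (a) we hold a ref; lock A      */
    struct node *b = a->next;            /* (b) pointer is stable under A's
                                          *     lock, so B is pinned by it */
    if (!b) {
        pthread_mutex_unlock(&a->lock);
        return NULL;
    }
    /* (c) must NOT lock B while holding A's lock: ABBA deadlock risk */
    atomic_fetch_add(&b->refcount, 1);   /* (d) pin B with an atomic ref   */
    pthread_mutex_unlock(&a->lock);      /* (e) "exit" A                   */
    pthread_mutex_lock(&b->lock);        /* (f) the ref keeps B alive, so
                                          *     locking it now is safe     */
    return b;
}
```

Note that between (e) and (f) nothing is locked at all; only the atomic reference count keeps B from being freed under us.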
See? Atomic reference counts make this kind of situation possible. Yes, you want to avoid the overhead if at all possible (for example, maybe you have a strict ordering of objects, so you know you can walk from A to B, and never walk from B to A, so there is no ABBA deadlock, and you can just lock B while still holding the lock on A).

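One common form of the "strict ordering" escape hatch (an illustration, not something Linus prescribes here) is to pick a fixed global lock order, for example by address, so that every caller locks the "smaller" object first and ABBA deadlock cannot arise; the `onode` names below are invented:

```c
#include <pthread.h>
#include <stdint.h>

struct onode {
    pthread_mutex_t lock;
};

/* Acquire both locks in a fixed global order (lower address first),
 * regardless of the order the caller names them in.  Two threads
 * locking the same pair can then never deadlock on each other. */
void lock_pair(struct onode *x, struct onode *y)
{
    if ((uintptr_t)x < (uintptr_t)y) {
        pthread_mutex_lock(&x->lock);
        pthread_mutex_lock(&y->lock);
    } else {
        pthread_mutex_lock(&y->lock);
        pthread_mutex_lock(&x->lock);
    }
}

void unlock_pair(struct onode *x, struct onode *y)
{
    pthread_mutex_unlock(&x->lock);
    pthread_mutex_unlock(&y->lock);
}
```

The cost is that the ordering must hold globally; the moment some path needs to go "backwards" through the graph, you are back to atomic reference counts.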
But if you don’t have some kind of forced ordering, and if you have multiple ways to reach an object (and again - why have reference counts in the first place if that isn’t true!) then atomic reference counts really are the simple and sane answer.

If you think atomic refcounts are unnecessary, that’s a big flag that you don’t actually understand the complexities of locking.

Trust me, concurrency is hard. There’s a reason all the examples of “look how easy it is to parallelize things” tend to use simple arrays and don’t ever have allocations or freeing of the objects.

People who think that the future is highly parallel are invariably completely unaware of just how hard concurrency really is. They’ve seen Linpack, they’ve seen all those wonderful examples of sorting an array in parallel, they’ve seen all these things that have absolutely no actual real complexity - and often very limited real usefulness.

(Translator's note: of course, I have no intention of turning technology into religion by quoting Linus. In fact, in yet another post Linus gave a more rounded assessment of parallelism.)

Oh, I agree. My example was the simple case. The really complex cases are much worse.

I seriously don’t believe that the future is parallel. People who think you can solve it with compilers or programming languages (or better programmers) are so far out to lunch that it’s not even funny.

Parallelism works well in simplified cases with fairly clear interfaces and models. You find parallelism in servers with independent queries, in HPC, in kernels, in databases. And even there, people work really hard to make it work at all, and tend to expressly limit their models to be more amenable to it (eg databases do some things much better than others, so DB admins make sure that they lay out their data in order to cater to the limitations).

Of course, other programming models can work. Neural networks are inherently very parallel indeed. And you don’t need smarter programmers to program them either..
