Java Memory Model FAQ

1. What is a memory model, anyway?

Original: http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html

In multiprocessor systems, processors generally have one or more layers of memory cache, which improves performance both by speeding access to data (because the data is closer to the processor) and by reducing traffic on the shared memory bus (because many memory operations can be satisfied by local caches). Memory caches can improve performance tremendously, but they present a host of new challenges. What, for example, happens when two processors examine the same memory location at the same time? Under what conditions will they see the same value?

At the processor level, a memory model defines necessary and sufficient conditions for knowing that writes to memory by other processors are visible to the current processor, and that writes by the current processor are visible to other processors. Some processors exhibit a strong memory model, where all processors see exactly the same value for any given memory location at all times. Other processors exhibit a weaker memory model, where special instructions, called memory barriers, are required to flush or invalidate the local processor cache in order to see writes made by other processors or to make writes by this processor visible to others. These memory barriers are usually performed when lock and unlock actions are taken; they are invisible to programmers in a high-level language.

It can sometimes be easier to write programs for strong memory models, because of the reduced need for memory barriers. However, even on some of the strongest memory models, memory barriers are often necessary, and quite frequently their placement is counterintuitive. Recent trends in processor design have encouraged weaker memory models, because the relaxations they make for cache consistency allow for greater scalability across multiple processors and larger amounts of memory.

The issue of when a write becomes visible to another thread is compounded by the compiler's reordering of code. For example, the compiler might decide that it is more efficient to move a write operation later in the program; as long as this code motion does not change the program's semantics, it is free to do so. If a compiler defers an operation, another thread will not see it until it is performed; this mirrors the effect of caching.

Moreover, writes to memory can be moved earlier in a program; in this case, other threads might see a write before it actually "occurs" in the program. All of this flexibility is by design: by giving the compiler, runtime, or hardware the flexibility to execute operations in the optimal order, within the bounds of the memory model, we can achieve higher performance.

A simple example of this can be seen in the following code:

class Reordering {
  int x = 0, y = 0;
  public void writer() {
    x = 1;
    y = 2;
  }

  public void reader() {
    int r1 = y;
    int r2 = x;
  }
}

Let's say that this code is executed in two threads concurrently, and the read of y sees the value 2. Because this write came after the write to x, the programmer might assume that the read of x must see the value 1. However, the writes may have been reordered. If this takes place, then the write to y could happen, the reads of both variables could follow, and then the write to x could take place. The result would be that r1 has the value 2, but r2 has the value 0.


The Java Memory Model describes what behaviors are legal in multithreaded code, and how threads may interact through memory. It describes the relationship between variables in a program and the low-level details of storing and retrieving them to and from memory or registers in a real computer system. It does this in a way that can be implemented correctly using a wide variety of hardware and a wide variety of compiler optimizations.

Java includes several language constructs, including volatile, final, and synchronized, which are intended to help the programmer describe a program's concurrency requirements to the compiler. The Java Memory Model defines the behavior of volatile and synchronized, and, more importantly, ensures that a correctly synchronized Java program runs correctly on all processor architectures.

2. Do other languages, like C++, have a memory model?

Most other programming languages, such as C and C++, were not designed with direct support for multithreading. The protections that these languages offer against the kinds of reorderings that take place in compilers and architectures are heavily dependent on the guarantees provided by the threading library used (such as pthreads), the compiler used, and the platform on which the code runs.

3. What is JSR 133 about?

Since 1997, several serious flaws have been discovered in the Java Memory Model as defined in Chapter 17 of the Java Language Specification. These flaws allowed for confusing behaviors (such as final fields being observed to change their value) and undermined the compiler's ability to perform common optimizations.

The Java Memory Model was an ambitious undertaking; it was the first time that a programming language specification attempted to incorporate a memory model which could provide consistent semantics for concurrency across a variety of architectures. Unfortunately, defining a memory model which is both consistent and intuitive proved far more difficult than expected. JSR 133 defines a new memory model for the Java language which fixes the flaws of the earlier one. In order to do this, the semantics of final and volatile needed to change.

The full semantics are available at http://www.cs.umd.edu/users/pugh/java/memoryModel, but the formal semantics are not for the timid. It is surprising, and sobering, to discover how complicated seemingly simple concepts like synchronization really are. Fortunately, you need not understand the details of the formal semantics -- the goal of JSR 133 was to create a set of formal semantics that provides an intuitive framework for how volatile, synchronized, and final work.

The goals of JSR 133 include:

  • Preserving existing safety guarantees, like type-safety, and strengthening others. For example, variable values may not be created "out of thin air": each value for a variable observed by some thread must be a value that can reasonably be placed there by some thread.
  • The semantics of correctly synchronized programs should be as simple and intuitive as possible.
  • The semantics of incompletely or incorrectly synchronized programs should be defined so that potential security hazards are minimized.
  • Programmers should be able to reason confidently about how multithreaded programs interact with memory.
  • It should be possible to design correct, high-performance JVM implementations across a wide range of popular hardware architectures.
  • A new guarantee of initialization safety should be provided. If an object is properly constructed (which means that references to it do not escape during construction), then all threads which see a reference to that object will also see the values for its final fields that were set in the constructor, without the need for synchronization.
  • There should be minimal impact on existing code.

4. What is meant by reordering?

There are a number of cases in which accesses to program variables (object instance fields, class static fields, and array elements) may appear to execute in a different order than was specified by the program. The compiler is free to take liberties with the ordering of instructions in the name of optimization. Processors may execute instructions out of order under certain circumstances. Data may be moved between registers, processor caches, and main memory in a different order than specified by the program.

For example, if a thread writes to field a and then to field b, and the value of b does not depend on the value of a, then the compiler is free to reorder these operations, and the cache is free to flush b to main memory before a. There are a number of potential sources of reordering, such as the compiler, the JIT, and the cache.

The compiler, runtime, and hardware are supposed to conspire to create the illusion of as-if-serial semantics, which means that in a single-threaded program, the program should not be able to observe the effects of reorderings. However, reorderings can come into play in incorrectly synchronized multithreaded programs, where one thread is able to observe the effects of other threads, and may be able to detect that variable accesses become visible to other threads in a different order than executed or specified in the program.

Most of the time, one thread doesn't care what the other is doing. But when it does, that's what synchronization is for.

5. What was wrong with the old memory model?

There were several serious problems with the old memory model. It was difficult to understand, and therefore widely violated. For example, the old model did not, in many cases, allow the kinds of reorderings that took place in every JVM. This confusion about the implications of the old model was what compelled the formation of JSR-133.

One widely held belief, for example, was that if final fields were used, then synchronization between threads was unnecessary to guarantee that another thread would see the value of the field. While this is a reasonable assumption and a sensible behavior, and indeed how we would want things to work, under the old memory model it was simply not true. Nothing in the old memory model treated final fields differently from any other field, meaning synchronization was the only way to ensure that all threads see the value of a final field that was written by the constructor. As a result, it was possible for a thread to see the default value of the field, and then at some later time see its constructed value. This means, for example, that immutable objects like String could appear to change their value -- a disturbing prospect indeed.

The old memory model allowed volatile writes to be reordered with nonvolatile reads and writes, which was not consistent with most developers' intuitions about volatile and therefore caused confusion.

Finally, as we shall see, programmers' intuitions about what can occur when their programs are incorrectly synchronized are often mistaken. One of the goals of JSR-133 is to call attention to this fact.

6. What do you mean by "incorrectly synchronized"?

Incorrectly synchronized code can mean different things to different people. When we talk about incorrectly synchronized code in the context of the Java Memory Model, we mean any code where:

  1. there is a write of a variable by one thread,
  2. there is a read of the same variable by another thread, and
  3. the write and read are not ordered by synchronization.

When these rules are violated, we say we have a data race on that variable. A program with a data race is an incorrectly synchronized program.
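
To make this concrete, here is a minimal sketch (a hypothetical class, not from the original FAQ) containing a data race on the plain fields done and value: nothing orders the writer's writes with the reader's reads, so the reader may spin forever, or may legally print 0.

// Hypothetical sketch of a data race: the write and the read of the
// shared fields are not ordered by any synchronization.
class DataRaceExample {
  static int value = 0;        // written by one thread, read by another
  static boolean done = false; // plain (non-volatile) flag

  public static void main(String[] args) {
    Thread writer = new Thread(() -> {
      value = 42;
      done = true;               // may be reordered with the write to value
    });
    Thread reader = new Thread(() -> {
      while (!done) { }          // may never observe done == true
      System.out.println(value); // may legally print 0 instead of 42
    });
    writer.start();
    reader.start();
  }
}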

7. What does synchronization do?

Synchronization has several aspects. The most well-understood is mutual exclusion: only one thread can hold a monitor at once, so synchronizing on a monitor means that once one thread enters a synchronized block protected by a monitor, no other thread can enter a block protected by that monitor until the first thread exits the synchronized block.

But there is more to synchronization than mutual exclusion. Synchronization ensures that memory writes by a thread before or during a synchronized block are made visible in a predictable manner to other threads which synchronize on the same monitor. After we exit a synchronized block, we release the monitor, which has the effect of flushing the cache to main memory, so that writes made by this thread can be visible to other threads. Before we can enter a synchronized block, we acquire the monitor, which has the effect of invalidating the local processor cache so that variables will be reloaded from main memory. We will then be able to see all of the writes made visible by the previous release.
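
As an illustrative sketch (hypothetical names, assuming both threads use the same lock object), the reader below is guaranteed to see data == 42 whenever it sees ready == true, because the writer's release and the reader's acquire happen on the same monitor:

// Sketch: visibility through a shared monitor. The release at the end
// of publish() and the acquire at the start of tryRead() use the same
// lock, so writes made before the release are visible after the acquire.
class SyncVisibility {
  private final Object lock = new Object();
  private int data = 0;
  private boolean ready = false;

  void publish() {
    synchronized (lock) {   // acquire
      data = 42;
      ready = true;
    }                       // release: prior writes become visible
  }

  Integer tryRead() {
    synchronized (lock) {   // acquire: stale cached values are discarded
      return ready ? data : null;
    }
  }
}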

Discussing this in terms of caches may make it sound as if these issues only affect multiprocessor machines. However, the reordering effects can easily be seen on a single processor. It is not possible, for example, for the compiler to move your code before an acquire or after a release. When we say that acquires and releases act on caches, we are using shorthand for a number of possible effects.

The new memory model semantics create a partial ordering on memory operations (read field, write field, lock, unlock) and other thread operations (start and join), where some actions are said to happen before other operations. When one action happens before another, the first is guaranteed to be ordered before, and visible to, the second. The rules of this ordering are as follows (a small code sketch of the start()/join() rules appears after the list):

  • Each action in a thread happens before every action in that thread that comes later in the program's order.
  • An unlock on a monitor happens before every subsequent lock on that same monitor.
  • A write to a volatile field happens before every subsequent read of that same volatile.
  • A call to start() on a thread happens before any actions in the started thread.
  • All actions in a thread happen before any other thread successfully returns from a join() on that thread.
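
The start() and join() rules can be seen in a small sketch (a hypothetical class, not from the original FAQ): the write before start() is visible to the new thread, and the thread's writes are visible to the caller once join() returns.

// Sketch of the start()/join() happens-before rules.
class StartJoinExample {
  static int shared = 0;

  public static void main(String[] args) throws InterruptedException {
    shared = 1;                 // happens before the started thread's actions
    Thread t = new Thread(() -> {
      int seen = shared;        // guaranteed to see 1
      shared = seen + 1;        // happens before main's read after join()
    });
    t.start();
    t.join();
    System.out.println(shared); // guaranteed to print 2
  }
}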

This means that any memory operations which were visible to a thread before exiting a synchronized block are visible to any thread after it enters a synchronized block protected by the same monitor, since all the memory operations happen before the release, and the release happens before the acquire.

Another implication is that the following pattern, which some people use to force a memory barrier, doesn't work:

synchronized (new Object()) {}

This is actually a no-op, and your compiler can remove it entirely, because the compiler knows that no other thread will synchronize on the same monitor. You have to set up a happens-before relationship for one thread to see the results of another.

Important note: it is important for both threads to synchronize on the same monitor in order to set up the happens-before relationship properly. It is not the case that everything visible to thread A when it synchronizes on object X becomes visible to thread B after it synchronizes on object Y. The release and acquire have to "match" (i.e., be performed on the same monitor) to have the right semantics. Otherwise, the code has a data race.

8. How can final fields appear to change their values?

One of the best examples of how final fields' values can be seen to change involves one particular implementation of the String class.

A String can be implemented as an object with three fields: a character array, an offset into that array, and a length. The rationale for implementing String this way, instead of having only the character array, is that it lets multiple String and StringBuffer objects share the same character array and avoid additional object allocation and copying. So, for example, the method String.substring() can be implemented by creating a new string which shares the same character array with the original String and merely differs in the length and offset fields. For a String, these fields are all final fields.

String s1 = "/usr/tmp";
String s2 = s1.substring(4); 

The string s2 will have an offset of 4 and a length of 4. But under the old model, it was possible for another thread to see the offset as having the default value of 0, and then later see the correct value of 4; it would appear as if the string "/usr" had changed to "/tmp".

The original Java Memory Model allowed this behavior; several JVMs have exhibited it. The new Java Memory Model makes it illegal.

9. How do final fields work under the new JMM?

The values for an object's final fields are set in its constructor. Assuming the object is constructed "correctly", once the object is constructed, the values assigned to the final fields in the constructor will be visible to all other threads without synchronization. In addition, the visible values for any other object or array referenced by those final fields will be at least as up to date as the final fields.

What does it mean for an object to be properly constructed? It simply means that no reference to the object being constructed is allowed to "escape" during construction. (See Safe Construction Techniques for examples.) In other words, do not place a reference to the object being constructed anywhere where another thread might be able to see it: do not assign it to a static field, do not register it as a listener with any other object, and so on. These tasks should be done after the constructor completes, not in the constructor.

class FinalFieldExample {
  final int x;
  int y;
  static FinalFieldExample f;
  public FinalFieldExample() {
    x = 3;
    y = 4;
  }

  static void writer() {
    f = new FinalFieldExample();
  }

  static void reader() {
    if (f != null) {
      int i = f.x; // guaranteed to see 3, because x is final
      int j = f.y; // not guaranteed: may see 0 or 4
    }
  }
}

The class above is an example of how final fields should be used. A thread executing reader is guaranteed to see the value 3 for f.x, because it is final. It is not guaranteed to see the value 4 for f.y, because it is not final. If FinalFieldExample's constructor looked like this:

public FinalFieldExample() { // bad!
  x = 3;
  y = 4;
  // bad construction - allowing this to escape
  global.obj = this;
}

then threads that read the reference to this from global.obj are not guaranteed to see the value 3 for x.

The ability to see the correctly constructed value for the field is nice, but if the field itself is a reference, then you also want your code to see up-to-date values for the object (or array) to which it points. If your field is a final field, this is also guaranteed. So, you can have a final pointer to an array and not have to worry about other threads seeing the correct value for the array reference but incorrect values for the contents of the array. Again, by "correct" here, we mean "up to date as of the end of the object's constructor", not "the latest value available".
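
For instance, here is a small hypothetical sketch of that guarantee: any thread that sees a properly constructed ArrayHolder also sees the array contents that were written before the constructor finished.

// Sketch: a final reference to an array. Readers that see a properly
// constructed ArrayHolder see the array contents as of the end of the
// constructor, without synchronization. Writes to the array made after
// construction get no such guarantee.
class ArrayHolder {
  final int[] values;

  ArrayHolder() {
    values = new int[] { 1, 2, 3 }; // contents set before the constructor ends
  }
}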

Now, having said all of this, if, after a thread constructs an immutable object (that is, an object that contains only final fields), you want to ensure that it is seen correctly by all other threads, you still typically need to use synchronization. There is no other way to ensure, for example, that the reference to the immutable object will be seen by the second thread. The guarantees the program gets from final fields should be carefully tempered with a deep and careful understanding of how concurrency is managed in your code.
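
As one hedged sketch of such publication (hypothetical names; a volatile field is one option among several, alongside synchronized blocks and static initializers), the volatile write/read below creates the happens-before edge that makes the reference itself visible:

// Sketch: publishing an immutable object so a second thread is
// guaranteed to eventually see the reference, not just the final fields.
final class ImmutablePoint {
  final int x, y;
  ImmutablePoint(int x, int y) { this.x = x; this.y = y; }
}

class Publisher {
  static volatile ImmutablePoint point; // volatile publication of the reference

  static void create() { point = new ImmutablePoint(1, 2); }
  static ImmutablePoint get() { return point; } // may be null until published
}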

There is no defined behavior if you use JNI to change final fields.

10. What does volatile do?

Volatile fields are special fields which are used for communicating state between threads. Each read of a volatile will see the last write to that volatile by any thread; in effect, they are designated by the programmer as fields for which it is never acceptable to see a "stale" value as a result of caching or reordering. The compiler and runtime are prohibited from allocating them in registers. They must also ensure that after they are written, they are flushed out of the cache to main memory, so they can immediately become visible to other threads. Similarly, before a volatile field is read, the cache must be invalidated so that the value in main memory, not the local processor cache, is the one seen. There are also additional restrictions on reordering accesses to volatile variables.

Under the old memory model, accesses to volatile variables could not be reordered with each other, but they could be reordered with nonvolatile variable accesses. This undermined the usefulness of volatile fields as a means of signaling conditions from one thread to another.

Under the new memory model, it is still true that volatile variables cannot be reordered with each other. The difference is that it is now no longer so easy to reorder normal field accesses around them. Writing to a volatile field has the same memory effect as a monitor release, and reading from a volatile field has the same memory effect as a monitor acquire. In effect, because the new memory model places stricter constraints on reordering of volatile field accesses with other field accesses, volatile or not, anything that was visible to thread A when it writes to volatile field f becomes visible to thread B when it reads f.

Here is a simple example of how volatile fields can be used:

class VolatileExample {
  int x = 0;
  volatile boolean v = false;
  public void writer() {
    x = 42;
    v = true;
  }

  public void reader() {
    if (v == true) {
      // uses x - guaranteed to see 42.
    }
  }
}

Assume that one thread is calling writer, and another is calling reader. The write to v in writer releases the write to x to memory, and the read of v acquires that value from memory. Thus, if the reader sees the value true for v, it is also guaranteed to see the write of 42 that happened before it. This would not have been true under the old memory model. If v were not volatile, then the compiler could reorder the writes in writer, and reader's read of x might see 0.

Effectively, the semantics of volatile have been strengthened substantially, almost to the level of synchronization. Each read or write of a volatile field acts like "half" a synchronization, for purposes of visibility.

Important note: it is important for both threads to access the same volatile variable in order to properly set up the happens-before relationship. It is not the case that everything visible to thread A when it writes volatile field f becomes visible to thread B after it reads volatile field g. The release and acquire have to "match" (i.e., be performed on the same volatile field) to have the right semantics.

11. Does the new memory model fix the "double-checked locking" problem?

The (infamous) double-checked locking idiom (also called the multithreaded singleton pattern) is a trick designed to support lazy initialization while avoiding the overhead of synchronization. In very early JVMs, synchronization was slow, and developers were eager to remove it -- perhaps too eager. The double-checked locking idiom looks like this:

// double-checked-locking - don't do this!

private static Something instance = null;

public Something getInstance() {
  if (instance == null) {
    synchronized (this) {
      if (instance == null)
        instance = new Something();
    }
  }
  return instance;
}

This looks awfully clever -- the synchronization is avoided on the common code path. There's only one problem with it: it doesn't work. Why not? The most obvious reason is that the writes which initialize instance and the write to the instance field can be reordered by the compiler or the cache, which would have the effect of returning what appears to be a partially constructed Something; we would read an uninitialized object. There are lots of other reasons why this is wrong, and why algorithmic corrections to it are wrong. There was no way to fix it under the old Java memory model. More in-depth information can be found in "Double-checked locking: Clever, but broken" and "The 'Double-Checked Locking is broken' declaration".

Many people assumed that the use of the volatile keyword would eliminate the problems that arise when trying to use the double-checked locking pattern. In JVMs prior to 1.5, volatile would not ensure that it worked (your mileage may vary). Under the new memory model, making the instance field volatile will "fix" the problems with double-checked locking, because there will then be a happens-before relationship between the initialization of the Something by the constructing thread and the return of its value by the thread that reads it.
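
A sketch of that volatile variant follows (Java 5 and later; Something is the placeholder type from the snippet above, and the static form of the method here is illustrative):

// Double-checked locking made legal under the new memory model by
// declaring the field volatile (still rarely worth it -- see below).
class SomethingHolder {
  private static volatile Something instance = null;

  public static Something getInstance() {
    if (instance == null) {               // first check, without the lock
      synchronized (SomethingHolder.class) {
        if (instance == null)             // second check, with the lock
          instance = new Something();
      }
    }
    return instance;
  }
}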

However, for fans of double-checked locking (and we really hope there are none left), the news is still not good. The whole point of double-checked locking was to avoid the performance overhead of synchronization. Not only has brief synchronization gotten a lot less expensive since the Java 1.0 days, but under the new memory model, the performance cost of using volatile goes up, almost to the level of the cost of synchronization. So there is still no good reason to use double-checked locking. (Redacted -- volatiles are cheap on most platforms.)

Instead, use the Initialization On Demand Holder idiom, which is thread-safe and a lot easier to understand:

private static class LazySomethingHolder {
  public static Something something = new Something();
}

public static Something getInstance() {
  return LazySomethingHolder.something;
}

This code is guaranteed to be correct because of the initialization guarantees for static fields: if a field is set in a static initializer, it is guaranteed to be made visible, correctly, to any thread that accesses that class.

12. What if I'm writing a VM?

See: http://gee.cs.oswego.edu/dl/jmm/cookbook.html

13. Why should I care?

Why should you care? Concurrency bugs are very difficult to debug. They often don't appear in testing, waiting instead until your program is run under heavy load, and they are hard to reproduce and trap. You are much better off spending the extra effort ahead of time to ensure that your program is properly synchronized; while this is not easy, it's a lot easier than trying to debug a badly synchronized application.

 

