Translation: How the JDK's default hashCode() is implemented [How does the default hashCode() work?]

Original article: How does the default hashCode() work?
Translation:

A seemingly trivial problem

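At work last week I submitted a trivial change to a class: an implementation of toString() to make the logs more useful. To my surprise, the change caused coverage to drop by about 5%. I knew all the new code was covered by existing unit tests, yet coverage had dropped, so what had gone wrong?
Comparing against the previous coverage report, a sharp-eyed colleague noticed that before the change the unit tests covered our hashCode() implementation, but after it they no longer did. Of course, that makes sense: the default toString() calls hashCode(), and the overridden one does not.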

public String toString() {
    return getClass().getName() + "@" + Integer.toHexString(hashCode());
}

Once toString() was overridden, our custom hashCode() was no longer being called, hence the drop in coverage. Everyone knows how the default toString() works, but...
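The effect described above can be reproduced with a small sketch. The Point class below is hypothetical (it is not from the original post); it overrides hashCode() but inherits Object.toString(), so printing it goes through our hashCode().

```java
public class DefaultToStringDemo {
    // Hypothetical class for illustration: overrides hashCode() and equals(),
    // but inherits the default Object.toString().
    static class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public int hashCode() {
            return 31 * x + y;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) {
                return false;
            }
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        // Object.toString() appends the hex form of p.hashCode(),
        // so this call exercises the overridden hashCode().
        System.out.println(p);
    }
}
```

As soon as Point gets its own toString(), nothing in this program calls hashCode() any more, which is exactly the kind of change that shows up as a coverage drop.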

How is the default hashCode() implemented?

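The default hashCode() returns the identity hash code. Note that this is not the same thing as the value returned by an overridden hashCode(): if a class overrides hashCode(), we can still obtain its identity hash code with System.identityHashCode(o) (it feels a bit like the object's ID-card number).
It is commonly believed that the identity hash code is the integer form of the object's memory address (but what happens when the object is moved during memory compaction?). The Java API documentation, however, says: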

... is typically implemented by converting the internal address of the object into an integer, 
but this implementation technique is not required by the Java™ programming language.
In other words, deriving the value from the memory address is a typical technique, but nothing requires an implementation to use it.

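Given that the JVM relocates objects (for example, during garbage collection because of promotion or compaction), once an object's identity hash code has been computed it has to be stored somewhere so that it survives the move.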
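A minimal sketch of both points, using a hypothetical Key class: the identity hash is independent of an overridden hashCode(), and it stays the same across repeated calls. System.gc() is only a hint to the JVM, so the sketch does not prove that the object was actually moved; it only shows that the value is stable from the program's point of view.

```java
public class IdentityHashDemo {
    // Hypothetical class whose hashCode() is deliberately constant.
    static class Key {
        @Override
        public int hashCode() {
            return 42;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key;
        }
    }

    public static void main(String[] args) {
        Key k = new Key();
        int before = System.identityHashCode(k);
        System.gc(); // a hint only; the JVM may or may not relocate k
        int after = System.identityHashCode(k);

        System.out.println("overridden hashCode: " + k.hashCode());
        System.out.println("identity hash stable: " + (before == after));

        // For a class that does not override hashCode(), both calls agree.
        Object plain = new Object();
        System.out.println(plain.hashCode() == System.identityHashCode(plain));
    }
}
```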

The default hashCode() implementation

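Different JVMs may implement the default hashCode() differently; this article only looks at the OpenJDK source. hashCode() is a native method, and its entry point is in src/share/vm/prims/jvm.h and src/share/vm/prims/jvm.cpp: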

JVM_ENTRY(jint, JVM_IHashCode(JNIEnv* env, jobject handle))
  JVMWrapper("JVM_IHashCode");
  // as implemented in the classic virtual machine; return 0 if object is NULL
  return handle == NULL ? 0 : ObjectSynchronizer::FastHashCode (THREAD, JNIHandles::resolve_non_null(handle)) ;
JVM_END

Next comes ObjectSynchronizer::FastHashCode(), in src/share/vm/runtime/synchronizer.cpp. One might naively expect the method to be as simple as this:

if (obj.hash() == 0) {
    obj.set_hash(generate_new_hash());
}
return obj.hash();

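In reality it runs to more than a hundred lines... The file name also hints that synchronization is involved, that is, the implementation of synchronized: yes, the object's intrinsic lock. We will come back to that later; first let's see how the identity hash code is generated: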

static inline intptr_t get_next_hash(Thread* self, oop obj) {
  intptr_t value = 0;
  if (hashCode == 0) {
    // This form uses global Park-Miller RNG.
    // On MP system we'll have lots of RW access to a global, so the
    // mechanism induces lots of coherency traffic.
    value = os::random();
  } else if (hashCode == 1) {
    // This variation has the property of being stable (idempotent)
    // between STW operations.  This can be useful in some of the 1-0
    // synchronization schemes.
    intptr_t addr_bits = cast_from_oop<intptr_t>(obj) >> 3;
    value = addr_bits ^ (addr_bits >> 5) ^ GVars.stw_random;
  } else if (hashCode == 2) {
    value = 1;            // for sensitivity testing
  } else if (hashCode == 3) {
    value = ++GVars.hc_sequence;
  } else if (hashCode == 4) {
    value = cast_from_oop<intptr_t>(obj);
  } else {
    // Marsaglia's xor-shift scheme with thread-specific state
    // This is probably the best overall implementation -- we'll
    // likely make this the default in future releases.
    unsigned t = self->_hashStateX;
    t ^= (t << 11);
    self->_hashStateX = self->_hashStateY;
    self->_hashStateY = self->_hashStateZ;
    self->_hashStateZ = self->_hashStateW;
    unsigned v = self->_hashStateW;
    v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
    self->_hashStateW = v;
    value = v;
  }

  value &= markWord::hash_mask;
  if (value == 0) value = 0xBAD;
  assert(value != markWord::no_hash, "invariant");
  return value;
}

0. A randomly generated number.
1. A function of the memory address of the object.
2. A hardcoded 1 (used for sensitivity testing).
3. A sequence (an incrementing counter).
4. The memory address of the object, cast to int.
5. Thread state combined with xorshift (https://en.wikipedia.org/wiki/Xorshift).
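Option 5 is the least obvious of the six, so here is a rough Java port of that branch of get_next_hash(). It is a sketch, not HotSpot code: the class name is made up, the state lives in an ordinary object instead of the Thread structure, the seed constants are illustrative assumptions, and the 31-bit mask assumes the 64-bit header layout. Java's >>> stands in for the right shift on C's unsigned type.

```java
public class XorShiftHashSketch {
    // 31 hash bits, assuming the 64-bit mark word layout.
    static final int HASH_MASK = 0x7FFFFFFF;

    // Generator state; in HotSpot these are per-thread fields
    // (_hashStateX .. _hashStateW). Seed values here are assumptions.
    private int x;
    private int y = 842502087;
    private int z = 0x8767;
    private int w = 273326509;

    XorShiftHashSketch(int seed) {
        this.x = seed;
    }

    int nextHash() {
        int t = x;
        t ^= (t << 11);
        // Rotate the state: X <- Y <- Z <- W
        x = y;
        y = z;
        z = w;
        int v = w;
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8));
        w = v;

        int value = v & HASH_MASK;
        // 0 means "no hash yet" in the header, so it is never returned.
        return value == 0 ? 0xBAD : value;
    }

    public static void main(String[] args) {
        XorShiftHashSketch gen = new XorShiftHashSketch(12345);
        for (int i = 0; i < 3; i++) {
            System.out.println(Integer.toHexString(gen.nextHash()));
        }
    }
}
```

Because the state is thread-local, threads do not contend on a shared global the way option 0 does, which is the drawback the comment on that branch points out.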

According to src/share/vm/runtime/globals.hpp, the default in product builds is 5, i.e. xorshift, which is essentially another random-number scheme:

  product(intx, hashCode, 5,                                                \
          "(Unstable) select hashCode generation algorithm")                \

OpenJDK 8 and 9 use option 5; OpenJDK 7 and 6 use the first option (option 0, the random-number generator).
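To get a feel for the different options, a small sampler like the one below can be run under different settings of the hashCode flag. The class and method names are made up for illustration, and no particular output is promised: the flag is explicitly marked unstable, and whether and how it can be set depends on the JDK version.

```java
import java.util.ArrayList;
import java.util.List;

public class HashCodeSampler {
    // Collects the identity hash codes of n freshly allocated objects.
    static List<Integer> sample(int n) {
        List<Integer> hashes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            hashes.add(System.identityHashCode(new Object()));
        }
        return hashes;
    }

    public static void main(String[] args) {
        for (int h : sample(5)) {
            System.out.println(Integer.toHexString(h));
        }
    }
}
```

On a HotSpot build that accepts the flag, something like `java -XX:hashCode=3 HashCodeSampler` should select the sequence option and `-XX:hashCode=2` the constant one; some versions may refuse the flag unless experimental options are unlocked first.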

Object headers and synchronization

In OpenJDK, the mark word is described as follows (see the source for details):

// The markOop describes the header of an object.
//
// Note that the mark is not a real oop but just a word.
// It is placed in the oop hierarchy for historical reasons.
//
// Bit-format of an object header (most significant first, big endian layout below):
//
//  32 bits:
//  --------
//             hash:25 ------------>| age:4    biased_lock:1 lock:2 (normal object)
//             JavaThread*:23 epoch:2 age:4    biased_lock:1 lock:2 (biased object)
//             size:32 ------------------------------------------>| (CMS free block)
//             PromotedObject*:29 ---------->| promo_bits:3 ----->| (CMS promoted object)
//
//  64 bits:
//  --------
//  unused:25 hash:31 -->| unused:1   age:4    biased_lock:1 lock:2 (normal object)
//  JavaThread*:54 epoch:2 unused:1   age:4    biased_lock:1 lock:2 (biased object)
//  PromotedObject*:61 --------------------->| promo_bits:3 ----->| (CMS promoted object)
//  size:64 ----------------------------------------------------->| (CMS free block)
//
//  unused:25 hash:31 -->| cms_free:1 age:4    biased_lock:1 lock:2 (COOPs && normal object)
//  JavaThread*:54 epoch:2 cms_free:1 age:4    biased_lock:1 lock:2 (COOPs && biased object)
//  narrowOop:32 unused:24 cms_free:1 unused:4 promo_bits:3 ----->| (COOPs && CMS promoted object)
//  unused:21 size:35 -->| cms_free:1 unused:7 ------------------>| (COOPs && CMS free block)

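The mark word format differs slightly between 32 and 64 bits. The latter has two variants, depending on whether compressed object pointers (COOPs) are enabled; both Oracle JDK and OpenJDK 8 enable them by default. If an object is in the biased state, the bits that would hold the hash are taken up by a pointer to the biasing thread (23 bits in the 32-bit layout, 54 in the 64-bit one), so where does the identity hash code come from?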
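To make the 64-bit "normal object" row concrete, the sketch below packs and unpacks a mark word as a plain long. The class and helper names are hypothetical; the field widths are taken from the layout comment (lock:2, biased_lock:1, age:4, one unused or cms_free bit, hash:31), which puts the hash at bit 8.

```java
public class MarkWordSketch {
    static final int LOCK_BITS = 2;
    static final int BIASED_BITS = 1;
    static final int AGE_BITS = 4;
    static final int UNUSED_BITS = 1; // "unused:1" or "cms_free:1" in the comment
    static final int HASH_BITS = 31;

    static final int AGE_SHIFT = LOCK_BITS + BIASED_BITS;             // 3
    static final int HASH_SHIFT = AGE_SHIFT + AGE_BITS + UNUSED_BITS; // 8
    static final long HASH_MASK = (1L << HASH_BITS) - 1;

    // Builds a "normal object" mark word from its fields.
    static long pack(int hash, int age, boolean biased, int lock) {
        return ((hash & HASH_MASK) << HASH_SHIFT)
                | ((long) (age & 0xF) << AGE_SHIFT)
                | ((biased ? 1L : 0L) << LOCK_BITS)
                | (lock & 0x3);
    }

    static int hash(long mark) {
        return (int) ((mark >>> HASH_SHIFT) & HASH_MASK);
    }

    static int age(long mark) {
        return (int) ((mark >>> AGE_SHIFT) & 0xF);
    }

    static boolean biased(long mark) {
        return ((mark >>> LOCK_BITS) & 1L) == 1L;
    }

    static int lock(long mark) {
        return (int) (mark & 0x3);
    }

    public static void main(String[] args) {
        long mark = pack(0x12345678, 5, false, 1);
        System.out.println(Long.toHexString(mark));
        System.out.println(Integer.toHexString(hash(mark)));
    }
}
```

In the biased row the same high bits hold the JavaThread pointer and epoch instead, so a single word cannot carry both a bias and a hash at once.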

Biased locking

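An object ends up in the biased state because of biased locking. Starting with HotSpot 6, the JVM tries to reduce the cost of locking an object. Lock operations are expensive because their implementation usually relies on atomic CPU instructions (CAS) to handle lock and unlock requests from different threads safely. Profiling shows, however, that in most applications the majority of objects are only ever locked by a single thread, so executing those atomic instructions is wasted effort (a CAS is already fast, far faster than a context switch, and it still counts as waste...). To avoid this waste, a JVM with biased locking lets a thread bias an object towards itself. Once an object is biased, the lucky thread can lock and unlock it without executing even a CAS. As long as multiple threads do not compete for the same object, biased locking performs very well. Back to FastHashCode: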

intptr_t ObjectSynchronizer::FastHashCode (Thread * Self, oop obj) {
  if (UseBiasedLocking) {
    if (obj->mark()->has_bias_pattern()) {
      ...
      BiasedLocking::revoke_and_rebias(hobj, false, JavaThread::current());
      ...
      assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");
    }
  }

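Generating the identity hash code revokes any existing bias and also disables biasing for this object (the false argument means "do not attempt to rebias"). A few lines further down, this is indeed asserted as an invariant: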

  // object should remain ineligible for biased locking
  assert (!mark->has_bias_pattern(), "invariant") ;

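This means that requesting an object's identity hash code disables biased locking for that object; any later attempt to lock it has to use the expensive atomic instructions, even if only one thread ever requests the lock.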
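The behaviour can be pictured with a toy model. Everything below is hypothetical and greatly simplified: it is not how HotSpot represents headers, the counter-based hash is just a stand-in, and real revocation involves far more machinery. It only captures the rule that one slot holds either a biasing thread or a hash, and that asking for the hash permanently gives up the bias.

```java
public class ToyHeader {
    enum State { BIASABLE, BIASED, HASHED }

    private static int counter = 0; // stand-in for a real hash generator

    private State state = State.BIASABLE;
    private long payload; // thread id when BIASED, hash when HASHED

    // Returns true if the given thread holds (or acquires) the bias.
    boolean tryBias(long threadId) {
        if (state == State.HASHED) {
            return false; // no longer eligible for biasing
        }
        if (state == State.BIASABLE) {
            state = State.BIASED;
            payload = threadId;
            return true;
        }
        return payload == threadId;
    }

    // Revokes any bias on first use, then always returns the same value.
    int identityHash() {
        if (state != State.HASHED) {
            state = State.HASHED;
            payload = ++counter;
        }
        return (int) payload;
    }

    State state() {
        return state;
    }

    public static void main(String[] args) {
        ToyHeader header = new ToyHeader();
        System.out.println(header.tryBias(7));   // bias towards "thread 7"
        System.out.println(header.identityHash());
        System.out.println(header.tryBias(7));   // bias is gone for good
    }
}
```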

Why do biased locking and the identity hash code conflict?

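To answer this question, we need to understand where the mark word can live, depending on the object's lock state. The example diagram on the HotSpot Wiki shows the following transitions: (the rest is not translated here; straight to the key points 😅)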
