Interview bonus points: the design rationale behind the constants in the HashMap source code

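Preface
At a recent weekly tech-sharing meeting, a colleague walked through the HashMap source code and touched on why several of its constants were chosen. This article discusses the reasoning behind those constants; I hope you find it useful.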

Why is HashMap's default initial capacity 1 << 4 (16)?

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

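Why is the default initial capacity 16? There are two questions here: why must it be a power of two, and why 16 rather than 8 or 32?

Why is the default initial capacity defined as a power of two?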

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // ...

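HashMap's underlying data structure is an array of bins, where each bin is either a linked list or a red-black tree. The method above shows that the array index is computed as i = (n - 1) & hash. When the capacity n is a power of two, (n - 1) & hash is equivalent to hash % n for a non-negative hash. Index lookup is normally done with the remainder operation, so why not use it here?

Because the bitwise AND (&) is cheaper than the remainder (%) operation.
Remainder: a % b amounts to computing a - (a / b) * b, which involves a division.
AND: a single instruction does the job.
So the default initial capacity is defined as a power of two in order to use the more efficient AND operation.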
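
To make the equivalence concrete, here is a minimal sketch that compares the mask with a remainder. The class and method names are made up for this illustration and are not part of the JDK; Math.floorMod is used instead of % so that negative hash values can be compared as well.

```java
public class IndexMaskDemo {
    // Index as computed in putVal: (n - 1) & hash
    static int indexByMask(int hash, int n) {
        return (n - 1) & hash;
    }

    // Index by remainder; floorMod always returns a value in [0, n)
    static int indexByMod(int hash, int n) {
        return Math.floorMod(hash, n);
    }

    public static void main(String[] args) {
        int[] hashes = {0, 4, 17, 12345, -7, Integer.MAX_VALUE, Integer.MIN_VALUE};

        // n = 16 is a power of two: both methods agree for every hash
        for (int h : hashes) {
            System.out.println(h + " -> mask " + indexByMask(h, 16)
                    + ", mod " + indexByMod(h, 16));
        }

        // n = 10 is not a power of two: the mask 9 (binary 1001) skips buckets
        System.out.println("n = 10, hash = 4: mask " + indexByMask(4, 10)
                + ", mod " + indexByMod(4, 10));
    }
}
```

With n = 10 the mask can only ever produce the indices 0, 1, 8 and 9, so most buckets would never be used. This is why the trick depends on the capacity being a power of two.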

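Why is the default initial capacity 16 rather than 8 or 32?

If it is too small, say 4 or 8, the table has to be resized frequently. If it is too large, say 32, 64 or more, it wastes memory.

As an analogy, suppose you run a café for couples. Normally 7 or 8 couples come in, and at peak times about 10. Setting up 8 tables is about right, and you can add more if more people show up. With only 4 tables you would often run out of seats and have to add tables; with 10 or more you would simply take up floor space.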
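
The trade-off can be sketched with a small simulation. This is a simplified model rather than JDK code: it assumes the table doubles whenever the entry count exceeds capacity * 0.75, and it ignores the lazy allocation of the table on the first put. The name countResizes is hypothetical.

```java
public class ResizeCountDemo {
    // Simplified model (an assumption, not the JDK implementation):
    // the table doubles whenever size exceeds capacity * 0.75.
    static int countResizes(int initialCapacity, int entries) {
        final float loadFactor = 0.75f;
        int capacity = initialCapacity;
        int threshold = (int) (capacity * loadFactor);
        int resizes = 0;
        for (int size = 1; size <= entries; size++) {
            if (size > threshold) {
                capacity <<= 1;                       // double the table
                threshold = (int) (capacity * loadFactor);
                resizes++;
            }
        }
        return resizes;
    }

    public static void main(String[] args) {
        // 10 entries, matching the peak of 10 couples in the analogy
        for (int capacity : new int[] {4, 8, 16, 32}) {
            System.out.println("initial capacity " + capacity + ": "
                    + countResizes(capacity, 10) + " resize(s)");
        }
    }
}
```

Under this model, 10 entries cost two resizes from capacity 4, one from capacity 8, and none from 16 or 32, while 32 leaves twice as many slots unused as 16.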

Why is the default load factor 0.75?

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;
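The load factor describes how full the hash table is allowed to get, and it is closely tied to resizing. Why not 0.5 or 1?

With 0.5, the table would be resized as soon as it is half full, which means frequent resizing and poor space utilization. With 1, the table would be resized only when the entry count reaches its capacity; space utilization improves, but hash collisions become more likely. The documentation in the source code explains: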

As a general rule, the default load factor (.75) offers a good
tradeoff between time and space costs. Higher values decrease the
space overhead but increase the lookup cost (reflected in most of
the operations of the HashMap class, including
get and put). The expected number of entries in
the map and its load factor should be taken into account when
setting its initial capacity, so as to minimize the number of
rehash operations. If the initial capacity is greater than the
maximum number of entries divided by the load factor, no rehash
operations will ever occur.
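In other words: a higher load factor lowers the space overhead but raises the lookup cost, which affects most HashMap operations, including get and put. When choosing an initial capacity, take the expected number of entries and the load factor into account so that rehashing happens as rarely as possible. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash will ever occur.

In short, a load factor of 0.75 is the outcome of balancing collision probability against space utilization, and it is also an empirical value arrived at by programmers through experiment.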
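
As a rough sketch of how the load factor turns into numbers, the following helpers compute the resize threshold for a given capacity and the smallest initial capacity that satisfies the documentation's no-rehash condition. Both helper names are hypothetical and do not exist in HashMap.

```java
public class LoadFactorDemo {
    // Entry count above which a table of this capacity is resized
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Hypothetical helper: an initial capacity of at least
    // expectedEntries / loadFactor, per the documentation quoted above
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.5f));   // resize after half the table is used
        System.out.println(threshold(16, 0.75f));  // the default
        System.out.println(threshold(16, 1.0f));   // resize only when completely full
        System.out.println(initialCapacityFor(100, 0.75f));
    }
}
```

For 100 expected entries this suggests passing 134 to the HashMap constructor, which then rounds the capacity up to the next power of two.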

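There is a Stack Overflow answer on this question: What is the significance of load factor in HashMap?

That answer sets the probability of a bucket being empty versus non-empty at 0.5 and, through a calculation based on the binomial theorem, arrives at a load factor of ln(2), about 0.693. The value 0.75 may have been chosen because it is an easy-to-understand round number close to 0.693, and because with the default capacity 16 * 0.75 = 12 is an integer.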
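
The ln(2) figure can be checked numerically. The sketch below assumes the model that, with s buckets and n uniformly distributed entries, a given bucket stays empty with probability (1 - 1/s)^n. It solves (1 - 1/s)^n = 0.5 for n and prints n / s. The method name is made up for this example.

```java
public class LoadFactorLn2 {
    // Ratio n / s at which a bucket is empty with probability exactly 0.5,
    // assuming P(empty) = (1 - 1/s)^n
    static double halfEmptyRatio(int s) {
        double n = Math.log(0.5) / Math.log(1.0 - 1.0 / s);
        return n / s;
    }

    public static void main(String[] args) {
        for (int s : new int[] {16, 1024, 1 << 20}) {
            System.out.printf("s = %d: n/s = %.6f%n", s, halfEmptyRatio(s));
        }
        System.out.printf("ln(2)  = %.6f%n", Math.log(2));
    }
}
```

As s grows, the ratio approaches ln(2), which matches the value quoted from the answer.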

Why is the threshold for converting a linked list to a red-black tree 8?

/**
 * The bin count threshold for using a tree rather than list for a
 * bin. Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;
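In JDK 8 and later, HashMap's underlying data structure includes red-black trees. When an element is added to a bin whose linked list already holds at least 8 nodes, the list is converted to a red-black tree. So why is the threshold 8? See this comment in the HashMap source: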
Ideally, under random hashCodes, the frequency of
nodes in bins follows a Poisson distribution
(http://en.wikipedia.org/wiki/Poisson_distribution) with a
parameter of about 0.5 on average for the default resizing
threshold of 0.75, although with a large variance because of
resizing granularity. Ignoring variance, the expected
occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
factorial(k)). The first values are:

0:    0.60653066
1:    0.30326533
2:    0.07581633
3:    0.01263606
4:    0.00157952
5:    0.00015795
6:    0.00001316
7:    0.00000094
8:    0.00000006
more: less than 1 in ten million

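Ideally, with random hash codes and the default load factor of 0.75, the number of nodes per bin follows a Poisson distribution with a parameter of about 0.5, although resizing granularity makes the variance large.

The table shows that the probability of a bin holding 8 nodes is extremely small, about 0.00000006 (roughly 6 in 100 million), which is why 8 was chosen as the threshold for converting a linked list to a red-black tree.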
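
The table can be reproduced directly from the formula in the comment, exp(-0.5) * pow(0.5, k) / factorial(k). The class below is a small illustration written for this article, not JDK code.

```java
public class PoissonBins {
    // Poisson probability mass: e^(-lambda) * lambda^k / k!
    static double poisson(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        // lambda = 0.5 is the parameter named in the HashMap comment
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, poisson(0.5, k));
        }
    }
}
```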

Why is the threshold for converting a tree back to a linked list 6?

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
 static final int UNTREEIFY_THRESHOLD = 6;

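From the previous section we know that the treeify threshold is 8. So why does a tree revert to a linked list at 6 rather than 7? The gap prevents frequent conversion between lists and trees. If the threshold were 7 and a HashMap kept inserting and removing elements so that a bin's node count hovered around 8, the bin would be converted from tree to list and back again over and over, which is very inefficient.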
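
The effect of the gap can be illustrated with a toy model. This is an assumption-laden sketch, not how HashMap is implemented: it treats a bin as becoming a tree once its size reaches the treeify threshold and as reverting to a list once its size drops to the untreeify threshold, whereas the real class applies UNTREEIFY_THRESHOLD while splitting bins during a resize. All names here are hypothetical.

```java
public class HysteresisDemo {
    // Toy model: count list<->tree conversions for a sequence of bin sizes
    static int countConversions(int[] sizes, int treeifyAt, int untreeifyAt) {
        boolean isTree = false;
        int conversions = 0;
        for (int size : sizes) {
            if (!isTree && size >= treeifyAt) {
                isTree = true;        // list -> tree
                conversions++;
            } else if (isTree && size <= untreeifyAt) {
                isTree = false;       // tree -> list
                conversions++;
            }
        }
        return conversions;
    }

    public static void main(String[] args) {
        // A bin whose size keeps flipping between 7 and 8
        int[] sizes = {7, 8, 7, 8, 7, 8, 7, 8};
        System.out.println("untreeify at 7: " + countConversions(sizes, 8, 7));
        System.out.println("untreeify at 6: " + countConversions(sizes, 8, 6));
    }
}
```

In this model the bin converts 7 times with a threshold of 7 but only once with a threshold of 6, because a size of 7 sits inside the gap and triggers nothing.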

Why is the maximum capacity 1 << 30?

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
 static final int MAXIMUM_CAPACITY = 1 << 30;

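Why must the HashMap capacity be a power of two?

As the first section (why the default initial capacity is 1 << 4) showed, the capacity must be a power of two because the AND operation is cheaper than the remainder operation, and the AND only equals the remainder when the capacity is a power of two.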

tab[i = (n - 1) & hash]
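Why not 2 to the power of 31?

An int occupies four bytes of 8 bits each, so it is a 32-bit integer. In principle, then, 1 could be shifted left by 31 bits to give 2 to the power of 31. Why is the maximum not 2^31?

In fact, the leftmost bit of the binary representation is the sign bit, which indicates whether the number is positive or negative. Consider this demo code: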

 System.out.println(1<<30);
 System.out.println(1<<31);
 System.out.println(1<<32);
 System.out.println(1<<33);
 System.out.println(1<<34);

Output:

1073741824
-2147483648
1
2
4

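1 << 31 lands on the sign bit and yields a negative number, so the largest power of two a positive int can hold is 1 << 30. That is why HashMap's maximum capacity is 1 << 30.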
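
The values 1, 2 and 4 in the output come from a separate Java rule: for an int, only the low five bits of the shift distance are used, so 1 << 32 behaves like 1 << 0. The sketch below checks both points; the helper name shiftInt is made up for this example.

```java
public class ShiftDemo {
    // Left-shift the int value 1 by the given distance
    static int shiftInt(int distance) {
        return 1 << distance;
    }

    public static void main(String[] args) {
        // Bit 31 is the sign bit, so this is the most negative int
        System.out.println(shiftInt(31) == Integer.MIN_VALUE);

        // The shift distance is masked with 31 (binary 11111)
        System.out.println(shiftInt(32) == shiftInt(32 & 31));
        System.out.println(shiftInt(34) == shiftInt(2));

        // A long has 64 bits, so 2^31 is representable there
        System.out.println(1L << 31);
    }
}
```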

Why is the minimum table capacity for treeification 64?

/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 */
static final int MIN_TREEIFY_CAPACITY = 64;
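When the capacity is below 64, hash collisions are more likely simply because the table is small, so long linked lists are somewhat more likely to appear. Long lists that arise for this reason are better handled by resizing the table first, which avoids unnecessary treeification.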
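
The interplay between the two constants can be summarized as a small decision function. This is a sketch based on the Javadoc comments quoted in this article, not a copy of the JDK logic; the enum and the method decide are hypothetical, and nodesInBin means the number of nodes in the bin before the new element is added.

```java
public class TreeifyDecisionDemo {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { KEEP_LIST, RESIZE, TREEIFY }

    // What happens when an element is added to a bin, per the quoted comments
    static Action decide(int nodesInBin, int tableCapacity) {
        if (nodesInBin < TREEIFY_THRESHOLD) {
            return Action.KEEP_LIST;   // short list: nothing to do
        }
        if (tableCapacity < MIN_TREEIFY_CAPACITY) {
            return Action.RESIZE;      // small table: spread the nodes out instead
        }
        return Action.TREEIFY;         // large table and long list: use a tree
    }

    public static void main(String[] args) {
        System.out.println(decide(3, 16));
        System.out.println(decide(8, 16));
        System.out.println(decide(8, 64));
    }
}
```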

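Author: Jay_huaxiao

Link: https://juejin.im/post/5d7195f9f265da03a6533942

Source: Juejin (掘金)

Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, credit the source.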
