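Which constants does HashMap define, and what is each one designed for? This article walks through the design decisions made by HashMap's authors: Doug Lea, Josh Bloch, Arthur van Hoff, and Neal Gafter. (Everything below is based on JDK 1.8.)
Constant design
(1) HashMap's default initial capacity is 1 << 4 (that is, 16)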
/**
* The default initial capacity - MUST be a power of two.
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
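The comment on this constant says "MUST be a power of two". Why must the capacity be a power of two?
HashMap is backed by an array of bins, each holding a linked list (or a red-black tree). When an element is added, its bin index is computed as i = (n - 1) & hash. When the table size n is a power of two, this is equivalent to hash % n for non-negative hash values. A bin index is conventionally found by taking a remainder, but a bitwise AND (&) is cheaper than a remainder (%) operation, so the capacity must be a power of two in order to use the faster AND.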
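The equivalence between masking and taking a remainder can be checked with a small sketch. The class and method names here are illustrative and not part of HashMap; `Math.floorMod` is used as the reference because Java's `%` can return a negative result for a negative hash, while the mask never does.

```java
class IndexMaskDemo {
    // Bin index as described in the article; only valid when n is a power of two.
    static int indexByMask(int hash, int n) {
        return (n - 1) & hash;
    }

    // Reference value: the non-negative remainder of hash divided by n.
    static int indexByMod(int hash, int n) {
        return Math.floorMod(hash, n);
    }

    public static void main(String[] args) {
        int n = 16;
        int[] hashes = {0, 5, 16, 17, 12345, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int h : hashes) {
            System.out.println(h + " -> mask=" + indexByMask(h, n)
                    + ", floorMod=" + indexByMod(h, n));
        }
        // With a size that is not a power of two, the mask no longer matches the remainder.
        System.out.println("n=10: mask=" + indexByMask(5, 10)
                + ", floorMod=" + indexByMod(5, 10));
    }
}
```

For n = 16 the two columns agree for every hash, including negative ones. For n = 10 they differ, which is why the power-of-two requirement matters.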
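Why is the default 16 rather than 8 or 32? If it is too small, the table resizes frequently; if it is too large, it wastes memory. 16 is the starting trade-off the JDK makes for us.
(2) HashMap's maximum capacity is 1 << 30, that is, 2 to the 30th power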
/**
* The maximum capacity, used if a higher value is implicitly specified
* by either of the constructors with arguments.
* MUST be a power of two <= 1<<30.
*/
static final int MAXIMUM_CAPACITY = 1 << 30;
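An int occupies 4 bytes of 8 bits each, so it is a 32-bit integer, and it might seem that 1 could be shifted left by 31 bits to give 2 to the 31st power. Why is the maximum not 2 to the 31st power? Because a Java int is signed: the leftmost bit is the sign bit, so the largest positive int is 2^31 - 1 and the largest power of two that fits is 2^30. Consider the following example: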
System.out.println(1 << 30);
System.out.println(1 << 31);
System.out.println(1 << 32);
System.out.println(1 << 33);
Output:
1073741824
-2147483648
1
2
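Shifting by 31 sets the sign bit and yields Integer.MIN_VALUE. For int operands Java uses only the low 5 bits of the shift distance, so 1 << 32 is the same as 1 << 0, and 1 << 33 is the same as 1 << 1. That is why HashMap's maximum capacity is 2 to the 30th power: it is the largest power of two an int can hold.
(3) HashMap's default load factor is 0.75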
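The shift results above can be reproduced with a short sketch. The helper name `effectiveShift` is made up for illustration; it simply applies the 5-bit masking that the Java Language Specification defines for int shifts.

```java
class ShiftDemo {
    // For int operands, only the low 5 bits of the shift distance are used.
    static int effectiveShift(int distance) {
        return distance & 0x1f;
    }

    public static void main(String[] args) {
        // Shifting into bit 31 sets the sign bit.
        System.out.println((1 << 31) == Integer.MIN_VALUE);
        // A distance of 32 wraps to 0, and 33 wraps to 1.
        System.out.println((1 << 32) == (1 << effectiveShift(32)));
        System.out.println((1 << 33) == (1 << effectiveShift(33)));
        // The largest power of two representable as a positive int is 1 << 30.
        System.out.println(Integer.highestOneBit(Integer.MAX_VALUE) == (1 << 30));
        // A long has room for 2^31 as a positive value.
        System.out.println(1L << 31);
    }
}
```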
/**
* The load factor used when none specified in constructor.
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;
The load factor measures how full the hash table is allowed to get before it is resized. Here is how the source explains the load factor:
* <p>As a general rule, the default load factor (.75) offers a good
* tradeoff between time and space costs. Higher values decrease the
* space overhead but increase the lookup cost (reflected in most of
* the operations of the <tt>HashMap</tt> class, including
* <tt>get</tt> and <tt>put</tt>). The expected number of entries in
* the map and its load factor should be taken into account when
* setting its initial capacity, so as to minimize the number of
* rehash operations. If the initial capacity is greater than the
* maximum number of entries divided by the load factor, no rehash
* operations will ever occur.
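In general, the default load factor of 0.75 offers a good balance between time and space costs. Higher values reduce the space overhead but increase the lookup cost (reflected in most HashMap operations, including get and put). When setting the initial capacity, take the expected number of entries and the load factor into account so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash will ever occur.
(4) HashMap's default treeify threshold (linked list converted to red-black tree) is 8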
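The sizing rule in the last sentence can be turned into a small helper. This is a sketch under stated assumptions: `capacityFor` is a hypothetical name, and it returns a value strictly greater than expectedEntries / loadFactor, as the comment requires. HashMap then rounds the requested capacity up to the next power of two internally.

```java
import java.util.HashMap;
import java.util.Map;

class CapacityDemo {
    // Hypothetical helper: an initial capacity strictly greater than
    // expectedEntries / loadFactor, so no rehash is needed while filling the map.
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    public static void main(String[] args) {
        int expected = 1000;
        int capacity = capacityFor(expected, 0.75f);
        System.out.println("requested capacity: " + capacity);

        // Pre-sized map: inserting the expected number of entries should not trigger a resize.
        Map<Integer, String> map = new HashMap<>(capacity);
        for (int i = 0; i < expected; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println("entries stored: " + map.size());
    }
}
```

For a small, known number of entries the saving is negligible; pre-sizing mostly pays off for large maps that would otherwise be resized several times while being filled.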
/**
* The bin count threshold for using a tree rather than list for a
* bin. Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
*/
static final int TREEIFY_THRESHOLD = 8;
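Since Java 8, HashMap's underlying data structure includes red-black trees. When an element is added to a bin whose linked list already holds at least 8 nodes, the list is converted to a red-black tree, provided the table is large enough (see MIN_TREEIFY_CAPACITY below). Why is the threshold 8? Look at this comment in the HashMap source: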
* Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million
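Ideally, with random hash codes and the default load factor of 0.75, the number of nodes per bin follows a Poisson distribution with a parameter of about 0.5, although with a large variance because of resizing granularity. The table shows that the probability of a bin holding 8 nodes is vanishingly small (about 6 in 100 million), so 8 was chosen as the threshold for converting a list into a red-black tree.
(5) HashMap's untreeify threshold (tree converted back to linked list) is 6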
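The table in the source comment can be recomputed directly from the formula it quotes, exp(-0.5) * pow(0.5, k) / factorial(k). The class and method names below are illustrative only.

```java
class PoissonDemo {
    // Poisson probability mass: P(k) = exp(-lambda) * lambda^k / k!
    static double poisson(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        // Prints the probability for list sizes 0 through 8 with lambda = 0.5.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, poisson(0.5, k));
        }
    }
}
```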
/**
* The bin count threshold for untreeifying a (split) bin during a
* resize operation. Should be less than TREEIFY_THRESHOLD, and at
* most 6 to mesh with shrinkage detection under removal.
*/
static final int UNTREEIFY_THRESHOLD = 6;
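The treeify threshold is 8, so why is the untreeify threshold 6 rather than 7? The gap prevents frequent conversion between list and tree. If it were 7, a HashMap that keeps inserting and removing elements, with a bin's node count hovering around 8, would repeatedly convert tree to list and list to tree, which is very inefficient.
(6) HashMap's minimum treeify capacity is 64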
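The effect of the gap can be illustrated with a toy model of a single bin. This is not HashMap's actual code path: the model simply converts to a tree when the size reaches the treeify threshold and back to a list when it drops to the untreeify threshold, and `countConversions` is a made-up name.

```java
class HysteresisDemo {
    // Toy model: counts list<->tree conversions for a sequence of bin sizes.
    static int countConversions(int treeifyAt, int untreeifyAt, int[] sizes) {
        boolean isTree = false;
        int conversions = 0;
        for (int size : sizes) {
            if (!isTree && size >= treeifyAt) {
                isTree = true;       // list -> tree
                conversions++;
            } else if (isTree && size <= untreeifyAt) {
                isTree = false;      // tree -> list
                conversions++;
            }
        }
        return conversions;
    }

    public static void main(String[] args) {
        // A bin whose size keeps bouncing between 7 and 8.
        int[] sizes = {7, 8, 7, 8, 7, 8, 7, 8};
        System.out.println("untreeify at 7: " + countConversions(8, 7, sizes));
        System.out.println("untreeify at 6: " + countConversions(8, 6, sizes));
    }
}
```

In this model, a threshold of 7 converts on every size change, while a threshold of 6 converts once and then stays a tree.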
/**
* The smallest table capacity for which bins may be treeified.
* (Otherwise the table is resized if too many nodes in a bin.)
* Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
* between resizing and treeification thresholds.
*/
static final int MIN_TREEIFY_CAPACITY = 64;
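Why 64? When the capacity is below 64, hash collisions are more likely simply because the table is small, so long lists are somewhat more likely to appear. For long lists that arise this way, resizing the table should be preferred over unnecessary treeification.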
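The interaction between the two thresholds can be sketched as a simple decision function. This is a simplified illustration, not the JDK implementation: the `Action` enum and `onOverfullBin` are hypothetical names, and only the two constant values are taken from the source above.

```java
class TreeifyDecisionDemo {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { NONE, RESIZE, TREEIFY }

    // Decide what to do when an element is added to a bin that already holds binNodes nodes.
    static Action onOverfullBin(int binNodes, int tableCapacity) {
        if (binNodes < TREEIFY_THRESHOLD) {
            return Action.NONE;              // bin is still short: keep the list
        }
        // A small table is resized instead, which spreads the nodes over more bins.
        return tableCapacity < MIN_TREEIFY_CAPACITY ? Action.RESIZE : Action.TREEIFY;
    }

    public static void main(String[] args) {
        System.out.println(onOverfullBin(8, 16));
        System.out.println(onOverfullBin(8, 64));
        System.out.println(onOverfullBin(3, 64));
    }
}
```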