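While optimizing some code recently, I found that HashMap performs very poorly in certain scenarios. I dug into the source code to find the root cause, and this post records what I found.
HashMap's data structure is an array of buckets, where each bucket holds either a linked list or a red-black tree.
Most of the time a HashMap is created like this: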
HashMap<String, Integer> map = new HashMap<>();
That is, no initialization parameters are specified. So what does the JDK do under the hood? The source describes it as follows:
/**
* Constructs an empty <tt>HashMap</tt> with the default initial capacity
* (16) and the default load factor (0.75).
*/
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
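The comment says that this constructs an empty HashMap with the default initial capacity (16) and the default load factor of 0.75.
Before discussing the load factor, here are several constants defined inside HashMap.
1. Default initial capacity: 16. The capacity must always be a power of two.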
/**
* The default initial capacity - MUST be a power of two.
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
2. Maximum capacity: 2 to the 30th power.
/**
* The maximum capacity, used if a higher value is implicitly specified
* by either of the constructors with arguments.
* MUST be a power of two <= 1<<30.
*/
static final int MAXIMUM_CAPACITY = 1 << 30;
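3. Default load factor: 0.75. The load factor is the ratio of stored entries to current table capacity at which the map resizes, so it determines the resize threshold. For example, with the default capacity of 16 the threshold is 16 * 0.75 = 12, and the map resizes once the number of entries exceeds 12, that is, when the 13th entry is put.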
/**
* The load factor used when none specified in constructor.
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;
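To make the arithmetic concrete, here is a minimal sketch that computes the resize threshold for a few capacities. The class and method names (ThresholdDemo, threshold) are made up for illustration and are not part of HashMap.

```java
class ThresholdDemo {
    // Hypothetical helper: resize threshold = capacity * load factor, truncated to int.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        float loadFactor = 0.75f; // same value as DEFAULT_LOAD_FACTOR
        // The table doubles on every resize, so walk through 16, 32, 64, 128.
        for (int capacity = 16; capacity <= 128; capacity <<= 1) {
            System.out.println("capacity=" + capacity
                    + " threshold=" + threshold(capacity, loadFactor));
        }
    }
}
```

With the defaults, the first line reports a threshold of 12 for a capacity of 16, matching the example above.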
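4. Treeify threshold. HashMap is built from an array plus linked lists, so when an element is added to a bucket whose list already holds at least 8 nodes, that list is converted to a tree (provided the table is large enough, see constant 6).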
/**
* The bin count threshold for using a tree rather than list for a
* bin. Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
*/
static final int TREEIFY_THRESHOLD = 8;
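5. Untreeify threshold: when a resize splits a tree bin and a resulting bin holds 6 or fewer nodes, that tree is converted back to a linked list.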
/**
* The bin count threshold for untreeifying a (split) bin during a
* resize operation. Should be less than TREEIFY_THRESHOLD, and at
* most 6 to mesh with shrinkage detection under removal.
*/
static final int UNTREEIFY_THRESHOLD = 6;
6. Minimum table capacity for treeification. This exists to balance the tension between resizing the table and converting a bin to a tree.
/**
* The smallest table capacity for which bins may be treeified.
* (Otherwise the table is resized if too many nodes in a bin.)
* Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
* between resizing and treeification thresholds.
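* That is, linked lists are converted to red-black trees only when the table capacity is at least this value.
* Otherwise, when a bin holds too many nodes, the table is resized instead of the bin being treeified.
* To avoid conflicts between resizing and treeification, this value must not be less than 4 * TREEIFY_THRESHOLD.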
*/
static final int MIN_TREEIFY_CAPACITY = 64;
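The three tree-related constants work together, and the decision can be sketched roughly as below. This is a simplified model based on the Javadoc quoted above, not the actual JDK code: BinPolicyDemo, Action, onInsert and shouldUntreeify are hypothetical names, and nodesBeforeInsert is assumed to mean the number of nodes already in the bin when a new element arrives.

```java
class BinPolicyDemo {
    static final int TREEIFY_THRESHOLD = 8;
    static final int UNTREEIFY_THRESHOLD = 6;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { KEEP_LIST, RESIZE_TABLE, TREEIFY }

    // Hypothetical helper: what happens when an element is added to a list bin
    // that already holds nodesBeforeInsert nodes, in a table of length tableLength.
    static Action onInsert(int nodesBeforeInsert, int tableLength) {
        if (nodesBeforeInsert < TREEIFY_THRESHOLD) {
            return Action.KEEP_LIST;    // bin is still short enough for a list
        }
        if (tableLength < MIN_TREEIFY_CAPACITY) {
            return Action.RESIZE_TABLE; // small table: spread entries out instead
        }
        return Action.TREEIFY;          // large table and a long bin: use a tree
    }

    // Hypothetical helper: after a resize splits a tree bin, a small half
    // is turned back into a plain linked list.
    static boolean shouldUntreeify(int nodesInSplitBin) {
        return nodesInSplitBin <= UNTREEIFY_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(onInsert(8, 16));    // table below 64 slots
        System.out.println(onInsert(8, 64));    // table at 64 slots
        System.out.println(shouldUntreeify(6));
    }
}
```

The gap between 6 and 8 keeps a bin that hovers around one size from flipping back and forth between a list and a tree.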
Besides the constants above, there are four fields worth knowing:
1. Threshold: the size at which the map must resize, equal to capacity * load factor.
/**
* The next size value at which to resize (capacity * load factor).
*
* @serial
*/
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;
2. Load factor: 0.75 by default.
/**
* The load factor for the hash table.
*
* @serial
*/
final float loadFactor;
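3. Number of entries in the map. The field is marked transient, so default serialization skips it; HashMap writes the size itself in its custom writeObject method.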
/**
* The number of key-value mappings contained in this map.
*/
transient int size;
4. The hash table array: the array whose length is what we usually call the capacity.
/**
* The table, initialized on first use, and resized as
* necessary. When allocated, length is always a power of two.
* (We also tolerate length zero in some operations to allow
* bootstrapping mechanics that are currently not needed.)
*/
transient Node<K,V>[] table;
With the constants and key fields covered, let's look at the no-arg constructor again.
/**
* Constructs an empty <tt>HashMap</tt> with the default initial capacity
* (16) and the default load factor (0.75).
*/
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
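The comment talks about constructing an empty HashMap with the default capacity (16) and the default load factor, which might suggest that the two-argument constructor is called internally. It is not: as the body shows, the no-arg constructor only sets loadFactor, leaving threshold at 0 and table unallocated. For comparison, the two-argument constructor looks like this: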
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and load factor.
*
* @param initialCapacity the initial capacity
* @param loadFactor the load factor
* @throws IllegalArgumentException if the initial capacity is negative
* or the load factor is nonpositive
*/
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
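The constructor stores tableSizeFor(initialCapacity) in threshold, but the body of tableSizeFor is not shown above. Since the capacity must be a power of two, the helper has to round the request up to the next power of two. The sketch below is one way to do that with bit smearing; treat it as an illustration rather than a verbatim copy from any particular JDK version. TableSizeDemo is a made-up wrapper class.

```java
class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Rounds cap up to the nearest power of two, clamped to [1, MAXIMUM_CAPACITY].
    static int tableSizeFor(int cap) {
        int n = cap - 1;  // subtract 1 so an exact power of two maps to itself
        n |= n >>> 1;     // copy the highest set bit into every lower position
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] requests = {0, 1, 10, 16, 17, 1000};
        for (int cap : requests) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

So a request for 10 slots yields a table of 16, and a request for 17 yields 32.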
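So right after new HashMap(), loadFactor is 0.75 but no 16-slot array exists yet: table is null and threshold is 0, which, according to the comment on the threshold field, signifies DEFAULT_INITIAL_CAPACITY. The table is allocated on first use (the first put), at which point the capacity becomes 16 and the threshold 12. In other words, new HashMap() returns a reference to a HashMap whose effective initial capacity is 16 and whose load factor is 0.75.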
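The comment on the threshold field says that, before the table is allocated, threshold holds either the initial array capacity or zero to signify DEFAULT_INITIAL_CAPACITY. A rough model of how the first allocation could pick its capacity from that value is sketched below. LazyInitDemo and firstAllocation are hypothetical names, and the real resize logic handles more cases (for example overflow near the maximum capacity).

```java
class LazyInitDemo {
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // 16

    // Hypothetical helper: returns {capacity, threshold} chosen when the table
    // is first allocated, given the value stored in threshold by the constructor.
    static int[] firstAllocation(int storedThreshold, float loadFactor) {
        // Zero means no capacity was requested, so fall back to the default.
        int capacity = (storedThreshold > 0) ? storedThreshold : DEFAULT_INITIAL_CAPACITY;
        int threshold = (int) (capacity * loadFactor);
        return new int[] {capacity, threshold};
    }

    public static void main(String[] args) {
        int[] noArg = firstAllocation(0, 0.75f);   // models new HashMap()
        int[] sized = firstAllocation(32, 0.75f);  // models a stored capacity of 32
        System.out.println("no-arg: capacity=" + noArg[0] + " threshold=" + noArg[1]);
        System.out.println("sized:  capacity=" + sized[0] + " threshold=" + sized[1]);
    }
}
```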
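Summary: this hides a frequent interview question: when no initial capacity is specified, what is HashMap's initial size? The answer is 16, and the array is only allocated on first use.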
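A practical consequence: when the number of entries is known up front, passing an initial capacity can avoid repeated resizes as the map grows. The sketch below shows one way to choose that capacity. PresizeDemo and capacityFor are hypothetical names, and the formula simply inverts threshold = capacity * load factor.

```java
import java.util.HashMap;
import java.util.Map;

class PresizeDemo {
    // Hypothetical helper: smallest capacity whose threshold is at least expectedSize.
    static int capacityFor(int expectedSize, float loadFactor) {
        return (int) Math.ceil(expectedSize / (double) loadFactor);
    }

    public static void main(String[] args) {
        int expected = 1000;
        // HashMap rounds the requested capacity up to a power of two internally.
        Map<Integer, String> map = new HashMap<>(capacityFor(expected, 0.75f));
        for (int i = 0; i < expected; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println("entries=" + map.size());
    }
}
```

Whether this helps in a given program depends on how large the map gets and how often it is created, so it is worth measuring before and after.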