HashMap Source Code Analysis

Author: Gracker

Published: 2015-08-05, 20:27

Last updated: 2015-12-17, 15:32

Original link: http://androidperformance.com/2015/08/05/HashMap.html

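Linked lists and arrays let you arrange elements in whatever order you like, but if you want to find a particular element and have forgotten its position, you have to visit the elements one by one until you find it. With a large collection that can take a long time. A data structure that can quickly locate the object you are looking for is the hash table.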

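HashMap is a hash-table-based implementation of the Map interface. It provides all of the optional map operations and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) The class makes no guarantees about the order of the map; in particular, it does not guarantee that the order will remain constant over time.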

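1. HashMap's data structure:

HashMap is built from an array combined with linked lists: the underlying storage is an array, and each element of that array is the head of a linked list.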

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;
    ...
}

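Entry is a static nested class of HashMap with package-private access, and it implements the Map.Entry interface. As the code shows, each Entry holds a reference (next) to the following element in its chain.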
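To make the array-plus-chains layout concrete, here is a minimal sketch of a chained hash table. TinyChainedMap and its Node class are hypothetical names invented for this illustration, not JDK code; the sketch assumes a fixed table of 16 buckets, never resizes, and uses hashCode() directly without the extra hash mixing that HashMap applies.

```java
import java.util.Objects;

/** Hypothetical miniature of the array-plus-linked-list layout; not the JDK class. */
final class TinyChainedMap<K, V> {
    /** One node in a bucket's singly linked list. */
    static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Each slot holds the head of a chain, or null if the bucket is empty.
    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    private int indexOf(Object key) {
        // Null keys go to bucket 0; the mask works because 16 is a power of two.
        return key == null ? 0 : key.hashCode() & (table.length - 1);
    }

    /** Replaces the value of an existing key, or links a new node at the head of the chain. */
    V put(K key, V value) {
        int i = indexOf(key);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        table[i] = new Node<>(key, value, table[i]);
        return null;
    }

    /** Walks the chain of the key's bucket and returns the value, or null if absent. */
    V get(Object key) {
        for (Node<K, V> n = table[indexOf(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TinyChainedMap<String, Integer> map = new TinyChainedMap<>();
        map.put("a", 1);
        map.put("a", 2);
        System.out.println(map.get("a"));
    }
}
```

The point of the sketch is only the shape: one array slot per bucket, and collisions resolved by following next references.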

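2. Constructors

HashMap has four constructors:

HashMap(): constructs an empty HashMap with the default initial capacity (16) and the default load factor (0.75).
HashMap(int initialCapacity): constructs an empty HashMap with the specified initial capacity and the default load factor (0.75).
HashMap(int initialCapacity, float loadFactor): constructs an empty HashMap with the specified initial capacity and load factor.
HashMap(Map<? extends K,? extends V> m): constructs a new HashMap with the same mappings as the specified Map.

In practice there are only two kinds: one takes an initial capacity and a load factor, and the other builds a new HashMap from the mappings of a given Map. Let's look at the first kind.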

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    // Find a power of 2 >= initialCapacity
    int capacity = 1;
    while (capacity < initialCapacity)
        capacity <<= 1;

    this.loadFactor = loadFactor;
    threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    useAltHashing = sun.misc.VM.isBooted() &&
            (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
    init();
}

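The parameters are simple: the initial capacity and the load factor. The initial capacity determines the size of the initial array, and the product of the capacity and the load factor gives a threshold, which is capped at (1<<30) + 1. The actual capacity is always a power of two: the smallest power of two greater than or equal to the requested value. The default initial capacity is 16 and the default load factor is 0.75. When the number of entries reaches the threshold, the array doubles in capacity and the entries are reinserted into the new array, which is why HashMap does not guarantee that its order stays constant.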

An IllegalArgumentException is thrown if the initial capacity is negative, or if the load factor is less than or equal to zero or is NaN.
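The rounding and threshold arithmetic can be tried in isolation. The sketch below copies the constructor's loop into two hypothetical helpers, roundUpToPowerOfTwo and threshold; the class name CapacityDemo is likewise invented for this example.

```java
final class CapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Hypothetical helper: smallest power of two >= the request, capped at MAXIMUM_CAPACITY. */
    static int roundUpToPowerOfTwo(int initialCapacity) {
        if (initialCapacity < 0) {
            throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
        }
        if (initialCapacity > MAXIMUM_CAPACITY) {
            initialCapacity = MAXIMUM_CAPACITY;
        }
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        return capacity;
    }

    /** Hypothetical helper: the same threshold expression the constructor uses. */
    static int threshold(int capacity, float loadFactor) {
        if (loadFactor <= 0 || Float.isNaN(loadFactor)) {
            throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
        }
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        for (int requested : new int[] {0, 1, 10, 16, 17}) {
            int capacity = roundUpToPowerOfTwo(requested);
            System.out.println(requested + " -> capacity " + capacity
                    + ", threshold " + threshold(capacity, 0.75f));
        }
    }
}
```

A request of 10 is rounded up to 16 with a threshold of 12, while a request of 17 becomes 32 with a threshold of 24.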

3. put operation

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

Because HashMap allows a null key, put first checks whether the key is null and, if so, handles it separately.

/**
 * Offloaded version of put for null keys
 */
private V putForNullKey(V value) {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0);
    return null;
}

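As you can see, a null key is always stored in the first bucket, table[0]. If an entry with a null key already exists there, its value is replaced; otherwise a new entry is added. We will look at addEntry shortly.

PS: here is a question to consider. A null key always goes into table[0], so why does the code still walk the whole chain in that bucket?

Back to put. Once the key is known to be non-null, put computes the key's hash and uses indexFor to find the bucket in table where the key belongs.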

/**
 * Returns index for hash code h.
 */
static int indexFor(int h, int length) {
    return h & (length-1);
}

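indexFor is very short, but cleverly written. The usual way to map a number onto a fixed range is the remainder operator (%), that is, h % length, but this code exploits a property of table.length instead. Recall that the capacity is always a power of two, 2^N. In binary that is a 1 followed by N zeros, so (length-1) is N ones. A bitwise AND with that mask is fast and keeps exactly the low N bits of the hash. The trick depends on that property, though: if length were not a power of two, some bits of the mask would be zero, some buckets could never be selected, and collisions would increase.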
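A small experiment illustrates both halves of that argument. The sketch below reuses indexFor as shown above; IndexForDemo and reachableIndices are hypothetical names introduced for the demonstration. Math.floorMod is used for the comparison rather than %, because % can return a negative result for a negative hash.

```java
import java.util.Set;
import java.util.TreeSet;

final class IndexForDemo {
    /** Same expression as HashMap.indexFor in the listing above. */
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    /** Hypothetical helper: every index indexFor produces for hashes 0 .. 4*length-1. */
    static Set<Integer> reachableIndices(int length) {
        Set<Integer> seen = new TreeSet<>();
        for (int h = 0; h < 4 * length; h++) {
            seen.add(indexFor(h, length));
        }
        return seen;
    }

    public static void main(String[] args) {
        // With a power-of-two length the mask agrees with the non-negative remainder.
        for (int h : new int[] {5, 21, -7}) {
            System.out.println(h + ": mask=" + indexFor(h, 16)
                    + " floorMod=" + Math.floorMod(h, 16));
        }
        // With length 10 the mask is binary 1001, so most buckets are never used.
        System.out.println("length 16: " + reachableIndices(16));
        System.out.println("length 10: " + reachableIndices(10));
    }
}
```

With length 16 all sixteen buckets are reachable; with length 10 only indices 0, 1, 8 and 9 ever come out of the mask, which is the increase in collisions the text warns about.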

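That answers the earlier question. indexFor returns values from 0 to (length-1), so entries whose key is not null can also end up in table[0]; putForNullKey therefore still has to walk the chain.

With the bucket index in hand, put walks the chain. If the key already exists, it replaces the value and returns the old one. Otherwise modCount++ records one more structural modification (this field drives the fail-fast mechanism described later), and a new Entry has to be inserted into the chain, which is the job of addEntry.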

/**
 * Adds a new entry with the specified key, value and hash code to
 * the specified bucket.  It is the responsibility of this
 * method to resize the table if appropriate.
 *
 * Subclass overrides this to alter the behavior of put method.
 */
void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}

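The method takes four parameters: the hash of the key, then the key and the value, and finally the key's bucket index in table, that is, indexFor(hash(key), table.length). It first checks whether size (the number of entries currently in the map) is greater than or equal to the threshold, and whether the target bucket is non-empty. If both hold, it calls resize(2 * table.length), which we will look at next. Resizing once the threshold is reached reduces collisions and keeps lookups fast. Resizing only when the target bucket is occupied keeps the number of resizes down: if the bucket is empty, putting an entry there causes no collision and does not hurt lookup performance, so the resize is postponed.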

/**
 * Rehashes the contents of this map into a new array with a
 * larger capacity.  This method is called automatically when the
 * number of keys in this map reaches its threshold.
 *
 * If current capacity is MAXIMUM_CAPACITY, this method does not
 * resize the map, but sets threshold to Integer.MAX_VALUE.
 * This has the effect of preventing future calls.
 *
 * @param newCapacity the new capacity, MUST be a power of two;
 *        must be greater than current capacity unless current
 *        capacity is MAXIMUM_CAPACITY (in which case value
 *        is irrelevant).
 */
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];
    boolean oldAltHashing = useAltHashing;
    useAltHashing |= sun.misc.VM.isBooted() &&
            (newCapacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
    boolean rehash = oldAltHashing ^ useAltHashing;
    transfer(newTable, rehash);
    table = newTable;
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

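resize first checks whether the table has already reached the maximum capacity (1<<30). If so, it raises the threshold to Integer.MAX_VALUE (2^31 - 1) and returns, since the table cannot grow any further. Otherwise it allocates a new empty array, twice the old length. You might ask: what if oldTable.length is not MAXIMUM_CAPACITY, but newCapacity, that is 2 * oldTable.length, exceeds MAXIMUM_CAPACITY? That cannot happen: the array length is always a power of two and MAXIMUM_CAPACITY = 1<<30, so doubling any smaller length yields at most exactly 1<<30.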
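The claim that doubling never overshoots can be checked mechanically. The following sketch, with the hypothetical names DoublingDemo and doublingsUntilMax, doubles a capacity until it equals MAXIMUM_CAPACITY and throws if it ever jumps past it; it is an illustration of the argument, not part of HashMap.

```java
final class DoublingDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * Hypothetical helper: counts the doublings needed to reach MAXIMUM_CAPACITY,
     * and throws if a doubling skips past it (or overflows to a non-positive value).
     */
    static int doublingsUntilMax(int capacity) {
        int doublings = 0;
        while (capacity != MAXIMUM_CAPACITY) {
            capacity *= 2;
            if (capacity <= 0 || capacity > MAXIMUM_CAPACITY) {
                throw new IllegalStateException("overshot MAXIMUM_CAPACITY");
            }
            doublings++;
        }
        return doublings;
    }

    public static void main(String[] args) {
        // A power-of-two capacity lands exactly on 1 << 30.
        System.out.println(doublingsUntilMax(16));
        // A capacity that is not a power of two would overshoot.
        try {
            doublingsUntilMax(3);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Starting from the default capacity of 16 (2^4), it takes 26 doublings to land exactly on 2^30, whereas a starting value such as 3 skips past the maximum.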
Once the new array has been allocated, transfer is called.

/**
 * Transfers all entries from current table to newTable.
 */
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}

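transfer takes the entries out of the old table one by one and inserts them into the new array. Because the array length has changed, an entry's bucket index may change, and because each entry is inserted at the head of its new chain, entries that stay in the same bucket end up in reverse order. This is why HashMap cannot guarantee a constant order. Back in resize, the new array is now populated, so all that remains is to replace the old array with it and update the threshold. resize is clearly an expensive operation, so it pays to trigger it as rarely as possible.

Back in addEntry: once the resize check is done, a new Entry has to be created.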
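The effect of the transfer loop on one chain can be traced with a stripped-down version. In the sketch below, TransferDemo, its Node class and the chain helper are hypothetical; the loop body follows the listing above, minus the rehash branch, and hashes are supplied by hand so the bucket indices are easy to follow.

```java
final class TransferDemo {
    /** Hypothetical stand-in for Entry: a key, a precomputed hash and a next link. */
    static final class Node {
        final String key;
        final int hash;
        Node next;

        Node(String key, int hash, Node next) {
            this.key = key;
            this.hash = hash;
            this.next = next;
        }
    }

    /** Same loop as transfer above, without the rehash branch. */
    static Node[] transfer(Node[] oldTable, int newCapacity) {
        Node[] newTable = new Node[newCapacity];
        for (Node e : oldTable) {
            while (e != null) {
                Node next = e.next;
                int i = e.hash & (newCapacity - 1);
                e.next = newTable[i]; // head insertion into the new chain
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    /** Renders a chain as "A->B->C" for printing. */
    static String chain(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append("->");
            }
            sb.append(n.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Old table of length 2: hashes 1, 5 and 3 all fall into bucket 1, chained A->B->C.
        Node c = new Node("C", 3, null);
        Node b = new Node("B", 5, c);
        Node a = new Node("A", 1, b);
        Node[] oldTable = new Node[2];
        oldTable[1] = a;

        Node[] newTable = transfer(oldTable, 4);
        System.out.println("bucket 1: " + chain(newTable[1]));
        System.out.println("bucket 3: " + chain(newTable[3]));
    }
}
```

After growing from 2 to 4 buckets, C moves to bucket 3 while A and B stay in bucket 1, now in the order B->A.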

/**
 * Like addEntry except that this version is used when creating entries
 * as part of Map construction or "pseudo-construction" (cloning,
 * deserialization).  This version needn't worry about resizing the table.
 *
 * Subclass overrides this to alter the behavior of HashMap(Map),
 * clone, and readObject.
 */
void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}

createEntry takes the same parameters as addEntry: the hash of the key, the key and the value, and the key's bucket index in table. What it does depends on the Entry constructor.

/**
 * Creates new entry.
 */
Entry(int h, K k, V v, Entry<K,V> n) {
    value = v;
    next = n;
    key = k;
    hash = h;
}

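The constructor receives an Entry and uses it as the next of the newly created Entry. So createEntry in effect takes the chain currently at table[bucketIndex], hangs it behind the new Entry, and then stores the new Entry at table[bucketIndex].

That completes put. When the key was not present before, so that a new key-value pair is inserted, put returns null.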
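The return values of put are easy to observe through the public java.util.HashMap API. The class name PutReturnDemo and the method run are invented for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PutReturnDemo {
    /** Collects what put returns for new keys, replaced keys and the null key. */
    static List<Integer> run() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("a", 1);       // new key: no previous value
        Integer second = map.put("a", 2);      // existing key: previous value
        Integer nullKey = map.put(null, 3);    // null key is allowed, new mapping
        Integer nullAgain = map.put(null, 4);  // null key replaced
        return Arrays.asList(first, second, nullKey, nullAgain, map.size());
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The list comes out as null, 1, null, 3, followed by a final size of 2: replacing a value does not add a mapping.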

4. get operation

/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
 * key.equals(k))}, then this method returns {@code v}; otherwise
 * it returns {@code null}.  (There can be at most one such mapping.)
 *
 * <p>A return value of {@code null} does not <i>necessarily</i>
 * indicate that the map contains no mapping for the key; it's also
 * possible that the map explicitly maps the key to {@code null}.
 * The {@link #containsKey containsKey} operation may be used to
 * distinguish these two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

get also checks for a null key first and handles it separately. The code speaks for itself.

/**
 * Offloaded version of get() to look up null keys.  Null keys map
 * to index 0.  This null case is split out into separate methods
 * for the sake of performance in the two most commonly used
 * operations (get and put), but incorporated with conditionals in
 * others.
 */
private V getForNullKey() {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null)
            return e.value;
    }
    return null;
}

If the key is not null, get calls getEntry, which returns an Entry. If that Entry is not null, get returns its value.

/**
 * Returns the entry associated with the specified key in the
 * HashMap.  Returns null if the HashMap contains no mapping
 * for the key.
 */
final Entry<K,V> getEntry(Object key) {
    int hash = (key == null) ? 0 : hash(key);
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

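getEntry computes the hash of the key, finds the bucket in table, and walks the chain. It returns the matching Entry if there is one and null otherwise.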
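As the Javadoc of get points out, a null result is ambiguous: the key may be absent, or it may be mapped to null. A short sketch with the public API shows how containsKey separates the two cases; GetVersusContainsKeyDemo and lookup are names made up for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class GetVersusContainsKeyDemo {
    /** Describes a key as absent or as mapped to its (possibly null) value. */
    static String lookup(Map<String, String> map, String key) {
        if (map.containsKey(key)) {
            return "mapped to " + map.get(key);
        }
        return "absent";
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("k", null);

        // Both calls to get return null, so get alone cannot tell the cases apart.
        System.out.println(map.get("k") + " / " + map.get("missing"));
        System.out.println(lookup(map, "k"));
        System.out.println(lookup(map, "missing"));
    }
}
```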

5. Fail-Fast mechanism

/**
 * The number of times this HashMap has been structurally modified.
 * Structural modifications are those that change the number of mappings in
 * the HashMap or otherwise modify its internal structure (e.g.,
 * rehash).  This field is used to make iterators on Collection-views of
 * the HashMap fail-fast.  (See ConcurrentModificationException).
 */
transient int modCount;

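java.util.HashMap is not thread-safe, so if the map is structurally modified while an iterator is in use, whether by another thread or by the iterating code itself other than through the iterator's own remove method, the iterator throws ConcurrentModificationException. This is the so-called fail-fast strategy.

The source implements this strategy with the modCount field. As its name suggests, modCount is a modification count: every structural modification of the HashMap increments it, and when an iterator is created it copies the current value into its own expectedModCount. Each later step of the iteration compares the two and throws if they differ. Note that in the source shown here modCount is a plain transient int, not volatile, so a change made by another thread is not guaranteed to be visible to the iterating thread.

Note that the fail-fast behavior of an iterator cannot be guaranteed; generally speaking, it is impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. It is therefore wrong to write a program that depends on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.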
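The modCount and expectedModCount handshake is small enough to reproduce in a toy collection. The sketch below is not HashMap's iterator; CountedList is a hypothetical append-only list written only to show the comparison an iterator performs on each step.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical append-only list with a fail-fast iterator. */
final class CountedList implements Iterable<String> {
    private String[] items = new String[8];
    private int size;
    private int modCount; // incremented on every structural modification

    void add(String item) {
        if (size == items.length) {
            items = Arrays.copyOf(items, size * 2);
        }
        items[size++] = item;
        modCount++;
    }

    @Override
    public Iterator<String> iterator() {
        final int expectedModCount = modCount; // snapshot taken when the iterator is created
        return new Iterator<String>() {
            private int cursor;

            @Override
            public boolean hasNext() {
                return cursor < size;
            }

            @Override
            public String next() {
                // A mismatch means the list changed after this iterator was created.
                if (modCount != expectedModCount) {
                    throw new ConcurrentModificationException();
                }
                if (cursor >= size) {
                    throw new NoSuchElementException();
                }
                return items[cursor++];
            }
        };
    }

    public static void main(String[] args) {
        CountedList list = new CountedList();
        list.add("a");
        list.add("b");
        Iterator<String> it = list.iterator();
        System.out.println(it.next());
        list.add("c"); // modification behind the iterator's back
        try {
            it.next();
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast: " + e.getClass().getSimpleName());
        }
    }
}
```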
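The same behavior can be seen with the real java.util.HashMap, even on a single thread. In the sketch below, FailFastDemo and its two methods are hypothetical names: the first method adds a key while iterating, the second removes entries through the iterator's own remove method, which keeps the iterator's expected count in step with the map.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class FailFastDemo {
    private static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    /** Returns true if adding a key during iteration triggered the exception. */
    static boolean modifyWhileIterating() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put("new-" + key, 0); // structural modification outside the iterator
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Removes odd values through the iterator and returns the remaining size. */
    static int removeViaIterator() {
        Map<String, Integer> map = sample();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 == 1) {
                it.remove(); // supported way to remove during iteration
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("exception on put during iteration: " + modifyWhileIterating());
        System.out.println("size after Iterator.remove: " + removeViaIterator());
    }
}
```

The exception is useful here as a signal that the loop itself is buggy; in line with the warning above, the program should not rely on it for control flow.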
