Java Fundamentals Summary: HashMap Source Code

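1. HashMap Architecture
Under the hood, HashMap is built from an array plus linked lists plus red-black trees. When an element is added to a bucket whose list already holds at least 8 nodes (and the array capacity is at least 64), the list is converted into a red-black tree. When a tree shrinks to 6 nodes or fewer during a resize split, it is converted back into a list.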
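
The threshold of 8 is justified in the implementation notes quoted in section 2, which model bucket sizes as a Poisson distribution with parameter 0.5. As a rough check, the sketch below recomputes those probabilities. The class and method names (PoissonBinSizes, poisson) are made up for this illustration and are not part of the JDK.

```java
import java.util.Locale;

class PoissonBinSizes {
    // Poisson probability mass: P(k) = exp(-lambda) * lambda^k / k!
    static double poisson(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        // lambda = 0.5 is the average bucket load quoted in the JDK notes
        for (int k = 0; k <= 8; k++) {
            System.out.printf(Locale.ROOT, "%d: %.8f%n", k, poisson(0.5, k));
        }
    }
}
```

The printed values should closely match the table in the implementation notes (individual entries may differ in the last digit because of rounding), and the value for k = 8 comes out below one in ten million.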

2. Common Fields

	/*
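     * Class notes
     * 1. Unlike Hashtable, HashMap allows null keys and null values, and it is not thread-safe.
     * 2. The default loadFactor is 0.75, a value chosen to balance time cost against space cost. A higher value reduces space overhead (fewer resizes, so the array grows more slowly) but raises lookup cost (more hash collisions, longer lists). No resize is needed as long as: array capacity > required number of entries / load factor.
     * 3. If a lot of data will be stored in a HashMap, give it a large enough capacity up front. This avoids repeated resizing along the way, which hurts performance.
     * 4. HashMap is not thread-safe. We can lock externally, or use Collections#synchronizedMap, whose wrapper runs every method inside a synchronized block on a shared mutex object.
     * 5. If the HashMap is structurally modified during iteration (other than through the iterator's own remove), the iterator fails fast with a ConcurrentModificationException.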
     */
	private static final long serialVersionUID = 362498820763181265L;

    /*
     * Implementation notes.
     *
     * This map usually acts as a binned (bucketed) hash table, but
     * when bins get too large, they are transformed into bins of
     * TreeNodes, each structured similarly to those in
     * java.util.TreeMap. Most methods try to use normal bins, but
     * relay to TreeNode methods when applicable (simply by checking
     * instanceof a node).  Bins of TreeNodes may be traversed and
     * used like any others, but additionally support faster lookup
     * when overpopulated. However, since the vast majority of bins in
     * normal use are not overpopulated, checking for existence of
     * tree bins may be delayed in the course of table methods.
     *
     * Tree bins (i.e., bins whose elements are all TreeNodes) are
     * ordered primarily by hashCode, but in the case of ties, if two
     * elements are of the same "class C implements Comparable<C>",
     * type then their compareTo method is used for ordering. (We
     * conservatively check generic types via reflection to validate
     * this -- see method comparableClassFor).  The added complexity
     * of tree bins is worthwhile in providing worst-case O(log n)
     * operations when keys either have distinct hashes or are
     * orderable, Thus, performance degrades gracefully under
     * accidental or malicious usages in which hashCode() methods
     * return values that are poorly distributed, as well as those in
     * which many keys share a hashCode, so long as they are also
     * Comparable. (If neither of these apply, we may waste about a
     * factor of two in time and space compared to taking no
     * precautions. But the only known cases stem from poor user
     * programming practices that are already so slow that this makes
     * little difference.)
     *
     * Because TreeNodes are about twice the size of regular nodes, we
     * use them only when bins contain enough nodes to warrant use
     * (see TREEIFY_THRESHOLD). And when they become too small (due to
     * removal or resizing) they are converted back to plain bins.  In
     * usages with well-distributed user hashCodes, tree bins are
     * rarely used.  Ideally, under random hashCodes, the frequency of
     * nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average for the default resizing
     * threshold of 0.75, although with a large variance because of
     * resizing granularity. Ignoring variance, the expected
     * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
     * factorial(k)). The first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million
     *
     * The root of a tree bin is normally its first node.  However,
     * sometimes (currently only upon Iterator.remove), the root might
     * be elsewhere, but can be recovered following parent links
     * (method TreeNode.root()).
     *
     * All applicable internal methods accept a hash code as an
     * argument (as normally supplied from a public method), allowing
     * them to call each other without recomputing user hashCodes.
     * Most internal methods also accept a "tab" argument, that is
     * normally the current table, but may be a new or old one when
     * resizing or converting.
     *
     * When bin lists are treeified, split, or untreeified, we keep
     * them in the same relative access/traversal order (i.e., field
     * Node.next) to better preserve locality, and to slightly
     * simplify handling of splits and traversals that invoke
     * iterator.remove. When using comparators on insertion, to keep a
     * total ordering (or as close as is required here) across
     * rebalancings, we compare classes and identityHashCodes as
     * tie-breakers.
     *
     * The use and transitions among plain vs tree modes is
     * complicated by the existence of subclass LinkedHashMap. See
     * below for hook methods defined to be invoked upon insertion,
     * removal and access that allow LinkedHashMap internals to
     * otherwise remain independent of these mechanics. (This also
     * requires that a map instance be passed to some utility methods
     * that may create new nodes.)
     *
     * The concurrent-programming-like SSA-based coding style helps
     * avoid aliasing errors amid all of the twisty pointer operations.
     */

    /**
     * The default initial capacity - MUST be a power of two.
     * Initial capacity, 16 by default.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     * Maximum capacity.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     * Default load factor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
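     * When an element is added to a bin that already holds at least 8 nodes, the list is converted to a red-black tree (provided the table capacity is at least MIN_TREEIFY_CAPACITY).
     * Why 8?
     * Answer: a linked-list lookup is O(n), while a red-black tree lookup is O(log n). With few nodes, walking the list is fast enough, so the list is only converted once it grows long. A tree node takes about twice the space of a list node, though, so a cut-off is needed that weighs conversion time against the extra space.
     * The value 8 was chosen with the Poisson distribution from the implementation notes in mind. With well-distributed hashes, a bin of length 8 occurs with probability about 0.00000006, less than one in ten million, so in normal use a list almost never reaches 8. When it does, the hash codes are most likely poorly distributed (or the keys were crafted to collide), and converting the list to a red-black tree keeps lookups fast in that case. Everyday code that uses HashMap will almost never trigger the conversion.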
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     * When a tree bin shrinks to 6 nodes or fewer during a resize split, it is converted back to a list.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
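     * A list is converted to a red-black tree only when the table capacity is at least 64; below that, the table is resized instead.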
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

	/* ---------------- Fields -------------- */

    /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     * The array that stores the data.
     */
    transient Node<K,V>[] table;

    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used
     * for keySet() and values().
     */
    transient Set<Map.Entry<K,V>> entrySet;

    /**
     * The number of key-value mappings contained in this map.
     * The actual number of entries in the HashMap.
     */
    transient int size;

    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     * Modification counter (acts as a version number).
     */
    transient int modCount;

    /**
     * The next size value at which to resize (capacity * load factor).
     * 
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.)
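    // The resize threshold. There are two cases:
    // 1. If an initial capacity is passed to the constructor, tableSizeFor rounds it up to the nearest power of two, and the result is held here until the table is allocated. For example, a requested capacity of 19 actually becomes 32, i.e. 2 to the 5th power.
    // 2. Once resize has allocated or grown the table, threshold = array capacity * loadFactor (0.75 by default).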
    int threshold;

    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor;
    // Linked-list node (body omitted)
    static class Node<K,V> implements Map.Entry<K,V> {}

    // Red-black tree node (body omitted)
    static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {}
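
The comment on threshold describes rounding a requested capacity up to a power of two. The sketch below shows one way to do that rounding and how it can be used to pre-size a map. The body of tableSizeFor is modelled on the bit-smearing version found in JDK 8, which this article does not quote (other JDK releases may compute the same result differently), and initialCapacityFor is a hypothetical helper introduced only for this example.

```java
class TableSizeSketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Rounds cap up to the nearest power of two by copying the highest
    // set bit of (cap - 1) into every lower bit, then adding 1.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // Hypothetical helper: smallest capacity request that keeps
    // expectedSize entries at or below capacity * loadFactor.
    static int initialCapacityFor(int expectedSize, float loadFactor) {
        return (int) Math.ceil(expectedSize / (double) loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(19)); // 32
        System.out.println(tableSizeFor(16)); // 16
        int cap = tableSizeFor(initialCapacityFor(1000, 0.75f));
        System.out.println(cap);                 // 2048
        System.out.println((int) (cap * 0.75f)); // 1536
    }
}
```

With these numbers, a map expected to hold 1000 entries would be created as new HashMap<>(1334), which gives a table of 2048 slots and a threshold of 1536, so no resize should occur while the 1000 entries are inserted.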

3. Insertion (put)

	/**
     * Implements Map.put and related methods.
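     * Insertion flow:
     * 1. If the table has not been initialized, initialize it (via resize).
     * 2. If the bucket located by the key's hash is empty, place the new node there and go to 7; otherwise go to 3.
     * 3. The bucket is occupied (hash collision). If its first node has the same key, go to 6; otherwise the bucket is either a linked list or a red-black tree.
     * 4. For a linked list, loop over the nodes (iteratively, not recursively): stop at a node with the same key, or append the new node at the tail and treeify the bin if it has grown long enough.
     * 5. For a red-black tree, call the tree insertion method (putTreeVal).
     * 6. If a node with the same key was found in 3, 4 or 5, overwrite its value depending on onlyIfAbsent and return the old value.
     * 7. Otherwise a new node was added: increment modCount and size, resize if size exceeds threshold, and finish.
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value (put passes false, putIfAbsent passes true)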
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        //If the table is null or empty, initialize it with resize
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        //(n - 1) & hash maps the hash into the range 0 to n-1 (n is always a power of two)
        //If the slot at that index is empty, create the new node directly in it
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        // The slot is occupied: this is the usual hash-collision handling
        else {
            Node<K,V> e; K k;
            //If the first node has the same hash and an equal key, keep that node in the temporary variable e
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            //Otherwise, check whether the bin is a red-black tree or a linked list and insert accordingly
            //Red-black tree bin
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            //Linked-list bin
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    //Same key if the hashes match and the keys are identical (==) or equal (equals)
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                //Overwrite when onlyIfAbsent is false, or when the existing value is null
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        //Increment the modification counter
        ++modCount;
        //If the new size exceeds the resize threshold, resize
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
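
The index expression (n - 1) & hash in putVal can be tried out in isolation. In the sketch below, spread mirrors the hash(Object) helper of java.util.HashMap as found in JDK 8, which this article does not quote; the class name BucketIndexSketch and the method names are assumptions made for the example.

```java
class BucketIndexSketch {
    // XORs the high 16 bits of hashCode into the low 16 bits so that
    // small tables, which only look at the low bits, still see them.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Maps a hash to a bucket index; n must be a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // default table capacity
        for (String key : new String[] {"a", "hello", "HashMap"}) {
            System.out.println(key + " -> bucket " + indexFor(spread(key), n));
        }
        // For a power-of-two n the mask equals a non-negative modulo
        System.out.println(indexFor(-1, n) == Math.floorMod(-1, n));
    }
}
```

Because n is a power of two, the mask keeps only the low bits of the hash, which gives the same result as a non-negative modulo but with a single AND instruction. A null key always lands in bucket 0.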

4. Inserting a Node into the Red-Black Tree

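        final TreeNode<K,V> putTreeVal(HashMap<K,V> map, Node<K,V>[] tab,
                                       int h, K k, V v) {
            Class<?> kc = null;
            boolean searched = false;
            // Start from the root of this tree bin
            TreeNode<K,V> root = (parent != null) ? root() : this;
            for (TreeNode<K,V> p = root;;) {
                int dir, ph; K pk;
                // Smaller hash goes left, larger hash goes right
                if ((ph = p.hash) > h)
                    dir = -1;
                else if (ph < h)
                    dir = 1;
                // Same hash and equal key: the mapping exists, return it to putVal
                else if ((pk = p.key) == k || (k != null && k.equals(pk)))
                    return p;
                // Same hash, different key, and the keys are either not
                // Comparable or compare as equal
                else if ((kc == null &&
                          (kc = comparableClassFor(k)) == null) ||
                         (dir = compareComparables(kc, k, pk)) == 0) {
                    // Search both subtrees (only once) for an existing equal key
                    if (!searched) {
                        TreeNode<K,V> q, ch;
                        searched = true;
                        if (((ch = p.left) != null &&
                             (q = ch.find(h, k, kc)) != null) ||
                            ((ch = p.right) != null &&
                             (q = ch.find(h, k, kc)) != null))
                            return q;
                    }
                    // Fall back to an arbitrary but consistent ordering
                    dir = tieBreakOrder(k, pk);
                }

                TreeNode<K,V> xp = p;
                // Descend one level; on reaching an empty child slot, insert there
                if ((p = (dir <= 0) ? p.left : p.right) == null) {
                    Node<K,V> xpn = xp.next;
                    TreeNode<K,V> x = map.newTreeNode(h, k, v, xpn);
                    if (dir <= 0)
                        xp.left = x;
                    else
                        xp.right = x;
                    // Also link x into the next/prev list, right after xp
                    xp.next = x;
                    x.parent = x.prev = xp;
                    if (xpn != null)
                        ((TreeNode<K,V>)xpn).prev = x;
                    // Rebalance, then make the root the first node of the bin
                    moveRootToFront(tab, balanceInsertion(root, x));
                    return null;
                }
            }
        }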
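
Tree bins only come into play when many keys collide, so a small experiment needs a key class with a deliberately bad hashCode. The sketch below uses a hypothetical BadKey class whose hashCode is constant and which implements Comparable, the case in which putTreeVal can order keys with compareComparables. Whether a bin has actually been treeified is not observable through the public API, so the example only checks that the map stays correct.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeysDemo {
    // Hypothetical key type: every instance has the same hash code.
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) { this.id = id; }

        @Override public int hashCode() { return 42; }

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    // Inserts count keys that all fall into the same bucket.
    static Map<BadKey, Integer> fill(int count) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = fill(1000);
        System.out.println(map.size());               // 1000
        System.out.println(map.get(new BadKey(500))); // 500
    }
}
```

All 1000 keys share one bucket here, yet every lookup still finds the right value. Because BadKey is Comparable, lookups in that bucket are expected to stay logarithmic rather than linear once the bin has been converted to a tree.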

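5. Lookup (get)
Lookup follows much the same path as insertion. The main steps are:
1. Use the hash to locate the array index, then use equals to check whether the first node in that bucket holds the key we are looking for. If it does, return it; if not, continue.
2. Check whether that node has a next node; if it does, determine whether the bucket is a linked list or a red-black tree.
3. Run the list lookup or the tree lookup accordingly.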
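
The list branch of these steps can be sketched with a stripped-down chained hash table. MiniChainedMap below is a made-up teaching class, not the JDK implementation: it has a fixed 16-slot table, never resizes, and has no tree bins.

```java
import java.util.Objects;

class MiniChainedMap {
    static final class Node {
        final int hash;
        final Object key;
        Object value;
        Node next;

        Node(int hash, Object key, Object value) {
            this.hash = hash;
            this.key = key;
            this.value = value;
        }
    }

    private final Node[] table = new Node[16]; // fixed size, power of two

    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    void put(Object key, Object value) {
        int h = hash(key), i = (table.length - 1) & h;
        Node p = table[i];
        if (p == null) {                 // empty bucket: place directly
            table[i] = new Node(h, key, value);
            return;
        }
        while (true) {
            if (p.hash == h && Objects.equals(p.key, key)) {
                p.value = value;         // existing key: overwrite
                return;
            }
            if (p.next == null) {        // end of list: append at the tail
                p.next = new Node(h, key, value);
                return;
            }
            p = p.next;
        }
    }

    Object get(Object key) {
        int h = hash(key);
        // Step 1: locate the bucket and check its first node
        Node first = table[(table.length - 1) & h];
        if (first == null) return null;
        if (first.hash == h && Objects.equals(first.key, key)) return first.value;
        // Steps 2 and 3 (list case only): walk the next pointers
        for (Node e = first.next; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        MiniChainedMap map = new MiniChainedMap();
        map.put("Aa", 1); // "Aa" and "BB" have the same hashCode (2112)
        map.put("BB", 2);
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
    }
}
```

The strings "Aa" and "BB" collide, so the second lookup has to go past the first node of the bucket, which exercises the list walk in steps 2 and 3.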

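The red-black tree lookup works as follows:
1. Start at the root node and walk down the tree.
2. Compare the hash of the key being looked up with the hash of the current node, and go left if it is smaller or right if it is larger, following the tree's ordering (smaller on the left, larger on the right).
3. If the current node matches the key (same hash and equal key), return it; otherwise repeat steps 2 and 3 on the chosen child.
4. Keep looping until the node is found or there is no child left to visit, in which case the key is absent.
Because a red-black tree stays roughly balanced, a lookup visits at most about as many nodes as the tree is deep, i.e. O(log n), as long as the keys have distinct hashes or are comparable.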
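
The tree search can be sketched on a plain binary search tree ordered by hash. The code below is a simplification under stated assumptions: HashTreeSketch and TNode are made-up names, the tree is not rebalanced (so it is not a real red-black tree), keys are strings, and the Comparable shortcut used by the JDK is left out, so equal hashes with different keys are resolved by searching both subtrees.

```java
import java.util.Objects;

class HashTreeSketch {
    static final class TNode {
        final int hash;
        final String key;
        final String value;
        TNode left, right;

        TNode(int hash, String key, String value) {
            this.hash = hash;
            this.key = key;
            this.value = value;
        }

        // Finds the node for (h, k) in the subtree rooted at this node.
        TNode find(int h, Object k) {
            TNode p = this;
            do {
                if (p.hash > h) {
                    p = p.left;                  // target hash is smaller
                } else if (p.hash < h) {
                    p = p.right;                 // target hash is larger
                } else if (Objects.equals(k, p.key)) {
                    return p;                    // same hash, equal key
                } else if (p.left == null) {
                    p = p.right;                 // only one side to try
                } else if (p.right == null) {
                    p = p.left;
                } else {
                    TNode q = p.right.find(h, k); // hash tie: try right...
                    if (q != null) return q;
                    p = p.left;                   // ...then continue left
                }
            } while (p != null);
            return null;
        }
    }

    // Plain BST insert without rebalancing; assumes the key is absent.
    static TNode insert(TNode root, int hash, String key, String value) {
        TNode x = new TNode(hash, key, value);
        if (root == null) return x;
        TNode p = root;
        while (true) {
            if (hash <= p.hash) {                // ties go left
                if (p.left == null) { p.left = x; break; }
                p = p.left;
            } else {
                if (p.right == null) { p.right = x; break; }
                p = p.right;
            }
        }
        return root;
    }

    public static void main(String[] args) {
        TNode root = insert(null, 50, "k50", "v50");
        insert(root, 30, "k30", "v30");
        insert(root, 70, "k70", "v70");
        System.out.println(root.find(70, "k70").value); // v70
        System.out.println(root.find(60, "none"));      // null
    }
}
```

Each loop iteration moves one level down the tree, which is why the number of comparisons is bounded by the depth when hashes are distinct.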
