Java Basics: HashSet and HashMap Explained

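We all know that HashSet and HashMap are two of the most important classes in the Java collections framework. Why do they matter so much? A Set holds a collection with no duplicate elements, which makes programming very convenient: we never have to worry about the same object appearing twice. The trade-off is that a Set has no index-based access, so reading its elements back means iterating over it. A Map stores key-value pairs, much like the fields and field values of an object. But how many of us actually know how they work internally? Today we'll analyze HashSet and HashMap by reading their source code.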

Since we're looking at how they're used, let's start with initialization (Set<Object> set = new HashSet<>()) and then call add(new Object()):

    // HashSet is backed by a HashMap
    private transient HashMap<E,Object> map;

    // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();

    /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * default initial capacity (16) and load factor (0.75).
     */
    public HashSet() {
        map = new HashMap<>();
    }

    /**
     * Adds the specified element to this set if it is not already present.
     * More formally, adds the specified element <tt>e</tt> to this set if
     * this set contains no element <tt>e2</tt> such that
     * <tt>(e==null&nbsp;?&nbsp;e2==null&nbsp;:&nbsp;e.equals(e2))</tt>.
     * If this set already contains the element, the call leaves the set
     * unchanged and returns <tt>false</tt>.
     *
     * @param e element to be added to this set
     * @return <tt>true</tt> if this set did not already contain the specified
     * element
     */
    // add simply calls the backing map's put method (e as the key, PRESENT = new Object() as the value)
    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

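As we can see, initializing a HashSet also initializes the map, and add delegates to the map's put method. Because keys in a map are unique, HashSet relies on exactly this property to keep its elements unique. put returns the previous value for the key, or null if there was none; since PRESENT is never null, add returns true exactly when the element was not already present. So the real question is how HashMap.put guarantees that keys are unique.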
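A minimal sketch of the same delegation, assuming nothing beyond java.util: MiniHashSet and MiniHashSetDemo are hypothetical names for illustration, not JDK classes. The element becomes the map key, every key shares one dummy value, and add reports success by checking whether put returned null.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, stripped-down set that mimics how java.util.HashSet
// delegates to a HashMap: the element is the key, and every key maps
// to the same shared dummy value.
class MiniHashSet<E> {
    private static final Object PRESENT = new Object(); // shared dummy value
    private final HashMap<E, Object> map = new HashMap<>();

    // put returns null only when the key had no previous mapping,
    // so "== null" means "the element was newly added".
    boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    boolean contains(Object o) {
        return map.containsKey(o);
    }

    int size() {
        return map.size();
    }
}

class MiniHashSetDemo {
    public static void main(String[] args) {
        MiniHashSet<String> mini = new MiniHashSet<>();
        System.out.println(mini.add("a")); // true: first insertion
        System.out.println(mini.add("a")); // false: key already present
        System.out.println(mini.size());   // 1

        // The real HashSet behaves the same way.
        Set<String> real = new HashSet<>();
        System.out.println(real.add("a")); // true
        System.out.println(real.add("a")); // false
    }
}
```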

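Below is the part of HashMap that is set up at construction time. Apart from loadFactor, every field keeps its default value: table is null, entrySet is null, and so on.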

    /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     */
    // The Node array backing the map (one slot per bucket)
    transient Node<K,V>[] table;

    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used
     * for keySet() and values().
     */
    // Cached Set view of the map's entries (key-value pairs), created on first use
    transient Set<Map.Entry<K,V>> entrySet;

    /**
     * The number of key-value mappings contained in this map.
     */
    // Number of key-value mappings in the map
    transient int size;

    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    // Number of structural modifications to this map (used by fail-fast iterators)
    transient int modCount;

    /**
     * The next size value at which to resize (capacity * load factor).
     *
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.)
    int threshold;

    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor;

    /**
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    // The no-arg constructor only sets loadFactor; every other field keeps its default
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

After construction comes put. Let's follow set.add() into the map and focus on how the uniqueness of the key is guaranteed.

    // HashSet.add: it clearly just calls map.put
    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

    // ===== Above: code from HashSet. Below: methods from HashMap, pasted together =====

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    // The method HashMap runs to insert a key-value pair
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    /**
     * Computes key.hashCode() and spreads (XORs) higher bits of hash
     * to lower.  Because the table uses power-of-two masking, sets of
     * hashes that vary only in bits above the current mask will
     * always collide. (Among known examples are sets of Float keys
     * holding consecutive whole numbers in small tables.)  So we
     * apply a transform that spreads the impact of higher bits
     * downward. There is a tradeoff between speed, utility, and
     * quality of bit-spreading. Because many common sets of hashes
     * are already reasonably distributed (so don't benefit from
     * spreading), and because we use trees to handle large sets of
     * collisions in bins, we just XOR some shifted bits in the
     * cheapest possible way to reduce systematic lossage, as well as
     * to incorporate impact of the highest bits that would otherwise
     * never be used in index calculations because of table bounds.
     */

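    // Calls key.hashCode(), then XORs the result with itself shifted right (unsigned) by 16 bits
    /**
     * I once wanted to know what this value actually is, but knowing it doesn't help much.
     * All we need to know is that the same key (or two keys that are equal and honor the
     * equals/hashCode contract) always yields the same int, and a null key yields 0.
     */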
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    // This is the key method we need to study
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
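To see why hash() mixes the high 16 bits into the low 16, here is a small sketch that copies the spreading formula into a standalone method. HashSpreadDemo and the sample hash codes 0x10000 and 0x20000 are illustrative choices, not from the JDK. With a 16-slot table only the low 4 bits pick the bucket, so hash codes that differ only in their high bits would always collide without the XOR.

```java
// Copies the spreading step of HashMap.hash() into a standalone method and
// compares bucket indices with and without it (hypothetical demo class).
class HashSpreadDemo {
    // Same formula as HashMap.hash(Object): XOR the high 16 bits into the low 16.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n must be a power of two).
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        // Two hash codes that differ only above bit 15.
        int h1 = 0x10000, h2 = 0x20000;

        // Without spreading, only the low 4 bits matter: both land in bucket 0.
        System.out.println(index(h1, n) + " " + index(h2, n));                 // 0 0
        // With spreading, the high bits influence the index.
        System.out.println(index(spread(h1), n) + " " + index(spread(h2), n)); // 1 2
    }
}
```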

Before analyzing putVal, let's look at how a HashMap is laid out, which makes the code much easier to follow:

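A HashMap is built from an array combined with singly linked lists. The array is our table, and each element of table is a Node: the head of the linked list (the "bucket") for that slot. (Since Java 8, a bucket whose list grows too long may be converted into a red-black tree of TreeNode objects, which is what the TreeNode branch in putVal handles.)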
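The array-of-lists shape can be sketched with plain lists standing in for the Node chains. This is a simplified, hypothetical helper (BucketLayout is not a JDK class): it reuses the hash spreading and index masking from the source above, but it does not check for duplicate keys and ignores resizing and treeification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that groups keys the way a HashMap table of length n
// would: each array slot holds a list standing in for the chain of Nodes.
class BucketLayout {
    static <K> List<List<K>> bucketsFor(List<K> keys, int n) {
        List<List<K>> table = new ArrayList<>();
        for (int i = 0; i < n; i++) table.add(new ArrayList<>());
        for (K key : keys) {
            int h = (key == null) ? 0 : key.hashCode();
            int hash = h ^ (h >>> 16);          // same spreading as HashMap.hash()
            table.get((n - 1) & hash).add(key); // append to the end of the chain
        }
        return table;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" have the same String.hashCode() (2112), so they share a slot.
        List<List<String>> table = bucketsFor(List.of("Aa", "BB", "C"), 16);
        for (int i = 0; i < table.size(); i++) {
            if (!table.get(i).isEmpty()) System.out.println(i + " -> " + table.get(i));
        }
        // prints "0 -> [Aa, BB]" and "3 -> [C]"
    }
}
```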

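The Node structure: each Node stores hash, key and value, plus the crucial next field, which points to the following node in the same bucket and thereby chains the nodes into a singly linked list.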

    /**
     * Basic hash bin node, used for most entries.  (See below for
     * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
     */
    // The Node structure: next points to the following node, which forms the singly linked list
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }
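HashMap.Node is package-private, so we cannot construct one directly, but the equals and hashCode shown above follow the general Map.Entry contract. A sketch under that assumption: the entry handed out by entrySet() (in OpenJDK, the Node itself) is compared against AbstractMap.SimpleEntry, which implements the same contract. EntryContractDemo is a hypothetical name.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Checks the Map.Entry equals/hashCode rules on an entry taken from a real HashMap.
class EntryContractDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        Map.Entry<String, Integer> node = map.entrySet().iterator().next();

        // hashCode = hash(key) XOR hash(value), as in Node.hashCode()
        int expected = Objects.hashCode("a") ^ Objects.hashCode(1);
        System.out.println(node.hashCode() == expected);                        // true

        // equals compares key and value, not object identity
        System.out.println(node.equals(new AbstractMap.SimpleEntry<>("a", 1))); // true
        System.out.println(node.equals(new AbstractMap.SimpleEntry<>("a", 2))); // false
    }
}
```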

Now let's walk through putVal line by line:

First insertion: initializing the table array

    // Variable declarations, nothing to see here
    Node<K,V>[] tab; Node<K,V> p; int n, i;

    // First check whether table is null; it is null on the very first put
    if ((tab = table) == null || (n = tab.length) == 0)

        // Allocate the table with the default length 16, i.e. table = (Node<K,V>[])new Node[16]
        // If you're interested, read resize() for the detailed steps
        n = (tab = resize()).length;

Check whether the target slot in table is null; if it is, the new node becomes the first node of that slot's linked list

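Here n is 16. Whatever int the hash is, ANDing it with (16 - 1) gives a value no greater than 15. Why 15? Because the table we just allocated has length 16 (indices 0 to 15), so the index must never exceed that; if this isn't clear, look back at the map structure described above. This works because n is always a power of two: n - 1 is then a mask of all 1 bits (15 is binary 1111), so (n - 1) & hash keeps only the low bits of the hash and is always a valid, non-negative index.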
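A quick check of the masking claim, as a sketch (IndexMaskDemo is a hypothetical name): for a power-of-two length n, (n - 1) & hash always lands in [0, n - 1], even for negative hashes, and matches Math.floorMod(hash, n). Plain hash % n would not be a safe replacement, since % can return a negative result for a negative hash.

```java
// For a power-of-two table length n, (n - 1) & hash keeps only the low
// log2(n) bits of the hash, so the result is always in [0, n - 1].
class IndexMaskDemo {
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // default capacity; n - 1 = 15 = binary 1111
        int[] samples = {8, 15, 16, 17, 12345, -1, Integer.MIN_VALUE};
        for (int h : samples) {
            System.out.println(h + " -> " + index(h, n)
                    + " (floorMod: " + Math.floorMod(h, n) + ")");
        }
    }
}
```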

Example: suppose the key's hash is 0000 1000. ANDing it with 15 (0000 1111) gives 0000 1000, which is 8, so the node belongs in tab[8].

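Is tab[8] null? On the first insertion it certainly is, so we create a Node whose next is null and store it in tab[8]. You might ask: shouldn't filling start from index 0? No. A HashMap does not fill its slots in order from index 0; the slot is whichever index between 0 and 15 the hash maps to.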

    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);

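After this if statement, the table holds its first key-value pair: slot 8 contains a singly linked list with exactly one node. (We're using a set as the example, so the key is the element we added and the value is just the dummy PRESENT object, a plain new Object(); the same applies to the examples below.)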

Second insertion of a key-value pair

Now look at the else branch. Suppose the second key-value pair we insert also maps to slot 8 of table:

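    else {
        Node<K,V> e; K k;

        // From the if condition above, p is the node at table[8] (the node we inserted first).
        // This checks whether the key being inserted matches the key of p (the first node of
        // this slot's list): the hashes must be equal, and the keys must be either the same
        // reference or equal according to equals().
        // If they match, p is assigned to e; otherwise fall through to the branches below.
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;

        // You can skip this branch for now; curious readers can compare TreeNode and Node
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);

        // Now the case where the first node's key does not match
        else {

            // Walk the list looking for a matching key, stopping at the node whose next
            // is null (the last node of the list in table slot 8)
            for (int binCount = 0; ; ++binCount) {
                // Reached the end of the list without a match: append the new node there
                if ((e = p.next) == null) {

                    // Point the last node's next at the newly created node,
                    // extending the singly linked list
                    p.next = newNode(hash, key, value, null);
                    // If the list has grown long (TREEIFY_THRESHOLD is 8), try to turn it into a tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Otherwise check whether this node's key matches the new key
                // (same hash, and same reference or equals()); if so, stop with e pointing at it
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }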

To summarize the code above:

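First we check whether the first node of this list has the same key as the new node, which means an equal hash and a key that is either the same reference or equal by equals(). (The hash comparison alone is not enough, since different keys can share a hash; it is a cheap pre-check before equals.) If the key matches, that node is assigned to e. If not, we walk the list: if some node has a matching key, that node is assigned to e; if the loop reaches the end without a match, the new node is appended to the end of the list and e stays null.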

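In other words, if the list already contains a node with the same key, that existing node is pulled out; if not, the new node is linked onto the end of the list.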
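Putting the pieces together, here is a simplified, hypothetical ToyHashMap that follows the same putVal steps: allocate the table lazily on the first put, compute the index with (n - 1) & hash, walk the chain comparing hash first and then == or equals(), replace the value on a match, and append at the tail otherwise. It is a sketch for reading alongside the source, not the JDK implementation: resizing, TreeNode bins, modCount and the afterNode* hooks are left out, so the table stays at 16 slots.

```java
// Hypothetical, simplified re-implementation of the putVal logic above.
class ToyHashMap<K, V> {
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    private Node<K, V>[] table; // null until the first put, like HashMap.table
    private int size;

    // Same spreading as HashMap.hash()
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    @SuppressWarnings("unchecked")
    V put(K key, V value) {
        if (table == null) table = (Node<K, V>[]) new Node[16]; // lazy allocation
        int hash = hash(key);
        int i = (table.length - 1) & hash;
        Node<K, V> p = table[i];
        if (p == null) {                    // empty slot: new node becomes the chain head
            table[i] = new Node<>(hash, key, value, null);
            size++;
            return null;
        }
        while (true) {
            // Same key: equal hash, then identical reference or equals()
            if (p.hash == hash && (p.key == key || (key != null && key.equals(p.key)))) {
                V old = p.value;
                p.value = value;            // replace the value, keep the stored key
                return old;
            }
            if (p.next == null) {           // reached the tail: append
                p.next = new Node<>(hash, key, value, null);
                size++;
                return null;
            }
            p = p.next;
        }
    }

    int size() { return size; }
}
```

For example, "Aa" and "BB" have the same String.hashCode() (2112), so put("Aa", 1) and put("BB", 2) both land in slot 0 and are chained in the same list, while a later put("Aa", 3) finds the existing node and returns 1.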

With the node holding the same key in hand (it is null if no such node exists), replace the old value

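    // e is the node with the duplicate key found above, or null if there is none
    // If e is not null, the new value overwrites the old one and the old value is returned
    // (put passes onlyIfAbsent = false, so the value is always replaced here)
    if (e != null) { // existing mapping for key
        V oldValue = e.value;
        if (!onlyIfAbsent || oldValue == null)
            e.value = value;
        afterNodeAccess(e);
        return oldValue;
    }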

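In short, if e is not null, the key already exists: the new value replaces the old one and the old value is returned. For HashSet this is the duplicate case: put returns the old PRESENT, which is not null, so add returns false and the set keeps its original element.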
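The same branch can be observed from outside with the real classes. In this sketch (ExistingKeyDemo is a hypothetical name), new String is used only to get two distinct but equal key objects: put returns the old value, the value is overwritten, and the key object stored first is the one the map keeps, because Node.key is final and only e.value is reassigned.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Observes the "existing mapping" branch through HashMap and HashSet.
class ExistingKeyDemo {
    public static void main(String[] args) {
        String first = new String("key");
        String second = new String("key"); // equals(first), but a different object

        Map<String, Integer> map = new HashMap<>();
        System.out.println(map.put(first, 1));  // null: no previous mapping
        System.out.println(map.put(second, 2)); // 1: old value returned
        System.out.println(map.get("key"));     // 2: value replaced
        System.out.println(map.keySet().iterator().next() == first); // true: original key kept

        // HashSet.add is map.put(e, PRESENT) == null, so a duplicate gives false.
        Set<String> set = new HashSet<>();
        System.out.println(set.add(first));  // true
        System.out.println(set.add(second)); // false
        System.out.println(set.iterator().next() == first); // true
    }
}
```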

        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;

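The rest matters less: modCount is incremented, size is incremented, and if the new size exceeds threshold (capacity * load factor, 16 * 0.75 = 12 for a default map) the table is resized. When no duplicate key was found, put returns null, which is why HashSet.add returns true for a new element.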
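The resize arithmetic for a default HashMap can be sketched as follows. This models only the numbers (capacity 16, load factor 0.75, capacity and threshold both doubling on each resize); ThresholdDemo and capacityAfter are hypothetical names, and the actual rehashing of entries is not shown.

```java
// Models when "if (++size > threshold) resize();" fires for new HashMap<>().
class ThresholdDemo {
    static final float LOAD_FACTOR = 0.75f;

    // Table capacity after inserting `inserts` distinct keys into a default HashMap.
    static int capacityAfter(int inserts) {
        int capacity = 16;
        int threshold = (int) (capacity * LOAD_FACTOR); // 12
        for (int size = 1; size <= inserts; size++) {
            if (size > threshold) { // the check putVal performs after ++size
                capacity *= 2;      // resize doubles the table...
                threshold *= 2;     // ...and the threshold
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(12)); // 16
        System.out.println(capacityAfter(13)); // 32
        System.out.println(capacityAfter(25)); // 64
    }
}
```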
