HashSet Source Code Analysis & Map Iterators (Annotating the JDK series: JDK 1.6 Containers, Part 6)

Today's star is HashSet. What is a Set? It is, of course, another kind of Java container.

So what is HashSet all about, and what problem does it exist to solve?

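First, the characteristics of a Set: its elements have no guaranteed order, and no element may appear twice. Does that ring a bell? No order, because of hashing; no duplicates, because the keys of a HashMap cannot repeat. You guessed right. HashSet is implemented on top of HashMap's keys, and nearly every method in HashSet simply calls a HashMap method. Building on HashMap gives two selling points: 1. no duplicates, 2. fast lookup (contains).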
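To make the two selling points concrete, here is a minimal sketch. The class name HashSetBasics and the helper build are made up for illustration; only java.util.HashSet itself comes from the JDK.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of the JDK.
class HashSetBasics {
    // Adds every word to a fresh HashSet and returns it.
    static Set<String> build(String... words) {
        Set<String> set = new HashSet<String>();
        for (String w : words) {
            set.add(w); // a repeated word leaves the set unchanged
        }
        return set;
    }

    public static void main(String[] args) {
        Set<String> set = build("a", "b", "a", "c", "b");
        System.out.println(set.size());        // 3
        System.out.println(set.contains("a")); // true
        System.out.println(set.contains("z")); // false
    }
}
```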

Let's take a look:

1. Definition

public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable

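HashSet extends the abstract class AbstractSet and implements the Set, Cloneable and Serializable interfaces. AbstractSet is an abstract class that wraps some basic set operations. Next, the definition of the Set interface: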

public interface Set<E> extends Collection<E> {
    // Query Operations
    int size();
    boolean isEmpty();
    boolean contains(Object o);
    Iterator<E> iterator();
    Object[] toArray();
    <T> T[] toArray(T[] a);
    // Modification Operations
    boolean add(E e);
    boolean remove(Object o);
    // Bulk Operations
    boolean containsAll(Collection<?> c);
    boolean addAll(Collection<? extends E> c);
    boolean retainAll(Collection<?> c);
    boolean removeAll(Collection<?> c);
    void clear();
    // Comparison and hashing
    boolean equals(Object o);
    int hashCode();
}

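Notice anything? Like java.util.List, the Set interface extends Collection. Unlike List, however, Set has no index-based methods such as get. So how do you read the values out? Remember Iterator? Right, you use an iterator. (If that is unclear, go back to the Iterator article.)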
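Since a Set has no get(index), reading it means walking it. The sketch below shows the explicit Iterator form and in-place removal through Iterator.remove(); the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical demo class, not part of the JDK.
class SetIterationDemo {
    // Sums the elements by walking the set with an explicit Iterator.
    static int sum(Set<Integer> set) {
        int total = 0;
        Iterator<Integer> it = set.iterator();
        while (it.hasNext()) {
            total += it.next();
        }
        return total;
    }

    // Removes the even numbers in place through Iterator.remove().
    static void removeEvens(Set<Integer> set) {
        for (Iterator<Integer> it = set.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        Set<Integer> set = new HashSet<Integer>(Arrays.asList(1, 2, 3, 4));
        System.out.println(sum(set)); // 10
        removeEvens(set);
        System.out.println(sum(set)); // 4
    }
}
```

The enhanced for loop (for (Integer i : set)) compiles down to the same iterator() call, so it is the usual choice when no removal is needed.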

2. Underlying storage

    // HashSet stores its elements in a backing HashMap
    private transient HashMap<E,Object> map;

    // Dummy value to associate with an Object in the backing Map
    // Set only uses the HashMap's keys, so one static constant Object serves as the value of every entry
    private static final Object PRESENT = new Object();

That confirms what we said earlier: HashSet keeps its data in a HashMap, and what it mainly uses is the HashMap's keys.

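Looking at private static final Object PRESENT = new Object(); you may have a question. A static constant Object stands in for every value in the map, yet the value carries no meaning, so why not simply use null, for example private final Object PRESENT = null;? The PRESENT reference is a static field stored with the class, while the object created by new Object() lives on the heap (an empty Object takes 8 bytes on a 32-bit HotSpot JVM and typically 16 on a 64-bit one); null, as we know, allocates nothing on the heap. So why not null? The concrete reason shows up in the add and remove methods in section 4: add returns map.put(e, PRESENT) == null, and remove returns map.remove(o) == PRESENT. HashMap.put returns the previous value for the key, or null if there was none, so with a null dummy value put would return null whether or not the element was already there, and add could no longer report a duplicate. Then there is java.lang.NullPointerException, oh my god, the nightmare every Java programmer runs into sooner or later. It looks easy to fix, but sometimes it is not; Java claims to have no pointers, yet NullPointerException turns up everywhere. A non-null sentinel keeps null out of the picture, and a few wasted bytes are a small price for never writing if (xxx == null) { … } else {….} around it. Lovely.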
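The return-value argument can be checked directly against HashMap.put. This is a small sketch, not JDK code: DummyValueDemo and addWith are hypothetical names, and addWith only mirrors the shape of HashSet.add.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not part of the JDK.
class DummyValueDemo {
    static final Object PRESENT = new Object();

    // Same shape as HashSet.add: "true" is meant to signal a newly added key.
    static boolean addWith(Map<String, Object> map, String key, Object dummy) {
        return map.put(key, dummy) == null;
    }

    public static void main(String[] args) {
        Map<String, Object> good = new HashMap<String, Object>();
        System.out.println(addWith(good, "x", PRESENT)); // true: key was absent
        System.out.println(addWith(good, "x", PRESENT)); // false: duplicate detected

        Map<String, Object> bad = new HashMap<String, Object>();
        System.out.println(addWith(bad, "x", null)); // true
        System.out.println(addWith(bad, "x", null)); // true again: duplicate goes unnoticed
    }
}
```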

3. Constructors

    /**
     * Builds a HashSet whose map uses HashMap's default capacity (16) and default load factor (0.75)
     */
    public HashSet() {
        map = new HashMap<E,Object>();
    }

    /**
     * Builds a HashSet from the given Collection; the argument need not be a Set, any container that implements Collection will do
     */
    public HashSet(Collection<? extends E> c) {
        map = new HashMap<E,Object>(Math.max((int) (c.size()/.75f) + 1, 16));
        // Use the Iterator supplied by the Collection to add the elements of c to the HashSet one by one
        addAll(c);
    }

    /**
     * Builds a HashSet whose map uses the given initial capacity and load factor
     */
    public HashSet(int initialCapacity, float loadFactor) {
        map = new HashMap<E,Object>(initialCapacity, loadFactor);
    }

    /**
     * Builds a HashSet whose map uses the given initial capacity and the default load factor (0.75)
     */
    public HashSet(int initialCapacity) {
        map = new HashMap<E,Object>(initialCapacity);
    }

    /**
     * A constructor that is not public (package-private, no access modifier); it builds a LinkedHashMap underneath.
     * dummy is only a marker parameter with no meaning of its own
     */
    HashSet(int initialCapacity, float loadFactor, boolean dummy) {
        map = new LinkedHashMap<E,Object>(initialCapacity, loadFactor);
    }

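The constructors make it plain that HashSet sits on top of a HashMap; once you understand HashMap there is not much to add. Only the last constructor is a little different: it builds a LinkedHashMap. It is not public and exists for LinkedHashSet to call. The third parameter, dummy, has no meaning; it is there only to tell this constructor apart from the others.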
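Two details of the constructors can be tried out in isolation: the sizing expression used by HashSet(Collection), and the effect of the LinkedHashMap-backed constructor as seen through LinkedHashSet. The class ConstructorDemo and its method names are hypothetical; the sizing expression is copied from the listing above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, not part of the JDK.
class ConstructorDemo {
    // Same expression as in HashSet(Collection): room for n elements
    // at load factor 0.75, with a floor of 16.
    static int initialCapacityFor(int n) {
        return Math.max((int) (n / .75f) + 1, 16);
    }

    // LinkedHashSet is backed by a LinkedHashMap, so iteration follows insertion order.
    static List<String> insertionOrder(String... items) {
        Set<String> set = new LinkedHashSet<String>(Arrays.asList(items));
        return new ArrayList<String>(set);
    }

    public static void main(String[] args) {
        System.out.println(initialCapacityFor(3));   // 16
        System.out.println(initialCapacityFor(12));  // 17
        System.out.println(initialCapacityFor(100)); // 134
        System.out.println(insertionOrder("c", "a", "b", "a")); // [c, a, b]
    }
}
```

The value passed in is only a request: HashMap rounds the capacity up to a power of two internally.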

4. Adding and removing

    /**
     * add is implemented with HashMap's put method
     */
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    /**
     * remove is implemented with HashMap's remove method
     */
    public boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }

    /**
     * Adds a whole collection to the HashSet; this method lives in AbstractCollection
     */
    public boolean addAll(Collection<? extends E> c) {
        boolean modified = false;
        // Get the Iterator of collection c
        Iterator<? extends E> e = c.iterator();
        // Walk the iterator
        while (e.hasNext()) {
            // Add each element of c to the HashSet
            if (add(e.next()))
                modified = true;
        }
        return modified;
    }

    /**
     * Removes every element that appears in collection c; this method lives in AbstractSet
     */
    public boolean removeAll(Collection<?> c) {
        boolean modified = false;

        // Compare the size of this HashSet with the size of c, to keep the number of iterations down
        if (size() > c.size()) {
            // This HashSet is larger: walk c and remove its elements from the set one by one
            for (Iterator<?> i = c.iterator(); i.hasNext(); )
                modified |= remove(i.next());
        } else {
            // c is at least as large: walk this HashSet and remove every element that c contains
            for (Iterator<?> i = iterator(); i.hasNext(); ) {
                if (c.contains(i.next())) {
                    i.remove();
                    modified = true;
                }
            }
        }
        return modified;
    }

5. Membership checks

    /**
     * contains is implemented with HashMap's containsKey method
     */
    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    /**
     * Checks whether every element of the given collection is present; this method lives in AbstractCollection
     */
    public boolean containsAll(Collection<?> c) {
        // Get the Iterator of collection c
        Iterator<?> e = c.iterator();
        // Walk the iterator; as soon as one element of c is missing from this HashSet, return false
        while (e.hasNext())
            if (!contains(e.next()))
                return false;
        return true;
    }

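HashMap is built on a hash table, and the most important property of a hash-based container is fast storage and retrieval. Because HashSet.contains goes through HashMap.containsKey, a lookup takes expected constant time as long as the hash function spreads the elements well. To me this method is one of HashSet's core selling points.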
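The bulk operations from sections 4 and 5 can be exercised as follows. This is a usage sketch with hypothetical names (BulkOpsDemo, hasAll); the return values follow from the listings above, where removeAll reports whether the set actually changed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of the JDK.
class BulkOpsDemo {
    // True only if every wanted element is in the set.
    static boolean hasAll(Set<String> set, String... wanted) {
        return set.containsAll(Arrays.asList(wanted));
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<String>(Arrays.asList("a", "b", "c", "d"));
        System.out.println(hasAll(set, "a", "c")); // true
        System.out.println(hasAll(set, "a", "z")); // false: "z" is missing

        // removeAll returns true because "a" was actually removed
        System.out.println(set.removeAll(Arrays.asList("a", "z"))); // true
        System.out.println(set.size());                             // 3
        // nothing to remove this time, so the set is unchanged
        System.out.println(set.removeAll(Arrays.asList("x")));      // false
    }
}
```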

6. Size checks

/**
     * Returns the number of elements in this set (its cardinality).
     *
     * @return the number of elements in this set (its cardinality)
     */
    public int size() {
        return map.size();
    }
 
    /**
     * Returns <tt>true</tt> if this set contains no elements.
     *
     * @return <tt> true</tt> if this set contains no elements
     */
    public boolean isEmpty() {
        return map.isEmpty();
    }

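All of the code above is simple because it is almost entirely built on HashMap. Once you understand HashMap, HashSet is a piece of cake.

So that is the end of HashSet... wait, no, one thing is left: the iterator. In the HashMap and LinkedHashMap articles we said that both of their iterator implementations rely on the Set interface, so let's start with HashSet's iterator.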

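7. Iterators

7.1 The HashMap iterator

In "The Iterator Design Pattern" we went through the roles involved in implementing an Iterator and wrote a simple one ourselves. We also saw that Collection extends the Iterable interface and therefore requires its implementations to provide an iterator() method that returns an Iterator. Since HashSet is a grandchild of Collection (Collection, then Set, then HashSet), it should also provide an iterator() method returning an Iterator, right? Let's go and look.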

/**
     * Returns an iterator over the elements in this set.  The elements
     * are returned in no particular order.
     *
     * @return an Iterator over the elements in this set
     * @see ConcurrentModificationException
     */
    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }

Whoa, what is going on? HashSet's iterator() method is implemented through HashMap as well. Let's see what HashMap's keySet() method is up to.

public Set<K> keySet() {
        Set<K> ks = keySet;
        return (ks != null ? ks : (keySet = new KeySet()));
}

So HashMap's keySet() returns a Set, and the concrete implementation is something called KeySet. And what on earth is KeySet?

private final class KeySet extends AbstractSet<K> {
        public Iterator<K> iterator() {
            return newKeyIterator();
        }
        public int size() {
            return size;
        }
        public boolean contains(Object o) {
            return containsKey(o);
        }
        public boolean remove(Object o) {
            return HashMap.this.removeEntryForKey(o) != null;
        }
        public void clear() {
            HashMap.this.clear();
        }
}

Ah, KeySet is an inner class of HashMap that extends AbstractSet. And KeySet's iterator() method returns whatever newKeyIterator() returns. Round and round we go; my head is spinning.
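Because KeySet's size, contains, remove and clear all forward to the enclosing HashMap, the returned set is a live view rather than a copy. A minimal sketch of that behaviour, using hypothetical names (KeySetViewDemo, sizeAfterRemovingViaKeySet):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not part of the JDK.
class KeySetViewDemo {
    // Removes a key through the keySet() view and reports the map's size afterwards.
    static int sizeAfterRemovingViaKeySet(Map<String, Integer> map, String key) {
        map.keySet().remove(key); // forwarded to the backing map
        return map.size();
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);
        System.out.println(sizeAfterRemovingViaKeySet(map, "a")); // 1
        System.out.println(map.containsKey("a"));                 // false
    }
}
```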

Iterator<K> newKeyIterator()   {
        return new KeyIterator();
}

And newKeyIterator() in turn returns a new KeyIterator object. What are you even doing?

private final class KeyIterator extends HashIterator<K> {
        public K next() {
            return nextEntry().getKey();
        }
}

Fine, no comment. Let's keep going.

private abstract class HashIterator<E> implements Iterator<E> {
        // The next node to be returned
        Entry<K,V> next;        // next entry to return
        int expectedModCount;   // For fail-fast
        int index;              // current slot
        // The node returned most recently
        Entry<K,V> current;     // current entry

        HashIterator() {
            expectedModCount = modCount;
            if (size > 0) { // advance to first entry
                Entry[] t = table;
                // Initialise next: point it at the first non-null slot of HashMap's underlying array
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
        }

        public final boolean hasNext() {
            return next != null;
        }

        final Entry<K,V> nextEntry() {
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            // Take one node from a chain in HashMap's underlying array
            Entry<K,V> e = next;
            if (e == null)
                throw new NoSuchElementException();

            // Point next at the following node in the chain and check whether it is null
            if ((next = e.next) == null) {
                Entry[] t = table;
                // If it is null, scan forward through the array until a non-null slot is found
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
            current = e;
            // Return the current node
            return e;
        }

        public void remove() {
            if (current == null)
                throw new IllegalStateException();
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            Object k = current.key;
            current = null;
            HashMap.this.removeEntryForKey(k);
            expectedModCount = modCount;
        }

}

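At last we arrive at HashIterator (another inner class of HashMap). Exhausting... The method to focus on is nextEntry(). The idea: first take the first non-null node in HashMap's underlying array; each call to the iterator's next() then follows that node's next link; when the chain runs out and next is null, move on to the next non-null slot in the array and carry on. In other words, the loop starts at index 0 of the array and walks the entire hash table.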
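The traversal just described can be sketched on a toy table. MiniTable below is a hypothetical, heavily simplified stand-in (fixed 8 buckets, no resizing, no removal); it is not the JDK code, but its iterator follows the same two ideas: scan forward to the next non-null bucket when a chain ends, and compare modCount with the value captured at creation to fail fast.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical simplified hash table, for illustration only.
class MiniTable implements Iterable<String> {
    private static final class Node {
        final String key;
        final Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    private final Node[] table = new Node[8];
    private int modCount; // bumped on every structural change

    boolean add(String key) {
        int i = (key.hashCode() & 0x7fffffff) % table.length;
        for (Node n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) return false; // already present
        }
        table[i] = new Node(key, table[i]); // prepend to the bucket's chain
        modCount++;
        return true;
    }

    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private Node next;  // next node to return
            private int index;  // next bucket to inspect
            private final int expectedModCount = modCount;

            { advance(); } // position on the first non-empty bucket

            // Scans forward until a non-null bucket is found or the array ends.
            private void advance() {
                while (next == null && index < table.length) {
                    next = table[index++];
                }
            }

            public boolean hasNext() { return next != null; }

            public String next() {
                if (modCount != expectedModCount) throw new ConcurrentModificationException();
                Node e = next;
                if (e == null) throw new NoSuchElementException();
                next = e.next; // follow the chain first
                advance();     // chain exhausted: move to the next bucket
                return e.key;
            }

            public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    // Collects the keys in iteration (bucket) order.
    List<String> toList() {
        List<String> out = new ArrayList<String>();
        for (String k : this) out.add(k);
        return out;
    }

    public static void main(String[] args) {
        MiniTable t = new MiniTable();
        t.add("a"); t.add("b"); t.add("c");
        System.out.println(t.toList()); // keys in bucket order, not insertion order
    }
}
```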

If you ask me why HashMap makes something as straightforward as an Iterator this convoluted, all I can say is: don't ask me, I don't know either...

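A map is of course a container of key-value pairs, so besides iterating over keys with keySet() there is also iteration over values with values() (why does that not return a Set? Because values may repeat) and iteration over whole key-value entries with entrySet(). They all work on the same principle as the code above, so I won't go through them here.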
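For reference, the three views look like this from the caller's side. MapViewsDemo and its helper names are hypothetical; keySet(), values() and entrySet() are the standard Map methods.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

// Hypothetical demo class, not part of the JDK.
class MapViewsDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> m = new HashMap<String, Integer>();
        m.put("a", 1);
        m.put("b", 1); // same value as "a": values may repeat
        m.put("c", 2);
        return m;
    }

    // Sums the values by iterating over whole entries.
    static int sumValues(Map<String, Integer> m) {
        int total = 0;
        for (Map.Entry<String, Integer> e : m.entrySet()) {
            total += e.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = sample();
        System.out.println(m.keySet().size());                      // 3
        System.out.println(m.values().size());                      // 3, duplicates kept
        System.out.println(new HashSet<Integer>(m.values()).size()); // 2 distinct values
        System.out.println(sumValues(m));                           // 4
    }
}
```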

7.2 The LinkedHashMap iterator

Having seen HashMap's Iterator implementation, let's look at how LinkedHashMap does it (no hunting from the top this time; straight to the core code).

private abstract class LinkedHashIterator<T> implements Iterator<T> {
        // header.after is the first node of LinkedHashMap's doubly linked list, because the header node itself holds no data
        Entry<K,V> nextEntry    = header.after;
        // The node returned by the most recent call
        Entry<K,V> lastReturned = null;

        /**
         * The modCount value that the iterator believes that the backing
         * List should have.  If this expectation is violated, the iterator
         * has detected concurrent modification.
         */
        int expectedModCount = modCount;

        public boolean hasNext() {
            return nextEntry != header;
        }

        public void remove() {
            if (lastReturned == null)
                throw new IllegalStateException();
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();

            LinkedHashMap.this.remove(lastReturned.key);
            lastReturned = null;
            expectedModCount = modCount;
        }

        Entry<K,V> nextEntry() {
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            if (nextEntry == header)
                throw new NoSuchElementException();

            // Record the node about to be returned in lastReturned,
            // and keep it in the temporary e (nextEntry is about to move on)
            Entry<K,V> e = lastReturned = nextEntry;
            // Point nextEntry at the following node
            nextEntry = e.after;
            // Return the current node
            return e;
        }
}

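As you can see, the LinkedHashMap iterator no longer walks the whole hash table. It only walks the circular doubly linked list that LinkedHashMap maintains itself, so it never has to check whether an array slot is empty. That is why iteration over a LinkedHashMap is usually more efficient than over a HashMap: its cost grows with the number of entries, whereas HashMap's also grows with the capacity of the table. And since I said "usually", when is it not the case? When the HashMap holds few elements, they are evenly distributed, and there are no empty slots.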
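A visible consequence of walking the linked list instead of the table is that iteration follows insertion order, however large or sparse the table is. A small sketch under that assumption; LinkedOrderDemo and keysInIterationOrder are hypothetical names, and the capacity of 1024 is chosen only to make the table mostly empty.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the JDK.
class LinkedOrderDemo {
    // Copies the keys in the order the map's iterator yields them.
    static List<String> keysInIterationOrder(Map<String, Integer> map) {
        return new ArrayList<String>(map.keySet());
    }

    public static void main(String[] args) {
        // A large, mostly empty table: the iterator still visits only the three entries.
        Map<String, Integer> linked = new LinkedHashMap<String, Integer>(1024, 0.75f);
        linked.put("zebra", 1);
        linked.put("apple", 2);
        linked.put("mango", 3);
        System.out.println(keysInIterationOrder(linked)); // [zebra, apple, mango]
    }
}
```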

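The Map iterator source is not the easiest to read (mostly because of all the indirection and inner classes; the core code is hard to find), but once you locate the core code the logic is easy to follow, provided you already understand the underlying storage structures of HashMap and LinkedHashMap.

Hmm, this article was supposed to be about HashSet, not Map. Does that count as going off topic...

That's it for HashSet!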
