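We all know that HashSet and HashMap are two important classes in the Java collections framework. Why do they matter so much? A Set stores a collection with no duplicate elements, which is very convenient when programming: we never have to worry about the same object appearing in it twice. It has a drawback too: a Set has no index-based access, so the only way to reach its elements is to iterate over it. Then there is Map, which stores data as key-value pairs, much like the fields of an object and their values. But how many of us know how they actually work? Today we will read the source code to analyze Set and Map.
Since we are going to use a set, let's start from initialization (Set<Object> set = new HashSet<>()), followed by add(new Object());
// HashSet keeps a HashMap internally as its backing store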
private transient HashMap<E,Object> map;
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
/**
* Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
* default initial capacity (16) and load factor (0.75).
*/
public HashSet() {
map = new HashMap<>();
}
/**
* Adds the specified element to this set if it is not already present.
* More formally, adds the specified element <tt>e</tt> to this set if
* this set contains no element <tt>e2</tt> such that
* <tt>(e==null ? e2==null : e.equals(e2))</tt>.
* If this set already contains the element, the call leaves the set
* unchanged and returns <tt>false</tt>.
*
* @param e element to be added to this set
* @return <tt>true</tt> if this set did not already contain the specified
* element
*/
// add() simply delegates to the backing map's put() (e is the key, PRESENT = new Object() is the value)
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
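As we can see, constructing a HashSet also constructs a HashMap, and add() calls put() on that map. Because a map stores key-value pairs whose keys are unique, HashSet relies on exactly that property to guarantee that its elements are unique. So the thing to focus on is how HashMap's put method guarantees key uniqueness.
Below are the fields involved when the map is constructed. Apart from loadFactor, which is assigned, everything keeps its default value: table is null, entrySet is null, and so on.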
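To make the delegation concrete, here is a minimal sketch. The class name `SetAddDemo`, the helper `addLikeHashSet`, and the `DUMMY` field are hypothetical names for illustration (`DUMMY` stands in for HashSet's private `PRESENT`); only `HashSet.add` and `HashMap.put` are real JDK calls.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SetAddDemo {
    // Stand-in for HashSet's private PRESENT field (hypothetical name for this sketch).
    private static final Object DUMMY = new Object();

    // Same shape as HashSet.add: true only when put() reports no previous mapping.
    static <E> boolean addLikeHashSet(Map<E, Object> backing, E e) {
        return backing.put(e, DUMMY) == null;
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        System.out.println(set.add("a")); // true: "a" was absent
        System.out.println(set.add("a")); // false: "a" is already present
        System.out.println(set.size());   // 1

        Map<String, Object> backing = new HashMap<>();
        System.out.println(addLikeHashSet(backing, "a")); // true
        System.out.println(addLikeHashSet(backing, "a")); // false
    }
}
```

The second call returns false because put() hands back the previously stored dummy value, which is not null.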
/**
* The table, initialized on first use, and resized as
* necessary. When allocated, length is always a power of two.
* (We also tolerate length zero in some operations to allow
* bootstrapping mechanics that are currently not needed.)
*/
// the Node array maintained by the map (the buckets)
transient Node<K,V>[] table;
/**
* Holds cached entrySet(). Note that AbstractMap fields are used
* for keySet() and values().
*/
// cached Set view of the map's entries (key-value pairs), as returned by entrySet()
transient Set<Map.Entry<K,V>> entrySet;
/**
* The number of key-value mappings contained in this map.
*/
// number of key-value mappings in the map
transient int size;
/**
* The number of times this HashMap has been structurally modified
* Structural modifications are those that change the number of mappings in
* the HashMap or otherwise modify its internal structure (e.g.,
* rehash). This field is used to make iterators on Collection-views of
* the HashMap fail-fast. (See ConcurrentModificationException).
*/
// number of structural modifications made to this map
transient int modCount;
/**
* The next size value at which to resize (capacity * load factor).
*
* @serial
*/
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;
/**
* The load factor for the hash table.
*
* @serial
*/
final float loadFactor;
/**
* Constructs an empty <tt>HashMap</tt> with the default initial capacity
* (16) and the default load factor (0.75).
*/
// the constructor only sets loadFactor; everything else keeps its default value
public HashMap() {
this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
After initialization comes put. We follow the call from set.add() and focus on how the uniqueness of keys is guaranteed.
/**
* Adds the specified element to this set if it is not already present.
* More formally, adds the specified element <tt>e</tt> to this set if
* this set contains no element <tt>e2</tt> such that
* <tt>(e==null ? e2==null : e.equals(e2))</tt>.
* If this set already contains the element, the call leaves the set
* unchanged and returns <tt>false</tt>.
*
* @param e element to be added to this set
* @return <tt>true</tt> if this set did not already contain the specified
* element
*/
// HashSet.add: it clearly delegates to map.put
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
// ===== above: code from HashSet; below: methods from HashMap (pasted together here) =====
/**
* Associates the specified value with the specified key in this map.
* If the map previously contained a mapping for the key, the old
* value is replaced.
*
* @param key key with which the specified value is to be associated
* @param value value to be associated with the specified key
* @return the previous value associated with <tt>key</tt>, or
* <tt>null</tt> if there was no mapping for <tt>key</tt>.
* (A <tt>null</tt> return can also indicate that the map
* previously associated <tt>null</tt> with <tt>key</tt>.)
*/
// the method executed when a key-value pair is put into the map
public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}
/**
* Computes key.hashCode() and spreads (XORs) higher bits of hash
* to lower. Because the table uses power-of-two masking, sets of
* hashes that vary only in bits above the current mask will
* always collide. (Among known examples are sets of Float keys
* holding consecutive whole numbers in small tables.) So we
* apply a transform that spreads the impact of higher bits
* downward. There is a tradeoff between speed, utility, and
* quality of bit-spreading. Because many common sets of hashes
* are already reasonably distributed (so don't benefit from
* spreading), and because we use trees to handle large sets of
* collisions in bins, we just XOR some shifted bits in the
* cheapest possible way to reduce systematic lossage, as well as
* to incorporate impact of the highest bits that would otherwise
* never be used in index calculations because of table bounds.
*/
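// calls key.hashCode() and XORs the result with itself unsigned-right-shifted by 16 bits
/**
* I once badly wanted to know what this value actually was, but knowing it turns out to be of little use.
* All we need to know is that calling this method on the same key always returns the same int
* (as long as the key's hashCode() is stable), and that equal keys produce equal values.
*/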
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
/**
* Implements Map.put and related methods
*
* @param hash hash for key
* @param key the key
* @param value the value to put
* @param onlyIfAbsent if true, don't change existing value
* @param evict if false, the table is in creation mode.
* @return previous value, or null if none
*/
// this is the key method to study
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
else {
Node<K,V> e; K k;
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
if (e != null) { // existing mapping for key
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
}
++modCount;
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}
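As a small illustration of why hash() folds the high 16 bits into the low ones, the sketch below compares bucket indices with and without that step. The class name `HashSpreadDemo` and the method name `spread` are hypothetical; the expression inside `spread` is copied from the hash() listing above, and the table length of 16 is the default mentioned in the constructor's javadoc.

```java
public class HashSpreadDemo {
    // Same expression as HashMap.hash in the listing above.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;          // table length (always a power of two)
        Integer a = 0x10000; // Integer.hashCode() is the int value itself
        Integer b = 0x20000; // differs from a only in bits above the mask

        // Without spreading, both keys fall into bucket 0.
        System.out.println((n - 1) & a.hashCode()); // 0
        System.out.println((n - 1) & b.hashCode()); // 0

        // With spreading, the high bits influence the index.
        System.out.println((n - 1) & spread(a)); // 1
        System.out.println((n - 1) & spread(b)); // 2
    }
}
```

This is only a demonstration of the bit arithmetic; real keys with well-distributed hash codes gain little from the extra XOR, as the JDK comment itself notes.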
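Before analyzing putVal, I looked online for a diagram of the map's structure, which makes things clearer:
As we can see, the map is built from an array combined with singly linked lists. The array is our table, and its element type is Node. (Bins with many collisions can also be converted into trees of TreeNode, which we skip here.)
The Node structure: it stores a hash, a key, a value, and, crucially, a next reference pointing to the following node, which is what forms the singly linked list.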
/**
* Basic hash bin node, used for most entries. (See below for
* TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
*/
// the structure of Node: next points to the following node, which forms the singly linked list
static class Node<K,V> implements Map.Entry<K,V> {
final int hash;
final K key;
V value;
Node<K,V> next;
Node(int hash, K key, V value, Node<K,V> next) {
this.hash = hash;
this.key = key;
this.value = value;
this.next = next;
}
public final K getKey() { return key; }
public final V getValue() { return value; }
public final String toString() { return key + "=" + value; }
public final int hashCode() {
return Objects.hashCode(key) ^ Objects.hashCode(value);
}
public final V setValue(V newValue) {
V oldValue = value;
value = newValue;
return oldValue;
}
public final boolean equals(Object o) {
if (o == this)
return true;
if (o instanceof Map.Entry) {
Map.Entry<?,?> e = (Map.Entry<?,?>)o;
if (Objects.equals(key, e.getKey()) &&
Objects.equals(value, e.getValue()))
return true;
}
return false;
}
}
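Now let's go through putVal statement by statement to see what it does:
Initializing the table array on the first insertion
// variable declarations, nothing much to see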
Node<K,V>[] tab; Node<K,V> p; int n, i;
// first check whether table is null (or has length 0); on the first insertion of a key-value pair, table is null
if ((tab = table) == null || (n = tab.length) == 0)
// initialize the table; the default length is 16, i.e. table = (Node<K,V>[]) new Node[16]
// interested readers can look at the detailed steps in resize()
n = (tab = resize()).length;
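resize() itself is not walked through here. As a rough sketch of the bookkeeping around it, assuming the default capacity (16) and load factor (0.75) quoted in the javadoc above, the following simulates only the `++size > threshold` check from putVal for distinct keys. The class and method names are hypothetical, and the sketch ignores the maximum-capacity limit and the resizes that treeifyBin can trigger.

```java
public class ThresholdSketch {
    static final int DEFAULT_INITIAL_CAPACITY = 16; // default from the HashMap javadoc
    static final float DEFAULT_LOAD_FACTOR = 0.75f; // default from the HashMap javadoc

    // threshold = capacity * load factor, truncated to an int
    static int thresholdFor(int capacity) {
        return (int) (capacity * DEFAULT_LOAD_FACTOR);
    }

    // Table length after inserting the given number of distinct keys,
    // following only the "++size > threshold" rule in putVal.
    static int capacityAfter(int inserts) {
        if (inserts <= 0) return 0; // table is still unallocated before the first put
        int capacity = DEFAULT_INITIAL_CAPACITY;
        int threshold = thresholdFor(capacity);
        for (int size = 1; size <= inserts; size++) {
            if (size > threshold) {
                capacity <<= 1; // the table doubles
                threshold = thresholdFor(capacity);
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(thresholdFor(16));  // 12
        System.out.println(capacityAfter(12)); // 16
        System.out.println(capacityAfter(13)); // 32
        System.out.println(capacityAfter(25)); // 64
    }
}
```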
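Check whether the corresponding table slot is null; if so, insert the node as the first node of that slot's linked list
Here n is 16. Whatever the int hash is, hash & (16 - 1) keeps only its lowest four bits, so the result is always between 0 and 15. Why 15? Because the table we just initialized has length 16 (indices 0 to 15), so the index must never go beyond that range. If this is unclear, look back at the structure diagram of the map.
Example: suppose the hash of the key we insert is 0000 1000. ANDing it with 15 (0000 1111) gives 0000 1000, which is 8.
Is tab[8] == null? On the very first insertion it certainly is, so we create a Node whose next is null and assign it to tab[8]. Someone may ask: shouldn't we start from index 0? The map does not require filling from index 0; any slot from 0 to 15 may be used, whichever one the hash selects.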
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
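After this if statement runs, our table holds one key-value pair: the slot at index 8 contains a singly linked list with a single node. (Since we are using Set as the example, the key is our element and the value is just the shared PRESENT object; the examples below use the same kind of key-value pair, so I won't explain it again.)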
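The index calculation can be checked with a few lines of Java. This is a minimal sketch: `BucketIndexDemo` and `indexFor` are hypothetical names, and the only thing taken from HashMap is the expression `(n - 1) & hash`.

```java
public class BucketIndexDemo {
    // Same expression as in putVal: valid because n is always a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        System.out.println(indexFor(0b0000_1000, 16)); // 8
        System.out.println(indexFor(0b0001_1000, 16)); // 8: only the low four bits count
        System.out.println(indexFor(-1, 16));          // 15: negative hashes stay in range

        // For a power-of-two n the mask agrees with a non-negative modulo.
        for (int h = -100; h <= 100; h++) {
            if (indexFor(h, 16) != Math.floorMod(h, 16)) {
                throw new IllegalStateException("mismatch at " + h);
            }
        }
    }
}
```

The second call shows how two different hashes can select the same slot, which is exactly the situation the else branch of putVal has to handle.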
Second insertion of a key-value pair
Now let's look at the else branch. Suppose we insert a second key-value pair and it also lands at index 8 of the table:
else {
Node<K,V> e; K k;
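// from the if condition we know p is the node at index 8 of the table (the Node from our first insertion)
// this condition checks whether the key being inserted is the same key as p's (p is the first node of the list in this slot):
// the hashes must match, and the keys must be the same reference or equal according to equals()
// if so, p is assigned to e; otherwise we fall through to the else branches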
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;
// this branch can be skipped here; interested readers can study how TreeNode differs from Node
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
// now the case where the key is not the same as the first node's key
else {
// the loop walks the list in this slot (index 8): it stops early at a node with an equal key, otherwise it reaches the last node (the one whose next is null)
for (int binCount = 0; ; ++binCount) {
// reached the end of the list without a match: link the new node at the tail
if ((e = p.next) == null) {
// p is the node whose next is null; point its next at the newly created node,
// extending the singly linked list
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
// otherwise check whether this node's key is the same as the key being inserted (same hash, and == or equals())
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
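To summarize the code above:
First we check whether the first node of the list holds the same key as the one being inserted, meaning the hashes are equal and the keys are either the same reference or equal according to equals(). If so, that node is assigned to e. If not, we walk the list: if some node holds the same key, that node is assigned to e; if we reach the end without a match, the new node is linked at the tail of the list and e ends up null.
In other words: if the list already contains a node with the same key, we pick up that existing node; otherwise the new node is appended to the end of the list.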
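The whole lookup-or-append logic can be condensed into a toy map. This is a simplified sketch, not the JDK implementation: `TinyChainedMap` is a hypothetical class with a fixed table of 16 slots, no resizing, and no tree bins, written only to show the chain walk described above.

```java
import java.util.Objects;

public class TinyChainedMap<K, V> {
    // Cut-down counterpart of HashMap.Node.
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value) {
            this.hash = hash;
            this.key = key;
            this.value = value;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16]; // fixed size in this sketch
    private int size;

    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Returns the previous value for key, or null if the key was absent.
    public V put(K key, V value) {
        int hash = hash(key);
        int i = (table.length - 1) & hash;
        Node<K, V> p = table[i];
        if (p == null) {                    // empty slot: new node becomes the head
            table[i] = new Node<>(hash, key, value);
            size++;
            return null;
        }
        while (true) {
            // Same test as putVal: equal hash, then == or equals().
            if (p.hash == hash && Objects.equals(key, p.key)) {
                V old = p.value;            // existing key: overwrite and report old value
                p.value = value;
                return old;
            }
            if (p.next == null) {           // end of chain: append at the tail
                p.next = new Node<>(hash, key, value);
                size++;
                return null;
            }
            p = p.next;
        }
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        TinyChainedMap<Integer, String> m = new TinyChainedMap<>();
        System.out.println(m.put(8, "first"));   // null: slot 8 was empty
        System.out.println(m.put(24, "second")); // null: also slot 8, appended to the chain
        System.out.println(m.put(8, "third"));   // first: key 8 already existed
        System.out.println(m.size());            // 2
    }
}
```

Keys 8 and 24 share slot 8 because only the low four bits of the hash select the slot, so the second put exercises the chain walk and the third exercises the duplicate-key path.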
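Take the node whose key matched (it may be null if no node with the same key exists) and replace its old value
// e is the node with the duplicate key found above, or null if there was none
// if e is not null, the new value overwrites the old one (unless onlyIfAbsent is true and the old value is non-null), and the old value is returned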
if (e != null) { // existing mapping for key
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
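That is, if e is not null, a mapping with the same key already exists: the new value replaces the old one and the old value is returned. For HashSet the old value is PRESENT, which is not null, so add() returns false.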
++modCount;
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
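The rest is bookkeeping: modCount and size are incremented, and the table is resized once size exceeds threshold. If there was no duplicate key, null is returned, which is why HashSet.add() returns true for a new element.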
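Since putVal decides "same key" by hash plus == or equals(), the uniqueness of a HashSet depends on how the element class implements hashCode() and equals(). The sketch below illustrates this with two hypothetical key classes: `SameBucketKey` returns a constant hash code so every instance lands in the same bin, and `IdentityKey` overrides neither method.

```java
import java.util.HashSet;
import java.util.Set;

public class KeyContractDemo {
    // Hypothetical key: constant hashCode puts every instance in one bin,
    // so equals() alone decides whether two keys are duplicates.
    static final class SameBucketKey {
        final String name;
        SameBucketKey(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof SameBucketKey && name.equals(((SameBucketKey) o).name);
        }
    }

    // Hypothetical key without overrides: Object's identity-based
    // equals() and hashCode() apply.
    static final class IdentityKey {
        final String name;
        IdentityKey(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        Set<SameBucketKey> s = new HashSet<>();
        System.out.println(s.add(new SameBucketKey("a"))); // true
        System.out.println(s.add(new SameBucketKey("b"))); // true: same hash, not equal
        System.out.println(s.add(new SameBucketKey("a"))); // false: same hash and equal
        System.out.println(s.size());                      // 2

        Set<IdentityKey> t = new HashSet<>();
        System.out.println(t.add(new IdentityKey("a")));   // true
        System.out.println(t.add(new IdentityKey("a")));   // true: a different object
        System.out.println(t.size());                      // 2
    }
}
```

A constant hash code is used here only to force collisions; in real code it would make every lookup walk a single long bin.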