HashMap , 翻译解释

JDK1.7版本的HASHMAP
HashMap文件头注释翻译:

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
↓↓↓↓↓
基于Map接口实现的Hash表 , 提供了map的所有可选操作 , 并且允许多个值(value)为null,一个键(key)为null. [HashMap基本类似于HashTable,除了非线程安全(unsynchronized)和允许null] . 这个类不能保证map的顺序[1],特殊说明,随着时间推移,不保证map的顺序不变[2]

[1] . map是无序的. 跟put顺序没关系.
[2] .

  • hashmap的每次put都在去判断是否需要resize,
  • resize操作时,会收集系统信息,判断是否需要调整计算hash的性能(hashseed),
  • 如果需要,hashseed会不同,
  • 那么计算的hash值也不同.
  • 那么key落在的位置也不同.
  • 那么举例就是,hashseed=0时计算hash为A在前,B在后.hashseed=1时,计算hash为B在前,A在后.
  • 那么可得,迭代器迭代的也是完全无序的,可以理解为乱序.

 

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
↓↓↓↓↓
本类提供的基础操作性能恒定(constant-time performance)[get和put操作],可以认为散列函数在桶中均匀的分散元素.[3]
迭代器(Iterator)迭代集合所需要的时间与HashMap实例的"容量[桶的数量]"成正比,再加上其大小[键值对的数量][4].
因此,如果需要迭代器的性能,就不要设置太高的初始容量(initial capacity)(或者太低的负载因子(load factor))[5]

[3] . 性能恒定的说法是有一定前提的.(单指get和put操作)

  1. 负载因子(load factor)适当,默认的0.75在多数场景下是比较OK的.太高,会有hash冲突,太低则浪费空间.
  2. 不触发扩容. 扩容相对来说是比较耗时的操作.所以就需要在初始化时,尽量准确的提供初始容量.后续就会尽量少的触发扩容操作.
  3. 在上两个适当的情况下.散列算法基本会将key均匀分布在数组中,无论数据量有多大,那么get和put都是直接寻址,快速,性能好.

[4] . 容量(capacity)和大小(size)是两个概念 .

  1. 容量(capacity)是初始化数组的个数,举例:默认初始容量是16,就算没有put数据,容量还是16.
  2. 大小(size),在没有put数据时,size为0,put一个数据,size为1.等等.
  3. so~ ,没有数据时, capacity=16 , size=0 , put一次后, capacity=16 , size=1 .
  4. so~ ,迭代器(iterator)是按容量去循环,再加每个容量里链的长度. 所以就算size=1,iterator循环16次.
1HashIterator() {
2    expectedModCount = modCount;
3    if (size > 0) { // advance to first entry
4        Entry[] t = table;
5        //循环查询hash表,直到不为null的数据存在,并赋值给next
6        while (index < t.length && (next = t[index++]) == null);
7    }
8}

[5] . 官方建议,容量不要太大,合适就行;负载因子(load factor)太低,也会造成容量过大,实际使用较小.不然迭代器在要做太多无用功.

感觉自己的翻译还是太差了,比较吃力,自己可以看的懂,但解释不出来….,还是只贴原文吧

 

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor.[6] The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created.[7] The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.[8] When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.[9]

[6] . HashMap有两个特别重要的性能指标:初始容量(initial capacity)和负载因子(load factor)
[7] . 容量(capacity),就是hash表里桶的数量(桶==key数组),初始容量就是hashmap在创建时的指定的一个容量,不是实际容量.(实际容量会在实际put时计算,实际容量=MAX(2n,initialCapacity),最接近初始容量的一个2的幂值,比如初始容量=20,那么实际容量=32)
[8] . 负载因子(load factor),调节hashMap性能的重要参数
loadFactor<=>capacitysize​,当比例值大于等于loadFactor,则触发扩容,.比例值小于loadFactor,put时不触发扩容.
[9] . size>=capacity∗loadFactor ,触发扩容,会重新进行hash计算(内部数据结构被重建).扩容后的容量是原容量的2倍,即:2∗capacity

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.[10]

[10] . 这段就重点说下负载因子(load factor). 默认的0.75,是在时间和空间成本上取了个折中;较大的值的话,虽然减少了空间浪费,增大了hash冲突的概率,增大了链表的长度.增大了查找的成本.
所以在设置初始容量的时候,考虑好数据量和负载因子,尽量减少重新散列(rehash)操作的次数.

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table.[11]

[11] . 如果你需要在map中存储大量数据,那么一个足够大的初始容量才是最好的,避免扩容操作.

重点来了!
Note that this implementation is not synchronized.[12] If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.[13] (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map.[14]

[12] . 本实现类不是线程安全的
[13] . 如果有多个线程要访问hashMap,并且至少有一个线程要修改数据结构,那么就必须要做线程同步.(修改操作包括:put,putAll,remove等).
[14] . 通常是在对象上加同步锁 (synchroniz) 来实现同步操作.

If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:Map m = Collections.synchronizedMap(new HashMap(...));[15]

[15] . 可以用Collections.synchronizedMap来包含对象,使其成为线程同步对象.最好的创建对象的时候,就包含起来.
这是硬操作,把hashmap用synchronized关键字包裹.性能不好,同步可以用concurrenthashmap

The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.[16]

[16] . 迭代器(iterator)的快速失败(fail-fast)原则: 在多线程的时候,一个线程在迭代,另一个线程修改了结构(put,putAll,remove).那么会抛出ConcurrentModificationException异常.退出迭代.除非是iterator自己的remove方法.
场景疑问:A,B两个线程都在迭代map,A线程iterator.remove了.B线程会不会抛异常??

Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.

  • 上一段说的挺好,这一段又开始尥蹶子了.
  • 意思是多线程操作发生 时,不能保证一定会有异常抛出(即快速失败行为).多线程操作的时候,谁都保证不了.
  • iterator已经尽最大努力去抛出异常ConcurrentModificationException.所以不能依赖这些个异常去做决断.只是提供一个参考.(你地明白…)

到此结束.打完收功!

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章