Why does a Java Map bucket convert to a tree only when it holds more than 8 entries?

Original

stuqbx

2019-03-30 06:42

Why convert at all?

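Each bucket in a Map initially stores its entries in a linked list, where a lookup costs O(n); a tree structure improves that to O(log n). While the list is short, even a full traversal is very fast, but as the list grows, lookups slow down noticeably, which is why the bucket is converted to a tree.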
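To make the O(n) versus O(log n) gap concrete, here is a minimal sketch that counts worst-case key comparisons. It assumes a perfectly balanced binary search tree, which is an idealization (a red-black tree can be somewhat taller), and the class and method names are made up for this illustration.

```java
final class LookupCost {
    /** Worst-case key comparisons when scanning a linked list of n nodes. */
    static int listWorstCase(int n) {
        return n;
    }

    /**
     * Worst-case key comparisons in a perfectly balanced binary search tree
     * of n nodes, i.e. floor(log2(n)) + 1 levels (an idealized assumption).
     */
    static int balancedTreeWorstCase(int n) {
        return n <= 0 ? 0 : 32 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 8, 64, 1024}) {
            System.out.printf("n=%d list=%d tree=%d%n",
                    n, listWorstCase(n), balancedTreeWorstCase(n));
        }
    }
}
```

For a handful of nodes the two costs are close, which is consistent with keeping short bins as plain lists.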

Why is the threshold 8?

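After conversion, the TreeNodes that hold the data take about twice the space of regular Nodes, so a bin is converted to TreeNodes only when it contains enough nodes. What counts as "enough" is determined by the value of TREEIFY_THRESHOLD.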
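The decision can be sketched roughly as follows. This is a simplified illustration, not the JDK code: the constant values mirror those declared in OpenJDK 8's HashMap (including MIN_TREEIFY_CAPACITY, which this article does not otherwise discuss), while the class TreeifyDecision, the enum Action, and the method afterInsert are hypothetical names. In the JDK the real logic lives inside putVal and treeifyBin.

```java
final class TreeifyDecision {
    // Values as declared in OpenJDK 8's java.util.HashMap.
    static final int TREEIFY_THRESHOLD = 8;
    static final int UNTREEIFY_THRESHOLD = 6;
    static final int MIN_TREEIFY_CAPACITY = 64;

    /** Hypothetical result type, used only in this sketch. */
    enum Action { KEEP_LIST, RESIZE_TABLE, TREEIFY }

    /**
     * Decides what happens to a list bin after an insert.
     *
     * @param chainLength   number of nodes in the bin, including the new one
     * @param tableCapacity current length of the bucket array
     */
    static Action afterInsert(int chainLength, int tableCapacity) {
        if (chainLength <= TREEIFY_THRESHOLD) {
            return Action.KEEP_LIST;      // 8 or fewer nodes: stay a list
        }
        if (tableCapacity < MIN_TREEIFY_CAPACITY) {
            return Action.RESIZE_TABLE;   // small table: grow it instead
        }
        return Action.TREEIFY;            // more than 8 nodes: use TreeNodes
    }

    public static void main(String[] args) {
        System.out.println(afterInsert(8, 64));
        System.out.println(afterInsert(9, 16));
        System.out.println(afterInsert(9, 64));
    }
}
```

UNTREEIFY_THRESHOLD is listed only for completeness; it governs the conversion back to a plain list when a tree bin shrinks, which the sketch does not model.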

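When hashCodes are well distributed, tree bins are rarely used (a bin, or bucket, is where HashMap stores the entries whose hashes map to the same table index), because the entries spread evenly across the bins and almost no bin's list ever reaches the threshold. In fact, under random hashCodes, the number of nodes per bin follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution).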
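The claim can be checked empirically with a small simulation. This is a sketch under stated assumptions: each key is assigned to a bin uniformly at random with java.util.Random, standing in for "random hashCodes" (it does not exercise HashMap's own hashing), and the table is filled to an average of 0.5 keys per bin. The class name, method name, and seed are arbitrary choices for this example.

```java
import java.util.Random;

final class BinOccupancySimulation {
    /**
     * Throws `keys` keys into `bins` bins uniformly at random and returns
     * fractions[k] = share of bins holding exactly k keys. The last slot
     * collects every bin with 9 or more keys.
     */
    static double[] simulate(int bins, int keys, long seed) {
        Random random = new Random(seed);
        int[] counts = new int[bins];
        for (int i = 0; i < keys; i++) {
            counts[random.nextInt(bins)]++;
        }
        double[] fractions = new double[10];
        for (int c : counts) {
            fractions[Math.min(c, fractions.length - 1)] += 1.0 / bins;
        }
        return fractions;
    }

    public static void main(String[] args) {
        int bins = 1 << 20;
        double[] fractions = simulate(bins, bins / 2, 42L);
        for (int k = 0; k < fractions.length; k++) {
            System.out.printf("%d: %.8f%n", k, fractions[k]);
        }
    }
}
```

With about a million bins, the observed shares should land close to the Poisson values for a mean of 0.5, and bins with 8 or more keys should be extremely rare or absent.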

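With the default resizing threshold of 0.75, the distribution has a parameter of about 0.5 on average, although with a large variance because of resizing. Ignoring variance, the probability that a bin holds k nodes is

$$P(k) = \frac{e^{-0.5} \cdot 0.5^{k}}{k!}$$

which gives the following values: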

Length	Probability
0	0.60653066
1	0.30326533
2	0.07581633
3	0.01263606
4	0.00157952
5	0.00015795
6	0.00001316
7	0.00000094
8	0.00000006

As shown above, the probability that a bin's list reaches 8 elements is 0.00000006, which makes it an almost impossible event.
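The figures can be recomputed directly from the expression quoted in the JDK comment, exp(-0.5) * pow(0.5, k) / factorial(k). The following is a minimal sketch; the class and method names are made up for this example.

```java
final class PoissonBinSizes {
    /** Poisson probability that a bin holds exactly k nodes, for mean lambda. */
    static double probability(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        // Prints one line per list size, in the same layout as the JDK comment.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, probability(0.5, k));
        }
    }
}
```

Rounded to eight decimal places, the results should agree with the values listed in the JDK comment.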

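In most cases, linked-list storage saves space while still giving good lookup performance. In the rare cases where a bin reaches 8 nodes, converting it to a red-black tree gives better lookup performance, and because such cases are rare, the extra storage cost stays small.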

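So the threshold of 8 is a trade-off between time and space, chosen on the basis of probability and statistics. It is hard not to admire how rigorous and well-reasoned the changes and optimizations in Java are after decades of development.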

Appendix: relevant comments in the JDK (1.8.0_45)

HashMap class, lines 174-197

     * Because TreeNodes are about twice the size of regular nodes, we
     * use them only when bins contain enough nodes to warrant use
     * (see TREEIFY_THRESHOLD). And when they become too small (due to
     * removal or resizing) they are converted back to plain bins.  In
     * usages with well-distributed user hashCodes, tree bins are
     * rarely used.  Ideally, under random hashCodes, the frequency of
     * nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average for the default resizing
     * threshold of 0.75, although with a large variance because of
     * resizing granularity. Ignoring variance, the expected
     * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
     * factorial(k)). The first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million

ConcurrentHashMap has a similar passage at lines 327-349, with only minor differences.

     * The main disadvantage of per-bin locks is that other update
     * operations on other nodes in a bin list protected by the same
     * lock can stall, for example when user equals() or mapping
     * functions take a long time.  However, statistically, under
     * random hash codes, this is not a common problem.  Ideally, the
     * frequency of nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average, given the resizing threshold
     * of 0.75, although with a large variance because of resizing
     * granularity. Ignoring variance, the expected occurrences of
     * list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The
     * first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million
