Traversal Efficiency of the Java Storage Types ArrayList, HashSet, HashMap and LinkedList Under Different Iteration Methods, Analyzed with Python

Original

頔潇

2020-06-13 12:43

A study of how efficiently different Java storage types are traversed by different iteration methods

GitHub code repository

Data storage types

ArrayList

HashSet

HashMap

LinkedList

Traversal methods

Traditional (indexed) for loop

for (int i = 0; i < list.size(); i++) {
    String str = list.get(i);
    // ... use str
}

Built-in iterator (enhanced for loop)

for (String str : list) {
    // ... use str
}

Explicit iterator

Iterator<String> it = list.iterator();
while (it.hasNext()) {
    String str = it.next();
    // ... use str
}
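The three loop forms above can be compared side by side in a small self-contained sketch. It sums string lengths with each form so that their equivalence is visible; the class and method names (TraversalDemo, indexed, enhanced, explicit) are illustrative only. Per the Java Language Specification (§14.14.2), an enhanced for loop over an Iterable is compiled to essentially the explicit-iterator loop, which is consistent with the two iterator styles performing about the same.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sums string lengths three ways; all three visit the same elements in the same order.
final class TraversalDemo {

    // Traditional indexed loop: relies on list.get(i), which is cheap only for random-access lists.
    static int indexed(List<String> list) {
        int total = 0;
        for (int i = 0; i < list.size(); i++) {
            total += list.get(i).length();
        }
        return total;
    }

    // Enhanced for loop ("built-in iterator"): the compiler turns this into an Iterator loop.
    static int enhanced(List<String> list) {
        int total = 0;
        for (String s : list) {
            total += s.length();
        }
        return total;
    }

    // Explicit iterator: the same loop the compiler generates, written out by hand.
    static int explicit(List<String> list) {
        int total = 0;
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            total += it.next().length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            list.add("abcdefg");
        }
        // Prints "35 35 35": five strings of length 7, counted three ways.
        System.out.println(indexed(list) + " " + enhanced(list) + " " + explicit(list));
    }
}
```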

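Test code template

A list of size $N$ is traversed $M$ times, and the average traversal time per element is defined as $T/(N \cdot M)$, where $T$ is the total measured running time.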

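    // Fields of the JUnit 4 test class (imports needed: java.util.ArrayList, java.util.Iterator,
    // org.junit.BeforeClass, org.junit.Test)
    private static ArrayList<String> list = new ArrayList<>();
    private final static int N = 1000000, M = 1000;   // N elements, M passes
    private final static String STR = "abcdefg";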

First, build one fixed list that all three traversal tests share.

    @BeforeClass
    public static void CreateList() {
        // Runs once before all tests: fill the list with N references to the same string
        for (int i = 0; i < N; i++) {
            list.add(STR);
        }
    }

The running time of each traversal is recorded by JUnit test methods.

Traditional for loop

    @Test
    public void FOR() {
        for (int k = 0; k < M; k++) {                  // M passes over the list
            for (int i = 0; i < list.size(); i++) {
                String str = list.get(i);              // read element i by index
            }
        }
    }

Built-in iterator

    @Test
    public void Inner_Iteration() {
        for (int k = 0; k < M; k++) {                  // M passes over the list
            for (String str : list) {                  // enhanced for loop
                String s = str;
            }
        }
    }

Explicit iterator

    @Test
    public void Explicit_Iteration() {
        for (int k = 0; k < M; k++) {                  // M passes over the list
            Iterator<String> it = list.iterator();     // fresh iterator for each pass
            while (it.hasNext()) {
                String str = it.next();
            }
        }
    }
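JUnit reports how long each test method runs, and that duration serves as $T$. For readers who want the per-element figure computed directly, here is a hypothetical hand-rolled harness (not the author's code) that times one pass set with System.nanoTime() and divides by $N \cdot M$. The class name, the "blackhole" field and the reduced pass count are assumptions for illustration; a dedicated tool such as JMH handles warm-up and dead-code elimination more rigorously.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical manual timing harness that applies T / (N * M) directly.
final class PerElementTimer {

    // Receives the summed result so the loop's work stays observable (not dead code).
    static volatile long blackhole;

    // Average nanoseconds per element visit for the indexed for loop over m passes.
    static double indexedNanosPerElement(List<String> list, int m) {
        long sink = 0;
        long start = System.nanoTime();
        for (int k = 0; k < m; k++) {
            for (int i = 0; i < list.size(); i++) {
                sink += list.get(i).length(); // actually use each element that is read
            }
        }
        long elapsed = System.nanoTime() - start; // T, in nanoseconds
        blackhole = sink;
        return (double) elapsed / ((double) list.size() * m); // T / (N * M)
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int m = 100; // fewer passes than the post's M = 1000, to keep the demo quick
        List<String> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add("abcdefg");
        }
        indexedNanosPerElement(list, m); // warm-up run, giving the JIT a chance to compile the loop
        System.out.printf("%.3f ns per element%n", indexedNanosPerElement(list, m));
    }
}
```

The same shape works for the two iterator styles by swapping the inner loop.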

Adapt the code to each storage type,
e.g. a HashMap needs both keys and values.
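The post does not show its HashMap and HashSet versions, so the following is only a hypothetical adaptation. One pitfall is worth noting: the ArrayList fixture adds the same STR N times, and doing the same with a HashSet (or as HashMap keys) would leave a single entry. This sketch therefore assumes distinct keys of the form "key0", "key1", ...; the class HashFixtures and its methods are illustrative names. Since neither structure has a positional get(i), only the two iterator styles apply, and a map is traversed through entrySet(), which yields each key together with its value.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical fixtures and traversals for HashMap and HashSet.
final class HashFixtures {

    // Distinct keys are required: putting the same key N times would leave one entry.
    static Map<String, String> buildMap(int n) {
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put("key" + i, "abcdefg");
        }
        return map;
    }

    static Set<String> buildSet(int n) {
        Set<String> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add("key" + i);
        }
        return set;
    }

    // Built-in iterator over a map: walk entrySet(), reading key and value together.
    static int innerMap(Map<String, String> map) {
        int total = 0;
        for (Map.Entry<String, String> e : map.entrySet()) {
            total += e.getKey().length() + e.getValue().length();
        }
        return total;
    }

    // Explicit iterator over the same entry set.
    static int explicitMap(Map<String, String> map) {
        int total = 0;
        Iterator<Map.Entry<String, String>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> e = it.next();
            total += e.getKey().length() + e.getValue().length();
        }
        return total;
    }

    // Built-in iterator over a set: each element is visited once, in hash order.
    static int innerSet(Set<String> set) {
        int total = 0;
        for (String s : set) {
            total += s.length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Prints "33 33 12": three 4-char keys plus three 7-char values, then keys only.
        System.out.println(innerMap(buildMap(3)) + " " + explicitMap(buildMap(3)) + " " + innerSet(buildSet(3)));
    }
}
```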

Python data visualization

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
%matplotlib inline

Taking the ArrayList data as an example:
Bar chart

f, ax = plt.subplots()
ArrayList = pd.read_csv("../csv/ArrayList.csv")  # one column per traversal method, one row per run
sns.barplot(data=ArrayList, ax=ax)
ax.set_title("ArrayList")

Line chart

f, ax = plt.subplots()
sns.lineplot(data=ArrayList, ax=ax)
ax.set_title("ArrayList")
plt.ylim(0.5, 1.0)

ArrayList

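Overall, the for loop is the fastest way to traverse an ArrayList, while the built-in and explicit iterators perform about the same.
ArrayList traversal is also very stable across runs, especially with for.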

Explicit_Iteration	FOR	Inner_Iteration
0.775	0.585	0.757
0.785	0.577	0.748
0.783	0.581	0.769
0.775	0.583	0.77
0.778	0.602	0.788
0.784	0.586	0.767
0.775	0.581	0.807
0.814	0.587	0.766
0.786	0.604	0.839
0.815	0.593	0.775

HashMap

A HashMap cannot be traversed with the indexed for loop; the built-in iterator is slightly faster.

Explicit_Iteration	Inner_Iteration
0.608	0.507
0.622	0.613
0.585	0.485
0.587	0.533
0.639	0.526
0.593	0.555
0.534	0.431
0.581	0.504
0.598	0.481
0.64	0.504

HashSet

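A HashSet likewise cannot be traversed with for. The two iterators perform about the same, with the built-in iterator somewhat less stable.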

Explicit_Iteration	Inner_Iteration
2.5	2.517
2.532	2.477
2.684	2.523
2.697	3.303
2.631	2.515
2.956	3.016
3.281	3.051
3.074	2.95
3.206	2.935
2.89	2.887

LinkedList

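Note: a linked list can be traversed with for, but testing showed its running time growing sharply with data size. Each get(i) walks the list node by node from the nearer end, so the whole loop costs O(N²), which puts it orders of magnitude behind the other two methods; it was therefore left out of the statistics.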
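The JDK offers a way to avoid the indexed-loop trap on LinkedList: the marker interface java.util.RandomAccess, which ArrayList implements and LinkedList does not. The sketch below (class and method names are illustrative) picks the indexed loop only when the list declares fast random access and falls back to an iterator otherwise. For a LinkedList of N elements, an indexed pass costs on the order of N²/4 node hops, whereas the iterator simply follows next pointers.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

// Chooses the traversal that suits the list: indexed for RandomAccess lists, an iterator otherwise.
final class SafeTraversal {

    static long sumLengths(List<String> list) {
        long total = 0;
        if (list instanceof RandomAccess) {
            // ArrayList: get(i) is O(1), so the indexed loop is O(N) overall.
            for (int i = 0, n = list.size(); i < n; i++) {
                total += list.get(i).length();
            }
        } else {
            // LinkedList: get(i) walks nodes, so an indexed loop would be O(N^2);
            // the iterator follows next pointers and stays O(N).
            for (String s : list) {
                total += s.length();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> array = new ArrayList<>();
        List<String> linked = new LinkedList<>();
        for (int i = 0; i < 100_000; i++) {
            array.add("abcdefg");
            linked.add("abcdefg");
        }
        System.out.println(array instanceof RandomAccess);            // true
        System.out.println(linked instanceof RandomAccess);           // false
        System.out.println(sumLengths(array) == sumLengths(linked));  // true
    }
}
```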

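In LinkedList the built-in and explicit iterators are about equally efficient; across several individual runs the explicit iterator was slightly faster.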

Explicit_Iteration	Inner_Iteration
1.865	2.101
1.883	1.929
1.9	1.941
1.843	1.903
1.89	1.897
1.835	1.921
1.841	1.846
1.914	1.902
1.872	1.859
1.816	1.827

Data processing and cleaning

Combine the data for the four storage types.
Bring the data onto a common scale.

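# HashMap, HashSet and LinkedList are assumed to be loaded with pd.read_csv like ArrayList.
# Dividing by 10**(Base-k) with Base = 0 multiplies every timing by 10**k (k noted per line).
Base = 0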

ArrayList /= 10**(Base-8)
ArrayList['Type'] = 'ArrayList'     # 8

HashMap /= 10**(Base-7)
HashMap['Type'] = 'HashMap'         # 7

HashSet /= 10**(Base-8)
HashSet['Type'] = 'HashSet'         # 8

LinkedList /= 10**(Base-8)
LinkedList['Type'] = 'LinkedList'  # 8

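Because the efficiency of the storage types differs widely,
the author takes the base-10 logarithm $\log_{10}(\cdot)$
and then the negative (less time means higher efficiency).
This brings the values to similar orders of magnitude, which makes them easy to visualize.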

data = pd.concat([ArrayList, HashMap, HashSet, LinkedList], ignore_index=True).drop(['FOR'], axis=1)
data[['Explicit_Iteration', 'Inner_Iteration']] = np.log10(data[['Explicit_Iteration', 'Inner_Iteration']])
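To make the scaling concrete, here is what the code above computes for a single measured value $t$ (no negation is applied in the code as written), with a worked check against the first row of the ArrayList and HashMap tables. A difference of 1 in $v$ corresponds to a factor of 10 in the scaled values.

```latex
v = \log_{10}\!\left(t \cdot 10^{k}\right) = k + \log_{10} t,
\qquad
k =
\begin{cases}
8 & \text{ArrayList, HashSet, LinkedList}\\
7 & \text{HashMap}
\end{cases}

\begin{aligned}
\text{ArrayList, Inner\_Iteration, run 1: } & v = 8 + \log_{10} 0.757 \approx 8 - 0.121 = 7.879\\
\text{HashMap, Inner\_Iteration, run 1: } & v = 7 + \log_{10} 0.507 \approx 7 - 0.295 = 6.705
\end{aligned}
```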

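Overall visual comparison

The processed data for the built-in iterator and the explicit iterator are compared separately.
The values in the plots below represent the order of magnitude of the relative efficiency.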

f, ax= plt.subplots()
sns.barplot(x='Type', y='Inner_Iteration', data=data)
ax.set_title("Inner_Iteration")

f, ax= plt.subplots()
sns.barplot(x='Type', y='Explicit_Iteration', data=data)
ax.set_title("Explicit_Iteration")

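Analysis

In essence, the results reflect how the four underlying data structures (array, set, map and linked list) differ in the way they are traversed.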

ArrayList

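ArrayList is essentially a dynamic array, and arrays respond efficiently to large numbers of random accesses.
Because ArrayList exposes its elements directly by index, going through an Iterator adds per-element overhead (the hasNext()/next() calls and the concurrent-modification check) that the sequential indexed for loop avoids, so iterator access is slower than for.
Because an array stores its elements contiguously, inserting or removing anywhere except the end requires shifting the later elements, which makes these operations relatively slow.
ArrayList therefore suits data of varying length that is not frequently inserted into or removed from.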
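The dynamic-array behaviour described above can be sketched in a few lines. This is a hypothetical, simplified String-only class (MiniArrayList), not the JDK's ArrayList source; its roughly 1.5x growth step mirrors the JDK's growth rule (old capacity plus half of it). It shows why get(i) is a single array read while add(index, ...) and remove(index) must shift every later element.

```java
import java.util.Arrays;

// Hypothetical, simplified dynamic array of Strings (not the JDK implementation).
final class MiniArrayList {
    private String[] data = new String[4];
    private int size = 0;

    // Appending is amortised O(1): grow by about 1.5x only when the backing array is full.
    void add(String s) {
        ensureCapacity(size + 1);
        data[size++] = s;
    }

    // Inserting at index shifts every later element one slot right: O(size - index).
    void add(int index, String s) {
        if (index < 0 || index > size) throw new IndexOutOfBoundsException(String.valueOf(index));
        ensureCapacity(size + 1);
        System.arraycopy(data, index, data, index + 1, size - index);
        data[index] = s;
        size++;
    }

    // Removing shifts every later element one slot left: O(size - index).
    String remove(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException(String.valueOf(index));
        String old = data[index];
        System.arraycopy(data, index + 1, data, index, size - index - 1);
        data[--size] = null; // clear the stale slot so it can be garbage-collected
        return old;
    }

    // Random access is a single array read: O(1).
    String get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException(String.valueOf(index));
        return data[index];
    }

    int size() {
        return size;
    }

    private void ensureCapacity(int needed) {
        if (needed > data.length) {
            int newCap = Math.max(needed, data.length + (data.length >> 1));
            data = Arrays.copyOf(data, newCap);
        }
    }

    public static void main(String[] args) {
        MiniArrayList list = new MiniArrayList();
        for (int i = 0; i < 5; i++) {
            list.add("s" + i);
        }
        list.add(0, "head"); // shifts s0..s4 one slot right
        list.remove(1);      // removes "s0", shifts s1..s4 one slot left
        System.out.println(list.get(0) + " " + list.get(1) + " " + list.size()); // head s1 5
    }
}
```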

HashMap

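In Java, the main Map implementations are HashMap and TreeMap; Map is not part of the Collection interface hierarchy.
A Map stores key/value pairs, and a HashMap keeps its elements in no particular order and has no positional get(i), so it cannot be traversed with the indexed for loop.
When an iterator is used, looking up an individual element seems fast because keys are unique,
but the iteration runs over the collection of keys, and iterating keySet() itself takes time.
Moreover, the keys are unordered, which to some extent violates the requirement of good locality.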
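The cost of walking keySet() depends on how the values are then fetched. A common pattern is keySet() followed by get(key), which hashes every key a second time; iterating entrySet() avoids that, because each entry already carries its value. The HashMap Javadoc also notes that iterating its collection views takes time proportional to the map's capacity (number of buckets) plus its size, since empty buckets must be skipped. The sketch below (illustrative names) shows both patterns.

```java
import java.util.HashMap;
import java.util.Map;

// Two ways to visit every key/value pair of a HashMap.
final class MapTraversal {

    // keySet() + get(): each get() re-hashes the key and searches its bucket again.
    static long viaKeySet(Map<String, Integer> map) {
        long total = 0;
        for (String key : map.keySet()) {
            total += map.get(key);
        }
        return total;
    }

    // entrySet(): each entry already holds its value, so no second lookup is needed.
    static long viaEntrySet(Map<String, Integer> map) {
        long total = 0;
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            total += e.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < 10; i++) {
            map.put("key" + i, i);
        }
        System.out.println(viaKeySet(map) + " " + viaEntrySet(map)); // 45 45
    }
}
```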

HashSet

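HashSet implements the Set interface of the Collection hierarchy.
Internally, HashSet is backed by a HashMap: add() calls HashMap.put() with the element itself as the key.
Since a HashSet carries no real values, traversing it only has to produce keys, so the author expected it to traverse faster than a HashMap.
In the measurements HashSet did come out well ahead of HashMap, although in practical terms it is less versatile than HashMap.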
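The "HashSet on top of HashMap" structure can be illustrated with a simplified sketch. The class MiniHashSet is hypothetical, but its layout mirrors the OpenJDK HashSet: a private HashMap<E, Object>, one shared dummy value (PRESENT), add() implemented as put(e, PRESENT) == null, and iteration delegated to the backing map's keySet().

```java
import java.util.HashMap;
import java.util.Iterator;

// Simplified set backed by a HashMap, mirroring how the JDK's HashSet is organised.
final class MiniHashSet<E> implements Iterable<E> {
    // One shared dummy value: only the keys carry information.
    private static final Object PRESENT = new Object();
    private final HashMap<E, Object> map = new HashMap<>();

    // put() returns the previous value, or null if the key was not present before.
    boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    boolean contains(Object o) {
        return map.containsKey(o);
    }

    int size() {
        return map.size();
    }

    // Iterating the set is iterating the backing map's keys.
    @Override
    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }

    public static void main(String[] args) {
        MiniHashSet<String> set = new MiniHashSet<>();
        System.out.println(set.add("abcdefg")); // true
        System.out.println(set.add("abcdefg")); // false: already present
        System.out.println(set.size());         // 1
    }
}
```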

LinkedList

LinkedList is essentially a doubly linked list.
A linked list makes insertion and removal easy, but random access to a single element is slow.
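The place where a linked list's cheap insertion and removal pays off is removal during traversal. Calling Iterator.remove() on a LinkedList unlinks the current node in O(1), whereas the same call on an ArrayList shifts all later elements. Removing through the list itself inside an enhanced for loop would typically throw ConcurrentModificationException, which is why the sketch below (illustrative names) uses an explicit iterator.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Removing elements while traversing, the pattern where LinkedList's cheap unlinking helps.
final class RemoveWhileIterating {

    // Removes every string shorter than minLength in a single pass.
    static void removeShort(List<String> list, int minLength) {
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            if (it.next().length() < minLength) {
                it.remove(); // on a LinkedList: O(1) unlink of the current node
            }
        }
    }

    public static void main(String[] args) {
        List<String> list = new LinkedList<>(Arrays.asList("a", "abcdefg", "ab", "abcdefg"));
        removeShort(list, 3);
        System.out.println(list); // [abcdefg, abcdefg]
    }
}
```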

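Summary

The three traversal methods perform differently across the four storage types.
The built-in and explicit iterators are about equally efficient; for is efficient on ArrayList and very poor on LinkedList.
HashSet iterates efficiently, HashMap less so.
Ranked purely by iterator traversal speed: HashSet > LinkedList > ArrayList > HashMap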

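Overall assessment:

ArrayList: for data with few insertions/removals where stability matters

HashMap: dictionary functionality, low traversal efficiency

HashSet: extremely fast lookup of arbitrary non-numeric elements in large data sets

LinkedList: fast insertion and removal; never traverse it with the indexed for loop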
