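This article is a follow-up to "String.intern in Java 6, 7 and 8: string pooling". That article described how the String.intern() method is implemented in Java 7 and 8 and the benefits of using it. Since it had already grown quite long, I wrote this separate article to describe the performance characteristics of String.intern when it is accessed from multiple threads.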
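The test programs call String.intern() from multiple threads. They emulate the behavior of most modern server applications (for example, specialized web crawlers). To test a high-contention scenario, the programs were run on a new workstation with an Intel Xeon E5-2650 CPU (8 physical, 16 virtual cores @ 2 GHz) and 128 GB RAM. We create 8 threads in order to use all the physical cores.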
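The four test programs are:
- Reference program: the testLongLoop method from the previous article calls String.intern() from a single thread. It shows how fast the pool is when there is no contention at all.
- 8 threads call String.intern() on different strings; each interned string is prefixed with the number of the thread that created it. This test shows the synchronization overhead of String.intern(). In theory it is the worst case: a real application whose threads do nothing but call String.intern() in a loop is very unlikely to exist.
- The first thread starts interning a set of strings. After a 2-second delay, a second thread starts interning the same set of strings. We expect str.intern()==str to be true in the first thread and str.intern()!=str to be true in the second thread. This would prove that there is no thread-local JVM string pool.
- All 8 threads intern the same set of strings. This is closer to a real-world situation: a mix of adding strings to the JVM string pool and looking them up. Even so, read contention this high on the JVM string pool rarely occurs in practice.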
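Calling String.intern() in a loop from a single thread: the reference test
Test 1: the testLongLoop method from the previous article adds strings to the JVM string pool in a loop from a single thread. Unlike before, this time we print a time snapshot after every million strings. For example, the log line 3000000; time = 0.402 sec means that adding one million strings took 0.402 sec when the pool initially held two million strings (it now holds three million). The test program (like all other programs in this article) was run with -Xmx64G -XX:StringTableSize=1000003 (the second option sets the JVM string pool size).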
1000000; time = 0.362 sec
2000000; time = 0.377 sec
3000000; time = 0.402 sec
4000000; time = 0.434 sec
5000000; time = 0.36 sec
6000000; time = 0.364 sec
7000000; time = 0.351 sec
8000000; time = 0.379 sec
9000000; time = 0.445 sec
10000000; time = 0.487 sec
11000000; time = 0.518 sec
12000000; time = 0.59 sec
13000000; time = 0.695 sec
14000000; time = 0.768 sec
15000000; time = 0.815 sec
16000000; time = 0.873 sec
17000000; time = 0.954 sec
18000000; time = 1.031 sec
19000000; time = 1.081 sec
20000000; time = 1.12 sec
21000000; time = 1.17 sec
22000000; time = 1.194 sec
23000000; time = 1.264 sec
24000000; time = 1.291 sec
25000000; time = 1.352 sec
26000000; time = 1.421 sec
27000000; time = 1.476 sec
28000000; time = 1.514 sec
29000000; time = 1.612 sec
30000000; time = 1.643 sec
31000000; time = 1.695 sec
32000000; time = 1.703 sec
33000000; time = 1.81 sec
34000000; time = 1.854 sec
35000000; time = 1.943 sec
36000000; time = 1.937 sec
37000000; time = 2.0 sec
38000000; time = 2.102 sec
39000000; time = 2.124 sec
40000000; time = 2.212 sec
41000000; time = 2.225 sec
42000000; time = 2.305 sec
43000000; time = 2.344 sec
44000000; time = 2.379 sec
45000000; time = 2.46 sec
46000000; time = 2.557 sec
47000000; time = 2.656 sec
48000000; time = 2.627 sec
49000000; time = 2.629 sec
50000000; time = 2.735 sec
51000000; time = 2.738 sec
52000000; time = 2.823 sec
53000000; time = 2.861 sec
54000000; time = 2.974 sec
55000000; time = 3.027 sec
56000000; time = 3.05 sec
57000000; time = 3.088 sec
58000000; time = 3.161 sec
59000000; time = 3.244 sec
60000000; time = 3.31 sec
61000000; time = 3.327 sec
62000000; time = 3.382 sec
63000000; time = 3.445 sec
64000000; time = 3.524 sec
65000000; time = 3.669 sec
66000000; time = 3.596 sec
67000000; time = 3.673 sec
68000000; time = 3.705 sec
69000000; time = 3.752 sec
70000000; time = 3.81 sec
71000000; time = 3.898 sec
72000000; time = 3.93 sec
73000000; time = 4.0 sec
74000000; time = 4.133 sec
75000000; time = 4.109 sec
76000000; time = 4.193 sec
77000000; time = 4.182 sec
78000000; time = 4.283 sec
79000000; time = 4.349 sec
80000000; time = 4.395 sec
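As you can see, the time needed to add each further million strings grows linearly with the pool size. This confirms that the string pool is implemented as a fixed-size hash map in which every bucket holds a linked list of strings. We will compare these numbers with the multithreaded results later.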
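To make that structure concrete, here is a minimal sketch of a fixed-size hash table with chained buckets. It is a toy model written for this article, not HotSpot's actual implementation: the class name FixedBucketPool and its methods are hypothetical. Because the bucket array never grows, the average chain length is simply entries / buckets, so with -XX:StringTableSize=1000003 and 80M strings a lookup has to walk a chain of roughly 80 entries on average.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a fixed-size string pool: an array of buckets, each scanned linearly. */
public class FixedBucketPool {
    private final List<String>[] buckets;
    private long size;

    @SuppressWarnings("unchecked")
    public FixedBucketPool(int bucketCount) {
        buckets = new List[bucketCount]; // the table is never resized
    }

    /** Returns the pooled instance equal to s, adding s itself if none exists yet. */
    public String intern(String s) {
        int idx = Math.floorMod(s.hashCode(), buckets.length);
        List<String> chain = buckets[idx];
        if (chain == null) {
            chain = new ArrayList<>();
            buckets[idx] = chain;
        }
        for (String pooled : chain) { // cost of this scan grows with the chain length
            if (pooled.equals(s)) {
                return pooled;
            }
        }
        chain.add(s);
        size++;
        return s;
    }

    /** Average number of entries per bucket. */
    public double averageChainLength() {
        return (double) size / buckets.length;
    }

    public static void main(String[] args) {
        FixedBucketPool pool = new FixedBucketPool(1009);
        for (int i = 0; i < 100_900; i++) {
            pool.intern("str-" + i);
        }
        System.out.println("average chain length = " + pool.averageChainLength());
    }
}
```

With 100,900 distinct strings in 1,009 buckets the sketch reports an average chain length of 100.0. Doubling the number of strings doubles the chain length, which is the linear growth visible in the log above.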
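8 threads interning different sets of strings
The next test checks the synchronization limits of the JVM string pool: each thread creates its own unique set of strings and interns them in a loop. This is a very high-contention scenario. Real applications rarely make more than 2 or 3 simultaneous String.intern() calls (an assumption based on the core counts of current CPUs).
We use the following method for the multithreaded tests. The only difference between tests #2 and #4 is the strings being interned (read the comments in the method carefully).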
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.CountDownLatch;

    private static void multiThreadedInternTest( final int threads, final int cnt )
    {
        final CountDownLatch latch = new CountDownLatch( threads );
        for ( int i = 0; i < threads; ++i )
        {
            final int threadNo = i;
            final Runnable task = new Runnable() {
                @Override
                public void run() {
                    latch.countDown();
                    try {
                        latch.await(); //start all threads simultaneously
                    } catch ( InterruptedException ignored ) {
                    }
                    //keep references so the interned strings are not garbage collected
                    final List<String> lst = new ArrayList<String>( 100 );
                    long start = System.currentTimeMillis();
                    for ( int i = 0; i < cnt; ++i )
                    {
                        //this line is used in 8 writers scenario
                        final String str = "Thread #" + threadNo + " : " + i;
                        //use the following line for 1 writer, 7 readers scenario
                        //final String str = "Thread #0 : " + i;
                        lst.add( str.intern() );
                        //report once per million strings, matching the logs in this article
                        if ( i % 1000000 == 0 )
                        {
                            System.out.println( "Thread # " + threadNo + " : " + i + "; time = "
                                    + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
                            start = System.currentTimeMillis();
                        }
                    }
                    System.out.println( "Total length = " + lst.size() );
                }
            };
            new Thread( task ).start();
        }
    }
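Here is the output of the "8 threads writing simultaneously" test case (same JVM settings: -XX:StringTableSize=1000003 -Xmx64G). I excluded the lines affected by garbage collection (identified with -verbose:gc). The log format is the same as in the previous test, except that the thread number has been added.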
Thread # 6 : 1000000; time = 5.922 sec
Thread # 3 : 1000000; time = 6.037 sec
Thread # 4 : 1000000; time = 6.065 sec
Thread # 0 : 1000000; time = 6.066 sec
Thread # 1 : 1000000; time = 6.075 sec
Thread # 5 : 1000000; time = 6.091 sec
Thread # 2 : 1000000; time = 6.169 sec
Thread # 7 : 1000000; time = 6.288 sec
Thread # 1 : 4000000; time = 4.991 sec
Thread # 0 : 4000000; time = 4.983 sec
Thread # 3 : 4000000; time = 5.067 sec
Thread # 2 : 4000000; time = 5.024 sec
Thread # 4 : 4000000; time = 5.028 sec
Thread # 6 : 4000000; time = 5.052 sec
Thread # 5 : 4000000; time = 5.102 sec
Thread # 7 : 4000000; time = 5.083 sec
Thread # 1 : 6000000; time = 7.012 sec
Thread # 0 : 6000000; time = 7.06 sec
Thread # 2 : 6000000; time = 6.99 sec
Thread # 3 : 6000000; time = 7.09 sec
Thread # 4 : 6000000; time = 7.045 sec
Thread # 6 : 6000000; time = 7.173 sec
Thread # 5 : 6000000; time = 7.16 sec
Thread # 7 : 6000000; time = 7.175 sec
Thread # 1 : 8000000; time = 9.098 sec
Thread # 0 : 8000000; time = 9.157 sec
Thread # 2 : 8000000; time = 9.157 sec
Thread # 3 : 8000000; time = 9.2 sec
Thread # 4 : 8000000; time = 9.222 sec
Thread # 6 : 8000000; time = 9.308 sec
Thread # 5 : 8000000; time = 9.314 sec
Thread # 7 : 8000000; time = 9.332 sec
Thread # 1 : 11000000; time = 12.987 sec
Thread # 2 : 11000000; time = 13.028 sec
Thread # 0 : 11000000; time = 13.063 sec
Thread # 4 : 11000000; time = 13.007 sec
Thread # 3 : 11000000; time = 13.04 sec
Thread # 6 : 11000000; time = 13.238 sec
Thread # 5 : 11000000; time = 13.268 sec
Thread # 7 : 11000000; time = 13.209 sec
Thread # 1 : 15000000; time = 21.826 sec
Thread # 2 : 15000000; time = 22.124 sec
Thread # 4 : 15000000; time = 22.142 sec
Thread # 3 : 15000000; time = 22.144 sec
Thread # 0 : 15000000; time = 22.384 sec
Thread # 7 : 15000000; time = 23.129 sec
Thread # 6 : 15000000; time = 23.228 sec
Thread # 5 : 15000000; time = 23.244 sec
Thread # 1 : 17000000; time = 32.329 sec
Thread # 2 : 17000000; time = 32.488 sec
Thread # 4 : 17000000; time = 32.489 sec
Thread # 3 : 17000000; time = 32.448 sec
Thread # 0 : 17000000; time = 32.603 sec
Thread # 7 : 17000000; time = 32.567 sec
Thread # 5 : 17000000; time = 32.574 sec
Thread # 6 : 17000000; time = 32.791 sec
Thread # 1 : 19000000; time = 37.914 sec
Thread # 2 : 19000000; time = 37.895 sec
Thread # 3 : 19000000; time = 37.827 sec
Thread # 4 : 19000000; time = 38.014 sec
Thread # 0 : 19000000; time = 37.981 sec
[GC 15464352K->15461583K(23223936K), 31.1850310 secs]
Thread # 7 : 19000000; time = 69.329 sec
Thread # 5 : 19000000; time = 69.291 sec
Thread # 6 : 19000000; time = 69.446 sec
Thread # 1 : 21000000; time = 43.171 sec
Thread # 2 : 21000000; time = 43.225 sec
Thread # 3 : 21000000; time = 43.265 sec
Thread # 4 : 21000000; time = 43.175 sec
Thread # 0 : 21000000; time = 43.206 sec
Thread # 7 : 21000000; time = 42.972 sec
Thread # 5 : 21000000; time = 42.983 sec
Thread # 6 : 21000000; time = 43.025 sec
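Read these results in groups of eight lines, one line per thread. For example, the fragment below shows that interning eight million strings (one million per thread) took about 43 seconds when the JVM pool already held 20M * 8 = 160M (160 million) strings.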
Thread # 1 : 21000000; time = 43.171 sec
Thread # 2 : 21000000; time = 43.225 sec
Thread # 3 : 21000000; time = 43.265 sec
Thread # 4 : 21000000; time = 43.175 sec
Thread # 0 : 21000000; time = 43.206 sec
Thread # 7 : 21000000; time = 42.972 sec
Thread # 5 : 21000000; time = 42.983 sec
Thread # 6 : 21000000; time = 43.025 sec
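Comparing the single-threaded and multithreaded String.intern tests
The first group of results not affected by garbage collection is the one for 11M (eleven million) strings processed per thread, which corresponds to interning 88M strings in the single-threaded case. Let's make a simple estimate. Assuming the single-threaded results grow linearly, we need the single-threaded time(84M) as the average time for the [80M; 88M] range of the single-threaded case. It is approximately time(80M) + (time(80M) - time(76M)) = 4.4 sec + (4.4 sec - 4.2 sec) = 4.6 sec. The corresponding time to add one million strings in multithreaded mode is about 43 sec / 8 = 5.375 sec, so 8 concurrent threads adding strings to the pool cost only about 17% more (5.375 / 4.6 ≈ 1.17). In other words, the price of writing to the string pool from multiple threads is small.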
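The estimate above can be reproduced with a few lines of Java. This is only a worked version of the arithmetic in the text: the class and method names are made up for illustration, and the inputs are the rounded figures quoted above (4.4 sec, 4.2 sec, and 43 sec for 8 threads).

```java
import java.util.Locale;

public class InternOverheadEstimate {
    /** Linear extrapolation one step past the last measured point. */
    static double extrapolate(double last, double previous) {
        return last + (last - previous);
    }

    /** Relative overhead of 'measured' over 'baseline', in percent. */
    static double overheadPercent(double measured, double baseline) {
        return (measured / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // time(80M) and time(76M) from the single-threaded log, rounded as in the text
        double single = extrapolate(4.4, 4.2);
        // about 43 sec for a group of 8 threads, each adding one million strings
        double multi = 43.0 / 8;
        System.out.println(String.format(Locale.ROOT,
                "single = %.1f sec, multi = %.3f sec, overhead = %.0f%%",
                single, multi, overheadPercent(multi, single)));
        // prints: single = 4.6 sec, multi = 5.375 sec, overhead = 17%
    }
}
```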
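Testing whether the JVM string pool is thread local
With the synchronization overhead being so low, I started to wonder whether the JVM string pool is actually thread local. If it were, interning the same string from two different threads would give us two different objects. The test below starts a first thread that interns strings, sleeps for two seconds, and then starts a second, identical thread. We expect str.intern() == str in the first thread, because each string we intern there is new and is therefore stored in the JVM pool, and str.intern() != str in the second thread, because the string has already been added by the first thread.
This test has a side effect: at some point the second thread will catch up with the first, because reading from the JVM string pool is much faster than writing to it.
Here is the test code: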
    import java.util.ArrayList;
    import java.util.List;

    private static void multiThreadedLocalPoolTest( final int cnt ) throws InterruptedException
    {
        //first thread: expects every string to be new, so str.intern() == str
        final Runnable task1 = new Runnable() {
            @Override
            public void run() {
                final List<String> lst = new ArrayList<String>( 100 );
                long start = System.currentTimeMillis();
                for ( int i = 0; i < cnt; ++i )
                {
                    final String str = Integer.toString( i );
                    final String interned = str.intern();
                    if ( str != interned )
                        System.out.println( "Thread 0: different interned " + str );
                    lst.add( interned );
                    if ( i % 1000000 == 0 )
                    {
                        System.out.println( "Thread 0 : " + i + "; time = "
                                + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
                        start = System.currentTimeMillis();
                    }
                }
                System.out.println( "Total length = " + lst.size() );
            }
        };
        //second thread: expects every string to be pooled already, so str.intern() != str
        final Runnable task2 = new Runnable() {
            @Override
            public void run() {
                final List<String> lst = new ArrayList<String>( 100 );
                long start = System.currentTimeMillis();
                for ( int i = 0; i < cnt; ++i )
                {
                    final String str = Integer.toString( i );
                    final String interned = str.intern();
                    if ( str == interned )
                        System.out.println( "Thread 1: same interned " + str );
                    lst.add( interned );
                    if ( i % 1000000 == 0 )
                    {
                        System.out.println( "Thread 1 : " + i + "; time = "
                                + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
                        start = System.currentTimeMillis();
                    }
                }
                System.out.println( "Total length = " + lst.size() );
            }
        };
        final Thread thread1 = new Thread( task1 );
        thread1.start();
        Thread.sleep( 2000 ); //give the first thread a 2 second head start
        final Thread thread2 = new Thread( task2 );
        thread2.start();
    }
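The test output first shows some interned strings with values below 1000 (at least on my machine). After that nothing is printed until thread #1 catches up with thread #0. The output after that point is not meaningful. This test proves that there is no thread-local string pool (otherwise thread #1 would have printed every interned value from the very start).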
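The same conclusion can be checked deterministically with a much smaller program. The sketch below is not part of the original test suite: the class SharedPoolCheck and its helper method are hypothetical, and it assumes Java 8 or later for the lambda. It interns two distinct String objects with equal content on two different threads, one after the other, and compares the references they get back. The String.intern() contract requires s.intern() == t.intern() whenever s.equals(t), so a shared pool must hand both threads the same instance.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicReference;

public class SharedPoolCheck {
    /** Interns a fresh copy of 'content' on a new thread and returns the pooled instance. */
    static String internOnNewThread(final String content) {
        final AtomicReference<String> result = new AtomicReference<>();
        Thread t = new Thread(() -> {
            // a distinct String object with the same characters
            String fresh = new String(content.toCharArray());
            result.set(fresh.intern());
        });
        t.start();
        try {
            t.join(); // wait until the thread has finished interning
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return result.get();
    }

    public static void main(String[] args) {
        // random content, so it is very unlikely to be in the pool already
        String content = "shared-" + UUID.randomUUID();
        String fromFirst = internOnNewThread(content);
        String fromSecond = internOnNewThread(content);
        System.out.println("same instance across threads: " + (fromFirst == fromSecond));
    }
}
```

If the pool were thread local, each thread would add its own copy and the comparison would be false.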
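"1 writer, 7 readers" test
Finally, we test the performance of the "1 writer thread, 7 reader threads" scenario. We start 8 threads interning the same set of strings. This means that for each string only one thread adds it to the string pool, while the other threads merely look it up in the pool. We use the multiThreadedInternTest method from test 2. The only difference is that the interned strings do not contain a thread-specific prefix.
Here are the test results. One line is printed per million strings.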
Thread # 4 : 1000000; time = 0.807 sec
Thread # 6 : 7000000; time = 1.201 sec
Thread # 2 : 8000000; time = 1.244 sec
Thread # 2 : 10000000; time = 1.639 sec
Thread # 6 : 11000000; time = 1.65 sec
Thread # 6 : 12000000; time = 1.726 sec
Thread # 5 : 14000000; time = 1.588 sec
Thread # 7 : 15000000; time = 1.612 sec
Thread # 6 : 17000000; time = 1.715 sec
Thread # 6 : 18000000; time = 1.711 sec
Thread # 2 : 19000000; time = 1.762 sec
Thread # 3 : 21000000; time = 1.857 sec
Thread # 1 : 21000000; time = 1.858 sec
Thread # 3 : 22000000; time = 1.877 sec
Thread # 6 : 23000000; time = 1.991 sec
Thread # 7 : 25000000; time = 2.052 sec
Thread # 2 : 26000000; time = 2.15 sec
Thread # 3 : 27000000; time = 2.17 sec
Thread # 7 : 28000000; time = 2.145 sec
Thread # 0 : 31000000; time = 2.341 sec
Thread # 2 : 32000000; time = 2.353 sec
Thread # 1 : 33000000; time = 2.392 sec
Thread # 2 : 35000000; time = 2.548 sec
Thread # 7 : 36000000; time = 2.499 sec
Thread # 0 : 37000000; time = 2.532 sec
Thread # 0 : 38000000; time = 2.622 sec
Thread # 7 : 40000000; time = 2.748 sec
Thread # 6 : 41000000; time = 2.768 sec
Thread # 3 : 42000000; time = 2.835 sec
Thread # 0 : 44000000; time = 2.813 sec
Thread # 5 : 45000000; time = 2.979 sec
Thread # 4 : 46000000; time = 2.996 sec
Thread # 4 : 47000000; time = 3.067 sec
Thread # 7 : 49000000; time = 2.976 sec
Thread # 0 : 50000000; time = 3.191 sec
Thread # 3 : 51000000; time = 3.102 sec
Thread # 7 : 52000000; time = 3.214 sec
Thread # 3 : 54000000; time = 3.401 sec
Thread # 4 : 55000000; time = 3.409 sec
Thread # 2 : 56000000; time = 3.471 sec
Thread # 2 : 57000000; time = 3.448 sec
Thread # 6 : 59000000; time = 3.67 sec
Thread # 5 : 60000000; time = 3.797 sec
Thread # 6 : 61000000; time = 3.744 sec
Thread # 0 : 62000000; time = 3.748 sec
Thread # 1 : 64000000; time = 3.921 sec
Thread # 1 : 66000000; time = 4.042 sec
Thread # 4 : 68000000; time = 4.115 sec
Thread # 2 : 69000000; time = 4.167 sec
Thread # 2 : 70000000; time = 4.276 sec
Thread # 7 : 71000000; time = 4.23 sec
Thread # 7 : 73000000; time = 4.38 sec
Thread # 6 : 74000000; time = 4.439 sec
Thread # 3 : 75000000; time = 4.403 sec
Thread # 4 : 76000000; time = 4.414 sec
Thread # 6 : 77000000; time = 4.499 sec
Thread # 6 : 78000000; time = 4.582 sec
Thread # 0 : 80000000; time = 4.706 sec
Thread # 6 : 80000000; time = 4.706 sec
Thread # 1 : 80000000; time = 4.706 sec
Thread # 4 : 80000000; time = 4.706 sec
Thread # 2 : 80000000; time = 4.706 sec
Thread # 3 : 80000000; time = 4.706 sec
Thread # 7 : 80000000; time = 4.706 sec
Thread # 5 : 80000000; time = 4.706 sec
Comparing these results with the first test, we can see that this scenario adds roughly 9% overhead over the single-threaded case.
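Summary
- Feel free to use the String.intern() method in multithreaded code as well. The "8 writers" scenario has only 17% overhead compared to the "1 writer" (single-threaded) scenario. The "1 writer, 7 readers" scenario in these tests has 9% overhead compared to the single-threaded case.
- The JVM string pool is not thread local. Every string added to the pool is available to all threads in the JVM, which further reduces the program's memory consumption.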
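Reposted from: http://www.importnew.com/12452.html
Original article: java-performance. Translation: ImportNew.com - hejiani
Translation link: http://www.importnew.com/12452.html
[When reposting, please keep the original source, the translator, and the translation link.]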