String.intern in Java 7 and 8

Translated by ImportNew - hejiani from java-performance. For reposting, see the requirements at the end of the article.

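This article is a follow-up to "String.intern in Java 6, 7 and 8: string pooling". That article described how String.intern() is implemented in Java 7 and 8 and the benefits of using it. Because it was already quite long, I wrote this article to cover the performance characteristics of String.intern() when it is called from multiple threads.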

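The test programs call String.intern() from multiple threads. They emulate the behavior of many modern server applications (certain web crawlers, for example). To test a high-contention scenario, the programs were run on a new workstation with an Intel Xeon E5-2650 CPU (8 physical / 16 virtual cores at 2 GHz) and 128 GB of RAM. To use all of the physical cores, we create 8 threads.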

The four tests are:

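  1. Reference test: the testLongLoop method from the previous article calls String.intern() from a single thread, showing how fast it runs without any contention.
  2. Eight threads call String.intern() on distinct strings, each prefixed with the number of the calling thread. This test shows the synchronization overhead of String.intern(). In theory it is the worst case: a real application in which several threads do nothing but call String.intern() in a loop is very unlikely.
  3. The first thread starts interning a set of strings. After a 2-second delay, a second thread starts interning the same set. We expect str.intern() == str to hold in the first thread and str.intern() != str to hold in the second. This would prove that the JVM string pool is not thread-local.
  4. All 8 threads intern the same set of strings. This is closer to a real-life situation: a mix of additions to and lookups in the JVM string pool. Even so, read contention this high on the JVM string pool is rare.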

Single thread calling String.intern() in a loop: the reference test

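Test 1 uses the testLongLoop method from the previous article to add strings to the JVM string pool in a loop from a single thread. Unlike before, this time we print a time snapshot after every million strings. For example, the log line 3000000; time = 0.402 sec means that adding one million strings to a pool that already held two million took 0.402 seconds (the pool now holds three million). This program, like the others in this article, was run with -Xmx64G -XX:StringTableSize=1000003 (the latter sets the JVM string pool size).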

1000000; time = 0.362 sec
2000000; time = 0.377 sec
3000000; time = 0.402 sec
4000000; time = 0.434 sec
5000000; time = 0.36 sec
6000000; time = 0.364 sec
7000000; time = 0.351 sec
8000000; time = 0.379 sec
9000000; time = 0.445 sec
10000000; time = 0.487 sec
11000000; time = 0.518 sec
12000000; time = 0.59 sec
13000000; time = 0.695 sec
14000000; time = 0.768 sec
15000000; time = 0.815 sec
16000000; time = 0.873 sec
17000000; time = 0.954 sec
18000000; time = 1.031 sec
19000000; time = 1.081 sec
20000000; time = 1.12 sec
21000000; time = 1.17 sec
22000000; time = 1.194 sec
23000000; time = 1.264 sec
24000000; time = 1.291 sec
25000000; time = 1.352 sec
26000000; time = 1.421 sec
27000000; time = 1.476 sec
28000000; time = 1.514 sec
29000000; time = 1.612 sec
30000000; time = 1.643 sec
31000000; time = 1.695 sec
32000000; time = 1.703 sec
33000000; time = 1.81 sec
34000000; time = 1.854 sec
35000000; time = 1.943 sec
36000000; time = 1.937 sec
37000000; time = 2.0 sec
38000000; time = 2.102 sec
39000000; time = 2.124 sec
40000000; time = 2.212 sec
41000000; time = 2.225 sec
42000000; time = 2.305 sec
43000000; time = 2.344 sec
44000000; time = 2.379 sec
45000000; time = 2.46 sec
46000000; time = 2.557 sec
47000000; time = 2.656 sec
48000000; time = 2.627 sec
49000000; time = 2.629 sec
50000000; time = 2.735 sec
51000000; time = 2.738 sec
52000000; time = 2.823 sec
53000000; time = 2.861 sec
54000000; time = 2.974 sec
55000000; time = 3.027 sec
56000000; time = 3.05 sec
57000000; time = 3.088 sec
58000000; time = 3.161 sec
59000000; time = 3.244 sec
60000000; time = 3.31 sec
61000000; time = 3.327 sec
62000000; time = 3.382 sec
63000000; time = 3.445 sec
64000000; time = 3.524 sec
65000000; time = 3.669 sec
66000000; time = 3.596 sec
67000000; time = 3.673 sec
68000000; time = 3.705 sec
69000000; time = 3.752 sec
70000000; time = 3.81 sec
71000000; time = 3.898 sec
72000000; time = 3.93 sec
73000000; time = 4.0 sec
74000000; time = 4.133 sec
75000000; time = 4.109 sec
76000000; time = 4.193 sec
77000000; time = 4.182 sec
78000000; time = 4.283 sec
79000000; time = 4.349 sec
80000000; time = 4.395 sec

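As you can see, the time per million strings grows linearly with the pool size, which confirms that the string pool is implemented as a fixed-size hash map in which every bucket holds a linked list of strings. We will compare these numbers with the multithreaded results later.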
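The linear growth follows from simple bucket arithmetic: with a fixed number of buckets, the average chain that a lookup has to walk grows in proportion to the number of pooled strings. Here is a minimal sketch of that arithmetic, assuming strings hash uniformly across the buckets. The class and method names are illustrative and are not part of the JVM or of the original test code.

```java
public class StringTableLoad {
    /** Average number of entries per bucket, assuming a uniform hash distribution. */
    static double averageChainLength(long pooledStrings, int buckets) {
        return (double) pooledStrings / buckets;
    }

    public static void main(String[] args) {
        final int buckets = 1_000_003; // the -XX:StringTableSize value used in these tests
        for (long n = 10_000_000L; n <= 80_000_000L; n += 10_000_000L) {
            System.out.printf("%,d strings -> about %.1f entries per bucket%n",
                    n, averageChainLength(n, buckets));
        }
    }
}
```

Under this assumption a pool of 80 million strings averages roughly 80 entries per bucket, so each intern() call has to compare against a chain about eight times longer than at 10 million strings.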

8 threads interning different sets of strings

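The next test probes the synchronization limits of the JVM string pool: each thread creates its own unique set of strings and interns them in a loop. This is a very high-contention scenario. Real applications rarely have more than 2 or 3 simultaneous String.intern() calls (an assumption based on the core counts of current CPUs).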

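We use the following method for the multithreaded tests. The only difference between tests #2 and #4 is which strings get interned (read the comments in the method carefully).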

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

private static void multiThreadedInternTest( final int threads, final int cnt )
{
    final CountDownLatch latch = new CountDownLatch( threads );
    for ( int i = 0; i < threads; ++i )
    {
        final int threadNo = i;
        final Runnable task = new Runnable() {
            @Override
            public void run() {
                latch.countDown();
                try {
                    latch.await(); //start all threads simultaneously
                } catch ( InterruptedException ignored ) {
                    Thread.currentThread().interrupt(); //keep the interrupt status
                }

                //keep references so the interned strings stay reachable
                final List<String> lst = new ArrayList<String>( 100 );
                long start = System.currentTimeMillis();
                for ( int i = 0; i < cnt; ++i )
                {
                    //this line is used in 8 writers scenario
                    final String str = "Thread #" + threadNo + " : " + i;
                    //use the following line for 1 writer, 7 readers scenario
                    //final String str = "Thread #0 : " + i;
                    lst.add( str.intern() );
                    //one log line per million strings, matching the logs in this article
                    if ( i % 1000000 == 0 )
                    {
                        System.out.println( "Thread # " + threadNo + " : " + i +
                            "; time = " + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
                        start = System.currentTimeMillis();
                    }
                }
                System.out.println( "Total length = " + lst.size() );
            }
        };
        new Thread( task ).start();
    }
}

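Here is the output of the "8 concurrent writers" test (same JVM settings: -XX:StringTableSize=1000003 -Xmx64G). I left out the lines affected by garbage collection (identified with -verbose:gc). The log lines mean the same as in the previous test, except that each one now includes the thread number.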

Thread # 6 : 1000000; time = 5.922 sec
Thread # 3 : 1000000; time = 6.037 sec
Thread # 4 : 1000000; time = 6.065 sec
Thread # 0 : 1000000; time = 6.066 sec
Thread # 1 : 1000000; time = 6.075 sec
Thread # 5 : 1000000; time = 6.091 sec
Thread # 2 : 1000000; time = 6.169 sec
Thread # 7 : 1000000; time = 6.288 sec
Thread # 1 : 4000000; time = 4.991 sec
Thread # 0 : 4000000; time = 4.983 sec
Thread # 3 : 4000000; time = 5.067 sec
Thread # 2 : 4000000; time = 5.024 sec
Thread # 4 : 4000000; time = 5.028 sec
Thread # 6 : 4000000; time = 5.052 sec
Thread # 5 : 4000000; time = 5.102 sec
Thread # 7 : 4000000; time = 5.083 sec
Thread # 1 : 6000000; time = 7.012 sec
Thread # 0 : 6000000; time = 7.06 sec
Thread # 2 : 6000000; time = 6.99 sec
Thread # 3 : 6000000; time = 7.09 sec
Thread # 4 : 6000000; time = 7.045 sec
Thread # 6 : 6000000; time = 7.173 sec
Thread # 5 : 6000000; time = 7.16 sec
Thread # 7 : 6000000; time = 7.175 sec
Thread # 1 : 8000000; time = 9.098 sec
Thread # 0 : 8000000; time = 9.157 sec
Thread # 2 : 8000000; time = 9.157 sec
Thread # 3 : 8000000; time = 9.2 sec
Thread # 4 : 8000000; time = 9.222 sec
Thread # 6 : 8000000; time = 9.308 sec
Thread # 5 : 8000000; time = 9.314 sec
Thread # 7 : 8000000; time = 9.332 sec
Thread # 1 : 11000000; time = 12.987 sec
Thread # 2 : 11000000; time = 13.028 sec
Thread # 0 : 11000000; time = 13.063 sec
Thread # 4 : 11000000; time = 13.007 sec
Thread # 3 : 11000000; time = 13.04 sec
Thread # 6 : 11000000; time = 13.238 sec
Thread # 5 : 11000000; time = 13.268 sec
Thread # 7 : 11000000; time = 13.209 sec
Thread # 1 : 15000000; time = 21.826 sec
Thread # 2 : 15000000; time = 22.124 sec
Thread # 4 : 15000000; time = 22.142 sec
Thread # 3 : 15000000; time = 22.144 sec
Thread # 0 : 15000000; time = 22.384 sec
Thread # 7 : 15000000; time = 23.129 sec
Thread # 6 : 15000000; time = 23.228 sec
Thread # 5 : 15000000; time = 23.244 sec
Thread # 1 : 17000000; time = 32.329 sec
Thread # 2 : 17000000; time = 32.488 sec
Thread # 4 : 17000000; time = 32.489 sec
Thread # 3 : 17000000; time = 32.448 sec
Thread # 0 : 17000000; time = 32.603 sec
Thread # 7 : 17000000; time = 32.567 sec
Thread # 5 : 17000000; time = 32.574 sec
Thread # 6 : 17000000; time = 32.791 sec
Thread # 1 : 19000000; time = 37.914 sec
Thread # 2 : 19000000; time = 37.895 sec
Thread # 3 : 19000000; time = 37.827 sec
Thread # 4 : 19000000; time = 38.014 sec
Thread # 0 : 19000000; time = 37.981 sec
[GC 15464352K->15461583K(23223936K), 31.1850310 secs]
Thread # 7 : 19000000; time = 69.329 sec
Thread # 5 : 19000000; time = 69.291 sec
Thread # 6 : 19000000; time = 69.446 sec
Thread # 1 : 21000000; time = 43.171 sec
Thread # 2 : 21000000; time = 43.225 sec
Thread # 3 : 21000000; time = 43.265 sec
Thread # 4 : 21000000; time = 43.175 sec
Thread # 0 : 21000000; time = 43.206 sec
Thread # 7 : 21000000; time = 42.972 sec
Thread # 5 : 21000000; time = 42.983 sec
Thread # 6 : 21000000; time = 43.025 sec

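Read the results in groups of threads. For example, the fragment below shows that interning eight million strings (one million per thread) took nearly 43 seconds when the JVM pool already held 20M * 8 = 160M strings.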

Thread # 1 : 21000000; time = 43.171 sec
Thread # 2 : 21000000; time = 43.225 sec
Thread # 3 : 21000000; time = 43.265 sec
Thread # 4 : 21000000; time = 43.175 sec
Thread # 0 : 21000000; time = 43.206 sec
Thread # 7 : 21000000; time = 42.972 sec
Thread # 5 : 21000000; time = 42.983 sec
Thread # 6 : 21000000; time = 43.025 sec
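
Grouping log lines by their string count can be automated. The sketch below parses lines in the format shown above and averages the reported times per group, skipping anything that does not match, such as GC lines. The class and method names are illustrative and do not come from the original article. The sample input reuses two lines from the log plus the GC line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogGrouping {
    private static final Pattern LINE =
            Pattern.compile("Thread # (\\d+) : (\\d+); time = ([0-9.]+) sec");

    /** Maps each per-thread string count to the mean time reported by the threads. */
    static Map<Long, Double> averageTimePerGroup(List<String> lines) {
        Map<Long, List<Double>> groups = new TreeMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (!m.matches()) {
                continue; // skip GC lines and anything else that is not a timing line
            }
            groups.computeIfAbsent(Long.parseLong(m.group(2)), k -> new ArrayList<>())
                  .add(Double.parseDouble(m.group(3)));
        }
        Map<Long, Double> result = new TreeMap<>();
        for (Map.Entry<Long, List<Double>> e : groups.entrySet()) {
            double sum = 0;
            for (double t : e.getValue()) {
                sum += t;
            }
            result.put(e.getKey(), sum / e.getValue().size());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "Thread # 1 : 21000000; time = 43.171 sec",
                "[GC 15464352K->15461583K(23223936K), 31.1850310 secs]",
                "Thread # 2 : 21000000; time = 43.225 sec");
        System.out.println(averageTimePerGroup(sample));
    }
}
```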

Comparing single-threaded and multithreaded String.intern

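The first group not affected by garbage collection is the one processing string #11M (eleven million), which corresponds to interning string 88M in the single-threaded case. Let's make a rough estimate. Assuming the single-threaded results grow linearly, we need the single-threaded time(84M), which serves as the average time over the [80M; 88M] range. It is approximately time(80M) + (time(80M) - time(76M)) = 4.4 sec + (4.4 sec - 4.2 sec) = 4.6 sec. In multithreaded mode, adding one million strings under the same conditions takes about 43 sec / 8 = 5.375 sec. So adding strings to the pool from 8 concurrent threads costs only about 17% more. In other words, the price of multithreaded writes to the string pool is small.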
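
The estimate above can be reproduced in a few lines. This is only a sketch of the article's arithmetic, using the rounded values 4.4, 4.2 and 43 seconds quoted in the text. The class and method names are illustrative.

```java
public class OverheadEstimate {
    /** Linear extrapolation one step beyond the last measured point. */
    static double extrapolate(double last, double previous) {
        return last + (last - previous);
    }

    /** Relative overhead of measured over baseline, as a fraction (0.17 means 17%). */
    static double overhead(double measured, double baseline) {
        return measured / baseline - 1.0;
    }

    public static void main(String[] args) {
        // time(80M) and time(76M) from the single-threaded log, rounded as in the text
        double single = extrapolate(4.4, 4.2);
        // 43 seconds for 8 million strings added by 8 threads, expressed per million
        double multi = 43.0 / 8;
        System.out.printf("single = %.3f sec, multi = %.3f sec, overhead = %.0f%%%n",
                single, multi, overhead(multi, single) * 100);
    }
}
```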

Testing whether the JVM string pool is thread-local

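With synchronization overhead this low, I began to wonder whether the JVM string pool might actually be thread-local. If it were, interning the same string from two different threads would yield two different objects. The test below starts a first thread that interns strings, sleeps for two seconds, and then starts an identical second thread. We expect str.intern() == str in the first thread, because it interns a new string, which is then stored in the JVM pool. We expect str.intern() != str in the second thread, because the first thread has already added that string to the pool.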

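This test has a side effect: at some point the second thread catches up with the first, because reading from the JVM string pool is much faster than writing to it.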

Here is the test code:

import java.util.ArrayList;
import java.util.List;

private static void multiThreadedLocalPoolTest( final int cnt ) throws InterruptedException {
    //first thread: every string is new, so intern() should return the same object
    final Runnable task1 = new Runnable() {
        @Override
        public void run() {
            final List<String> lst = new ArrayList<String>( 100 );
            long start = System.currentTimeMillis();
            for ( int i = 0; i < cnt; ++i )
            {
                final String str = Integer.toString( i );
                final String interned = str.intern();
                if ( str != interned )
                    System.out.println( "Thread 0: different interned " + str );
                lst.add( interned );
                if ( i % 1000000 == 0 )
                {
                    System.out.println( "Thread 0 : " + i + "; time = " + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
                    start = System.currentTimeMillis();
                }
            }
            System.out.println( "Total length = " + lst.size() );
        }
    };

    //second thread: the strings are already pooled, so intern() should return another object
    final Runnable task2 = new Runnable() {
        @Override
        public void run() {
            final List<String> lst = new ArrayList<String>( 100 );
            long start = System.currentTimeMillis();
            for ( int i = 0; i < cnt; ++i )
            {
                final String str = Integer.toString( i );
                final String interned = str.intern();
                if ( str == interned )
                    System.out.println( "Thread 1: same interned " + str );
                lst.add( interned );
                if ( i % 1000000 == 0 )
                {
                    System.out.println( "Thread 1 : " + i + "; time = " + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
                    start = System.currentTimeMillis();
                }
            }
            System.out.println( "Total length = " + lst.size() );
        }
    };

    final Thread thread1 = new Thread( task1 );
    thread1.start();

    Thread.sleep( 2000 ); //give the first thread a 2-second head start

    final Thread thread2 = new Thread( task2 );
    thread2.start();
}

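The test output shows a few interned strings with values below 1000 (at least on my machine). After that the output stays empty until thread #1 catches up with thread #0. The output past that point is no longer meaningful. This test proves that there is no thread-local string pool (otherwise thread #1 would print every interned value from the very start).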
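
The same conclusion can be checked on a much smaller scale. The following sketch is not the author's test: it interns two distinct copies of one string on two separate threads and compares the returned references. The class and method names are illustrative.

```java
public class SharedPoolDemo {
    /** Interns a freshly built copy of text on a new thread and returns the pooled instance. */
    static String internOnNewThread(final String text) {
        final String[] result = new String[1];
        Thread t = new Thread(new Runnable() {
            @Override
            public void run() {
                // new String(...) guarantees a distinct object before interning
                result[0] = new String(text).intern();
            }
        });
        t.start();
        try {
            t.join(); // join() also makes the write to result[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the worker", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        // a runtime-built key, so it is not already pooled as a compile-time constant
        String key = "shared-pool-demo-" + System.nanoTime();
        String a = internOnNewThread(key);
        String b = internOnNewThread(key);
        System.out.println("same object from both threads: " + (a == b));
    }
}
```

With a pool shared by the whole JVM, both threads receive the same reference. A thread-local pool would return two different objects.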

"1 writer, 7 readers" test

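Finally, we measure the performance of the "1 writer, 7 readers" scenario. We start 8 threads interning the same set of strings, so only one thread adds a given string to the pool while the others just look it up. We use the multiThreadedInternTest method from test 2. The only difference is that the interned strings carry no thread-specific prefix.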

Here are the results, with one line printed per million strings.

Thread # 4 : 1000000; time = 0.807 sec
Thread # 6 : 7000000; time = 1.201 sec
Thread # 2 : 8000000; time = 1.244 sec
Thread # 2 : 10000000; time = 1.639 sec
Thread # 6 : 11000000; time = 1.65 sec
Thread # 6 : 12000000; time = 1.726 sec
Thread # 5 : 14000000; time = 1.588 sec
Thread # 7 : 15000000; time = 1.612 sec
Thread # 6 : 17000000; time = 1.715 sec
Thread # 6 : 18000000; time = 1.711 sec
Thread # 2 : 19000000; time = 1.762 sec
Thread # 3 : 21000000; time = 1.857 sec
Thread # 1 : 21000000; time = 1.858 sec
Thread # 3 : 22000000; time = 1.877 sec
Thread # 6 : 23000000; time = 1.991 sec
Thread # 7 : 25000000; time = 2.052 sec
Thread # 2 : 26000000; time = 2.15 sec
Thread # 3 : 27000000; time = 2.17 sec
Thread # 7 : 28000000; time = 2.145 sec
Thread # 0 : 31000000; time = 2.341 sec
Thread # 2 : 32000000; time = 2.353 sec
Thread # 1 : 33000000; time = 2.392 sec
Thread # 2 : 35000000; time = 2.548 sec
Thread # 7 : 36000000; time = 2.499 sec
Thread # 0 : 37000000; time = 2.532 sec
Thread # 0 : 38000000; time = 2.622 sec
Thread # 7 : 40000000; time = 2.748 sec
Thread # 6 : 41000000; time = 2.768 sec
Thread # 3 : 42000000; time = 2.835 sec
Thread # 0 : 44000000; time = 2.813 sec
Thread # 5 : 45000000; time = 2.979 sec
Thread # 4 : 46000000; time = 2.996 sec
Thread # 4 : 47000000; time = 3.067 sec
Thread # 7 : 49000000; time = 2.976 sec
Thread # 0 : 50000000; time = 3.191 sec
Thread # 3 : 51000000; time = 3.102 sec
Thread # 7 : 52000000; time = 3.214 sec
Thread # 3 : 54000000; time = 3.401 sec
Thread # 4 : 55000000; time = 3.409 sec
Thread # 2 : 56000000; time = 3.471 sec
Thread # 2 : 57000000; time = 3.448 sec
Thread # 6 : 59000000; time = 3.67 sec
Thread # 5 : 60000000; time = 3.797 sec
Thread # 6 : 61000000; time = 3.744 sec
Thread # 0 : 62000000; time = 3.748 sec
Thread # 1 : 64000000; time = 3.921 sec
Thread # 1 : 66000000; time = 4.042 sec
Thread # 4 : 68000000; time = 4.115 sec
Thread # 2 : 69000000; time = 4.167 sec
Thread # 2 : 70000000; time = 4.276 sec
Thread # 7 : 71000000; time = 4.23 sec
Thread # 7 : 73000000; time = 4.38 sec
Thread # 6 : 74000000; time = 4.439 sec
Thread # 3 : 75000000; time = 4.403 sec
Thread # 4 : 76000000; time = 4.414 sec
Thread # 6 : 77000000; time = 4.499 sec
Thread # 6 : 78000000; time = 4.582 sec
Thread # 0 : 80000000; time = 4.706 sec
Thread # 6 : 80000000; time = 4.706 sec
Thread # 1 : 80000000; time = 4.706 sec
Thread # 4 : 80000000; time = 4.706 sec
Thread # 2 : 80000000; time = 4.706 sec
Thread # 3 : 80000000; time = 4.706 sec
Thread # 7 : 80000000; time = 4.706 sec
Thread # 5 : 80000000; time = 4.706 sec

Comparing these results with the first test shows that this scenario adds about 9% overhead relative to the single-threaded case.

Summary

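  • Feel free to use String.intern() in multithreaded code as well. The "8 writers" scenario adds only 17% overhead compared with the "1 writer" (single-threaded) scenario. The "1 writer, 7 readers" scenario in our tests adds 9% overhead compared with the single-threaded case.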
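  • The JVM string pool is not thread-local. Every string added to the pool is available to all threads in the JVM, which further reduces the program's memory consumption.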

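Reposted from: http://www.importnew.com/12452.html

Original article: java-performance. Translation: ImportNew.com - hejiani
Translation link: http://www.importnew.com/12452.html
[ When reposting, please keep the original source, the translator, and the translation link. ]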


