String.intern in Java 6, 7 and 8 – string pooling

Original article: http://java-performance.info/string-intern-in-java-6-7-8

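Key points

  • In Java 6 the string pool is stored in PermGen. Because that area has a fixed size, using String.intern can easily cause an OutOfMemoryError. Avoid this method on Java 6.
    • PermGen has a fixed size and cannot be resized at runtime
  • In Java 7 and 8 the string pool is stored in the heap and is garbage collected
    • A pooled string becomes eligible for collection once nothing reachable from the program roots references it
  • String pool implementation:
    • Principle: the pool is a fixed-size hash table. Each bucket holds a list of the strings that hash to that bucket
    • Size:
      • The default is 1009 buckets; it was raised to 60013 in Java 7u40 (Java 8 uses the same value)
      • It is configurable with -XX:StringTableSize=N, starting somewhere between Java 6u30 and Java 6u41, and in Java 7 from the early releases (at least Java 7u02)
      • N should be a prime number for better performance
      • Use the -XX:StringTableSize JVM parameter in Java 7 and 8 to set the string pool table size. The size is fixed, because the pool is a hash table with linked lists in the buckets. Estimate the number of distinct strings your application intends to intern and set the pool size to a prime number close to twice that value (to reduce the likelihood of collisions). This lets String.intern run in constant time with a small memory cost per interned string (an explicitly used Java WeakHashMap consumes 4-5 times more memory for the same task).
    • Manual implementation (same principle as the JVM string pool)
      • Avoid using a hand-written string pool in your programs; it costs considerably more memory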
private static final WeakHashMap<String, WeakReference<String>> s_manualCache =
    new WeakHashMap<String, WeakReference<String>>( 100000 );

private static String manualIntern( final String str )
{
    final WeakReference<String> cached = s_manualCache.get( str );
    if ( cached != null )
    {
        final String value = cached.get();
        if ( value != null )
            return value;
    }
    s_manualCache.put( str, new WeakReference<String>( str ) );
    return str;
}

Article text

String pooling

String pooling (also known as string canonicalisation) is the process of replacing several String objects that have equal values but different identities with a single shared String object. You can do this by keeping your own Map<String, String> (possibly with soft or weak references, depending on your requirements) and using the map values as the canonical instances. Alternatively, you can use the String.intern() method that the JDK provides.
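The difference between equal value and shared identity can be checked directly. The following is a minimal sketch; the class and method names are illustrative only and do not come from the original article.

```java
// Minimal sketch: two equal strings with different identity collapse to one
// shared instance after String.intern(). Names here are illustrative only.
class InternIdentityDemo {

    /** Returns true when two distinct-but-equal strings share one instance after interning. */
    static boolean sharedAfterIntern(String text) {
        // new String(...) forces two separate objects with the same content
        String first = new String(text);
        String second = new String(text);
        boolean distinctBefore = first != second && first.equals(second);
        // intern() returns the canonical instance held by the JVM string pool
        boolean sameAfter = first.intern() == second.intern();
        return distinctBefore && sameAfter;
    }

    public static void main(String[] args) {
        System.out.println("shared after intern: " + sharedAfterIntern("some value"));
    }
}
```

Comparing with == is only safe here because both sides were interned; ordinary code should keep using equals().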

In the Java 6 era, many coding standards forbade String.intern() because a pool that grew out of control was very likely to end in an OutOfMemoryError. Oracle changed the string pool implementation considerably in Java 7. For details, see http://bugs.sun.com/view_bug.do?bug_id=6962931 and http://bugs.sun.com/view_bug.do?bug_id=6962930.

String.intern() in Java 6

In those good old days all interned strings were stored in PermGen, the fixed-size part of the heap used mainly for loaded classes and the string pool. Besides explicitly interned strings, the PermGen string pool also held every string literal your program had used so far. The important word here is "used": if a class was never loaded or a method never called, the constants defined in it were not loaded either.

The biggest issue with the Java 6 string pool was its location. PermGen has a fixed size and cannot be expanded at runtime. You can set that size with the -XX:MaxPermSize=N option; as far as I know, the default varies between 32M and 96M depending on the platform. You can raise the limit, but it remains a fixed limit. This called for very careful use of String.intern(): you had better not intern uncontrolled user input with it. That is why string pooling in the Java 6 era was mostly implemented with manually managed maps.

String.intern() in Java 7

Oracle engineers made an extremely important change to the string pooling logic in Java 7: the string pool was relocated to the heap. You are no longer limited by a separate fixed-size memory area. All strings now live in the heap like most other ordinary objects, so the heap size is the only thing you need to manage when tuning your application. Technically, this alone could be sufficient reason to reconsider using String.intern() in your Java 7 programs. But there are other reasons.

String pool values are garbage collected

Yes, every string in the JVM string pool is eligible for garbage collection once there are no references to it from your program roots. This applies to all versions of Java discussed here. If an interned string goes out of scope and nothing else references it, it will be garbage collected from the JVM string pool.

Being eligible for garbage collection and residing in the heap, the JVM string pool seems to be the right place for all your strings, doesn't it? In theory that is true: unused strings are collected from the pool, and strings still in use save memory whenever an equal string arrives from the input. It looks like a perfect memory-saving strategy. Nearly so. You must know how the string pool is implemented before making any decisions.

JVM string pool implementation in Java 6, 7 and 8

The string pool is implemented as a fixed-capacity hash map in which each bucket holds a list of the strings that hash to that bucket. Some implementation details can be found in the following Java bug report: http://bugs.sun.com/view_bug.do?bug_id=6962930.
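To make the data structure concrete, here is a toy model of a fixed-capacity chained hash table. It is a sketch for illustration, not HotSpot's actual code: the class FixedBucketPool and its methods are hypothetical. Because the bucket array never grows, chains get longer as more strings are added.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a fixed-capacity hash table with one chain per bucket.
// NOT the real JVM implementation; all names are hypothetical.
class FixedBucketPool {
    private final List<String>[] buckets;

    @SuppressWarnings("unchecked")
    FixedBucketPool(int bucketCount) {
        buckets = new List[bucketCount];
        for (int i = 0; i < bucketCount; i++) {
            buckets[i] = new ArrayList<String>();
        }
    }

    /** Returns the pooled instance equal to s, adding s to its chain if absent. */
    String intern(String s) {
        // floorMod keeps the index non-negative even for negative hash codes
        List<String> chain = buckets[Math.floorMod(s.hashCode(), buckets.length)];
        for (String candidate : chain) {   // linear scan of the chain
            if (candidate.equals(s)) {
                return candidate;
            }
        }
        chain.add(s);                      // the table itself is never resized
        return s;
    }

    /** Length of the longest chain, i.e. the worst-case number of comparisons. */
    int longestChain() {
        int longest = 0;
        for (List<String> chain : buckets) {
            longest = Math.max(longest, chain.size());
        }
        return longest;
    }

    public static void main(String[] args) {
        FixedBucketPool pool = new FixedBucketPool(1009);
        for (int i = 0; i < 100000; i++) {
            pool.intern(Integer.toString(i));
        }
        System.out.println("longest chain: " + pool.longestChain());
    }
}
```

Unlike java.util.HashMap, this model never rehashes, which is why the number of buckets chosen up front matters so much.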

The default pool size is 1009 buckets (the value appears in the source code quoted in the bug report above; it was increased in Java 7u40). It was a constant in early versions of Java 6 and became configurable somewhere between Java 6u30 and Java 6u41. In Java 7 it has been configurable from the beginning (at least, it is configurable in Java 7u02). Specify -XX:StringTableSize=N, where N is the size of the string pool map, and make N a prime number for better performance.

This parameter will not help you much in Java 6, because you are still limited by the fixed PermGen size. The rest of the discussion excludes Java 6.

Java 7 (until Java 7u40)

In Java 7, on the other hand, you are limited only by the much larger heap. You can therefore set the string pool size to a rather high value in advance (the right value depends on your application's requirements). As a rule, one starts worrying about memory consumption when the in-memory data set grows to at least several hundred megabytes. In that situation, allocating 8-16 MB for a string pool with one million entries seems a reasonable trade-off. Do not use 1,000,000 as the -XX:StringTableSize value, because it is not prime; use 1,000,003 instead.
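Finding a suitable prime by hand is tedious, so a small helper can do it. This is a sketch with hypothetical names (StringTableSizing, nextPrime); it uses plain trial division, which is fast enough for values in the range of a table size.

```java
// Sketch: pick the smallest prime that is >= a desired table size.
// Helper names are hypothetical and not part of any JDK API.
class StringTableSizing {

    static boolean isPrime(int n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        // trial division by odd numbers up to sqrt(n); long avoids overflow in d * d
        for (int d = 3; (long) d * d <= n; d += 2) {
            if (n % d == 0) return false;
        }
        return true;
    }

    static int nextPrime(int n) {
        int candidate = Math.max(n, 2);
        while (!isPrime(candidate)) {
            candidate++;
        }
        return candidate;
    }

    public static void main(String[] args) {
        // prints the JVM option for a pool of roughly one million buckets
        System.out.println("-XX:StringTableSize=" + nextPrime(1000000));
    }
}
```

For one million the helper yields 1,000,003, the value recommended above.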

You may expect a uniform distribution of interned strings across the buckets; see my experiments in the hashCode method performance tuning article.

If you intend to use String.intern() actively, you must set -XX:StringTableSize higher than the default 1009. Otherwise every lookup has to walk an ever longer bucket chain, and the method's performance soon degrades to that of a linked list.
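The degradation follows from simple arithmetic: with a uniform distribution, the average chain length is the number of interned strings divided by the number of buckets. The sketch below only performs that division; the class name ChainLengthEstimate is hypothetical, and the estimate assumes the uniform distribution mentioned above.

```java
// Sketch: average bucket chain length = interned strings / buckets,
// assuming strings are spread uniformly. Names are hypothetical.
class ChainLengthEstimate {

    static double averageChainLength(long internedStrings, int buckets) {
        return (double) internedStrings / buckets;
    }

    public static void main(String[] args) {
        long strings = 1000000L;
        int[] tableSizes = {1009, 100003, 1000003};
        for (int size : tableSizes) {
            // 1009 buckets give about 991 strings per chain, 100003 about 10,
            // and 1000003 about 1
            System.out.println(size + " buckets -> "
                + averageChainLength(strings, size) + " strings per bucket");
        }
    }
}
```

With the default 1009 buckets, one million strings mean chains of roughly a thousand entries, and each call to intern() may have to scan one of them.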

I have not noticed any dependency between string length and the time to intern a string for strings under 100 characters. (I feel that duplicates of even 50-character strings are rather unlikely in real-world data, so 100 characters seems a good test limit to me.)

Here is an extract from the test application log with the default pool size. The first number is how many strings were already interned; the second is the time taken to intern the next 10,000 strings. The interned values were Integer.toString( i ) for i between 0 and 999,999:

0; time = 0.0 sec
50000; time = 0.03 sec
100000; time = 0.073 sec
150000; time = 0.13 sec
200000; time = 0.196 sec
250000; time = 0.279 sec
300000; time = 0.376 sec
350000; time = 0.471 sec
400000; time = 0.574 sec
450000; time = 0.666 sec
500000; time = 0.755 sec
550000; time = 0.854 sec
600000; time = 0.916 sec
650000; time = 1.006 sec
700000; time = 1.095 sec
750000; time = 1.273 sec
800000; time = 1.248 sec
850000; time = 1.446 sec
900000; time = 1.585 sec
950000; time = 1.635 sec
1000000; time = 1.913 sec

These test results were obtained on an Intel Core laptop CPU. As you can see, the times grow linearly, and I could intern only about 5,000 strings per second once the JVM string pool held one million strings. That is unacceptably slow for most applications that handle a large amount of data in memory.

Now the same test with the -XX:StringTableSize=100003 option:

50000; time = 0.017 sec
100000; time = 0.009 sec
150000; time = 0.01 sec
200000; time = 0.009 sec
250000; time = 0.007 sec
300000; time = 0.008 sec
350000; time = 0.009 sec
400000; time = 0.009 sec
450000; time = 0.01 sec
500000; time = 0.013 sec
550000; time = 0.011 sec
600000; time = 0.012 sec
650000; time = 0.015 sec
700000; time = 0.015 sec
750000; time = 0.01 sec
800000; time = 0.01 sec
850000; time = 0.011 sec
900000; time = 0.011 sec
950000; time = 0.012 sec
1000000; time = 0.012 sec

As you can see, inserting strings into the pool now takes nearly constant time (there are no more than 10 strings per bucket on average). Here are the results with the same settings, but inserting up to 10 million strings into the pool (about 100 strings per bucket on average):

2000000; time = 0.024 sec
3000000; time = 0.028 sec
4000000; time = 0.053 sec
5000000; time = 0.051 sec
6000000; time = 0.034 sec
7000000; time = 0.041 sec
8000000; time = 0.089 sec
9000000; time = 0.111 sec
10000000; time = 0.123 sec

Now let's increase the pool size to one million buckets (1,000,003 to be precise):

1000000; time = 0.005 sec
2000000; time = 0.005 sec
3000000; time = 0.005 sec
4000000; time = 0.004 sec
5000000; time = 0.004 sec
6000000; time = 0.009 sec
7000000; time = 0.01 sec
8000000; time = 0.009 sec
9000000; time = 0.009 sec
10000000; time = 0.009 sec

As you can see, the times are flat and differ little from the "zero to one million" table for the ten times smaller string pool. Even my slow laptop can add one million new strings per second to the JVM string pool, provided the pool size is high enough.

Shall we still use manual string pools?

Now we need to compare the JVM string pool with a manual pool built on a WeakHashMap:

private static final WeakHashMap<String, WeakReference<String>> s_manualCache =
    new WeakHashMap<String, WeakReference<String>>( 100000 );

private static String manualIntern( final String str )
{
    final WeakReference<String> cached = s_manualCache.get( str );
    if ( cached != null )
    {
        final String value = cached.get();
        if ( value != null )
            return value;
    }
    s_manualCache.put( str, new WeakReference<String>( str ) );
    return str;
}

This is the output of the same test using the manual pool:

0; manual time = 0.001 sec
50000; manual time = 0.03 sec
100000; manual time = 0.034 sec
150000; manual time = 0.008 sec
200000; manual time = 0.019 sec
250000; manual time = 0.011 sec
300000; manual time = 0.011 sec
350000; manual time = 0.008 sec
400000; manual time = 0.027 sec
450000; manual time = 0.008 sec
500000; manual time = 0.009 sec
550000; manual time = 0.008 sec
600000; manual time = 0.008 sec
650000; manual time = 0.008 sec
700000; manual time = 0.008 sec
750000; manual time = 0.011 sec
800000; manual time = 0.007 sec
850000; manual time = 0.008 sec
900000; manual time = 0.008 sec
950000; manual time = 0.008 sec
1000000; manual time = 0.008 sec

The manually written pool provided comparable performance while the JVM had sufficient memory. Unfortunately, in my test case (interning String.valueOf( N ) for 0 < N < 1,000,000,000, that is, very short strings), it let me keep only about 2.5M such strings with -Xmx1280M. The JVM string pool (size=1,000,003), on the other hand, kept the same flat performance until the JVM ran out of memory with 12.72M strings in the pool (5 times more). I think this is a valuable hint to get rid of manual string pooling in your programs.
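For reference, the manual pool from this section can be exercised on its own to confirm that it canonicalises equal strings. The sketch below repeats the article's manualIntern logic inside a hypothetical wrapper class (ManualPoolDemo); the check method is an addition for illustration. Note that every pooled string costs a WeakHashMap entry plus a separate WeakReference object, which is consistent with the higher memory use reported above.

```java
import java.lang.ref.WeakReference;
import java.util.WeakHashMap;

// Sketch: the article's WeakHashMap-based pool wrapped in a hypothetical
// demo class, plus a small check that it returns one canonical instance.
class ManualPoolDemo {
    private static final WeakHashMap<String, WeakReference<String>> CACHE =
        new WeakHashMap<String, WeakReference<String>>();

    static String manualIntern(final String str) {
        final WeakReference<String> cached = CACHE.get(str);
        if (cached != null) {
            final String value = cached.get();
            if (value != null) {
                return value;          // an equal string is already pooled
            }
        }
        // the key is held weakly by the map, the value weakly by the reference
        CACHE.put(str, new WeakReference<String>(str));
        return str;
    }

    /** True when two equal but distinct strings resolve to the first instance. */
    static boolean canonicalises() {
        String first = new String("42");
        String second = new String("42");
        String a = manualIntern(first);
        String b = manualIntern(second);
        // 'first' stays strongly reachable here, so it cannot be collected mid-check
        return a == first && b == first;
    }

    public static void main(String[] args) {
        System.out.println("canonicalises: " + canonicalises());
    }
}
```

Like the original, this pool is not thread-safe; WeakHashMap needs external synchronisation when shared between threads.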

String.intern() in Java 7u40+ and Java 8

The default string pool size was increased to 60013 in Java 7u40 (this was a major performance update). This value lets you hold approximately 30,000 distinct strings in the pool before you start experiencing collisions. Generally, this is sufficient for data that is actually worth interning. You can check the value with the -XX:+PrintFlagsFinal JVM parameter.
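The value can also be read from inside a running program. The sketch below assumes a HotSpot-based JVM on Java 7 or later, because it relies on the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean; on other JVMs the bean or the option may be missing and the call may throw. The class name StringTableSizeProbe is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Sketch, assuming a HotSpot JVM: read the effective StringTableSize at runtime.
// The class name is hypothetical; the MXBean is HotSpot-specific.
class StringTableSizeProbe {

    static long currentStringTableSize() {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException if the option does not exist
        return Long.parseLong(bean.getVMOption("StringTableSize").getValue());
    }

    public static void main(String[] args) {
        System.out.println("StringTableSize = " + currentStringTableSize());
    }
}
```

Running it with and without -XX:StringTableSize=N shows whether the option took effect; the default printed depends on the JVM version in use.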

I have run the same tests on the original release of Java 8. Java 8 still accepts the -XX:StringTableSize parameter and provides performance comparable to Java 7. The only important difference is that the default pool size in Java 8 is 60013:

50000; time = 0.019 sec
100000; time = 0.009 sec
150000; time = 0.009 sec
200000; time = 0.009 sec
250000; time = 0.009 sec
300000; time = 0.009 sec
350000; time = 0.011 sec
400000; time = 0.012 sec
450000; time = 0.01 sec
500000; time = 0.013 sec
550000; time = 0.013 sec
600000; time = 0.014 sec
650000; time = 0.018 sec
700000; time = 0.015 sec
750000; time = 0.029 sec
800000; time = 0.018 sec
850000; time = 0.02 sec
900000; time = 0.017 sec
950000; time = 0.018 sec
1000000; time = 0.021 sec

Test code

The test code for this article is rather simple: a method creates and interns new strings in a loop. It also measures the time taken to intern each batch of 10,000 strings. It is worth running this program with the -verbose:gc JVM parameter to see when garbage collections happen and what they collect. You may also want to specify the maximum heap size with the -Xmx parameter.

There are 2 tests. The first, testStringPoolGarbageCollection, shows that the JVM string pool really is garbage collected: check the garbage collection log messages as well as the time taken to intern the strings on the second pass. This test fails with the default PermGen size on Java 6, so either increase PermGen, change the test method argument, or use Java 7.

The second test shows how many interned strings can be stored in memory. Run it on Java 6 with 2 different memory settings, for example -Xmx128M and -Xmx1280M (10 times more). Most likely you will see that the heap size does not affect the number of strings you can put in the pool. In Java 7, on the other hand, you will be able to fill the whole heap with your strings.

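import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.WeakHashMap;

/**
 * Testing String.intern.
 *
 * Run this class at least with -verbose:gc JVM parameter.
 */
public class InternTest {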
    public static void main( String[] args ) {
        testStringPoolGarbageCollection();
        testLongLoop();
    }

    /**
     * Use this method to see where interned strings are stored
     * and how many of them can you fit for the given heap size.
     */
    private static void testLongLoop()
    {
        test( 1000 * 1000 * 1000 );
        //uncomment the following line to see the hand-written cache performance
        //testManual( 1000 * 1000 * 1000 );
    }

    /**
     * Use this method to check that not used interned strings are garbage collected.
     */
    private static void testStringPoolGarbageCollection()
    {
        //first method call - use it as a reference
        test( 1000 * 1000 );
        //we are going to clean the cache here.
        System.gc();
        //check the memory consumption and how long does it take to intern strings
        //in the second method call.
        test( 1000 * 1000 );
    }

    private static void test( final int cnt )
    {
        final List<String> lst = new ArrayList<String>( 100 );
        long start = System.currentTimeMillis();
        for ( int i = 0; i < cnt; ++i )
        {
            final String str = "Very long test string, which tells you about something " +
                "very-very important, definitely deserving to be interned #" + i;
            //to test the dependency on string length, replace the declaration above with the following line
            //final String str = Integer.toString( i );
            lst.add( str.intern() );
            if ( i % 10000 == 0 )
            {
                System.out.println( i + "; time = " + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
                start = System.currentTimeMillis();
            }
        }
        System.out.println( "Total length = " + lst.size() );
    }

    private static final WeakHashMap<String, WeakReference<String>> s_manualCache =
        new WeakHashMap<String, WeakReference<String>>( 100000 );

    private static String manualIntern( final String str )
    {
        final WeakReference<String> cached = s_manualCache.get( str );
        if ( cached != null )
        {
            final String value = cached.get();
            if ( value != null )
                return value;
        }
        s_manualCache.put( str, new WeakReference<String>( str ) );
        return str;
    }

    private static void testManual( final int cnt )
    {
        final List<String> lst = new ArrayList<String>( 100 );
        long start = System.currentTimeMillis();
        for ( int i = 0; i < cnt; ++i )
        {
            final String str = "Very long test string, which tells you about something " +
                "very-very important, definitely deserving to be interned #" + i;
            lst.add( manualIntern( str ) );
            if ( i % 10000 == 0 )
            {
                System.out.println( i + "; manual time = " + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
                start = System.currentTimeMillis();
            }
        }
        System.out.println( "Total length = " + lst.size() );
    }
}

Summary

  • Stay away from the String.intern() method on Java 6, because the JVM string pool is stored in a fixed-size memory area (PermGen).
  • Java 7 and 8 implement the string pool in heap memory, so string pooling in Java 7 and 8 is limited only by the memory available to the whole application.
  • Use the -XX:StringTableSize JVM parameter in Java 7 and 8 to set the string pool map size. The size is fixed, because the pool is implemented as a hash map with lists in the buckets. Estimate the number of distinct strings your application intends to intern and set the pool size to a prime number close to twice that value (to reduce the likelihood of collisions). This lets String.intern run in constant time with a rather small memory cost per interned string (an explicitly used Java WeakHashMap consumes 4-5 times more memory for the same task).
  • The default value of the -XX:StringTableSize parameter is 1009 in Java 6 and in Java 7 until Java 7u40. It was increased to 60013 in Java 7u40 (Java 8 uses the same value).
  • If you are not sure about the string pool usage, try the -XX:+PrintStringTableStatistics JVM argument. It prints the string pool usage when your program terminates.