OutOfMemoryError Caused by substring and split in Java, and How to Avoid It

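The following example shows how String's substring method can cause an OutOfMemoryError. This behavior applies to JDK 6 and to JDK 7 before update 6, whose String class stores its characters in a char[] together with offset and count fields. Starting with JDK 7u6, substring copies the characters it returns, so the retention problem described here no longer occurs.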


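```java
import java.util.ArrayList;

public class TestGC {
    // Each TestGC instance holds a 100,000-char string (about 200 KB)
    private String large = new String(new char[100000]);

    public String getSubString() {
        return this.large.substring(0, 2);
    }

    public static void main(String[] args) {
        ArrayList<String> subStrings = new ArrayList<String>();
        for (int i = 0; i < 1000000; i++) {
            TestGC testGC = new TestGC();
            // Only the 2-char substring is kept; the TestGC object itself becomes garbage
            subStrings.add(testGC.getSubString());
        }
    }
}
```

Running this program produces: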

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Why does this happen? The source code of the substring method in the JDK's String class reveals the reason:


```java
public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}
```

The last line of this method calls a package-private constructor of String:


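```java
// Package private constructor which shares value array for speed.
String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}
```

Its package-private access and its comment show that Sun wrote this constructor specifically to optimize performance. To avoid copying memory, it does not create a new char array. Instead, it reuses the original String object's char[] directly and marks out each string's content only through the offset and count fields. In other words, the small String returned by substring still points to the large char[] of the original String. As long as the small string is reachable, the whole 100,000-char array stays reachable too and cannot be garbage-collected. In the example, one million such arrays are kept alive, and that is what causes the OutOfMemoryError.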
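The sharing can be made visible without depending on a particular JDK version. Below is a minimal sketch that models the JDK 6 layout. The class `SliceDemo.Slice` and its methods (`substring`, `length`, `retainedChars`) are hypothetical names invented for illustration. This is not the real `java.lang.String`. It is only a toy that shares its backing array the same way the package-private constructor above does.

```java
public class SliceDemo {
    // Hypothetical model of the JDK 6 String layout (NOT java.lang.String):
    // a possibly shared char[] plus an offset and a count.
    static final class Slice {
        private final char[] value; // backing array, possibly shared
        private final int offset;   // index of the first character in value
        private final int count;    // number of characters in this slice

        // Mirrors the package-private JDK 6 constructor: keeps the array, no copy
        Slice(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        // Same idea as JDK 6 substring: adjust offset/count, share the array
        Slice substring(int beginIndex, int endIndex) {
            if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
                throw new StringIndexOutOfBoundsException();
            }
            return new Slice(value, offset + beginIndex, endIndex - beginIndex);
        }

        int length() {
            return count;
        }

        // Number of chars kept reachable through this slice
        int retainedChars() {
            return value.length;
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    public static void main(String[] args) {
        Slice large = new Slice(new char[100000], 0, 100000);
        Slice small = large.substring(0, 2);
        // prints "2 chars visible, 100000 chars retained"
        System.out.println(small.length() + " chars visible, "
                + small.retainedChars() + " chars retained");
    }
}
```

A two-character `Slice` keeps all 100,000 chars reachable. In the TestGC program, that happens one million times.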
Having found the cause, modify the getSubString method in the code above as follows:


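```java
public String getSubString() {
    // Wrap the result in a new String so it gets its own, right-sized char[]
    return new String(this.large.substring(0, 2));
}
```

Here the result of substring is wrapped in a new String. When the program is run again, the OutOfMemoryError no longer occurs. Why? Because this time the public String(String original) constructor is called. Its source is: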


```java
public String(String original) {
    int size = original.count;
    char[] originalValue = original.value;
    char[] v;
    if (originalValue.length > size) {
        // The array representing the String is bigger than the new
        // String itself.  Perhaps this constructor is being called
        // in order to trim the baggage, so make a copy of the array.
        int off = original.offset;
        v = Arrays.copyOfRange(originalValue, off, off+size);
    } else {
        // The array representing the String is the same
        // size as the String, so no point in making a copy.
        v = originalValue;
    }
    this.offset = 0;
    this.count = size;
    this.value = v;
}
```

As the code shows, when the original String's value array is longer than its count, the constructor creates a new char[] and copies only the characters the string actually uses. The new String therefore no longer references the large array, so that array can be garbage-collected together with the TestGC object.
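A small toy model makes the effect of this copy visible. The sketch below is hypothetical: the `TrimCopyDemo.Slice` class is invented for illustration and is not JDK code. Its copy constructor follows the logic of the JDK 6 `String(String original)` constructor shown above. Running it shows the retained array shrinking from 100,000 chars to 2.

```java
import java.util.Arrays;

public class TrimCopyDemo {
    // Hypothetical JDK 6 style string: shared char[] plus offset and count
    static final class Slice {
        final char[] value;
        final int offset;
        final int count;

        // Sharing constructor: keeps a reference to the given array
        Slice(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        // Copy constructor following the logic of JDK 6 String(String original)
        Slice(Slice original) {
            int size = original.count;
            char[] originalValue = original.value;
            if (originalValue.length > size) {
                // Backing array is bigger than this slice: copy just the used range
                int off = original.offset;
                this.value = Arrays.copyOfRange(originalValue, off, off + size);
            } else {
                // Array already has exactly the right size: sharing it is harmless
                this.value = originalValue;
            }
            this.offset = 0;
            this.count = size;
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    public static void main(String[] args) {
        char[] big = new char[100000];
        big[0] = 'o';
        big[1] = 'k';
        Slice shared = new Slice(big, 0, 2); // like JDK 6 substring(0, 2)
        Slice trimmed = new Slice(shared);   // like new String(...) on JDK 6
        // prints "100000 -> 2 (ok)"
        System.out.println(shared.value.length + " -> " + trimmed.value.length
                + " (" + trimmed + ")");
    }
}
```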


Besides substring, String's split method has the same problem. The source of split is:


```java
public String[] split(String regex, int limit) {
    return Pattern.compile(regex).split(this, limit);
}
```

As you can see, String's split method is implemented by Pattern's split method, whose source is:


```java
public String[] split(CharSequence input, int limit) {
    int index = 0;
    boolean matchLimited = limit > 0;
    ArrayList<String> matchList = new ArrayList<String>();
    Matcher m = matcher(input);

    // Add segments before each match found
    while(m.find()) {
        if (!matchLimited || matchList.size() < limit - 1) {
            String match = input.subSequence(index, m.start()).toString();
            matchList.add(match);
            index = m.end();
        } else if (matchList.size() == limit - 1) { // last one
            String match = input.subSequence(index,
                                             input.length()).toString();
            matchList.add(match);
            index = m.end();
        }
    }

    // If no match was found, return this
    if (index == 0)
        return new String[] {input.toString()};

    // Add remaining segment
    if (!matchLimited || matchList.size() < limit)
        matchList.add(input.subSequence(index, input.length()).toString());

    // Construct result
    int resultSize = matchList.size();
    if (limit == 0)
        while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
            resultSize--;
    String[] result = new String[resultSize];
    return matchList.subList(0, resultSize).toArray(result);
}
```

The line `String match = input.subSequence(index, m.start()).toString();` in this method calls String's subSequence method, whose source is:


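```java
public CharSequence subSequence(int beginIndex, int endIndex) {
    return this.substring(beginIndex, endIndex);
}
```

As the code shows, subSequence simply calls String's substring method, so split has the same problem. Calling toString() on a String returns that same String object, so no copy is made there either. The small strings produced by split therefore use the original String object's char[] directly.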
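On an affected JDK, the same `new String(...)` fix used for getSubString can be applied to each piece that split returns. Below is a minimal sketch, assuming you control the call site. `splitDetached` is a hypothetical helper name, not a JDK method. On JDK 7u6 and later, split already returns strings that do not share the input's array, so there the extra `new String` calls only create a few additional short-lived objects.

```java
import java.util.Arrays;

public class SplitDetach {
    // Hypothetical helper: String.split followed by the new String(...) fix,
    // so on JDK 6 no piece keeps the whole input char[] reachable.
    static String[] splitDetached(String input, String regex) {
        String[] parts = input.split(regex);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = new String(parts[i]);
        }
        return parts;
    }

    public static void main(String[] args) {
        String line = "id=42;name=alice;role=admin";
        // prints [id=42, name=alice, role=admin]
        System.out.println(Arrays.toString(splitDetached(line, ";")));
    }
}
```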

 


The substring methods of StringBuilder and StringBuffer, on the other hand, do not have this problem. The source is:


```java
public String substring(int start, int end) {
    if (start < 0)
        throw new StringIndexOutOfBoundsException(start);
    if (end > count)
        throw new StringIndexOutOfBoundsException(end);
    if (start > end)
        throw new StringIndexOutOfBoundsException(end - start);
    return new String(value, start, end - start);
}
```

The last line calls a public String constructor, whose source is:


```java
public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.offset = 0;
    this.count = count;
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}
```

This constructor does not use the original char[] directly. It copies the requested range into a new array.
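StringBuilder.toString() always creates its String from a copy of the builder's buffer, so a builder can also be used to extract a range that owns its own array. Below is a minimal sketch; `copyOf` is a hypothetical helper name. `new StringBuilder(s).substring(begin, end)` would first copy the whole source string into the builder. Appending only the needed range instead copies just those characters (once into the builder, and once more in toString()).

```java
public class CopySubstring {
    // Hypothetical helper: returns s[begin, end) as a String with its own array.
    // Only the requested range is appended to the builder, and
    // StringBuilder.toString() creates the String from a copy of the buffer,
    // so the result never shares the char[] of s.
    static String copyOf(String s, int begin, int end) {
        if (begin < 0 || end > s.length() || begin > end) {
            throw new StringIndexOutOfBoundsException(
                    "begin " + begin + ", end " + end + ", length " + s.length());
        }
        return new StringBuilder(end - begin).append(s, begin, end).toString();
    }

    public static void main(String[] args) {
        String large = "header:" + new String(new char[100000]);
        // prints "header"
        System.out.println(copyOf(large, 0, 6));
    }
}
```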
