主要分析
Stringtokenizer > string.subString > splitter.on(guava) 三種字符串截取類
1.首先介紹 String.subString () 方法 :不支持正則;
public String substring(int beginIndex, int endIndex) {
int length = length();
checkBoundsBeginEnd(beginIndex, endIndex, length);
int subLen = endIndex - beginIndex;
if (beginIndex == 0 && endIndex == length) {
return this;
}
return isLatin1() ? StringLatin1.newString(value, beginIndex, subLen)
: StringUTF16.newString(value, beginIndex, subLen);
}
它首先new一個string 對象,對於這個string對象如何賦值,如何截取。下一個函數public static String newString(byte[] val, int index, int len) {
return new String(Arrays.copyOfRange(val, index, index + len),
LATIN1);
}
Arrays.copyOfRange() // 使用這個方法
public static byte[] copyOfRange(byte[] original, int from, int to) {
int newLength = to - from;
if (newLength < 0)
throw new IllegalArgumentException(from + " > " + to);
byte[] copy = new byte[newLength];
System.arraycopy(original, from, copy, 0,
Math.min(original.length - from, newLength));
return copy;
}
最後還是歸結到 System.arraycopy() 這個使用 C語言使用的native方法上,還是 多佔用了內存,但是在性能上使用了單次遍歷截取,沒有回溯。
2. Splitter.on() 這個方法性能就很低,他是guava包中的新方法,代碼使用了兩次循環遍歷並錯誤回溯的 樸素字符串匹配,性能極差;
優點就是返回一個list集合,編碼很簡潔,但是不排除有很多坑,源碼註釋不夠多;
這是一張簡單的性能對比圖面;
下面是一個源碼;
public static Splitter on(final String separator) {
checkArgument(separator.length() != 0, "The separator may not be the empty string.");
if (separator.length() == 1) {
return Splitter.on(separator.charAt(0));
}
return new Splitter(
new Strategy() {
@Override
public SplittingIterator iterator(Splitter splitter, CharSequence toSplit) {
return new SplittingIterator(splitter, toSplit) {
@Override
public int separatorStart(int start) {
int separatorLength = separator.length();
positions:
for (int p = start, last = toSplit.length() - separatorLength; p <= last; p++) {
for (int i = 0; i < separatorLength; i++) {
if (toSplit.charAt(i + p) != separator.charAt(i)) {
continue positions;
}
}
return p;
}
return -1;
}
@Override
public int separatorEnd(int separatorPosition) {
return separatorPosition + separator.length();
}
};
}
});
}
這是Splitter.on(String )的核心方法
上圖代碼就展示了它的極差性能體現之處。
3. Stringtokenier 這個類是三種方法中性能最快的,但是它的返回 只能通過迭代獲取每一個值,在代碼簡潔性上不如上兩種。
因爲它不支持正則,所以它不需要迭代回溯 ;
它分爲初始化,和迭代求解兩個過程 完成字符串截取; 但它所要維護的類變量會多一些,內存少量增加;
private int scanToken(int startPos) {
int position = startPos;
while (position < maxPosition) {
if (!hasSurrogates) {
char c = str.charAt(position);
if ((c <= maxDelimCodePoint) && (delimiters.indexOf(c) >= 0))
break;
position++;
} else {
int c = str.codePointAt(position);
if ((c <= maxDelimCodePoint) && isDelimiter(c))
break;
position += Character.charCount(c);
}
}
if (retDelims && (startPos == position)) {
if (!hasSurrogates) {
char c = str.charAt(position);
if ((c <= maxDelimCodePoint) && (delimiters.indexOf(c) >= 0))
position++;
} else {
int c = str.codePointAt(position);
if ((c <= maxDelimCodePoint) && isDelimiter(c))
position += Character.charCount(c);
}
}
return position;
}
這是它的核心函數; 可見它是不支持正則的
4. string.split() 速度一般但支持正則;
String.split()也可以支持正則,返回數組;
核心函數:
public String[] split(String regex, int limit) {
/* fastpath if the regex is a
(1)one-char String and this character is not one of the
RegEx's meta characters ".$|()[{^?*+\\", or
(2)two-char String and the first char is the backslash and
the second is not the ascii digit or ascii letter.
*/
char ch = 0;
if (((regex.length() == 1 &&
".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
(regex.length() == 2 &&
regex.charAt(0) == '\\' &&
(((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
((ch-'a')|('z'-ch)) < 0 &&
((ch-'A')|('Z'-ch)) < 0)) &&
(ch < Character.MIN_HIGH_SURROGATE ||
ch > Character.MAX_LOW_SURROGATE))
{
int off = 0;
int next = 0;
boolean limited = limit > 0;
ArrayList<String> list = new ArrayList<>();
while ((next = indexOf(ch, off)) != -1) {
if (!limited || list.size() < limit - 1) {
list.add(substring(off, next));
off = next + 1;
} else { // last one
//assert (list.size() == limit - 1);
int last = length();
list.add(substring(off, last));
off = last;
break;
}
}
// If no match was found, return this
if (off == 0)
return new String[]{this};
// Add remaining segment
if (!limited || list.size() < limit)
list.add(substring(off, length()));
// Construct result
int resultSize = list.size();
if (limit == 0) {
while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
resultSize--;
}
}
String[] result = new String[resultSize];
return list.subList(0, resultSize).toArray(result);
}
return Pattern.compile(regex).split(this, limit);
}