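Version: JDK 1.8
1. Overview of the String class
The String class represents character strings. Every string literal in a Java program, such as "abc", is an instance of this class.
String is immutable: the class is declared final (so it cannot be subclassed), and its characters live in a private final char array that no public method modifies. Any operation that appears to change a String leaves the receiver untouched and returns a String object, normally a new one.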
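A minimal sketch of what immutability means in practice. The class name ImmutabilityDemo and the method transform are illustrative only and not part of the JDK.

```java
class ImmutabilityDemo {
    // Applies two "modifying" operations and returns {original, upper, joined}.
    static String[] transform(String original) {
        String upper = original.toUpperCase();   // a new String
        String joined = original.concat("def");  // another new String
        return new String[] {original, upper, joined};
    }

    public static void main(String[] args) {
        for (String s : transform("abc")) {
            System.out.println(s);
        }
        // prints abc, ABC and abcdef on separate lines
    }
}
```

The original reference still holds "abc" after both calls; each method hands back a separate object.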
2. Reading the source
- 1. Declaration of String and the interfaces it implements:
public final class String
implements java.io.Serializable, Comparable<String>, CharSequence
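java.io.Serializable: a marker interface with no methods or fields; it only identifies the class as serializable.
Comparable<String>: declares a single method, compareTo(T o), which compares two instances and defines their ordering.
CharSequence: a read-only sequence of char values. It declares only a few abstract methods: length(), charAt(int index), subSequence(int start, int end) and toString().
Besides String, StringBuffer and StringBuilder also implement the CharSequence interface.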
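Because all three classes implement CharSequence, a method written against the interface accepts any of them. The following is a small sketch; CharSequenceDemo and count are hypothetical names chosen for illustration.

```java
class CharSequenceDemo {
    // Counts occurrences of c using only length() and charAt().
    static int count(CharSequence cs, char c) {
        int n = 0;
        for (int i = 0; i < cs.length(); i++) {
            if (cs.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("banana", 'a'));                    // 3
        System.out.println(count(new StringBuilder("banana"), 'n')); // 2
        System.out.println(count(new StringBuffer("banana"), 'b'));  // 1
    }
}
```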
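- 2. Main fields
/** Stores the characters of the String */
private final char value[];
/** Caches the hash code of the String */
private int hash; // defaults to 0
/** serialVersionUID, kept fixed for serialization interoperability */
private static final long serialVersionUID = -6849794470754667710L;
/**
* Declares the serializable fields of the class (empty, because String is special-cased by the serialization protocol)
*/
private static final ObjectStreamField[] serialPersistentFields =
new ObjectStreamField[0];
/**
* A Comparator backed by the static nested class CaseInsensitiveComparator, used to compare two strings while ignoring case.
*/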
public static final Comparator<String> CASE_INSENSITIVE_ORDER
= new CaseInsensitiveComparator();
- 3. Nested class
The static nested class CaseInsensitiveComparator exists mainly so that the case-insensitive comparison can be reused: compareToIgnoreCase(String str) delegates to it.
public static final Comparator<String> CASE_INSENSITIVE_ORDER
= new CaseInsensitiveComparator();
private static class CaseInsensitiveComparator
implements Comparator<String>, java.io.Serializable {
// use serialVersionUID from JDK 1.2.2 for interoperability
private static final long serialVersionUID = 8575799808933029326L;
public int compare(String s1, String s2) {
int n1 = s1.length();
int n2 = s2.length();
int min = Math.min(n1, n2);
for (int i = 0; i < min; i++) {
char c1 = s1.charAt(i);
char c2 = s2.charAt(i);
if (c1 != c2) {
c1 = Character.toUpperCase(c1);
c2 = Character.toUpperCase(c2);
if (c1 != c2) {
c1 = Character.toLowerCase(c1);
c2 = Character.toLowerCase(c2);
if (c1 != c2) {
// No overflow because of numeric promotion
return c1 - c2;
}
}
}
}
return n1 - n2;
}
/** Replaces the de-serialized object. */
private Object readResolve() { return CASE_INSENSITIVE_ORDER; }
}
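As a usage sketch, the public constant String.CASE_INSENSITIVE_ORDER can be passed anywhere a Comparator<String> is expected. The class and method names below are illustrative.

```java
import java.util.Arrays;

class CaseInsensitiveSortDemo {
    // Returns a sorted copy, ordering the words without regard to case.
    static String[] sortIgnoringCase(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"banana", "apple", "Cherry"};
        System.out.println(Arrays.toString(sortIgnoringCase(words)));
        // [apple, banana, Cherry]
        String[] natural = words.clone();
        Arrays.sort(natural); // natural order puts upper case first
        System.out.println(Arrays.toString(natural));
        // [Cherry, apple, banana]
    }
}
```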
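- 4. Methods
- 4.1 Constructors
String offers many constructors, accepting a String, char[], byte[], StringBuffer and other argument types. In essence, each of them ends up assigning the incoming data (usually a copy of it) to the instance field value. Creating strings with the new keyword is not recommended: because String is immutable, a literal can be shared safely, and new only allocates a redundant extra object.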
public String() {
this.value = "".value;
}
public String(String original) {
this.value = original.value;
this.hash = original.hash;
}
public String(char value[]) {
this.value = Arrays.copyOf(value, value.length);
}
public String(char value[], int offset, int count) {
if (offset < 0) {
throw new StringIndexOutOfBoundsException(offset);
}
if (count <= 0) {
if (count < 0) {
throw new StringIndexOutOfBoundsException(count);
}
if (offset <= value.length) {
this.value = "".value;
return;
}
}
// Note: offset or count might be near -1>>>1.
if (offset > value.length - count) {
throw new StringIndexOutOfBoundsException(offset + count);
}
this.value = Arrays.copyOfRange(value, offset, offset+count);
}
public String(int[] codePoints, int offset, int count) {
if (offset < 0) {
throw new StringIndexOutOfBoundsException(offset);
}
if (count <= 0) {
if (count < 0) {
throw new StringIndexOutOfBoundsException(count);
}
if (offset <= codePoints.length) {
this.value = "".value;
return;
}
}
// Note: offset or count might be near -1>>>1.
if (offset > codePoints.length - count) {
throw new StringIndexOutOfBoundsException(offset + count);
}
final int end = offset + count;
// Pass 1: Compute precise size of char[]
int n = count;
for (int i = offset; i < end; i++) {
int c = codePoints[i];
if (Character.isBmpCodePoint(c))
continue;
else if (Character.isValidCodePoint(c))
n++;
else throw new IllegalArgumentException(Integer.toString(c));
}
// Pass 2: Allocate and fill in char[]
final char[] v = new char[n];
for (int i = offset, j = 0; i < end; i++, j++) {
int c = codePoints[i];
if (Character.isBmpCodePoint(c))
v[j] = (char)c;
else
Character.toSurrogates(c, v, j++);
}
this.value = v;
}
@Deprecated
public String(byte ascii[], int hibyte, int offset, int count) {
checkBounds(ascii, offset, count);
char value[] = new char[count];
if (hibyte == 0) {
for (int i = count; i-- > 0;) {
value[i] = (char)(ascii[i + offset] & 0xff);
}
} else {
hibyte <<= 8;
for (int i = count; i-- > 0;) {
value[i] = (char)(hibyte | (ascii[i + offset] & 0xff));
}
}
this.value = value;
}
@Deprecated
public String(byte ascii[], int hibyte) {
this(ascii, hibyte, 0, ascii.length);
}
private static void checkBounds(byte[] bytes, int offset, int length) {
if (length < 0)
throw new StringIndexOutOfBoundsException(length);
if (offset < 0)
throw new StringIndexOutOfBoundsException(offset);
if (offset > bytes.length - length)
throw new StringIndexOutOfBoundsException(offset + length);
}
public String(byte bytes[], int offset, int length, String charsetName)
throws UnsupportedEncodingException {
if (charsetName == null)
throw new NullPointerException("charsetName");
checkBounds(bytes, offset, length);
this.value = StringCoding.decode(charsetName, bytes, offset, length);
}
public String(byte bytes[], int offset, int length, Charset charset) {
if (charset == null)
throw new NullPointerException("charset");
checkBounds(bytes, offset, length);
this.value = StringCoding.decode(charset, bytes, offset, length);
}
public String(byte bytes[], String charsetName)
throws UnsupportedEncodingException {
this(bytes, 0, bytes.length, charsetName);
}
public String(byte bytes[], Charset charset) {
this(bytes, 0, bytes.length, charset);
}
public String(byte bytes[], int offset, int length) {
checkBounds(bytes, offset, length);
this.value = StringCoding.decode(bytes, offset, length);
}
public String(byte bytes[]) {
this(bytes, 0, bytes.length);
}
public String(StringBuffer buffer) {
synchronized(buffer) {
this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
}
}
public String(StringBuilder builder) {
this.value = Arrays.copyOf(builder.getValue(), builder.length());
}
String(char[] value, boolean share) {
// assert share : "unshared not supported";
this.value = value;
}
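The Arrays.copyOf call in the char[] constructor is a defensive copy. A short sketch of its effect follows; DefensiveCopyDemo and buildThenMutate are hypothetical names.

```java
class DefensiveCopyDemo {
    // Builds a String from an array, then mutates the array.
    static String buildThenMutate() {
        char[] chars = {'a', 'b', 'c'};
        String s = new String(chars); // the constructor copies the array
        chars[0] = 'x';               // does not affect s
        return s;
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate()); // abc
        char[] hello = {'h', 'e', 'l', 'l', 'o'};
        System.out.println(new String(hello, 1, 3)); // ell
    }
}
```

Without the copy, a caller holding the array could change the contents of a supposedly immutable String.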
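- 4.2 Common methods
Once you know that String is backed by a char[], it is easy to see that length(), isEmpty() and charAt() are thin wrappers around array operations.
/**
* Returns the length of the string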
*/
public int length() {
return value.length;
}
/**
* Tests whether the string is empty
*/
public boolean isEmpty() {
return value.length == 0;
}
/**
* Returns the character at the given index
*/
public char charAt(int index) {
if ((index < 0) || (index >= value.length)) {
throw new StringIndexOutOfBoundsException(index);
}
return value[index];
}
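Copying the string into a target array. Both getChars overloads ultimately call System.arraycopy(). The same pattern appears throughout the JDK source, for example in ThreadPoolExecutor: what look like many overloads all delegate to one underlying routine and merely supply different default arguments.
/**
* Copies the whole string into dst, starting at index dstBegin of dst. Note that this method performs no bounds check of its own; if the characters do not fit, System.arraycopy throws an IndexOutOfBoundsException.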
*/
void getChars(char dst[], int dstBegin) {
System.arraycopy(value, 0, dst, dstBegin, value.length);
}
/**
* Copies the characters at indices srcBegin to srcEnd - 1 of this string into the char array dst, starting at index dstBegin of dst
*/
public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
if (srcBegin < 0) {
throw new StringIndexOutOfBoundsException(srcBegin);
}
if (srcEnd > value.length) {
throw new StringIndexOutOfBoundsException(srcEnd);
}
if (srcBegin > srcEnd) {
throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
}
System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
}
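A usage sketch for the public getChars overload. GetCharsDemo and copyInto are illustrative names; the buffer is pre-filled with '-' only to make the copied range visible.

```java
import java.util.Arrays;

class GetCharsDemo {
    // Copies characters [srcBegin, srcEnd) of s into a '-'-filled buffer at dstBegin.
    static String copyInto(String s, int srcBegin, int srcEnd, int bufferSize, int dstBegin) {
        char[] dst = new char[bufferSize];
        Arrays.fill(dst, '-');
        s.getChars(srcBegin, srcEnd, dst, dstBegin);
        return new String(dst);
    }

    public static void main(String[] args) {
        System.out.println(copyInto("hello world", 6, 9, 5, 1)); // -wor-
        try {
            copyInto("hello world", 0, 11, 5, 0); // destination too small
        } catch (IndexOutOfBoundsException e) {
            System.out.println("destination too small");
        }
    }
}
```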
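Converting a String to bytes. The result can be written into a caller-supplied byte array or returned as a new byte array. Apart from the deprecated overload, these methods all delegate to the static method StringCoding.encode().
/**
* This method does not convert characters to bytes properly: it simply keeps the low-order 8 bits of each char. As of JDK 1.1 the preferred alternative is getBytes(), which uses the platform's default charset.
* Deprecated; do not use.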
*/
@Deprecated
public void getBytes(int srcBegin, int srcEnd, byte dst[], int dstBegin) {
if (srcBegin < 0) {
throw new StringIndexOutOfBoundsException(srcBegin);
}
if (srcEnd > value.length) {
throw new StringIndexOutOfBoundsException(srcEnd);
}
if (srcBegin > srcEnd) {
throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
}
Objects.requireNonNull(dst);
int j = dstBegin;
int n = srcEnd;
int i = srcBegin;
char[] val = value; /* avoid getfield opcode */
while (i < n) {
dst[j++] = (byte)val[i++];
}
}
/**
* @param charsetName the name of the charset to use
*/
public byte[] getBytes(String charsetName)
throws UnsupportedEncodingException {
if (charsetName == null) throw new NullPointerException();
return StringCoding.encode(charsetName, value, 0, value.length);
}
public byte[] getBytes(Charset charset) {
if (charset == null) throw new NullPointerException();
return StringCoding.encode(charset, value, 0, value.length);
}
public byte[] getBytes() {
return StringCoding.encode(value, 0, value.length);
}
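A short sketch of the charset-aware overloads, using the constants in java.nio.charset.StandardCharsets so that the result does not depend on the platform default. The helper names are illustrative.

```java
import java.nio.charset.StandardCharsets;

class GetBytesDemo {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encodes to UTF-8 and decodes again.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("abc"));     // 3
        System.out.println(utf8Length("\u00e9"));  // 2 (e with acute accent)
        System.out.println(utf8Length("\u4e2d"));  // 3 (a CJK character)
        System.out.println(roundTrip("a\u4e2db").equals("a\u4e2db")); // true
    }
}
```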
String overrides equals() and hashCode().
/**
* Returns true only when the argument is also a String whose value array holds the same characters in the same order
*/
public boolean equals(Object anObject) {
if (this == anObject) {
return true;
}
if (anObject instanceof String) {
String anotherString = (String)anObject;
int n = value.length;
if (n == anotherString.value.length) {
char v1[] = value;
char v2[] = anotherString.value;
int i = 0;
while (n-- != 0) {
if (v1[i] != v2[i])
return false;
i++;
}
return true;
}
}
return false;
}
/**
* Overrides hashCode; the result is cached in the hash field
*/
public int hashCode() {
int h = hash;
if (h == 0 && value.length > 0) {
char val[] = value;
for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i];
}
hash = h;
}
return h;
}
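To illustrate the difference between reference equality and the overridden equals(), here is a small sketch. EqualsDemo and its helpers are hypothetical names, and new String is used deliberately to obtain a distinct object.

```java
class EqualsDemo {
    static boolean sameObject(String a, String b) {
        return a == b;      // compares references
    }

    static boolean sameContent(String a, String b) {
        return a.equals(b); // compares characters
    }

    public static void main(String[] args) {
        String a = "abc";
        String b = new String("abc"); // a distinct object with equal content
        System.out.println(sameObject(a, b));             // false
        System.out.println(sameContent(a, b));            // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

Equal strings must have equal hash codes, which is why both methods are overridden together.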
Other comparison and matching methods in String, such as contentEquals(CharSequence cs) and contentEquals(StringBuffer sb):
/**
* Compares the content of this string with a StringBuffer
*/
public boolean contentEquals(StringBuffer sb) {
return contentEquals((CharSequence)sb);
}
private boolean nonSyncContentEquals(AbstractStringBuilder sb) {
char v1[] = value;
char v2[] = sb.getValue();
int n = v1.length;
if (n != sb.length()) {
return false;
}
for (int i = 0; i < n; i++) {
if (v1[i] != v2[i]) {
return false;
}
}
return true;
}
/**
* Checks the runtime type of the argument first, then compares character by character
*/
public boolean contentEquals(CharSequence cs) {
// Argument is a StringBuffer, StringBuilder
if (cs instanceof AbstractStringBuilder) {
if (cs instanceof StringBuffer) {
synchronized(cs) {
return nonSyncContentEquals((AbstractStringBuilder)cs);
}
} else {
return nonSyncContentEquals((AbstractStringBuilder)cs);
}
}
// Argument is a String
if (cs instanceof String) {
return equals(cs);
}
// Argument is a generic CharSequence
char v1[] = value;
int n = v1.length;
if (n != cs.length()) {
return false;
}
for (int i = 0; i < n; i++) {
if (v1[i] != cs.charAt(i)) {
return false;
}
}
return true;
}
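A brief sketch contrasting equals() with contentEquals(); the class and helper names are illustrative.

```java
class ContentEqualsDemo {
    // True when s and cs hold the same characters, whatever the type of cs.
    static boolean sameChars(String s, CharSequence cs) {
        return s.contentEquals(cs);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("abc");
        System.out.println("abc".equals(sb));      // false: sb is not a String
        System.out.println(sameChars("abc", sb));  // true: same characters
        sb.append('d');
        System.out.println(sameChars("abc", sb));  // false: lengths differ
    }
}
```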
/**
* Compares the content, ignoring case
*/
public boolean equalsIgnoreCase(String anotherString) {
return (this == anotherString) ? true
: (anotherString != null)
&& (anotherString.value.length == value.length)
&& regionMatches(true, 0, anotherString, 0, value.length);
}
/**
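* Case-sensitive lexicographic comparison. The core is the while loop: characters are compared pairwise from index 0, and at the first mismatch c1 - c2 is returned, so the string with the smaller character sorts first. If one string is a prefix of the other, the difference of the lengths is returned.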
*/
public int compareTo(String anotherString) {
int len1 = value.length;
int len2 = anotherString.value.length;
int lim = Math.min(len1, len2);
char v1[] = value;
char v2[] = anotherString.value;
int k = 0;
while (k < lim) {
char c1 = v1[k];
char c2 = v2[k];
if (c1 != c2) {
return c1 - c2;
}
k++;
}
return len1 - len2;
}
/**
* Lexicographic comparison, ignoring case
*/
public int compareToIgnoreCase(String str) {
return CASE_INSENSITIVE_ORDER.compare(this, str);
}
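The return values follow directly from the two return statements in compareTo. A small sketch, with an illustrative class name:

```java
class CompareToDemo {
    // Sign of the comparison: -1, 0 or 1.
    static int sign(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    public static void main(String[] args) {
        System.out.println("a".compareTo("d"));         // -3  ('a' - 'd')
        System.out.println("abc".compareTo("abcde"));   // -2  (3 - 5, prefix case)
        System.out.println("Apple".compareTo("apple")); // -32 ('A' - 'a')
        System.out.println("Apple".compareToIgnoreCase("apple")); // 0
        System.out.println(sign("pear", "peach"));      // 1   ('r' > 'c')
    }
}
```

Callers should normally rely only on the sign of the result, not on its magnitude.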
/**
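* Tests whether len characters of this string starting at toffset equal len characters of other starting at ooffset. The comparison is a simple while loop over the two regions; before it runs, the arguments are checked and false is returned if either region would be out of bounds.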
*/
public boolean regionMatches(int toffset, String other, int ooffset,
int len) {
char ta[] = value;
int to = toffset;
char pa[] = other.value;
int po = ooffset;
// Note: toffset, ooffset, or len might be near -1>>>1.
if ((ooffset < 0) || (toffset < 0)
|| (toffset > (long)value.length - len)
|| (ooffset > (long)other.value.length - len)) {
return false;
}
while (len-- > 0) {
if (ta[to++] != pa[po++]) {
return false;
}
}
return true;
}
/**
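* Same region comparison with an ignoreCase flag: when it is true, characters that differ are compared again after conversion to upper case and then to lower case. Out-of-bounds arguments again yield false.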
*/
public boolean regionMatches(boolean ignoreCase, int toffset,
String other, int ooffset, int len) {
char ta[] = value;
int to = toffset;
char pa[] = other.value;
int po = ooffset;
// Note: toffset, ooffset, or len might be near -1>>>1.
if ((ooffset < 0) || (toffset < 0)
|| (toffset > (long)value.length - len)
|| (ooffset > (long)other.value.length - len)) {
return false;
}
while (len-- > 0) {
char c1 = ta[to++];
char c2 = pa[po++];
if (c1 == c2) {
continue;
}
if (ignoreCase) {
// If characters don't match but case may be ignored,
// try converting both characters to uppercase.
// If the results match, then the comparison scan should
// continue.
char u1 = Character.toUpperCase(c1);
char u2 = Character.toUpperCase(c2);
if (u1 == u2) {
continue;
}
// Unfortunately, conversion to uppercase does not work properly
// for the Georgian alphabet, which has strange rules about case
// conversion. So we need to make one last check before
// exiting.
if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
continue;
}
}
return false;
}
return true;
}
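A usage sketch for both regionMatches overloads. RegionMatchesDemo and tailMatches are hypothetical names.

```java
class RegionMatchesDemo {
    // Tests whether s ends with tail, optionally ignoring case.
    static boolean tailMatches(String s, String tail, boolean ignoreCase) {
        int offset = s.length() - tail.length();
        return s.regionMatches(ignoreCase, offset, tail, 0, tail.length());
    }

    public static void main(String[] args) {
        System.out.println("Hello World".regionMatches(6, "world", 0, 5)); // false
        System.out.println(tailMatches("Hello World", "world", true));     // true
        // A region that does not fit returns false instead of throwing.
        System.out.println("abc".regionMatches(2, "abc", 0, 3));           // false
    }
}
```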
/**
* Tests whether the part of this string beginning at index toffset starts with prefix
*/
public boolean startsWith(String prefix, int toffset) {
char ta[] = value;
int to = toffset;
char pa[] = prefix.value;
int po = 0;
int pc = prefix.value.length;
// Note: toffset might be near -1>>>1.
if ((toffset < 0) || (toffset > value.length - pc)) {
return false;
}
while (--pc >= 0) {
if (ta[to++] != pa[po++]) {
return false;
}
}
return true;
}
/**
* Tests whether this string starts with prefix
*/
public boolean startsWith(String prefix) {
return startsWith(prefix, 0);
}
/**
* Tests whether this string ends with suffix (a startsWith check at offset length - suffix length)
*/
public boolean endsWith(String suffix) {
return startsWith(suffix, value.length - suffix.value.length);
}
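The listing shows that endsWith is just startsWith at a computed offset. The sketch below reproduces that relation through the public API; the class and method names are illustrative.

```java
class StartsEndsDemo {
    // Re-creates endsWith via the public startsWith(prefix, toffset).
    static boolean endsWithViaStartsWith(String s, String suffix) {
        return s.startsWith(suffix, s.length() - suffix.length());
    }

    public static void main(String[] args) {
        System.out.println("report.pdf".startsWith("port", 2));              // true
        System.out.println("report.pdf".endsWith(".pdf"));                   // true
        System.out.println(endsWithViaStartsWith("report.pdf", ".pdf"));     // true
        // A suffix longer than the string gives a negative offset, hence false.
        System.out.println(endsWithViaStartsWith("pdf", "report.pdf"));      // false
    }
}
```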
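Index lookup. The indexOf family works by scanning the underlying array and returning the index of the match. When the search target is a string, some preliminary checks come first (for example, the target cannot be longer than the string being searched). The search then finds a position i where the first character of the target occurs and compares the following characters one by one. On a full match i is returned; on a mismatch the search resumes from i + 1, looking for the next occurrence of the first character. Note that the listing below uses coder(), isLatin1(), StringLatin1 and StringUTF16, which belong to the compact-string implementation of JDK 9 and later (where value is a byte[]), not to JDK 1.8; the search idea is the same.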
public int indexOf(String str) {
if (coder() == str.coder()) {
return isLatin1() ? StringLatin1.indexOf(value, str.value)
: StringUTF16.indexOf(value, str.value);
}
if (coder() == LATIN1) { // str.coder == UTF16
return -1;
}
return StringUTF16.indexOfLatin1(value, str.value);
}
......... > some code omitted < ...................
static int lastIndexOf(byte[] src, byte srcCoder, int srcCount,
String tgtStr, int fromIndex) {
byte[] tgt = tgtStr.value;
byte tgtCoder = tgtStr.coder();
int tgtCount = tgtStr.length();
/*
* Check arguments; return immediately where possible. For
* consistency, don't check for null str.
*/
int rightIndex = srcCount - tgtCount;
if (fromIndex > rightIndex) {
fromIndex = rightIndex;
}
if (fromIndex < 0) {
return -1;
}
/* Empty string always matches. */
if (tgtCount == 0) {
return fromIndex;
}
if (srcCoder == tgtCoder) {
return srcCoder == LATIN1
? StringLatin1.lastIndexOf(src, srcCount, tgt, tgtCount, fromIndex)
: StringUTF16.lastIndexOf(src, srcCount, tgt, tgtCount, fromIndex);
}
if (srcCoder == LATIN1) { // && tgtCoder == UTF16
return -1;
}
// srcCoder == UTF16 && tgtCoder == LATIN1
return StringUTF16.lastIndexOfLatin1(src, srcCount, tgt, tgtCount, fromIndex);
}
public boolean contains(CharSequence s) {
return indexOf(s.toString()) > -1;
}
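The search strategy described for indexOf can be sketched as follows. This is a simplified illustration written against the public API, not the JDK implementation; NaiveIndexOf is a hypothetical class.

```java
class NaiveIndexOf {
    // Returns the index of the first occurrence of target in source, or -1.
    static int indexOf(String source, String target) {
        int n = source.length();
        int m = target.length();
        if (m == 0) {
            return 0;  // the empty string matches at index 0
        }
        if (m > n) {
            return -1; // the target cannot be longer than the source
        }
        char first = target.charAt(0);
        for (int i = 0; i <= n - m; i++) {
            if (source.charAt(i) != first) {
                continue; // look for the first character of the target
            }
            int j = 1;
            while (j < m && source.charAt(i + j) == target.charAt(j)) {
                j++;      // compare the remaining characters
            }
            if (j == m) {
                return i; // full match
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("mississippi", "issip")); // 4
        System.out.println(indexOf("mississippi", "xyz"));   // -1
    }
}
```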
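Substring extraction. substring(int beginIndex, int endIndex) returns a substring of this string. As the last line of each method shows, after validating the bounds it simply constructs a new String from the given range (or returns this when the range covers the whole string).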
public String substring(int beginIndex) {
if (beginIndex < 0) {
throw new StringIndexOutOfBoundsException(beginIndex);
}
int subLen = value.length - beginIndex;
if (subLen < 0) {
throw new StringIndexOutOfBoundsException(subLen);
}
return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
}
public String substring(int beginIndex, int endIndex) {
if (beginIndex < 0) {
throw new StringIndexOutOfBoundsException(beginIndex);
}
if (endIndex > value.length) {
throw new StringIndexOutOfBoundsException(endIndex);
}
int subLen = endIndex - beginIndex;
if (subLen < 0) {
throw new StringIndexOutOfBoundsException(subLen);
}
return ((beginIndex == 0) && (endIndex == value.length)) ? this
: new String(value, beginIndex, subLen);
}
public CharSequence subSequence(int beginIndex, int endIndex) {
return this.substring(beginIndex, endIndex);
}
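A usage sketch for substring. The helpers baseName and extension are hypothetical and exist only for this illustration.

```java
class SubstringDemo {
    // Part of a file name before the last dot (whole name if there is none).
    static String baseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    // Part of a file name after the last dot (empty if there is none).
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    public static void main(String[] args) {
        System.out.println(baseName("report.pdf"));  // report
        System.out.println(extension("report.pdf")); // pdf
        try {
            "abc".substring(2, 1); // endIndex < beginIndex
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("invalid range");
        }
    }
}
```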
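concat(String str) appends str to the end of this string. The code shows that it simply builds a new string, unless str is empty, in which case this is returned.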
public String concat(String str) {
int otherLen = str.length();
if (otherLen == 0) {
return this;
}
int len = value.length;
char buf[] = Arrays.copyOf(value, len + otherLen);
str.getChars(buf, len);
return new String(buf, true);
}
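A short sketch of both branches of concat; ConcatDemo and join are illustrative names.

```java
class ConcatDemo {
    static String join(String a, String b) {
        return a.concat(b);
    }

    public static void main(String[] args) {
        String s = "abc";
        System.out.println(join(s, "def"));   // abcdef
        System.out.println(s);                // abc (unchanged)
        System.out.println(join(s, "") == s); // true: empty argument returns this
    }
}
```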
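Replacement. replace(char oldChar, char newChar) replaces every occurrence of oldChar with newChar without any regular expression: it finds the index i of the first occurrence, copies the characters before i into a new char array unchanged, and from i onward checks each character and substitutes it when it equals oldChar. replaceFirst, replaceAll, replace(CharSequence, CharSequence) and matches, by contrast, delegate to the regex classes Pattern and Matcher; replace(CharSequence, CharSequence) compiles its target with Pattern.LITERAL, so the target is matched literally.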
public String replace(char oldChar, char newChar) {
if (oldChar != newChar) {
int len = value.length;
int i = -1;
char[] val = value; /* avoid getfield opcode */
while (++i < len) {
if (val[i] == oldChar) {
break;
}
}
if (i < len) {
char buf[] = new char[len];
for (int j = 0; j < i; j++) {
buf[j] = val[j];
}
while (i < len) {
char c = val[i];
buf[i] = (c == oldChar) ? newChar : c;
i++;
}
return new String(buf, true);
}
}
return this;
}
public String replaceFirst(String regex, String replacement) {
return Pattern.compile(regex).matcher(this).replaceFirst(replacement);
}
public String replaceAll(String regex, String replacement) {
return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}
public String replace(CharSequence target, CharSequence replacement) {
return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
public boolean matches(String regex) {
return Pattern.matches(regex, this);
}
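The difference between literal and regex-based replacement is easy to get wrong, so here is a small sketch. ReplaceDemo and variants are hypothetical names.

```java
class ReplaceDemo {
    // Returns the results of four replacement calls on s, targeting ".".
    static String[] variants(String s) {
        return new String[] {
            s.replace('.', '-'),        // char overload, no regex
            s.replace(".", "-"),        // literal target
            s.replaceAll(".", "-"),     // regex: "." matches any character
            s.replaceFirst("\\.", "-")  // regex with the dot escaped
        };
    }

    public static void main(String[] args) {
        for (String r : variants("a.b.c")) {
            System.out.println(r);
        }
        // prints a-b-c, a-b-c, ----- and a-b.c on separate lines
        System.out.println("2024".matches("\\d+")); // true
    }
}
```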
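Splitting. split treats the delimiter as a regular expression, with a fast path that avoids the regex engine when the delimiter is a single character that is not a regex metacharacter (or a backslash followed by a character that is not an ASCII letter or digit). The limit argument deserves attention: when limit = n > 0, the string is split into at most n pieces, and the last array element contains everything after the (n-1)-th matched delimiter; when n = 0 (the default), trailing empty strings are discarded; when n < 0, the string is split as many times as possible and trailing empty strings are kept.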
public String[] split(String regex, int limit) {
char ch = 0;
if (((regex.value.length == 1 &&
".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
(regex.length() == 2 &&
regex.charAt(0) == '\\' &&
(((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
((ch-'a')|('z'-ch)) < 0 &&
((ch-'A')|('Z'-ch)) < 0)) &&
(ch < Character.MIN_HIGH_SURROGATE ||
ch > Character.MAX_LOW_SURROGATE))
{
int off = 0;
int next = 0;
boolean limited = limit > 0;
ArrayList<String> list = new ArrayList<>();
while ((next = indexOf(ch, off)) != -1) {
if (!limited || list.size() < limit - 1) {
list.add(substring(off, next));
off = next + 1;
} else { // last one
//assert (list.size() == limit - 1);
list.add(substring(off, value.length));
off = value.length;
break;
}
}
// If no match was found, return this
if (off == 0)
return new String[]{this};
// Add remaining segment
if (!limited || list.size() < limit)
list.add(substring(off, value.length));
// Construct result
int resultSize = list.size();
if (limit == 0) {
while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
resultSize--;
}
}
String[] result = new String[resultSize];
return list.subList(0, resultSize).toArray(result);
}
return Pattern.compile(regex).split(this, limit);
}
public String[] split(String regex) {
return split(regex, 0);
}
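The effect of limit is easiest to see on an input with trailing delimiters. A minimal sketch, with illustrative names:

```java
import java.util.Arrays;

class SplitDemo {
    // Number of pieces produced when splitting s on "," with the given limit.
    static int pieces(String s, int limit) {
        return s.split(",", limit).length;
    }

    public static void main(String[] args) {
        String s = "a,b,c,,";
        System.out.println(Arrays.toString(s.split(",")));     // [a, b, c]
        System.out.println(Arrays.toString(s.split(",", 2)));  // [a, b,c,,]
        System.out.println(pieces(s, 0));   // 3: trailing empty strings dropped
        System.out.println(pieces(s, -1));  // 5: trailing empty strings kept
        System.out.println(pieces(s, 2));   // 2: at most two pieces
    }
}
```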
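Other conversion methods such as valueOf(), trim() and toUpperCase() are not covered here; they will be added later when time permits.
3. Related questions
- 1. How is hashCode implemented in String?
Source: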
public int hashCode() {
int h = hash;
if (h == 0 && value.length > 0) {
char val[] = value;
for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i];
}
hash = h;
}
return h;
}
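hashCode evaluates the polynomial s[0] * 31^(n-1) + s[1] * 31^(n-2) + ... + s[n-1] in int arithmetic, where s[i] is the i-th character and n is the length.
Each loop iteration multiplies the previous result by 31 and adds the current character. 31 was chosen as the multiplier mainly for efficiency.
When hash values are used to place data in a hash table, we want as few keys as possible to land on the same address (a "collision"). If many keys share an address,
the chain stored there grows longer and lookups become slower, so the multiplier should spread the values well. 31 is an odd prime, and since 31 = 11111 in binary,
31 * h equals (h << 5) - h, so the multiplication can be carried out with a shift and a subtraction. Overflow of the int result is harmless: it simply wraps around.
The result is also cached in the hash field, so it is computed once per string (unless the hash happens to be 0).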
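The formula can be checked against the real hashCode with a few lines of code. The sketch below uses the shift-and-subtract form of the multiplication; HashDemo and manualHash are hypothetical names.

```java
class HashDemo {
    // Recomputes the String hash using (h << 5) - h in place of 31 * h.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (h << 5) - h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // 97 * 31^2 + 98 * 31 + 99 = 96354
        System.out.println(manualHash("abc"));      // 96354
        System.out.println("abc".hashCode());       // 96354
        String longer = "the quick brown fox jumps over the lazy dog";
        System.out.println(manualHash(longer) == longer.hashCode()); // true, even after overflow
    }
}
```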
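- 2. What is the difference between String s = "abc" and String s = new String("abc")?
The Java runtime keeps a string pool, maintained by the String class.
When String s = "abc" is executed, the pool is first checked for the string "abc".
If it is already there, the pooled reference is assigned to s and no object is created; otherwise a String "abc" is first created in the pool (one object) and then assigned to s.
When String s = new String("abc") is executed, a new String object is always created on the heap, whether or not "abc" is already in the pool.
(Note: the literal "abc" itself is also placed in the pool if it is not there yet, so up to two objects are created in total.) The new object is then assigned to s.
The first statement is more efficient; the second is less so, because the extra String object takes additional memory.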
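The pooling behaviour can be observed with reference comparisons. A minimal sketch; PoolDemo and compare are illustrative names.

```java
class PoolDemo {
    // Returns {literal == literal, literal == new, literal == new.intern()}.
    static boolean[] compare() {
        String a = "abc";
        String b = "abc";             // same pooled object as a
        String c = new String("abc"); // a separate heap object
        return new boolean[] {a == b, a == c, a == c.intern()};
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println(r[0]); // true
        System.out.println(r[1]); // false
        System.out.println(r[2]); // true: intern() returns the pooled instance
    }
}
```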