Java Interview Questions Demystified (Part 6): Miscellaneous Notes on String

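Last time we reviewed a favourite interview topic: how many String objects a given piece of code actually creates. This time we use a few more common interview questions as a starting point to review some other aspects of String.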

1. Does the String class have a length() method? Do arrays have a length() method?

The String class certainly has a length() method. A look at the source of String (shown here as it appeared in older JDKs) confirms it:

Java code

public int length() {
    return count;
}

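In that version of the JDK, count holds the number of characters the String uses in its internal char array, value; later JDKs dropped the count field and derive the length from value directly. Arrays, on the other hand, have no length() method. As you know, Java treats arrays as objects, and their methods are inherited from Object. What an array does have is a field named length. It is the only field an array has, and this holds for arrays of every element type.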
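
The difference between the two is easy to see side by side. The following is a minimal sketch; the class name LengthDemo and its two helper methods are made up for illustration.

```java
public class LengthDemo {
	// Number of elements in an array: read the length field (no parentheses).
	public static int arrayLength(int[] data) {
		return data.length;
	}

	// Number of char values in a String: call the length() method.
	public static int stringLength(String s) {
		return s.length();
	}

	public static void main(String[] args) {
		int[] numbers = {1, 2, 3};
		String text = "Java";
		System.out.println(arrayLength(numbers)); // 3
		System.out.println(stringLength(text));   // 4
		// Writing numbers.length() or text.length would not compile.
	}
}
```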

2. Can a Chinese character be stored in a single char?

Consider the following example:

Java code

public class ChineseTest {
	public static void main(String[] args) {
		// Assign one Chinese character to each char variable
		char a = '中';
		char b = '文';
		char c = '测';
		char d = '试';
		char e = '成';
		char f = '功';
		System.out.print(a);
		System.out.print(b);
		System.out.print(c);
		System.out.print(d);
		System.out.print(e);
		System.out.print(f);
	}
}

It compiles without errors, and the output is:

中文测试成功

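So the answer is yes. Why does a Chinese character fit in one char? In Java a char is a 16-bit (2-byte) UTF-16 code unit, and the commonly used Chinese characters all have code points in the Basic Multilingual Plane, so each one is represented by a single char. (Rarer characters outside that plane need two char values, a surrogate pair.) English letters have values below 128, which is within the range of a byte, so a letter constant can also be assigned to a byte variable; a Chinese character cannot. For example:

Java code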

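public class ChineseTest {
	public static void main(String[] args) {
		// Assign an English letter to a byte variable
		byte a = 'a';
		// Assigning a Chinese character to a byte variable is a compile-time
		// error, because the constant's value (20013) does not fit in a byte
		// byte b = '中';

		System.out.println("byte a = " + a);
		// System.out.println("byte b = "+b);
	}
}

Output:

byte a = 97

As you can see, what we actually stored in the byte variable a is the ASCII code of the character 'a'.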
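
One caveat is worth keeping in mind: a char is a single UTF-16 code unit, so a character outside the Basic Multilingual Plane does not fit in one char. The sketch below illustrates this with U+20000, a rare CJK ideograph written here as its surrogate pair. The class name SupplementaryCharDemo and the constant RARE are hypothetical names used only for this illustration.

```java
public class SupplementaryCharDemo {
	// U+20000 (CJK Extension B), spelled as its high and low surrogate
	public static final String RARE = "\uD840\uDC00";

	public static void main(String[] args) {
		String common = "\u4E2D"; // 中, a BMP character
		System.out.println(common.length());                          // 1
		System.out.println(RARE.length());                            // 2 char values
		System.out.println(RARE.codePointCount(0, RARE.length()));    // 1 code point
		System.out.println(Integer.toHexString(RARE.codePointAt(0))); // 20000
	}
}
```

For everyday Chinese text the one-character-per-char assumption holds, but code that must handle arbitrary input is safer working with code points.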

Now back to the first example: can a, b, c, d, e and f be concatenated and printed in one go? Let's try:

Java code

public class ChineseTest {
	public static void main(String[] args) {
		// Assign one Chinese character to each char variable
		char a = '中';
		char b = '文';
		char c = '测';
		char d = '试';
		char e = '成';
		char f = '功';
		// char + char is integer addition, not concatenation
		System.out.print(a + b + c + d + e + f);
	}
}

Output:

156035

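This is clearly not what we wanted. It happens because the + operator is misused here. Between two strings, or between a string and a value of another type, + concatenates. Between two char values, however, it behaves exactly as it does between numbers: both operands are promoted to int and added. The 156035 we got is therefore the arithmetic sum of the numeric values of the six characters.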
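
To get concatenation rather than addition, at least one operand has to be a String, or the characters have to be collected explicitly. The sketch below shows a few ways to do that. The class name CharConcatDemo and the helper join are hypothetical, and the Chinese characters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
public class CharConcatDemo {
	// Joins the given chars into a String instead of summing their values.
	public static String join(char... chars) {
		StringBuilder sb = new StringBuilder(chars.length);
		for (char ch : chars) {
			sb.append(ch);
		}
		return sb.toString();
	}

	public static void main(String[] args) {
		char a = '\u4E2D'; // 中
		char b = '\u6587'; // 文
		System.out.println(a + b);                              // int addition: 46004
		System.out.println("" + a + b);                         // leading String forces concatenation
		System.out.println(String.valueOf(new char[] {a, b}));  // build from a char array
		System.out.println(join(a, b));                         // explicit StringBuilder
	}
}
```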

3. Printing a string in reverse

This is another interview regular. As our example we will use a well-known pangram, a short and meaningful sentence that contains all 26 letters of the English alphabet:

Quote

A quick brown fox jumps over the lazy dog.

The most common approach is to walk the string backwards and print each character to the console in turn:

Java code

public class StringReverse {
	public static void main(String[] args) {
		// The original string
		String s = "A quick brown fox jumps over the lazy dog.";
		System.out.println("Original string: " + s);

		System.out.print("Reversed string: ");
		for (int i = s.length(); i > 0; i--) {
			System.out.print(s.charAt(i - 1));
		}

		// Converting to an array first also works, but it is an unnecessary detour
		char[] data = s.toCharArray();
		System.out.println();
		System.out.print("Reversed string: ");
		for (int i = data.length; i > 0; i--) {
			System.out.print(data[i - 1]);
		}
	}
}

Output:

Original string: A quick brown fox jumps over the lazy dog.
Reversed string: .god yzal eht revo spmuj xof nworb kciuq A
Reversed string: .god yzal eht revo spmuj xof nworb kciuq A

Both approaches are common, but neither is the simplest. It is easier to use a method the library already provides:

Java code

public class StringReverse {
	public static void main(String[] args) {
		// The original string
		String s = "A quick brown fox jumps over the lazy dog.";
		System.out.println("Original string: " + s);

		System.out.print("Reversed string: ");
		StringBuffer buff = new StringBuffer(s);
		// java.lang.StringBuffer's reverse() method reverses the character sequence
		System.out.println(buff.reverse().toString());
	}
}

Output:

Original string: A quick brown fox jumps over the lazy dog.
Reversed string: .god yzal eht revo spmuj xof nworb kciuq A
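
The library method has one more advantage over the hand-written loop: according to its documentation, reverse() treats a surrogate pair as a single character, whereas reversing char by char swaps the two halves and produces an invalid sequence. The sketch below compares the two. It uses StringBuilder, the unsynchronized sibling of StringBuffer with the same reverse() method; the class name ReverseDemo and both helper methods are hypothetical.

```java
public class ReverseDemo {
	// Naive reversal: copies the char values backwards, one UTF-16 unit at a time.
	public static String reverseByChar(String s) {
		char[] out = new char[s.length()];
		for (int i = 0; i < s.length(); i++) {
			out[i] = s.charAt(s.length() - 1 - i);
		}
		return new String(out);
	}

	// Library reversal: StringBuilder.reverse() keeps surrogate pairs in order.
	public static String reverseByBuilder(String s) {
		return new StringBuilder(s).reverse().toString();
	}

	public static void main(String[] args) {
		String s = "a\uD840\uDC00b"; // 'a', U+20000 as a surrogate pair, 'b'
		// The pair survives intact
		System.out.println(reverseByBuilder(s).equals("b\uD840\uDC00a")); // true
		// The pair comes out low surrogate first, which is no longer valid UTF-16
		System.out.println(reverseByChar(s).equals("b\uDC00\uD840a"));    // true
	}
}
```

For plain ASCII input such as the pangram, both methods give the same result.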

4. Truncating a string that contains Chinese characters by byte count

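The task is to implement a method that truncates a string by bytes. For the string "我ZWR愛JAVA", for instance, taking the first four bytes should yield "我ZW" rather than "我ZWR", and the result must never contain half of a Chinese character.

English letters and Chinese characters occupy different numbers of bytes under different encodings. The following example shows how many bytes one English letter and one Chinese character take in several common encodings.

Java code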

import java.io.UnsupportedEncodingException;

public class EncodeTest {
	/**
	 * Prints the number of bytes a string occupies in the given encoding,
	 * followed by the encoding name.
	 * 
	 * @param s
	 *            the string
	 * @param encodingName
	 *            the name of the encoding
	 */
	public static void printByteLength(String s, String encodingName) {
		System.out.print("Bytes: ");
		try {
			System.out.print(s.getBytes(encodingName).length);
		} catch (UnsupportedEncodingException e) {
			e.printStackTrace();
		}
		System.out.println("; encoding: " + encodingName);
	}

	public static void main(String[] args) {
		String en = "A";
		String ch = "人";

		// Byte count of one English letter in each encoding
		System.out.println("English letter: " + en);
		EncodeTest.printByteLength(en, "GB2312");
		EncodeTest.printByteLength(en, "GBK");
		EncodeTest.printByteLength(en, "GB18030");
		EncodeTest.printByteLength(en, "ISO-8859-1");
		EncodeTest.printByteLength(en, "UTF-8");
		EncodeTest.printByteLength(en, "UTF-16");
		EncodeTest.printByteLength(en, "UTF-16BE");
		EncodeTest.printByteLength(en, "UTF-16LE");

		System.out.println();

		// Byte count of one Chinese character in each encoding
		System.out.println("Chinese character: " + ch);
		EncodeTest.printByteLength(ch, "GB2312");
		EncodeTest.printByteLength(ch, "GBK");
		EncodeTest.printByteLength(ch, "GB18030");
		EncodeTest.printByteLength(ch, "ISO-8859-1");
		EncodeTest.printByteLength(ch, "UTF-8");
		EncodeTest.printByteLength(ch, "UTF-16");
		EncodeTest.printByteLength(ch, "UTF-16BE");
		EncodeTest.printByteLength(ch, "UTF-16LE");
	}
}

The output is:

English letter: A
Bytes: 1; encoding: GB2312
Bytes: 1; encoding: GBK
Bytes: 1; encoding: GB18030
Bytes: 1; encoding: ISO-8859-1
Bytes: 1; encoding: UTF-8
Bytes: 4; encoding: UTF-16
Bytes: 2; encoding: UTF-16BE
Bytes: 2; encoding: UTF-16LE

Chinese character: 人
Bytes: 2; encoding: GB2312
Bytes: 2; encoding: GBK
Bytes: 2; encoding: GB18030
Bytes: 1; encoding: ISO-8859-1
Bytes: 3; encoding: UTF-8
Bytes: 4; encoding: UTF-16
Bytes: 2; encoding: UTF-16BE
Bytes: 2; encoding: UTF-16LE

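UTF-16BE and UTF-16LE are two members of the Unicode family of encodings. The Unicode Standard defines three encoding forms (UTF-8, UTF-16 and UTF-32) and seven encoding schemes: UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE and UTF-32LE. Inside the JVM a String is a sequence of UTF-16 code units; byte order only matters once the string is encoded to bytes, and Java's "UTF-16" charset then writes big-endian data preceded by a 2-byte byte-order mark, which is why it reports 4 bytes above where UTF-16BE and UTF-16LE report 2. The single byte reported for 人 under ISO-8859-1 is not a real encoding either: that charset cannot represent the character, so getBytes substitutes '?'. As the results show, GB2312, GBK and GB18030 all encode an English letter in one byte and this Chinese character in two, so any of them fits the task. We will use GBK in the solution below.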
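
Two of the byte counts above look odd at first: 4 bytes for UTF-16 and 1 byte for a Chinese character in ISO-8859-1. Printing the actual bytes makes the reason visible. The following is a small sketch; the class name BomDemo and the helper hex are hypothetical, and 人 is written as the escape \u4EBA so the file compiles regardless of source encoding.

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
	// Formats bytes as space-separated upper-case hex, e.g. "FE FF 00 41".
	public static String hex(byte[] bytes) {
		StringBuilder sb = new StringBuilder();
		for (byte b : bytes) {
			if (sb.length() > 0) {
				sb.append(' ');
			}
			sb.append(String.format("%02X", b & 0xFF));
		}
		return sb.toString();
	}

	public static void main(String[] args) {
		// "UTF-16" prepends the byte-order mark FE FF, then encodes big-endian
		System.out.println(hex("A".getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41
		System.out.println(hex("A".getBytes(StandardCharsets.UTF_16BE))); // 00 41
		System.out.println(hex("A".getBytes(StandardCharsets.UTF_16LE))); // 41 00
		// ISO-8859-1 cannot represent the character, so '?' (0x3F) is substituted
		System.out.println(hex("\u4EBA".getBytes(StandardCharsets.ISO_8859_1))); // 3F
	}
}
```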

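We cannot simply call String's substring(int beginIndex, int endIndex), because it works on characters: '我' and 'Z' each count as one character with a length of 1. All we really need is a way to tell Chinese characters from English letters, and in GBK the difference is that a Chinese character takes two bytes while an English letter takes one.

Java code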

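import java.io.UnsupportedEncodingException;

public class CutString {

	/**
	 * Tells whether a character is a Chinese character.
	 * 
	 * @param c
	 *            the character
	 * @return true for a Chinese character, false for an English letter
	 * @throws UnsupportedEncodingException
	 *             if the encoding is not supported by this Java runtime
	 */
	public static boolean isChineseChar(char c)
			throws UnsupportedEncodingException {
		// More than one byte in GBK means it is a Chinese character.
		// This is not a rigorous way to tell the two apart, but it is good enough for this exercise
		return String.valueOf(c).getBytes("GBK").length > 1;
	}

	/**
	 * Truncates a string by GBK byte count. If the limit falls in the middle
	 * of a Chinese character, the whole character is kept, so the result may
	 * be one byte longer than count.
	 * 
	 * @param original
	 *            the original string
	 * @param count
	 *            the number of bytes to keep
	 * @return the truncated string
	 * @throws UnsupportedEncodingException
	 *             if the encoding is not supported by this Java runtime
	 */
	public static String substring(String original, int count)
			throws UnsupportedEncodingException {
		// The original string is neither null nor empty
		if (original != null && !"".equals(original)) {
			// A Java String is already Unicode, so it needs no conversion here;
			// re-decoding getBytes() as GBK would corrupt it on non-GBK platforms
			// The byte count must be positive and smaller than the GBK length
			if (count > 0 && count < original.getBytes("GBK").length) {
				StringBuffer buff = new StringBuffer();
				char c;
				for (int i = 0; i < count; i++) {
					c = original.charAt(i);
					buff.append(c);
					if (CutString.isChineseChar(c)) {
						// A Chinese character uses two bytes, so one fewer character fits
						--count;
					}
				}
				return buff.toString();
			}
		}
		return original;
	}

	public static void main(String[] args) {
		// The original string
		String s = "我ZWR愛JAVA";
		System.out.println("Original string: " + s);
		try {
			System.out.println("First 1 byte(s): " + CutString.substring(s, 1));
			System.out.println("First 2 byte(s): " + CutString.substring(s, 2));
			System.out.println("First 4 byte(s): " + CutString.substring(s, 4));
			System.out.println("First 6 byte(s): " + CutString.substring(s, 6));
		} catch (UnsupportedEncodingException e) {
			e.printStackTrace();
		}
	}
}

Output:

Original string: 我ZWR愛JAVA
First 1 byte(s): 我
First 2 byte(s): 我
First 4 byte(s): 我ZW
First 6 byte(s): 我ZWR愛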
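
Note that the method above keeps a Chinese character whenever its first byte falls inside the limit, so asking for 1 byte returns "我", which occupies 2 bytes in GBK. If the result must never exceed the byte budget, a stricter variant could look like the sketch below. This is an alternative under that assumption, not the original solution: the class name StrictCut and the method truncateToBytes are hypothetical, and the charset is passed in as a parameter instead of being fixed to GBK.

```java
import java.nio.charset.Charset;

public class StrictCut {
	// Returns the longest prefix of s whose encoding in cs fits in maxBytes.
	public static String truncateToBytes(String s, int maxBytes, Charset cs) {
		int used = 0; // bytes consumed so far
		int i = 0;    // index of the next char to examine
		while (i < s.length()) {
			// Step by code point so a surrogate pair is never split
			int width = Character.charCount(s.codePointAt(i));
			int size = s.substring(i, i + width).getBytes(cs).length;
			if (used + size > maxBytes) {
				break; // the next character would not fit completely
			}
			used += size;
			i += width;
		}
		return s.substring(0, i);
	}

	public static void main(String[] args) {
		Charset gbk = Charset.forName("GBK");
		String s = "\u6211ZWR\u611BJAVA"; // 我ZWR愛JAVA
		System.out.println("[" + truncateToBytes(s, 1, gbk) + "]"); // []
		System.out.println("[" + truncateToBytes(s, 2, gbk) + "]"); // [我]
		System.out.println("[" + truncateToBytes(s, 4, gbk) + "]"); // [我ZW]
		System.out.println("[" + truncateToBytes(s, 6, gbk) + "]"); // [我ZWR]
	}
}
```

Which behaviour is right depends on the requirement: the original rounds up to a whole character, while this variant rounds down so the byte limit is a hard cap.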

http://www.javaeye.com/topic/216577
