【131】Converting \u-Prefixed Unicode Escapes to Chinese Characters in Java

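Recently at work I needed to call a third-party API. In the strings it returns, Chinese characters are escaped as \u followed by the character's Unicode code in hex (for example \u6B65). So I had to convert these \u + Unicode sequences back into Chinese characters.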
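To experiment without calling the third-party API, input of the same shape can be produced locally. The sketch below assumes the API escapes every non-ASCII char as \u plus four upper-case hex digits, as in the sample JSON file later in this post; the API's real escaping rules are not known, and `EscapeDemo` / `escapeNonAscii` are made-up names.

```java
public final class EscapeDemo {

    private EscapeDemo() {
    }

    // Escapes each UTF-16 char at or above 0x80 as a backslash, the letter u and
    // four upper-case hex digits; ASCII chars are copied unchanged.
    // This only mimics (by assumption) the third-party API's output format.
    public static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                // %04X: upper-case hex, zero-padded to four digits
                sb.append("\\u").append(String.format("%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Source-level escapes, so at run time this already holds the five Chinese chars.
        String collegeName = "\u8BA1\u7B97\u673A\u5B66\u9662";
        System.out.println(escapeNonAscii(collegeName));
    }
}
```

Running `main` prints \u8BA1\u7B97\u673A\u5B66\u9662, the same text as the `collegeName` value in MyJson.json.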

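The key point here is that Java treats \u differently depending on where it comes from. If you write \u + Unicode directly in a string literal in your source code, the Java compiler turns it into the corresponding character before the program ever runs. But when \u + Unicode arrives from outside the program at run time (a file, an HTTP response and so on), Java sees the \u as two ordinary characters, a backslash followed by the letter u, which is equivalent to "\\u" in a Java string literal.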
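A minimal demo of this difference, using only the JDK (the class name `EscapeTimingDemo` is made up for illustration):

```java
public final class EscapeTimingDemo {

    // Written in source code: javac turns the six characters backslash, u, 6, B, 6, 5
    // into the single char U+6B65 before the program runs.
    public static final String IN_SOURCE = "\u6B65";

    // The backslash itself is escaped, so this string holds six ordinary chars,
    // which is what text read from a file or an HTTP response looks like at run time.
    public static final String FROM_OUTSIDE = "\\u6B65";

    private EscapeTimingDemo() {
    }

    public static void main(String[] args) {
        System.out.println(IN_SOURCE.length());    // 1
        System.out.println(FROM_OUTSIDE.length()); // 6
        // Decoding the run-time form by hand yields the same char as the source form.
        char decoded = (char) Integer.parseInt(FROM_OUTSIDE.substring(2), 16);
        System.out.println(decoded == IN_SOURCE.charAt(0)); // true
    }
}
```

Because javac translates \u escapes everywhere in a source file, comments included, a comment containing a backslash and u followed by something other than four hex digits is a compile error. That is why the utility class below writes \\u in its comments.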

Below is the utility class that converts \u + Unicode escapes into Chinese characters.

package zhangchao.common.unicode;

import java.util.regex.Pattern;

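/**
 * Converts the Unicode escapes in a string (a backslash followed by u and four hex
 * digits, such as \\u6B65) into the characters they stand for, e.g. Chinese characters.
 * @author 張超
 *
 */
public final class UicodeBackslashU {
	// Exactly four hex digits. Do not put '|' inside the brackets: in a character
	// class it is a literal char, so the pattern "[0-9|a-f|A-F]" would also accept '|'.
	private static final Pattern HEX4 = Pattern.compile("[0-9a-fA-F]{4}");

	// Length of one escape: backslash, 'u' and 4 hex digits, e.g. \\u6B65
	private static final int ESCAPE_LENGTH = 6;

	private UicodeBackslashU() {
		// utility class, not meant to be instantiated
	}

	/**
	 * Decodes the \\u escape that starts at index i, e.g. \\u6B65 -> 步
	 * @param str the string being scanned
	 * @param i index of the backslash; isStartWithUnicode(str, i) must be true
	 * @return the decoded character as a one-char string
	 */
	private static String ustartToCn(final String str, final int i) {
		// the 4 hex digits after the backslash and 'u' are one UTF-16 code unit
		int code = Integer.parseInt(str.substring(i + 2, i + ESCAPE_LENGTH), 16);
		return String.valueOf((char) code);
	}

	/**
	 * Whether a Unicode escape (\\u followed by four hex digits) starts at index i.
	 * @param str the string being scanned
	 * @param i the index to check
	 * @return true if a complete escape starts at i
	 */
	private static boolean isStartWithUnicode(final String str, final int i) {
		if (null == str || i + ESCAPE_LENGTH > str.length()) {
			return false;
		}
		if (!str.startsWith("\\u", i)) {
			return false;
		}
		return HEX4.matcher(str.substring(i + 2, i + ESCAPE_LENGTH)).matches();
	}

	/**
	 * Replaces every \\u escape in the string with the character it encodes.
	 * @param str the input, e.g. JSON text returned by the third-party API
	 * @return the converted string, or null if str is null
	 */
	public static String unicodeToCn(final String str) {
		if (null == str) {
			return null;
		}
		// builds the new string
		StringBuilder sb = new StringBuilder(str.length());
		// Scan the string from left to right; i is the first index not yet scanned.
		// There are two branches:
		// 1. If the remaining text starts with a Unicode escape, append the decoded
		//    character and skip over the whole escape.
		// 2. Otherwise append the ordinary character and move right by 1.
		int length = str.length();
		for (int i = 0; i < length;) {
			if (isStartWithUnicode(str, i)) { // branch 1
				sb.append(ustartToCn(str, i));
				i += ESCAPE_LENGTH;
			} else { // branch 2
				sb.append(str.charAt(i));
				i++;
			}
		}
		return sb.toString();
	}
}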
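For comparison, the same conversion can be written with `java.util.regex.Matcher` instead of a manual scan. This is a sketch of an alternative, not the class above; `UnicodeUnescapeSketch` is a made-up name. It uses the `appendReplacement`/`appendTail` loop with a `StringBuffer`, which also works on Java 8.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class UnicodeUnescapeSketch {

    // A literal backslash, the letter u, then exactly four hex digits (group 1).
    // There is no '|' in the character class: inside [...] it would match a literal '|'.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    private UnicodeUnescapeSketch() {
    }

    /** Replaces every backslash-u escape in str with the char it encodes; null stays null. */
    public static String unicodeToCn(String str) {
        if (str == null) {
            return null;
        }
        Matcher m = ESCAPE.matcher(str);
        StringBuffer sb = new StringBuffer(str.length());
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a decoded '$' or backslash from being read as
            // a group reference or escape by appendReplacement
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        // copy whatever follows the last escape
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "{\"className\":\"\\u8F6F\\u4EF6\\u4E00\\u73ED\"}";
        // compare against source-level escapes rather than printing, to avoid console-encoding issues
        System.out.println(unicodeToCn(raw).equals("{\"className\":\"\u8F6F\u4EF6\u4E00\u73ED\"}")); // true
    }
}
```

The regex version is shorter, while the manual scan makes the two branches explicit. Both leave an incomplete escape such as a backslash, u and only two hex digits untouched.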

Now let's test the code. We read a JSON file whose content contains \u + Unicode escapes.

FileUtils.java, which reads the file:

package zhangchao.common.utils;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/**
 * File utilities.
 * @author 張超
 *
 */
public final class FileUtils {

	private FileUtils() {
		// utility class, not meant to be instantiated
	}

	/**
	 * Reads a file and returns its content as a string.
	 * The file is decoded as UTF-8 and every line ending is normalized to "\n".
	 * @param f the file to read
	 * @return the file content; if an I/O error occurs, the stack trace is printed
	 *         and whatever was read so far is returned
	 */
	public static String readAsString(File f) {
		StringBuilder sb = new StringBuilder();
		// try-with-resources closes the reader even if reading fails
		try (BufferedReader br = new BufferedReader(
				new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8))) {
			String str = br.readLine();
			while (null != str) {
				sb.append(str).append("\n");
				str = br.readLine();
			}
		} catch (IOException e) {
			// also covers FileNotFoundException, a subclass of IOException
			e.printStackTrace();
		}
		return sb.toString();
	}
}
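On Java 11 or later, reading the file fits in one call to `Files.readString`, which decodes with the charset you pass. This is a sketch of an alternative, not part of the original code: `FileReadSketch` is a made-up name, and UTF-8 is an assumption about how the JSON file is encoded. Unlike `FileUtils.readAsString`, it keeps the file's line endings as they are and lets an `IOException` propagate instead of printing it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public final class FileReadSketch {

    private FileReadSketch() {
    }

    // Reads the whole file as UTF-8 (assumed encoding); I/O errors go to the caller.
    public static String readAsString(Path path) throws IOException {
        return Files.readString(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // same relative path as in TestUnicode
        System.out.println(readAsString(Path.of("src/test/resources/MyJson.json")));
    }
}
```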

The main class used for testing, TestUnicode.java:

package zhangchao.test;

import zhangchao.common.utils.FileUtils;
import zhangchao.common.unicode.UicodeBackslashU;

import java.io.File;

/**
 * Tests converting \\u + Unicode escapes into Chinese characters.
 * @author 張超
 *
 */
public class TestUnicode {

	public static void main(String[] args) {
		String jsonStr = FileUtils.readAsString(new File("src/test/resources/MyJson.json"));
		String str = UicodeBackslashU.unicodeToCn(jsonStr);
		System.out.println(str);
	}

}

Content of MyJson.json:

{
    "msg":"success",
    "data":{
        "userId":"12363324",
        "collegeName":"\u8BA1\u7B97\u673A\u5B66\u9662",
        "className":"\u8F6F\u4EF6\u4E00\u73ED"
    }
}

Output of the program:

{
    "msg":"success",
    "data":{
        "userId":"12363324",
        "collegeName":"计算机学院",
        "className":"软件一班"
    }
}
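This output comes from the two branches in `unicodeToCn`. To watch them step by step, here is a small tracing sketch that re-implements the same left-to-right scan and records which branch handles each position. `ScanTrace` and its step format are made up for illustration; it reports code points as U+XXXX instead of printing the characters, to sidestep console encoding.

```java
import java.util.ArrayList;
import java.util.List;

public final class ScanTrace {

    private static final String HEX = "0123456789abcdefABCDEF";

    private ScanTrace() {
    }

    // True if s holds a backslash, the letter u and four hex digits starting at index i.
    static boolean escapeAt(String s, int i) {
        if (i + 6 > s.length() || s.charAt(i) != '\\' || s.charAt(i + 1) != 'u') {
            return false;
        }
        for (int k = i + 2; k < i + 6; k++) {
            if (HEX.indexOf(s.charAt(k)) < 0) {
                return false;
            }
        }
        return true;
    }

    // Same two branches as unicodeToCn, recording one line per step.
    public static List<String> trace(String s) {
        List<String> steps = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            if (escapeAt(s, i)) {
                // branch 1: decode the escape and jump over its 6 chars
                int code = Integer.parseInt(s.substring(i + 2, i + 6), 16);
                steps.add("branch 1 at " + i + ": " + s.substring(i, i + 6)
                        + " -> " + String.format("U+%04X", code));
                i += 6;
            } else {
                // branch 2: copy one ordinary char and move right by 1
                steps.add("branch 2 at " + i + ": copy '" + s.charAt(i) + "'");
                i++;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        for (String step : trace("\"\\u8F6F\\u4EF6\"")) {
            System.out.println(step);
        }
    }
}
```

For the quoted two-escape input in `main`, the trace has four steps: branch 2 at index 0 (the opening quote), branch 1 at 1 (U+8F6F), branch 1 at 7 (U+4EF6) and branch 2 at 13 (the closing quote).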

The figure below explains how the UicodeBackslashU class works:
