Detecting a File's Encoding in Java

Original

2020-02-24 21:42

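When reading a file, we normally specify the encoding of the text or string. When we do not know the encoding, we have to work it out from the bytes themselves. The getCharset method in this post does that in two steps: it checks for a byte order mark (BOM), and if there is none it scans the file for UTF-8 byte patterns, falling back to GBK. This is a heuristic, so it can guess wrong on short or mostly-ASCII files.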
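To see why the guess matters, here is a minimal sketch of what happens when bytes are decoded with the wrong charset. It is not from the original code: the class name WrongCharsetDemo and the helper recode are made up for illustration, and it assumes the JDK in use provides the GBK charset (standard builds do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class WrongCharsetDemo {
    /** Encodes text with one charset, then decodes the resulting bytes with another. */
    public static String recode(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as Unicode escapes so the source file
        // compiles the same way whatever encoding javac assumes.
        String original = "\u7f16\u7801";
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK

        // Same charset on both sides: the text survives.
        System.out.println(recode(original, gbk, gbk).equals(original)); // prints true
        // Written as GBK but read as UTF-8: the text is garbled.
        System.out.println(recode(original, gbk, StandardCharsets.UTF_8).equals(original)); // prints false
    }
}
```

The GBK bytes are not well-formed UTF-8, so the decoder substitutes replacement characters and the original text cannot be recovered from the decoded string.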

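import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

/**
 * Guesses the encoding of a file: checks for a BOM first, then scans for
 * UTF-8 byte patterns. The result is a heuristic guess, not a guarantee.
 *
 * @param file the file to analyze
 * @return "UTF-16LE", "UTF-16BE" or "UTF-8" if detected, otherwise the default "GBK"
 **/
public static String getCharset(File file) {
	String charset = "GBK"; // default encoding
	byte[] first3Bytes = new byte[3];
	// try-with-resources closes the stream on every exit path
	try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file))) {
		boolean checked = false;
		bis.mark(3); // the mark must stay valid across the 3 bytes read below
		int read = bis.read(first3Bytes, 0, 3);
		if (read == -1)
			return charset; // empty file
		// Step 1: look for a byte order mark (BOM)
		if (read >= 2 && first3Bytes[0] == (byte) 0xFF && first3Bytes[1] == (byte) 0xFE) {
			charset = "UTF-16LE";
			checked = true;
		} else if (read >= 2 && first3Bytes[0] == (byte) 0xFE && first3Bytes[1] == (byte) 0xFF) {
			charset = "UTF-16BE"; // big-endian BOM
			checked = true;
		} else if (read == 3 && first3Bytes[0] == (byte) 0xEF
				&& first3Bytes[1] == (byte) 0xBB
				&& first3Bytes[2] == (byte) 0xBF) {
			charset = "UTF-8";
			checked = true;
		}
		bis.reset();
		// Step 2: no BOM, so scan from the start for UTF-8 byte patterns
		if (!checked) {
			while ((read = bis.read()) != -1) {
				// bytes 0xF0 and above (including 4-byte UTF-8 lead bytes)
				// are not recognized here; keep the default
				if (read >= 0xF0)
					break;
				// a lone byte in 0x80-0xBF (a continuation byte with no
				// lead byte) also counts as GBK
				if (0x80 <= read && read <= 0xBF)
					break;
				if (0xC0 <= read && read <= 0xDF) {
					read = bis.read();
					// the two-byte pattern (0xC0-0xDF)(0x80-0xBF) can also
					// occur in GB encodings, so it proves nothing: keep scanning
					if (0x80 <= read && read <= 0xBF)
						continue;
					else
						break;
				} else if (0xE0 <= read && read <= 0xEF) {
					// the three-byte pattern (0xE0-0xEF)(0x80-0xBF)(0x80-0xBF)
					// is taken as UTF-8; this can still be wrong, but the
					// probability is low
					read = bis.read();
					if (0x80 <= read && read <= 0xBF) {
						read = bis.read();
						if (0x80 <= read && read <= 0xBF)
							charset = "UTF-8";
					}
					break; // decided either way
				}
			}
		}
	} catch (IOException e) {
		e.printStackTrace();
	}
	return charset;
}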
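
The scan in getCharset stops at the first three-byte match, so it never checks whether the rest of the file is well-formed UTF-8. A stricter alternative, sketched here as a different technique and not part of the code above, is to ask the JDK's CharsetDecoder to report malformed input instead of silently replacing it. The names StrictUtf8Check, isValidUtf8 and guessCharset are hypothetical, and the sketch assumes the file is small enough to read into memory.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only.
class StrictUtf8Check {
    /** Returns true if the bytes are well-formed UTF-8 (pure ASCII also passes). */
    public static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // throw instead of substituting
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Picks UTF-8 when the whole file validates, otherwise the caller's fallback. */
    public static Charset guessCharset(Path file, Charset fallback) throws IOException {
        byte[] data = Files.readAllBytes(file); // assumption: file fits in memory
        return isValidUtf8(data) ? StandardCharsets.UTF_8 : fallback;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java StrictUtf8Check <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        // GBK as the fallback mirrors the default used in getCharset
        Charset cs = guessCharset(file, Charset.forName("GBK"));
        System.out.println(cs + ": " + new String(Files.readAllBytes(file), cs));
    }
}
```

A file that contains only ASCII passes this check and is reported as UTF-8, which is harmless because ASCII bytes decode to the same text in UTF-8 and in GBK. Like the BOM-and-scan approach, this cannot tell GBK apart from other legacy encodings; it only answers whether UTF-8 is a safe choice.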
