Detecting a File's Encoding

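First, a text file's encoding can be recognized from the byte-order mark (BOM) at the very start of the file. The BOM is two bytes for UTF-16 and three bytes for UTF-8:

  ANSI (the system code page, e.g. GBK on Simplified Chinese Windows): no marker;
  Unicode (UTF-16 little endian): first two bytes are FF FE;
  Unicode big endian (UTF-16 big endian): first two bytes are FE FF;
  UTF-8: first three bytes are EF BB BF (the BOM is optional, so a UTF-8 file may have none).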
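To make the BOM table concrete, here is a minimal sketch that checks the leading bytes of a buffer against each marker. The class name `BomSniffer` and the method `detectBom` are hypothetical, not part of any library. It returns `null` when no BOM matches, which covers both ANSI/GBK files and BOM-less UTF-8 files; those need a byte scan instead.

```java
import java.nio.charset.StandardCharsets;

public class BomSniffer {

    // Hypothetical helper: maps the leading bytes of a file to a charset name,
    // or returns null when no BOM is recognized.
    static String detectBom(byte[] head) {
        int n = head.length;
        // UTF-8 BOM: EF BB BF (three bytes)
        if (n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        // UTF-16 little endian BOM: FF FE
        if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        // UTF-16 big endian BOM: FE FF
        if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        return null; // no BOM: ANSI/GBK or BOM-less UTF-8
    }

    public static void main(String[] args) {
        System.out.println(detectBom(new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, (byte) 'a'})); // UTF-8
        System.out.println(detectBom(new byte[] {(byte) 0xFF, (byte) 0xFE}));                         // UTF-16LE
        System.out.println(detectBom(new byte[] {(byte) 0xFE, (byte) 0xFF}));                         // UTF-16BE
        System.out.println(detectBom("abc".getBytes(StandardCharsets.US_ASCII)));                     // null
    }
}
```

Masking with `& 0xFF` avoids the sign-extension pitfall of comparing a Java `byte` against an `int` literal; the original code solves the same problem by casting the literal with `(byte) 0xFF`.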

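  Once you know how the encodings differ, the code is straightforward. If no BOM is found, the method below falls back to scanning the bytes: a valid three-byte UTF-8 sequence marks the file as UTF-8; otherwise it is assumed to be GBK.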

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// The enclosing class name is illustrative; the original shows only the method.
public class CharsetUtil {

    public static String get_charset( File file ) {
        String charset = "GBK"; // fallback when there is no BOM and no UTF-8 evidence
        byte[] first3Bytes = new byte[3];
        // try-with-resources closes the stream on every path, including early returns
        try ( BufferedInputStream bis = new BufferedInputStream( new FileInputStream( file ) ) ) {
            boolean checked = false;
            bis.mark( 3 ); // allow reset() after peeking at up to 3 bytes
            int read = bis.read( first3Bytes, 0, 3 );
            if ( read == -1 ) return charset; // empty file
            if ( first3Bytes[0] == (byte) 0xFF && first3Bytes[1] == (byte) 0xFE ) {
                charset = "UTF-16LE";
                checked = true;
            }
            else if ( first3Bytes[0] == (byte) 0xFE && first3Bytes[1] == (byte) 0xFF ) {
                charset = "UTF-16BE";
                checked = true;
            }
            else if ( first3Bytes[0] == (byte) 0xEF && first3Bytes[1] == (byte) 0xBB && first3Bytes[2] == (byte) 0xBF ) {
                charset = "UTF-8";
                checked = true;
            }
            bis.reset(); // rewind to the start of the file
            if ( !checked ) {
                // No BOM: scan for the first decisive byte pattern.
                while ( (read = bis.read()) != -1 ) {
                    if ( read >= 0xF0 ) break; // 4-byte lead bytes are not handled; keep GBK
                    if ( 0x80 <= read && read <= 0xBF ) // a lone continuation byte (0x80-0xBF) is treated as GBK
                        break;
                    if ( 0xC0 <= read && read <= 0xDF ) {
                        read = bis.read();
                        if ( 0x80 <= read && read <= 0xBF ) // 2-byte sequence (0xC0-0xDF)(0x80-0xBF); may also be valid GBK
                            continue;
                        else break;
                    }
                    else if ( 0xE0 <= read && read <= 0xEF ) { // 3-byte sequence; can still misfire, but unlikely
                        read = bis.read();
                        if ( 0x80 <= read && read <= 0xBF ) {
                            read = bis.read();
                            if ( 0x80 <= read && read <= 0xBF ) {
                                charset = "UTF-8";
                                break;
                            }
                            else break;
                        }
                        else break;
                    }
                    // bytes below 0x80 are ASCII and valid in both encodings: keep scanning
                }
            }
        } catch ( IOException e ) {
            e.printStackTrace();
        }
        return charset;
    }
}

From: http://ajava.org/code/I18N/14816.html
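When there is no BOM, `get_charset` decides on the first plausible UTF-8 multi-byte sequence. A stricter alternative, swapped in here and not the original author's method, is to decode the whole byte array with a `CharsetDecoder` configured to report malformed input: if decoding succeeds the bytes are well-formed UTF-8, otherwise fall back to GBK as the original does. `Utf8Check`, `isValidUtf8` and `guessWithoutBom` are hypothetical names, and the sketch assumes the runtime provides the GBK charset (it is not one of the charsets every Java platform is required to support).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // Returns true only if every byte decodes as well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data)); // throws on the first invalid sequence
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Hypothetical wrapper mirroring get_charset's fallback: UTF-8 if valid, otherwise GBK.
    static String guessWithoutBom(byte[] data) {
        return isValidUtf8(data) ? "UTF-8" : "GBK";
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters, written as escapes to keep the source ASCII
        System.out.println(guessWithoutBom(text.getBytes(StandardCharsets.UTF_8)));   // UTF-8
        System.out.println(guessWithoutBom(text.getBytes(Charset.forName("GBK"))));   // GBK
    }
}
```

Pure ASCII input is valid UTF-8 and is reported as such, which is harmless because ASCII bytes read the same under GBK. The trade-off is that the whole content is examined instead of stopping at the first decisive byte.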
