how to get charset from string and file

a. get charset from string

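// Requires: java.io.BufferedInputStream, java.io.ByteArrayInputStream, java.io.IOException
public String getCharsetFromString(String srcString) throws IOException {
    // getBytes() encodes with the platform default charset, so this only inspects
    // the first two bytes of that encoding. Treat the result as a hint, not a fact.
    BufferedInputStream bin = new BufferedInputStream(
            new ByteArrayInputStream(srcString.getBytes()));
    int b1 = bin.read();
    int b2 = bin.read();
    if (b1 == -1 || b2 == -1) {
        // fewer than two bytes: nothing to inspect, use the default
        return "ISO-8859-1";
    }
    int p = (b1 << 8) + b2;
    String code;
    // the leading bytes 0xefbb, 0xfffe, 0xfeff and 0x5c75 can be used as hints for the charset
    switch (p) {
        case 0xefbb: // first two bytes of the UTF-8 byte order mark (EF BB BF)
            code = "UTF-8";
            break;
        case 0xfffe: // UTF-16 little-endian byte order mark
            code = "UTF-16LE";
            break;
        case 0xfeff: // UTF-16 big-endian byte order mark
            code = "UTF-16BE";
            break;
        case 0x5c75: // the two ASCII characters "\u", i.e. an escaped string
            code = "ANSI|ASCII";
            break;
        default:
            code = "ISO-8859-1";
    }
    return code;
}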
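The same byte order mark check can be run directly on raw bytes, which avoids the round trip through String.getBytes(). The following is only a minimal sketch: the class name BomSniffer and the method charsetFromBom are made up for illustration, and it returns null when no mark is present instead of guessing. It also does not distinguish UTF-32, whose little-endian mark starts with the same two bytes as UTF-16LE.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original post.
class BomSniffer {
    /** Returns a charset name if the bytes start with a known byte order mark, else null. */
    static String charsetFromBom(byte[] data) {
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (data.length >= 2) {
            int p = ((data[0] & 0xFF) << 8) | (data[1] & 0xFF);
            if (p == 0xFEFF) {
                return "UTF-16BE";
            }
            if (p == 0xFFFE) {
                return "UTF-16LE";
            }
        }
        return null; // no mark: the header alone does not reveal the encoding
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(charsetFromBom(withBom));
        System.out.println(charsetFromBom("hi".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

Masking each byte with 0xFF matters because Java bytes are signed; without it 0xEF would compare as a negative number.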

b. get charset from file (not sure)

// Requires: java.io.FileInputStream, java.io.InputStreamReader, java.io.IOException
public String getCharsetFromFile(String filePath) throws IOException {
    FileInputStream fis = null;
    InputStreamReader isr = null;
    String s = null;
    try {
        // new input stream reader is created (it uses the platform default charset)
        fis = new FileInputStream(filePath);
        isr = new InputStreamReader(fis);
        // the name of the character encoding used by the reader is returned
        s = isr.getEncoding();
    } catch (Exception e) {
        // print error
        System.out.print("Could not read the file: " + e.getMessage());
    } finally {
        // closes the streams and releases associated resources
        if (isr != null)
            isr.close();
        if (fis != null)
            fis.close();
    }
    return s;
}

You cannot determine the encoding of an arbitrary byte stream. This is the nature of encodings: an encoding is a mapping between byte values and the characters they represent, so every encoding "could" be the right one.

The getEncoding() method will return the encoding which was set up (read the JavaDoc) for the stream. It will not guess the encoding for you.
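A small experiment makes this visible. In the sketch below (the class GetEncodingDemo and its method are invented for illustration), the same two bytes are wrapped in readers that were set up with different charsets, and getEncoding() simply reports whatever it was given. Note that it may report the historical name of the charset, such as UTF8, rather than the canonical one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the original post.
class GetEncodingDemo {
    /** Opens a reader over the bytes with the given charset and returns what getEncoding() reports. */
    static String encodingReportedFor(byte[] data, String charsetName) {
        try (InputStreamReader reader =
                     new InputStreamReader(new ByteArrayInputStream(data), charsetName)) {
            return reader.getEncoding();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // These two bytes are the UTF-8 encoding of U+00E9, but the reader does not know that.
        byte[] data = {(byte) 0xC3, (byte) 0xA9};
        System.out.println(encodingReportedFor(data, "UTF-8"));
        System.out.println(encodingReportedFor(data, "ISO-8859-1"));
    }
}
```

The two printed names differ even though the bytes are identical, which is why section b above cannot detect anything.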

Some streams tell you which encoding was used to create them: XML, HTML. But not an arbitrary byte stream.
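For XML, the declared encoding sits in the first line of the document. A rough sketch of reading it is shown here; the class DeclaredEncoding and the regular expression are my own assumptions, and the approach only works when the declaration itself is stored in an ASCII-compatible encoding. A real XML parser handles many more cases.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original post.
class DeclaredEncoding {
    // Matches the encoding pseudo-attribute of an XML declaration.
    private static final Pattern XML_DECL =
            Pattern.compile("<\\?xml[^>]*encoding\\s*=\\s*[\"']([A-Za-z0-9._\\-]+)[\"']");

    /** Returns the encoding named in the XML declaration, or null if none is found. */
    static String fromXmlDeclaration(byte[] head) {
        // ISO-8859-1 maps every byte to one char, so decoding can never fail here.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = XML_DECL.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(fromXmlDeclaration(xml));
    }
}
```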

Anyway, you could try to guess the encoding on your own if you have to. Every language has a characteristic frequency for each character. In English the char e appears very often, but ê appears very rarely. In an ISO-8859-1 stream there are usually no 0x00 bytes, but a UTF-16 stream of mostly Latin text has a lot of them.
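The 0x00 observation can be turned into a very crude check. This is a sketch only: the class ZeroByteHeuristic is invented here, and the 20% threshold is an arbitrary assumption rather than a tuned value.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original post.
class ZeroByteHeuristic {
    /** Returns the fraction of bytes that are 0x00 (0.0 for empty input). */
    static double zeroRatio(byte[] data) {
        if (data.length == 0) {
            return 0.0;
        }
        int zeros = 0;
        for (byte b : data) {
            if (b == 0) {
                zeros++;
            }
        }
        return (double) zeros / data.length;
    }

    /** Guesses UTF-16 when many bytes are zero; the 0.2 threshold is an assumption. */
    static boolean looksLikeUtf16(byte[] data) {
        return zeroRatio(data) > 0.2;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeUtf16("hello".getBytes(StandardCharsets.UTF_16LE)));
        System.out.println(looksLikeUtf16("hello".getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

The check fails for UTF-16 text in scripts whose code units rarely contain a zero byte, so it should only ever be one signal among several.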

Or: you could ask the user. I've already seen applications which present you a snippet of the file in different encodings and ask you to select the "correct" one.
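Producing such a preview is simple, since decoding the same bytes with several candidate charsets is all it takes. The sketch below uses an invented class EncodingPreview, and the candidate list is just an example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original post.
class EncodingPreview {
    /** Decodes the sample once per charset name, keeping the order of the names. */
    static Map<String, String> previews(byte[] sample, String... charsetNames) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : charsetNames) {
            result.put(name, new String(sample, Charset.forName(name)));
        }
        return result;
    }

    public static void main(String[] args) {
        // "caf" followed by U+00E9, stored as UTF-8
        byte[] sample = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        Map<String, String> shown = previews(sample, "UTF-8", "ISO-8859-1", "UTF-16BE");
        for (Map.Entry<String, String> e : shown.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Only one of the printed lines reads as the intended word, and a user can usually spot it at a glance.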

c. get charset from file (sure)

// Requires juniversalchardet: org.mozilla.universalchardet.UniversalDetector
private String getCharsetByInputStream(InputStream ins) {
    String charset = "utf-8"; // fallback when nothing can be detected
    if (null != ins) {
        UniversalDetector detector = new UniversalDetector(null);
        try {
            // use a fixed-size buffer: available() may return 0, which would end the loop at once
            byte[] buf = new byte[4096];
            int nread;
            // feed the detector until it is confident or the stream ends
            while ((nread = ins.read(buf)) > 0 && !detector.isDone()) {
                detector.handleData(buf, 0, nread);
            }
        } catch (IOException e) {
            LOG.error("--getCharsetByInputStream: error happened while getting charset from inputstream. ", e);
            return charset;
        }
        detector.dataEnd();
        String detected = detector.getDetectedCharset();
        if (detected != null && !"".equals(detected)) {
            charset = detected;
        }
        detector.reset();
    }
    return charset;
}

link to http://code.google.com/p/juniversalchardet/

Then read the InputStream as a String with the detected charset:

private String parseTruncaredSizeBinaryResourceToString(Integer resourceKey, Integer limitedSize) {
    String truncaredResourceText = null;
    InputStream ins = proactiveAnalysisService.retrieveGlobalResourceBinary(resourceKey, null, limitedSize);
    if (null != ins) {
        String charset = getCharsetByInputStream(ins);
        InputStreamReader reader = null;

        try {
            // go back to the beginning, because detecting the charset consumed the stream.
            // reset() only works if the stream supports mark/reset (e.g. ByteArrayInputStream)
            ins.reset();
        } catch (IOException e) {
            LOG.error("-- parseTruncaredSizeBinaryResourceToString: error happened while resetting resource inputstream. ", e);
        }

        try {
            reader = new InputStreamReader(ins, charset);
        } catch (UnsupportedEncodingException e) {
            LOG.error("-- parseTruncaredSizeBinaryResourceToString: unsupported charset for resource inputstream. ", e);
            return truncaredResourceText;
        }

        try {
            // collect the decoded chars in a StringBuilder; writing them to a byte stream
            // would cut every char down to its low 8 bits and corrupt non-Latin-1 text
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            truncaredResourceText = text.toString();
        } catch (IOException e) {
            LOG.error("-- parseTruncaredSizeBinaryResourceToString: error happened when reading inputstream. ", e);
        } finally {
            try {
                // closing the reader also closes the underlying input stream
                reader.close();
            } catch (IOException e) {
                LOG.error("-- parseTruncaredSizeBinaryResourceToString: error happened while closing inputstream. ", e);
            }
        }
    }
    return truncaredResourceText;
}

link to http://stackoverflow.com/questions/499010/java-how-to-determine-the-correct-charset-encoding-of-a-stream

and how to change the index position of an InputStream: http://stackoverflow.com/questions/3474911/changing-the-index-positioning-in-inputstream
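For streams that do not support reset() on their own, wrapping them in a BufferedInputStream and calling mark() before peeking is one common way to rewind. The sketch below is an illustration under that assumption; the class MarkResetDemo and the method peekThenReadAll are invented names, and the peeked bytes stand in for whatever a detector would consume.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original post.
class MarkResetDemo {
    /** Peeks at up to peekLimit bytes, rewinds, then returns the complete stream content. */
    static byte[] peekThenReadAll(InputStream raw, int peekLimit) {
        try (BufferedInputStream in = new BufferedInputStream(raw)) {
            in.mark(peekLimit);              // remember the current position
            byte[] head = new byte[peekLimit];
            in.read(head, 0, peekLimit);     // e.g. hand these bytes to a charset detector
            in.reset();                      // jump back to the marked position

            ByteArrayOutputStream all = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                all.write(buf, 0, n);
            }
            return all.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream src = new ByteArrayInputStream("hello world".getBytes(StandardCharsets.UTF_8));
        byte[] content = peekThenReadAll(src, 4);
        System.out.println(new String(content, StandardCharsets.UTF_8));
    }
}
```

reset() is only guaranteed to work while no more than the mark limit has been read since mark(), so the limit has to cover everything the detector may consume.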
