The Difference Between Two Ways of Transcoding

Original post

2020-06-21 21:15

This is an exercise (slightly modified) from Zhang Xiaoxiang's Java job-training video course:
Write the program below, then run it and analyze the result:

import java.io.*;

public class TestCodeIO {
    public static void main(String[] args) throws Exception {
        // Decode the bytes arriving on standard input as ISO-8859-1.
        InputStreamReader isr = new InputStreamReader(System.in, "iso8859-1");
        BufferedReader br = new BufferedReader(isr);
        String strLine = br.readLine();
        br.close();
        isr.close();
        // Print the decoded line using the platform's default charset.
        System.out.println(strLine);
    }
}

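Run the program and type the two characters 中国 ("China"); the output is ???ú
Modify the program in each of the following two ways so that the Chinese input is printed correctly:
1. Modify this statement in the program: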

InputStreamReader isr = new InputStreamReader(System.in,"iso8859-1");

2. Leave the statement above unchanged and modify this one instead:

System.out.println(strLine);

The first fix is simple: change the statement as shown here. I won't discuss it in detail.

InputStreamReader isr = new InputStreamReader(System.in,"gb2312");
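Why the first fix works can be checked without a console. The sketch below is only an illustration under stated assumptions: the class name StdinDecodingDemo and its helper methods are hypothetical, and a ByteArrayInputStream stands in for System.in, feeding the four bytes D6 D0 B9 FA that a gb2312 console would send for 中国. It assumes the runtime supports the GB2312 charset.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

class StdinDecodingDemo {
    // Reads one line from the stream, decoding its bytes with the named charset.
    static String readLine(InputStream in, String charsetName) throws IOException {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charsetName))) {
            return br.readLine();
        }
    }

    // Assumed console input: the GB2312 bytes for the two characters, plus a newline.
    static byte[] consoleBytes() {
        return new byte[] {(byte) 0xD6, (byte) 0xD0, (byte) 0xB9, (byte) 0xFA, '\n'};
    }

    public static void main(String[] args) throws IOException {
        String asLatin1 = readLine(new ByteArrayInputStream(consoleBytes()), "iso8859-1");
        String asGb2312 = readLine(new ByteArrayInputStream(consoleBytes()), "gb2312");

        // ISO-8859-1 turns every byte into one char, so four bytes become four chars.
        System.out.println("iso8859-1 length: " + asLatin1.length());
        // GB2312 combines each byte pair into one Chinese character.
        System.out.println("gb2312 length:    " + asGb2312.length());
        System.out.println("gb2312 decodes correctly: " + asGb2312.equals("\u4E2D\u56FD"));
    }
}
```

The same bytes yield four Latin-1 characters under iso8859-1 and the two intended characters under gb2312, which is why changing only the reader's charset is enough.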

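What I want to discuss in detail is the second fix.

At first I changed the statement like this:
System.out.println(new String (strLine.getBytes(),"iso8859-1"));
After typing 中国, the output was no longer the garbage shown above, but it was still garbage, so this fix is clearly wrong.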

I would like to thank 軟件民工 for showing me the correct fix, which made everything click:

  System.out.println(new String (strLine.getBytes("iso8859-1")));

So what exactly is the difference between the two? For easier reading, here are the wrong and the correct fix side by side:

import java.io.*;

public class TestCodeIO {
    public static void main(String[] args) throws Exception {
        // Create an InputStreamReader that decodes the input bytes as iso8859-1.
        InputStreamReader isr = new InputStreamReader(System.in, "iso8859-1");
        BufferedReader br = new BufferedReader(isr);
        String strLine = br.readLine();
        br.close();
        isr.close();
        System.out.println(strLine);

        // Wrong fix: encodes strLine into bytes using the platform's default
        // charset (gb2312 here), then constructs a new String by decoding those
        // bytes as iso8859-1. strLine came from an iso8859-1 decoder, so only
        // iso8859-1 maps its characters back to the original bytes; gb2312
        // cannot encode most of them, so the original bytes are lost.
        System.out.println(new String(strLine.getBytes(), "iso8859-1"));

        // Correct fix: encodes strLine into bytes using the named charset
        // (iso8859-1), which recovers the original input bytes, then constructs
        // a new String by decoding them with the platform's default charset
        // (gb2312 here).
        System.out.println(new String(strLine.getBytes("iso8859-1")));
    }
}

The comments above already say most of it, but let me spell it out.

First, the wrong fix:

System.out.println(new String (strLine.getBytes(),"iso8859-1"));

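This statement encodes the string in strLine into a byte sequence using the platform's default charset (gb2312 here), then decodes those bytes with the specified charset (iso8859-1) to construct a new String object, and prints it to the screen.
Where does it go wrong?
Look at this piece of code: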

InputStreamReader isr = new InputStreamReader(System.in,"iso8859-1");
BufferedReader br = new BufferedReader (isr);
String strLine = br.readLine();

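The characters in strLine were produced by decoding the input bytes as iso8859-1, which maps every byte to exactly one character in the range U+0000 to U+00FF. Only iso8859-1 can map those characters back to the original bytes. The call strLine.getBytes() instead encodes them with the platform's default gb2312, which cannot represent most of them, so the original bytes are lost and the result is bound to be garbage. Those gb2312-encoded bytes are then decoded as iso8859-1 to build the new String object, which is why the garbage differs from what System.out.println(strLine) prints.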
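The lossy step described above can be made visible by comparing bytes. This is a minimal sketch, not the original program: the class name LossyEncodingDemo and its helpers are hypothetical, the string MISDECODED is what iso8859-1 decoding produces from the GB2312 bytes D6 D0 B9 FA, and GB2312 is named explicitly so that the result does not depend on the platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class LossyEncodingDemo {
    // The bytes typed at the console (assumed to be GB2312) and what
    // br.readLine() returns after decoding them as ISO-8859-1.
    static final byte[] ORIGINAL = {(byte) 0xD6, (byte) 0xD0, (byte) 0xB9, (byte) 0xFA};
    static final String MISDECODED = "\u00D6\u00D0\u00B9\u00FA";

    // True if encoding the misdecoded string with this charset restores the typed bytes.
    static boolean recoversOriginalBytes(Charset charset) {
        return Arrays.equals(MISDECODED.getBytes(charset), ORIGINAL);
    }

    // Formats bytes as space-separated hex pairs for inspection.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset gb2312 = Charset.forName("GB2312");
        // Encoding with ISO-8859-1 is the exact inverse of the decoding step.
        System.out.println(hex(MISDECODED.getBytes(StandardCharsets.ISO_8859_1))); // D6 D0 B9 FA
        // Encoding with GB2312 substitutes bytes for characters it cannot represent.
        System.out.println(hex(MISDECODED.getBytes(gb2312)));
        System.out.println("iso8859-1 round trip: " + recoversOriginalBytes(StandardCharsets.ISO_8859_1));
        System.out.println("gb2312 round trip:    " + recoversOriginalBytes(gb2312));
    }
}
```

Once the encoder has substituted bytes for unmappable characters, no later decoding step can bring the typed bytes back, whichever charset is used.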

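The correct fix needs little explanation: it first encodes strLine into a byte sequence with iso8859-1, which restores exactly the bytes that were typed, then decodes them with the platform's default charset (gb2312) to construct a new String object, and prints it.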
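The correct fix relies on the platform's default charset being the one the console actually uses. That holds on the gb2312 system described here, but since JDK 18 the default charset is UTF-8 unless configured otherwise, so a variant that names both charsets may be more portable. The following is a sketch under that assumption; the class ExplicitCharsetDemo and the method redecode are hypothetical names, not part of the original exercise.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {
    // Undoes an ISO-8859-1 decoding and decodes the recovered bytes with the
    // charset the input was really written in.
    static String redecode(String misdecoded, Charset actual) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, actual);
    }

    public static void main(String[] args) {
        // What the program reads when the GB2312 bytes D6 D0 B9 FA are decoded as ISO-8859-1.
        String strLine = "\u00D6\u00D0\u00B9\u00FA";
        String fixed = redecode(strLine, Charset.forName("GB2312"));
        // Compare against the expected characters instead of printing them,
        // so the check does not depend on the console's own encoding.
        System.out.println(fixed.equals("\u4E2D\u56FD")); // true
    }
}
```

In the original program this would correspond to passing "gb2312" as a second argument to the String constructor in the correct fix, rather than leaving the decoding charset implicit.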
