Understanding code points

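My understanding of code units and code points is:
  1. A code point may consist of one or two code units.
  2. In my test program, "我" also takes up only one code unit, so the number of code points equals the number of code units.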
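Point 2 holds for "我" because U+6211 lies in the Basic Multilingual Plane. A minimal sketch of a case where the two counts differ, assuming U+20000 (a CJK Extension B character) as the example of a code point that UTF-16 encodes as a surrogate pair; the class and method names here are made up for illustration:

```java
final class CodeUnitVsCodePoint {
    // Number of UTF-16 code units (Java chars) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "\u6211";                                         // "我", U+6211
        String supplementary = new String(Character.toChars(0x20000)); // U+20000

        // One code unit, one code point.
        System.out.println(codeUnits(bmp) + " " + codePoints(bmp));
        // Two code units (a surrogate pair), still one code point.
        System.out.println(codeUnits(supplementary) + " " + codePoints(supplementary));
    }
}
```

So the counts are equal only when every character in the string is in the BMP.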
Below is an explanation I found on the official Unicode website about Chinese, Korean and Japanese in Unicode:
  Q: I have heard that UTF-8 does not support some Japanese characters. Is this correct?

  A: There is a lot of misinformation floating around about the support of Chinese, Japanese and Korean (CJK) characters. The Unicode Standard supports all of the CJK characters from JIS X 0208, JIS X 0212, JIS X 0221, or JIS X 0213, for example, and many more. This is true no matter which encoding form of Unicode is used: UTF-8, UTF-16, or UTF-32.

  Unicode supports over 70,000 CJK characters right now, and work is underway to encode further additions. The International Standard ISO/IEC 10646 and the Unicode Standard are completely synchronized in repertoire and content. And that means that Unicode has the same repertoire as GB 18030, since that also is synchronized with ISO 10646, although with a different ordering and byte format.
   
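Does this mean that every encoding form (UTF-8, UTF-16, or UTF-32) supports Chinese to the same extent? In other words, is the number of Chinese characters each of the three can represent the same?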
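Going by the FAQ quoted above, the answer should be yes: the three encoding forms cover the same repertoire and differ only in byte layout. A rough sketch that checks this for two sample characters by round-tripping them through each encoding. It assumes the runtime provides a "UTF-32BE" charset, which is not among the six guaranteed by StandardCharsets, so Charset.forName could throw on some platforms; the class and helper names are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingFormsDemo {
    // True if encoding then decoding with cs gives back the same string.
    static boolean roundTrips(String s, Charset cs) {
        return new String(s.getBytes(cs), cs).equals(s);
    }

    // Number of bytes the string occupies in the given encoding.
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // Assumption: UTF-32BE is available on this JVM.
        Charset utf32 = Charset.forName("UTF-32BE");
        Charset[] forms = { StandardCharsets.UTF_8, StandardCharsets.UTF_16BE, utf32 };

        String[] samples = {
            "\u6211",                              // "我", U+6211 (BMP)
            new String(Character.toChars(0x20000)) // U+20000 (supplementary)
        };

        for (String s : samples) {
            for (Charset cs : forms) {
                System.out.println("U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase()
                        + " " + cs.name() + ": " + byteLength(s, cs)
                        + " bytes, round trip ok = " + roundTrips(s, cs));
            }
        }
    }
}
```

Both characters survive all three encodings; only the byte counts change (for U+6211: 3, 2 and 4 bytes in UTF-8, UTF-16BE and UTF-32BE respectively).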
   
My test program is as follows:
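    public class test0 {
        public static void main(String[] args) {
            // "我" (U+6211) followed by one space
            String a = "我 ";
            // length() returns the number of UTF-16 code units
            int cuCount = a.length();
            System.out.println("the number of code units required for string \"test\" in the UTF-16 encoding is " + cuCount);
            // codePointCount() returns the number of Unicode code points
            int cpCount = a.codePointCount(0, a.length());
            System.out.println("the number of code points is " + cpCount);
            // charAt() returns a single code unit, here the trailing space
            System.out.println("the end of string \"我 \" is " + a.charAt(a.length() - 1));
        }
    }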
   
The output is:
    the number of code units required for string "test" in the UTF-16 encoding is 2
    the number of code points is 2
    the end of string "我 " is [space]
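Note that charAt returns a single code unit, which happens to be a whole character in the test above. A small sketch of how the last code point could be read instead, assuming a non-empty string (codePointBefore throws for index 0); the class and method names are only illustrative:

```java
final class LastCodePointDemo {
    // Returns the last code point of a non-empty string as a String.
    static String lastCodePoint(String s) {
        int cp = s.codePointBefore(s.length());
        return new String(Character.toChars(cp));
    }

    public static void main(String[] args) {
        String a = "\u6211 ";                                     // "我" plus a space
        String b = "a" + new String(Character.toChars(0x20000)); // ends with U+20000

        // For BMP-only text, charAt and lastCodePoint agree.
        System.out.println("[" + lastCodePoint(a) + "]");

        // For b, charAt yields only the low surrogate of the pair.
        char lastUnit = b.charAt(b.length() - 1);
        System.out.println(Character.isLowSurrogate(lastUnit));
        System.out.println(Integer.toHexString(lastCodePoint(b).codePointAt(0)));
    }
}
```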
 