Understanding code points

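My understanding of code units and code points is:
  1. In UTF-16, which Java strings use, a code point is encoded as one or two code units.
  2. In my test program, "我" also takes up only one code unit, so the number of code points equals the number of code units.
  Below are some notes on Chinese, Korean, and Japanese support that I found on the official Unicode website: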
  Q: I have heard that UTF-8 does not support some Japanese characters. Is this correct?

  A: There is a lot of misinformation floating around about the support of Chinese, Japanese and Korean (CJK) characters. The Unicode Standard supports all of the CJK characters from JIS X 0208, JIS X 0212, JIS X 0221, or JIS X 0213, for example, and many more. This is true no matter which encoding form of Unicode is used: UTF-8, UTF-16, or UTF-32.

  Unicode supports over 70,000 CJK characters right now, and work is underway to encode further additions. The International Standard ISO/IEC 10646 and the Unicode Standard are completely synchronized in repertoire and content. And that means that Unicode has the same repertoire as GB 18030, since that also is synchronized with ISO 10646, although with a different ordering and byte format.
   
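  Does this mean that, whichever encoding form is used (UTF-8, UTF-16, or UTF-32), Chinese is supported to the same extent? In other words, do all three encodings support the same number of Chinese characters?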
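  One way to probe this question is to encode the same text in all three forms and decode it back. The following is only a minimal sketch: the class name EncodingRoundTrip and its helper methods are made up for illustration, and it assumes the JDK in use provides the UTF-32BE charset (it is not one of the charsets guaranteed by StandardCharsets). The sample combines "我" (U+6211) with U+20BB7, a CJK ideograph outside the Basic Multilingual Plane.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {
    // Encode the text with the given charset, decode it back,
    // and report whether the original string survived unchanged.
    static boolean roundTrips(String text, Charset cs) {
        byte[] bytes = text.getBytes(cs);
        return new String(bytes, cs).equals(text);
    }

    // Number of bytes the text occupies in the given charset.
    static int byteLength(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // "\u6211" is the character "我"; U+20BB7 is a supplementary CJK ideograph.
        String sample = "\u6211" + new String(Character.toChars(0x20BB7));

        // Assumption: this JDK ships a UTF-32BE charset.
        Charset utf32 = Charset.forName("UTF-32BE");
        Charset[] charsets = {StandardCharsets.UTF_8, StandardCharsets.UTF_16BE, utf32};

        for (Charset cs : charsets) {
            System.out.println(cs.name() + ": " + byteLength(sample, cs)
                    + " bytes, round trip ok = " + roundTrips(sample, cs));
        }
    }
}
```

  If the sketch behaves as expected, the byte counts differ between the three forms while every round trip succeeds, which matches the FAQ's statement that the repertoire is the same and only the byte format changes.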
   
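  My test program is as follows:
  public class test0 {
      public static void main(String[] args) {
          // "我" (U+6211) followed by one space; both are BMP characters
          String a = "我 ";
          // length() counts UTF-16 code units, not code points
          int cuCount = a.length();
          System.out.println("the number of code units required for string \"test\" in the UTF-16 encoding is " + cuCount);
          // codePointCount() counts the Unicode code points in the given range
          int cpCount = a.codePointCount(0, a.length());
          System.out.println("the number of code points is " + cpCount);
          // charAt() returns a single code unit; here the last one is the trailing space
          System.out.println("the end of string \"我 \" is " + a.charAt(a.length() - 1));
      }
  }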
   
  The output is:
  the number of code units required for string "test" in the UTF-16 encoding is 2
  the number of code points is 2
  the end of string "我 " is [space]
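  The two counts agree in this test because both "我" and the space lie in the Basic Multilingual Plane, where each code point fits in a single UTF-16 code unit. A hedged sketch of the other case follows; the class name SupplementaryDemo and its helpers are hypothetical, and U+20BB7 is simply one example of a supplementary CJK ideograph.

```java
public class SupplementaryDemo {
    // Number of UTF-16 code units (Java chars) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "\u6211 ";                                          // "我" plus a space
        String supplementary = new String(Character.toChars(0x20BB7)); // one code point, two code units
        String mixed = bmp + supplementary;

        System.out.println("bmp:   " + codeUnits(bmp) + " code units, " + codePoints(bmp) + " code points");
        System.out.println("mixed: " + codeUnits(mixed) + " code units, " + codePoints(mixed) + " code points");

        // charAt() returns a code unit, so the last "character" here is only half of a surrogate pair.
        char last = mixed.charAt(mixed.length() - 1);
        System.out.println("last code unit is a low surrogate: " + Character.isLowSurrogate(last));

        // Iterating by code point avoids splitting surrogate pairs.
        mixed.codePoints().forEach(cp ->
                System.out.printf("U+%04X uses %d code unit(s)%n", cp, Character.charCount(cp)));
    }
}
```

  With a supplementary character in the string, length() and codePointCount() no longer agree, and charAt(length() - 1) returns a lone surrogate rather than a whole character.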
 