2.2 - [lex.charset] - Character sets

Please do not repost this article; do not republish or distribute it in any form; delete it within 24 hours of downloading; commercial use is prohibited.

2 Lexical conventions [lex]

2.2 Character sets [lex.charset]

 


 

The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters (see footnote 15):

    a b c d e f g h i j k l m n o p q r s t u v w x y z
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    0 1 2 3 4 5 6 7 8 9
    _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
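The count of 96 can be checked mechanically. The following is a minimal sketch, not part of the standard text: the names `kGraphical`, `kWhitespace`, `in_basic_source_set`, and `basic_source_set_size` are hypothetical, and the table assumes that the character between the comma and the double quote in the list above is the backslash, which the stated total of 91 distinct graphical characters requires.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The 91 graphical characters from the list, as one string literal.
// The backslash and the double quote must be escaped inside the literal.
const char kGraphical[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";

// Space plus horizontal tab, vertical tab, form feed, and new-line.
const char kWhitespace[] = " \t\v\f\n";

// Returns true if c is a member of the basic source character set.
bool in_basic_source_set(char c) {
    if (c == '\0') return false;  // strchr would otherwise match the terminator
    return std::strchr(kGraphical, c) != nullptr ||
           std::strchr(kWhitespace, c) != nullptr;
}

// Number of members, computed from the two tables above.
std::size_t basic_source_set_size() {
    return std::strlen(kGraphical) + std::strlen(kWhitespace);
}

// Sanity check on the table sizes: 91 graphical + 5 others = 96.
void check_basic_source_set() {
    assert(std::strlen(kGraphical) == 91);
    assert(basic_source_set_size() == 96);
}
```

With these tables, characters such as `$`, `@`, and the grave accent are reported as outside the set, which matches their absence from the list.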

 


 

The universal-character-name construct provides a way to name other characters.

    hex-quad:
        hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit

    universal-character-name:
        \u hex-quad
        \U hex-quad hex-quad

The character designated by the universal-character-name \UNNNNNNNN is that character whose character short name in ISO/IEC 10646 is NNNNNNNN; the character designated by the universal-character-name \uNNNN is that character whose character short name in ISO/IEC 10646 is 0000NNNN. If the hexadecimal value for a universal-character-name is less than 0x20 or in the range 0x7F-0x9F (inclusive), or if the universal-character-name designates a character in the basic source character set, then the program is ill-formed.
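The grammar and the ill-formedness rule can be illustrated with a small checker. This is a sketch under stated assumptions, not how any compiler implements the rule: `hex_digit_value`, `parse_ucn`, and `ucn_value_is_allowed` are hypothetical names, the spelling is assumed to start with a backslash followed by `u` or `U`, and the basic source characters are assumed to occupy their ASCII positions in ISO/IEC 10646 (0x20 to 0x7E except 0x24, 0x40, and 0x60), as footnote 15 suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Value of one hexadecimal-digit, or -1 if c is not one.
// Assumes the letters a-f and A-F are contiguous in the host character set.
int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Parses a spelling of the form \u hex-quad or \U hex-quad hex-quad.
// On success stores the designated value and returns true.
bool parse_ucn(const std::string& s, unsigned long& value) {
    if (s.size() < 2 || s[0] != '\\') return false;
    std::size_t digits = 0;
    if (s[1] == 'u') digits = 4;        // one hex-quad
    else if (s[1] == 'U') digits = 8;   // two hex-quads
    else return false;
    if (s.size() != 2 + digits) return false;
    unsigned long result = 0;
    for (std::size_t i = 2; i < s.size(); ++i) {
        int d = hex_digit_value(s[i]);
        if (d < 0) return false;
        result = result * 16 + static_cast<unsigned long>(d);
    }
    value = result;
    return true;
}

// Applies the rule quoted above: values below 0x20, values in 0x7F-0x9F,
// and values designating a basic source character are not allowed.
bool ucn_value_is_allowed(unsigned long v) {
    if (v < 0x20) return false;
    if (v >= 0x7F && v <= 0x9F) return false;
    // Assumption: basic source characters sit at their ASCII code points.
    bool basic = v >= 0x20 && v <= 0x7E && v != 0x24 && v != 0x40 && v != 0x60;
    return !basic;
}

// Example: \u00E9 parses to 0xE9 and passes the rule.
void check_ucn_example() {
    unsigned long v = 0;
    assert(parse_ucn("\\u00E9", v) && v == 0xE9);
    assert(ucn_value_is_allowed(v));
}
```

Under these assumptions a spelling such as `\u0041` parses successfully but is rejected by `ucn_value_is_allowed`, because 0x41 designates `A`, a member of the basic source character set.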

 


The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose representation has all zero bits. For each basic execution character set, the values of the members shall be non-negative and distinct from one another. The execution character set and the execution wide-character set are supersets of the basic execution character set and the basic execution wide-character set, respectively. The values of the members of the execution character sets are implementation-defined, and any additional members are locale-specific.
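Some of these guarantees can be observed directly in a program. The sketch below is illustrative only: `kBasicExecution` and `all_non_negative_and_distinct` are hypothetical names, and the check covers the narrow basic execution character set only. The actual values remain implementation-defined, so the code tests the stated properties rather than any particular numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The null character and the null wide character have all-zero bits,
// so both compare equal to zero.
static_assert('\0' == 0, "null character has value zero");
static_assert(L'\0' == 0, "null wide character has value zero");

// Members of the basic execution character set other than the null
// character: the 96 basic source characters plus alert, backspace,
// and carriage return.
const char kBasicExecution[] =
    " \t\v\f\n"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'"
    "\a\b\r";

// Returns true if every character in the string has a non-negative
// value and no two characters share a value.
bool all_non_negative_and_distinct(const char* chars) {
    std::size_t n = std::strlen(chars);
    for (std::size_t i = 0; i < n; ++i) {
        if (chars[i] < 0) return false;
        for (std::size_t j = i + 1; j < n; ++j) {
            if (chars[i] == chars[j]) return false;
        }
    }
    return true;
}

// 96 basic source characters + 3 extra control characters = 99.
void check_basic_execution_set() {
    assert(std::strlen(kBasicExecution) == 99);
    assert(all_non_negative_and_distinct(kBasicExecution));
}
```

The check says nothing about additional, locale-specific members of the execution character set, which may have negative values when plain `char` is signed.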

 


 

15) The glyphs for the members of the basic source character set are intended to identify characters from the subset of ISO/IEC 10646 which corresponds to the ASCII character set. However, because the mapping from source file characters to the source character set (described in translation phase 1) is specified as implementation-defined, an implementation is required to document how the basic source characters are represented in source files.

 


 
