### 2 Lexical conventions [lex]

#### 2.2 Character sets [lex.charset]

The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters (see footnote 15):

    a b c d e f g h i j k l m n o p q r s t u v w x y z    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z    0 1 2 3 4 5 6 7 8 9    _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

The universal-character-name construct provides a way to name other characters.

universal-character-name:
\u hex-quad
\U hex-quad hex-quad

The character designated by the universal-character-name \UNNNNNNNN is that character whose character short name in ISO/IEC 10646 is NNNNNNNN; the character designated by the universal-character-name \uNNNN is that character whose character short name in ISO/IEC 10646 is 0000NNNN. If the hexadecimal value for a universal-character-name is less than 0x20 or in the range 0x7F-0x9F (inclusive), or if the universal-character-name designates a character in the basic source character set, then the program is ill-formed.
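
The restrictions in this paragraph can be sketched as a compile-time predicate. This is only an illustration, not part of the standard text: the names `is_basic_source_char` and `ucn_value_allowed` are hypothetical, and the sketch assumes that basic source characters are identified by their ASCII / ISO/IEC 10646 code points, as footnote 15 suggests. It implements only the three conditions stated above and nothing else.

```cpp
#include <cstdint>

// Assumption: basic source characters are identified by ASCII code points.
// The 91 graphical characters are printable ASCII 0x21-0x7E minus $, @ and `.
constexpr bool is_basic_source_char(std::uint32_t cp) {
    return cp == 0x20                       // space
        || (cp >= 0x09 && cp <= 0x0C)       // HT, new-line, VT, FF
        || (cp >= 0x21 && cp <= 0x7E
            && cp != 0x24 && cp != 0x40 && cp != 0x60);
}

// Hypothetical helper: true if a universal-character-name with this value
// passes the three conditions quoted in the paragraph above.
constexpr bool ucn_value_allowed(std::uint32_t cp) {
    return !(cp < 0x20)                     // below 0x20
        && !(cp >= 0x7F && cp <= 0x9F)      // 0x7F-0x9F inclusive
        && !is_basic_source_char(cp);       // basic source character
}

static_assert(!ucn_value_allowed(0x0041), "'A' is a basic source character");
static_assert(!ucn_value_allowed(0x001B), "control character below 0x20");
static_assert(!ucn_value_allowed(0x0085), "inside 0x7F-0x9F");
static_assert(ucn_value_allowed(0x0024), "'$' is not a basic source character");
static_assert(ucn_value_allowed(0x00E9), "e with acute accent");
```

Note that `$`, `@` and the grave accent pass the check because they are absent from the 91 graphical characters listed earlier.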

hex-quad:
hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit
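
As a small illustration of the two spellings, the sketch below names the same character once with one hex-quad and once with two. It assumes a C++11 or later compiler, because it relies on `char32_t` literals (which are not described in this section) to observe the code point; the variable names are hypothetical.

```cpp
// Short form: one hex-quad (four hexadecimal digits).
constexpr char32_t e_short = U'\u00E9';

// Long form: two hex-quads (eight hexadecimal digits).
constexpr char32_t e_long = U'\U000000E9';

// Both forms designate the character whose short name is 000000E9.
static_assert(e_short == e_long, "both spellings name the same character");
static_assert(e_short == 0xE9, "a char32_t literal holds the code point");
```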

The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose representation has all zero bits. For each basic execution character set, the values of the members shall be non-negative and distinct from one another. The execution character set and the execution wide-character set are supersets of the basic execution character set and the basic execution wide-character set, respectively. The values of the members of the execution character sets are implementation-defined, and any additional members are locale-specific.
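
The requirements on the basic execution character set can be spot-checked at compile time for a given implementation. The following is a minimal sketch, assuming C++14 (for loops in `constexpr` functions); the array `basic_exec` and the two helper functions are hypothetical names. Comparing against zero only approximates the "all zero bits" requirement on the null character.

```cpp
#include <cstddef>

// The 96 basic source characters, then alert, backspace and carriage return.
// The string literal's terminator supplies the null character.
constexpr char basic_exec[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'"
    " \t\v\f\n"
    "\a\b\r";

// True if every member has a non-negative value as a plain char.
constexpr bool all_non_negative() {
    for (char c : basic_exec)
        if (c < 0) return false;
    return true;
}

// True if no two members share the same value.
constexpr bool all_distinct() {
    for (std::size_t i = 0; i < sizeof basic_exec; ++i)
        for (std::size_t j = i + 1; j < sizeof basic_exec; ++j)
            if (basic_exec[i] == basic_exec[j]) return false;
    return true;
}

static_assert(sizeof basic_exec == 100, "96 + 3 control characters + null");
static_assert(basic_exec[sizeof basic_exec - 1] == 0, "null character is zero");
static_assert(all_non_negative(), "members shall be non-negative");
static_assert(all_distinct(), "members shall be distinct");
```

The same checks could be repeated with `wchar_t` and an `L"..."` literal for the basic execution wide-character set.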

15) The glyphs for the members of the basic source character set are intended to identify characters from the subset of ISO/IEC 10646 which corresponds to the ASCII character set. However, because the mapping from source file characters to the source character set (described in translation phase 1) is specified as implementation-defined, an implementation is required to document how the basic source characters are represented in source files.
