lynnboy

2.2 - [lex.charset] - 【词法.字符集】

Posted 2004/10/30 4:14:00

Please do not repost this article; do not republish or distribute it in any form; delete it within 24 hours of downloading; commercial use is prohibited.

2 Lexical conventions [lex]

2.2 Character sets [lex.charset]

 

2 词法约定 【词法】

2.2 字符集 【词法.字符集】

 

The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters (see footnote 15):

    a b c d e f g h i j k l m n o p q r s t u v w x y z
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    0 1 2 3 4 5 6 7 8 9
    _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

 

基本源字符集由 96 个字符组成:空格字符,表示水平制表符,垂直制表符,换页符,换行符的控制字符,加上下列 91 个图形字符(见脚注 15):

    a b c d e f g h i j k l m n o p q r s t u v w x y z
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    0 1 2 3 4 5 6 7 8 9
    _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
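
As a rough illustration of the character list above, the following C++17 sketch spells the 96 members as ordinary literals and checks the counts at compile time. The names `kGraphical`, `kWhitespace`, and `is_basic_source_char` are hypothetical helpers introduced here, not part of the standard, and the sketch assumes the compiler's narrow literal encoding stores each member in a single `char`.

```cpp
#include <string_view>

// The 91 graphical characters, spelled as ordinary literals.
// The backslash and the double quote are escaped inside the literal.
// Assumption: each member fits in one char in the literal encoding.
constexpr std::string_view kGraphical =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";

// Space plus the four control characters: horizontal tab, vertical tab,
// form feed, and new-line.
constexpr std::string_view kWhitespace = " \t\v\f\n";

// 91 graphical characters, 96 characters in total.
static_assert(kGraphical.size() == 91);
static_assert(kGraphical.size() + kWhitespace.size() == 96);

// Hypothetical helper: true if c is a member of the basic source character set.
constexpr bool is_basic_source_char(char c) {
    return kGraphical.find(c) != std::string_view::npos ||
           kWhitespace.find(c) != std::string_view::npos;
}
```

With this helper, `is_basic_source_char('a')` and `is_basic_source_char('\\')` hold, while `'$'`, `'@'`, and carriage return are rejected because they are not in the list.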

 

The universal-character-name construct provides a way to name other characters.

    hex-quad:
        hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit

    universal-character-name:
        \u hex-quad
        \U hex-quad hex-quad

The character designated by the universal-character-name \UNNNNNNNN is that character whose character short name in ISO/IEC 10646 is NNNNNNNN; the character designated by the universal-character-name \uNNNN is that character whose character short name in ISO/IEC 10646 is 0000NNNN. If the hexadecimal value for a universal character name is less than 0x20 or in the range 0x7F-0x9F (inclusive), or if the universal character name designates a character in the basic source character set, then the program is ill-formed.

 

统一字符名称构造提供了一种为其他字符命名的方式。

    hex-四位组:
        十六进制数字 十六进制数字 十六进制数字 十六进制数字

    统一字符名称:
        \u hex-四位组
        \U hex-四位组 hex-四位组

由统一字符名称 \UNNNNNNNN 指定的字符是在 ISO/IEC 10646 中具有短名称 NNNNNNNN 的字符;由统一字符名称 \uNNNN 指定的字符是在 ISO/IEC 10646 中具有短名称 0000NNNN 的字符。如果某个统一字符名称的十六进制数值小于 0x20 或在 0x7F-0x9F 之间(包含两端),或如果某个统一字符名称指定的字符在基本源字符集中,程序就是病态形式的。
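
The grammar and the restriction above can be sketched in C++17 as follows. This is only an illustration, not how any particular compiler implements it: `hex_digit_value`, `parse_ucn`, and `ucn_value_is_allowed` are hypothetical names, and the check assumes that basic source characters are identified by their ASCII / ISO/IEC 10646 code points, as footnote 15 suggests.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper: value of one hexadecimal-digit, or -1 if c is not one.
// Assumption: the letters a-f and A-F are contiguous (true for ASCII).
int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Parses the short form (backslash, u, one hex-quad) or the long form
// (backslash, U, two hex-quads). Returns the designated value, or
// std::nullopt if the spelling does not match the grammar.
std::optional<std::uint32_t> parse_ucn(std::string_view s) {
    if (s.size() < 2 || s[0] != '\\') return std::nullopt;
    std::size_t digits = 0;
    if (s[1] == 'u') digits = 4;
    else if (s[1] == 'U') digits = 8;
    else return std::nullopt;
    if (s.size() != 2 + digits) return std::nullopt;
    std::uint32_t value = 0;
    for (std::size_t i = 2; i < s.size(); ++i) {
        int d = hex_digit_value(s[i]);
        if (d < 0) return std::nullopt;
        value = value * 16 + static_cast<std::uint32_t>(d);
    }
    return value;
}

// Applies the restriction quoted above: values below 0x20, values in
// 0x7F-0x9F, and members of the basic source character set are rejected.
bool ucn_value_is_allowed(std::uint32_t v) {
    if (v < 0x20) return false;
    if (v >= 0x7F && v <= 0x9F) return false;
    if (v < 0x7F) {
        // Within 0x20-0x7E only '$' (0x24), '@' (0x40), and '`' (0x60)
        // lie outside the basic source character set.
        return v == 0x24 || v == 0x40 || v == 0x60;
    }
    return true;
}
```

For example, `parse_ucn("\\u00E9")` yields 0xE9, which `ucn_value_is_allowed` accepts, whereas 0x41 (the letter A) is rejected because it names a basic source character.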

The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose representation has all zero bits. For each basic execution character set, the values of the members shall be non-negative and distinct from one another. The execution character set and the execution wide-character set are supersets of the basic execution character set and the basic execution wide-character set, respectively. The values of the members of the execution character sets are implementation-defined, and any additional members are locale-specific.

 

基本执行字符集和基本执行宽字符集都应该包含所有基本源字符集的成员,加上表示警报,退格,回车的控制字符,再加上表现为全零位的空字符(相应地,空宽字符)。每个基本执行字符集的任何成员的值都应该为非负数,并相互区别开。执行字符集和执行宽字符集分别是基本执行字符集和基本执行宽字符集的超集。各执行字符集的成员数值由实现定义,并且任何额外成员是区域设置特定的。
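
The requirements in the paragraph above can be spot-checked on a given implementation with a small C++17 sketch. The names `kBasicExecution` and `basic_execution_values_ok` are hypothetical; the check covers only the narrow set and assumes each member occupies a single `char`.

```cpp
#include <set>
#include <string_view>

// Basic source characters plus alert, backspace, and carriage return.
// The null character is handled separately below.
constexpr std::string_view kBasicExecution =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'"
    " \t\v\f\n\a\b\r";

// 96 basic source characters + 3 extra control characters.
static_assert(kBasicExecution.size() == 99);

// The null character and the null wide character have value zero.
static_assert('\0' == 0);
static_assert(L'\0' == 0);

// Hypothetical helper: true if every member has a non-negative value and
// no two members (including the null character) share a value.
bool basic_execution_values_ok() {
    std::set<int> seen{0};  // start with the null character's value
    for (char c : kBasicExecution) {
        if (c < 0) return false;                    // must be non-negative
        if (!seen.insert(c).second) return false;   // must be distinct
    }
    return seen.size() == 100;  // 99 listed members plus the null character
}
```

On a conforming implementation `basic_execution_values_ok()` is expected to return true; the actual values it sees remain implementation-defined.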

 

15) The glyphs for the members of the basic source character set are intended to identify characters from the subset of ISO/IEC 10646 which corresponds to the ASCII character set. However, because the mapping from source file characters to the source character set (described in translation phase 1) is specified as implementation-defined, an implementation is required to document how the basic source characters are represented in source files.

 

15) 基本源字符集成员的字型旨在标识 ISO/IEC 10646 中与 ASCII 字符集对应的子集中的字符。然而,由于从源文件字符到源字符集的映射(在翻译阶段 1 中描述)是由实现定义的,实现必须提供基本源字符在源文件中表示方式的文档。

 
