lynnboy

2.1 - [lex.phases] - 【词法.阶段】

Posted 2004/10/30 2:56:00

Category: ISO/IEC 14882:1998

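Please do not repost this article; do not republish or redistribute it in any form; delete it within 24 hours of downloading it; use of this article for commercial purposes is prohibited.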

2 Lexical conventions [lex]

2.1 Phases of translation [lex.phases]

 

2 词法约定 【词法】

2.1 翻译阶段 【词法.阶段】

 

The precedence among the syntax rules of translation is specified by the following phases.13)
  1. Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. Trigraph sequences (2.3) are replaced by corresponding single-character internal representations. Any source file character not in the basic source character set (2.2) is replaced by the universal-character-name that designates that character. (An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (i.e. using the \uXXXX notation), are handled equivalently.)
  2. Each instance of a new-line character and an immediately preceding backslash character is deleted, splicing physical source lines to form logical source lines. If, as a result, a character sequence that matches the syntax of a universal-character-name is produced, the behavior is undefined. If a source file that is not empty does not end in a new-line character, or ends in a new-line character immediately preceded by a backslash character, the behavior is undefined.
  3. The source file is decomposed into preprocessing tokens (2.4) and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or partial comment14). Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined. The process of dividing a source file's characters into preprocessing tokens is context-dependent. [Example: see the handling of < within a #include preprocessing directive. ]
  4. Preprocessing directives are executed and macro invocations are expanded. If a character sequence that matches the syntax of a universal-character-name is produced by token concatenation (16.3.3), the behavior is undefined. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively.
  5. Each source character set member, escape sequence, or universal-character-name in character literals and string literals is converted to a member of the execution character set (2.13.2, 2.13.4).
  6. Adjacent ordinary string literal tokens are concatenated. Adjacent wide string literal tokens are concatenated.
  7. White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token (2.6). The resulting tokens are syntactically and semantically analyzed and translated. [Note: Source files, translation units and translated translation units need not necessarily be stored as files, nor need there be any one-to-one correspondence between these entities and any external representation. The description is conceptual only, and does not specify any particular implementation. ]
  8. Translated translation units and instantiation units are combined as follows: [Note: some or all of these may be supplied from a library. ] Each translated translation unit is examined to produce a list of required instantiations. [Note: this may include instantiations which have been explicitly requested (14.7.2). ] The definitions of the required templates are located. It is implementation-defined whether the source of the translation units containing these definitions is required to be available. [Note: an implementation could encode sufficient information into the translated unit so as to ensure the source is not required here. ] All the required instantiations are performed to produce instantiation units. [Note: these are similar to translated translation units, but contain no references to uninstantiated templates and no template definitions. ] The program is ill-formed if any instantiation fails.
  9. All external object and function references are resolved. Library components are linked to satisfy external references to functions and objects not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment.
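
A few of the lexical phases above can be observed from ordinary code. The following is a minimal sketch, not part of the standard text: the names `PHASE2_SUM`, `phase2_text`, `phase3_value`, `phase5_size`, `phase6_text` and `check_lexical_phases` are hypothetical and chosen only for illustration. It shows line splicing (phase 2), a comment acting as a single space (phase 3), an escape sequence becoming one member of the execution character set (phase 5), and adjacent string literal concatenation (phase 6).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Phase 2: the backslash and the new-line that follows it are deleted,
// so the two physical lines below form one logical line.
#define PHASE2_SUM(a, b) \
    ((a) + (b))

// Phase 2 also splices inside a string literal: the literal continues
// at the very first column of the next physical line.
inline std::string phase2_text() {
    return "ab\
cd";
}

// Phase 3: the comment is replaced by one space character, which is
// what separates the tokens `int` and `phase3_value` here.
int/* replaced by one space */phase3_value() { return 7; }

// Phase 5: the escape sequence \t is converted to a single member of
// the execution character set, so the array holds 'a', tab, 'b', '\0'.
inline std::size_t phase5_size() { return sizeof("a\tb"); }

// Phase 6: adjacent ordinary string literal tokens are concatenated.
inline std::string phase6_text() { return "trans" "lation"; }

// Hypothetical helper that checks all of the observations above.
inline void check_lexical_phases() {
    assert(PHASE2_SUM(2, 3) == 5);
    assert(phase2_text() == "abcd");
    assert(phase3_value() == 7);
    assert(phase5_size() == 4);
    assert(phase6_text() == "translation");
}
```

Calling `check_lexical_phases()` from a test or from `main` should pass on a conforming implementation; the sketch deliberately avoids trigraphs and universal-character-names, whose handling varies more between compilers and language revisions.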

 

翻译的语法规则中的先后次序由下面描述的阶段指定。13)
  1. 当需要时,源文件字符被以实现定义的方式物理映射到基本源字符集(为行结束符引入换行字符)。用相应内部表示的单个字符取代三字符序列(2.3)。不在基本源字符集(2.2)中的任何源文件字符由代表该字符的统一字符名称取代。(实现可以使用任何内部编码,只要能够保证源文件中实际遇到的扩展字符与源文件中代表同一个扩展字符的统一字符名称(如使用 \uXXXX 标记)被等价处理即可。)
  2. 所有紧跟的反斜杠字符和换行字符都被删除,将物理的源文本行连接成逻辑的源文本行。如果其结果使得产生了一个符合统一字符名称语法的字符序列,则其行为是未定义的。如果源文件非空但不以换行字符结尾,或结尾的换行字符紧跟一个反斜杠字符,则其行为是未定义的。
  3. 源文件被分解为预处理标记和空白字符(包括注释)序列。源文件不应该在不完全预处理标记或不完全注释处结束14)。每个注释都被一个空格字符取代。换行字符保持不变。由实现定义是否将不含有换行字符的空白字符序列替换为单个空格字符。将源文件字符分割成预处理标记的过程与其上下文相关。【例:参见在 #include 预处理指令中 < 的处理。】
  4. 执行预处理指令并扩展宏调用。如果标记连接(16.3.3)产生了一个符合统一字符名称语法的字符序列,则其行为是未定义的。#include 预处理指令将导致命名的头或源文件被递归地进行从阶段 1 到阶段 4 的处理。
  5. 在字符文字量和字符串文字量中的每个源字符集成员,转义序列或统一字符名称被转换为执行字符集的成员(2.13.2,2.13.4)。
  6. 相邻的普通字符串文字量标记被连接起来。相邻的宽字符串文字量标记被连接起来。
  7. 分隔标记的空白字符不再有意义。每个预处理标记被转换为一个标记(2.6)。由此产生的标记将经过语法和语义分析并被翻译。【注:源文件,翻译单元和已翻译单元不必作为文件存储,也不必在这些实体和外部表示之间一一对应。这仅是个概念上的描述,不规定任何特定实现。】
  8. 已翻译单元和实例化单元按如下方式组合:【注:其中的某些或全部都可能由库提供。】检查每个已翻译单元以产生一系列被要求的实例化。【注:这可能包括显式要求的实例化(14.7.2)。】定位所需要的模板定义。包含这些定义的翻译单元的源是否必须可用由实现定义。【注:实现可以将充分的信息编码进已翻译单元以保证此时不需要源。】执行所有要求的实例化以产生实例化单元。【注:它们与已翻译单元相似,但不包含任何对未实例化模板的引用和任何模板定义。】如果任何实例化失败,则程序就是病态形式的。
  9. 解决所有外部对象和函数的引用。将库组件连接进程序,以满足当前翻译中未定义的函数和对象的外部引用。所有这些翻译器输出被汇集成一个程序映像,此映像包含在其执行环境中执行所需要的信息。
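
The "required instantiations" of phase 8 can come from ordinary use of a template or from an explicit request (14.7.2). The sketch below is illustrative only: the template `twice` and the function `use_twice` are hypothetical names, and how a given implementation actually schedules instantiation and linking is not something this example can show.

```cpp
// Hypothetical function template, used only for illustration.
template <typename T>
T twice(T value) { return value + value; }

// Explicit instantiation (14.7.2): twice<double> is requested even if
// nothing else in this translation unit happens to call it.
template double twice<double>(double);

// Implicit requirement: this call makes twice<int> a required
// instantiation of the translation unit.
inline int use_twice() { return twice(21); }
```

Both `twice<double>` and `twice<int>` end up on the list of required instantiations for this translation unit; any reference to them from another translation unit would then be resolved in phase 9.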

 

13) Implementations must behave as if these separate phases occur, although in practice different phases might be folded together.

 

13) 即使不同的阶段实际上可以重叠起来,实现仍然必须表现为好像这些阶段分别发生。

 

14) A partial preprocessing token would arise from a source file ending in the first portion of a multi-character token that requires a terminating sequence of characters, such as a header-name that is missing the closing " or >. A partial comment would arise from a source file ending with an unclosed /* comment.

 

14) 不完全预处理标记,可能在一个以某个需要终结字符序列的多字符标记的前一小段(例如缺少结束的 " 或 > 的头名称)作为结尾的源文件中出现。不完全注释可能在一个以没有关闭的 /* 注释为结尾的源文件中出现。

 
