CSDN Blog

slobber

Rich Text Format (RTF) Specification, Chinese Edition, Version 1.6

Posted 2004/10/17 21:17:00, 2483 views

Category: RTF 1.6 Chinese Specification

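  • Introduction
  • RTF Syntax
  • Conventions of an RTF Reader
  • Formal Syntax
  • Contents of an RTF File
    • Header
      • RTF Version
      • Character Set
      • Unicode RTF
      • Font Table
      • File Table
      • Color Table
      • Style Sheet
      • List Tables
      • Track Changes (Revision Marks)
    • Document Area
      • Information Group
      • Document Formatting Properties
      • Section Text
      • Paragraph Text
      • Character Text
      • Document Variables
      • Bookmarks
      • Pictures
      • Objects
      • Drawing Objects
      • Word 97-2000 RTF for Drawing Objects (Shapes)
      • Footnotes
      • Comments (Annotations)
      • Fields
      • Form Fields
      • Index Entries
      • Table of Contents Entries
      • Bidirectional Language Support
  • Asian Language Support
  • Appendix A: Sample RTF Reader Application
    • How to Write an RTF Reader
    • A Sample RTF Reader Implementation
    • Notes on Implementing Other RTF Features
    • Other Problem Areas in RTF
  • Appendix B: Index of RTF Control Words
  • Appendix C: Control Words Introduced by Other Microsoft Products
    • Pocket Word
    • Exchange (Used in RTF<->HTML Conversions)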

Introduction

The Rich Text Format (RTF) Specification is a method of encoding formatted text and graphics for easy transfer between applications. Currently, users depend on special translation software to move word-processing documents between different MS-DOS, Microsoft Windows, OS/2, Macintosh, and Power Macintosh applications.

The RTF Specification provides a format for text and graphics interchange that can be used with different output devices, operating environments, and operating systems. RTF uses the American National Standards Institute (ANSI), PC-8, Macintosh, or IBM PC character set to control the representation and formatting of a document, both on the screen and in print. With the RTF Specification, documents created under different operating systems and with different software applications can be transferred between those operating systems and applications. RTF files created in Word 6.0 (and later) for the Macintosh and Power Macintosh have a file type of "RTF."

Software that takes a formatted file and turns it into an RTF file is called a writer. An RTF writer separates the application's control information from the actual text and writes a new file containing the text and the RTF groups associated with that text. Software that translates an RTF file into a formatted file is called a reader.

A sample RTF reader application is available (see the Appendix A: Sample RTF Reader Application section of this document). It is designed for use with the specification to assist those interested in developing their own RTF readers. This application and its use are described in Appendix A. The sample RTF reader is not a for-sale product, and Microsoft does not provide technical or any other type of support for the sample RTF reader code or the RTF specification. For more information on how to download the sample RTF reader from the Microsoft Download Center, please visit the following Web address: www.microsoft.com/downloads/search.asp and then search on "RTF Reader."

RTF Version 1.6 includes all new control words introduced by Microsoft Word for Windows 95 version 7.0, Word 97 for Windows, Word 98 for the Macintosh, and Word 2000 for Windows, as well as other Microsoft products.

RTF Syntax

An RTF file consists of unformatted text, control words, control symbols, and groups. For ease of transport, a standard RTF file can consist of only 7-bit ASCII characters. (Converters that communicate with Microsoft Word for Windows or Microsoft Word for the Macintosh should expect 8-bit characters.) There is no set maximum line length for an RTF file.

A control word is a specially formatted command that RTF uses to mark printer control codes and information that applications use to manage documents. A control word cannot be longer than 32 characters. A control word takes the following form:

\LetterSequence<Delimiter>

Note: A backslash begins each control word. The LetterSequence is made up of lowercase alphabetic characters (a-z). RTF is case sensitive.

The following Word 97-2000 keywords do not currently follow the requirement that keywords may not contain any uppercase alphabetic characters. All writers should still follow this rule, and Word will also emit completely lowercase versions of all these keywords in the next version. In the meantime, those implementing readers are advised to treat them as exceptions:

  • \clFitText
  • \clftsWidthN
  • \clNoWrap
  • \clwWidthN
  • \tdfrmtxtBottomN
  • \tdfrmtxtLeftN
  • \tdfrmtxtRightN
  • \tdfrmtxtTopN
  • \trftsWidthAN
  • \trftsWidthBN
  • \trftsWidthN
  • \trwWidthAN
  • \trwWidthBN
  • \trwWidthN
  • \sectspecifygenN

The delimiter marks the end of an RTF control word, and can be one of the following:

  • A space. In this case, the space is part of the control word.
  • A digit or a hyphen (-), which indicates that a numeric parameter follows. The subsequent digit sequence is then delimited by a space or any character other than a letter or a digit. The parameter can be a positive or a negative number. The range of the values for the number is generally -32767 through 32767. However, Word tends to restrict the range to -31680 through 31680. Word allows values in the range -2,147,483,648 to 2,147,483,648 for a small number of keywords (specifically \bin, \revdttm, and some picture properties). An RTF parser must handle an arbitrary string of digits as a legal value for a keyword. If a numeric parameter immediately follows the control word, this parameter becomes part of the control word. The control word is then delimited by a space or a nonalphabetic or nonnumeric character in the same manner as any other control word.
  • Any character other than a letter or a digit. In this case, the delimiting character terminates the control word but is not actually part of the control word.

If a space delimits the control word, the space does not appear in the document. Any characters following the delimiter, including spaces, will appear in the document. For this reason, you should use spaces only where necessary; do not use spaces merely to break up RTF code.
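The delimiter rules above can be sketched in a few lines of Python. This is only an illustration, not the sample reader from Appendix A: the function name `read_control_word` and its return shape are assumptions made for this example. It accepts uppercase letters as well, so that the Word 97-2000 exception keywords still parse.

```python
import re

# A backslash, a run of letters, an optional signed numeric parameter,
# and at most one space that belongs to the control word as its delimiter.
CONTROL_WORD = re.compile(r"\\([A-Za-z]+)(-?\d+)?( ?)")


def read_control_word(rtf, pos=0):
    """Return (name, parameter, next_pos) for the control word at rtf[pos].

    parameter is None when the control word carries no numeric parameter.
    next_pos is the index of the first character after the control word;
    a single delimiting space is consumed, any other delimiter is not.
    """
    match = CONTROL_WORD.match(rtf, pos)
    if match is None:
        raise ValueError("no control word at position %d" % pos)
    name, digits, _space = match.groups()
    parameter = int(digits) if digits is not None else None
    return name, parameter, match.end()


# The space after \fs20 is swallowed; a second space would be document text.
name, parameter, next_pos = read_control_word(r"\fs20 This is plain text.")
print(name, parameter, next_pos)
```

Python integers are unbounded, so an arbitrary string of digits is accepted as a parameter; a reader written in a language with fixed-width integers would need to decide how to clamp or reject out-of-range values.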

A control symbol consists of a backslash followed by a single, nonalphabetic character. For example, \~ represents a nonbreaking space. Control symbols take no delimiters.

A group consists of text and control words or control symbols enclosed in braces ({ }). The opening brace ({) indicates the start of the group and the closing brace (}) indicates the end of the group. Each group specifies the text affected by the group and the different attributes of that text. The RTF file can also include groups for fonts, styles, screen color, pictures, footnotes, comments (annotations), headers and footers, summary information, fields, and bookmarks, as well as document-, section-, paragraph-, and character-formatting properties. If the font, file, style, screen-color, revision mark, and summary-information groups and document-formatting properties are included, they must precede the first plain-text character in the document. These groups form the RTF file header. If the group for fonts is included, it should precede the group for styles. If any group is not used, it can be omitted. The groups are discussed in the following sections.

The control properties of certain control words (such as bold, italic, keep together, and so on) have only two states. When such a control word has no parameter or has a nonzero parameter, it is assumed that the control word turns on the property. When such a control word has a parameter of 0, it is assumed that the control word turns off the property. For example, \b turns on bold, whereas \b0 turns off bold.
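As a small illustration of the two-state rule, the check below maps a control word's parameter to an on/off value. The function name `toggle_state` is hypothetical, and it assumes the parameter has already been parsed into an integer, or `None` when absent.

```python
def toggle_state(parameter):
    """Return True if a two-state control word turns its property on.

    parameter is the parsed numeric parameter, or None if the control
    word appeared without one (for example a bare \\b).
    """
    # No parameter or any nonzero parameter turns the property on;
    # only a parameter of exactly 0 turns it off.
    return parameter is None or parameter != 0


print(toggle_state(None))  # bare \b
print(toggle_state(0))     # \b0
```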

Certain control words, referred to as destinations, mark the beginning of a collection of related text that could appear at another position, or destination, within the document. Destinations may also be text that is used but should not appear within the document. An example of a destination is the \footnote group, where the footnote text follows the control word. Page breaks cannot occur in destination text. Destination control words and their following text must be enclosed in braces. No other control words or text may appear within the destination group. Destinations added after the RTF Specification published in the March 1987 Microsoft Systems Journal may be preceded by the control symbol \*. This control symbol identifies destinations whose related text should be ignored if the RTF reader does not recognize the destination. (RTF writers should follow the convention of using this control symbol when adding new destinations or groups.) Destinations whose related text should be inserted into the document, even if the RTF reader does not recognize the destination, should not use \*. All destinations that were not included in the March 1987 revision of the RTF Specification are shown with \* as part of the control word.
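One way a reader might ignore an unrecognized \* destination is to skip to the brace that closes its group. The sketch below assumes the whole file is held in a string; the names `is_ignorable_destination` and `skip_group` are made up for this example. It does not handle raw binary data introduced by \bin, which a real reader would have to step over by byte count.

```python
def is_ignorable_destination(rtf, start):
    """Return True if the group opening at rtf[start] begins with \\*."""
    return rtf.startswith("{\\*", start)


def skip_group(rtf, start):
    """Return the index just past the group that opens at rtf[start]."""
    if rtf[start] != "{":
        raise ValueError("expected '{' at position %d" % start)
    depth = 0
    pos = start
    while pos < len(rtf):
        char = rtf[pos]
        if char == "\\":
            # Skip the backslash and the next character, so that the
            # escaped braces \{ and \} never change the nesting depth.
            pos += 2
            continue
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                return pos + 1
        pos += 1
    raise ValueError("unbalanced braces")


sample = r"{\rtf1 a{\*\unknowndest hidden \} text{nested}}b}"
start = sample.index("{\\*")
if is_ignorable_destination(sample, start):
    print(sample[skip_group(sample, start):])
```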

Formatting specified within a group affects only the text within that group. Generally, text within a group inherits the formatting of the text in the preceding group. However, Microsoft implementations of RTF assume that the footnote, annotation, header, and footer groups (described later in this chapter) do not inherit the formatting of the preceding text. Therefore, to ensure that these groups are always formatted correctly, you should set the formatting within these groups to the default with the \sectd, \pard, and \plain control words, and then add any desired formatting.
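Group scoping of this kind is commonly modeled with a stack of formatting states: save a copy on `{`, restore it on `}`. The sketch below assumes the input has already been split into tokens, tracks only bold, and treats \plain as a reset to the default; the function name `bold_runs` and the token format are illustrative choices, not part of the specification.

```python
def bold_runs(tokens):
    """Yield (text, is_bold) pairs from a pre-tokenized RTF fragment."""
    state = {"bold": False}
    stack = []
    for token in tokens:
        if token == "{":
            stack.append(dict(state))  # the new group inherits current formatting
        elif token == "}":
            state = stack.pop()        # formatting set inside the group is dropped
        elif token == "\\b":
            state["bold"] = True
        elif token in ("\\b0", "\\plain"):
            state["bold"] = False      # \plain resets character formatting
        else:
            yield token, state["bold"]


tokens = ["{", "\\b", "x", "{", "y", "\\plain", "z", "}", "w", "}"]
for text, is_bold in bold_runs(tokens):
    print(text, is_bold)
```

Here `y` inherits bold from the enclosing group, `z` is reset by \plain, and `w` is bold again once the inner group closes.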

The control words, control symbols, and braces constitute control information. All other characters in the file are plain text. Here is an example of plain text that does not exist within a group:

{\rtf\ansi\deff0{\fonttbl{\f0\froman Tms Rmn;}{\f1\fdecor Symbol;}{\f2\fswiss Helv;}}
{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;}
{\stylesheet{\fs20 \snext0Normal;}}
{\info{\author John Doe}{\creatim\yr1990\mo7\dy30\hr10\min48}{\version1}{\edmins0}{\nofpages1}{\nofwords0}{\nofchars0}{\vern8351}}
\widoctrl\ftnbj \sectd\linex0\endnhere \pard\plain \fs20 This is plain text.\par}

The phrase "This is plain text" is not part of a group and is treated as document text.

As previously mentioned, the backslash (\) and braces ({ }) have special meaning in RTF. To use these characters as text, precede them with a backslash, as in \\, \{, and \}.
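A writer could apply this escaping rule with a helper like the one below. The name `escape_rtf_text` is hypothetical, and the sketch covers only the three special characters; it does not encode characters outside 7-bit ASCII.

```python
def escape_rtf_text(text):
    """Prefix backslashes and braces with a backslash so RTF reads them as text."""
    escaped = []
    for char in text:
        if char in "\\{}":
            escaped.append("\\" + char)
        else:
            escaped.append(char)
    return "".join(escaped)


print(escape_rtf_text("path C:\\temp and a {brace}"))
```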



Contents of an RTF File

An RTF file has the following syntax:

<File> '{' <header> <document> '}'

This syntax is the standard RTF syntax; any RTF reader must be able to correctly interpret RTF written to this syntax. It is worth mentioning again that RTF readers do not have to use all control words, but they must be able to harmlessly ignore unknown (or unused) control words, and they must correctly skip over destinations marked with the \* control symbol. There may, however, be RTF writers that generate RTF that does not conform to this syntax, and as such, RTF readers should be robust enough to handle some minor variations. Nonetheless, if an RTF writer generates RTF conforming to this specification, then any correct RTF reader should be able to interpret it.
