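## CSDN Blog

### A Collection of Regular Expression Syntax Rules

turnmissile's Blog http://blog.csdn.net/turnmissile/

Microsoft has already documented the regular expression rules in MSDN, and interested readers can study them there (ms-help://MS.MSDNQTR.2003OCT.1033/cpgenref/html/cpconRegularExpressionsLanguageElements.htm). Below are some of the syntax element tables I found; study them at your own pace!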

## Atomic Zero-Width Assertions

| Assertion | Description |
| --- | --- |
| `^` | Specifies that the match must occur at the beginning of the string or the beginning of the line. For more information, see the Multiline option in Regular Expression Options. |
| `$` | Specifies that the match must occur at the end of the string, before `\n` at the end of the string, or at the end of the line. For more information, see the Multiline option in Regular Expression Options. |
| `\A` | Specifies that the match must occur at the beginning of the string (ignores the Multiline option). |
| `\Z` | Specifies that the match must occur at the end of the string or before `\n` at the end of the string (ignores the Multiline option). |
| `\z` | Specifies that the match must occur at the end of the string (ignores the Multiline option). |
| `\G` | Specifies that the match must occur at the point where the previous match ended. When used with Match.NextMatch(), this ensures that matches are all contiguous. |
| `\b` | Specifies that the match must occur on a boundary between `\w` (alphanumeric) and `\W` (nonalphanumeric) characters. The match must occur on word boundaries, that is, at the first or last characters in words separated by any nonalphanumeric characters. |
| `\B` | Specifies that the match must not occur on a `\b` boundary. |
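To see a few of these assertions in action, here is a minimal sketch. It uses Python's `re` module as a stand-in, which is an assumption on my part: the table above describes the .NET engine, and the two differ in places. In particular, Python's `\Z` behaves like the `\z` in the table (end of string only), and `re` has no `\G`, so neither is shown.

```python
import re

text = "first line\nsecond line\n"

# Without MULTILINE, ^ matches only at the very start of the string.
starts_default = re.findall(r"^\w+", text)
# With MULTILINE, ^ also matches right after each newline.
starts_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)
# \A anchors at the start of the string and ignores MULTILINE.
starts_absolute = re.findall(r"\A\w+", text, flags=re.MULTILINE)

# Without MULTILINE, $ matches at the end of the string
# or just before a newline that ends the string.
ends_default = re.findall(r"\w+$", text)

sample = "cat concat cats"
# \b requires a word boundary on both sides: only the standalone "cat".
whole_word_starts = [m.start() for m in re.finditer(r"\bcat\b", sample)]
# \B requires that the position is NOT a word boundary: the "cat" inside "concat".
inside_word_starts = [m.start() for m in re.finditer(r"\Bcat", sample)]

print(starts_default, starts_multiline, starts_absolute, ends_default)
print(whole_word_starts, inside_word_starts)
```

The start offsets are printed for `\b` and `\B` because both patterns match the same three letters; only the position tells them apart.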

## Quantifiers

| Quantifier | Description |
| --- | --- |
| `*` | Specifies zero or more matches; for example, `\w*` or `(abc)*`. Equivalent to `{0,}`. |
| `+` | Specifies one or more matches; for example, `\w+` or `(abc)+`. Equivalent to `{1,}`. |
| `?` | Specifies zero or one matches; for example, `\w?` or `(abc)?`. Equivalent to `{0,1}`. |
| `{n}` | Specifies exactly n matches; for example, `(pizza){2}`. |
| `{n,}` | Specifies at least n matches; for example, `(abc){2,}`. |
| `{n,m}` | Specifies at least n, but no more than m, matches. |
| `*?` | Specifies the first match that consumes as few repeats as possible (equivalent to lazy `*`). |
| `+?` | Specifies as few repeats as possible, but at least one (equivalent to lazy `+`). |
| `??` | Specifies zero repeats if possible, or one (lazy `?`). |
| `{n}?` | Equivalent to `{n}` (lazy `{n}`). |
| `{n,}?` | Specifies as few repeats as possible, but at least n (lazy `{n,}`). |
| `{n,m}?` | Specifies as few repeats as possible between n and m (lazy `{n,m}`). |
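The difference between greedy and lazy quantifiers is easiest to see side by side. The sketch below uses Python's `re` module (an assumption; the table itself comes from the .NET documentation, but these particular quantifiers behave the same way in both engines as far as I know). The sample strings are made up for illustration.

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy *: consumes as much as possible, so it spans both elements.
greedy = re.findall(r"<b>.*</b>", html)
# Lazy *?: consumes as little as possible, so each element matches separately.
lazy = re.findall(r"<b>.*?</b>", html)

# {n}: exactly n repetitions of the group.
exactly_two = re.fullmatch(r"(pizza){2}", "pizzapizza") is not None

# {n,m} is greedy and takes up to m; {n,m}? is lazy and stops at n.
greedy_range = re.match(r"a{2,3}", "aaaa").group()
lazy_range = re.match(r"a{2,3}?", "aaaa").group()

# ?? prefers zero repeats, so it matches the empty string here.
lazy_optional = re.match(r"a??", "a").group()

print(greedy, lazy, exactly_two, greedy_range, lazy_range, repr(lazy_optional))
```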

## Grouping Constructs

Grouping constructs allow you to capture groups of subexpressions and to increase the efficiency of regular expressions with noncapturing lookahead and lookbehind modifiers. The following table describes the Regular Expression Grouping Constructs.

| Grouping construct | Description |
| --- | --- |
| `( )` | Captures the matched substring (or noncapturing group; for more information, see the ExplicitCapture option in Regular Expression Options). Captures using `()` are numbered automatically based on the order of the opening parenthesis, starting from one. The first capture, capture element number zero, is the text matched by the whole regular expression pattern. |
| `(?<name> )` | Captures the matched substring into a group name or number name. The string used for name must not contain any punctuation and it cannot begin with a number. You can use single quotes instead of angle brackets; for example, `(?'name')`. |
| `(?<name1-name2> )` | Balancing group definition. Deletes the definition of the previously defined group name2 and stores in group name1 the interval between the previously defined name2 group and the current group. If no group name2 is defined, the match backtracks. Because deleting the last definition of name2 reveals the previous definition of name2, this construct allows the stack of captures for group name2 to be used as a counter for keeping track of nested constructs such as parentheses. In this construct, name1 is optional. You can use single quotes instead of angle brackets; for example, `(?'name1-name2')`. |
| `(?: )` | Noncapturing group. |
| `(?imnsx-imnsx: )` | Applies or disables the specified options within the subexpression. For example, `(?i-s: )` turns on case insensitivity and disables single-line mode. For more information, see Regular Expression Options. |
| `(?= )` | Zero-width positive lookahead assertion. Continues match only if the subexpression matches at this position on the right. For example, `\w+(?=\d)` matches a word followed by a digit, without matching the digit. This construct does not backtrack. |
| `(?! )` | Zero-width negative lookahead assertion. Continues match only if the subexpression does not match at this position on the right. For example, `\b(?!un)\w+\b` matches words that do not begin with un. |
| `(?<= )` | Zero-width positive lookbehind assertion. Continues match only if the subexpression matches at this position on the left. For example, `(?<=19)99` matches instances of 99 that follow 19. This construct does not backtrack. |
| `(?<! )` | Zero-width negative lookbehind assertion. Continues match only if the subexpression does not match at this position on the left. |
| `(?> )` | Nonbacktracking subexpression (also known as a "greedy" subexpression). The subexpression is fully matched once, and then does not participate piecemeal in backtracking. (That is, the subexpression matches only strings that would be matched by the subexpression alone.) |
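The three lookaround examples from the table can be tried out directly. This is a rough sketch in Python's `re` module, which is my choice of stand-in rather than the .NET engine the table documents. Balancing groups are left out because `re` does not support them, and the nonbacktracking subexpression is left out because it only exists in newer Python versions. The input strings are made up.

```python
import re

# Positive lookahead: letters/digits that are followed by a digit.
# The digit itself is checked but not consumed.
before_digit = re.findall(r"\w+(?=\d)", "abc1 def")

# Negative lookahead: whole words that do not begin with "un".
not_un = re.findall(r"\b(?!un)\w+\b", "unhappy happy under over")

# Positive lookbehind: "99" only where it is immediately preceded by "19".
after_19 = [m.start() for m in re.finditer(r"(?<=19)99", "1999 2099 9919")]

# Noncapturing group: groups for repetition without creating a capture,
# so findall returns the whole match instead of the group content.
repeated = re.findall(r"(?:ab)+", "ababab ab")

print(before_digit, not_un, after_19, repeated)
```

Note that `\w` also matches digits, so on an input like `a12` the first pattern would return `a1`; the sample string avoids that case on purpose.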

Named captures are numbered sequentially, based on the left-to-right order of the opening parenthesis (like unnamed captures), but numbering of named captures starts after all unnamed captures have been counted. For instance, the pattern `((?<One>abc)\d+)?(?<Two>xyz)(.*)` produces the following capturing groups by number and name. (The first capture, number 0, always refers to the entire pattern.)

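| Number | Name | Pattern |
| --- | --- | --- |
| 0 | 0 (default name) | `((?<One>abc)\d+)?(?<Two>xyz)(.*)` |
| 1 | 1 (default name) | `((?<One>abc)\d+)` |
| 2 | 2 (default name) | `(.*)` |
| 3 | One | `(?<One>abc)` |
| 4 | Two | `(?<Two>xyz)` |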

## Backreference Constructs

The following table lists optional parameters that add backreference modifiers to a regular expression.

| Backreference construct | Definition |
| --- | --- |
| `\number` | Backreference. For example, `(\w)\1` finds doubled word characters. |
| `\k<name>` | Named backreference. For example, `(?<char>\w)\k<char>` finds doubled word characters. The expression `(?<43>\w)\43` does the same. You can use single quotes instead of angle brackets; for example, `\k'char'`. |

Note the ambiguity between octal escape codes and `\number` backreferences that use the same notation. See Backreferences for details on how the regular expression engine resolves the ambiguity.
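Here is a small sketch of the "doubled word characters" example, written for Python's `re` module rather than .NET (an assumption on my part). The numbered form is the same in both, but the named syntax differs: Python writes `(?P<char>...)` and `(?P=char)` where .NET uses `(?<char>...)` and `\k<char>`. The group name `char` and the sample word are only illustrative.

```python
import re

word = "bookkeeper"

# Numbered backreference: \1 must repeat whatever group 1 just captured.
# findall returns the content of the single capturing group.
doubled_chars = re.findall(r"(\w)\1", word)

# Named backreference, Python spelling of (?<char>\w)\k<char>.
doubled_pairs = [
    m.group() for m in re.finditer(r"(?P<char>\w)(?P=char)", word)
]

print(doubled_chars, doubled_pairs)
```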

## Miscellaneous Constructs

The following table lists subexpressions that modify a regular expression.

| Construct | Definition |
| --- | --- |
| `(?imnsx-imnsx)` | Turns options such as case insensitivity on or off in the middle of a pattern. For information on specific options, see Regular Expression Options. Option changes are effective until the end of the enclosing group. See also the information on the grouping construct `(?imnsx-imnsx: )`, which is a cleaner form. |
| `(?# )` | Inline comment inserted within a regular expression. The comment terminates at the first closing parenthesis character. |
| `# [to end of line]` | X-mode comment. The comment begins at an unescaped `#` and continues to the end of the line. (Note that the x option or the RegexOptions.IgnorePatternWhitespace enumerated option must be activated for this kind of comment to be recognized.) |
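The constructs above have close counterparts in Python's `re` module, which is what this sketch uses (an assumption; the table documents .NET). Two differences to keep in mind: Python only accepts the global form `(?i)` at the very start of a pattern, and its equivalent of RegexOptions.IgnorePatternWhitespace is `re.VERBOSE`. The date-like pattern is invented for the example.

```python
import re

# Inline option for the whole pattern (must come first in Python).
global_flag = re.search(r"(?i)hello", "HELLO world") is not None

# Scoped option: case insensitivity applies only inside the group.
scoped_hit = re.search(r"(?i:hello) world", "HELLO world") is not None
scoped_miss = re.search(r"(?i:hello) world", "HELLO WORLD") is not None

# (?# ) inline comments are ignored by the engine.
inline_comment = re.fullmatch(r"\d{4}(?#year)-\d{2}(?#month)", "2024-05") is not None

# X-mode comments: with VERBOSE, whitespace in the pattern is ignored
# and an unescaped # starts a comment that runs to the end of the line.
year_month = re.compile(
    r"""
    \d{4}   # year
    -       # separator
    \d{2}   # month
    """,
    re.VERBOSE,
)
verbose_hit = year_month.fullmatch("2024-05") is not None

print(global_flag, scoped_hit, scoped_miss, inline_comment, verbose_hit)
```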
