
Regular expressions

Regular expressions are patterns used in selecting text. For example, the ed command

g/string/

prints all lines containing string. Regular expressions are also used by the `s' command for selecting old text to be replaced with new.

In addition to specifying string literals, regular expressions can represent classes of strings. Strings thus represented are said to be matched by the corresponding regular expression. If a regular expression can match several strings in a line, the left-most longest match is selected.

The following symbols are used in constructing regular expressions:

c

Any character c not listed below, including `{', `}', `(', `)', `<' and `>', matches itself.

\c

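Any backslash-escaped character c, other than `{', `}', `(', `)', `<', `>', `b', `B', `w', `W', `+' and `?', matches itself. The backreferences `\1' through `\9' and the anchors `\`' and `\'', described below, are also special.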

.

Matches any single character.

[char-class]

Matches any single character in char-class. To include a `]' in char-class, it must be the first character. A range of characters may be specified by separating the end characters of the range with a `-', e.g., `a-z' specifies the lower case characters. The following literal expressions can also be used in char-class to specify sets of characters:

[:alnum:] [:cntrl:] [:lower:] [:space:]
[:alpha:] [:digit:] [:print:] [:upper:]
[:blank:] [:graph:] [:punct:] [:xdigit:]

If `-' appears as the first or last character of char-class, then it matches itself. All other characters in char-class match themselves.

Patterns in char-class of the form:

[.col-elm.]
[=col-elm=]

where col-elm is a collating element, are interpreted according to locale(5) (not currently supported). See regex(3) for an explanation of these constructs.

[^char-class]

Matches any single character, other than newline, not in char-class. char-class is defined as above.

^

If `^' is the first character of a regular expression, then it anchors the regular expression to the beginning of a line. Otherwise, it matches itself.

$

If `$' is the last character of a regular expression, it anchors the regular expression to the end of a line. Otherwise, it matches itself.

\(re\)

Defines a (possibly null) subexpression re. Subexpressions may be nested. A subsequent backreference of the form `\n', where n is a number in the range [1,9], expands to the text matched by the nth subexpression. For example, the regular expression `\(a.c\)\1' matches the string `abcabc', but not `abcadc'. Subexpressions are ordered relative to their left delimiter.

*

Matches the single character regular expression or subexpression immediately preceding it zero or more times. If `*' is the first character of a regular expression or subexpression, then it matches itself. The `*' operator sometimes yields unexpected results. For example, the regular expression `b*' matches the beginning of the string `abbb', as opposed to the substring `bbb', since a null match is the only left-most match.

\{n,m\}

\{n,\}

\{n\}

Matches the single character regular expression or subexpression immediately preceding it at least n and at most m times. If m is omitted, then it matches at least n times. If the comma is also omitted, then it matches exactly n times. If any of these forms occurs first in a regular expression or subexpression, then it is interpreted literally (i.e., the regular expression `\{2\}' matches the string `{2}', and so on).

\<

\>

`\<' anchors the single character regular expression or subexpression immediately following it to the beginning of a word, and `\>' anchors the one immediately preceding it to the end of a word. A word is, in ASCII, a maximal string of alphanumeric characters, including the underscore (_).

The following extended operators are preceded by a backslash `\' to distinguish them from traditional ed syntax.

\`

\'

Unconditionally matches the beginning (`\`') or ending (`\'') of a line.

\?

Optionally matches the single character regular expression or subexpression immediately preceding it. For example, the regular expression `a[bd]\?c' matches the strings `abc', `adc' and `ac'. If `\?' occurs at the beginning of a regular expression or subexpression, then it matches a literal `?'.

\+

Matches the single character regular expression or subexpression immediately preceding it one or more times. So the regular expression `a\+' is shorthand for `aa*'. If `\+' occurs at the beginning of a regular expression or subexpression, then it matches a literal `+'.

\b

Matches the beginning or ending (null string) of a word. Thus the regular expression `\bhello\b' is equivalent to `\<hello\>'. However, `\b\b' is a valid regular expression whereas `\<\>' is not.

\B

Matches (a null string) inside a word.

\w

Matches any character in a word.

\W

Matches any character not in a word.
