Regular expressions are patterns used in selecting text. For example,
the ed command `g/string/' prints all lines containing string. Regular
expressions are also used by the `s' command for selecting old text to
be replaced with new.
In addition to specifying string literals, regular expressions can
represent classes of strings. Strings thus represented are said to be
matched by the corresponding regular expression. If it is possible for
a regular expression to match several strings in a line, then the
left-most longest match is the one selected.
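As a minimal sketch of the two uses above, the following shell lines use grep and sed as stand-ins for the ed `g' and `s' commands. This substitution is an assumption made for illustration: on GNU systems these tools accept the same basic regular expression syntax, but ed itself is not invoked here and the sample text is invented.

```shell
# Assumption: GNU grep and GNU sed stand in for ed's g and s commands.

# Like g/string/ : print every line containing "string".
printf 'a string here\nnothing\n' | grep 'string'
# prints: a string here

# Left-most longest: the first possible match wins, and it is extended
# as far as it will go ("abcd", not just "abc" and not the later "abcdef").
echo 'abcd abcdef' | sed 's/abc[a-z]*/X/'
# prints: X abcdef
```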
The following symbols are used in constructing regular expressions:
c
- Any character c not listed below, including `{', `}',
`(', `)', `<' and `>', matches itself.
\c
- Any backslash-escaped character c, other than `{',
`}', `(', `)', `<', `>', `b', `B',
`w', `W', `+' and `?', matches itself.
.
- Matches any single character.
[char-class]
- Matches any single character in char-class. To include a `]'
in char-class, it must be the first character. A range of
characters may be specified by separating the end characters of the
range with a `-', e.g., `a-z' specifies the lower case
characters. The following literal expressions can also be used in
char-class to specify sets of characters:
[:alnum:] [:cntrl:] [:lower:] [:space:]
[:alpha:] [:digit:] [:print:] [:upper:]
[:blank:] [:graph:] [:punct:] [:xdigit:]
If `-' appears as the first or last character of char-class,
then it matches itself. All other characters in char-class match
themselves.
Patterns in char-class of the form:
[.col-elm.]
[=col-elm=]
where col-elm is a collating element, are interpreted according to
locale(5) (not currently supported). See regex(3) for an explanation
of these constructs.
[^char-class]
- Matches any single character, other than newline, not in
char-class. char-class is defined as above.
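The character-class rules above can be tried out with a short sketch. It assumes GNU grep and sed as stand-ins for ed (an assumption for illustration, since they share the bracket-expression syntax), and the sample lines are invented.

```shell
# Assumption: GNU grep and GNU sed stand in for ed.

# A named class inside brackets: lines containing a digit.
printf 'abc\n123\na1\n' | grep '[[:digit:]]'
# prints: 123
#         a1

# A negated class: lines containing at least one non-digit.
printf 'abc\n123\na1\n' | grep '[^[:digit:]]'
# prints: abc
#         a1

# `]' placed first in the class matches itself.
echo 'a]b' | sed 's/[]]/X/'
# prints: aXb

# `-' placed last in the class matches itself.
echo 'a-b' | sed 's/[a-]/X/g'
# prints: XXb
```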
^
- If `^' is the first character of a regular expression, then it
anchors the regular expression to the beginning of a line. Otherwise,
it matches itself.
$
- If `$' is the last character of a regular expression, it anchors
the regular expression to the end of a line. Otherwise, it matches
itself.
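A short sketch of the two anchors, again assuming GNU grep as a stand-in for ed's `g' command (an assumption, not something the text states). The sample lines are invented.

```shell
# Assumption: GNU grep stands in for ed's g command.

# `^' first in the expression anchors to the beginning of the line.
printf 'ab\nba\na^b\n' | grep '^a'
# prints: ab
#         a^b

# `$' last in the expression anchors to the end of the line.
printf 'ab\nba\n' | grep 'a$'
# prints: ba

# `^' anywhere else matches a literal caret.
printf 'ab\na^b\n' | grep 'a^b'
# prints: a^b
```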
\(re\)
- Defines a (possibly null) subexpression re.
Subexpressions may be nested. A
subsequent backreference of the form `\n', where n is a
number in the range [1,9], expands to the text matched by the nth
subexpression. For example, the regular expression `\(a.c\)\1' matches
the string `abcabc', but not `abcadc'.
Subexpressions are ordered relative to their left delimiter.
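The backreference example and the ordering rule can be checked with a small sketch. GNU grep and sed are assumed here as stand-ins for ed, and the second test string is invented for illustration.

```shell
# Assumption: GNU grep and GNU sed stand in for ed.

# `\1' must repeat exactly what the first subexpression matched.
printf 'abcabc\nabcadc\n' | grep '\(a.c\)\1'
# prints: abcabc

# Nested subexpressions are numbered by their left delimiter:
# the outer group is \1, the inner group is \2.
echo 'ab' | sed 's/\(\(a\)b\)/[\1][\2]/'
# prints: [ab][a]
```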
*
- Matches the single character regular expression or subexpression
immediately preceding it zero or more times. If `*' is the first
character of a regular expression or subexpression, then it matches
itself. The `*' operator sometimes yields unexpected results. For
example, the regular expression `b*' matches the beginning of the
string `abbb', as opposed to the substring `bbb', since a
null match is the only left-most match.
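The null-match behaviour of `*' is easiest to see in a substitution. This sketch assumes GNU sed as a stand-in for ed's `s' command; the third input string is invented for illustration.

```shell
# Assumption: GNU sed stands in for ed's s command.

# `b*' matches the empty string at the very start of the line,
# so the replacement lands in front of the `a'.
echo 'abbb' | sed 's/b*/X/'
# prints: Xabbb

# Requiring at least one `b' selects the substring `bbb' instead.
echo 'abbb' | sed 's/bb*/X/'
# prints: aX

# `*' as the first character of the expression matches itself.
echo 'a*b' | sed 's/*/X/'
# prints: aXb
```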
\{n,m\}
\{n,\}
\{n\}
- Matches the single character regular expression or subexpression
immediately preceding it at least n and at most m times. If
m is omitted, then it matches at least n times. If the
comma is also omitted, then it matches exactly n times.
If any of these forms occurs first in a regular expression or subexpression,
then it is interpreted literally (i.e., the regular expression `\{2\}'
matches the string `{2}', and so on).
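The three interval forms can be compared side by side. This sketch assumes GNU grep as a stand-in for ed, and uses its -x option (match whole lines only) so that each count is visible; the sample lines are invented. The literal interpretation of a leading interval is not exercised here.

```shell
# Assumption: GNU grep stands in for ed; -x requires the whole line to match.
lines='ac\nabc\nabbc\nabbbc\nabbbbc\n'

# At least 2 and at most 3 occurrences of `b'.
printf "$lines" | grep -x 'ab\{2,3\}c'
# prints: abbc
#         abbbc

# At least 2 occurrences.
printf "$lines" | grep -x 'ab\{2,\}c'
# prints: abbc
#         abbbc
#         abbbbc

# Exactly 2 occurrences.
printf "$lines" | grep -x 'ab\{2\}c'
# prints: abbc
```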
\<
\>
- Anchors the single character regular expression or subexpression
immediately following it to the beginning (in the case of `\<')
or ending (in the case of `\>') of
a word, i.e., in ASCII, a maximal string of alphanumeric characters,
including the underscore (_).
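A brief sketch of the word anchors, assuming GNU grep as a stand-in for ed (an assumption for illustration). The sample lines are invented; note that the underscore counts as part of a word.

```shell
# Assumption: GNU grep stands in for ed's g command.
lines='cat\nconcat\ncat_1\nthe cat sat\n'

# `cat' as a whole word: `concat' and `cat_1' are rejected.
printf "$lines" | grep '\<cat\>'
# prints: cat
#         the cat sat

# Anchoring only the beginning also accepts `cat_1'.
printf "$lines" | grep '\<cat'
# prints: cat
#         cat_1
#         the cat sat
```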
The following extended operators are preceded by a backslash `\' to
distinguish them from traditional ed syntax.
\`
\'
- Unconditionally matches the beginning `\`' or ending `\'' of a line.
\?
- Optionally matches the single character regular expression or subexpression
immediately preceding it. For example, the regular expression `a[bd]\?c'
matches the strings `abc', `adc' and `ac'.
If `\?' occurs at the beginning
of a regular expression or subexpression, then it matches a literal `?'.
\+
- Matches the single character regular expression or subexpression
immediately preceding it one or more times. So the regular expression
`a\+' is shorthand for `aa*'. If `\+' occurs at the
beginning of a regular expression or subexpression, then it matches a
literal `+'.
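The `\?' and `\+' operators can be sketched as follows, assuming GNU grep and sed as stand-ins for ed (an assumption for illustration). The line `abdc' and the string `caaat' are invented test inputs.

```shell
# Assumption: GNU grep and GNU sed stand in for ed; -x matches whole lines.

# `\?' makes the bracket expression optional.
printf 'abc\nadc\nac\nabdc\n' | grep -x 'a[bd]\?c'
# prints: abc
#         adc
#         ac

# `a\+' behaves like `aa*': one or more `a'.
echo 'caaat' | sed 's/a\+/X/'
# prints: cXt
echo 'caaat' | sed 's/aa*/X/'
# prints: cXt
```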
\b
- Matches the beginning or ending (null string) of a word. Thus the regular
expression `\bhello\b' is equivalent to `\<hello\>'.
However, `\b\b'
is a valid regular expression whereas `\<\>' is not.
\B
- Matches (a null string) inside a word.
\w
- Matches any character in a word.
\W
- Matches any character not in a word.
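The four word-related operators can be compared in a short sketch. It assumes GNU grep and sed as stand-ins for ed (an assumption for illustration), with invented sample text.

```shell
# Assumption: GNU grep and GNU sed stand in for ed.

# `\b' on both sides selects `hello' as a whole word.
printf 'hello\nhello_world\nsay hello\n' | grep '\bhello\b'
# prints: hello
#         say hello

# `\B' matches only inside a word, so the leading `at' is skipped.
echo 'at cat' | sed 's/\Bat/AT/'
# prints: at cAT

# `\w' matches word characters; `\W' matches everything else.
echo 'foo, bar' | sed 's/\w\+/X/g'
# prints: X, X
echo 'foo, bar' | sed 's/\W\+/_/g'
# prints: foo_bar
```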