regular expression
1.
An ordinary character (not one of the special characters discussed below) matches that character.
A backslash (\) followed by any special character matches the special character itself. The special characters are:
"." matches any character except NEWLINE; "RE*" (where the "*" is called the "Kleene star") matches zero or more occurrences of RE. If there is any choice, the longest leftmost matching string is chosen, in most regexp flavours.
"^" at the beginning of an RE matches the start of a line and "$" at the end of an RE matches the end of a line.
[string] matches any one character in that string. If the first character of the string is a "^" it matches any character except the remaining characters in the string (and also usually excluding NEWLINE). "-" may be used to indicate a range of consecutive ASCII characters.
\( RE \) matches whatever RE matches and \n, where n is a digit, matches whatever was matched by the RE between the nth \( and its corresponding \) earlier in the same RE. Many flavours use ( RE ) instead of \( RE \).
The concatenation of REs is a RE that matches the concatenation of the strings matched by each RE. RE1 | RE2 matches whatever RE1 or RE2 matches.
\< matches the beginning of a word and \> matches the end of a word. In many flavours of regexp, \> and \< are replaced by "\b", the special character for "word boundary".
RE\{m\} matches m occurrences of RE. RE\{m,\} matches m or more occurrences of RE. RE\{m,n\} matches between m and n occurrences.
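The constructs above can be tried out in any regexp library. The following is a minimal sketch using Python's "re" module, which is only one flavour among many; the variable names and sample strings are purely illustrative.

```python
import re

# Python's flavour writes groups as ( ) rather than \( \), counts as {m,n}
# rather than \{m,n\}, and word boundaries as \b rather than \< and \>.

# A backslash makes a special character literal: "\." matches only a dot.
literal_dot = re.findall(r"\.", "a.b.c")

# "." matches any character except newline.
any_char = re.search(r"a.c", "xxabcxx").group()
no_newline = re.search(r"a.c", "a\nc")

# "*" repeats; the leftmost match is chosen and extended as far as possible.
star = re.search(r"ab*", "xabbbyabbbbb").group()

# "^" and "$" anchor to line boundaries (per line only with re.MULTILINE).
whole_lines = re.findall(r"^\w+$", "one\ntwo words\nthree", re.MULTILINE)

# [string], ranges with "-", and negation with a leading "^".
in_class = re.findall(r"[a-c]", "abcdef")
not_in_class = re.findall(r"[^a-c]", "abcdef")

# \1 matches again whatever the first group matched.
doubled = re.search(r"(\w+) \1", "it was the the best").group()

# "|" matches either alternative.
either = [bool(re.fullmatch(r"cat|dog", w)) for w in ("cat", "dog", "cow")]

# \b is a word boundary.
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")

# {m}, {m,} and {m,n} count occurrences.
counts = [bool(re.fullmatch(r"a{2,3}", "a" * n)) for n in range(1, 5)]
```

Other flavours may need different spellings for the same patterns, so the strings above should not be assumed to work unchanged in, for example, grep or sed.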
The exact details of how regexp will work in a given application vary greatly from flavour to flavour. A comprehensive survey of regexp flavours is found in Friedl 1997 (see below).
[Jeffrey E.F. Friedl, "Mastering Regular Expressions", O'Reilly, 1997].
2. Any description of a pattern composed from combinations of symbols and the three operators:
Concatenation - pattern A concatenated with B matches any string that matches A followed by any string that matches B.
Or - pattern A-or-B matches any string that matches either A or B.
Closure - zero or more matches for a pattern.
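The three operators are enough to build a working matcher. The following is a minimal sketch in Python, assuming patterns are given as a small tree rather than parsed from text; the class names (Sym, Concat, Or, Star) and the functions "ends" and "matches" are hypothetical names chosen for illustration, and the approach is a simple position-set search, not the automaton construction used by real regexp engines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    char: str          # a single literal symbol


@dataclass(frozen=True)
class Concat:
    left: object       # match left, then right
    right: object


@dataclass(frozen=True)
class Or:
    left: object       # match left or right
    right: object


@dataclass(frozen=True)
class Star:
    inner: object      # zero or more matches of inner


def ends(p, s, i):
    """Return every position where a match of p starting at s[i] can end."""
    if isinstance(p, Sym):
        return {i + 1} if i < len(s) and s[i] == p.char else set()
    if isinstance(p, Concat):
        return {k for j in ends(p.left, s, i) for k in ends(p.right, s, j)}
    if isinstance(p, Or):
        return ends(p.left, s, i) | ends(p.right, s, i)
    if isinstance(p, Star):
        # Zero matches ends at i; keep applying inner until nothing new appears.
        reached, frontier = {i}, {i}
        while frontier:
            step = {k for j in frontier for k in ends(p.inner, s, j)} - reached
            reached |= step
            frontier = step
        return reached
    raise TypeError(f"unknown pattern node: {p!r}")


def matches(p, s):
    """True if p matches the whole of s."""
    return len(s) in ends(p, s, 0)


# (a|b)*c written as a tree
pattern = Concat(Star(Or(Sym("a"), Sym("b"))), Sym("c"))
print(matches(pattern, "abbac"), matches(pattern, "abca"))
```

Tracking sets of end positions keeps the closure case from looping forever, since only positions not already reached are explored again.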
The earliest form of regular expressions (and the term itself) was invented by mathematician Stephen Cole Kleene in the mid-1950s, as a notation to easily manipulate "regular sets", formal descriptions of the behaviour of finite state machines, in regular algebra.
[S.C. Kleene, "Representation of events in nerve nets and finite automata", 1956, Automata Studies. Princeton].
[J.H. Conway, "Regular algebra and finite machines", 1971, Eds Chapman & Hall].
[Sedgewick, "Algorithms in C", page 294].
(2004-02-01)