Regular expression syntax in unix

"Regular expressions" are not a part of CSC 209, but since you are being exposed to software tools which take them as arguments, you might want to know the syntax.

So here's a quick summary of the most basic regular expression syntax, as implemented by many unix tools:

most characters mean themselves
a dot means any one character
something followed by an asterisk (star) means zero or more of that thing
character lists or ranges in square brackets match one character which is any of those characters (examples: [a-z] matches any lower-case letter; [xq] matches either 'x' or 'q'; this can be combined, as in [ac-z] which matches any lower-case letter except 'b')
- The entire set of characters can be preceded with '^' to complement the set; e.g. [^a-zA-Z0-9] matches any NON-alphanumeric character
a backslash suppresses the special meaning of one following character, e.g. \. means an actual dot
parentheses can be used for grouping, but may need to be prefaced by backslashes depending on the program (grep requires them, egrep does not); if so, the backslashes here turn ON the special meaning of the parentheses
"extended regular expressions" also permit the vertical bar, to indicate alternatives. This is the main difference between grep and egrep, where egrep behaves like "grep -E" (see the man pages).

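The rules in the list above can be tried out in a short sketch. It uses Python's re module, which is an assumption of convenience: it is not one of the unix tools discussed here, and its dialect differs from grep's in places (for example, parentheses and the vertical bar are special without any backslash, as in egrep). The names CASES and check are made up for this illustration. Note also that re.fullmatch requires the whole string to match, whereas grep looks for a match anywhere in a line.

```python
import re

# Each entry: (pattern, text, whether the whole text should match).
CASES = [
    ("cat", "cat", True),          # most characters mean themselves
    ("c.t", "cut", True),          # a dot matches any one character
    ("c.t", "ct", False),          # ...but it must match exactly one
    ("ab*c", "ac", True),          # star: zero occurrences of 'b'
    ("ab*c", "abbbc", True),       # star: several occurrences of 'b'
    ("[ac-z]", "q", True),         # list plus range in square brackets
    ("[ac-z]", "b", False),        # 'b' is the one letter left out
    ("[^a-zA-Z0-9]", "!", True),   # complemented set
    ("[^a-zA-Z0-9]", "x", False),
    (r"a\.b", "a.b", True),        # backslash makes the dot literal
    (r"a\.b", "axb", False),
    ("(ab)*", "ababab", True),     # grouping (no backslashes in this dialect)
    ("cat|dog", "dog", True),      # alternation, as in extended regexps
]


def check(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


results = [check(pattern, text) for pattern, text, _ in CASES]
for (pattern, text, _), got in zip(CASES, results):
    print(f"{pattern!r:16} {text!r:10} -> {got}")
```

Each printed line shows a pattern, a test string, and whether the pattern matched the whole string.
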
Regular expression notation is not to be confused with the much simpler (and less powerful) "glob" notation in the shell for matching file names. In the "glob" notation, an asterisk means zero or more of any characters; in the regular expression notation we would write this as ".*", not "*". And "*" by itself is not a meaningful regular expression, since there is nothing for it to repeat: some tools reject it as a syntax error, while grep treats a leading asterisk as a literal asterisk.
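
To make the glob/regular expression contrast concrete, here is a small sketch using Python's fnmatch module (shell-style patterns) alongside its re module. This is an illustration under assumptions: the file names and the helper compiles are invented for the example, and Python's behaviour on a bare "*" is specific to Python and need not agree with any particular unix tool.

```python
import fnmatch
import re

names = ["notes.txt", "notes.txt.bak", "txt"]

# Glob notation: '*' alone stands for zero or more of any characters,
# and '.' is just an ordinary character.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "*.txt")]

# Regular expression notation: "any characters" is '.*', and the
# literal dot before 'txt' needs a backslash.
regex_hits = [n for n in names if re.fullmatch(r".*\.txt", n)]


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(glob_hits, regex_hits)
print(compiles(".*"), compiles("*"))
```

In Python's dialect the bare "*" is rejected because there is nothing before it to repeat, while ".*" compiles without complaint.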