The "software tools" idea is about writing small, simple programs, which do one thing well; and having powerful and general ways to combine them.
Cute quotation: "Unix is user-friendly; it's just choosy about who its friends are."
Summary: "Do one thing well."
Tools which do just one thing can be combined in arbitrary ways.
One thing a bit odd in unix is that program output doesn't contain headers.
Consider the "who" command. Example output:
    ajr      console  Jan  8 06:28
    ajr      ttyp1    Jan  8 09:25
    ajr      ttyp2    Jan  8 09:26
(The "who" output is more exciting on a system with multiple users, especially if no one's on the console and creating multiple terminal windows.)
We can see how many entries there are by using the "word count" program "wc", with the option "−l" which means "only display the line count":
    $ who | wc -l
    3
    $
On many non-unix systems we would expect output with a header, identifying the columns, like this:
    User     Terminal Login time
    ------------------------------
    ajr      console  Jan  8 06:28
    ajr      ttyp1    Jan  8 09:25
    ajr      ttyp2    Jan  8 09:26
But this would cause problems for the software tools model. In the "who | wc −l" case, the line count above would be off by two; in fact we would get funny results from many tools. For example, a "grep" (display only lines matching a search expression) to see who is logged in and has a "−" in their logname would also display the header separation line, or if a user were named "ogi", then "who | grep ogi" would also display the header line.
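To see the off-by-two effect without needing a multi-user system, here is a small sketch that fakes the "who" output with shell variables. The variable names are made up for this illustration, and the header is the hypothetical one from the text, not something any real "who" prints.

```shell
# Simulated "who" output (no real login data is used).
plain='ajr console Jan 8 06:28
ajr ttyp1 Jan 8 09:25
ajr ttyp2 Jan 8 09:26'

# The same data with the hypothetical two-line header from the text.
with_header='User Terminal Login time
------------------------------
'"$plain"

# wc -l counts lines; $(( )) strips the padding some versions of wc add.
plain_count=$(( $(printf '%s\n' "$plain" | wc -l) ))
header_count=$(( $(printf '%s\n' "$with_header" | wc -l) ))
echo "without header: $plain_count"
echo "with header:    $header_count"

# Searching for "-" now also finds the header separation line.
dash_matches=$(printf '%s\n' "$with_header" | grep -e '-')
echo "$dash_matches"
```

The second count is two higher than the first, and the grep reports a "user" who is really just the row of dashes.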
For example, '*' matches any number of any character.
"cat comp*" would output the "composers" file because that file name
matches that pattern.
To illustrate how the globbing patterns work, we will use the 'echo' command, which just outputs its arguments.
    $ echo hello
    hello
    $
But it outputs those arguments after all expansions and substitutions.
    $ echo comp*
    composers
    $
So below we use 'echo' to illustrate how the glob patterns expand.
To play along, download the file
toolsfiles/demo.tar ,
and extract its contents with the command tar xf demo.tar
(or if you are working on a teach.cs computer, you can just do
tar xf /u/csc209h/summer/pub/02/demo.tar )
and then cd to the newly-created demo directory.
We begin with an 'ls' to show the list of file names in the current directory, because the expansion of the glob patterns depends on this:
    $ ls
    a.pdf    a3.pdf  aw.pdf       gcd.c   newstudentlist
    a1.pdf   a4.pdf  composers    grades  people
    a12.pdf  abc     firstbyte.c  hello   studentlist
    $ echo *.c
    firstbyte.c gcd.c
    $ echo *1*f
    a1.pdf a12.pdf
    $ echo a?.pdf
    a1.pdf a3.pdf a4.pdf aw.pdf
    $ echo a*pdf
    a.pdf a1.pdf a12.pdf a3.pdf a4.pdf aw.pdf
    $ echo a[0-9].pdf
    a1.pdf a3.pdf a4.pdf
    $ echo a[w1Q]*pdf
    a1.pdf a12.pdf aw.pdf
    $
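If you don't have demo.tar handy, you can get the same effect in a scratch directory. This is only a sketch: it creates empty files with a few of the names from the listing above, and the variable names are invented for the illustration.

```shell
# Build a scratch directory holding a few of the file names from the listing.
demo=$(mktemp -d)
touch "$demo/a.pdf" "$demo/a1.pdf" "$demo/a12.pdf" "$demo/a3.pdf" \
      "$demo/aw.pdf" "$demo/abc" "$demo/gcd.c" "$demo/firstbyte.c"

# Command substitution runs in a subshell, so the cd does not affect the caller.
c_files=$(cd "$demo" && echo *.c)        # '*' matches any number of characters
one_char=$(cd "$demo" && echo a?.pdf)    # '?' matches exactly one character
digit=$(cd "$demo" && echo a[0-9].pdf)   # '[0-9]' matches one digit
quoted=$(cd "$demo" && echo '*.c')       # quoting suppresses the expansion

echo "$c_files"
echo "$one_char"
echo "$digit"
echo "$quoted"
rm -rf "$demo"
```

Note that it is the shell which expands the patterns; echo only ever sees the resulting list of file names, or the literal pattern in the quoted case.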
A '.' at the beginning of the file name is treated specially: it is only
matched explicitly, not by a '*' or '?'. (There's no special treatment of
dots anywhere else in the name.)
"ls" also does not report files whose names begin with a dot, unless you give
the "−a" option.
    $ echo *c
    abc firstbyte.c gcd.c
    $ ls -a
    .      a1.pdf   abc          gcd.c           people
    ..     a12.pdf  aw.pdf       grades          studentlist
    .abc   a3.pdf   composers    hello
    a.pdf  a4.pdf   firstbyte.c  newstudentlist
    $ echo .*c
    .abc
    $ echo .*
    . .. .abc
    $
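The dot rule can be checked in a scratch directory too. A minimal sketch, with made-up variable names, using just three of the file names from above:

```shell
demo=$(mktemp -d)
touch "$demo/abc" "$demo/.abc" "$demo/gcd.c"

star_c=$(cd "$demo" && echo *c)       # '*' does not match the leading dot of .abc
dot_star_c=$(cd "$demo" && echo .*c)  # an explicit leading dot does match it
ls_plain=$(cd "$demo" && ls)          # plain ls leaves out the dot file
ls_all=$(cd "$demo" && ls -a)         # ls -a lists it, along with . and ..

echo "$star_c"
echo "$dot_star_c"
rm -rf "$demo"
```

I have left "echo .*" out of the sketch on purpose: whether "." and ".." show up in that expansion varies between shells and shell versions.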
All of these commands, and all other commands, have man pages. You'll want to get used to reading man pages, especially to find obscure options.
I frequently read man pages. The on-line help in unix is very comprehensive. There's a lot to know and you don't have to remember it all.
If you haven't already done so above,
please download the file
toolsfiles/demo.tar ,
and extract its contents with the command tar xf demo.tar
(or if you are working on a teach.cs computer, you can just do
tar xf /u/csc209h/summer/pub/02/demo.tar )
and then cd to the newly-created demo directory.
(Or, the individual files are usually linked to below, but it's smoother to
have the whole demo directory in advance.)
Where '$' is the shell prompt, and given a text file called "composers" (which is in that demo.tar file):
    $ grep H composers
    Henry Purcell
    Hildegard von Bingen
    Heinrich Schuetz
    $ grep Q composers
    $
(that's all zero of them)
We can combine this with other commands with a pipeline, as shown in the introductory section of this document:
    $ who
    ajr      console  Jan  8 06:28
    ajr      ttyp1    Jan  8 09:25
    ajr      ttyp2    Jan  8 09:26
    $ who | grep ajr
    ajr      console  Jan  8 06:28
    ajr      ttyp1    Jan  8 09:25
    ajr      ttyp2    Jan  8 09:26
    $ who | grep 09:25
    ajr      ttyp1    Jan  8 09:25
    $
In the pipelines, data goes into a command via the standard input, but data also goes in via command-line arguments, as in the arguments to grep above.
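Here is a small sketch of those two routes into a command. The variable stands in for the "composers" file and holds only the three names visible in the grep output above; the real file presumably has more lines.

```shell
# Stand-in for the "composers" file (only the three lines shown earlier).
composers='Henry Purcell
Hildegard von Bingen
Heinrich Schuetz'

# The pattern "Hen" arrives as a command-line argument;
# the lines to be searched arrive on the standard input, through the pipe.
hen=$(printf '%s\n' "$composers" | grep Hen)
echo "$hen"

# The pipeline can be extended: count the matching lines instead of showing them.
h_count=$(( $(printf '%s\n' "$composers" | grep H | wc -l) ))
echo "$h_count"
```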
Find lines which match a regular expression. Examples (some demonstrated in class):
    who | grep ajr
    grep /~ajr/209/ /var/httpd/log/access_log
    lpq | grep ajr | cut -f1 | xargs lprm
    tr o Q
    tr '\015' '\012' <file.mac >file.unix
    tr A-Z a-z
    tr a-zA-Z n-za-mN-ZA-M
(try these! except for the macintosh one I guess)
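If you want to try the tr commands without typing input by hand, something like this sketch will do. The sample strings and the "rot13" function name are made up for the illustration.

```shell
# tr reads the standard input and writes the translated text to standard output.
lowered=$(printf 'Hello World\n' | tr A-Z a-z)
swapped=$(printf 'foo boo\n' | tr o Q)

# The a-zA-Z to n-za-mN-ZA-M mapping shifts each letter 13 places ("rot13"),
# so applying it twice gives back the original text.
rot13() { tr a-zA-Z n-za-mN-ZA-M; }
once=$(printf 'Hello\n' | rot13)
twice=$(printf 'Hello\n' | rot13 | rot13)

echo "$lowered"
echo "$swapped"
echo "$once"
echo "$twice"
```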
    last | head
    tail /var/log/messages
    tail -40 /var/log/messages
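The log file above may not exist (or be readable) on your system, so here is a sketch with generated input instead. It uses "tail -n 3", which is the standardized spelling of the older "tail -3" style; the variable names are invented.

```shell
# Twelve numbered lines to play with.
lines=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12)

first=$(printf '%s\n' "$lines" | head)            # head shows the first 10 lines by default
last_three=$(printf '%s\n' "$lines" | tail -n 3)  # the last 3 lines

first_count=$(( $(printf '%s\n' "$first" | wc -l) ))
echo "$first_count"
echo "$last_three"
```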
    sort
    sort -k2
    sort -n
    sort -n -k3
Lots of other options, such as case-insensitive and reverse order; see the man page.
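The difference between textual and numeric sorting is easiest to see on a tiny input. This is a sketch with invented data; the second field is the one being sorted on.

```shell
data='carol 9
alice 10
bob 100'

by_line=$(printf '%s\n' "$data" | sort)           # whole line, compared as text
by_text=$(printf '%s\n' "$data" | sort -k2)       # from field 2 on, as text: 10, 100, 9
by_number=$(printf '%s\n' "$data" | sort -n -k2)  # from field 2 on, as numbers: 9, 10, 100

echo "$by_text"
echo "$by_number"
```

As text, "100" sorts before "9" because the comparison goes character by character and "1" comes before "9".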
tr -cs a-zA-Z0-9 '\012' <file | tr A-Z a-z | sort | uniq -c | sort -n
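To see what each stage of that pipeline contributes, here it is run on one made-up sentence rather than a file. The variable names are only for this sketch.

```shell
text='The cat saw the dog. The dog ran.'

# 1. tr -cs ...   turn every run of non-alphanumeric characters into one newline,
#                 which leaves one word per line
# 2. tr A-Z a-z   fold everything to lower case
# 3. sort | uniq -c   bring equal words together and count them
# 4. sort -n      order by count, so the most frequent word comes last
freq=$(printf '%s\n' "$text" |
    tr -cs a-zA-Z0-9 '\012' | tr A-Z a-z | sort | uniq -c | sort -n)
top=$(printf '%s\n' "$freq" | tail -n 1)

echo "$freq"
```

The last line of the output is the count for "the", which occurs three times once the case is folded.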
    sed s/Fred/Wilma/ people
    sed s/Fred/Wilma/g people
    sed 's/Fred[a-z]*/Wilma/g' people
    sed 5d people
    sed /pattern/d people
sed takes arbitrary regular expressions.
The argument to sed often has to be quoted so that special characters in it aren't interpreted by the shell (e.g. as glob notation!).
I wrote a quick intro to unix regular expression syntax.
If you enclose some of the search string in backslashed parentheses,
\1 in the replacement means the first such match. If you have multiple
pairs of backslashed parentheses, you can also use \2, etc.
Whether some of the search string is enclosed in backslashed parentheses or
not, '&' in the replacement string represents the entire text matched by
the search string.
Examples (try them!):
    sed 's/[A-Z]/ capital-& /g' composers
    sed 's/\(.*\) \(.*\)/\2, \1/' composers
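Here is a sketch of those two commands on the three composer names we saw in the grep output, fed in through a pipe instead of read from the file. The variable names are invented.

```shell
names='Henry Purcell
Hildegard von Bingen
Heinrich Schuetz'

# '&' stands for whatever the pattern matched, here each capital letter in turn.
marked=$(printf '%s\n' "$names" | sed 's/[A-Z]/ capital-& /g')

# \1 and \2 stand for the text matched by the first and second \( \) pairs.
# The first .* is greedy, so it takes everything up to the LAST space.
swapped=$(printf '%s\n' "$names" | sed 's/\(.*\) \(.*\)/\2, \1/')

echo "$marked"
echo "$swapped"
```

Because of the greedy match, the three-word name comes out as "Bingen, Hildegard von".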
    echo Please enter repeat count:
    echo -n 'Please enter repeat count: '
→ note how it takes any number of arguments, and outputs them separated by spaces.
Use "tr" to convert x's to y's in xylophone:
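One way to do this, as a sketch: let echo supply the word on the standard input of tr. The variable is only there so that the result can be shown afterwards.

```shell
# echo writes the word to the pipe; tr replaces every x with y.
result=$(echo xylophone | tr x y)
echo "$result"
```

There is only one "x" in "xylophone", so the result is "yylophone".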
In general in unix tools, you can combine options into one word with just one
minus sign.
For example, instead of writing "ls −l −a −r −t" you
can write "ls −lart".
Although as soon as you hit an option which takes an argument
(such as sort's −k option), that's it for that word. So for example, in
"sort −k2f", all of "2f" would be the argument to −k.
Although you could rearrange this one: "sort −fk2" is still the same as
"sort −f −k2".
(This asymmetry is caused by the fact that −k takes an argument and
−f does not.)
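You can check that the combined and separate spellings behave the same with a quick sketch like this one. The data is made up; −f asks sort to ignore case, and −k2 sorts on the second field.

```shell
data='x Banana
y apple
z cherry'

separate=$(printf '%s\n' "$data" | sort -f -k2)  # two option words
combined=$(printf '%s\n' "$data" | sort -fk2)    # one word: -f, then -k with argument "2"

echo "$separate"
echo "$combined"
```

Both commands put "apple" ahead of "Banana", since case is being ignored.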
diff is the basis of commands to compare different revisions in many source control systems, e.g. "git diff".
Example:
students enrolled in CSC 209 before and after the drop date (fictional)
Please try these commands:
    comm -1 studentlist newstudentlist
    comm -12 studentlist newstudentlist
    comm -13 studentlist newstudentlist
    comm -23 studentlist newstudentlist
The rule is that the command-line options say which of the three columns to suppress. It's a little odd. Compare with "comm studentlist newstudentlist" with no options, which produces unreadable and useless output but is the key to understanding how the options work.
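If you'd like to see the column rule on something smaller than the student lists, here is a sketch with two tiny invented files. comm expects both inputs to be sorted, which these are.

```shell
dir=$(mktemp -d)
printf '%s\n' alice bob carol  > "$dir/before"
printf '%s\n' alice carol dave > "$dir/after"

# Column 1: lines only in the first file; column 2: lines only in the second;
# column 3: lines in both. Each option digit suppresses that column.
only_before=$(comm -23 "$dir/before" "$dir/after")
only_after=$(comm -13 "$dir/before" "$dir/after")
in_both=$(comm -12 "$dir/before" "$dir/after")

echo "dropped: $only_before"
echo "added:   $only_after"
rm -rf "$dir"
```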
    join newstudentlist grades
There are millions of options to specify what the key fields are, what the output format should be, etc. I think that most people consult the man page every single time they write a join command (which is used more frequently in a shell script than interactively).
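For a feel of what join does with no options at all, here is a sketch on two invented files. By default the key is the first field of each file, and both files need to be sorted on it.

```shell
dir=$(mktemp -d)
# Hypothetical data: a name followed by one other field.
printf '%s\n' 'alice 1001' 'carol 1003' 'dave 1004' > "$dir/list"
printf '%s\n' 'alice 85' 'bob 70' 'carol 92'        > "$dir/marks"

# Lines whose first field appears in both files are merged into one line;
# lines without a partner (bob, dave) are dropped.
joined=$(join "$dir/list" "$dir/marks")
echo "$joined"
rm -rf "$dir"
```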
For example, "diff - file" will compare the standard input to the contents of "file".
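A sketch of that in action, with a made-up three-line file. The exit status of diff is 0 when the inputs are the same and 1 when they differ, so it is captured here as well.

```shell
dir=$(mktemp -d)
printf '%s\n' one two three > "$dir/file"

# Identical input on stdin: diff prints nothing and exits with status 0.
same_status=0
printf '%s\n' one two three | diff - "$dir/file" || same_status=$?

# Different input on stdin: diff describes the change and exits with status 1.
diff_status=0
changes=$(printf '%s\n' one TWO three | diff - "$dir/file") || diff_status=$?

echo "$changes"
rm -rf "$dir"
```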
The command line begins with a list of directories, then contains a list of predicates, usually ending with either "−print" (to print the path name if you get that far, i.e. all of the previous predicates are true) or "−exec" (to execute a command for that file path name).
For exec, in the command you can use "{}" to mean to substitute the path name here in the command-line. Since these characters are special to the shell, they need to be quoted.
The command-line for exec needs to be terminated, else find wouldn't be able to tell where the command ends and the find options resume. It is terminated by a semi-colon (as a separate argument). Since semicolon is special to the shell, it needs to be quoted.
The few examples below are intended to give you an idea of what find can do, not to teach you to use it; you'll learn how to use find as you have particular applications for it.
Basic command to find a file by name in a directory tree:
find /u/ajr/209/web/notes -name cat0.c -print
Sample output:
    /u/ajr/209/web/notes/toolsfiles/cat0.c
Find files which were modified within the last 30 days, and execute the "ls −l" command on them. But they might be directories, so we need the "−d" option to ls as well.
find /u/ajr/209/web/notes -mtime -30 -exec ls -ld '{}' ';'
Sample output:
    drwxr-xr-x  43 ajr  staff  1462 Feb 17 01:02 /u/ajr/209/web/notes
    -rw-r--r--   1 ajr  staff  6964 Feb 17 01:01 /u/ajr/209/web/notes/c
    -rw-r--r--   1 ajr  staff  7372 Feb 17 01:02 /u/ajr/209/web/notes/files
    drwxr-xr-x   8 ajr  staff   272 Feb 19 13:44 /u/ajr/209/web/notes/sockets
    -rw-r--r--   1 ajr  staff  1388 Feb 19 13:44 /u/ajr/209/web/notes/sockets/client.c
    -rw-r--r--   1 ajr  staff  1329 Feb 19 13:43 /u/ajr/209/web/notes/sockets/client_inet.c
    -rw-r--r--   1 ajr  staff  2830 Feb 19 13:35 /u/ajr/209/web/notes/sockets/server.c
    -rw-r--r--   1 ajr  staff  1758 Feb 19 13:37 /u/ajr/209/web/notes/sockets/server_inet.c
As above, but only finding plain files (e.g. excluding directories) (so the "−d" option to ls is no longer important, but doesn't hurt either):
find /u/ajr/209/web/notes -type f -mtime -30 -exec ls -ld '{}' ';'
Sample output:
    -rw-r--r--   1 ajr  staff  6964 Feb 17 01:01 /u/ajr/209/web/notes/c
    -rw-r--r--   1 ajr  staff  7372 Feb 17 01:02 /u/ajr/209/web/notes/files
    -rw-r--r--   1 ajr  staff  1388 Feb 19 13:44 /u/ajr/209/web/notes/sockets/client.c
    -rw-r--r--   1 ajr  staff  1329 Feb 19 13:43 /u/ajr/209/web/notes/sockets/client_inet.c
    -rw-r--r--   1 ajr  staff  2830 Feb 19 13:35 /u/ajr/209/web/notes/sockets/server.c
    -rw-r--r--   1 ajr  staff  1758 Feb 19 13:37 /u/ajr/209/web/notes/sockets/server_inet.c
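The paths above exist only on my system, so here is a sketch you can run anywhere: it builds a small scratch tree (the file names are borrowed from the sample output, the directory itself is temporary) and runs the same kinds of find commands on it.

```shell
tree=$(mktemp -d)
mkdir "$tree/sockets"
touch "$tree/cat0.c" "$tree/sockets/client.c" "$tree/sockets/server.c"

# Find a file by name anywhere below the starting directory.
by_name=$(find "$tree" -name cat0.c -print)

# -type f skips the directories. '{}' is replaced by each path name, and the
# quoted ';' tells find where the -exec command ends.
listed=$(find "$tree" -type f -mtime -30 -exec ls -ld '{}' ';')
n_files=$(( $(printf '%s\n' "$listed" | wc -l) ))

echo "$by_name"
echo "$listed"
rm -rf "$tree"
```

All three files were created a moment ago, so they all pass the "−mtime −30" test; the two directories are left out by "−type f".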