|
This is the fifth video in the shell programming video series.
It requires the first four videos as background.
This video discusses some details of shell programming which you need to get
right to make your shell program behave as a normal program in every way.
|
|
First of all, I want to emphasize what we've learned about quoting. If we
have a file name in a variable named "filename", then writing
"cat $filename"
is almost always incorrect.
This is incorrect because if the filename variable has a space in it, it will
not be passed literally to 'cat'. Spaces in the middle will cause cat to see
multiple arguments, which are different file names than the filename
identified in the variable; and spaces at the beginning or end will be lost.
We need to use double quotes to get the data from the filename variable into
cat's first argument intact.
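Here is a small sketch of the difference. The function name count_args and the sample file name are made up for this illustration; count_args just reports how many arguments it received, which is what cat would see.

```shell
# count_args is a hypothetical helper for illustration only: it prints the
# number of arguments it was given.
count_args() {
    echo $#
}

filename="my notes.txt"

unquoted=$(count_args $filename)      # split at the space into two words
quoted=$(count_args "$filename")      # passed intact as a single argument

echo "unquoted: $unquoted argument(s)"
echo "quoted: $quoted argument(s)"
```

With the double quotes, the command receives exactly one argument, whatever spaces the value contains.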
|
|
The quoting issue is not particularly about file names. It's about any data
which might include spaces. And user input could contain anything.
Consider this shell script:
if test $1 = hello
then
echo hiya!
else
echo "um, I'm not sure what to say"
fi
When this is run with the single argument "once upon a time", the
interpolation of $1 in that 'test' line makes a line which looks like
"test once upon a time = hello",
which is a syntax error for test. (So test fails with an error exit status.
That doesn't mean that the condition was evaluated as false, but it is a
non-zero exit status nevertheless, so the 'else' clause is taken.)
|
|
To make this correct we need to put double-quotes around the $1.
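A sketch of the corrected version follows. To make it easy to try without creating a script file, I've wrapped it in a function named greet (that name is my own invention for this sketch); inside the function, $1 is the function's first argument, just as it would be the script's first argument.

```shell
# greet is a hypothetical wrapper around the script body, for illustration.
greet() {
    if test "$1" = hello
    then
        echo 'hiya!'
    else
        echo "um, I'm not sure what to say"
    fi
}

greet hello
greet "once upon a time"
```

Now 'test' always sees exactly three arguments, so an argument with spaces in it is simply compared and found to be different from "hello".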
|
|
There's another problem with that shell script. I'm now going to run this
under bob's account, as shown by the different prompt.
...
What happened there??!?
...
This looks normal...
|
|
But look at this:
...
What's going on?
|
|
The value of PATH looks normal enough, but look at this:
[cat /u/bob/bin/test]
Since this was earlier in the PATH than /bin, we got Bob's version of test
rather than /bin/test.
Is this a problem? Well, you might think that this means that shell programs
can't be serious programs. A normal program written in C or Python or any
normal programming language wouldn't be susceptible to this sort of thing.
But shell programs don't have to be susceptible to this sort of thing either.
The correct sh program almost always has to begin with an assignment to the
PATH variable.
|
|
"/bin:/usr/bin" is generally a good assignment to the PATH variable. It's not
standard between different versions of unix which of the core commands are in
/bin and which are in /usr/bin, but the core commands will all be in one or
the other of these two directories.
If we make our own local assignment to PATH, that will replace any PATH
variable in the environment.
We will find the standard commands, not anything else silly.
This also deals with the situation where a user has a private PATH variable
setting which doesn't include /bin or doesn't include /usr/bin.
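As a rough sketch, the top of such a script might look like this. The use of "command -v" to show where cat is found is my own addition for the illustration, not something the script needs.

```shell
# Replace whatever PATH was inherited, so that command lookup searches only
# the standard directories.
PATH=/bin:/usr/bin

# For illustration only: "command -v" reports what the shell would run for
# the name "cat" under this PATH.
cat_location=$(command -v cat)
echo "cat resolves to: $cat_location"
```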
|
|
Now, I should say that I had to find a very old version of sh for this
example. Modern versions of sh all have a built-in version of test, so it
doesn't use your PATH for that. But you can experience this problem with
most commands, just not "test". "cat" will do for an example, if you want to
try it out.
|
|
Another requirement of the well-behaved shell script is to deal with temporary
files properly.
We often use temporary files in shell scripts, more often than in programs in
normal programming languages.
For example, we might have a pipeline which produces some output, and we
want to compare that output with and without some additional filter cmd4.
cmd1 | cmd2 | cmd3 >tmpfile
cmd4 <tmpfile | cmp - tmpfile
Using a temporary file is the easiest way to write this; and it's the only
way to write this which doesn't execute the cmd1|cmd2|cmd3 pipeline twice.
But to write it just like this is an error. The current directory might not
be writable. Or worse, the current directory might already contain a file
named tmpfile, which this command destroys! Even if neither of these is a
problem, we don't want running our shell script to leave files named
"tmpfile" all over the place.
|
|
There is a directory for temporary files, named "/tmp".
A first attempt at improving this is to put tmpfile in the directory /tmp.
This is better in some ways: The directory /tmp is writable by all.
Still, we don't want to leave messes around, so we will remove the file when
we're done.
|
|
What about the concern about overwriting a valuable existing file? Does this
solve this concern? We don't expect people to put valuables in /tmp.
Or do we? Actually, we are putting valuable data in /tmp in this very code.
Suppose two people run this shell script at the same time? They'll interfere
with each other!
We will use the special variable name $$, which expands to the process
ID number of the shell running this shell script. Every process has its own
unique process ID number. So this won't collide with any other running
instances of this shell script.
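Here is a sketch of the example at this stage. The file name pattern "tmpfile.$$" is an assumption on my part, and since cmd1 through cmd4 are only placeholders, I've defined stand-in functions for them so that the sketch can actually be run.

```shell
# Stand-ins for the placeholder commands cmd1..cmd4, chosen only so that
# this sketch runs; substitute the real commands.
cmd1() { printf 'pear\napple\npear\n'; }
cmd2() { sort; }
cmd3() { uniq; }
cmd4() { tr a-z A-Z; }

# $$ is this shell's process ID, so two simultaneous runs of the script
# use two different file names.
cmd1 | cmd2 | cmd3 >/tmp/tmpfile.$$
if cmd4 </tmp/tmpfile.$$ | cmp -s - /tmp/tmpfile.$$
then
    result=same
else
    result=different
fi
echo "output with and without cmd4 is $result"

# Clean up when we're done.
rm /tmp/tmpfile.$$
```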
|
|
Now, suppose the system administrator notices a bunch of garbage accumulating
in /tmp over time. They want to fix whatever program is not cleaning up
properly.
But with this file naming strategy, it's pretty hard to tell which
program this is!
Suppose this program is named "slosh". We'll change the code to include
"slosh" in the file name. Then we can see if some program's leaving around
its temporary files.
And at this point I'm going to introduce a variable name for the temporary file,
so that we aren't repeating it everywhere.
Are you objecting because of the lack of double quotes? I hope you noticed!
But we don't need the double quotes here because we know that the variable TMP
doesn't contain any funny characters — we made up the contents of this
variable just above! My earlier insistence on quoting variable interpolations
was about variables which might include a space, which is any data from a
user, but won't be a concern here.
What about the dollar signs?
The dollar signs aren't part of the variable's value; they're expanded in line
one, but then the process ID is part of the variable's value.
Could this process ID number contain exciting characters? No, '$$' always
expands to one or more digits. So it's safe.
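A sketch of how this might look, assuming a file name of the form /tmp/slosh.$$ (the exact pattern is my guess) and using a trivial echo and cat as stand-ins for the real work:

```shell
# $$ is expanded right here, when the assignment is executed, so the value
# stored in TMP contains digits rather than dollar signs.
TMP=/tmp/slosh.$$
echo "TMP is $TMP"

# No double quotes are needed around $TMP: we chose its contents ourselves,
# and they cannot include a space.
echo "some intermediate data" >$TMP
cat $TMP
rm $TMP
```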
|
|
Ok.
Suppose this program takes some time to run, and we press control-C. It's not going
to get to the last line which deletes the temporary file.
We can solve this problem too.
There is a command "trap" which catches signals. You can say that if the
user presses ^C, do these commands before terminating.
The first argument to trap is the command to execute — it can have a
semicolon in it if you want to execute more complicated stuff, or it can even
have newlines in it, so long as you make all those characters be the first
argument by using appropriate quoting.
And the rest of the arguments are a list of signal numbers to catch. It's
designed this way because we have just one command, but one or more signal
numbers.
Signal number 1 is what you get if someone closes the terminal window or
terminates the ssh session — it's called "hang up".
Signal number 2 is what you get if someone presses ^C.
Signal number 15 is what you get if someone kills this process with the 'kill'
command, with no options.
So 1, 2, and 15 are the usual signals to list here. If you press
control-backslash, it sends signal number 3. We usually deliberately omit
signal 3 from commands like this, so that if you are trying to debug your
shell script, you can press control-backslash instead of control-C and get
it to leave the temporary files for debugging purposes.
Why do we say "-f" in the rm command? Because someone might press control-C
before it gets to creating the temporary file. In that case, it would be
weird for the user to see an error message from rm.
The -f option suppresses the error message if the file doesn't exist.
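Putting that together, a sketch might look like the following. The file name, the exit status of 1, and the echo and wc commands standing in for the real work are all assumptions of mine. Note the explicit "exit" in the trap command: after running a trap command the shell otherwise carries on with the script, so we have to ask for the termination ourselves.

```shell
TMP=/tmp/slosh.$$

# On hangup (1), control-C (2), or a plain 'kill' (15): remove the temporary
# file, then terminate. Signal 3 is deliberately left out.
trap 'rm -f $TMP; exit 1' 1 2 15

# Stand-in for the real work of the script.
echo "some intermediate data" >$TMP
wc -l <$TMP

# Normal cleanup at the end, duplicating the rm in the trap.
rm $TMP
```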
|
|
Now, I hope you are also objecting to the duplication of the 'rm' command.
Especially since in a larger script, the two presumably-identical rm commands
might be quite a distance away from each other.
You can also specify '0' in the list of signals for trap. This is not a
signal number, but means to do this command upon normal termination of the
shell script.
So then we don't need to replicate the rm command at the end.
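Here is one way this could be arranged; it is a sketch, not necessarily the exact form used on screen. I've used two trap commands: the cleanup is attached to 0, and the signals just call exit, which in turn triggers the cleanup. The parentheses run the script body in a subshell purely so that we can check afterwards that the file is gone; in a real script the body would simply be the script itself. The file name and the stand-in commands are assumptions.

```shell
TMP=/tmp/slosh.$$

(
    # 0 is not a signal: this command runs whenever the shell exits.
    trap 'rm -f $TMP' 0
    # The signals only need to exit; the trap on 0 then does the cleanup.
    trap 'exit 1' 1 2 15

    # Stand-in for the real work of the script.
    echo "some intermediate data" >$TMP
    wc -l <$TMP
    echo "work finished"
    # No rm command is needed here any more.
)

# For illustration: check that the temporary file was cleaned up.
if test -e $TMP
then
    leftover=yes
else
    leftover=no
fi
echo "temporary file left behind: $leftover"
```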
|
|
Another important thing to do properly is error handling.
It's easy to write a shell script like this:
mkdir dir
echo blof >dir/blof
echo blah | grep h >dir/blah
echo Done!
but if that first 'mkdir' fails, for example if we are cd'd to a directory
which isn't writable, then we can get a cascade of decreasingly meaningful
error messages:
mkdir: dir: Permission denied
s4: line 2: dir/blof: No such file or directory
s4: line 3: dir/blah: No such file or directory
Done!
More sensible behaviour would be to output only that first error message, and
stop.
Furthermore, this terminated with an exit status of zero because the last
command, the echo, succeeded. But we should be terminating with a non-zero
exit status to indicate error. This is important if this is invoked in a
'make' file or in any other situation which examines the exit status to
determine whether or not to proceed with some larger operation.
|
|
Checking everything for error is awkward, but necessary.
But it doesn't have to be as verbose as the obvious method of testing each
command's exit status explicitly, which is so verbose that it fills the
whole screen.
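The following is my guess at what that explicit checking looks like for the example script, wrapped in some scaffolding so that it can be tried safely. The scaffolding is all assumption: it makes a scratch directory under /tmp, puts a plain file named "dir" in it so that the mkdir is sure to fail, writes the script to a file named s4, and runs it.

```shell
# Scaffolding for the demonstration (names are hypothetical).
scratch=/tmp/s4demo.$$
mkdir $scratch
: >$scratch/dir          # a plain file named "dir", so mkdir must fail

# The example script, with an explicit check after every command.
cat >$scratch/s4 <<'EOF'
if mkdir dir
then
    :
else
    exit 1
fi
if echo blof >dir/blof
then
    :
else
    exit 1
fi
if echo blah | grep h >dir/blah
then
    :
else
    exit 1
fi
echo Done!
EOF

# Run it, collecting its standard output and exit status.
status=0
output=$(cd $scratch && sh s4) || status=$?
echo "standard output: '$output'"
echo "exit status: $status"
rm -r $scratch
```

Only mkdir's own error message appears, "Done!" is never printed, and the exit status is non-zero.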
|
|
Instead, we can use "set -e":
|
|
When the 'e' option is in force, every command is checked for exit status by
the shell. If it exits with non-zero exit status, the shell terminates
immediately, with that exit status; the rest of the shell script is
not executed.
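As a sketch, here is the same example script with "set -e" as its first line and no other changes. The scaffolding around it is my own assumption for demonstration purposes: a scratch directory under /tmp containing a plain file named "dir", so that the mkdir is sure to fail, and a script file named s4.

```shell
# Scaffolding for the demonstration (names are hypothetical).
scratch=/tmp/s4demo.$$
mkdir $scratch
: >$scratch/dir          # a plain file named "dir", so mkdir must fail

# The example script, with "set -e" added at the top.
cat >$scratch/s4 <<'EOF'
set -e
mkdir dir
echo blof >dir/blof
echo blah | grep h >dir/blah
echo Done!
EOF

# Run it, collecting its standard output and exit status.
status=0
output=$(cd $scratch && sh s4) || status=$?
echo "standard output: '$output'"
echo "exit status: $status"
rm -r $scratch
```

The script stops at the failed mkdir: there is one error message, no "Done!", and a non-zero exit status for whatever invoked the script to examine.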
|
|
We also want to be able to run our shell programs by just typing the command
name, like we do for system commands, or for our compiled C programs. This
is discussed in the sixth and final shell programming video, which involves
topics having to do with shell programming and unix processes. These topics
might be covered later in your course, so you might not be expected to watch
the sixth video at this time — please check your course web pages for timing.
|