CSC 209 Tutorial sh Example: A little web tool
Web browsers tend to be monolithic, multi-functional programs with very little
utility as software tools.
You can't use them very well in shell scripts, etc.
The text-mode web browser "lynx", however, has some command-line options
which make it suitable for use in this way.
In this assignment you will write an interactive program in the Bourne shell
script programming language which uses lynx to fetch and format web pages and
accepts a small list of user commands.
For simplicity, this "websh" program will assume that all URLs are http URLs
and that all web pages are html.
Operation
When "websh" is run, it takes an optional initial URL as its sole
command-line argument.
It then processes commands interactively from the list below.
The commands 'c', 'g', and 'i' each take one argument,
e.g. the user can type "g 3".
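As a starting point, the command loop described above might be sketched like this. It is only a skeleton under some assumptions: the function name websh_loop, the prompt text, and the placeholder echo lines are all made up for illustration, and each placeholder would be replaced by the real work for that command.

```shell
#!/bin/sh
# Sketch of the websh command loop. The handler bodies are placeholders
# (hypothetical), not the real behaviour of each command.

websh_loop() {
    url=${1:-}                  # optional initial URL
    # The prompt goes to stderr so that stdout carries only command output.
    while printf 'websh> ' >&2 && read -r cmd arg; do
        case "$cmd" in
            c)   echo "change URL to: $arg" ;;
            s)   echo "show current page" ;;
            g)   echo "go to reference: $arg" ;;
            b)   echo "bookmark: $url" ;;
            B)   echo "show bookmarks" ;;
            i)   echo "import URLs from: $arg" ;;
            q)   return 0 ;;
            '?') echo "commands: c arg, s, g arg, b, B, i arg, q, ?" ;;
            '')  ;;             # ignore empty lines
            *)   echo "unknown command: $cmd (type ? for help)" >&2 ;;
        esac
    done
}
```

The script would end with a call such as websh_loop "$1". Note that the ? pattern is quoted: an unquoted ? in a case pattern matches any single character, so it would swallow every one-letter command listed after it.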
- c arg
Change URL as specified.
To retrieve the web page for the new URL you will want to run:
lynx -source -dump -base "$url"
and save its output in a temporary file for the other commands to use.
- s (show)
Display the current file with lynx -dump -force_html file | more.
We could also add commands to send a message to a fancier web browser (e.g. see
http://mozilla.org/unix/remote.html),
to "view source", etc.
- g arg (go)
Change URL to one of the references in the lynx output.
This corresponds to "clicking on a link"; if you look at the lynx -dump
(but not -source) output you will see that each hypertext link gets a number,
and they are summarized at the bottom.
The existence of this 'g' command means that you will have to keep a copy of the
html page from the 'c' command in a temporary file, as suggested in the
description of that command above.
(If you re-get it for the 'g' command, the web page could have changed,
making the user's numeric selection surprising!
People edit their web pages a lot and you don't have control over
that.) So use lynx -dump on your stashed file. The reference number lines
begin with a space, so search for the last occurrence of
space, number, period.
If there is no such reference number in the current file,
you should display an error message.
- b
Bookmark this page.
Append the URL to $HOME/bookmarks, which will have the simple format of one
URL per line, no descriptive text or anything other than the URL.
- B
Show bookmarks, with reference numbers.
After this display, 'g' goes to the given bookmark number.
After doing a 'B', it's ok if the 's' command shows bookmark data instead,
and don't worry about what a subsequent 'b' would do.
Note the filter "cat -n" for adding line numbers, although it puts
tabs after the line numbers which you might have to modify with "tr" or
something.
- i arg
Import URLs from a document or other free-format file.
Some of the words in the document will be URLs (as they are in this document);
you are to extract appropriate words and throw them all into the bookmarks
file.
Assume URLs begin with "http://".
Remove a trailing period or comma, and remove surrounding double-quotes.
However, everything except whitespace and double-quotes is potentially part
of a URL, so don't be any more aggressive than this.
You will probably want the -s option to "tr", with the last argument
being a simple \\012, to convert the file into its list of words.
- q
Exit.
- ?
Display this list (help).
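For the 'g' command above, the search for "space, number, period" could look something like the following. This is a sketch rather than the intended solution: the function name lookup_ref and the error messages are invented here, and it assumes the reference lines look like "   3. http://..." as in typical lynx -dump output.

```shell
#!/bin/sh
# Sketch: print the URL for reference number $1, reading lynx -dump output
# on stdin. lookup_ref is a hypothetical helper name.

lookup_ref() {
    n=$1
    # Accept only a plain number, so that it is safe to use in a pattern.
    case "$n" in
        ''|*[!0-9]*)
            echo "websh: not a reference number: $n" >&2
            return 1 ;;
    esac
    # Lines of interest: leading space(s), the number, a period, a space.
    # Body text might happen to match too, so keep only the last match.
    target=$(grep "^  *$n\. " | tail -n 1 | sed "s/^  *$n\. //")
    if [ -z "$target" ]; then
        echo "websh: no reference $n in the current page" >&2
        return 1
    fi
    printf '%s\n' "$target"
}
```

With the page stashed in a file, it might be used as lynx -dump -force_html "$file" | lookup_ref "$arg", and the resulting URL then handled the same way as for the 'c' command.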
Other notes
Your programs should all work when given funny filenames as arguments.
Also note that URLs can and often do include all sorts of funny characters.
Be scrupulous in quoting both unknown file names and URLs.
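To illustrate the quoting advice, here is one way the bookmark commands might be written. Treat it as a sketch: the names BOOKMARKS, add_bookmark and show_bookmarks are hypothetical, and the " 1. url" display format is just an assumption chosen to resemble the lynx reference list.

```shell
#!/bin/sh
# Sketch of the 'b' and 'B' commands with careful quoting.
# BOOKMARKS, add_bookmark and show_bookmarks are hypothetical names.

BOOKMARKS=${BOOKMARKS:-$HOME/bookmarks}

add_bookmark() {
    # printf '%s\n' writes the URL verbatim; the quotes around "$1" and
    # "$BOOKMARKS" keep characters like & ; * ? and spaces from being
    # interpreted by the shell.
    printf '%s\n' "$1" >>"$BOOKMARKS"
}

show_bookmarks() {
    # cat -n puts a tab after each line number; turn it into a space with
    # tr, then rewrite "     1 url" as " 1. url" (assumed display format).
    cat -n "$BOOKMARKS" | tr '\011' ' ' |
        sed 's/^ *\([0-9][0-9]*\) / \1. /'
}
```

Try it with a URL such as 'http://example.com/a?q=1&r=$x': if any of the quotes are left out, the shell will treat the & or the $x as its own syntax.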
As arguments to "tr",
it's easier to use \012 for newline and \011 for tab rather than
actual newlines and tabs.
(Unfortunately, not every version of "tr" recognizes \n and \t, although
newer ones do.)
Note that '\' is special to the shell and needs to be quoted, e.g. \\012
or '\012'.
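Putting the "tr" advice together with the 'i' command, the word-splitting step might be sketched as follows. The function name extract_urls is made up, and the order of the clean-up steps (trailing period or comma first, then the double-quotes) is an assumption; the handout does not say which comes first.

```shell
#!/bin/sh
# Sketch: list the http:// URLs found in the free-format file named by $1.
# extract_urls is a hypothetical helper name.

extract_urls() {
    # Turn spaces and tabs into newlines and squeeze runs of newlines,
    # giving one word per line. The single quotes keep the backslashes
    # away from the shell so that tr sees \011 and \012 itself.
    tr -s ' \011' '\012\012' <"$1" |
        # Drop one trailing period or comma, then surrounding double-quotes.
        sed -e 's/[.,]$//' -e 's/^"//' -e 's/"$//' |
        # Keep only the words that now begin with http://
        grep '^http://'
}
```

For the 'i' command the output would then be appended to the bookmarks file, e.g. extract_urls "$arg" >>"$HOME/bookmarks". Nothing else is stripped, so a word like http://example.com/c) keeps its closing parenthesis.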