In pragmatic assignment-writing terms, cutesy features don't get you extra marks, but the probabilistic expectation is that they will lose you marks on average, because they will introduce bugs which affect the working of the non-cutesy parts of the program.
That is to say: Wield your cleverness cleverly. Don't waste your cleverness in doing silly things.
Two pithy quotations:
"The superior pilot uses his superior judgement to avoid situations in which he has to demonstrate his superior skill." (traditional pilot saying)
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." (Brian Kernighan)
Don't store text for any longer than you need to, or in a more complex data structure than you need. Process it as you go.
"crypt" will work on the text character-by-character; it does not need to store the whole file, or even a whole line. It doesn't need to assemble the output line into a char array either; just output it as you go!
And similarly, findempty should process one file or subdirectory at a time, rather than storing the results of readdir() calls.
Unless you are avoiding having a pathname string length limit, findempty does not need to do any dynamic memory allocation, except for what you get for free with recursion.
Keep It Simple. My solutions have the following line counts:
  90 crypt.c
  90 whatyear.c
  72 findempty.c
including all #includes, use of getopt(), etc. (And the above whatyear.c includes the 49 lines of starter code, too.) (I might add a few further comments before posting the solutions, but not too many.)
First of all, this is a really bad way to write computer programs. To write a working computer program, you need to understand the problem; understand your tools; and figure out how to bring the tools to bear to solve the problem. Web searches of this type accomplish none of these. You will not end up with a working program.
Secondly, this is a really bad way to take a course. The objective of an assignment is not to "get the right answer". The objective is to learn how to write a particular program and to complete this task successfully, learning from the experience.
Thirdly, although there are a lot of hits for such a web search, there seem to be only two pieces of code out there — everyone else is plagiarizing. (Don't follow their lead in this regard!) Also, one of them is completely wrong. Furthermore, we will be comparing them to your code as part of our searching for academic offences. If you copy them and try to disguise them, your disguise will be inadequate — we've been doing this much longer than you have.
But most of all, this is a bad way to write computer programs. You need to understand everything in your .c file if you want to get it to work. Copying in stuff you don't understand is asking for trouble.
For example, the "usage" messages have a very specific format, which you must adhere to. It is similar to the SYNOPSIS section of the man pages, using the same meaning for square brackets (indicating that something is optional) and ellipses (indicating "one or more of"). You can take the usage messages from the behaviour of the example compiled programs in /u/csc209h/summer/pub/a2 if you like. Only a little variation is acceptable, although the token immediately following the "usage:" string can be either argv[0] or the base program name. I wrote about usage messages at ../notes/tiny/usage.html
Check for possible error return from all system calls, and from fopen(). For any library call or kernel call which can return an error indication, you have to check it and do something appropriate, even if it's just printing an error message and exiting.
Error messages must be to stderr, not stdout. And pay attention to your process exit status.
And be sure you understand perror(). Where perror() is applicable, it is obligatory, rather than formulating your own error message. Please look at what perror() does in the example cat.c — perror() produces a better error message than you can. (And it does its output to stderr, as we would want.)
However, perror() is only suitable for reporting the error status from certain library and kernel calls. It can't be used for general error messages because it prints error messages in a specific format.
A: No. Your program should compile with "gcc -Wall" with no warning or error messages. Almost all of the warning or error messages which gcc -Wall can output represent potentially-serious problems, and you need to fix them. I am willing to decode error messages by e-mail (although not generally to fix your bugs, obviously).
Many cases of programs I see at this point in this course which contain lurking bugs of this nature are actually copying data entirely unnecessarily. Don't copy data when the original is just as good as the copy. For example, strings in the argv array can be used from that array directly, without copying the string data.
A: No. Your programs for this assignment are small enough that it isn't worth it to separate them into multiple files.
Q: Can I submit a .h file so that I can declare some functions and/or variables?
A: No, just declare them at the top of your .c file (or wherever is appropriate). The purpose of .h files is to coordinate declarations across multiple files. Each of your files should be self-contained for this assignment.
A: Use fprintf(stderr, "format" ... ), or any other stdio function which accepts a value of type FILE*
Also, perror() prints its message to stderr.
Q: But when I do fprintf(stderr, "this is an error message\n"), I still see it on the screen.
A: Both stdout and stderr are initially connected to your terminal window, but they can be redirected independently.
If your program says fprintf(stderr, "this is an error message\n"), then if you run "./a.out >file", you'll still see "this is an error message" on the screen and it won't go into the file. This is the purpose of using stderr, as previously discussed.
A: You have to call perror(), and you have to exit with a non-zero exit status eventually. So the easiest thing is just to exit right away. In most cases it's ok (desirable, even) to process the remaining files which do exist, correctly; but it's not required for this assignment. You'll find that some standard unix tools proceed after error in this way and some don't.
getc() returns an int, not a char, because it can return any of the 256 possible byte values or EOF. A value with 257 possible values cannot be stored in an 8-bit char. If you attempt to do so, e.g. if you have
    char c;
    while ((c = getc(fp)) != EOF) {
then you won't be able to tell the input of byte number 255 apart from the EOF condition. (Either the comparison will fail in both cases or it will succeed in both cases, depending upon whether char values are "signed" or "unsigned", both of which are legal for a C compiler.)
Once you've found that the value returned from getc() or getchar() is not equal to EOF, then it's safe to store in a char variable.
A: Normally you should use the 'f' functions. (By which term I mean to include getc() — i.e. you should feel free to use getc().)
The 'f' functions (fopen(), getc()/etc, fclose()) are part of the standard i/o library, which was built on top of the unix kernel calls (open(), read(), close()) for two reasons:
(You can get the unix file descriptor underlying a FILE* with the fileno() function (that is, fileno(fp) is the file descriptor number). You can go the other way by using fdopen(), which creates all the FILE* stuff around an already-opened file identified by file descriptor number ("fd"). These two functions are rarely necessary and won't be of use to us in this course.)
    void f(int *a)
    {
        int i;
        for (i = 0; i < sizeof a; i++)  /* WRONG */
            ...
    }
is completely wrong. It will not iterate the correct number of times. The variable 'a' will have size 8 on our linux machines, because that's how many bytes are used by a pointer. If you want to know the number of elements in the array which 'a' points to, you need to pass that value in as a second parameter, of type int.
A: See "man getopt". But typing "man getopt" gives you the manual page for a tool for use in shell programming. So say "man 3 getopt". (And to be clear, you should be using getopt(), not getopt_long(), for assignment two.)
See the supplied example call of getopt() in getopt.c. Please understand that program fully before copying any of it!
Here are some notes about getopt, of which you might want to read the "interface" section, after reading getopt.c above.
You are required to use getopt() for crypt.c rather than parsing the command-line options yourself. All sorts of bizarre syntaxes are possible and will be dealt with automatically by getopt(). In the old days, everyone writing a unix tool parsed the options themselves, and the result was a lot of inconsistency as to whether or not you could do certain things (even including fundamentals such as combining options into one argument, e.g. writing "ls -qa" instead of "ls -q -a"). These days, everyone calls getopt(), and the users of your program may use a feature of standard option parsing which you didn't even know exists. This is good.
Be careful to use getopt() properly. Do not make assumptions as to the format of the command line. The standard unix command-line option format is actually extremely flexible in some ways. For example, these are all valid ways to execute the example getopt.c with '-c' value 17 and with the '-x' option, and a further command-line argument "file":
    ./getopt -c17 -x file
    ./getopt -x -c17 file
    ./getopt -x -c 17 file
    ./getopt -x -c 17 -- file
    ./getopt -x -c17 -- file
    ./getopt -c1 -x -c2 -x -c3 -x -c4 -x -c5 -x -c17 file
And furthermore, none of these is a special case. If you call getopt() correctly, as discussed in the man page and as shown in the supplied example getopt.c, all of these cases and more are handled automatically, without trying, with no special cases. The getopt() library routine contains all of the relevant complexity.
A: No, it is a request to search a directory named "-q". That is, this is not a special case. Keep It Simple.
A: Well, the instructions didn't say to. But you might as well do it, because it's easy; just follow the example cat.c.
Q: How about in whatyear.c?
A: No, because that doesn't make sense. Nor for findempty. Just for crypt.
A: We do expect C programs to be well-organized and readable, much more than with the shell scripts in assignment one.
"All programs are poems; it's just that not all programmers are poets."
Make your program nice. Keep it simple. Someone who knows C well should be able to read your program without much confusion. Comments can help this process.
On the other hand, do not teach your reader C — assume that your target audience knows C, and knows the problem domain.
I think that the ideal program would be so clearly readable that it would contain no comments at all except for an introductory comment at the top (the "prologue comment"). (I also think that this ideal is often or usually not achievable, and even more often not in fact achieved.)
I've written a lot more about comments in ../comments.html.
Don't focus on input from the terminal (in general). Redirect your input from a file or a pipeline to avoid a host of red herrings, especially with respect to eof-terminated input streams.
Don't output anything other than the transformed file contents. If there are multiple files in crypt, just process them in order with no additional output.
Q: Can we assume a maximum path name length in findempty.c?
A: Well, sort of. You can set a maximum (make it at least, say, 2000 chars) so that you can declare your array, but if the path name is too long, you must print an appropriate message to stderr and exit; nothing can be permitted to make you exceed the array bounds.
Q: What about the array holding the input line in crypt.c?
A: Don't have an array holding the line at all! Instead, loop with a simple getc(), storing just one character at a time.
Note that your program also exceeds array bounds (and thus is buggy) if it asks a library function to exceed array bounds, e.g. if you call strcpy(x,y) without basically having in mind a mathematical proof that the length of the string y is such that the data will fit into the array whose zeroth character is pointed to by x.
A: No. The timing of the input and output is not part of the specification. So you should do whatever is easiest in that regard, under the principle of "Keep It Simple".
In general, process data as you go, don't store it.
A: It doesn't, and it mustn't. The behaviour must not differ. Don't be "smart". Keep it simple. Process all data until eof, whatever the source of the data.
(Actually, x->y is simply defined as (*x).y.)
A: It is a very similar concept to a FILE* — it is the information about an open directory which you need to pass to readdir() for it to know which input stream to read from. In fact, an implementation of opendir() and friends which I've read the source code to just defines DIR as FILE in dirent.h. But some of them don't, so you should declare it correctly.
A: For most directory-tree-traversing programs, including findempty, it's important to use lstat(), as follows.
For the most part, if you attempt to access a symbolic link, the kernel follows this symbolic link automatically, giving you instead the file that the symlink points to. If this weren't the case, then symlinks wouldn't mean what they do mean. A symlink is a stand-in for the pointed-to file.
But you can't have the kernel always following symlinks, only almost-always. For example, an ls -R, or find, would get very confused by symlinks if it called stat() rather than lstat(). In particular, if a symlink points to a parent directory, then to opendir that symlink and continue traversing from there will result in infinite recursion.
So when symlinks were introduced, a dozen or so programs needed to be modified to be able to continue to work in their presence. These days, many more programs need to be aware of symlinks. Anything which traverses a directory tree needs to treat symlinks-which-point-to-a-directory differently from directories. Programs such as "ls" need to collect information on the symlink, rather than the pointed-to file.
The way to do this is to call the special call "lstat()", which is like stat() so long as its parameter is not a symlink. If its parameter is a symlink, it does not follow the symlink, but rather, reports information about the symlink itself.
Thus for example, "ls -l" calls lstat(), not stat(). There is an option '-L' to make ls follow the symlinks, but otherwise it doesn't.
For more examples: "test -f" calls stat(), but "test -L" (check whether the file is a symlink) needs to call lstat().
crypt has no reason to call stat() or lstat(), but if it did, it would call stat, not lstat, because we do want it to follow symlinks, in the normal way.
A: "." is a reference to the directory which "." is in. For example, /u/csc209h/summer/pub/. is the same as /u/csc209h/summer/pub, and /u/csc209h/summer/pub/a2/. is the same as /u/csc209h/summer/pub/a2. Somewhat similarly, ".." is the parent directory, so /u/csc209h/summer/pub/a2/.. is the same as /u/csc209h/summer/pub. This is explained in some detail in the unix filesystem video.
To traverse the directory /u/csc209h/summer/pub (for example), you will recursively traverse all subdirectories, such as /u/csc209h/summer/pub/a2. However, if you recursively traverse /u/csc209h/summer/pub/., that is itself a traversal of /u/csc209h/summer/pub and thus you have an infinite loop (infinite recursion). Similarly, if you recursively traverse /u/csc209h/summer/pub/.., that is the same as /u/csc209h/summer, and you will eventually get back down to /u/csc209h/summer/pub, and also have an infinite loop.
So you have to skip "." and ".." when looking at the contents of a subdirectory. (However, these are still valid directory names for the command-line; make sure you put your 'if' statement in the right place.)
Normally, programs exit with exit status zero for success and one for failure. All three assignment two programs are like this — normally the exit status will be zero, but if there is a usage error or if an fopen() fails, the exit status should be one.
You can do either one of these. Your choice does not affect the usage message. Automated testing will be with either all-lower-case letters, or with some non-letters in there to test your program's fatal error message.
A: First of all, this is only an explanation for why the chdir() strategy is not appropriate for findempty. If you're not considering using chdir(), you don't need to be talked out of it!
But if you're interested:
Consider that when doing directory traversal, if you have a directory named "foo" and a file in it named "bar", rather than constructing the pathname string "foo/bar", you could just do chdir("foo"), and use the name "bar". After processing the directory foo, you do chdir("..").
This is slightly easier than the string operations, but it's often not worth it. You need to put together the path name for output anyway, so why not put it together to pass to opendir() first?
But more to the point, if the command-line is something like "findempty /a/b/c d/e/f", after you chdir("/a/b/c") and to subdirectories, no amount of chdir("..") is going to get you back to the directory you were originally cd'd to when the program started. So the pathname "d/e/f" is not going to work. So you can't use this chdir() strategy for findempty.
A: Because it contains the basic directory traversal code which is the point of this assignment. Some people can call ftw(), but someone else has to write ftw(). This assignment is about writing the directory traversal code.
A: No. Reply in the user's own terms. If they specify a pathname such as "foo/bar", then you will output file path names such as "foo/bar/baz", which are valid if foo/bar is valid. Don't be "clever" about this, just do it the obvious and simple way.