M.I.T. DEPARTMENT OF EECS

6.033 - Computer System Engineering UNIX Hands-On Assignment

Hands-on 2: The UNIX Time-Sharing System: Answers

Problem #1

A listing of all processes that you are currently running on the Athena machine you are using, sorted by command name in reverse alphabetical order (e.g. a process running zwgc should be listed before a process running acroread). The output should consist only of the processes you are running, and nothing else (e.g. if you are running 6 processes, the output should have only 6 lines).

Here is a preferred solution:

athena% ps -o comm -u $USER | tail +2 | sort -r   
zwgc
tail
sort
ps
-tcsh
athena% 

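It turns out that the -o option behaves differently when the field name is followed by an equal sign: "comm=" sets that column's header to the empty string, and when every column header is empty, ps omits the header line entirely. So this solution also works: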

athena% ps -o comm= -u $USER | sort -r   
zwgc
sort
ps
-tcsh
athena% 

This problem was tricky! Because the output of the ps command is normally used by people, the output is "decorated" with a header line and additional information about each process. In this problem you were asked to omit that information from your output. Typically you might want to remove that information if you were trying to use this list of processes as input to another program in a pipeline.

The Linux version of the ps command has a special "--no-headers" option that suppresses the first line of its output. Unfortunately, this option is not implemented in the Solaris version of the command. Thus, the portable solution is to let ps print the header and then use "tail +2" in the pipeline to discard the first line of the ps command's output.

Since the first line of the output (without the -o option) contains the string "PID," you might be tempted to add grep -v PID to your pipeline to suppress this line of output. Although this would appear to work, there would be a lurking bug: the grep would also suppress any command that had the upper-case letters "PID" in its name. This could cause the program to produce strange results many months or years after you put it into production!
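To make the lurking bug concrete, the sketch below feeds a small, made-up sample of ps-style output through both filters. The sample lines, including the process name myPIDtool, are hypothetical. The header-skipping step is spelled tail -n +2, the form POSIX specifies; tail +2 is the traditional spelling used above.

```shell
#!/bin/sh
# Hypothetical sample of ps output: a header line plus two processes,
# one of which happens to have "PID" in its command name.
sample='  PID TTY          TIME CMD
 4242 pts/0    00:00:00 tcsh
 4243 pts/0    00:00:00 myPIDtool'

# Buggy: removes the header, but also silently drops myPIDtool.
printf '%s\n' "$sample" | grep -v PID

# Robust: removes exactly the first line, whatever it contains.
printf '%s\n' "$sample" | tail -n +2
```

With real data, the printf would simply be replaced by the ps command itself.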

One of the lessons here is that it is very difficult to write portable scripts that work on both Solaris and Linux. (In fact, Solaris has not one but two ps commands, each with different semantics: one is installed as /usr/bin/ps, and the other as /usr/ucb/ps. Which one you get when you type "ps" depends on which directory comes first in your path.)


Problem #2

The number of words in the file /usr/dict/words (on many systems, including the one used in the solution below, the file is /usr/share/dict/words) that contain all of the letters a, b, c, d, e and f. These letters may occur more than once in the word, and the word may contain other letters as well. (For example, "feedback" should be counted.)

The trick to answering this question was to realize that you needed to pipe together several invocations of the grep command, each one selecting the lines that contain a different letter. Then you needed to count the lines that survive all six filters, which is easy to do with wc -l:

 % cat /usr/share/dict/words | grep a | grep b | grep c | grep d | grep e | grep f | wc -l

It is possible to change that last invocation of grep by adding the "-c" flag (spelled "--count" in GNU grep), which suppresses normal output and instead prints the number of matching lines. While this works, the resulting pipeline is somewhat confusing to read:

 % cat /usr/share/dict/words | grep a | grep b | grep c | grep d | grep e | grep -c f

This is confusing because all of the greps have one set of semantics, and then the final grep has a different one. This would not be a good idea in a production system, even if the code is marginally more efficient.
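For comparison, the same filter can be written as a single awk program that tests each line against six regular expressions at once. This is a swapped-in technique rather than the answer expected here, and the helper name count_abcdef is hypothetical. Whether saving five processes is worth the less familiar syntax is the same kind of readability judgment discussed above.

```shell
#!/bin/sh
# count_abcdef (hypothetical helper): count the input lines that contain
# every one of the letters a, b, c, d, e and f, in any order.
count_abcdef() {
  # Each /re/ tests the whole line; "n + 0" prints 0 when nothing matched.
  awk '/a/ && /b/ && /c/ && /d/ && /e/ && /f/ { n++ } END { print n + 0 }'
}

# Usage (the dictionary path varies between systems):
#   count_abcdef < /usr/share/dict/words
```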

Problem #3

Create a 7x7 matrix of alternating entries of 1's and 0's. It should look like this:
1 0 1 0 1 0 1
0 1 0 1 0 1 0
1 0 1 0 1 0 1
0 1 0 1 0 1 0
1 0 1 0 1 0 1
0 1 0 1 0 1 0
1 0 1 0 1 0 1

Although this problem was hard, the solution is fairly straightforward:

% yes 1 0 | fmt -14 | head -7 
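This solution takes advantage of the fact that the yes command prints its arguments over and over, one copy per line, until the pipeline stops reading. The fmt -14 command then refills that stream into lines at most 14 characters wide. A row of seven digits separated by single spaces is 13 characters, so each output line holds exactly seven entries; because seven is odd, consecutive rows start with alternating digits. Finally, head -7 keeps the first seven rows.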

Some versions of the yes command will only repeat one word. For these, you need to quote the command's arguments:

% yes "1 0" | fmt -14 | head -7 
Alternatively, you could use the head command to truncate the output of the yes command, rather than the output of the fmt command, like this:
% yes 1 0 | head -c 97 | fmt -14 
While this works, it is unclear. Where does the constant "97" come from? (It takes some arithmetic to see that it is 24 copies of the four-byte line "1 0" plus its newline, followed by one final "1", for 49 entries in all.) What is the programmer trying to accomplish? You want 7 lines of output, so ask for 7 lines.
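In the same spirit of asking directly for what you want, here is a sketch that swaps in a different technique: instead of reshaping an endless stream from yes, it computes each entry, writing 1 where the row and column indices have an even sum and 0 otherwise. The function name checkerboard is hypothetical, and the sketch assumes a POSIX awk.

```shell
#!/bin/sh
# checkerboard N (hypothetical helper): print an N x N matrix of
# alternating 1's and 0's, starting with 1 in the top-left corner.
checkerboard() {
  n=${1:-7}
  awk -v n="$n" 'BEGIN {
    for (i = 0; i < n; i++) {
      line = ""
      for (j = 0; j < n; j++)
        # Separator before every entry except the first; 1 on even i + j.
        line = line (j ? " " : "") ((i + j) % 2 == 0 ? 1 : 0)
      print line
    }
  }'
}

checkerboard 7
```

The size appears once, as an argument, rather than being implied by a line width or a byte count.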


Problem #4

Create a "long" listing of the largest 5 files in the /etc directory whose name contains the string ".conf", sorted by decreasing file size.

This problem is trickier than it appears. The simplest approach to this problem is:

% ls -a -d -l -S /etc/*.conf* | head -5
However, there are a couple of obscure hidden bugs in it. One is that shell globs will not match any files beginning with a dot unless the glob itself begins with a dot. Therefore, in order to include files such as /etc/.conf or /etc/.foo.configuration, we'd have to change the command to:
% ls -a -d -l -S /etc/*.conf* /etc/.conf* /etc/.*.conf* | head -5
The second bug is that the shell (in the default configuration) will return an error if a shell glob matches no files (on the assumption that it is a typo). Thus, if the only files in /etc were named foo and bar, the two commands above would return an error instead of empty output. The second command is even more problematic in this respect, since it will return an error if any of the globs is empty.

Another approach might be to try to use grep:

% ls -a -l -S /etc/ | grep .conf | head -5
Although this command appears to work, it has a bug. The search string passed to grep is a regular expression, and in Unix regular expressions the dot (".") is a wildcard that matches any single character. Thus, the command "grep .conf" will pass a line that contains the file "6033.conf", but it will also pass "6033conf" and even "6conf", which is not what the assignment requested. (Type "man grep" or "man regex" for information on regular expressions.)

This problem can be avoided by escaping the dot using backslashes:

% ls -a -l -S /etc/ | grep \\.conf | head -5
Two backslash characters are needed here. The first backslash is interpreted by the Unix shell to escape the second backslash, causing the string \.conf to be passed to the grep command on the command line. The grep command then interprets the single backslash as an escape and treats the dot as a literal, rather than as a wildcard character.

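Some versions of grep provide a -F flag (also available as the separate fgrep command), which treats the pattern as a fixed string, so the dot loses its special meaning. Although POSIX specifies -F, not every grep you may encounter supports it, so relying on it is not fully portable.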
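A quick way to see the difference is to run the three kinds of pattern against a few hypothetical file names. Inside single quotes the shell does not interpret the backslash, so a single one is enough there; the doubled backslash in the command above is needed only because that pattern is unquoted.

```shell
#!/bin/sh
# Hypothetical file names, one per line, standing in for a directory listing.
names='6033.conf
6033conf
6conf'

# Unescaped dot: "." matches any character, so all three lines pass.
printf '%s\n' "$names" | grep '.conf'

# Escaped dot inside single quotes: grep sees \.conf, a literal dot.
# Only 6033.conf passes.
printf '%s\n' "$names" | grep '\.conf'

# Fixed-string search (-F): no regular expression at all.
# Only 6033.conf passes.
printf '%s\n' "$names" | grep -F '.conf'
```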

There is yet another bug with the grep example: if a file's owner or group contains the string ".conf", it will be incorrectly passed through by the grep. This could be addressed by making sure the regular expression only matches the last part of the line (the filename):

% ls -a -l -S /etc/ | grep '\.conf[^ ]*$' | head -5
Clearly, this approach is rapidly becoming very complex, and it still fails if a file name in /etc contains a space after the ".conf" (for example, "app.conf copy"), because [^ ]* cannot match across the space!
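Both weaknesses of the grep approach can be seen side by side on a few hypothetical ls -l lines. The group name web.conf and the file name "app.conf copy" are made up for the demonstration.

```shell
#!/bin/sh
# Three hypothetical "ls -l"-style lines. In the first, ".conf" appears only
# in the group name; in the third, the file name contains a space.
listing='-rw-r--r-- 1 root web.conf 120 Jan  1 00:00 hosts
-rw-r--r-- 1 root root      80 Jan  1 00:00 ld.so.conf
-rw-r--r-- 1 root root      40 Jan  1 00:00 app.conf copy'

# Unanchored: all three lines match, including the group-name false positive.
printf '%s\n' "$listing" | grep '\.conf'

# Anchored to the end of the line: "hosts" is correctly rejected, but
# "app.conf copy" is wrongly rejected too, since [^ ]* cannot cross the space.
printf '%s\n' "$listing" | grep '\.conf[^ ]*$'
```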

This problem is an excellent example of how complex systems can interact in unexpected ways in rare or unanticipated cases. If you really wanted to solve this problem for all possible cases, you might have to write a program that uses the UNIX system calls directly. However, for most purposes, the first example above would suffice, so we consider it a fine answer to the problem.
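One possible middle ground between the shell-glob answer and a program that calls the system directly is a sketch built on find; the function name largest_conf is hypothetical. It relies on find's -name test, which (unlike a shell glob) lets "*" match a leading dot and simply prints nothing when no name matches, and on "-exec ... {} +", which hands the names to ls as separate arguments so that spaces are harmless. One known limitation: with a very large number of matches, find may split them across several ls invocations, and each batch would then be sorted separately.

```shell
#!/bin/sh
# largest_conf DIR [N] (hypothetical helper): long listing of the N
# (default 5) largest entries directly inside DIR whose names contain
# ".conf", largest first. The body runs in a subshell, so the cd is local.
largest_conf() (
  cd "$1" || exit 1
  # "! -name . -prune" keeps find from descending below DIR (POSIX idiom).
  # -name '*.conf*' also matches names that begin with a dot.
  # "-exec ... {} +" passes every match to one ls call where possible;
  # -d lists directories such as foo.conf.d themselves, -S sorts by size.
  find . ! -name . -prune -name '*.conf*' -exec ls -d -l -S {} + |
    head -n "${2:-5}"
)

# Usage:
#   largest_conf /etc
```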


Problem #5

Why does the first command work in the /etc directory when the second one doesn't? In what other way can the second command produce different output from the first? Can you think of any negative side effects that this construction might cause for the user?
#1: athena% ls | head -1
#2: athena% ls > temp; head -1 < temp

The key difference between #1 and #2 is that #2 stores the output of the ls command in a temporary file named temp, while #1 pipes the output of ls directly into the head command. #2 therefore fails in any directory where you do not have write permission, as is the case with the /etc directory. The outputs can also differ because, in #2, the shell creates temp before it runs ls. Therefore, if temp sorts first among the directory's entries, it will appear in the output of #2 but not in the output of #1.

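Here are some negative side effects of the second form: it silently overwrites any existing file named temp in the current directory; it leaves temp behind after the command finishes; it fails in any directory where you cannot create files; and two copies run at the same time in the same directory can overwrite each other's temp.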

As a result of these differences, the first form of the command can be repeated any number of times, while the second form cannot be repeated without risking failure.
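The following sketch reproduces the difference in a scratch directory holding a single file, zzz, which sorts after temp. The function names form1, form2 and form3 are hypothetical, and form3 assumes the widely available (but non-POSIX) mktemp utility. form3 avoids writing into the directory being listed, although it would still show its scratch file if you listed the temporary directory itself.

```shell
#!/bin/sh
# Form #1: ls writes straight into the pipe; no file is created.
form1() { ls | head -n 1; }

# Form #2: the shell creates (or truncates) ./temp before ls runs,
# so ls sees temp as an entry of the current directory.
form2() { ls > temp; head -n 1 < temp; }

# Hypothetical safer variant: use a fresh file made by mktemp (normally
# under $TMPDIR or /tmp) and remove it afterwards.
form3() {
  t=$(mktemp) || return 1
  ls > "$t" && head -n 1 < "$t"
  rm -f "$t"
}
```

In a directory containing only zzz, form1 and form3 print zzz, while form2 prints temp.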


Problem #6

Consider these two commands:
athena% cat myyes
#!/bin/sh

echo y
sleep 1
echo n
athena% 
#1 athena% (./myyes ; ./myyes) > temp3 
#2 athena% (./myyes & ./myyes) > temp4 
Based on your understanding of file I/O in UNIX, what is going on here, and why? Is this different from what you would expect? (If there is more than one difference between the two files, it is the ordering of the letters y and n that we are interested in).

The parentheses run the enclosed commands in a subshell, and the subshell's combined output is redirected into the file named after the greater-than sign.

In #1, the two myyes shell scripts run sequentially. Each script prints a y on a line by itself, sleeps for one second, and then prints an n on a line by itself. The result is therefore:

athena% cat temp3
y
n
y
n
athena%

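In #2, the two myyes scripts run concurrently. (Technically, the first is run in the background while the second is run in the foreground.) Both scripts write to temp4 through the same open file, which they inherit from the subshell that performed the redirection, so they also share a single file offset. Each write therefore lands after the previous one instead of overwriting it, and the two outputs are interleaved in time order: both y's first, then, a second later, both n's.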

athena% cat temp4
[1] 3394
y
y
n
n
athena%
The first line of output comes from the shell that started the first myyes command in the background: it prints the job number ([1]) and the process ID (3394) of that background process before executing the second instance of the myyes script. Because that shell's standard output is redirected, this line also ends up in temp4.
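The shared-offset explanation can be checked directly. The sketch below assumes a POSIX sh and substitutes a shell function for the myyes script; it compares one redirection shared by both jobs with two separate redirections of the same file. Because this sh is not interactive, no "[1] 3394" job line is written. The final wait keeps each subshell alive until its background job finishes, so the files are complete when they are read.

```shell
#!/bin/sh
# Work in a fresh scratch directory (mktemp -d is widely available, not POSIX).
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-in for the myyes script above.
myyes() { echo y; sleep 1; echo n; }

# One redirection: temp4 is opened once, and both jobs inherit that open
# file, including its offset, so each write lands after the previous one.
(myyes & myyes; wait) > temp4

# Two redirections: each job opens and truncates temp5 itself and gets its
# own offset, so both write "y" at offset 0 and, later, "n" at offset 2.
(myyes > temp5 & myyes > temp5; wait)

cat temp4   # expected: y, y, n, n (one per line)
cat temp5   # expected: y, n (the two jobs overwrote each other)
```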

Problem #7

This homework took your TA 4 hours to grade. The write-up took another three hours.