M.I.T. DEPARTMENT OF EECS
6.033 - Computer System Engineering | UNIX Hands-On Assignment
A listing of all processes that you are currently running on the Athena machine you are using, sorted by the command name in reverse alphabetical order (i.e. a process running zwgc should be listed before a process running acroread). The output should consist only of the processes you are running, and nothing else (i.e. if you are running 6 processes, the output should only have 6 lines).
Here is a preferred solution:
athena% ps -o comm -u $USER | tail +2 | sort -r
zwgc
tail
sort
ps
-tcsh
athena%
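It turns out that the "-o" option behaves differently if the column name is followed by an equal sign: the header for that column becomes empty, and ps omits the header line altogether when every column header is empty. So this solution also works: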
athena% ps -o comm= -u $USER | sort -r
zwgc
tail
sort
ps
-tcsh
athena%
This problem was tricky! Because the output of the ps
command is normally used by people, the output is "decorated" with a
header line and additional information about each process. In this
problem you were asked to omit that information from your
output. Typically you might want to remove that information if you
were trying to use this list of processes as input to another program
in a pipeline.
The Linux version of the ps command has a special "--no-headers"
option that will suppress the first line of the ps
command's output. Unfortunately, this option is not implemented on the
Solaris version of the command. Thus, the portable solution is to let
ps print the header and then use "tail +2" in the
pipeline to suppress the first line of output from the "ps" command.
Since the first line of the output (without the -o option) contains the string "PID," you
might be tempted to add grep -v PID to your pipeline to
suppress this line of output. Although this would appear to work,
there would be a lurking bug: the grep would also
suppress any command that had the upper-case letters "PID" in its
name. This could cause the program to produce strange results many
months or years after you put it into production!
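The difference between the two ways of dropping the header can be seen on canned input. The sketch below does not use real ps output: the three sample lines, including a process named PIDgin, are made up for illustration. It is written for sh and uses "tail -n +2", the POSIX spelling of "tail +2".

```shell
#!/bin/sh
# Made-up stand-in for "ps" output: a header line and two processes.
# "PIDgin" is a hypothetical command name containing the letters PID.
sample_ps() {
    printf '%s\n' '  PID COMMAND' ' 3394 PIDgin' ' 3395 zwgc'
}

# Drops every line containing "PID": the header and, wrongly, PIDgin.
by_grep=$(sample_ps | grep -v PID)

# Drops only the first line, whatever it contains.
by_tail=$(sample_ps | tail -n +2)

echo "grep -v PID keeps:"
echo "$by_grep"
echo "tail -n +2 keeps:"
echo "$by_tail"
```

With the grep version the PIDgin process silently disappears from the listing, while the tail version keeps both processes.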
One of the lessons here is that it is very difficult to write portable
scripts that work on both Solaris and Linux. (In fact, Solaris has not
one but two ps commands, each with different semantics. One is
installed as /usr/bin/ps, while the other one is installed at
/usr/ucb/ps. You will get a different version of the ps
command when you type "ps" depending on which directory is first in
your path.)
The number of words in the file
/usr/dict/words (*) which contain all of the
letters a, b, c, d, e and f. These letters may occur more than once
in the word and the word may contain other letters as well. (For
example, "feedback" should be counted.)
The trick to answering this question was to realize that you needed to pipe together several invocations of the grep command --- each one to select lines that had a different letter. After the stream was created, you needed to count the number of words. This could be done easily with the wc command:
% cat /usr/share/dict/words | grep a | grep b | grep c | grep d | grep e | grep f | wc -l
It is possible to modify that last invocation of the
grep command by adding a "-c" or "--count" flag to
suppress normal output and instead print the number of matching
lines. While this works, the resulting pipeline is somewhat
confusing to read:
% cat /usr/share/dict/words | grep a | grep b | grep c | grep d | grep e | grep -c f
This is confusing because all of the greps have one set
of semantics, and then the final grep has a different
one. This would not be a good idea in a production system, even if
the code is marginally more efficient.
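The chain of greps can be tried on a small list before running it against the dictionary. This is a minimal sketch for sh: the function name count_abcdef and the five sample words are assumptions made for this example, not part of the assignment.

```shell
#!/bin/sh
# Hypothetical helper: count the lines on stdin that contain every one of
# the letters a through f (lower case only, as in the pipeline above).
count_abcdef() {
    grep a | grep b | grep c | grep d | grep e | grep f | wc -l
}

# Made-up sample list; the real input would be the dictionary file.
printf '%s\n' feedback boldface cabbage facade barefaced | count_abcdef
```

Three of the five sample words (feedback, boldface and barefaced) contain all six letters; cabbage lacks d and f, and facade lacks b. Some versions of wc pad the count with leading spaces.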
Create a 7x7 matrix of alternating entries of 1's and 0's. It should look like this:
1 0 1 0 1 0 1
0 1 0 1 0 1 0
1 0 1 0 1 0 1
0 1 0 1 0 1 0
1 0 1 0 1 0 1
0 1 0 1 0 1 0
1 0 1 0 1 0 1
Although this problem was hard, the solution is fairly straightforward:
% yes 1 0 | fmt -14 | head -7
This solution takes advantage of the fact that the yes
command repeats its output, no matter what you give it.
Some versions of the yes command will only repeat one
word. For these, you need to quote the command's arguments:
% yes "1 0" | fmt -14 | head -7
Of course, you could also use the head command to truncate the
output of the yes command, rather than the output of the
fmt command, like this:
% yes 1 0 | head -c 111 | fmt -14
While this works, it is unclear. Where does the constant "111" come from? What is the programmer trying to accomplish? You want 7 lines of output, so ask for 7 lines.
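One way to compare candidate pipelines is to check their output mechanically instead of by eye. The sketch below is for sh and assumes a POSIX awk; the function name is_alternating_7x7 is made up for this example, and what the final command prints depends on how the local fmt wraps lines.

```shell
#!/bin/sh
# Hypothetical checker: succeeds only if stdin is exactly 7 lines of 7
# fields that alternate between 1 and 0, starting with 1 at the top left.
is_alternating_7x7() {
    awk '
        NF != 7 { bad = 1 }
        {
            # The entry in row NR, column i should be 1 when NR + i is even.
            for (i = 1; i <= NF; i++)
                if ($i != (NR + i + 1) % 2) bad = 1
        }
        END { exit (bad || NR != 7) }
    '
}

# Try it on the first solution from the text.
if yes 1 0 | fmt -14 | head -7 | is_alternating_7x7; then
    echo "looks like the requested matrix"
else
    echo "does not match"
fi
```

The same check can be placed at the end of any of the other pipelines, which makes it easy to see whether a variant that truncates by byte count really yields seven complete rows.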
Create a "long" listing of the largest 5 files in the
/etc directory whose name contains
the string ".conf", sorted by decreasing file size.
This problem is trickier than it appears. The simplest approach to this problem is:
% ls -a -d -l -S /etc/*.conf* | head -5
However, there are a couple of obscure hidden bugs in it. One is that shell globs will not match any files beginning with a dot unless the glob itself begins with a dot. Therefore, in order to include files such as
/etc/.conf or /etc/.foo.configuration,
we'd have to change the command to:
% ls -a -d -l -S /etc/*.conf* /etc/.conf* /etc/.*.conf* | head -5
The second bug is that the shell (in the default configuration) will return an error if a shell glob matches no files (on the assumption that it is a typo). Thus, if the only files in
/etc were
named foo and bar, the two commands above
would return an error instead of an empty output. The second command
is even more problematic in this respect, since it will return an
error if any of the globs is empty.
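The dot-file behavior is easy to reproduce in a scratch directory. This sketch is written for a Bourne-style shell (sh), not the tcsh behind the athena% prompt, and the file names are made up. It does not reproduce the second bug: in sh, a glob that matches nothing is handed to the command unchanged instead of producing a shell error.

```shell
#!/bin/sh
# Scratch directory with one ordinary and one hidden file (made-up names).
d=$(mktemp -d)
touch "$d/a.conf" "$d/.hidden.conf"

# A glob that does not start with a dot skips hidden files.
plain=$(cd "$d" && echo *.conf*)

# Adding a pattern that starts with a dot picks them up.
dotted=$(cd "$d" && echo *.conf* .*.conf*)

echo "without dot pattern: $plain"
echo "with dot pattern:    $dotted"
rm -rf "$d"
```

Only a.conf appears in the first result; the second also lists .hidden.conf.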
Another approach might be to try to use grep:
% ls -a -l -S /etc/ | grep .conf | head -5
Although this command appears to work, it turns out there is a bug. The search string passed to
grep is actually a
regular expression, and with Unix regular expressions the dot
(".") is actually the regular expression wildcard character. Thus,
the command "grep .conf" will pass a line that contains the file
"6033.conf" but it will also pass the file "6033conf" and even
"6conf", which is not what the assignment requested. (Type "man grep" or "man
regex" for information on regular expressions.)
This problem can be avoided by escaping the dot using backslashes:
% ls -a -l -S /etc/ | grep \\.conf | head -5
Two backslash characters are needed here. The first backslash is interpreted by the Unix shell to escape the second backslash, causing the string
\.conf to be passed to the grep
command on the command line. The grep command then
interprets the single backslash as an escape and treats the dot as
a literal, rather than as a wildcard character.
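The effect of the escape can be checked on a few names without touching /etc. In this sketch for sh the names are the examples from the text plus passwd, fed in with printf instead of ls. It uses single quotes, which deliver \.conf to grep just as the doubled backslash does.

```shell
#!/bin/sh
# Made-up file names standing in for a directory listing.
names() {
    printf '%s\n' 6033.conf 6033conf 6conf passwd
}

# Unescaped: "." matches any character, so three names pass.
loose=$(names | grep .conf)

# Escaped: only a literal dot followed by "conf" passes.
strict=$(names | grep '\.conf')

echo "unescaped pattern passes:"
echo "$loose"
echo "escaped pattern passes:"
echo "$strict"
```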
Some versions of grep provide an -F flag,
which turns off wildcards such as dot. However, this isn't portable.
There is yet another bug with the grep example: if a
file's owner or group contains the string ".conf", it will be
incorrectly passed through by the grep. This could be
addressed by making sure the regular expression only matches the last
part of the line (the filename):
% ls -a -l -S /etc/ | grep '\.conf[^ ]*$' | head -5
Clearly, this approach is rapidly becoming very complex, and it still doesn't work if
/etc contains any files with spaces in
their names!
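Both the owner/group problem and the remaining weakness can be shown with a few canned lines. The "ls -l" lines in this sketch for sh are invented for illustration: one has a group named dot.conf, one is a genuine match, and one is a matching file name that contains a space.

```shell
#!/bin/sh
# Made-up long-listing lines; only the last two name files containing ".conf".
listing() {
    printf '%s\n' \
        '-rw-r--r-- 1 root dot.conf 120 Jan  1 00:00 passwd' \
        '-rw-r--r-- 1 root wheel     80 Jan  1 00:00 resolv.conf' \
        '-rw-r--r-- 1 root wheel     40 Jan  1 00:00 my.conf backup'
}

# Matches anywhere on the line, so the group name lets passwd through.
unanchored=$(listing | grep '\.conf')

# Matches only in the last space-free word, so passwd is rejected,
# but so is the file whose name contains a space.
anchored=$(listing | grep '\.conf[^ ]*$')

echo "unanchored:"
echo "$unanchored"
echo "anchored:"
echo "$anchored"
```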
This problem is an excellent example of how complex systems can interact in unexpected ways on rare or unanticipated cases. If you really wanted to solve this problem for all possible cases, you might have to write a program that uses the UNIX syscalls directly. However, for most purposes, the first example above would suffice, so we consider it to be a fine answer to the problem.
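For comparison, one way to sidestep both the glob and the regular-expression problems without leaving the shell is to let find select the names. This is an alternative sketch, not the answer the assignment expected. It assumes a find that supports -mindepth and -maxdepth (GNU and BSD find do, but they are not in POSIX) and an ls that supports -S. The function name largest_conf and the scratch files are made up.

```shell
#!/bin/sh
# Hypothetical helper: long listing of the 5 largest entries directly
# inside directory $1 whose names contain ".conf", largest first.
largest_conf() {
    # find does the name matching itself, so dot-files are included,
    # names with spaces arrive intact, and no match means no output.
    find "$1" -mindepth 1 -maxdepth 1 -name '*.conf*' \
        -exec ls -a -d -l -S {} + | head -5
}

# Scratch directory with files of known sizes (names are made up).
d=$(mktemp -d)
printf '%300s' '' > "$d/big.conf"
printf '%200s' '' > "$d/.hidden.conf"
printf '%100s' '' > "$d/my app.conf"
printf '%400s' '' > "$d/notes.txt"

largest_conf "$d"
rm -rf "$d"
```

On the real directory the call would be "largest_conf /etc". One caveat: with a very large number of matches find may run ls more than once, and the combined output would then no longer be sorted as a whole.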
Why does the first command work in the /etc directory when the second one doesn't? How else can the second command not produce the same output as the first? Can you think of any negative side effects that this construction might cause for the user?
#1: athena% ls | head -1
#2: athena% ls > temp; head -1 < temp
The key difference between #1 and #2 is that #2 holds the output of
the ls command in a temporary file named
temp, while version #1 pipes the output of the
ls command directly to the input of the head
command. Version #2 fails if your current directory is a directory in which
you do not have write permission, as is the case with the
/etc directory. The outputs could also differ because version
#2 creates temp before running ls.
Therefore, if temp is the first file listed in the
directory, it will appear in the output of #2 but not #1.
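The difference in output is easiest to see in an empty directory. The sketch below is written for sh; it wraps the two forms in functions (the names pipe_form and tempfile_form are made up) and runs them in a scratch directory created with mktemp.

```shell
#!/bin/sh
# Form #1: pipe ls straight into head.
pipe_form() {
    (cd "$1" && ls | head -1)
}

# Form #2: go through a file named temp in the current directory.
tempfile_form() {
    (cd "$1" && { ls > temp; head -1 < temp; })
}

d=$(mktemp -d)                  # an empty scratch directory
first=$(pipe_form "$d")         # nothing to list yet
second=$(tempfile_form "$d")    # temp exists by the time ls runs
echo "#1 printed: '$first'"
echo "#2 printed: '$second'"
ls "$d"                         # temp is still there afterwards
rm -rf "$d"
```

Form #1 prints nothing, while form #2 reports its own temp file and leaves it behind in the directory.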
Here are some negative side effects of the second form:
  * Because the temp file is in the same namespace as the other files in the current directory, it is possible that the output of the head command will contain the filename temp! This would happen if you ran the second form in an empty directory.
  * The file temp is left behind.
  * If the file temp exists, and the shell variable noclobber is set, the file temp will not be overwritten by the ls command. In this case, the head command will report the first line of the old temp file.
  * The ls command must run to completion. With the first version of the command, the ls command is terminated when it attempts to write to its standard output after the head command exits.
Consider these two commands:
athena% cat myyes
#!/bin/sh
echo y
sleep 1
echo n
athena%
#1: athena% (./myyes ; ./myyes) > temp3
#2: athena% (./myyes & ./myyes) > temp4
Based on your understanding of file I/O in UNIX, what is going on here, and why? Is this different from what you would expect? (If there is more than one difference between the two files, it is the ordering of the letters y and n that we are interested in).
The parentheses cause a recursive invocation of the shell (a subshell) that runs the enclosed commands; their combined output is put into the file named after the greater-than sign.
In #1, the two myyes shell scripts run
sequentially. Each script prints a y by itself on a line,
sleeps, and then prints an n by itself on a line. The result is
therefore:
athena% cat temp3
y
n
y
n
athena%
In #2, the two myyes scripts run
concurrently. (Technically, the first is run in the background while
the second is run in the foreground.) Both scripts have their
output redirected into the file temp4, resulting in outputs
that are interleaved.
athena% cat temp4
[1] 3394
y
y
n
n
athena%
The first line of output is the result of running the first
myyes command in the background; the shell prints the job
number ([1]) and the process number (3394) of that background process
before executing the second instance of the myyes script.
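The two orderings can be reproduced in a single script. This is a sketch for sh under a few assumptions: myyes is written as a shell function instead of a separate file, the output goes to a scratch directory, and a "wait" is added so that the subshell does not exit before the background run has written its last line. A non-interactive sh prints no job-number line, and the concurrent ordering depends on timing, although the one-second sleep makes "y y n n" by far the most likely result.

```shell
#!/bin/sh
# Same behavior as the myyes script, written as a shell function.
myyes() {
    echo y
    sleep 1
    echo n
}

out=$(mktemp -d)

# Sequential: the second run starts only after the first has finished.
(myyes ; myyes) > "$out/temp3"

# Concurrent: both runs write through the same open file, so their
# lines land in the order in which they are written.
(myyes & myyes ; wait) > "$out/temp4"

# Join each file's lines with spaces for compact display.
sequential=$(paste -s -d ' ' "$out/temp3")
concurrent=$(paste -s -d ' ' "$out/temp4")
echo "sequential: $sequential"
echo "concurrent: $concurrent"
rm -rf "$out"
```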
This homework took your TA 4 hours to grade. The write-up took another three hours.