@string(draftdate = "Prepublication version of January 18, 1985")
@pagefooting()
@blankspace(5 lines)
@center{@g[THE DESKTOP COMPUTER AS A NETWORK PARTICIPANT]}
@center[
by:@:  Jerome H. Saltzer, David D. Clark, John L. Romkey, and Wayne L.
Gramlich@foot{
Massachusetts Institute of Technology, Department of
Electrical Engineering and Computer Science, and Laboratory for
Computer Science.  Address:@:  M.I.T. Room NE43-513, 545 Technology
Square, Cambridge, Massachusetts, 02139.}
]
@pageheading[left   "@g(PCIP paper)",
	     center "@g{@value(draftdate)}",
	     right  "@g(Page @value(page))"]
@blankspace(2 lines)
@center[@value(draftdate)]

@subheading[@i(Abstract)]

A desktop personal computer can be greatly extended in usefulness
by attaching it to a local area network and implementing a
full set of network protocols, just as one might provide for a
mainframe computer.  Such protocols are a set
of tools that allow the desktop computer not just to access data
elsewhere, but to participate in the computing milieu much more
intensely.  There are two challenges to this proposal.  First, a
personal computer may often be disconnected from the network, so
it cannot track the network state and it must be able to discover
and resynchronize with that state very quickly.  Second, 
full protocol implementations have often been large and slow, two
attributes that could be fatal in a small computer. This paper
reports a network implementation for the IBM Personal Computer
that uses several performance-oriented design techniques with
wide applicability:  an upcall/downcall organization that
simplifies structure; implementation layers that do not always
coincide with protocol specification layers; copy minimization;
and tailoring of protocol implementations with knowledge of the
application that will use them.  The size and scale of the
resulting package of programs, now in use in our laboratory for
two years, is quite reasonable for a desktop computer and the
techniques developed are applicable to a wider range of network
protocol designs.

@subheading[@i(Overview)]

This paper describes the issues encountered and lessons learned
in the design, implementation, and deployment of a full-scale
network protocol implementation for a desktop personal computer.  The
protocol family implemented was the United States Department of
Defense standard Transmission Control Protocol and Internet
Protocol[1,2].  The desktop computer was the IBM Personal
Computer attached to one of several local area networks:@:
Ethernet, PRONET, and an RS-232 asynchronous serial line network.  The
collection of programs is known as PCIP.

The project was undertaken in December, 1981, shortly after the IBM
PC became available.  Initial implementations using an RS-232
asynchronous serial line network were in operation in the summer of
1982, and a complete implementation for the Ethernet was placed in
service at M.I.T.  in January, 1983.  Since that time the
implementation has been polished, drivers for other networks have
been added, the software has been used in many applications
unrelated to network research, and the programs have been placed in
service at several other sites.  Enough experience with the
implementation has been gained to provide a convincing demonstration
that the techniques used were successful.


@subheading[@i(Introduction)]

A comparison of "large mainframe" computers attached to networks
and similar machines with no networking capabilities makes a clear
case for the value of network attachments.  The value includes
the abilities:@:  to move files from one machine's file system to
another that has better long-term reliability, more space, or
cheaper storage; to use a unique printer that has better fonts or
higher speed on another computer; to log in as a user on another
machine to get to a different data base manager or different
programming language; and to send and receive electronic mail
within a large community.  Such abilities have all proven to be
important extensions of the basic standalone capability of a
computer system.  The desktop personal computer, whose main
advantage lies in its administrative autonomy, potentially can be
extended in value by network attachment even more than the large
shared mainframe.  The reason is that by itself it is likely to
have a smaller range of facilities than does a large, shared
mainframe, and thus a mechanism that offers the ability to make
occasional use of unique services found elsewhere is especially
useful.

In the mainframe world, this added function has come with a
substantial cost, however.  Implementations
of network protocols have usually turned out to be large and
slow.  Although size of software packages is of somewhat less
concern than it once was (because the cost of memory to hold
those large packages seems to drop faster than the packages grow)
long path lengths through those packages can produce bottlenecks
and limit data rates.  For example, although the hardware links
in the ARPANET are mostly of 56 kbit/s, few attached systems can
sustain a data rate much above 15 kbit/s.  When those same
systems are attached to local area networks that can accept data
rates of 10 mbit/s their software continues to be a bottleneck
in the 15 kbit/s area.

Thus the question arises:@: can one put together a useful
implementation of a network protocol family, one that fits into a
desktop computer that does not have a virtual memory in which to
hide bulky programs and that has a processor perhaps one-tenth the
speed of a mainframe?  Our particular experiences in doing protocol
implementations for several different mainframes suggested that the
slow, bulky implementations are not intrinsic.  Instead, they are
brought about by a combination of several conquerable effects:

1)  Although protocols are described in terms of layers, the
particular layer structure chosen for description is not
necessarily suitable for direct implementation.  A naive
implementation that places software modularity boundaries at the
protocol layer boundaries can be extremely inefficient.  The
reason for the inefficiency is that in moving data, software
modularity boundaries usually become the points where buffers and
queues are inserted[3].  But the protocol layer boundaries are not
necessarily the most effective points for buffering and queueing.
A particular issue is that it is vital to minimize the number of
times that data gets copied on the way from the application out
to the wire and vice-versa.  (The operating system sometimes
encourages extra copying, too, but that is part of point three,
below.)

2)  There are usually several ways to implement a protocol, all of
which meet the specifications, but that can have radically
different performance; the way that produces best performance for
one application may be quite different from the way that produces
best performance for another application. An implementation that
tries to provide a general base for a wide variety of
applications can perform much worse than one that is designed
with one application in mind.  This application variability of
performance shows up strongly in the choice of data buffering
strategies and in the choice of flow control strategies.

3) The current generation of operating systems is ill-equipped for
integration of high-performance network protocols.  Problems are
often encountered in the form of a long time required to switch
contexts, clumsy interprocess communication, or inadequate memory
sharing.  Good implementation of network protocols requires a very
agile, light-weight mechanism for coordination of intrinisically
parallel activities_sending packets, receiving packets, sending
packets at low levels as a result of receiving packets that require
further processing at high levels, dallying in packet dispatch in
hope that further processing will allow the piggybacking of
responses at different levels into a single response packet, and so
on.  The various parallel activities of a network implementation are
characterized by substantial sharing of both protocol state and
packet data, so shared-variable communication is another essential
feature.  For an operating system to give really substantial support
it needs to provide three forms of memory sharing:@: among
operating-system-provided processes, between such processes and the
operating system kernel, and between real and virtual memory areas.
Often one or more of these three sharing mechanisms is missing or
weak, so the protocol implementation is forced to spend time making
copies.

One might summarize all three of these points by the single
observation that current network protocol implementations,
especially for high-speed local area networks, are
quite early on the learning curve of this software area. Most
experience so far is on large mainframes and with networks that
operate at telephone line speeds.  A particularly unexplored area
is protocol implementations that work well with network speeds that
span the four orders of magnitude from 1.2 kbit/s to 10
mbit/s.  One would expect that as
experience is gained implementations will improve.  One of the
primary purposes of the PC network implementation was to take one
or two steps higher on that learning curve.

In the remainder of this paper, we first describe what was
implemented, and then discuss the organizing strategies that make
the implementation interesting.  

@subheading[@i(What was implemented)]

Figure one shows the various protocols and drivers used within
the PCIP software packages. PCIP divides naturally into three
levels_the driver level, the transport level, and the application
level.  At the driver level are modules that manage 
different local area network hardware interfaces:@:  the 3COM
EtherLink Ethernet interface, the Proteon PRONET 10 mbit/s token
ring interface, and the RS-232 asynchronous serial line port. 

The transport level has three major components. The Internet
Protocol (IP) provides for packets originating on one network to
be sent to a destination on another network. (In this paper the term
"packet" describes the object delivered from one network-attached
computer to another via the Internet Protocol.)  The User Datagram
Protocol (UDP) is a connection-less protocol intended for the
transmission of a single, uncontrolled datagram. The Transmission
Control Protocol (TCP) provides a reliable, full-duplex byte
stream connection usually between a user process at one network node
and a server process at another. 

One application-level protocol, Remote Virtual Disk (RVD), is built
directly on the IP layer.  RVD is implemented as a device driver
that allows one to read and write individual disk blocks on a
remote machine as if they were on a local disk.

Several application-level protocols are built on UDP, each
providing its own application-specialized error control. For example,
the node name protocol takes a character string name for a node
and consults a series of name servers to learn that node's 32-bit
internet address, using UDP. The Trivial File Transfer Protocol
(TFTP) is a lock-step file transfer protocol built on UDP, in
which the file receiver must acknowledge each datagram before the
sender may dispatch the next one.  The Print File program permits a user to
print a text file, using TFTP to transport the file to a printer
server.  The get time protocol obtains the time and date from a
set of time and date servers.

The application programs that use TCP are the remote login
protocol, named Telnet, and several information lookup protocols.
In addition, some TCP-based mail facilities are currently being
implemented. The Telnet program uses a Heath H19 terminal
emulator in managing the keyboard and screen of the PC[4].@:  An
unusual, but very useful, application program, Netwatch, built
directly on the local-area network drivers, allows monitoring of all
or selected traffic on the local network.  The ability to turn any
PC into a network monitoring station at a moment's notice has been
used to solve dozens of performance, configuration, and trouble
isolation problems.

@subheading[@i(You get more than remote terminal emulation)]

Although a remote login protocol is an important function, it is
not by itself justification for a network implementation_if that
were the only function obtained, one could use one of the many
terminal emulator programs for the PC instead.  The interest in a
fuller protocol family implementation for a PC comes about
when considering the range of services that become directly available for
the PC user, and the ease of building new applications.  Examples
range from seemingly trivial ones to major work-savers.

Among the apparently trivial features is the PC command that sets
the PC system clock (date and time) by sending datagrams to several
network servers[5].  This command is included in most of our PC
users' automatic bootload batch files, where it eliminates the need
for an extra battery-powered clock card.  Only after this command
became available did the date and time records kept in DOS floppy
disk directories become reliable indicators of which version of a
file one was looking at.  Another remarkably useful command is one
that obtains from any timesharing system in the internet a list of
currently logged-in users and identification information on any
particular named user of that system.  A similar command obtains
directory information from the ARPANET Network Information Center.
Another command sends a request for an echo response to any network
node, to allow trouble isolation.  These tools, each not very
important in itself, become part of an operation repertoire that
makes the desktop computer much more useful than when it stands
alone.  

Probably the single most important tool is the file transfer
protocol, TFTP.  TFTP provides the ability to move a file between
the PC and any network-attached timesharing system or file
server.  With TFTP, one can casually undertake quite
complex operations.  A typical use, such as the preparation of
this paper, involves several authors each using a favorite editor
on the PC to prepare individual contributions.  Each 
moves contributions to a common directory on a central file
server, so the others can look them over and provide
comments and suggestions.  One author moves all the paper
fragments to a private PC, assembles them, runs them through a
formatter and then sends them, again using TFTP, to a
sophisticated laser printer server located elsewhere in the
network.  Because the network is not just local, but is seamlessly
interconnected by the ARPANET to many other sites nation- and
world-wide, the authors and other facilities can be assembled
from a geographically dispersed set.  And because TFTP is an
independent entity, it can be used as a subsystem in a more complex
application, such as sending a text file to a manuscript preparation
server, sending the result to a laser printer, and retrieving result
log files, all in an under-the-cover application package that runs
on the PC.  Perhaps more important, if some file transfer mechanism
had not been available, so that terminal emulation were the only
means of communication, it would not have been easy to include
stand-alone programs on the PC, such as text editors, as part of
this process.

When added to this set of network tools, a remote login protocol
becomes a migration and extension tool, since it makes functions
that are missing on the PC, such as specialized languages or
databases, easily available by allowing the PC user to attach to a
timesharing system anywhere in the network.  The most prominent
example of a function currently missing in our repertoire is
electronic mail handling.  While waiting for a mail handling package
to be implemented for the PC, sending and receiving mail is
accomplished by logging in to one of the large timesharing systems.
Another useful feature of implementing remote login as a network
package rather than as a simple terminal emulator is that
TFTP is available at the same time.  This feature allows one to use
any timesharing system commands to locate, collect, or create files,
and then send them immediately back to the PC.  It also allows other
users to move files to and from the PC at the same time it is being
used for remote login.

The sum of these tools is greater than the individual parts, because
the application user can blend remote with local operations as the
job requires.  An application programmer finds even more
flexibility, since any service offered by any network site can be
incorporated in an application program.  It is exactly this ability
to compose network functions in unanticipated ways that makes a
complete network protocol implementation for a PC so powerful.


@subheading[@i(Remote Virtual Disk)]

A good example of an extended service possibility is our
implementation of the Remote Virtual Disk protocol (RVD) for the
PC. This protocol, locally developed at M.I.T., permits a machine
to have access to disk storage which appears to be local, but
which is in fact remotely located at a server across the network[6].
To accomplish this appearance, a device driver is written that,
instead of reading and writing to a real disk, sends messages
across the network to the RVD server which does the actual
reads and writes.

There are a number of uses for the function provided by RVD. Most
important, the disk made available through RVD can be shared,
thus providing a mechanism for distribution of software,
especially making a large library of tools available to a
community.  In this use, an RVD disk strongly resembles the
virtual minidisk provided by the VM/370 operating
system[7].  (Note, however, that if sharing is the
primary goal, sharing at the physical disk block level is not
as flexible as sharing at the logical file level.  Remote file
system protocols have been the subject of much research and
development activity lately.[8,9])

A simple but helpful use of the RVD disks is as an extension to the
private disk storage of the individual machine.  The economics of
large and small disks is currently such that one has only a modest
price advantage over the other, but the functional advantage of RVD
is threefold.  First, any RVD disk can be available to every PC on
the net, so in contrast to the permanently attached Winchester disk,
the file stored on an RVD disk can still be reached if one's private
PC is down, by walking down the hall and finding another
network-attached PC.@: Second, since the RVD disks are actually
partitions of centrally-located large disks, one can arrange for a
central operations staff to make backup copies of the information
stored on RVD disks.  The need to make backup copies of information
stored on private Winchester disks has proven to be one of the
operational headaches of those devices; with RVD the headaches can
be subcontracted to someone else.  Third, the effective data rate of
the RVD disk is comparable to a local hard disk and substantially
better than that of a floppy disk.  Large block transfers using RVD
take place across the Ethernet at about 240 kbit/s.


@subheading[@i(The PC environment)]

Development of a network implementation for the PC required that
a number of choices be made, both in the development environment
and in the programming environment.  This section describes those
choices.

The development environment, while it entailed difficult choices,
did not involve any new ideas or breakthroughs.  Programming was
done on a microcomputer development system that runs on a nearby
UNIX time-sharing system.  That approach was used rather than
doing the programming entirely on the PC because in 1981, when the
choice was made, very little support software (editors, choice of
compilers, library managers) was yet available to run on the PC.
The programming was done in the C language, with the choice again
based primarily on the combination of compiler and assembler
availability.  It was apparent that some assembly language
programming would be required, and the only assembler that we
could locate for the PC at the time was one that came as part of
an integrated C compiler/assembler/loader package.

The programming environment used was the IBM DOS operating
system[10].  This choice was easier than it might have appeared:@:
all of the operating system alternatives provided very little
support for the kinds of operations needed to do a network protocol
implementation, so all required that support to be added.  Thus the
choice was made on predicted ubiquity, on which point DOS appeared
strongest.  The primary run-time facility added was a tasking and
timer management package that permits as many parallel tasks as
necessary to operate within a single address space.  For simplicity,
the tasking package runs each task to completion (either "block,"
awaiting a wakeup, or "yield," allowing other tasks to run) using a
round robin scheduler.

The combination of the development environment and programming
environment required one bootstrapping program to be
constructed_an RS-232 port file-copying program for the PC that
could take a file being pushed at it by UNIX and store it in the
PC file system.  The development environment on UNIX produced
loaded, ready-to-run command files; the bootstrap provided a way
of getting those command files into the PC for execution.  The
first real network program developed was one that implemented a
standard file transfer protocol, and as soon as that program was
operational the bootstrap was no longer needed[11].


@subheading[@i(PCIP over asynchronous serial ports)]

When the IBM PC was first announced there was no local area
network interface available for it, but several manufacturers
seemed intent on supplying them within a year or so.  Rather than
building a piece of special hardware that would be soon
discarded, we opted to use the PC's asynchronous serial port as a
temporary substitute.  To connect the asynchronous serial port to an existing
local area network, a token ring, we configured a Digital Equipment 
Corporation LSI-11
to contain both a token ring network interface and a small number
of asynchronous serial ports.  This LSI-11 came to be known as the
PC-Gateway. The PC-Gateway was programmed to treat the set of
asynchronous ports as a local network, and to act as a packet-forwarding
gateway between that local network and the token ring. When the
PC was ready to send a packet of data, it merely sent the packet
as a sequence of 8-bit bytes over the asynchronous port.  This approach
made the combination of the asynchronous port driver, the port,
the serial line, and the PC-Gateway a unit that could later be
replaced by a local network driver and a network hardware interface.  

There were two useful results from the PC-Gateway.  First, it
permitted substantial progress to be made in implementing and
polishing the network code for the PC.  When local network hardware
did become available for the PC, the only software effort was to
replace the asynchronous port driver with a network interface
driver.  Second, it turned out to be surprising useful, and was not
discarded when network interfaces arrived.  Instead, dial-up modems
were attached to unused asynchronous ports of the PC-Gateway to
permit people who had PC's at home to connect to the network using
telephone lines.  

There was mixed success with the PC gateway using asynchronous
lines.    On a
9.6 kbit/s line, there was no major problem in
performing either file transfers or using remote login, even with
character-at-a-time remote echo.  On a 1.2 kbit/s dial-up telephone
line, file transfers were reasonably successful.  (Sometimes the
transmission time involved in sending a long packet over a 1.2
kbit/s line would cause the remote system to time out and
abort the file transfer.  Eventually, most other implementations
learned to be patient enough to tolerate telephone-line transfers.)
For remote login to time-sharing systems that work in character-at-a-time
remote echo mode, each time the user typed a character, a packet in
excess of 25 bytes was transmitted over the low-speed line.  It was
thus very easy for a fast typist to saturate the connection to the
PC-Gateway, and echoing fell far behind the typist.

The problem of low-speed line saturation in one-character-per packet 
node could have been overcome by employing data compression.
Many of the bytes in each packet of a TCP connection are likely to
be identical to those of the previous packet.  An algorithm was
discussed, but never implemented, to take advantage of this
observation and transmit only the differences between the current
packet and the previously transmitted packet.  Compression was never
implemented because the arrival of high-speed local area network
interfaces reduced demand for remote login over 1.2 kbit/s lines.
However, if the effort had been undertaken to increase performance
on 1.2 kbit/s telephone lines we believe that it would have been
technically feasible.

One of the lessons learned from implementing the PC-Gateway was that
a dial-up packet forwarder opens a new world of opportunities when
compared with a dial-up terminal concentrator.  When only terminal
concentrator ports are available, files are usually transferred
between mainframe computers and PC's using one of several embedded
protocols such as KERMIT, developed at Columbia University[12].  One
of the problems with such embedded protocols is that there are
several different ones, so the PC and the mainframe may not have one
in common.  When that is the case, some staging process must be
employed whereby the user first moves the file from its original
site to one that implements the same protocol available on the PC,
and then transfers the file over the asynchronous connection.  In
contrast, the PC-gateway allowed implementation of a standard
network file transfer protocol (TFTP) on the PC itself, which made
file transfer immediately usable with all the other network
participants.  But even more important, all other network services,
such as node name resolution, time-of-day service, etc., become
available to the PC, and the PC can respond to requests initiated by
other systems, inquiring about presence, identity, existence of
files, etc.  

There is no reason why both terminal concentration and packet
forwarding cannot be provided on a single port.  Our advice
to future implementors of network terminal concentrators is to
provide an escape mechanism so that a PC can directly send and
receive network packets carrying any protocol the PC finds useful.
This escape can give the PC the opportunity to make fuller use of
the network possibilities.


@subheading[@i(Tailoring the implementation to the environment)]

There are a few characteristics of desktop computer operation
that are quite different from mainframe operation, and these
characteristics affect the way in which the network is integrated
with the system.  The most important of these is that the desktop
computer is often_perhaps usually_not "on the network".  When
not in use, a desktop computer is often powered off, perhaps to
reduce the noise and heat in the office in which it is located.
Even when powered on, one cannot expect the network software to
be always in operation.  Some desktop computer application
software packages operate by taking over the entire machine,
sometimes to prevent pirate copies of the program from being made
and sometimes simply because they require every scrap of memory
in order to perform usably.  

Thus the software in the personal computer cannot expect to
maintain a continuous record of the state of the network; instead
it must be organized so that it can quickly discover whatever
state it needs when it is called into operation. To cope with the
"normally-off-the-network" paradigm of operation, the various
PCIP programs do not attempt to retain any discovered network
information at all for the use of the next program that may use
the network.  Because one has no idea what other application
program may run between two network programs, the up-to-dateness and integrity
of
any state variable stored in primary memory is questionable, and
it is safer to rediscover the network information rather than to
depend on a stored value.  Thus if one initiates a file transfer
with another site, such facts as the round trip time to that
site, its network address, and the Ethernet address of an
intervening gateway are all discovered, used during the transfer,
and then discarded.  If the next command to be typed is another
file transfer to the same site the listed facts are all
rediscovered again.  This approach, while perhaps seeming
wasteful, actually costs quite little and has a very large payoff
in improved reliability of the network software.  In contrast
with our experience with other network implementations that
maintain network state continuously, in PCIP one almost never
encounters the situation in which anomolous behavior (caused by
recorded state getting out of step with real state) leads to a
need to reboot the system or explicitly reinitialize the network
code to get it working again.  (However, all is not roses.
Because there is no protection between supervisor and user in the
PC, bugs in either the network code or in a user application can
cause a system crash, requiring a reboot to recover. During
application programming such crashes are fairly common, providing
another reason why one cannot depend on maintaining network state
records.)

Another aspect of this expectation of frequent detachment from
the network is that the PC network implementation makes no
attempt at all to maintain a table of (user oriented,
character-string) names of other nodes and their network
addresses.  Keeping such a table in step with the
name tables in nodes that are always online (and which depend on
that usual onlineness in informing one another of changes) would
be a major challenge.  So instead the PC depends on the
availability of node name translation services provided by many
of the always-online network systems.

A related problem is that the network software must be able to
discover quickly environment parameters (such as network
addresses of nearby gateways and other servers or the number of
the network to which it is attached) rather than expecting that
the user types them in each time when a network program is used.
To provide such environment parameters, the PCIP implementation
uses a trick:@:  A piece of code is installed as a DOS device
driver, but this piece of code does not actually control a real
device. Instead, calls to read from this device cause the code to
send back a stream of environment information, in a standard
format. Every PCIP program knows how to interpret this stream,
and thereby has a quick way of discovering the facts about the
environment it needs. A customization program allows the
application user to set up this pseudo device driver.  Using a
pseudo device driver provides this information much more rapidly
than reading a file, and it is far easier to change as compared
with the alternative of assembling the information in as
constants of the programs.  (The DOS 2.0 environment variable
feature in principle provides an equally good way to do this job,
but unfortunately the space allocated by DOS for environment
variables was insufficient.)


@subheading[@i(Tailoring the implementation to the application)]

Perhaps the most interesting strategy used by the PCIP software
to obtain good performance in a small machine is the tailoring of
the network implementation to match the application that will use
it.  There are several examples of this tailoring that illustrate
the idea.  The primary examples are in the implementation of the end-to-end
transport protocol, TCP.  This implementation was designed to
work optimally with only one application protocol, the "User Telnet"
remote login protocol[13].  The idea of tailoring is that the
knowledge that the only application is remote login should guide
implementation decisions in the transport protocol.

Some of the decisions simply relate to how much standard TCP
function to implement.  The PC TCP can only originate connections;
no provision was made for other nodes to make connections to the PC,
because that feature is not needed by User Telnet.  Similarly, PC
TCP can maintain only one connection at a time, because User Telnet
requires only one connection.  A substantial amount of table
management code is thus unneeded.  

TCP includes a sliding window for flow control.  The PC TCP simply
ignores the window values sent to it by the remote login server,
because when it is used for remote login, the only data sent to the
login service is that typed by a person at a keyboard, and that data
rate is almost certain to be lower than the rate that the service
can accept data.  (If once in a great while the service falls behind
so far that the typist gets ahead of the offered window, no loss of
data occurs_the login service simply stops acknowledging the data,
and the PC TCP has for error control a timeout-resend strategy that
retries until the service catches up.)  The simplicity that
results from ignoring windows makes the code both smaller and
faster.

To minimize copying of data and space occupied by packet buffers,
the TCP send function is tailored in another way with the knowledge
that data comes from a typist.  Only one packet buffer is provided
for output data, and this packet buffer is set up with certain
fields, such as the source and destination addresses, precalculated,
since they never change.  When the user types a character, Telnet
calls the TCP send function with the character as the argument, and
send merely drops the character into the precalculated packet
buffer, adjusts any remaining fields, and calls the local network
driver with a pointer to the packet buffer.  Because the output is
to a high-speed local area network the network driver will complete
the dispatch of the packet before returning to TCP.  It is thus safe
for TCP to assume that it now has control of, and can change the
contents of the output buffer.  If the user types another character
before the login service acknowledges the earlier one, Telnet calls
TCP as usual, but TCP's send function simply slips this new
character into the same packet buffer following the earlier
character, and dispatches this packet containing, now, two
characters.  If the earlier packet is lost in transit (and thus no
acknowledgement of it ever comes back from the service) this new
two-character packet will act as the resend.  

This technique of adding characters to the output packet buffer
as they are typed has a limit, of course; if the typist fills the
packet buffer (500 characters, which allows at least 30 seconds of
frantic typing) before the remote service acknowledges the first
character typed the typist must be asked to stop; the TCP
send function simply returns an error condition to Telnet
when the single packet buffer is full, and Telnet notifies the
typist to desist.  This situation occurs very rarely in practice.
Normally, the remote service receives a packet and sends back an
acknowledgement of the oldest typed characters.  The PC TCP, upon
seeing that acknowledgement, adjusts the characters in the output
packet buffer by sliding them back so that the first
unacknowledged character is first in the output buffer.  Usually,
the acknowledgement is for all the outstanding data, and no copying
occurs at all.

This whole collection of techniques of output buffer management
reduces path length, buffer space, and packet copying, but all of
them depend on the knowledge that the send function will be used in
a particular way.  If one tried to use this tailored TCP to send a
file consisting of many large blocks of data, its performance would
be very poor.  It might overrun the file server, because it ignores
that server's flow control windows, leading to many unnecessary
retransmissions.  It could accept only one packet of data to be sent
at a time, because it has only one packet buffer, and it cannot
reuse that buffer until acknowledgement comes from the other end
that the receiver has accepted the data.  There would be much time
spent copying the large blocks of data from one end of the packet
buffer to the other as acknowledgements came back.  And, finally,
the implementor of the file transfer program would find that the TCP
send interface accepts only one byte on each call, so sending a
block of data would require an inefficient repeated call loop.

For data flowing to the PC, a completely
different set of considerations holds.  In this direction, the PC
TCP implements flow control windows because it can be overrun by
an active, high-powered time-sharing system.  However, there are
still opportunities for tailoring the implementation.

The most serious problem with incoming data is not just that it
arrives too fast, but that in the ARPANET some servers sometimes transmit a
separate packet for each byte of data they send.  Since the TCP
window controls the number of outstanding bytes rather than the
number of outstanding packets, the window does not prevent a
flood of packets if the data is being sent in this very
inefficient way.  The problem shows up if the PC cannot keep up
with the rate of arriving packets; fairly soon a packet gets
missed and thus not acknowledged.  The sending site eventually
times out and resends, starting with the missed packet.  The
time-out shows up as a noticeable pause in the flow of data to
the user's screen. The PC TCP required a special buffering scheme
to deal with a large number of arriving small packets.  Since
running a complete terminal emulator is actually more
time-consuming than processing incoming packets, the PC emulator
is permitted to handle only a few bytes at a time before
returning to the TCP level to see if more packets have come in.
This strategy permits as much processor time as possible to be
allocated to packet handling.  (As described in the next section,
the PC terminal emulator is invoked by an "upcall" from TCP, so
limiting it is actually quite easy_TCP simply calls the emulator with an
argument consisting of the number of characters it thinks the emulator
can handle.)

This implicit flow control mechanism between the emulator and TCP
replaces the more general explicit flow control system that would
have to be implemented if TCP had been designed to cope with
arbitrary client protocols including, for example, file transfer.

At least one more, minor opportunity for tailoring exists in this
direction.  Since the customer application is remote login, it is
a good bet that the largest quantity of data that will ever
arrive in a single burst over a connection from the remote login
service is one screen full, a predictably finite amount of data.  Thus
TCP input buffers and window size need be provided just for this
amount and no more.  If an ambitious server process aspires to send more
than one screenful of data in a burst, the window mechanism acts
as a throttle. In the most common case everything proceeds
smoothly and optimally and the window is not a limit. In an unusual
case performance may suffer but no data is lost.

@subheading[@i(Upcalls)]

The combination of the tasking package and the C language features
of static storage and
procedure variables are used extensively throughout the
network implementation in a style of programming known locally as
"upcall/downcall".  (In some of the more recently developed
window management systems, and the Pilot file system, the same
style of programming is sometimes known as "callback"[14].) In
this style of programming, some tasks are waiting for events at
"high" levels, for example in application programs.  When an
event occurs the application program operates by calling "down" to lower
level network implementation programs. This is the usual style of
programming of operating systems. However, other tasks wait for
signals at low levels, inside network driver programs, for
example.  When a signal starts them, perhaps because a packet has
arrived, they operate by handling the packet operations at their
level, and then calling "up" to higher levels of network protocol
and eventually "up" to the application.

The denotation "up" and "down" can be misleading, because a call
"up" can lead to a call "down" as part of its implementation.
For example, the arrival of a packet may result in an upcall to
dispose of the packet, and during that upcall one or more
downcalls to send acknowledgements, flow control messages, or an
application-level response.

Figure two illustrates in a simplified example the use of this
organization in the implementation of the Telnet remote login
protocol.  In that figure, in the left column, the top level
application program creates a parallel task (in the right column)
to handle arriving packets using upcalls.  The top level program
proceeds to
initialize static procedure variables in anticipation of upcalls
at the several network protocol levels. The main task then concentrates on
sending typed characters to the remote server.  Meanwhile, in the
right column, all packets coming
from the remote server are noticed at a low level by the network
driver, which calls upward, using the previously initialized
tables of procedure variables, eventually reaching the screen
display procedure of the terminal emulator. Although the actual
programs are complicated by error conditions, the basic flow of
control illustrated in this figure is complete and, relative to
other implementations we have seen, quite simple[15].

The upcall/downcall programming style, together with a tasking
package that allows several tasks to operate within a single
address space is the primary set of tools used to gain leverage
against the third performance-draining effect mentioned
earlier_that the current generation of operating systems doesn't
provide agile, lightweight support for the parallel operations
that are required to run a network implementation.  An upcall
also provides a natural way for a network implementation layer to
receive data from below and pass it up higher without having to
copy it just to insure that it doesn't get deallocated by the
lower level. Thus some leverage is also obtained against the
first performance-draining effect_too much buffering at protocol
layer boundaries.  Another example of the simplifying effect of
upcalls was mentioned in the previous section, which described
their use to provide implicit flow control between TCP and
Telnet.


@subheading[@i(Getting around DOS)]

The implementation of the Remote Virtual Disk protocol for the PC
was an interesting exercise.  The DOS operating system has a provision for
user-installed disk drivers, so there was an obvious place to
integrate the RVD interface. However, the RVD driver is rather
different from most drivers; since it implements a network
protocol inside, it contains all the support tools we implemented
for the other protocol packages, including our tasking scheduler
and our timer manager. Since PC DOS is not designed to be
re-entrant, the driver cannot call on DOS for any services, so it
must re-create any DOS functions it needs. The resulting
exercise causes the implementer of RVD to stand on his head to
get some things done, and produces a device driver for DOS with
considerably more sophisticated operating system features than
DOS itself.

There was one limitation imposed by DOS that we have not yet tried
to circumvent.  Since the network package for RVD was hidden inside
what DOS thought was a disk driver, that network package was not
available for use by other applications.  Since only one driver can
control the physical network interface at a time, no other network
application could use RVD service.  This limitation meant that, for
example, one could not use the file transfer protocol to move a file
to or from an RVD disk.  Such transfers currently require a
two-stage operation, moving the file via a disk physically at the
local PC and copying it from there to or from the RVD disk.
Removing this restriction could be done by adding an ad hoc
communication path directly from the application to the RVD package,
a path that a more flexible operating system might have provided.

Our experience with RVD clearly showed that the PC had enough
power to support this kind of protocol, and that such a feature
could be very helpful. Even with its limitations, RVD is in wide
use in our laboratory. However, the limitations of DOS 2.0
increased the difficulty of this project, and reduced somewhat
the value of the final service. Fortunately, this sort of limitation
seems to be going away as the creators of operating systems
expand their vision of the capabilities of a PC class machine.  For
example, in a UNIX-based operating system such as XENIX or PCIX, the
restriction on RVD use by other network applications would not be
necessary.@foot[UNIX is a trademark of @i[AT&T].@:  XENIX is a
trademark of MicroSoft Corporation.@:  PCIX is a trademark of Interactive
Systems Corporation.]

@subheading[@i(On size and scale)]

While the CPU of the PC can access 1 mbyte of memory, all of the
PCIP packages can operate in a 128 kbyte configuration.  (This small
size was fortunate, because it happened that the available C
compiler used a "small memory model", limiting one loaded program to
64 kbytes of code and 64 kbytes of data.)  The individual packages
are relatively small; combined they easily meet this constraint.
Consider the decomposition of the code space of the file transfer
package, TFTP:

@verbatim[
		TFTP user and server	 	 7650 bytes
		UDP				 	 2812
		IP				 	 4614
		ethernet driver			 5744
		network common library	 1198
		timer and tasking package	 2314
		C run time support		 7569
		total					31901 bytes
]

The largest, most complex package is the Telnet command.  It uses
TCP and UDP (for name resolution) and contains a TFTP server.  This
command consists of the modules above, plus:

@verbatim[
		Telnet				 7392 bytes
		TCP					 5946
		total				 	45339 bytes
]

The size of Telnet includes the size of the screen manager as well as
the protocol implementation. Notice that 
Telnet and TCP are individually the most complex modules implemented.

An interesting observation about the scale of a network package
for a personal computer comes from examination of a typical
package, the one that does file transfer. The implementation of
TFTP user and server is done in three C language programs and one
C language "include" file, of common data structure definitions.
That set of programs implements just the box labeled "Trivial
File Transfer" in figure one.  These C programs together total
about 1020 lines of code (excluding comments,) of which about 450
lines implement the main stream of the protocol, 505 lines handle
error conditions, and 65 were provided as aids for debugging.
The 50% figure for handling error conditions in our experience is
typical for network code that is intended to be reasonably
robust.  A similar fraction was noted by Clark in his
implementation of the TFTP protocol in PL/I for the Multics
system.  Probably much more than half the intellectual effort of
design and debugging went into that part of the code, since it
tends to involve untangling of things that didn't go right,
rather than straightforwardly moving on to the next step of the
protocol.  The 1000-line figure for TFTP as a whole indicates
that the overall size of network packages is well within the
capability of a desktop computer.

The lesson to be drawn from all these numbers is that with proper
system support, good organization, and attention to the client
being supported, a network protocol package need not be a large
module.  

When we examine the performance of the programs, we find that the
bottlenecks are not in the protocol implementations themselves,
but in resources the applications utilize. The code wasn't
written with great concern for performance because it was
expected that the bottlenecks would be found outside of the
protocol implementations. The low cost of context switching and
few data copies allow fast transfer of data through the protocol
layers.  For instance, TFTP writing to a floppy disk frequently
achieves an end-to-end useful data rate of 13 kbit/s,
about the writing speed of the floppy disk. With a Winchester
disk, TFTP can transfer data over the network at a rate of about
55 kbit/s, again about the writing speed (for small
blocks) of the disk drive itself. When tests are done in which
TFTP discards data as soon as it is received, network transfers
run as fast as 110 kbit/s.  Thus the bottlenecks in file
transfer seem to be the disk systems, and improvements that we
might make to the protocol implementation would not substantially
alter the transfer rates achieved. 

A second example is Telnet.  Monitoring shows that it spends 50% of
its processing time in the Heath H19 terminal emulator.  Another 30%
is spent idle, waiting for something to do.  For a real performance
breakthrough in Telnet, the terminal emulator would have to be
improved, rather than the IP or TCP implementation.  While some
speed could be gained by small changes to the TCP implementation,
the terminal emulator is the real bottleneck.

@subheading[@i(Conclusions)]

In the beginning of this paper, we identified three problems that
can beset the implementor of network protocols:

1) The architected layer structure of the protocol can prove
unsuitable as a structuring technique for the implementation.

2) An implementation that attempts to serve several clients may
either be very complex or provide poor performance to some
or all clients.

3) The operating system chosen may provide poor support for the
needed program structure.

The impact of these problems is that a full implementation of a
protocol suite tends to be sufficiently bulky and slow that a
realization inside a personal computer seems impractical. We have
shown to our satisfaction that this need not be so. We produced a
running and useful implementation that is consistent with the
speed and size of an IBM PC, by identifying and using techniques
that directly combat the problems identified above.

To avoid the excessive interfacing code that results from
classical layering, we used an interface technique, upcalls, that
put the asynchronous boundaries in the implementation only where
they are needed. Subroutine calls, always more efficient than
process switches, are used wherever possible. 

To combat the high cost of generality, we abandoned it wherever
abandonment really seemed to pay off. Instead of producing a
virtual circuit protocol that attempted good performance for all
clients, we tailored the implementation to remote login. Compared
to other implementations of more generality that we have
examined, this code was substantially smaller and simpler to
produce. 

To solve the problem of an unsuitable operating system, we
provided fragments of our own, as part of the network code. This kind of
replacement is not always possible, but in this case it both
proved the benefit of proper system support for protocols, and demonstrated the
flexibility of the programming environment of the PC.

We feel very strongly that it is a good approach to produce
implementations that are tailored to specific clients, as opposed to
more general implementations.  The drawback of this technique is
that if several clients are to be supported, it is necessary to
produce several different implementations of the support program,
which produces unwelcome increases in maintenance costs.  In other
projects we have done this sort of multiple implementation, and do
not feel that the development effort is substantial.  Many parts of
the implementation, such as the protocol state machine, can be
reused.  To help control the maintenance cost, we are now exploring
different modularity techniques in which the protocol state machine
for a layer is implemented as a general module, while the data flow
paths are supplied by each client, using a standard interface.


@subheading[@i(Acknowledgements )]

The implementation of the programs described here was supported
by the IBM Corporation in a general grant for computer science
research at M.I.T.  Many of the ideas were borrowed, and some of
the code was ported, from projects supported at M.I.T. by the
Defense Advanced Research Projects Agency.  The first
implementation of TFTP was accomplished by Karl Wright, and the
initial implementation of Telnet was done by Louis Konopelski.
David Bridgham wrote the terminal emulator used in Telnet.  Chris
Terman kindly supplied the C-language development system. Several
early users, including especially Fernando Corbato and Robert
Iannucci, acted as uncomplaining guinea pigs while the
network code was being debugged.  Finally, one of the anonymous referees of
this paper provided extensive comments of unusual depth and insight.


@subheading[@i<References>]

@begin[enumerate] 
---, @i[Internet Protocol Transition Workbook], SRI International,
Network Information Center, Menlo Park, CA, March, 1982.

---, @i[Internet Protocol Implementation Guide],
SRI International, Network Information Center, Menlo Park,
CA, August, 1982.

Cooper, G@.  H@., "An Argument for Soft Layering of Protocols,"
Massachusetts Institute of Technology, Department of Electrical
Engineering and Computer Science S.M@. thesis, May, 1983.@:
Available as M.I.T@.  Laboratory for Computer Science Technical
Report, TR-300, May, 1983.

---, "Heathkit Manual for the Video Terminal, Model H19," Heath
Company, Benton Harbor, Michigan, 1979.

Saltzer, J.  H@., "PC/IP User's Guide," M.I.T@.  Laboratory for
Computer Science, December, 1984.  

Greenwald, M., "Remote Virtual Disk Protocol Specifications,"
Technical Memorandum, Massachusetts Institute of Technology
Laboratory for Computer Science, Cambridge, MA.@: In Preparation.

Seawright, L@. H@. and MacKinnon, R@. A@., "VM/370--A Study of
Multiplicity and Usefulness," @i[IBM Systems Journal 19], 1, 1979,
pp@. 4-17.

Laselle, J., et al., "EtherShare User's Guide," 3COM Corp., Mountain
View, CA, July, 1983.

Goldstein, B.C., et al., "Directions in Cooperative Processing
Between Workstations and Hosts," @i[IBM Systems Journal 23], 3
(1984) pp@. 236-244.

Microsoft, Inc., @i[Disk Operating System], Version 2.0, IBM
Corporation, Boca Raton, Fla., January, 1983.

Wright, Karl D@., "A File Transfer Program for a Personal Computer,"
Massachusetts Institute of Technology, Department of Electrical Engineering and
Computer Science S.B@. thesis, April, 1982.@:  Available as M.I.T@.
Laboratory for Computer Science Technical Memorandum, TM-217, April,
1982.

DaCruz, F@. and Catchings, B@., "KERMIT:@:  A File-Transfer Protocol
for Universities," @i[BYTE 9], 6 (June, 1984) pp. 255-278.

Konopelski, Louis J@., "Implementing Internet Remote Login on a
Personal Computer," Massachusetts Institute of Technology,
Department of Electrical Engineering and Computer Science S.B@. thesis,
December, 1982.@:  Available as M.I.T@. Laboratory for Computer Science
Technical Memorandum, TM-233, December, 1982.

Reid, L@. R@., and Karlton, P@. L@., "A File System Supporting
Cooperation Between Programs," @i[ACM Operating Systems Review 17], 5,
October, 1983.  pp@. 20-29.

Romkey, J@. L@., "IBM PC Network Programmer's Manual," M.I.T@.
Laboratory for Computer Science, December, 1984.


@end[enumerate]

@pagefooting[]
@style[justification no]
@style[leftmargin 15 chars]
@style[rightmargin 15 chars]
@blankspace[5 lines]

Figure captions for "The Desktop Computer as a Network Participant,"
by Saltzer, J.H., Clark, D.D., Romkey, R.L., and Gramlich, W.L.
@blankspace[4 lines]

@style[indent 0 chars]


Figure 1:@:  PCIP Protocol Hierachy

@blankspace[10 lines]

Figure 2:@:  PC/Telnet call organization.  The downcall flow, on the 
left, initializes static procedure variables for later upcalls.  The 
identifier "tcp_type" is a constant set to the protocol-specified
value that distinguishes TCP packets from other kinds.  When a
packet arrives from the network it activates the upcall flow on the
right.

@pagefooting[]
@style[justification no]
@style[rightmargin 15 chars]
@style[leftmargin 15 chars]
@blankspace[4 lines]
@flushleft[
Jerome H. Saltzer:  Biographical Note]
@blankspace[3 lines]


     Jerome H.  Saltzer was born in Nampa, Idaho, on October 9,
1939.@:  He received the degrees of S.B@. in 1961, S.M@. in 1963,
and Sc.D@. in 1966, from the Massachusetts Institute of Technology,
all in Electrical Engineering.

     Since 1966, he has been a faculty member of the Department of
Electrical Engineering and Computer Science, M.I.T., where he has
been active in the formulation of the undergraduate curriculum in
Computer Science, including development of the core subject on the
engineering of computer systems.  At the M.I.T@. Laboratory for
Computer Science he participated in the refinement of the
Compatible Time-Sharing System (CTSS) and then was involved in
all aspects of the design and implementation of the Multiplexed
Information and Computing Service (Multics).  More recently,
his research activities have involved the connection of computers
with communication systems, including the design of a token-passing ring
local network, exploration of the problems of high-performance
protocol implementations, and interenterprise connection.  He is
Technical Director of M.I.T@. Project Athena, using networked
desktop computers to improve undergraduate science and engineering
education.

     Professor Saltzer is a Fellow of IEEE, and a member of
ACM, AAAS, Sigma Xi, Eta Kappa Nu, and Tau Beta Pi.
@begin[format]


Title:           Professor of Computer Science and Engineering
                 M.I.T@. Department of Electrical Engineering
                   and Computer Science

Address:         Room NE43-513
                 M.I.T@. Laboratory for Computer Science
                 545 Technology Square
                 Cambridge, Mass.  02139

Telephone:       (617) 253-6016

Last updated:    February, 1985
@end[format]

@pagefooting[]
@style[justification no]
@style[rightmargin 15 chars]
@style[leftmargin 15 chars]
@blankspace[4 lines]
@flushleft[
David D. Clark:  Biographical Note]
@blankspace[3 lines]

Dr. David D@. Clark, Senior Research Scientist, at the M.I.T@.
Laboratory for Computer Science received the B.S.E.E@. degree with
distinction from Swarthmore College, Swarthmore, PA in 1966, the
S.M@. and E.E@. degrees, in 1968, and the Ph.D@. degree in 1973,
from the Massachusetts Institute of Technology, Cambridge, MA.

Since 1967 he has been associated with the Laboratory for Computer
Science at M.I.T@., where he is currently a Senior Research
Scientist in the Distributed Computer Systems group.  His first
activities at M.I.T@. were a variety of projects related to the
development of the Multics operating system, including design of an
I/O system and a programming language for system implementation.  He
has been a participant in the development of the ARPANET during the
last ten years, and is currently chief protocol architect for the
DARPA Internet project, which is developing standards for Department
of Defense networking.  His current research effort is the
development of efficient network protocols and computer operating
systems suitable for use with high speed local area networks.

Dr. Clark is a member of the IEEE, ACM, and Sigma Xi.


@pagefooting[]
@style[justification no]
@style[rightmargin 15 chars]
@style[leftmargin 15 chars]
@blankspace[4 lines]
@flushleft[
John Romkey:  Biographical Note]
@blankspace[3 lines]

Mr. John Romkey is currently a senior in the Electrical Engineering 
and Computer Science Department at the Massachusetts Institute
of Technology and expects to receive his B.S@. in June, 1985.  He
began working on a TCP/IP implementation with Professor Jerome
Saltzer in January, 1982, and is currently working on a distributed
mail system with Dr. David Clark.

@pagefooting[]
@style[justification no]
@style[rightmargin 15 chars]
@style[leftmargin 15 chars]
@blankspace[4 lines]
@flushleft[
Wayne C. Gramlich:  Biographical Note]
@blankspace[3 lines]

Mr. Wayne C@. Gramlich received his B.S@. degree in Electrical
Engineering and Mathematics, with honors, in 1979 from
Carnegie-Mellon University.  He also received his M.S@. degree from
Carnegie-Mellon in Electrical Engineering with the Computer
Engineering Option.

In 1979 Mr. Gramlich entered the graduate school at the
Massachusetts Institute of Technology working towards a Ph.D@. in
ELectrical Engineering and Computer Science.  His work has been in
the areas of computer networks, personal computers, and distributed
systems.  Mr. Gramlich's unfinished Ph.D@. dissertation is entitled
"Checkpoint Debugging."

Mr. Gramlich currently works for Sun Microsystems in the area of
programming environments.

His hobbies include programming his personal computer, building
electronic projects, reading science fiction, woodworking,
metalworking and amateur rocketry.

@pagefooting[]
@style[leftmargin 15 chars]
@style[rightmargin 15 chars]
@blankspace[8 lines]
@center[
Authors and Affiliations]

@begin[format]

@flushleft[
Professor Jerome H. Saltzer
	Massachusetts Institute of Technology
	Laboratory for Computer Science
	545 Technology Square, NE43-513
	Cambridge, MA  02139

Dr. David D. Clark
	Massachusetts Institute of Technology
	Laboratory for Computer Science
	545 Technology Square, NE43-540
	Cambridge, MA  02139

Mr. John L. Romkey
	Massachusetts Institute of Technology
	Laboratory for Computer Science
	545 Technology Square, NE43-503	
	Cambridge, MA  02139

Mr. Wayne C. Gramlich
	Address when the work reported in this paper was reported:
	Massachusetts Institute of Technology
	Laboratory for Computer Science
	545 Technology Square, NE43-511
	Cambridge, MA  02139	

	Current address:	
	Sun Microsystems, Inc.
	2550 Garcia Avenue
	Mountain View, CA  94043]


@end[format]