How Athena Works
----------------

This document gives an overview of how Athena works from an administrative perspective. The focus is on defining important terms and principles of operation, not on documenting internals or specifics. The intended audience is new IS hires or people who are curious about how Athena works.

Contents:

  1. Kerberos
  2. AFS
  3. Hesiod
  4. Lockers
  5. Login sequence
  6. X logins
  7. Workstation self-maintenance
  8. Install
  9. Update
  10. mkserv
  11. Moira
  12. Mail infrastructure
  13. Larvnet
  14. Athinfo
  15. Software License Wrapper

1. Kerberos

Many Athena services use a security system called Kerberos. Kerberos can be thought of as a service for negotiating shared secrets between unfamiliar parties. A central server called a KDC (Key Distribution Center) has a pre-shared secret with each user and with each service. The secrets shared with users are conventionally called "passwords"; the secrets shared with services are conventionally called "keytabs" (or "srvtabs", in older jargon). Together, users and services are called "principals".

When one principal requests to negotiate a shared key with another principal, the KDC makes up a random new key (called a "session key"), encrypts it once in each principal's key (along with a bunch of other information), and sends both pieces of ciphertext back to the first principal, which will in turn send the appropriate part to the second principal when it is ready to talk. Since both principals can get at the session key by decrypting their bit of ciphertext, they now have a shared secret which they can use to communicate securely. Kerberos clients record these bits of information in "credential caches" (or "ticket files" in older jargon; neither term is particularly correct since the file is not strictly a cache and stores more than just tickets).

There are two versions of the Kerberos protocol in use on Athena, 4 and 5. The Kerberos 5 protocol supports more features and different types of cryptographic algorithms, but is also a great deal more complicated. See http://web.mit.edu/kerberos/www for more complete and precise information about Kerberos.

Athena services which use Kerberos include AFS, discuss, zephyr, olc, moira, and remote login and FTP (when both parties support it).

2. AFS

Athena workstations use a filesystem called AFS. Running AFS allows workstations to access files under the /afs hierarchy. Of particular interest are the MIT parts of this hierarchy: /afs/athena.mit.edu, /afs/dev.mit.edu, /afs/sipb.mit.edu, /afs/net.mit.edu, /afs/ops.mit.edu, and /afs/zone.mit.edu.

Unlike NFS, AFS includes two layers of indirection which shield a client from having to know what hostname a file resides on in order to access it.

The first layer of indirection is "cells", such as athena.mit.edu. Each workstation has a directory of cells in /usr/vice/etc/CellServDB, which it can use to look up the database servers for a cell name. If a cell's database servers change, each client's CellServDB has to be updated, but the canonical paths to files in that cell do not change. Athena workstations update their CellServDB files periodically (at boot and reactivate time) from /afs/athena.mit.edu/service/CellServDB.
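As a rough illustration (not the actual AFS client code), the following Python sketch shows how a CellServDB-style cell directory maps cell names to database server addresses. The function name and the comment handling are made up for this example, and the real file format and client behavior have more detail.

    #!/usr/bin/env python3
    """Toy parser for a CellServDB-style cell directory (illustrative only)."""

    def parse_cellservdb(path="/usr/vice/etc/CellServDB"):
        """Return a dict mapping cell name -> list of database server addresses.

        Assumes the conventional layout: a line beginning with '>' names a
        cell, and the lines that follow list that cell's database servers.
        """
        cells = {}
        current = None
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                if line.startswith(">"):
                    # e.g. ">athena.mit.edu    #MIT/Athena cell"
                    current = line[1:].split()[0]
                    cells[current] = []
                elif current is not None:
                    # e.g. "18.3.48.11         #db server for the cell"
                    cells[current].append(line.split()[0])
        return cells

    if __name__ == "__main__":
        for cell, servers in parse_cellservdb().items():
            print(cell, servers)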
The second layer of indirection is the volume location database, or VLDB. Each AFS cell's contents are divided into named volumes of files which are stored together; volumes refer to other volumes using mountpoints within their directory structure. When a client wishes to access a file in a volume, it uses the VLDB servers to find out which file server the volume lives on. Volumes can move around from one file server to another, and clients will track them without the user noticing anything other than a slight slowdown.

AFS has several advantages over traditional filesystems:

* Volumes can be moved around between servers without causing an outage.
* Volumes can be replicated so that they are accessible from several servers. (Only read-only copies of a volume can be replicated; read/write replication is a difficult problem.)
* It is more secure than traditional NFS. (Secure variants of NFS are not widely implemented outside of Solaris.)
* AFS clients cache data, reducing load on the servers and improving access speed in some cases.
* Permissions can be managed in a (not strictly) more flexible manner than in other filesystems.

AFS has several unusual properties which sometimes cause software to behave poorly in relation to it:

* AFS uses a totally different permissions system from most other Unix filesystems; instead of assigning meanings to a file's status bits for the group owner and the world, AFS stores an access control list in each directory and applies that list to all files in the directory. As a result, programs that copy files and directories will usually not automatically copy the permissions along with them, and programs that use file status bits to determine in advance whether they have permission to perform an operation will often get the wrong answer.
* It is not possible to make a hard link between files in two different AFS directories, even if they are in the same volume, so programs which try to do so will fail.
* It is possible to lose permissions on an AFS file because of changing ACLs or expired or destroyed tokens. This is not possible for a local filesystem, and some programs don't behave gracefully when it happens in AFS.
* It is possible for close() to fail in AFS for a file which was open for writing, either because of reaching quota or because of lost permissions. This is also not possible for a local filesystem.
* AFS is a lot slower than local filesystem access, so software which performs acceptably on local disk may not perform acceptably when run out of AFS. Some software may even perform unacceptably simply because a user's home directory is in AFS, even though the software itself comes from local disk.

AFS uses Kerberos 4 to authenticate. Since it is not reasonable for AFS kernel code to read Kerberos credential caches directly, AFS-specific credentials are stored into the kernel as "tokens". The kernel looks up tokens using a "process authentication group" or PAG, which is stored in the user's group list. If there is no PAG in the user's group list, the kernel falls back to looking up tokens by uid, which would mean that two separate logins would use the same tokens and that a user who does an "su" would no longer use the same tokens. Athena workstations do their best to ensure that each login gets a fresh PAG.

See http://www.openafs.org/ for more information about AFS.

3. Hesiod

Hesiod is a simple string lookup service built on top of the Domain Name System. Conceptually, the service translates a pair of strings (the "name" and "type") into a set of result strings. This lookup is done very simply: a DNS lookup is done for name.type.ns.athena.mit.edu, and the strings in the resulting TXT records are returned.
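Since a Hesiod lookup is just a TXT query, the idea can be sketched in a few lines. The following Python example assumes the third-party dnspython package and uses a hypothetical locker name; the real Hesiod client library also reads local configuration (e.g. the right-hand-side domain) rather than hardcoding ns.athena.mit.edu as done here.

    #!/usr/bin/env python3
    """Minimal Hesiod-style lookup: name/type -> TXT record strings (sketch)."""
    import dns.resolver   # pip install dnspython

    def hesiod_lookup(name, hes_type, rhs="ns.athena.mit.edu"):
        """Return the list of strings stored under the given Hesiod name/type."""
        qname = "%s.%s.%s" % (name, hes_type, rhs)
        answers = dns.resolver.resolve(qname, "TXT")
        results = []
        for rdata in answers:
            # A TXT record may be split into several character strings.
            results.append(b"".join(rdata.strings).decode())
        return results

    if __name__ == "__main__":
        # e.g. the filsys entry for a hypothetical locker named "outland"
        print(hesiod_lookup("outland", "filsys"))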
Athena uses Hesiod to store user account information (see section 5), locker information (see section 4), post office box information (see section 12), workstation cluster information (see section 7), and printer information.

4. Lockers

Athena imposes a filesystem-independent layer of indirection on file storage called "lockers". Because most Athena lockers currently live in AFS, the locker layer may seem a little inconvenient and pointless, but the concept may come in handy if Athena ever moves to a different filesystem.

Operationally, a locker is represented by a Hesiod entry with type "filsys". The value of the filsys record is a string which usually looks like "AFS <pathname> <mode> <mountpoint> <priority>", where AFS is the filesystem type, <pathname> is the AFS path of the locker, <mode> determines whether tokens are desirable or required for the locker, <mountpoint> determines where the locker should appear on the local workstation, and <priority> is used to order filsys entries when there is more than one. If the filesystem type is something other than AFS, different fields may be present.

Users can make lockers visible on an Athena workstation using the setuid "attach" program. The "add" alias from the standard Athena dotfiles attaches a locker and places the appropriate binary and manual directories in the user's PATH and MANPATH. A loose convention, documented in the lockers(7) man page, governs how software lockers should be organized. Not all lockers are for software; in particular, user home directories are also lockers, and generally do not contain any software.

Some platforms (Solaris and IRIX at the time of this writing) get most of their operating system and Athena software from lockers. The mountpoints for these lockers are /os for operating system software and /srvd for Athena software. On these machines, a third locker mounted on /install contains material used during installs and updates.

5. Login sequence

The Athena login sequence is very different from the standard Unix login sequence, so Athena workstations use a special Athena login program for console logins, X logins, and remote logins via rlogin, telnet, or ssh. Here are the extra steps performed:

* User authorization is different. /etc/athena/access is consulted (see access(5)); if that file does not exist, users are allowed to log in if they have entries in the local passwd file, or if they are performing a local login and /etc/nocreate does not exist, or if they are performing a remote login and /etc/noremote and /etc/nocreate both do not exist. (See the sketch below.)
* If a password is used to log in, the password is checked against Kerberos as well as against the local shadow passwd file. The user is considered authenticated if either check succeeds. This step also acquires Kerberos credentials for the user if possible.
* The user's account information is looked up via Hesiod and, if necessary, a local account is created by adding entries to /etc/passwd, /etc/shadow, and /etc/group. Files under /var/athena/sessions keep track of these modifications so that they can be undone at logout time.
* The user is placed in an AFS PAG.
* The user's locker is attached.

Most of the above steps are not performed if the user is listed as having a "local account" (the L bit) in /etc/athena/access.
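The authorization fallback described in the first bullet above amounts to a short decision procedure. The following Python sketch is illustrative only (the function is not the real login code, and parsing of /etc/athena/access is omitted); the file names come from the text.

    import os
    import pwd

    def login_allowed(username, remote):
        """Approximate the fallback authorization rules described above."""
        if os.path.exists("/etc/athena/access"):
            # The real login code consults this file; see access(5).
            raise NotImplementedError("parse /etc/athena/access")
        try:
            pwd.getpwnam(username)          # entry in the local passwd file?
            return True
        except KeyError:
            pass
        if remote:
            # Remote logins: /etc/noremote and /etc/nocreate must both be absent.
            return not (os.path.exists("/etc/noremote")
                        or os.path.exists("/etc/nocreate"))
        # Local logins: /etc/nocreate must be absent.
        return not os.path.exists("/etc/nocreate")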
6. X logins

Athena uses the following components to implement the X login scheme:

* /etc/athena/dm runs out of /etc/inittab and controls the other components of the system, including the X server. It pretty much just runs the commands specified in /etc/athena/login/config at the appropriate times. When dm exits, init will restart it, starting the login process over again.
* /etc/athena/console displays cryptic system console output in a window so that it can confuse users in its own little part of the screen instead of getting blatted all over the display or ignored completely.
* /etc/athena/xlogin puts up the window to read a username and password and performs the actual login.
* /etc/athena/login/Xsession is the script which runs the user's X session.

7. Workstation self-maintenance

Athena workstations perform a variety of regular tasks to help ensure smooth operation. These tasks run at boot time, out of reactivate (a script run every several minutes by xlogin when no one is logged in), or out of cron.

The maintenance done at boot time and at reactivate time is very similar, and includes:

* /etc/{passwd,shadow,group} are replaced by /etc/{passwd,shadow,group}.local if they exist, in case any leftover local account information is present. This means any permanent changes to those files must be reflected in the .local versions or they will be undone. The Athena passwd(1) command takes care of this for local password changes.
* The AFS CellServDB and trusted cells files are updated from /afs/athena.mit.edu/service.
* Any attached lockers are detached (unless they are "locked" using the -L flag to attach).
* Cluster information is retrieved from Hesiod for the machine. This information controls where the machine looks for system software (if it uses a srvd) and updates, and also controls the default printer for the machine. It is recorded in /var/athena/clusterinfo and /var/athena/clusterinfo.bsh.
* On platforms which use a srvd, the srvd lockers are attached. Cluster information is required for this.
* A bunch of files are deleted if they exist, such as the emacs superlock file and the account session records.
* On public workstations, a verification of the operating system and Athena software is performed. How this is done varies from platform to platform. At reactivate time, only a verification of the Athena software is performed, and on Linux not even that part is done.
* On public workstations of platforms which use a srvd, a list of configuration files specified in /usr/athena/lib/update/configfiles is copied in from /srvd or /os.
* The workstation checks for a new release and possibly runs an automatic update.
* At boot time only, the system time is reset to the current time according to time.mit.edu. (The xntp daemon, AFS, and a cron job all work to keep this time in sync while the machine is up.)

The following maintenance tasks run out of cron:

* Temporary directories are cleaned up according to how recently files and directories have been modified and accessed. (Sometimes important files in /tmp go away because of this cron job if a person stays logged in for days at a time.)
* An attempt is made at delivering any queued mail messages once an hour.
* The workstation's time is reset if it has drifted by more than 60 seconds.
* A local copy is made of the netscape part of the infoagents locker so that netscape can be started up more quickly.

Because these tasks impose load on the network, they are desynchronized based on the workstation's hostname using the /etc/athena/desync program.
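The idea behind this desynchronization is to derive a stable, host-specific delay so that all machines do not hit the network at the same moment. Here is a minimal Python sketch of that idea; the function name and hashing details are illustrative, and the real /etc/athena/desync program's interface differs.

    import hashlib
    import socket
    import time

    def desync_sleep(window_seconds, key=None):
        """Sleep for a host-specific, repeatable fraction of window_seconds.

        Hashing the hostname gives each workstation a stable offset within
        the window, spreading the load from periodic jobs. (Illustrative
        only; not the real /etc/athena/desync program.)
        """
        key = key or socket.gethostname()
        digest = hashlib.md5(key.encode()).digest()
        offset = int.from_bytes(digest[:4], "big") % window_seconds
        time.sleep(offset)

    # e.g. spread an hourly cron job over the hour:
    # desync_sleep(3600)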
8. Install

Each Athena platform installs using completely different code. Here is the basic pattern, though:

* If the hardware does not have adequate loader support in the prom monitor (as on the Linux platform and on some of the older Suns), a loader is booted off a floppy.
* A kernel is retrieved via tftp.
* Filesystems are mounted via NFS. On Linux, this step is skipped; the install filesystem is retrieved along with the kernel in the tftp step.
* The user is asked (with a timeout) whether he or she wants to perform a custom install, is allowed to partition the disk manually, and so on. The disk is then partitioned.
* AFS is mounted and control is transferred to a script in AFS (in the srvd, on platforms which use one). On Linux, this actually happens before the previous step.
* The operating system and Athena software components are installed onto the target disk.

Logs of the installer output are placed in /var/athena/install.log once the machine is installed.

9. Update

Athena versions have the form X.Y.Z, where X is the major version, Y is the minor version, and Z is the patch version. When the major or minor version of the Athena software is bumped, it is called a "full release"; if only the patch version changes, it is called a "patch release".

The update process varies greatly between platforms which use a srvd and platforms which do not. On both types of platforms, the command "update_ws -a", run from the boot or reactivate scripts, checks for an update; alternatively, the machine administrator can run "update_ws" from a root console login.

For platforms which require a srvd, the general procedure is:

* /srvd/.rvdinfo is consulted to see if a new version is available. The last line of /etc/athena/version records the current version. (See the sketch after this list.)
* A variety of checks are performed (whether there is enough disk space, whether the machine should wait until later for desynchronization, whether the machine takes auto-updates, etc.) to make sure the update should actually take place.
* Assuming the update is actually going to take place, most services on the machine are shut down. If the update is running out of reactivate, dm is told to shut down and sleep forever so that the X server isn't running.
* Update scripts for versions between the current and new version are run. These scripts specify what sorts of update steps need to take place and what configuration files have been updated, and they can also run commands if something needs to be done once during a particular update.
* The configuration files which have changed according to the update scripts are updated from the srvd. On public machines, all configuration files specified in /usr/athena/lib/update/configfiles are copied in.
* If there are any new /etc/athena/rc.conf variables in the /srvd copy, they are added. On public machines, /etc/athena/rc.conf is rewritten based on the /srvd version with the host-specific variables preserved.
* If there are traumatic OS changes to be performed, a "miniroot" is created in the swap partition (sometimes the machine may need to be rebooted in order to stop swapping; /etc/init.d/finish-update takes care of this) and the machine is rebooted into the miniroot.
* The OS software is updated if necessary. On IRIX, this is done using inst; on Solaris, this is done using pkgadd and patchadd.
* The Athena software is updated using the "track" command.
* If this is an update to a new version of the OS, the workstation reboots before running the last stage of the update.
* mkserv is run to repeat any scripted workstation customizations. See section 10.

On srvd-using platforms, full release updates are desynchronized by getcluster, and patch release updates are desynchronized by update_ws using the /etc/athena/desync program.
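Whether an update is needed at all, and whether it is a full release or only a patch release, comes down to comparing two X.Y.Z version strings. The following Python sketch shows that comparison; the function names and example versions are illustrative, and update_ws implements the real logic (including reading /srvd/.rvdinfo and /etc/athena/version).

    def parse_version(s):
        """Split an Athena version string "X.Y.Z" into a tuple of ints."""
        major, minor, patch = (int(part) for part in s.strip().split("."))
        return major, minor, patch

    def classify_update(current, available):
        """Return None if no update is needed, else "full" or "patch".

        A change in the major or minor version is a full release; a change
        in only the patch version is a patch release.
        """
        cur, new = parse_version(current), parse_version(available)
        if new <= cur:
            return None
        if new[:2] != cur[:2]:
            return "full"
        return "patch"

    # e.g. classify_update("9.4.20", "9.4.22") -> "patch"
    #      classify_update("9.4.22", "9.5.0")  -> "full"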
On Linux, the update process is simpler:

* The control file (determined by the SYSPREFIX and SYSCONTROL cluster variables) is consulted to see if a new version is available. The last line of /etc/athena/version records the current version.
* Checks are performed to see if an update should really occur, as in the srvd case.
* Most running services and dm are shut down, as in the srvd case.
* The list of RPMs for the current Athena version (stored in /var/athena/release-rpms) is compared against the list of RPMs for the new version, and RPMs are added, removed, or updated according to the rules documented in update_ws(8).
* mkserv is run, as in the srvd case.

Logs of the update process are kept in /var/athena/update.log for all platforms.

10. mkserv

mkserv allows scripted customizations to be performed and then re-performed at each update. It is harder to write a script to perform a customization than to simply do it, but this architecture ensures that customizations will not be reversed by updates. The mkserv software lives in the mkserv locker (source in the mkservdev locker). The most common mkserv scripts are located in /mit/mkserv/services/X.Y, where X.Y are the major and minor Athena release numbers.

A mkserv script is composed of four files:

* servicename.sync, which specifies which Athena software to copy onto local disk. This file is irrelevant on Linux, where all Athena software lives on local disk.
* servicename.add, which is run when the service is first added, and at each update. The script must be "idempotent", i.e. running the script multiple times should have the same effect as running it once. (See the sketch below.)
* servicename.del, which reverses any customizations made by the .add script.
* servicename.dep, which lists the dependencies of the service.

mkserv also runs the script /var/server/.private at update time, for machine-specific customizations.
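The idempotency requirement on .add scripts simply means a script must check whether its change is already in place before making it. The following Python fragment illustrates the idea; mkserv scripts themselves are not written this way, and the file and variable names in the usage comment are hypothetical.

    def ensure_line(path, line):
        """Append `line` to `path` only if it is not already present.

        Running this any number of times leaves the file in the same state
        as running it once -- the property a mkserv .add script must have.
        """
        try:
            with open(path) as f:
                if line in (l.rstrip("\n") for l in f):
                    return False        # already customized; nothing to do
        except FileNotFoundError:
            pass
        with open(path, "a") as f:
            f.write(line + "\n")
        return True

    # e.g. ensure_line("/etc/athena/rc.conf", "SOMEVAR=true")  # hypothetical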
11. Mail infrastructure

To send mail, Athena machines use a mostly unmodified version of sendmail. Outgoing mail is sent through the MIT mailhubs, although it may be queued temporarily on local workstations if the MIT mailhubs won't accept it immediately.

When mail is received by an MIT mailhub for username@mit.edu, it is normally delivered to a storage area on a PO server. PO servers can speak either IMAP (see RFC 2060) or a modified version of the POP protocol (see RFC 1725) which uses Kerberos 4 instead of passwords to authenticate. The supported Athena mail client is a modified version of nmh which uses KPOP to retrieve mail and store it in files in the user's home directory. Many users use alternative mail programs; most of these also use KPOP and store mail in the user's home directory in some format. Some users use netscape, which speaks IMAP over SSL and which generally leaves mail on the PO server so that it can be accessed from non-Athena machines.

12. Moira

Moira is a database and primary information repository for:

* Workstation cluster information
* Locker filsys entries, quotas, and server locations
* Lists, which encompass mailing lists and filesystem access groups
* Host and network configuration
* Kerberized NFS server configuration
* Printer configurations
* User information
* Zephyr ACLs
* "Generic ACLs" which can be used by any service which can be made to understand the ACL file format

and probably a few other things.

Production systems never (at least, ideally) retrieve information from Moira as part of regular operation; instead, a periodic process called a DCM (Data Control Manager) pushes out new versions of information from the Moira database to the affected servers. For instance, the Hesiod DNS servers are periodically updated with a new zone file containing new cluster, filsys, printer, and user information. Currently, the DCM runs several times a day. A few kinds of changes to the Moira database are propagated immediately to the affected servers via incremental update; an example is changes to AFS groups resulting from changes to Moira list membership.

The Moira server is implemented as an Oracle database with a surrounding layer of C code. The Moira clients for Unix live in the moira locker (the Athena release contains scripts which attach the moira locker and run the actual programs), and use Kerberos 4 to authenticate to the Moira server.

13. Larvnet

Larvnet is the cluster monitoring system which gathers the data returned by the "cview" command: a list of free machines of each type in the Athena clusters, and a list of the number of jobs currently pending on Athena printers.

When a user logs in or out of an Athena machine, or when an Athena machine starts up the login system, the machine sends a status packet to the Larvnet server. The status packet gives the machine's name, host type, and a determination of whether any user is logged into the machine at the console. Workstations can also be queried for the same status information using the "busyd" UDP service, which runs out of inetd. The Larvnet server separates machine names into clusters according to a configuration file and produces a data file once per minute containing counts of the free machines of each type in each cluster.

The Larvnet server also queries the print spooler for each Athena printer once per minute, using an "lpq" query. (Sadly, the output returned by "lpq" is not standardized well enough to be robustly machine-readable, so this mechanism sometimes requires maintenance when changes are made to the printing system.)

14. Athinfo

Athinfo is a TCP service which runs out of inetd on Athena machines. It allows anyone to remotely run one of a specified set of named commands and view the output. "athinfo machinename queries" will generally give a list of the commands which can be run.

15. Software License Wrapper

Some of the commercial third-party software used at MIT is "license-wrapped". This means the binary which lives in the locker has been corrupted by DES-encrypting a chunk of the binary with some key. A front-end script invokes a program which contacts the slw server to retrieve the key which can decrypt the chunk of the binary so that it can be run. The server will refuse the request for the key if the client machine does not appear to be on an MIT network.

The license software currently lives in the slw locker, although it may move into the Athena release.

Possible future topics: zephyr, discuss, printing, lert/gmotd, olc