How Athena Works
----------------

This document gives an overview of how Athena works from an administrative perspective. The focus is on defining important terms and principles of operation, not on documenting internals or specifics. The intended audience is new IS hires or people who are curious about how Athena works.

Contents:

  1. Kerberos
  2. AFS
  3. Hesiod
  4. Lockers
  5. Login sequence
  6. X logins
  7. Workstation self-maintenance
  8. Install
  9. Update
  10. mkserv
  11. Moira
  12. Mail infrastructure
  13. Larvnet
  14. Athinfo
  15. Software License Wrapper

1. Kerberos

Many Athena services use a security system called Kerberos. Kerberos can be thought of as a service for negotiating shared secrets between unfamiliar parties. A central server called a KDC (Key Distribution Center) has a pre-shared secret with each user and with each service. The secrets shared with users are conventionally called "passwords"; the secrets shared with services are conventionally called "keytabs" (or "srvtabs", in older jargon). Together, users and services are called "principals".

When one principal requests to negotiate a shared key with another principal, the KDC makes up a random new key (called a "session key"), encrypts it once in each principal's key (along with a bunch of other information), and sends both pieces of ciphertext back to the first principal, which will in turn send the appropriate part to the second principal when it is ready to talk. Since both principals can get at the session key by decrypting their bit of ciphertext, they now have a shared secret which they can use to communicate securely. Kerberos clients record these bits of information in "credential caches" (or "ticket files" in older jargon; neither term is particularly correct since the file is not strictly a cache and stores more than just tickets).

There are two versions of the Kerberos protocol in use on Athena, 4 and 5. The Kerberos 5 protocol supports more features and different types of cryptographic algorithms, but is also a great deal more complicated. See http://web.mit.edu/kerberos/www for more complete and precise information about Kerberos.

Athena services which use Kerberos include AFS, discuss, zephyr, olc, moira, and remote login and FTP (when both parties support it).

2. AFS

Athena workstations use a filesystem called AFS. Running AFS allows workstations to access files under the /afs hierarchy. Of particular interest are the MIT parts of this hierarchy: /afs/athena.mit.edu, /afs/dev.mit.edu, /afs/sipb.mit.edu, /afs/net.mit.edu, /afs/ops.mit.edu, and /afs/zone.mit.edu.

Unlike NFS, AFS includes two layers of indirection which shield a client from having to know what hostname a file resides on in order to access it.

The first layer of indirection is "cells", such as athena.mit.edu. Each workstation has a directory of cells in /usr/vice/etc/CellServDB, which it can use to look up the database servers for a cell name. If a cell's database servers change, each client's CellServDB has to be updated, but the canonical paths to files in that cell do not change. Athena workstations update their CellServDB files periodically (at boot and reactivate time) from /afs/athena.mit.edu/service/CellServDB.
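As a rough illustration (not the actual AFS client code), the following Python sketch shows how a CellServDB-style cell directory maps cell names to database server addresses. The function name and the comment handling are made up for this example, and the real file format and client behavior have more detail.

    #!/usr/bin/env python3
    """Toy parser for a CellServDB-style cell directory (illustrative only)."""

    def parse_cellservdb(path="/usr/vice/etc/CellServDB"):
        """Return a dict mapping cell name -> list of database server addresses.

        Assumes the conventional layout: a line beginning with '>' names a
        cell, and the lines that follow list that cell's database servers.
        """
        cells = {}
        current = None
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                if line.startswith(">"):
                    # e.g. ">athena.mit.edu    #MIT/Athena cell"
                    current = line[1:].split()[0]
                    cells[current] = []
                elif current is not None:
                    # e.g. "18.3.48.11         #db server for the cell"
                    cells[current].append(line.split()[0])
        return cells

    if __name__ == "__main__":
        for cell, servers in parse_cellservdb().items():
            print(cell, servers)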
The second layer of indirection is the volume location database, or VLDB. Each AFS cell's contents are divided into named volumes of files which are stored together; volumes refer to other volumes using mountpoints within their directory structure. When a client wishes to access a file in a volume, it uses the VLDB servers to find out which file server the volume lives on. Volumes can move around from one file server to another, and clients will track them without the user noticing anything other than a slight slowdown.

AFS has several advantages over traditional filesystems:

* Volumes can be moved around between servers without causing an outage.
* Volumes can be replicated so that they are accessible from several servers. (Only read-only copies of a volume can be replicated; read/write replication is a difficult problem.)
* It is more secure than traditional NFS. (Secure variants of NFS are not widely implemented outside of Solaris.)
* AFS clients cache data, reducing load on the servers and improving access speed in some cases.
* Permissions can be managed in a (not strictly) more flexible manner than in other filesystems.

AFS has several unusual properties which sometimes cause software to behave poorly in relation to it:

* AFS uses a totally different permissions system from most other Unix filesystems; instead of assigning meanings to a file's status bits for the group owner and the world, AFS stores an access control list in each directory and applies that list to all files in the directory. As a result, programs that copy files and directories will usually not automatically copy the permissions along with them, and programs that use file status bits to determine in advance whether they have permission to perform an operation will often get the wrong answer.
* It is not possible to make a hard link between files in two different AFS directories, even if they are in the same volume, so programs which try to do so will fail.
* It is possible to lose permissions on an AFS file because of changing ACLs or expired or destroyed tokens. This is not possible for a local filesystem, and some programs don't behave gracefully when it happens in AFS.
* It is possible for close() to fail in AFS for a file which was open for writing, either because of reaching quota or because of lost permissions. This is also not possible for a local filesystem.
* AFS is a lot slower than local filesystem access, so software which performs acceptably on local disk may not perform acceptably when run out of AFS. Some software may even perform unacceptably simply because a user's home directory is in AFS, even though the software itself comes from local disk.

AFS uses Kerberos 4 to authenticate. Since it is not reasonable for AFS kernel code to read Kerberos credential caches directly, AFS-specific credentials are stored into the kernel as "tokens". The kernel looks up tokens using a "process authentication group" or PAG, which is stored in the user's group list. If there is no PAG in the user's group list, the kernel falls back to looking up tokens by uid, which would mean that two separate logins would use the same tokens and that a user who does an "su" would no longer use the same tokens. Athena workstations do their best to ensure that each login gets a fresh PAG.

See http://www.openafs.org/ for more information about AFS.

3. Hesiod

Hesiod is a simple string lookup service built on top of the Domain Name System. Conceptually, the service translates a pair of strings (the "name" and "type") into a set of result strings. This lookup is done very simply: a DNS lookup is done for name.type.ns.athena.mit.edu, and the strings in the resulting TXT records are returned.
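Since a Hesiod lookup is just a TXT query, the idea can be sketched in a few lines. The following Python example assumes the third-party dnspython package and uses a hypothetical locker name; the real Hesiod client library also reads local configuration (e.g. the right-hand-side domain) rather than hardcoding ns.athena.mit.edu as done here.

    #!/usr/bin/env python3
    """Minimal Hesiod-style lookup: name/type -> TXT record strings (sketch)."""
    import dns.resolver   # pip install dnspython

    def hesiod_lookup(name, hes_type, rhs="ns.athena.mit.edu"):
        """Return the list of strings stored under the given Hesiod name/type."""
        qname = "%s.%s.%s" % (name, hes_type, rhs)
        answers = dns.resolver.resolve(qname, "TXT")
        results = []
        for rdata in answers:
            # A TXT record may be split into several character strings.
            results.append(b"".join(rdata.strings).decode())
        return results

    if __name__ == "__main__":
        # e.g. the filsys entry for a hypothetical locker named "outland"
        print(hesiod_lookup("outland", "filsys"))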
Athena uses Hesiod to store user account information (see section 5), locker information (see section 4), post office box information (see section 12), workstation cluster information (see section 7), and printer information.

4. Lockers

Athena imposes a filesystem-independent layer of indirection on file storage called "lockers". Because most Athena lockers currently live in AFS, the locker layer may seem a little inconvenient and pointless, but the concept may come in handy if Athena ever moves to a different filesystem.

Operationally, a locker is represented by a Hesiod entry with type "filsys". The value of the filsys record is a string which usually looks like "AFS <pathname> <mode> <mountpoint> <priority>", where AFS is the filesystem type, <pathname> is the AFS path of the locker, <mode> determines whether tokens are desirable or required for the locker, <mountpoint> determines where the locker should appear on the local workstation, and <priority> is used to order filsys entries when there is more than one. If the filesystem type is something other than AFS, different fields may be present.

Users can make lockers visible on an Athena workstation using the setuid "attach" program. The "add" alias from the standard Athena dotfiles attaches a locker and places the appropriate binary and manual directories in the user's PATH and MANPATH. A loose convention, documented in the lockers(7) man page, governs how software lockers should be organized. Not all lockers are for software; in particular, user home directories are also lockers, and generally do not contain any software.

Some platforms (Solaris and IRIX at the time of this writing) get most of their operating system and Athena software from lockers. The mountpoints for these lockers are /os for operating system software and /srvd for Athena software. On these machines, a third locker mounted on /install contains material used during installs and updates.

5. Login sequence

The Athena login sequence is very different from the standard Unix login sequence, so Athena workstations use a special Athena login program for console logins, X logins, and remote logins via rlogin, telnet, or ssh. Here are the extra steps performed:

* User authorization is different. /etc/athena/access is consulted (see access(5)); if that file does not exist, users are allowed to log in if they have entries in the local passwd file, or if they are performing a local login and /etc/nocreate does not exist, or if they are performing a remote login and /etc/noremote and /etc/nocreate both do not exist. (See the sketch below.)
* If a password is used to log in, the password is checked against Kerberos as well as against the local shadow passwd file. The user is considered authenticated if either check succeeds. This step also acquires Kerberos credentials for the user if possible.
* The user's account information is looked up via Hesiod and, if necessary, a local account is created by adding entries to /etc/passwd, /etc/shadow, and /etc/group. Files under /var/athena/sessions keep track of these modifications so that they can be undone at logout time.
* The user is placed in an AFS PAG.
* The user's locker is attached.

Most of the above steps are not performed if the user is listed as having a "local account" (the L bit) in /etc/athena/access.
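The authorization fallback described in the first bullet above amounts to a short decision procedure. The following Python sketch is illustrative only (the function is not the real login code, and parsing of /etc/athena/access is omitted); the file names come from the text.

    import os
    import pwd

    def login_allowed(username, remote):
        """Approximate the fallback authorization rules described above."""
        if os.path.exists("/etc/athena/access"):
            # The real login code consults this file; see access(5).
            raise NotImplementedError("parse /etc/athena/access")
        try:
            pwd.getpwnam(username)          # entry in the local passwd file?
            return True
        except KeyError:
            pass
        if remote:
            # Remote logins: /etc/noremote and /etc/nocreate must both be absent.
            return not (os.path.exists("/etc/noremote")
                        or os.path.exists("/etc/nocreate"))
        # Local logins: /etc/nocreate must be absent.
        return not os.path.exists("/etc/nocreate")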
6. X logins

Athena uses the following components to implement the X login scheme:

* /etc/athena/dm runs out of /etc/inittab and controls the other components of the system, including the X server. It pretty much just runs the commands specified in /etc/athena/login/config at the appropriate times. When dm exits, init will restart it, starting the login process over again.
* /etc/athena/console displays cryptic system console output in a window so that it can confuse users in its own little part of the screen instead of getting blatted all over the display or ignored completely.
* /etc/athena/xlogin puts up the window to read a username and password and performs the actual login.
* /etc/athena/login/Xsession is the script which runs the user's X session.

7. Workstation self-maintenance

Athena workstations perform a variety of regular tasks to help ensure smooth operation. These tasks run at boot time, out of reactivate (a script run every several minutes by xlogin when no one is logged in), or out of cron.

The maintenance done at boot time and at reactivate time is very similar, and includes:

* /etc/{passwd,shadow,group} are replaced by /etc/{passwd,shadow,group}.local if they exist, in case any leftover local account information is present. This means any permanent changes to those files must be reflected in the .local versions or they will be undone. The Athena passwd(1) command takes care of this for local password changes.
* The AFS CellServDB and trusted cells files are updated from /afs/athena.mit.edu/service.
* Any attached lockers are detached (unless they are "locked" using the -L flag to attach).
* Cluster information is retrieved from Hesiod for the machine. This information controls where the machine looks for system software (if it uses a srvd) and updates, and also controls the default printer for the machine. It is recorded in /var/athena/clusterinfo and /var/athena/clusterinfo.bsh.
* On platforms which use a srvd, the srvd lockers are attached. Cluster information is required for this.
* A bunch of files are deleted if they exist, such as the emacs superlock file and the account session records.
* On public workstations, a verification of the operating system and Athena software is performed. How this is done varies from platform to platform. At reactivate time, only a verification of the Athena software is performed, and on Linux not even that part is done.
* On public workstations of platforms which use a srvd, a list of configuration files specified in /usr/athena/lib/update/configfiles is copied in from /srvd or /os.
* The workstation checks for a new release and possibly runs an automatic update.
* At boot time only, the system time is reset to the current time according to time.mit.edu. (The xntp daemon, AFS, and a cron job all work to keep this time in sync while the machine is up.)

The following maintenance tasks run out of cron:

* Temporary directories are cleaned up according to how recently files and directories have been modified and accessed. (Sometimes important files in /tmp go away because of this cron job if a person stays logged in for days at a time.)
* An attempt is made at delivering any queued mail messages once an hour.
* The workstation's time is reset if it has drifted by more than 60 seconds.
* A local copy is made of the netscape part of the infoagents locker so that netscape can be started up more quickly.

Because these tasks impose load on the network, they are desynchronized based on the workstation's hostname using the /etc/athena/desync program.
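The idea behind this desynchronization is to derive a stable, host-specific delay so that all machines do not hit the network at the same moment. Here is a minimal Python sketch of that idea; the function name and hashing details are illustrative, and the real /etc/athena/desync program's interface differs.

    import hashlib
    import socket
    import time

    def desync_sleep(window_seconds, key=None):
        """Sleep for a host-specific, repeatable fraction of window_seconds.

        Hashing the hostname gives each workstation a stable offset within
        the window, spreading the load from periodic jobs. (Illustrative
        only; not the real /etc/athena/desync program.)
        """
        key = key or socket.gethostname()
        digest = hashlib.md5(key.encode()).digest()
        offset = int.from_bytes(digest[:4], "big") % window_seconds
        time.sleep(offset)

    # e.g. spread an hourly cron job over the hour:
    # desync_sleep(3600)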
8. Install

Each Athena platform installs using completely different code. Here is the basic pattern, though:

* If the hardware does not have adequate loader support in the prom monitor (as on the Linux platform and on some of the older Suns), a loader is booted off a floppy.
* A kernel is retrieved via tftp.
* Filesystems are mounted via NFS. On Linux, this step is skipped; the install filesystem is retrieved along with the kernel in the tftp step.
* The user is asked (with a timeout) whether he or she wants to perform a custom install, is allowed to partition the disk manually, and so on. The disk is then partitioned.
* AFS is mounted and control is transferred to a script in AFS (in the srvd, on platforms which use one). On Linux, this actually happens before the previous step.
* The operating system and Athena software components are installed onto the target disk.

Logs of the installer output are placed in /var/athena/install.log once the machine is installed.

9. Update

Athena versions have the form X.Y.Z, where X is the major version, Y is the minor version, and Z is the patch version. When the major or minor version of the Athena software is bumped, it is called a "full release"; if only the patch version changes, it is called a "patch release".

The update process varies greatly between platforms which use a srvd and platforms which do not. On both types of platforms, the command "update_ws -a", run from the boot or reactivate scripts, checks for an update; alternatively, the machine administrator can run "update_ws" from a root console login.

For platforms which require a srvd, the general procedure is:

* /srvd/.rvdinfo is consulted to see if a new version is available. The last line of /etc/athena/version records the current version. (See the sketch after this list.)
* A variety of checks are performed (whether there is enough disk space, whether the machine should wait until later for desynchronization, whether the machine takes auto-updates, etc.) to make sure the update should actually take place.
* Assuming the update is actually going to take place, most services on the machine are shut down. If the update is running out of reactivate, dm is told to shut down and sleep forever so that the X server isn't running.
* Update scripts for versions between the current and new version are run. These scripts specify what sorts of update steps need to take place and what configuration files have been updated, and they can also run commands if something needs to be done once during a particular update.
* The configuration files which have changed according to the update scripts are updated from the srvd. On public machines, all configuration files specified in /usr/athena/lib/update/configfiles are copied in.
* If there are any new /etc/athena/rc.conf variables in the /srvd copy, they are added. On public machines, /etc/athena/rc.conf is rewritten based on the /srvd version with the host-specific variables preserved.
* If there are traumatic OS changes to be performed, a "miniroot" is created in the swap partition (sometimes the machine may need to be rebooted in order to stop swapping; /etc/init.d/finish-update takes care of this) and the machine is rebooted into the miniroot.
* The OS software is updated if necessary. On IRIX, this is done using inst; on Solaris, this is done using pkgadd and patchadd.
* The Athena software is updated using the "track" command.
* If this is an update to a new version of the OS, the workstation reboots before running the last stage of the update.
* mkserv is run to repeat any scripted workstation customizations. See section 10.

On srvd-using platforms, full release updates are desynchronized by getcluster, and patch release updates are desynchronized by update_ws using the /etc/athena/desync program.
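Whether an update is needed at all, and whether it is a full release or only a patch release, comes down to comparing two X.Y.Z version strings. The following Python sketch shows that comparison; the function names and example versions are illustrative, and update_ws implements the real logic (including reading /srvd/.rvdinfo and /etc/athena/version).

    def parse_version(s):
        """Split an Athena version string "X.Y.Z" into a tuple of ints."""
        major, minor, patch = (int(part) for part in s.strip().split("."))
        return major, minor, patch

    def classify_update(current, available):
        """Return None if no update is needed, else "full" or "patch".

        A change in the major or minor version is a full release; a change
        in only the patch version is a patch release.
        """
        cur, new = parse_version(current), parse_version(available)
        if new <= cur:
            return None
        if new[:2] != cur[:2]:
            return "full"
        return "patch"

    # e.g. classify_update("9.4.20", "9.4.22") -> "patch"
    #      classify_update("9.4.22", "9.5.0")  -> "full"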
On Linux, the update process is simpler:

* The control file (determined by the SYSPREFIX and SYSCONTROL cluster variables) is consulted to see if a new version is available. The last line of /etc/athena/version records the current version.
* Checks are performed to see if an update should really occur, as in the srvd case.
* Most running services and dm are shut down, as in the srvd case.
* The list of RPMs for the current Athena version (stored in /var/athena/release-rpms) is compared against the list of RPMs for the new version, and RPMs are added, removed, or updated according to the rules documented in update_ws(8).
* mkserv is run, as in the srvd case.

Logs of the update process are kept in /var/athena/update.log for all platforms.

10. mkserv

mkserv allows scripted customizations to be performed and then re-performed at each update. It is harder to write a script to perform a customization than to simply do it, but this architecture ensures that customizations will not be reversed by updates. The mkserv software lives in the mkserv locker (source in the mkservdev locker). The most common mkserv scripts are located in /mit/mkserv/services/X.Y, where X.Y are the major and minor Athena release numbers.

A mkserv script is composed of four files:

* servicename.sync, which specifies which Athena software to copy onto local disk. This file is irrelevant on Linux, where all Athena software lives on local disk.
* servicename.add, which is run when the service is first added, and at each update. The script must be "idempotent", i.e. running the script multiple times should have the same effect as running it once. (See the sketch below.)
* servicename.del, which reverses any customizations made by the .add script.
* servicename.dep, which lists the dependencies of the service.

mkserv also runs the script /var/server/.private at update time, for machine-specific customizations.
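The idempotency requirement on .add scripts simply means a script must check whether its change is already in place before making it. The following Python fragment illustrates the idea; mkserv scripts themselves are not written this way, and the file and variable names in the usage comment are hypothetical.

    def ensure_line(path, line):
        """Append `line` to `path` only if it is not already present.

        Running this any number of times leaves the file in the same state
        as running it once -- the property a mkserv .add script must have.
        """
        try:
            with open(path) as f:
                if line in (l.rstrip("\n") for l in f):
                    return False        # already customized; nothing to do
        except FileNotFoundError:
            pass
        with open(path, "a") as f:
            f.write(line + "\n")
        return True

    # e.g. ensure_line("/etc/athena/rc.conf", "SOMEVAR=true")  # hypothetical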
11. Mail infrastructure

To send mail, Athena machines use a mostly unmodified version of sendmail. Outgoing mail is sent through the MIT mailhubs, although it may be queued temporarily on local workstations if the MIT mailhubs won't accept it immediately.

When mail is received by an MIT mailhub for username@mit.edu, it is normally delivered to a storage area on a PO server. PO servers can speak either IMAP (see RFC 2060) or a modified version of the POP protocol (see RFC 1725) which uses Kerberos 4 instead of passwords to authenticate. The supported Athena mail client is a modified version of nmh which uses KPOP to retrieve mail and store it in files in the user's home directory. Many users use alternative mail programs; most of these also use KPOP and store mail in the user's home directory in some format. Some users use netscape, which speaks IMAP over SSL and which generally leaves mail on the PO server so that it can be accessed from non-Athena machines.

12. Moira

Moira is a database and primary information repository for:

* Workstation cluster information
* Locker filsys entries, quotas, and server locations
* Lists, which encompass mailing lists and filesystem access groups
* Host and network configuration
* Kerberized NFS server configuration
* Printer configurations
* User information
* Zephyr ACLs
* "Generic ACLs" which can be used by any service which can be made to understand the ACL file format

and probably a few other things.

Production systems never (at least, ideally) retrieve information from Moira as part of regular operation; instead, a periodic process called a DCM (Data Control Manager) pushes out new versions of information from the Moira database to the affected servers. For instance, the Hesiod DNS servers are periodically updated with a new zone file containing new cluster, filsys, printer, and user information. Currently, the DCM runs several times a day. A few kinds of changes to the Moira database are propagated immediately to the affected servers via incremental update; an example is changes to AFS groups resulting from changes to Moira list membership.

The Moira server is implemented as an Oracle database with a surrounding layer of C code. The Moira clients for Unix live in the moira locker (the Athena release contains scripts which attach the moira locker and run the actual programs), and use Kerberos 4 to authenticate to the Moira server.

13. Larvnet

Larvnet is the cluster monitoring system which gathers the data returned by the "cview" command: a list of free machines of each type in the Athena clusters, and a list of the number of jobs currently pending on Athena printers.

When a user logs in or out of an Athena machine, or when an Athena machine starts up the login system, the machine sends a status packet to the Larvnet server. The status packet gives the machine's name, host type, and a determination of whether any user is logged into the machine at the console. Workstations can also be queried for the same status information using the "busyd" UDP service, which runs out of inetd. The Larvnet server separates machine names into clusters according to a configuration file and produces a data file once per minute containing counts of the free machines of each type in each cluster.

The Larvnet server also queries the print spooler for each Athena printer once per minute, using an "lpq" query. (Sadly, the output returned by "lpq" is not standardized well enough to be robustly machine-readable, so this mechanism sometimes requires maintenance when changes are made to the printing system.)

14. Athinfo

Athinfo is a TCP service which runs out of inetd on Athena machines. It allows anyone to remotely run one of a specified set of named commands and view the output. "athinfo machinename queries" will generally give a list of the commands which can be run.

15. Software License Wrapper

Some of the commercial third-party software used at MIT is "license-wrapped". This means the binary which lives in the locker has been corrupted by DES-encrypting a chunk of the binary with some key. A front-end script invokes a program which contacts the slw server to retrieve the key which can decrypt the chunk of the binary so that it can be run. The server will refuse the request for the key if the client machine does not appear to be on an MIT network.

The license software currently lives in the slw locker, although it may move into the Athena release.

Possible future topics: zephyr, discuss, printing, lert/gmotd, olc