MULTICS DESIGN DOCUMENT MDD-012-02 To: MDD Distribution From: Paul Farley Date: October 15, 1986 Subject: The I/O Interfacer (IOI) Abstract: This MDD describes the features and operations of the I/O interfacer (IOI), as well as those hardware features which make its operation possible. Revisions: REVISION DATE AUTHOR initial 85-06-01 Chris Jones 01 86-03-01 Chris Jones 02 86-10-15 Paul Farley _________________________________________________________________ Multics Design Documents are the official design descriptions of the Multics Trusted Computing Base. They are internal documents, which may be released outside of Multics System Development only with the approval of the Director. i MDD-012-02 IOI CONTENTS Page Section 1 Overview . . . . . . . . . . . . . . . . 1-1 Organization of This Document . . . . 1-1 Facilities Provided by IOI . . . . . . 1-1 | IOI at BCE . . . . . . . . . . . . . . 1-2 An Example of IOI Usage . . . . . . . 1-2 Terminology and Abbreviations . . . . 1-2 Section 2 Security . . . . . . . . . . . . . . . . 2-1 RCP and IOI Interactions . . . . . . . 2-1 Security Features of the IOM . . . . . 2-3 Section 3 IOI's Databases . . . . . . . . . . . . . 3-1 ioi_data . . . . . . . . . . . . . . . 3-1 Header . . . . . . . . . . . . . . 3-1 IOM Table . . . . . . . . . . . . . 3-2 Group Table . . . . . . . . . . . . 3-2 Channel Table . . . . . . . . . . . 3-2 Device Table . . . . . . . . . . . 3-2 Locking . . . . . . . . . . . . . . 3-3 DTE Locks . . . . . . . . . . . 3-3 GTE Locks . . . . . . . . . . . 3-4 Reconfig Lock . . . . . . . . . 3-4 io_page_tables . . . . . . . . . . . . 3-4 io_config_data . . . . . . . . . . . . 3-4 Section 4 Services Provided by IOI . . . . . . . . 4-1 Call-side Services . . . . . . . . . . 4-1 Interrupt-side Services . . . . . . . 4-1 ii IOI MDD-012-02 SECTION 1 OVERVIEW The I/O Interfacer (IOI) is the facility of the supervisor through which user programs can directly operate peripheral devices. IOI is accessed via calls to the supervisor gate ioi_. IOI is used by all user-ring programs which perform I/O to devices connected to the Input/Output Multiplexer (IOM) or | Information Multiplexer Unit (IMU) channels. Users can construct | device-specific Data Control Word (DCW) lists and call ioi_ to | initiate the I/O operation. When the operation completes, IOI provides the user with a wakeup and status on the results of the operation. The hardware protection and relocation features of the IOM are used by IOI to allow the user complete control over the DCW lists and data with no possibility of damaging the system. _O_R_G_A_N_I_Z_A_T_I_O_N _O_F _T_H_I_S _D_O_C_U_M_E_N_T This document is divided into 4 sections. The first section gives an overview of ioi_. The second section discusses the security-relevant aspects of ioi_. The third section describes ioi_ databases. The fourth section lists the call-side and interrupt-side services provided by ioi_. _F_A_C_I_L_I_T_I_E_S _P_R_O_V_I_D_E_D _B_Y _I_O_I IOI provides entries to acquire a workspace in which data buffers and channel programs are placed, to initiate an I/O operation, to read status from the completion of an I/O operation or other external event, to specify a timeout value for an operation, to specify an event channel over which information is passed concerning device interrupts, to specify a circular status queue in which information is placed concerning device interrupts, to select which of a set of IOM channels is to be used for I/O operation, and to prohibit other I/O operations from being done on a given controller (and also to remove this prohibition). These facilities are all provided via calls to entries in the ioi_ gate. 1-1 MDD-012-02 IOI In addition, IOI also cooperates with disk control in the sharing of disk channels among storage system disk volumes and user I/O disk volumes, and with the reconfiguration software in changing | the availability of devices and channels. | | | _I_O_I _A_T _B_C_E | | The many facilities of IOI are also present when Multics has been | shutdown and is in the Bootload Command Environment (BCE). In | this environment the ioi_ gate cannot be used, but the target | modules are directly callable from the BCE programs. | | Also at BCE the normal method of sending a wakeup to a user when | an operation completes cannot be done. Instead a method of | posting the completion occurs, where the originator of the | operation creates an entry in a posting segment and monitors this | entry for completion. When IOI detects that the operation is | complete it calls a module to locate and mark the entry as | complete. _A_N _E_X_A_M_P_L_E _O_F _I_O_I _U_S_A_G_E Without going into all of the details of calling sequences of the various entries (which are available elsewhere), let's look at what it takes to establish an IOI connection, perform several I/O operations, and to remove the connection. The IOI connection is established via calls to rcp_$attach (or rcp_priv_$attach) and rcp_$check_attach. Output from these calls is a device index which is used to identify the device in question to IOI. Next, a workspace is acquired by a call to ioi_$workspace, and buffers are allocated in this workspace. Space to store the status of completed I/O operations is allocated by calling ioi_$set_status, and an event channel over which to notify the user of completed I/O operations is specified by a call to ioi_$set_event. The user builds channel programs in the workspace, and initiates an I/O operation by calling ioi_$connect. When the I/O operation completes, IOI places status information in the area earlier specified and sends a wakeup over the designated event channel. This connect-wakeup sequence is repeated as many times as desired, and the IOI attachment is removed by calling rcp_$detach. _T_E_R_M_I_N_O_L_O_G_Y _A_N_D _A_B_B_R_E_V_I_A_T_I_O_N_S call side that part of IOI software that is run in response to a gate call (see also "interrupt side"). DCW Data Control Word. One word of a channel program. 1-2 IOI MDD-012-02 IDCW Instruction Data Control Word. Contains device number and instructions. interrupt that part of IOI software that is run in response to side an I/O interrupt or due to traffic-control polling (see also "call side"). IMU Information Multiplexer Unit. A replacement for the | IOM that is micro-processor controlled. Functions | very much like the IOM. Allows connection of many | of the IOM devices along with Federal Information | Processing Standard (FIPS) compliant disk and tape | devices. | IOI Input/Output Interfacer. The interface which allows for direct user control of peripherals. IOM Input/Output Multiplexer. Interfaces between memory controllers and peripheral controllers. IOPTW I/O Page Table Word. Provides logical to physical address translation for the IOM. LPW List Pointer Word. Points to current position in channel program. MPC Microprogrammable Peripheral Controller. Interfaces between IOMs and peripherals. PCW Peripheral Control Word. Contains channel number and paging information. RCP Resource Control Package. Responsible for mediating user access to peripherals. T&D Test and Diagnostic. Responsible for testing peripherals and determining causes of problems. TCB Trusted Computing Base. The mechanisms of a system responsible for enforcing a security policy. These mechanisms include hardware, firmware, and software. TDCW Transfer Data Control Word. Allows "goto"s in channel programs. wired a state of a page. A wired page is one that must stay in memory. An abs-wired page is a wired page that cannot be moved. 1-3 IOI MDD-012-02 SECTION 2 SECURITY There are two questions which must be answered when considering whether or not to allow a given ioi_ call to succeed: "Is this user allowed to use this device?" and "How can one be certain that this operation will affect only this device?". The answer to the first lies in the Resource Control Package (RCP) and the RCP/IOI interface, while the answer to the second depends on hardware features of the IOM. _R_C_P _A_N_D _I_O_I _I_N_T_E_R_A_C_T_I_O_N_S The only way to initiate an IOI attachment is via a call to admin_gate_$ioi_attach. Since admin_gate_ is a gate from ring 1 to ring 0, this entry may only be called by a program running in ring 1. The usual way this entry is called is when a user makes a call to rcp_$attach and RCP, running in ring 1, makes a call through admin_gate_ to IOI (a process logged into ring 1 could call admin_gate_ directly and bypass the RCP checks, but such a process by definition lies within the TCB). Thus a partial answer to the first question above is "The caller can use this device because RCP said it was all right". The policies RCP enforces when making this decision are detailed in another document, MDD-009. Associated with each IOI attachment is a validation level. IOI initially sets this level to the validation level of its caller (and since its caller is RCP, and since RCP's validation level at the time it makes the call to IOI is 1, all attachments start out with a validation level of 1). IOI will disallow any call through ioi_ whose caller has a validation level greater than that of the attachment. The only way to change the validation level of an attachment is by a call to admin_gate_$ioi_promote. Thus RCP can attach a tape drive in ring 1, have the operator mount the tape, validate the label of the tape, and only then allow its caller to access the tape. Similarly, a ring 2 subsystem could attach a tape and have complete access to it, secure in the knowledge that no program running in ring 4 could affect the tape. 2-1 MDD-012-02 IOI In addition, RCP passes a flag to IOI denoting whether this is to be considered a privileged attachment or not (the distinction being whether the caller called RCP via the rcp_priv_ gate or the rcp_ gate). IOI remembers this information, and uses it to disallow holders of unprivileged attachments from performing certain operations. Users who are holding privileged attachments are said to have priv attaches, or to be priv-attached. The complete list of differences between priv attaches, as well as a justification for each difference, is given below. Difference Reason A priv-attached caller This capability is required by can direct commands to some Test and Diagnostic (T&D) the controller a device software to exercise certain is on as well as to the features of the controller. device itself. Since providing this capability makes having access to the rcp_sys_ gate as privileged as having access to a controller (and since access to the controller is theoretically equivalent to access to all devices on that controller), use of this feature must be restricted. The program This entry allows a caller to ioi_suspend_devices will prevent other users who have IOI not suspend I/O on other attachments of devices on the devices on a controller same controller as the caller unless its caller has a from performing I/O operations. priv attachment. Since this is a potential denial of service, IOI requires that the caller have a priv attachment to suspend I/O. Only a priv attached This entry allows a caller to caller can successfully specify the Peripheral Control use the entry Word (PCW) to be used when I/O ioi_$connect_pcw. is initiated. Since PCWs can be used to reset a channel, the use of this entry is restricted. Only a priv-attached This entry restricts a device to caller can successfully running only on a given channel. use the entry This is necessary for some ioi_$set_channel_required. applications (such as loading firmware into a controller), but prevents IOI from using channels in an optimal manner, so its use is restricted. 2-2 IOI MDD-012-02 A process with a priv Privileged attachments are used attachment may set the by T&D software to perform timeout value on a device operations (such as tape data to a value greater than security erase) which can take the maximum allowed for a considerably longer than normal non-priv attachment. operations. Since the maximum timeout values are based on normal operations, priv-attached callers are allowed to exceed these values. The program ioi_masked Users who have priv-attached will not log I/O errors devices are often performing if the error occurred on operations which will return a device which is error statuses. Since these priv-attached. usually do not indicate real error conditions, they are not logged. _S_E_C_U_R_I_T_Y _F_E_A_T_U_R_E_S _O_F _T_H_E _I_O_M It would be impossible for IOI to work as it does without the security and protection features that the IOM provides. These features ensure that I/O operations will be performed only on the device that the user has attached, and that the only memory references that a user's channel program can generate will lie within the user's workspace segment. To ensure that a caller's channel program affects only the device that caller has attached, IOI depends on the fact that MPCs and other controllers use only the first IDCW to determine upon which device to operate.(1) The connect entrypoints of IOI are given an offset into the caller's workspace at which the first IDCW of a channel program is to be found. IOI copies this IDCW into its data structures so it cannot be modified by any program outside of ring 0.(2) IOI then validates the device number in the IDCW, _________________________________________________________________ (1) Some controllers will allow a channel program to start out with commands directed to the controller, then switch to commands directed to some other device. Since IOI will not allow a channel program to start out with a command directed to the controller unless the caller either has the controller attached or has a priv attachment, and since either of these is effectively handing over access to all of the devices attached to a controller, a caller in either of these instances is obviously highly trusted. (2) Some old controllers do not recognize TDCWs, so IOI cannot use the trick of grafting a copied version of the beginning 2-3 MDD-012-02 IOI constructs a TDCW to transfer control to the remainder of the channel program, and actually initiates the I/O operation at the copied IDCW. Since the caller cannot modify the copy of the IDCW, and since this is the one the controller will see first, the remainder of the channel program can only affect the caller's device. To ensure that a channel program can cause memory references only within the user's workspace, IOI uses the fact that all IOMs on a Multics system are run in paged mode. This means that any memory references generated by a channel program are resolved relative to an I/O page table (these page tables are conceptually similar to, but have a different format than, processor page tables). The relevant data structures are the List Pointer Word (LPW), described in the include file iom_lpw.incl.pl1, the PCW, described in the include file iom_pcw.incl.pl1, and the I/O page table word (IOPTW), described in the include file io_ptw.incl.pl1. Before a user can start an I/O operation, IOI ensures that every page of the user's workspace is in memory and wired. IOI then constructs an I/O page table that describes where in memory the pages of the workspace are (IOPTWs corresponding to non-existant workspace pages are marked invalid). If the caller's IDCW has been copied into IOI's data structures, the I/O connect operation is done in absolute mode, that is, all memory references are interpreted as absolute memory addresses. However, the TDCW which transfers control to the remainder of the channel program has control bits set which cause the channel program to run in paged mode, so all future memory references are relative to the workspace. If IOI was not allowed to copy the IDCW, the connect operation is done in paged mode, and all memory references are relative to the workspace. _________________________________________________________________ of the list to the remainder of the list. In these cases, the connect which starts the I/O operation is directed at the beginning of the list in the caller's workspace. Thus, the caller could theoretically change the device number in the IDCW after IOI had validated it. This turns out not to be a problem since these old controllers had only one device connected to them, so there is no other device which could be affected by changing the device number. 2-4 IOI MDD-012-02 SECTION 3 IOI'S DATABASES The major databases used by IOI are, in decreasing order of importance, ioi_data, io_page_tables, and io_config_data (these are defined in the aptly named include files ioi_data.incl.pl1, io_page_tables.incl.pl1, and io_config_data.incl.pl1, respectively). _I_O_I___D_A_T_A Without question, ioi_data is THE major IOI database. Every service performed by IOI either refers to data stored here, modifies it, or both. Stored here is the information IOI needs about every device, channel, and IOM defined in the system config deck. This structure consists of a header followed by four arrays. The header and the elements of each array are each a multiple of 8 words long, aligned on a 0 mod 8 word boundary. This ensures that the data in each array element occurs in a different memory block. The purpose of this is to decrease the possibility that the same memory block will be referenced by two different processors simultaneously, causing a performance degradation due to cache flushes. _H_e_a_d_e_r The ioi_data header is 16 words long. It contains the refer extents for the immediately following four arrays, as well as a hardcore fast lock for use during reconfiguration operations, a counter to meter spurious interrupts, and a flag specifying that the rest of the structure has been initialized. The header is rarely referenced during normal operation as most IOI programs try not to access the arrays in such a way that will cause the refer extents to be consulted, since this can be quite expensive for all of the arrays save the first. 3-1 MDD-012-02 IOI _I_O_M _T_a_b_l_e The IOM table is an array of ite's (Iom Table Entries), one per iom card found in the config deck. It contains information necessary to perform IOM reconfiguration, and not much else. _G_r_o_u_p _T_a_b_l_e The group table is an array of gte's (Group Table Entries), one per prph card found in the config deck. A group (usually referred to as a subsystem) consists of a collection of devices and channels, where each channel can communicate with each device. A gte contains the four character name of the subsystem (e.g. dska, mtpb), 18 bit offsets to the channel entries and device entries which comprise the subsystem, a count of the number of devices currently in use, a count of the number of devices waiting for a channel to become free, a spin lock (see "Locking" below), information that is common to all of the devices of the subsystem (e.g. how to read detailed status), and various subsystem state flags. _C_h_a_n_n_e_l _T_a_b_l_e The channel table is an array of cte's (Channel Table Entries), one for every channel defined on the prph and chnl cards in the config deck. Each cte is linked to the next cte in the subsystem's channel list (the last cte is linked to nothing) and points back to its associated gte. The cte contains the name of the channel as it is known to the outside world, to io_manager, and to disk_control if applicable. There are flags defining the state of the channel (e.g. deleted, performing I/O, etc.). Also stored here are timeout values and error status during error recovery operations. _D_e_v_i_c_e _T_a_b_l_e The device table is an array of dte's (Device Table Entries), one for every peripheral device defined on the prph cards in the config deck. The dte contains a pointer to the next dte in the subsystem's circular list of devices as well as a pointer back to the gte. The dte is the longest of the table entries and contains information related to the attachment of devices, such as the process_id of the process which has the device attached, a pointer to the workspace associated with this attachment, the event channel associated with this attachment, various parameters defining limits on workspace size and timeout values, the last error status logged for this device, where in the workspace the channel program and status queue are located, any PCW associated 3-2 IOI MDD-012-02 with the channel program, etc. There is also a hardcore fast lock (see "Locking" below). _L_o_c_k_i_n_g There are three kinds of locks that exist within ioi_data: dte locks, gte locks, and the reconfig lock. The dte locks and the reconfig lock are standard hardcore fast wait locks, while the gte locks are spin locks. DTE LOCKS The information stored in the dte can be divided into several classes: (1) those parts which are constant (at least during the attachment of this device), (2) those parts which are only to be modified when the dte lock is held, (3) those parts which are only to be modified when the dte lock is held and the flag dte.active is off, (4) those parts which are only to be modified when the gte lock is held and dte.active is on, and (5) those parts which are only to be modified when the gte lock is held. (From this one can see that there are parts of the dte which are shared between call side, which uses the dte lock, and interrupt side, which uses the gte lock, and that dte.active is the arbiter of which side gets to use these parts of the dte.) The data protected by the locks is carefully grouped into machine words in such a way that no two pieces of data protected by different locks share the same word. This prevents potential conflicts, as the PL/I compiler cannot be depended on to generate read-alter-rewrite instructions to update the data. The dte lock is a standard hardcore fast wait lock (as defined in hc_lock.incl.pl1). It is locked and unlocked via calls to lock$lock_fast and lock$unlock_fast, respectively. Associated with each dte lock is a 36 bit event ID, to be used when a potential locker must go blocked waiting for the lock to be unlocked. Although it is theoretically possible for all dte's to share the same event ID, for efficiency purposes each dte has a different event ID, consisting of the ascii characters "dv" in the upper 18 bits and the offset of the dte in ioi_data in the lower 18 bits. The dte lock is meant to protect the dte from being modified, or even examined, by two processes which could be interfering with each other. Every caller of ioi_device$get_dtep will have the dte locked to itself. Thus, any call from outside of IOI, which identifies dte's by their dtx, must call this entry, and will of necessity lock the dte from other such callers. If this were the only way to ever come up with a dtep, no interference would be possible. Of course, it isn't the only way. The other ways to come up with a dtep are to remember it from some previous call, or to iterate through the array of dte's. If a program doesn't lock the dte, then it must protect 3-3 MDD-012-02 IOI itself by locking the associated gte and touching only those parts of the dte which are not protected by this mechanism. GTE LOCKS The gte lock is a standard spin lock. It should be used only when the process attempting to lock the lock is running wired and masked. It is locked by using a "stac" instruction to place a non-zero value (by convention, the processid of the locking process) into the lock, and is unlocked by using a "stacq" instruction to place a zero into the lock if the unlocking process's processid is in the lock (and crashing the system if it is not, since this indicates a supervisor programming error or a hardware failure). RECONFIG LOCK The reconfig lock is used to ensure that only one I/O reconfiguration operation is going on at any given time. It is a standard hardcore fast wait lock. _I_O___P_A_G_E___T_A_B_L_E_S The io_page_tables data structure holds all of the page tables associated with IOI attachments, one page table per attachment. It consists of a header which has some housekeeping information, including an array which describes the state of each page table (allocated or not), followed by up to 252 page tables. The page tables themselves are simply arrays of I/O page table words (IOPTWs).(1) These page tables are either 64 or 256 words long, and are aligned on a 0 mod 64 or 256 word boundary, respectively. The page table index, which uniquely identifies a page table, is stored in the dte of the corresponding attachment. _I_O___C_O_N_F_I_G___D_A_T_A The io_config_data data structure defines the inter-connections of the I/O hardware of the system. Conceptually, it is similar to ioi_data in this way, but io_config_data is readable by processes running in rings out to ring 5, so ONLY configuration related information is stored there.(2) While io_config_data is _________________________________________________________________ (1) The format of an IOPTW is defined in io_ptw.incl.pl1. (2) ioi_data is readable only in ring 0. It contains much information that should not be made available in outer rings. 3-4 IOI MDD-012-02 kept up to date by IOI, its main use is by the reconfiguration software, which checks it to make sure that requested reconfiguration operations make sense, or to determine the configuration interconnections when performing a complex reconfiguration operation. _________________________________________________________________ Equally as important, it is impossible to change the contents of io_config_data without making a privileged call; this is not the case for ioi_data. Therefore, making ioi_data readable in an outer ring would open a gaping covert channel. 3-5 IOI MDD-012-02 SECTION 4 SERVICES PROVIDED BY IOI The services provided by IOI can be divided into two groups: call-side services and interrupt-side services. The call-side services are performed in response to a call to an entry in the ioi_ gate, while the interrupt-side services are performed while IOI is processing an interrupt from a peripheral device, or in response to traffic-control polling. _C_A_L_L_-_S_I_D_E _S_E_R_V_I_C_E_S The call-side services of IOI can be further divided into five categories: connecting, reading status, manipulating the workspace, setting parameters, and suspending and releasing other devices. All of the call-side IOI services have a parameter list which begins with a device index (obtained from the the RCP attach call) which identifies this attachment and which ends with an error code which tells the caller the results of the call. _I_N_T_E_R_R_U_P_T_-_S_I_D_E _S_E_R_V_I_C_E_S The interrupt-side services of IOI are: placing status in the user's status queue, sending a wakeup over the user's event channel, starting pending I/O operations, and handling lost interrupts. Since the interrupt-side IOI services are not done in direct response to a gate call, IOI has no parameters to use to specify the results of performing the service. Instead, it uses previously specified locations in the workspace or a previously specified event channel as the communications path. ----------------------------------------------------------- Historical Background This edition of the Multics software materials and documentation is provided and donated to Massachusetts Institute of Technology by Group BULL including BULL HN Information Systems Inc. as a contribution to computer science knowledge. This donation is made also to give evidence of the common contributions of Massachusetts Institute of Technology, Bell Laboratories, General Electric, Honeywell Information Systems Inc., Honeywell BULL Inc., Groupe BULL and BULL HN Information Systems Inc. to the development of this operating system. Multics development was initiated by Massachusetts Institute of Technology Project MAC (1963-1970), renamed the MIT Laboratory for Computer Science and Artificial Intelligence in the mid 1970s, under the leadership of Professor Fernando Jose Corbato. Users consider that Multics provided the best software architecture for managing computer hardware properly and for executing programs. Many subsequent operating systems incorporated Multics principles. Multics was distributed in 1975 to 2000 by Group Bull in Europe , and in the U.S. by Bull HN Information Systems Inc., as successor in interest by change in name only to Honeywell Bull Inc. and Honeywell Information Systems Inc. . ----------------------------------------------------------- Permission to use, copy, modify, and distribute these programs and their documentation for any purpose and without fee is hereby granted,provided that the below copyright notice and historical background appear in all copies and that both the copyright notice and historical background and this permission notice appear in supporting documentation, and that the names of MIT, HIS, BULL or BULL HN not be used in advertising or publicity pertaining to distribution of the programs without specific prior written permission. Copyright 1972 by Massachusetts Institute of Technology and Honeywell Information Systems Inc. Copyright 2006 by BULL HN Information Systems Inc. Copyright 2006 by Bull SAS All Rights Reserved