acceptance_test.an53.compin 10/22/84 1057.9rew 03/16/81 1658.5 21942 .ifi init_plm "AN53-01" .ifi l1h "N__a_m_e: acceptance_test" .spb The acceptance_test command is called at the beginning and the end of an absentee command sequence in order to control the absentee process for system testing. Information prepared by the master process (interactive mode) is extracted from the segment abs_control_info in the working directory. The master process makes use of the abs_control command for control and measurement of the test. .spb 2 E__n_t_r_y: acceptance_test$init .spb When this command is issued in an absentee process, the following events occur: .spb .inl 10 .unl 5 1. A special dim, absentee_test_dim_, is attached to user_input. .spb .unl 5 2. A wakeup is sent to the controlling process. .spb .unl 5 3. The absentee process blocks itself until a wakeup is received from the controlling process. .spb 2 .inl 0 U__s_a_g_e .spb acceptance_test$init .spb 2 E__n_t_r_y: acceptance_test$terminate .spb This command causes a wakeup to be sent to the controlling process. It sends along an event message containing an index into a usage array in which virtual CPU time, memory units, total CPU time and demand page faults are stored. This usage array is contained in abs_control_info. .spb 2 U__s_a_g_e .spb acceptance_test$terminate .spb 2 E__n_t_r_y: acceptance_test$absentee_read_delay .spb This entry point is referenced in absentee_test_dim_ and is entered whenever the absentee process performs a read operation directed to the stream user_input. A random number (falling within limits extracted from abs_control_info) is generated and used to determine the length of delay to occur. Subsequently the interrupted read operation is directed to the stream user_i/o. .spb 2 U__s_a_g_e .spb tra acceptance_test$absentee_read_delay .spb 2 E__n_t_r_y: acceptance_test$attach .spb This entry point is referenced in absentee_test_dim_ and is entered when user_input is attached to the dim. The stream data block pointer returned is copied from the user_i/o attach table entry. .spb 2 U__s_a_g_e: tra acceptance_test$attach .spb 2 E__n_t_r_y: acceptance_test$detach .spb This entry point is referenced in absentee_test_dim_ and is entered when a detach is performed on user_input. An immediate return occurs with a successful detach status word. .spb 2 U__s_a_g_e: tra acceptance_test$detach  check_sst.an53.compin 10/22/84 1057.9rew 03/12/81 1810.9 10440 .ifi init_plm "AN53-00" .srv section 8 .ifi l1h "N__a_m_e: check_sst" The check_sst command performs a large number of consistency checks on page control data bases in a copy of the System Segment Table (SST). Such a copy may be obtained from an fdump (see the extract command) or ring 0 (see the copy_out command). .spb 2 The Core Map, Paging Device Map, and Active Segment Table (AST) are scanned, and inconsistencies reported. In addition, some meters on page and segment usage gleaned from these scans are printed out. .ifi l2h "Usage" .spb check_sst path .spb where path is the pathname of the copy of the SST segment to be analyzed. .ifi l2h "Notes" Copies of the SST copied out of ring 0 are likely to be inconsistent unless special care is taken to minimize page faults and other system paging activity while such a copy is made. .spb 2 The check_sst command makes its own copy of the SST provided. In it, it sets pad fields in CMEs and PDMEs, and ptw.processed bits as a form of marking. 
The presence of these bits in printouts of these data items should be understood as originating in this manner. .brp .inl 0 .fin  copy.an53.compin 10/22/84 1057.9rew 03/12/81 1810.8 7920 .ifi init_plm "AN53-01" .srv section 8 .brp 8-3 .ifi l1h "N__a_m_e: copy_dump" The copy_dump command is used to copy an fdump image taken by BOS out of the dump partition into segments in the Multics hierarchy. The main entry point copies dumps into segments in the directory >dumps. .ifi l2h "Usage" copy_dump .spb 2 There are no arguments. .ifi l2h "E__n_t_r_y: copy_dump$set_fdump_num, copy_dump$sfdn" This entry point is used to set the error report form (ERF) number for the next fdump to be taken. .ifi l2h "Usage" copy_dump$set_fdump_num erfno .spb where erfno is the ERF number for the next fdump to be taken. .inl 0 .ifi l2h "Note" This command does not allow a particular dump to be copied twice. It also does not allow the ERF number to be set if the dump currently in the dump partition has not been copied. .inl 0 .fin .brp  copy_out.an53.compin 10/22/84 1057.9rew 03/16/81 1659.5 7227 .ifi init_plm "AN53-01" .srv section 8 .ifi l1h "N__a_m_e: copy_out" The copy_out command copies a segment from the supervisor ring into a user-ring segment. .ifi l2h "Usage" copy_out segname {path} .spb .inl 0 where: .spb .inl 12 .unl 12 1. segname .brf is the SLT name or the octal number of the segment to be copied. .spb .unl 12 2. path .brf is the pathname of the copy created from segname. .inl 0 .ifi l2h "Notes" If path is not specified, the segment is copied into the working directory with the entryname segname. However, if an octal number is given, the correct SLT name, if one exists, for the segment is used. .spb 2 If path already exists, it is truncated prior to the copy. The ring_zero_peek_ subroutine is used to copy the segment out. .inl 0 .fin .brp  copy_salvager.an53.compin 10/22/84 1057.9rew 03/12/81 1816.9 7218 .ifi init_plm "AN53-01" .srv section 8 .ifi l1h "N__a_m_e: copy_salvager_output" The copy_salvager_output command is used to copy the segment >online_salvager_output into a user-ring segment. .ifi l2h "Usage" copy_salvager_output path .spb where path is the pathname of the user-ring segment into which the copy of >online_salvager_output is placed. The segment is created if it does not already exist. If the segment already exists, its previous contents are destroyed. .inl 0 .ifi l2h "Notes" The privileged entry point phcs_$ring_0_peek is used to copy the data. .spb 2 The number of words copied is calculated from the bit count of >online_salvager_output. Upon successful completion of the command the same bit count is placed on the user-ring segment. .brp .fin .inl 0  dump_pdmap.an53.compin 10/22/84 1057.9rew 03/16/81 1700.5 8505 .ifi init_plm "AN53-01" .srv section 8 .ifi l1h "N__a_m_e: dump_pdmap" The dump_pdmap command is used to check a copy of the System Segment Table (SST) segment for consistency. Its primary concern is with the paging device map but other checks are also made. .ifi l2h "Usage" dump_pdmap path {control_arg} .spb where: .spb .inl 12 .unl 12 1. path .brf is the pathname of a copy of the SST to examine. .spb .unl 12 2. control_arg .brf can be one of the following: .spb .unl 5 -long, -lg .brf prints out each Paging Device Map Entry (PDME) as it is scanned. .spb .unl 5 -brief, -bf .brf prints only summary information (default). .inl 0 .ifi l2h "Notes" Most of the output is self-explanatory.
The user should try to get as consistent a copy of the SST as possible, since any inconsistencies found are reported. The copy_out command can be used to get a copy of the SST from a currently running system. .inl 0 .fin .brp  extract.an53.compin 10/22/84 1058.0rew 03/12/81 1816.8 5976 .ifi init_plm "AN53-01" .srv section 8 .ifi l1h "N__a_m_e: extract" The extract command is used to extract a segment from an fdump and leaves a copy of the segment in the working directory. .ifi l2h "Usage" extract erfno segname .spb where: .spb .inl 12 .unl 12 1. erfno .brf is the error report form number of the fdump from which the segment is to be extracted. .spb .unl 12 2. segname .brf is the name or octal number of the segment to be extracted. .inl 0 .ifi l2h "Note" Only the first process in a fdump is searched. The created segment has the name segname.erfno, where segname and erfno are the command arguments. .fin .inl 0 .brp  online.an53.compin 10/22/84 1058.0rew 03/12/81 1816.8 41004 .ifi init_plm "AN53-01" .srv section 8 .ifi l1h "N__a_m_e: online_dump" The online_dump command is used to create a printable dump from an fdump created by BOS. The fdump must have previously been copied into the hierarchy by the copy_dump command. The printable dump image is output through the ios_ subroutine. Optional control arguments may specify the segments that are to be dumped, that the online_dump is to be restarted, and the device and DIM to which output is attached. .ifi l2h "Usage" online_dump erfno -control_args- .spb .inl 0 where: .spb .inl 12 .unl 12 1. erfno .brf is the error report form number that is used to access the segments of the dump image. .spb .unl 12 2. control_args .brf are optional and can be chosen from the following: .spb .unl 5 -dim dimname .brf is the name of the ios_ DIM through which the stream od_output is to be directed. The default is prtdim, unless a different DIM has been specified earlier in the process. .spb .unl 5 -dev devname .brf is the name of the device or ios_ stream to be attached to od_output. The default is prta, unless a different device has been specified earlier in the process. .spb .unl 5 -restart procno segno .brf is the process number and segment number (both octal) at which the dump is to be restarted. The process number is the position of a dumped process relative to other dumped processes. .spb .unl 5 -segs .brf indicates that input is read from the following lines that specify which segments are to be dumped by the online_dump command. Any number of segments may be specified on each line. When the word "quit" is reached, the online_dump command ceases reading input and begins the dump. .inl 0 .ifi l2h "Notes" If the -restart control argument is present then the online_dump command skips over the machine registers and all segments of the dump until process number "procno" is read and a segment number greater than or equal to "segno" is found. From that point, the dump proceeds normally subject to the -segs control argument, if present. .spb 2 If the -segs control argument is present, the segment identifications are interpreted in the following manner. If the seg_id is "regs" then the machine registers are dumped. Otherwise the seg_id is assumed to be the octal segment number of a segment to be dumped. .spb 2 If the seg_id is not an octal number, then the seg_id is checked against the first name of each segment given in the Segment Loading Table (SLT) present in the dump. 
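.spb 2
The classification of each seg_id can be pictured as the following sketch; the names dump_regs, dump_seg, cv_octal, and slt_first_name are illustrative stand-ins, not the dumper's actual internal names:
.spb
.fif
/* Sketch only: how one -segs identifier might be classified. */
dcl seg_id char (32);                       /* one identifier from a -segs line */
dcl segno fixed bin;

if seg_id = "regs"                          /* machine registers */
then call dump_regs ();
else if verify (seg_id, "01234567 ") = 0    /* all octal digits */
then call dump_seg (cv_octal (seg_id));
else do segno = 0 to highest_slt_segno;     /* compare with SLT first names */
     if seg_id = slt_first_name (segno)
     then call dump_seg (segno);
     end;
.fin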
.spb 2 If the seg_id is not found in the SLT then the user is warned that the segment cannot be found. .spb 2 In addition, the user is warned if the SLT or certain other segments that are used to interpret the dump cannot be found. .ifi l2h "Examples" If the printing of the entire dump image of erfno 45 was being done on a line printer and was interrupted at segment 100 of process 1 then the following command line would enable the rest of the dump to be printed: .spb online_dump 45 -restart 1 100 .spb 2 If the user wished to merely inspect the registers at his console: .spb online_dump 45 -dim syn -dev user_io -segs .brf regs .brf quit .spb 2 The following exec_com causes the dumping of certain segments upon every execution, and the specification of others as needed: .spb .fif .inl 5 & ec to extract useful info from a dump and dprint the dumped & segments. & command_line of &attach &input_line off od &1 -dev dump_output.&1 -dim file -segs regs dseg fault_vector iom_mailbox . . . . kst_seg lock_seg str_seg 201 204 &2 &3 &3 quit dp -dl -h "ERFNO &1" dump_output.&1 &quit .fin .inl 0 .ifi l2h "E__n_t_r_y: od_cleanup" This entry point may be called when printing of a dump is to be suspended so that the currently attached device may be detached. .inl 0 .ifi l2h "Usage" od_cleanup .spb There are no arguments. .ifi l2h "E__n_t_r_y: online_dump_355, od_355" This command is used to format a dump of a DATANET 6600 Front-End Network Processor (FNP) core image that has been created by the BOS FD355 command and copied into the Multics hierarchy. .ifi l2h "Usage" online_dump_355 erfno -control_args- .spb .inl 0 where: .spb .inl 12 .unl 12 1. erfno .brf is the same as the online_dump command above. .spb .unl 12 2. control_args .brf can be chosen from the following: .spb .unl 5 -dim dimname .brf is the same as the online_dump command above. .spb .unl 5 -dev devname .brf is the same as the online_dump command above. .inl 0 .fin .brp  patch_ring_zero.an53.compin 10/22/84 1058.0rew 03/12/81 1816.6 10467 .ifi init_plm "AN53-01" .srv section 8 .ifi l1h "N__a_m_e: patch_ring_zero, prz" The patch_ring_zero command is used to change specified locations of ring 0. It requires access to hphcs_ by the user. .ifi l2h "Usage" patch_ring_zero segment offset values .spb .inl 0 where: .spb .inl 12 .unl 12 1. segment .brf is the octal segment number or segment name of a ring 0 segment. .spb .unl 12 2. offset .brf is the relative offset (in octal) of the first of _n consecutive words to be changed. .spb .unl 12 3. values .brf are the values for the specified locations in ring 0. .inl 0 .ifi l2h "Notes" The call to the patch_ring_zero command first prints out the changes that are performed and then asks the user if the changes are correct. The user must respond with "yes" for the changes to be made. The user may patch read-only segments in ring 0 without explictly changing the access, as this is done by the command itself. .ifi l2h "Example" patch_ring_zero sst 120 0 0 .spb 120 001761101001 to 000000000000 .spb 121 011376143210 to 000000000000 .spb Type "yes" if patches are correct: yes .inl 0 .fin .brp  print_apt_entry.an53.compin 10/22/84 1058.0rew 03/12/81 1816.6 5301 .ifi init_plm "AN53-01" .srv section 8 .ifi l1h "N__a_m_e: print_apt_entry, pae" The print_apt_entry command dumps, in octal, the contents of the specified user's Active Process Table (APT) entry. The command searches the Answer Table to find the process ID of the specified user and extracts the APT entry specified by the process ID. 
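.spb 2
In outline, the lookup can be pictured as the sketch below; every structure and helper name in it (user_entry, proc_id, apte_ptr_for, and so on) is a hypothetical stand-in, not the actual Answer Table or APT declaration:
.spb
.fif
/* Sketch only: all names here are hypothetical stand-ins. */
do i = 1 to n_answer_table_entries;          /* scan the Answer Table */
     if user_entry (i).name = user_name
      | user_entry (i).channel = user_name
     then do;
          aptep = apte_ptr_for (user_entry (i).proc_id);
          call dump_octal (aptep);           /* print that APT entry */
          end;
     end;
.fin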
.ifi l2h "Usage" print_apt_entry user_name .spb where user_name is either the name of the user or the name of the teletype channel assigned to the user, e.g., ttyxxx (or caaxxx) where xxx specifies some channel number. .inl 0 .fin .brp  print_dump_tape.an53.compin 10/22/84 1058.0rew 03/16/81 1703.3 10386 .ifi init_plm "AN53-01" .srv section 8 .ifi l1h "N__a_m_e: print_dump_tape, pdt" The print_dump_tape command is used to print dump tapes produced by BOS. These tapes are written as unblocked BCD records. .ifi l2h "Usage" print_dump_tape {control_args} .spb where the control arguments are optional and can be the following: .spb .inl 25 .unl 25 tape_number is the number of the dump tape to be printed. If this control argument is not specified, "*** dump tape ***" is used in the mount message. .spb .unl 25 printer is the name of the printer to be used. This name must begin with "prt". If this control argument is not specified, "prtb34" is used. .spb .unl 25 -file pathname is used to direct the output into a file instead of printing it online. .spb .unl 25 -page page_no is used to start printing the tape at the specified page number. .inl 0 .ifi l2h "Note" The following I/O streams are used: .spb .inl 25 .unl 25 dump_tape_in input tape stream .spb .unl 25 dump_to_printer output stream (usually to a printer but possibly to a file) .inl 0 .fin .brp  ring_zero_dump.an53.compin 10/22/84 1058.0rew 04/15/81 1155.0 14931 .ifi init_plm "AN53-01" .srv section 8 .ifi l1h "N__a_m_e: ring_zero_dump, rzd" The ring_zero_dump command prints the locations of the specified ring 0 or user-ring segment in full word octal format. This command does not require access to phcs_ for those segments accessible through the ring_zero_peek_ subroutine. .ifi l2h "Usage" ring_zero_dump segname {control_args} .spb where: .spb .fin .inl 12 .unl 12 1. segname .brf is either an octal segment number or the name of a ring 0 segment. To specify a segment name that consists entirely of octal digits the name must be preceded by the -name (-nm) control argument. .spb .unl 12 2. control_args .brf can be one of the following: .spb .unl 5 -first .brf is the octal location of the first word to dump. If the first and count arguments are omitted, the entire segment is dumped starting with location zero. .spb .unl 12 -count .brf is the octal number of words to dump. If count is omitted, count is set to one. When the count argument is supplied, the first argument must also be supplied. .ifi l2h "Examples" ring_zero_dump sst 200 10 .spb ring_zero_dump -nm 400 0 100 .spb ring_zero_dump 0 212 .inl .ifi l2h "Notes" If the specified segment is not found in ring 0, the expand_path_ subroutine (described in MPM Subroutine, Order No.!AG93) is used for an additional search. .spb 2 The -first control argument is verified to be a legitimate address. .spb 2 .inl 0 When the combination of the -first and -count control arguments specify an address beyond the last page of the segment, the segment is dumped only through the last page. .inl 0 .fin .brp  copy_dump_seg.an53.compin 10/22/84 1058.0rew 04/15/81 1155.5 11736 .ifi init_plm "AN53-01" .srv section 8 .ifi l1h "N__a_m_e: copy_dump_seg_" The copy_dump_seg_ subroutine is called by the online_dump command to copy a segment from the dump image into a separate segment so that it can be randomly accessed at a later time. 
.ifi l2h "Usage" .inl 10 .unl 5 declare copy_dump_seg_ entry (fixed bin, fixed bin, (0:9) ptr, (0:9) fixed bin, ptr, fixed bin); .spb .unl 5 call copy_dump_seg_ (segno, cur_proc_index, ptr_array, len_array, outptr, outlen); .spb .inl 0 where: .spb .inl 12 .unl 12 1. segno .brf is the segment number that is looked for in the dump image. (Input) .spb .unl 12 2. cur_proc_index .brf is the index in the array ptr_array of segment 0 of the process for which segno is to be found. (Input) .spb .unl 12 3. ptr_array .brf is the array of pointers to successive segments of the dump image. (Input) .spb .unl 12 4. len_array .brf is the array of current lengths of the image segments. (Input) .spb .unl 12 5. outptr .brf is a pointer to the segment into which the copy is to be made. (Input) .spb .unl 12 6. outlen .brf is the number of words copied. If the segment could not be found, the value is 0. (Output) .inl .ifi l2h "Note" The segment pointed to by outptr is truncated before the copy is made. .inl 0 .fin .brp  format_dump.an53.compin 10/22/84 1058.0rew 03/12/81 1815.9 5553 .ifi init_plm "AN53-01" .ifi l1h "N__a_m_e: format_dump_line_" .spb The format_dump_line_ subroutine, an assembly language procedure, provides a high speed conversion of eight words from a dump image into ASCII encoded octal word images. .spb 2 U__s_a_g_e .spb declare format_dump_line_ entry (ptr); .spb call format_dump_line_ (buf_ptr); .spb where buf_ptr points to the start of a buffer into which the formatted data may be deposited. (Input) .spb 2 .inl 0 N__o_t_e .spb The output data has Multics PAD characters, (177)8, embedded in order to speed up the conversion process.  format_355.an53.compin 10/22/84 1058.0rew 03/16/81 1706.0 22482 .ifi init_plm "AN53-01" .srv section 8 .ifi l1h "N__a_m_e: format_355_dump_line_" The format_355_dump_line_ subroutine is an ALM procedure that is called by the online_355_dump_ subroutine to produce an octal representation of one or more FNP words. In addition, there is another entry point, format_355_dump_line_$line, which produces an octal represention of one or more FNP words as well as an octal representation of two fixed binary numbers that are absolute and relative location counters. These are printed on a dump line by the online_355_dump_ subroutine. .ifi l1h "E__n_t_r_y: format_355_dump_line_" This entry point is called to convert one or more FNP words to their octal representation. .ifi l2h "Usage" .inl 10 .unl 5 declare format_355_dump_line_ entry (ptr, fixed bin, ptr); .spb .unl 5 call format_355_dump_line_ (input, count, output); .spb .inl 0 where: .spb .inl 12 .unl 12 1. input .brf points to the first of the FNP words to be dumped. This pointer must be pointing to an 18-bit aligned item. (Input) .spb .unl 12 2. count .brf is the number of FNP words to convert to octal. (Input) .spb .unl 12 3. output .brf points to the area in which to place octal representation of 355 words. This pointer must be pointing to a 9-bit aligned item. (Input) .inl 0 .ifi l2h "E__n_t_r_y: format_355_dump_line_$line" This entry point is called to convert one or more FNP words to their octal representation. In addition, it converts two fixed bin numbers to octal. These numbers are absolute and relative location counters. .ifi l2h "Usage" .inl 10 .unl 5 declare format_355_dump_line_$line entry (ptr, fixed bin, ptr, ptr, fixed bin, ptr, fixed bin); .spb .unl 5 call format_355_dump_line_$line (input, count, output, absp, absloc, relp, relloc); .spb .inl 0 where: .spb .fin .inl 12 .unl 12 1. 
input .brf same as above. (Input) .spb .unl 12 2. count .brf same as above. (Input) .spb .unl 12 3. output .brf same as above. (Input) .spb .unl 12 4. absp .brf is a pointer to the area in which to place the octal representation of the absolute location counter (argument 5). (Input) .spb .unl 12 5. absloc .brf is the absolute location currently being printed in the dump. (Input) .spb .unl 12 6. relp .brf is a pointer to the area in which to place the octal representation of the relative location counter (argument 7). (Input) .spb .unl 12 7. relloc .brf is the relative location currently being printed in the dump. (Input) .inl 0 .fin .brp  print_dump_segname.an53.compin 10/22/84 1058.0rew 04/15/81 1156.9 26982 .ifi init_plm "AN53-01" .srv section 8 .ifi l1h "N__a_m_e: print_dump_seg_name_" The print_dump_seg_name_ subroutine is called to print the Segment Descriptor Word (SDW), pathname, and reference names for a segment. It is used by the online segment dumper. .ifi l2h "Usage" .inl 10 .unl 5 declare print_dump_seg_name_ entry (fixed bin, fixed bin(71), ptr, ptr); .spb .unl 5 call print_dump_seg_name_ (segno, sdw, sstp, sstnp); .spb .fin .inl 0 where: .spb .inl 12 .unl 12 1. segno .brf is the segment number to be used. (Input) .spb .unl 12 2. sdw .brf is the SDW that is to be printed. (Input) .spb .unl 12 3. sstp .brf is a pointer to an image of the System Segment Table (SST) segment from the fdump. With the pointer sstnp, sstp is used to determine the names of nonhardcore segments. (Input) .spb .unl 12 4. sstnp .brf is a pointer to an image of the SST name table segment in the fdump. (Input) .inl .ifi l2h "Note" If sstp or sstnp is null or if the SST name table pointed to by sstnp is not valid, only the SDW breakout is printed. .ifi l2h "E__n_t_r_y: print_dump_seg_name_$hard" This entry point is called to print the SDW and name of a hardcore segment. The name printed is the first name given the segment in the Segment Loading Table (SLT) Name Table. .ifi l2h "Usage" .inl 10 .unl 5 declare print_dump_seg_name_$hard entry (fixed bin, fixed bin(71), ptr, ptr); .spb .unl 5 call print_dump_seg_name_$hard (segno, sdw, sltp, namep); .spb .inl 0 where: .spb .fin .inl 12 .unl 12 1. segno .brf is the segment number to be used. (Input) .spb .unl 12 2. sdw .brf is the SDW that is to be printed. (Input) .spb .unl 12 3. sltp .brf is a pointer to the SLT to be used to find the segment's name. (Input) .spb .unl 12 4. namep .brf is a pointer to the SLT names segment to be used to find the segment's name. (Input) .inl 0 .ifi l2h "Note" If sltp or namep is null or segno is outside the limits found in the SLT, only the SDW breakout is printed. .ifi l2h "E__n_t_r_y: print_dump_seg_name_$get_ptr" This entry point returns a pointer to an online copy of a segment to be examined by the online_dump command or one of its associated subroutines. It is useful because copies of nonwritable (i.e., procedure) segments are generally not dumped by BOS and are not present in the dump itself. .ifi l2h "Usage" .inl 10 .unl 5 declare print_dump_seg_name_$get_ptr entry (fixed bin, ptr, ptr, ptr); .spb .unl 5 call print_dump_seg_name_$get_ptr (segno, sstp, sstnp, segptr); .spb .fin .inl 0 where: .spb .inl 12 .unl 12 1. segno .brf is the segment number that was associated with some procedure segment in a dumped process. (Input) .spb .unl 12 2. sstp .brf is the same as for the print_dump_seg_name_ entry point. (Input) .spb .unl 12 3. sstnp .brf is the same as for the print_dump_seg_name_ entry point. (Input) .spb .unl 12 4.
segptr .brf is a pointer to a copy of the procedure to be examined. (Output) .inl 0 .fin .brp  online_355.an53.compin 10/22/84 1058.0rew 03/12/81 1820.1 9270 .ifi init_plm "AN53-01" .srv section 8 .ifi l1h "N__a_m_e: online_355_dump_" The online_355_dump_ subroutine is called by the od_355 command (see the online_dump command). It is passed a pointer to an FNP fdump. It processes the dump producing an octal memory dump, a printout of the FNP registers, and an interpretation of the FNP software trace table. This data is written using the ios_ subroutine on the stream od_output, which must be attached before calling the online_355_dump_ subroutine. .ifi l2h "Usage" declare online_355_dump_ entry (ptr); .spb call online_355_dump_ (dumpp); .spb where dumpp points to an FNP fdump. (Input) .ifi l2h "Note" The dump output begins with register values, trace table and memory contents. Memory is dumped eight words per line. Included on the line are the absolute location, module name, relative location in that module, and memory contents. Duplicate lines are not printed; instead an asterisk is put at the beginning of the next line. .inl 0 .fin .brp  od_stack.an53.compin 10/22/84 1058.0rew 03/12/81 1820.1 15291 .ifi init_plm "AN53-01" .srv section 8 .ifi l1h "N__a_m_e: od_stack_" The od_stack_ subroutine is used by the online_dump command to format and print stack segments. Its primary purpose is to break the stack into frames and to number them in sequence. .ifi l2h "Usage" declare od_stack_ entry (ptr, fixed bin, ptr, ptr, ptr, ptr); .spb call od_stack_ (stkp, stklen, sltp, namp, sstp, sstnp); .spb .inl 0 where: .spb .inl 12 .unl 12 1. stkp .brf is a pointer to the stack segment. (Input) .spb .unl 12 2. stklen .brf is the length of the stack in words. (Input) .spb .unl 12 3. sltp .brf is a pointer to a Segment Loading Table (SLT) to be used to determine names of hardcore segments. (Input) .spb .unl 12 4. namp .brf is a pointer to a name_seg to be used to determine the names of hardcore segments. (Input) .spb .unl 12 5. sstp .brf is a pointer to an image of the SST segment from the fdump. With the pointer sstnp, it is used to determine the names of nonhardcore segments. (Input) .spb .unl 12 6. sstnp .brf is a pointer to an image of the SST name table segment in the fdump. (Input) .inl 0 .ifi l2h "Notes" The frames are numbered with the lowest number being at the head of the stack. If in some frame stack_frame.next_sp is equal to stack_header.stack_end_ptr, then that frame is numbered zero. If that is not the case for any frame, then the stack header is given number zero. The stack trace continues beyond stack_end_ptr so long as the back pointers are good. .spb 2 If any stack frames have been jumped over by syserr in its attempt to preserve the stack history, then these frames are also broken out and numbered XX. .inl 0 .fin .brp  od_print.an53.compin 10/22/84 1058.0rew 03/12/81 1820.1 34011 .ifi init_plm "AN53-01" .ifi l1h "N__a_m_e: od_print_" The od_print_ subroutine provides a page and line formatting capability for the online_dump command. .ifi l2h "E__n_t_r_y: od_print_" This entry point provides a general formatting capability for the online_dump command equivalent to that of the ioa_ subroutine (described in the M_P_M_ S__u_b_r_o_u_t_i_n_e_s, Order No.!AG93). .ifi l2h "Usage" declare od_print_ entry; .spb call od_print_ (nlines, fmt_string, arg_1, ..., arg_n); .spb .inl 0 where: .spb .inl 12 .unl 12 1.
nlines .brf is an integer value (fixed bin) denoting the number of lines to be generated during formatting. (Input) .spb .unl 12 2. fmt_string .brf is the control string (char(*)) used to produce the desired output. Formatting control characters are identical to those of the ioa_ subroutine. (Input) .spb .unl 12 3. arg_i .brf are the arguments required by the fmt_string argument. (Input) .inl 0 .ifi l2h "E__n_t_r_y: od_print_$op_fmt_line" This entry point is called to format and print eight words in octal with their associated location field. .inl 0 .ifi l2h "Usage" .inl 10 .unl 5 declare od_print_$op_fmt_line entry (fixed bin, fixed bin, .brf (0:7) fixed bin); .spb .unl 5 call od_print_$op_fmt_line (abs_loc, loc, arr); .spb .inl 0 where: .spb .inl 12 .unl 12 1. abs_loc .brf is the absolute location the data occupied. (Input) .spb .unl 12 2. loc .brf is the integer value to be printed as the offset for the line. (Input) .spb .unl 12 arr .brf is the array of words to be printed. (Input)w .inl 0 .ifi l2h "E__n_t_r_y: od_print_$op_finish" This entry point is called when dumping is finished to transfer the last buffer of formatted characters to the I/O switch. (See "Notes" below.) .ifi l2h "Usage" declare od_print_$op_finish entry; .spb call od_print_$op_finish; .spb There are no arguments. .ifi l2h "E__n_t_r_y: od_print_$op_new_seg" This entry point is used to inform the od_print_ subroutine that a new segment is being printed. This is done so that a new page may be started with the new segment number included in the page header. .ifi l2h "Usage" declare od_print_$op_new_seg entry (fixed bin); .spb call od_print_$op_new_seg (segno); .spb where segno is the number of the segment to be printed next. (Input) .ifi l2h "E__n_t_r_y: od_print_$op_init" This entry point is called to initialize the od_print_ subroutine and provide certain constant information for the page header. .ifi l2h "Usage" .inl 10 .unl 5 declare od_print_$op_init entry (fixed bin, fixed bin(71)); .spb .unl 5 call od_print_$op_init (erfno, time); .spb .inl 0 where: .spb .inl 12 .unl 12 1. erfno .brf is the error report form number associated with the dump. (Input) .spb .unl 12 2. time .brf is the time at which the dump was created. (Input) .inl 0 .ifi l2h "E__n_t_r_y: od_print_$op_new_page" This entry point may be called when the next line of output should appear on a new page. .ifi l2h "Usage" declare od_print_$op_new_page entry; .spb call od_print_$op_new_page; .spb There are no arguments. .ifi l2h "Notes" Formatted data is internally buffered so that the I/O switch is called less often. .spb 2 In order to speed up the online dumper's operation, formatted data as passed to the I/O switch contains ASCII NUL characters, (000)8. .spb 2 The size of the formatted string may not exceed 256 characters. .spb 2 A new page header is printed before the currently requested line is printed, if the number of lines currently formatted on the page, plus the number of lines for the current request, exceeds the number of lines per page. .inl 0 .fin .brp  get_dump_ptrs.an53.compin 10/22/84 1058.1rew 03/12/81 1821.6 13932 .ifi init_plm "AN53-01" .ifi l1h "N__a_m_e: get_dump_ptrs_" The get_dump_ptrs_ subroutine returns pointers to the component segments of a fdump, given the ASCII representation of the error report form number for the fdump. 
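.spb 2
As orientation, the sketch below shows how a dump analysis tool might combine get_dump_ptrs_ with the copy_dump_seg_ subroutine (described earlier in this section) to obtain a private copy of one segment of a dump. The calling sequences are the documented ones; the variable names, the erfno "45", and the choice of the first process are merely illustrative:
.spb
.fif
declare get_dump_ptrs_ entry (char(*), (0:9) ptr, (0:9) fixed bin,
     fixed bin, char(32) aligned);
declare copy_dump_seg_ entry (fixed bin, fixed bin, (0:9) ptr,
     (0:9) fixed bin, ptr, fixed bin);
dcl ptr_array (0:9) ptr;
dcl len_array (0:9) fixed bin;
dcl (segno, nsegs, outlen) fixed bin;
dcl primary_name char (32) aligned;
dcl outptr ptr;            /* -> a scratch segment supplied by the caller */

call get_dump_ptrs_ ("45", ptr_array, len_array, nsegs, primary_name);
if nsegs = 0 then return;  /* the fdump segments could not be initiated */

/* Assume segno has been set to the desired segment number; the 0 below
   selects the first process (the index in ptr_array of its segment 0). */
call copy_dump_seg_ (segno, 0, ptr_array, len_array, outptr, outlen);
if outlen = 0 then return; /* the segment is not in the dump */
.fin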
.ifi l2h "Usage" .inl 10 .unl 5 declare get_dump_ptrs_ entry (char(*), (0:9) ptr, (0:9) fixed bin, fixed bin, char(32) aligned); .spb .unl 5 call get_dump_ptrs_ (erfno, ptr_array, len_array, nsegs, primary_name); .spb .inl 0 where: .spb .inl 12 .unl 12 1. erfno .brf is the ASCII representation of the error report form number of the fdump. (Input) .spb .unl 12 2. ptr_array .brf is filled in with pointers to the component segments of the fdump. (Output) .spb .unl 12 3. len_array .brf is an array of the current lengths of the component segments of the fdump. (Output) .spb .unl 12 4. nsegs .brf is the number of segments that make up the fdump, i.e., the number of pointers returned. If this number is 0, there was some trouble initiating the specified fdump segments. (Output) .spb .unl 12 5. primary_name .brf is the entryname of the first segment of the fdump. (Output) .inl 0 .ifi l2h "Note" The format of standard fdump names is as follows: .spb 2 mmddyy.hhmm.i.erfno .spb 2 where: .spb .fin .inl 12 .unl 12 mmddyy .brf is the date of the fdump. .spb .unl 12 hhmm .brf is the time of the fdump. .spb .unl 12 i is an integer from 0 to 9 indicating which fdump segment it is. .spb .unl 12 erfno .brf is the error report form number of the fdump. .inl 0 .fin .brp  get_ast_name.an53.compin 10/22/84 1058.1rew 03/12/81 1821.8 14769 .ifi init_plm "AN53-01" .ifi l1h "N__a_m_e: get_ast_name_" The get_ast_name_ subroutine is called by the online_dump command to obtain the pathname of a segment from copies of the SST and SST name table segments. .spb 2 The get_ast_name_ subroutine assumes that the SST name table supplied was validly filled (by either Multics or BOS). The get_ast_name_ subroutine tries to fit the full primary_name pathname of the specified segment in the supplied return string. If it cannot fit, components of the pathname are removed (recognizable as ">>" in the output string) towards the left-hand end of the pathname. The get_ast_name_ subroutine never truncates a pathname, and thus, the entryname always appears intact. If the get_ast_name_ subroutine cannot obtain the pathname, the message "CANNOT GET PATHNAME" is returned. .ifi l2h "Usage" declare get_ast_name_ entry (ptr, ptr, ptr, char(*)); .spb call get_ast_name_ (astep, sstp, sstnp, retstr); .spb where: .spb .inl 12 .unl 12 1. astep .brf is a pointer to the Active Segment Table Entry (ASTE) of the segment whose pathname is desired. The ASTE must be in the segment pointed to by sstp. (Input) .spb .unl 12 2. sstp .brf is a pointer to the copy of the SST segment to be used to determine the pathname. sstp must point to the base of a segment. (Input) .spb .unl 12 3. sstnp .brf is a pointer to the copy of the SST name table segment to be used to determine the pathname. sstnp must point to the base of a segment. (Input) .spb .unl 12 4. retstr .brf is the pathname of the segment whose ASTE is pointed to by astep. (Output) .inl 0 .fin .brp  print_aste_ptp.an53.compin 10/22/84 1058.1rew 03/12/81 1821.8 7245 .ifi init_plm "AN53-01" .srv section 8 .ifi l1h "N__a_m_e: print_aste_ptp, pap" The print_aste_ptp command prints out the Active Segment Table Entry (ASTE) and page table of the specified segment. If any pages are in core, the device address is extracted from the appropriate Core Map Entry (CME) and printed out as well. .inl 0 .ifi l2h "Usage" print_aste_ptp segment .spb where segment is either the pathname or the segment number of the segment whose STE is to be printed. 
If the argument is an octal number, it is taken to be the segment number of the segment to be printed. If the argument is a pathname, the specified segment is printed. If the segment cannot be found, the ring 0 segments are searched to see if the segment name given specifies a ring 0 segment .brp .fin .inl 0  s8.an53.compin 10/22/84 1058.1rew 03/16/81 1654.3 2565 .ifi init_plm "AN53-01" .srv section 8 .ifi l0h "Command Descriptions" This section contains the command descriptions needed to analyze dumps. Some of these commands have been referenced in previous sections. .spb 2 The command descriptions are arranged alphabetically. .brp  ol_dump.an53.compin 10/22/84 1058.1rew 04/14/81 1819.5 123030 .ifi init_plm "AN53-01" .ifi l1h "N__a_m_e: ol_dump" The ol_dump command can be used to look at selected parts of an online dump created by the BOS FDUMP command and copied into the Multics hierarchy by the copy_fdump command. The command is designed to aid system programmers in .cbd the task of crash analysis. .*.cbn .ifi l2h "Usage" .*.cbf ol_dump {erfno} {-control_arg} .spb where: .spb 1 .inl 10 .unl 10 1. erfno .brf .*.cbn is an error report form number given in decimal, or "last" if the latest dump taken is to be selected. If erfno is not specified, ol_dump enters its request loop described below. If an erfno is given, ol_dump searches its currently referenced dump directory (see below) for a copy of the dump; if it finds the dump, it initializes itself to be able to process the given dump. If the dump is not found, the user is told and the request loop is entered. .spb 1 .unl 10 .*.cbn 2. control_arg .brf can be -pathname path or -pn path to specify a directory pathname where online dumps are to be found. If the -pathname arg is not given, the default dump directory of >dumps will be used. .*.cbf .inl 0 .ifi l2h "Request Loop" Once ol_dump has processed the erfno argument, it enters a loop reading requests from user_input. The requests allow the user to look at selected regions of the dump currently under analysis or to choose another dump (erfno) for analysis. The following requests are implemented (letters in parentheses are abbreviations): .spb 1 .inl 10 .unl 5 .*.cbn erf arg .brf selects another dump for immediate analysis; arg can be either an erf number or "last" if the latest dump taken is to be analyzed. .*.cbf .spb .unl 5 quit (q) .brf returns. .spb .unl 5 .cbn command (c) or (..) .cbf .brf passes the rest of the request line onto the current command processor. .spb .unl 5 list (l) .brf .*.cbn lists the dumps in .*.cbf the current dump directory by showing the name of the first component of the dump. The names of dumps tell when the dump was taken and what the erfno is. .spb .unl 5 help (?) .brf lists the requests of ol_dump. .spb .*.cbn .unl 5 dump (d) {arg1 arg2 arg3 arg4 arg5} .brf displays selected words located in the current dump under analysis. Arguments are as follows: .spf .inl 23 .unl 13 arg1 must be one of the following: .spf .unl 7 seg s displays selected words from segment "s" in the current process where "s" may be a segment number or name. If no other arguments are specified, then the entire segment is dumped in octal. .spf .unl 7 mem displays selected words starting at absolute memory location indicated by arg2. A search is made of all running process's descriptor segments and AST/PT entries. If the requested address is found, the segment number and name, segment offset, and the process DBR value is output as well as the requested number of words. 
If the requested memory address is not found, it is assumed to be in free store. .spf .inl 17 .unl 6 arg2 segment offset (if arg1 is seg s) or the absolute memory address (if arg1 is mem). .spf .unl 6 arg3 if the first character of arg3 is a "+" or "-", then the rest of arg3 is either added or subtracted (as an octal number) from the base of arg2. If the first character of arg3 is not a "+" or "-", then arg3 is the number of elements to be dumped. .spf .unl 6 arg4 if the "+" or "-" option of arg3 is present, then arg4 is the number of elements to be dumped. If the "+" or "-" option was not used with arg3, then arg4 is any of the output modes used with the debug command ("o" for octal, "a" for ASCII, "p" for pointer, "i" for instruction format, etc.). If the instruction mode ("i") is used, and if the requested segment is not found in the dump (only segments with read and write access are found in the dump; this usually precludes executable object segments from being dumped), then a search of the library directories is made. If the segment is then found, the segment is dumped in instruction format. .spf .unl 6 arg5 if present, arg5 is used to specify the output mode as above. .spf .inl 10 .unl 5 dbr arg {n} .brf switches to another process (in the same dump). Arguments are as follows: .spf .inl 17 .unl 7 cpu switches to process that is executing on CPU n. n maybe either the cpu number (0 to 7) or cpu tag (a to h). .spf .unl 7 value switches to another process by specifying the dbr value for the new process. .brf .cbd .spf .inl 10 .unl 5 name (n) segno {offset} .brf displays the SLT or SST name. .spf .inl 18 .unl 8 segno is a segment number. .spf .unl 8 offset displays the bound segment name as well as the component name and the relative offset in that component (if the specified segment is bound). .brf .cbd .spf .inl 10 .unl 5 amsdw (ams) {prds} .brf displays the saved contents of the SDW associative memory. If the optional prds argument is present, the saved SDW associative memory in the prds is displayed. If the prds argument is not present, then the saved contents of the SDW associative memory at the time of the dump in the bootload CPU is displayed. .spf .unl 5 amptw (amp) {prds} .brf displays the saved contents of the PTW associative memory. If the optional prds argument is present, the saved PTW associative memory in the prds is displayed. If the prds argument is not present, then the saved contents of the PTW associative memory at the time of the dump in the bootload CPU is displayed. .spf .unl 5 syserdta (sdta) .brf displays the message entries in the wired message segment "syserr_data". .spf .unl 5 syserlog (slog) n .brf displays the specified number of message entries in the paged message segment "syserr_log" starting with the most recent entry. .spf .unl 5 proc (p) arg .brf displays some APT data for the process specified. Arguments are as follows: .spf .inl 15 .unl 5 all displays all of the APT entries. .spf .unl 5 cur displays only the APTE for the current process (as defined by the dbr value). .spf .unl 5 run displays only those APTEs that are currently executing on the configured CPUs. .spf .unl 5 rdy displays only those APTEs whose execution state is "ready". .spf .unl 5 wat displays only those APTEs whose execution state is "waiting". .spf .unl 5 blk displays only those APTEs whose execution state is "blocked". .spf .unl 5 stp displays only those APTEs whose execution state is "stopped". .spf .unl 5 emp displays only those APTEs whose execution state is "empty". 
.spf .unl 5 n displays APTE whose number is n. .spf .inl 10 .unl 5 .cbn stack (s) seg {os args lg fwd} .cbf .brf displays a stack trace of the stack segment specified by seg. .spf .inl 15 .unl 5 seg is either a segment name or number or the key word "ring" in which case, the next arg would be the ring number of the stack to be traced. (i.e. stack ring 0). .spf .unl 5 os starts trace at the frame whose offset is os and continues to the end of the stack. If the offset argument is omitted, then the trace is started at the stack base. The segment name of the return pointer is displayed for all segments. If the name is a bound segment, the component name as well as the relative offset is displayed in the form "bound_seg$comp_name|offset". If the return pointer indicates "pl1_operators", then Pointer Register 0 is picked up and used instead. This is indicated by the flag "[pr0]" being displayed after the segment name. .spf .unl 5 args displays the stack frame arguments in interpreted format. .spf .unl 5 lg produces an octal dump of each stack frame, as well as interpreting the arguments as above. .spf .unl 5 .cbn fwd starts the trace at the beginning of the stack (as defined by the stack begin_pointer) and continues to the end of the stack (as defined by the stack_end pointer). .cbf .spf .inl 10 .unl 5 mcprds (mcpr) arg {lg} .brf displays the PRDS machine conditions for the specified argument. Only an interpreted version of the SCU data is displayed unless the "lg" argument is used. .spf arg can be one of the following: .inl 20 .spf .inl 25 .unl 10 int displays machine conditions for prds|interrupt data. .spf .unl 10 systroub displays machine conditions for prds|system trouble data. .spf .unl 10 fim displays machine conditions for prds|fim data. .spf .unl 10 all displays all machine condition save areas in PRDS. .spf .inl 15 .unl 5 lg displays pointer registers and processor registers as well as SCU data. .spf .inl 10 .unl 5 mcpds (mcp) arg {lg} .brf displays the PDS machine conditions for the specified argument. Only an interpreted version of the SCU data is displayed unless the "lg" argument is used. .spf arg can be one of the following: .spf .inl 22 .unl 7 pgflt displays machine conditions for pds|page fault data. .spf .unl 7 fim displays machine conditions for pds|fim data. .spf .unl 7 sig displays machine conditions for pds|signal data. .spf .unl 7 all displays all machine condition save areas in PDS. .spf .inl 15 .unl 5 lg displays pointer registers and processor registers as well as SCU data. .spf .inl 10 .unl 5 mc arg1 arg2 {arg3 lg} .brf displays machine conditions from anywhere. Only the SCU data is displayed unless the "lg" argument is used. Arguments are as follows: .spf .inl 17 .unl 7 arg1 segment name/number or "cond", to display machine conditions from a condition frame specified in the following arguments. .spf .unl 7 arg2 segment offset, if arg1 is a segment name/number, or segment name/number, if arg1 is equal to "cond". .spf .unl 7 arg3 if arg1 is equal to "cond", then arg3 is the segment offset of the condition frame. If arg3 is not specified, and arg1 is equal to "cond", then the entire stack segment (from arg2) is searched for a condition frame. In this case the first condition frame found (starting from the stack_end_ptr and working toward the stack_begin_ptr) is displayed. .spf .unl 7 lg displays the pointer registers and processor registers as well as the SCU data. 
.spf .inl 10 .unl 5 dumpregs (dregs) {arg} .brf displays the processor registers that were saved at the time of the dump, from the bootload CPU. If no arguments are given, all of the registers are displayed. The optional arguments are as follows: .spf .fif ptr displays the pointer registers only. .spf preg displays the processor registers only. .spf scu displays the saved SCU data only. .spf all displays all of the above. .fin .spf .unl 5 lrn {segno1 segno2} .brf displays a breakout of the descriptor segment (dseg) by printing the SDWs, segment numbers, and names for the specified segment numbers of dseg. If no optional arguments are given, the descriptor segment is broken out from segment number 0 to the last segment in dseg. .spf .inl 18 .unl 8 segno1 segment number at which breakout starts. .spf .unl 8 segno2 segment number at which breakout stops. If this argument is not given, but segno1 is, breakout continues to the end of dseg. .inl 10 .spf .unl 5 segno (segn) name .brf displays the segment number for a given entry name. .spf .unl 5 ssd arg {paths} .brf allows the user to specify up to three directories for finding offsets and bindmaps for hardcore segments. The default directory is >ldd>hardcore>object. .spf arg can be chosen from the following: .spf .inl +12 .unl 5 pr displays the current directories searched. .spf .unl 5 def resets the directories searched to the default value. .spf .inl 17 .unl 7 paths pathnames of directories to search (maximum of three). If no arg argument is given, at least one path argument must be given. When more than one path argument is specified, the directories are searched in the order specified. .inl 10 .spf .unl 5 hisregs (hregs) arg1 {arg2 arg3} .brf displays a composite analysis of the processor history registers. Arguments are as follows: .spf arg1 may be chosen from the following: .spf .inl +12 .unl 6 pds displays the stored history registers from the PDS. .spf .unl 6 dmp displays the history registers stored at the time of the dump by the bootload processor. .spf .unl 6 help displays a list of the abbreviations used in the history register analysis, and their meaning. .spf .unl 6 seg may be a segment name or number. .spf .unl 6 cond displays history registers from a condition frame, the location of which is described by arg2 and arg3. .spf .inl 16 .unl 6 arg2 if arg1 is "seg" then arg2 describes the segment offset to the beginning of the history register area. If arg1 is "cond" then arg2 defines the segment name or number for the desired history registers. .spf .unl 6 arg3 if arg1 is "cond", then arg3 describes the segment offset to the start of a condition frame. If arg3 is not present and arg1 is "cond", then the entire stack segment (specified by arg2) is searched for a condition frame. .spf .inl 10 .unl 5 pcd {arg} .brf displays the contents of the "config_deck" segment in an interpreted fashion. Arguments can be any one of the card types found in the configuration deck (cpu, mem, prph, etc.). The pcd command will process from 1 to 32 arguments. If no arguments are given, the entire config deck is displayed. .spf .unl 5 ast (pt) name .brf displays the AST entry and page table for the given segment. Name may be a segment name or number. .spf .unl 5 queue (tcq) .brf displays the scheduler's priority queue in order of priority. .spf .unl 5 .cbn dumpdir path .brf sets the dump directory to that specified by path. .spf 2 .inl .unl -5 If the request line is none of the above, an error message is displayed and the request loop is re-entered.
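.spb 2
For example, the following session (the offset and count are illustrative octal values) selects the most recent dump, switches to the process that was running on CPU a, traces its ring 0 stack, and then displays 20 (octal) words of the sst segment starting at offset 600:
.spb
.fif
ol_dump last
dbr cpu a
stack ring 0
dump seg sst 600 20
quit
.fin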
.cbf .* If the request line is none of the above, the _e_n_t_i_r_e line is .*passed directly to the current command processor. .*.cbf  s2.an53.compin 10/22/84 1058.1rew 03/16/81 1537.5 567873 .ifi init_plm "AN53-01" .srv section "2" .ifi l0h "Crash Analysis in the Ring Zero Environment" This section provides some basic knowledge about the supervisor environment necessary to anyone analyzing a dump regardless of the cause for the crash. Some fundamental knowledge about Multics process environments relevant to this task is reiterated here as well; the reader should be reasonably conversant with most of the material in the MPM Reference Guide (Order #AG91) and the MPM Subsystem Writer's Guide (Order #AK92). .ifi l1h "Segment numbers in the Supervisor" The entire supervisor environment is built by system initialization at bootload time. Because the supervisor is at a lower level than the standard linkage fault mechanism, it cannot use it, and thus, all of the supervisor must be "prelinked" at bootload time. The supervisor (unless constructed improperly while debugging) will never take linkage faults. .spb 2 Thus, the correspondence of segment number to segment in the supervisor is fixed at bootload time, and is in fact known at system tape generation time (the check_mst tool will display this correspondence under the heading "Loading Summary"). The segment numbers of all parts of the supervisor are the _s_a_m_e in _a_l_l _p_r_o_c_e_s_s_e_s running under the same system tape. When a process is created, the "supervisor region" (lower-numbered segments) of its descriptor segment is copied from that of the Initializer (with the exception of a handful of per-process SDWs, to be discussed below). This ensures that pointers made and stored by the supervisor in supervisor data bases, including links in the supervisor's linkage sections, have the same meaning in all processes. Therefore, while one must qualify a statement like "The segment number of bound_pl1_ is 332" with a statement of which process this correspondence was observed in, one can say unambiguously "the segment number of bound_page_control is 34" given a single system tape. The correspondence of segment number to segment is defined in the Segment Loading Table (SLT). .spb 2 The supervisor has one set of linkage segments, built at bootload time. Of the two which survive past initialization time, a distinction is drawn between the segments active_sup_linkage and wired_sup_linkage. The latter is fully resident in main memory at all times, being the combined linkage of procedures which must (at times) operate without taking page faults. The linkage sections of the supervisor are found via the supervisor's LOT (Linkage Offset Table), which, like the linkage sections themselves, is one per system. All stacks used in ring zero designate this LOT as the LOT to be used. Since there is only one set of supervisor linkage sections, symbolic references such as "tc_data$initializer_id" designate the same object, no matter in what process the supervisor code runs. .spb 2 As stated above, a small set of segment numbers used by the supervisor do _n_o_t designate the same object in every process. For example, the SDW for the descriptor segment itself is placed into a fixed slot in a descriptor segment being created. This is currently (but _n_o_t _o_f _n_e_c_e_s_s_i_t_y!) slot 0, which is the segment number corresponding to the name "dseg". 
Therefore, a reference to "dseg$" in any supervisor program will always reference segment #0 in whatever process it runs, although this will be a different physical _s_e_g_m_e_n_t in each process. Similarly, the process data segment (PDS) of a process, which contains identifying information and critical state information (see below), the Known Segment Table (KST), and the various multiplexed stacks (see below) also fall into this category. Although "they" have the same segment numbers in each process, the segment numbers refer to different objects. A reference to "pds$processid" in a program in the supervisor is a reference to (for example) segment 56, word 320, no matter what process it runs in, but as to what segment 56 _i_s, which is to say, _w_h_o_s_e PDS, this is a function of process. .ifi l1h "Per-process supervisor segments" Each process has several segments used by the supervisor unique to that process. They are created at the time the process is created, by the Initializer, as segments in the new process' process directory. They are forced to be permanently (until process destruction) active, so that SDW's describing them may be used without the possibility of segment fault. .spb 2 .ifi l2h "The Descriptor Segment" Perhaps the most important of these is the Descriptor Segment, which is used by the hardware to define exactly which segments constitute the process. Because the descriptor segment is used by the hardware, there is no space in it for auxiliary information about the process or its segments; it consists only of Segment Descriptor Words (SDW's), some put there at process creation time, some swapped at scheduling time, and the rest managed by segment control in response to segment faults, deactivations, and the like (See the Storage System PLM, Order# AN61 for a full description of these activities). .spb 2 The first page of the descriptor segment is wired (locked in main memory) when a process is made eligible by the traffic controller. The wiring of this page (along with the first page of the PDS, see below) at eligibity-awarding time is called _l_o_a_d_i_n_g, and is a complex multipass operation involving a subtle interaction between traffic control and page control (see AN61 for page control's view of it). .bbf Loading is not actually performed at eligibility grant time, but at the time a process is picked up by the traffic controller which is not loaded, yet ought be. The code of pxss is the only reference source for those wishing to master this subtlety. .bef Since a page fault or interrupt (or connect fault or timer runout) can occur at any time in a process without warning, the SDW's for those segments that must be involved in the handling of these faults and interrupts must be in main memory _i_n _a_d_v_a_n_c_e. Thus, we group all SDW's for such segments in the first page of the descriptor segment (segment numbers 0 to 511 decimal) and ensure the residency of that page in main memory when a process runs. .ifi l2h "The PDS (Process Data Segment)" The most important supervisor software data base in a process is surely the PDS (Process Data Segment). Like the Descriptor Segment, it is created with the process, its SDW placed in the Descriptor Segment at process creation time, and referenced symbolically by the supervisor by a system-wide segment number, viz., the SLT value of the symbol "pds$". 
The PDS contains _a_l_l supervisor information about a process which .spb .bbk 1) Is not needed except when running in that process (Globally needed info would be in tc_data) 2) Is not the actual SDW's for segments (these are in the descriptor segment) 3) Is not the segment number to segment mapping (that is stored in the KST of the process) 4) Is not the stack frames of procedures (stacks are used for that, and discussed below. .bek .spb Among the information stored in the PDS is the identity of the process (its 36-bit process ID and "group ID" (e.g., "Qaddafi.Libya.a")), machine conditions for page faults and pre-empts (see the discussion "Finding the right fault data" below), the per-process page-fault trace, the per-process lock array, and so on. The first page of the PDS is wired when a process is loaded, as a process can take a page fault at _a_n_y _t_i_m_e, and page fault data can be stored in this page of the PDS with no advance warning or preparation. The end of the PDS trails off in an array of SCU data and EIS data for unrestarted faults which have been signalled. It is used to validate user changes to signalled machine conditions prior to restart. A discussion of what to look for in the PDS when analyzing crashes appears in Section V. .ifi l2h "The KST (Known Segment Table)" The remaining per-process data base of the supervisor is the Known Segment Table, or KST. Like the PDS and Descriptor Segment, it is created in a nascent process' process directory, by the Initializer, and its SDW placed in a fixed slot in the new descriptor segment. In addition to a small amount of header information describing its contents, the KST contains an array defining the non-supervisor segment numbers in the process. It is managed by address space management in the process, in response to (sometimes implicit) making-knowns and terminations of segments. It is used by the segment fault mechanism to resolve segment faults, which is to say, to find out what segment in the permanent Storage Hierarchy is meant when the hardware faults on a missing SDW the first time a segment is referenced in a process. More information about the KST and its format is found in Section V. .ifi l1h "Stacks" The Multics PL/I environment which is used not only by all Multics user programs, but all parts of the Operating System as well, requires a LIFO organization of procedure activation records. This is implemented as a _s_t_a_c_k, a dedicated segment upon which an array of _s_t_a_c_k _f_r_a_m_e_s, or activation records, are layed out. A _s_t_a_c_k _h_e_a_d_e_r resides at the beginning of the stack segment (see stack_header.incl.pl1), and not only defines the extent of valid data on the stack, but contains pointers needed by the PL/I program environment to locate critical data (for example, the LOT (Linkage Offset Table), which locates the linkage section for each procedure in a process). .spb 2 Stack frames contain a fixed header (see below), which includes threads to the previous and next stack frame. The stack header defines where the first frame is, or is to be placed (stacks may be empty) via the pointer stack_header.stack_begin_ptr, and where the next frame is to be placed (stack_header.stack_end_ptr). .spb 2 A process has one stack segment for each ring it has ever used, other than Ring Zero. The obviously crucial case of Ring Zero will be dealt with shortly. 
A set of segment numbers defined by the hardware value of dbr.stack_base (that value times eight through that value times eight plus seven) defines the stack segment numbers of a process (currently, this is the same for all processes, and is a number over 200 octal). When the hardware crosses into an inner ring, Pointer Register 7 (SB) is loaded with a pointer to the base of the appropriate stack segment for the target ring, this segment number being computed from dbr.stack. The first time the process tries to reference a segment number in this range, the segment fault handler (seg_fault) notices this (from a null pointer in the relevant element of the array pds$stacks), creates a stack segment, and initializes it properly. Stacks are normal Storage System segments named stack_1, stack_2, etc. (one per ring) in the process directory of the process to which they belong. .spb 2 A variety of segments of different types are used by the supervisor as stacks in ring zero. This is because unusual requirements, resulting from the nature of the supervisor's work, are placed upon such segments. For instance, at the time a page fault is taken, a stack which is resident in main memory is needed. At the time the page fault occurs, it is too late to "wire down" (assure main memory residency dynamically for) the segment to be used as a stack (what shall we use for a stack during this latter proposed operation?). Thus, there must be a wired (resident in main memory) stack for each processor. .bbf Only a processor can take a page fault: processes that are not currently on a processor can't take page faults. Thus, with a bit of cleverness, only one "wired" stack per processor is needed, which requires us not to lose a processor, or put process state-information in it, while using it. Note that the very early versions of Multics indeed had a wired stack per process. .bef .spb 2 The same argument applies to segment faults. At the time a segment fault occurs, the supervisor needs a segment to use as a stack during segment fault handling. It cannot deal with a stack segment which may not be active: what shall it run on as a stack while activating _t_h_a_t? So it would seem that we need an "always-active" segment per process, for use as a stack too. .spb 2 The argument stops there. There are basically two ring zero stacks needed by a process, one which is simply guaranteed active and present at all times, and one which is additionally guaranteed to be resident in main memory. (Were consumption of main memory not an issue, these two could as well be one, totally wired in main memory). As to the rest of the PL/I environment, such as the linkage sections and data bases of Ring Zero, these things are dealt with on a per-system basis, and do not require the dynamic treatment afforded stacks. .ifi l2h "The PRDS as Stack" The PRDS, the per-processor data segment, is multiplexed for use as a stack by page fault and interrupt handlers. This is because it is both per-processor and guaranteed fully resident in main memory. This makes it ideal for such use. All of the per-processor data items in the PRDS of a processor reside in a region between the stack header and the place where the first frame would go (as pointed to by prds|stack_header.stack_begin_ptr).
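.spb 2
The following fragment suggests, purely as a sketch and not as a copy of any actual supervisor routine, how code can tell whether the PRDS is currently in use as its stack, by comparing the segment number of its stack pointer with that of prds$ (the declaration of prds$ here serves only to obtain a pointer to the base of the prds segment):
.spb
.inl 5
.fif
.bbk
running_on_prds: proc (cur_sp) returns (bit (1) aligned);
dcl  cur_sp ptr parameter;              /* caller's current stack pointer */
dcl  prds$ bit (36) aligned external static;
                                        /* declared only to obtain the base of prds */
dcl  (addr, baseno) builtin;
     return (baseno (cur_sp) = baseno (addr (prds$)));
end running_on_prds;
.bek
.fin
.inl 0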
.spb 2 Since the PRDS is a per-processor resource, those programs that use it must ensure that the binding between process and processor of the process in which they are running does not change while they are using it: thus, interrupts occur, set up stack frames on the PRDS (the interrupt interceptor module, ii, does this), are processed, and return, abandoning the PRDS stack history. Page faults are handled similarly. To ensure this condition, as well as for other reasons, procedures using the PRDS as a stack may not take page faults, and must mask against interrupts. .bbf Note that timer runouts and connects are not only acceptable, but necessary, if page control is to be able to manage the associative memories, processors are to be reconfigured, etc. Connect faults and timer runouts received while "running on the PRDS" are simply a case of "timer runouts and connect faults received while running in ring zero", and not a special case at that. .bef .spb 2 The case of traffic control's use of the PRDS as a stack is a special one. There are two cases of interest, those entries to traffic control which can give up the processor (e.g., block, wait) and all others. In the latter case, the traffic controller entry mechanism "switches" to the PRDS as a stack .bbf As a matter of fact, the name of the traffic controller, pxss, originally stood for "process exchange switch stack": there was another module named "px" (process exchange) to which it interfaced. .bef as a cheaper technique for what is usually accomplished by stack-wiring (see below). As a price for this optimization, it must be coded in ALM and take great pains to ensure that it can reference its arguments without taking page faults after it has switched stacks. (This is the purpose of the PDS "variables" pds$arg_1, pds$arg_2, etc.) It must take even greater pains to ensure the semantic validity of its arguments because of the technique just described. .spb 2 In the case of those entries that _c_a_n give up the processor, the traffic controller encodes the state of the process in a slew of variables (apte.savex7, pds$last_sp, pds$pc_call, pds$page_fault_data, pds$ipc_block_return), making sure that _n_o _p_a_r_t _o_f _t_h_e _s_t_a_t_e _o_f _t_h_e _p_r_o_c_e_s_s is embedded in the PRDS stack history. This enables the traffic controller to switch into another process, place the SDW for the PRDS it was using in the new process' DSEG (thus ensuring the continuity of the PRDS/processor association), and continue using that _p_r_o_c_e_s_s_o_r's stack history on the PRDS. .spb 2 The case of page control abandoning its stack history on the PRDS is a subcase of the traffic controller PRDS policy described above. .spb 2 The length of the PRDS (all PRDS's) is fixed at bootload time. Occasionally, it is "not enough". Since sites have the responsibility of writing communications demultiplexers, which can consume arbitrary extents of stack at interrupt time, a site can easily encounter "boundsfaults on the PRDS" (which will be detected by fim and crash the system with a BOS-reported message). Occasionally pathological call sequences in Ring Zero can cause this, too. The proper solution to such an occurrence is modification of the MST header and tc_data$prds_length. .ifi l2h "Paged Ring Zero Stacks" Segments to be used by the supervisor as stacks in Ring Zero, when there is no wiredness requirement, must still not be subject to segment faults, as discussed above.
Before Release 8.0, the PDS had a region used as a stack in this way, for it satisfied the requirements in this regard. However, the need for these segments to be maintained active, as well as the paging implications of paging new PDS's in when processes became eligible, led to the design of the current strategy, for performance reasons. .spb 2 A group of segments, named stack_0.nnn, nnn from 1 to the largest number (tc_data.max_max_eligible) of processes allowed to be simultaneously eligible during the bootload, is created in >system_library_1 at initialization time. They are forced active, and their SDW's placed in the segment stack_0_data. They are assigned to processes as processes become eligible, and taken back as eligibility is lost. As the free list is a LIFO stack, the chances of a stack just freed being largely paged-in when it is given to a new process are intentionally high. .spb 2 This policy relies upon the fact that only eligible processes can have a ring zero stack history. While it is true that only _r_u_n_n_i_n_g processes can have a processor, or take page faults, and need a PRDS, processes which are _w_a_i_t_i_n_g on page faults in Ring Zero, or locks, may well have a Ring Zero stack history, and not be running. Except for the Syserr Log Daemon process (which must be special-cased, as discussed below), processes do not call the traffic controller "block" entry except from outer rings. Processes running in Ring Zero cannot be pre-empted (wired_fim delays a potential pre-empt via the Ring Alarm register in this case). Thus, with the noted exception, it is impossible for a process running in ring zero to lose eligibility. Therefore, only eligible processes have Ring Zero stack histories, and thus, only eligible processes need Ring Zero stacks. .spb 2 When a process is given eligibility, a usable SDW for a stack_0 is placed in its descriptor segment in the slot that the hardware assigns to the Ring Zero stack, as defined by dbr.stack (see above). Processes abandoning stacks at eligibility loss time, or terminating, have a responsibility to reset these stacks appropriately. The header information in the stack_0 segments is valid across all processes. .spb 2 The processing of calls to hcs_$block must be complex to allow this trick to work, i.e., the abandonment of the stack_0 and its return to the pool when the traffic controller goes blocked. The target of the gate hcs_$block is fast_hc_ipc, an ALM program that does not use stack_0 at all to do its processing. If it determines, from its various PDS flags, that it must call hc_ipc_, a large PL/I program that copies ITT messages to outer rings, it establishes a stack frame on the process' stack_0, calls this utility subroutine, and, upon return, reverts to its previous state of not using a stack. When it actually calls the traffic controller block entry, it has no stack history on stack_0, and the traffic controller is aware of this. The return of the traffic controller upon wakeup is to a prearranged location in fast_hc_ipc. .ifi l2h "Stack Wiring" Often, supervisor programs have to call page control, or other programs that lock very-low-level locks, for services. These calling programs are always running on the "Ring Zero Stack" just described. Page control, or whatever is being called, must ensure that it will not take page faults (or otherwise lose the processor) while it has its low-level locks locked. As a consequence, the region of the Ring Zero stack to be used as a stack during these activities must be temporarily wired.
.spb 2 For this reason, privileged_mode_ut$wire_and_mask is provided. When called by some procedure which is about to lock a critical lock, it attempts to wire the current and next three pages of the current stack, and page 0, which contains the stack header (which will be needed during call, return, and entry sequences). It also "masks the current process against interrupts". .bbf This means, "set to zeroes that Interrupt Mask of the bootload memory (which receives all system interrupts) that is directed at the processor we are running on". The smcm instruction does this. As this mask is read by the traffic controller before a process loses a processor, and stored and set back when the process is picked up, an effective "process mask" is maintained. For processors which have no masks assigned (and thus cannot process interrupts), a simulation of mask processing is accomplished via ldaq/staq instructions into prds$simulated_mask. Executing the smcm/rmcm or ldaq/staq in a processor-dependent way is accomplished by xec'ing an array of processor-dependent instructions in the SCS. .bef Wire_and_mask avoids wiring and masking if it senses that Page 0 of its current stack is already wired; this facilitates recursive use. It also avoids any action if it senses that it was invoked while the PRDS was being used as a stack. Wire_and_mask encodes its state in two arguments that it expects to be passed back to pmut$unwire_unmask, which undoes this action. If unwire_unmask senses that wire_and_mask did nothing for the reasons just discussed, it too will do nothing. .spb 2 The technique used by wire_and_mask to wire is simply turning on the ptw.wired bits of the stack's PTW's and touching the pages. Page control is aware that this can happen without the page-table lock being locked; there is no problem here (see AN61 for more detail on why this is valid). .spb 2 The amount of stack wired by wire_and_mask must be known at the time it is called. Since no generally-written program can know who its ultimate callees will be, this number is a system constant, currently 4. If, via some pathological call sequence, or bug, or site modification, more pages are found to be needed in the midst of processing, nothing can be done, and the system will crash, either with a page control MYLOCK ON GLOBAL LOCK or miscellaneous esoteric hangs. If, in analyzing a crash, the system is found to be hanging, or all processes waiting, and some eligible process' pds$page_fault_data implicates privileged_mode_ut, this may be your problem. The fixed number has grown over the years; there is no way to determine it dynamically. .ifi l2h "Initialization/Shutdown Stack" Bootload uses the supervisor segment inzr_stk0 .bbf The cryptic acronym which clearly means "initializer_stack_0" is this way because of the bootstrap1 name-finding mechanism. Segments that must be located by bootstrap1 must have names unique to 8 characters. .bef as a stack. It cannot use stack_0's, for they are not created until late in the bootload (when the Storage System is fully operative). The Initializer process uses inzr_stk0 until it comes out to Ring One, and then never uses it again. Of course, during page faults and interrupts in the course of initialization, the PRDS is used as normal, for the stack-switching mechanisms are operative. .spb 2 The segment inzr_stk0 is kept for use at emergency shutdown time.
When the system is re-entered from BOS for Emergency Shutdown, the PRDS of the processor which re-entered Multics is used as a stack while the I/O mechanism and page control are reinitialized and made consistent. No page faults are taken during this processing. When Emergency Shutdown must enter the Paged Environment, inzr_stk0 is switched to as a stack. It is used because it is always there (i.e., at all phases of initialization and system running), and has an SDW (and identical segment number) in all processes, as it is a supervisor segment. .spb 2 Privileged_mode_ut$wire_and_mask operates upon inzr_stk0 in the same way as it does upon stack_0 segments; it sees no distinction. .ifi l2h "Syserr Daemon Stack" As the Ring Zero syserr-log Daemon .bbf The process which picks up syserr messages from the wired syserr data segments and places them in the paged syserr log. It is woken up by syserr when messages are placed in the wired syserr data segment. Since wired and masked code can call syserr, the caller cannot in general touch the paged syserr log, and hence the need for the Daemon. .bef indeed calls pxss$block during its processing, it cannot lose its ring zero stack when it loses eligibility. Thus, it does not participate in the stack-sharing mechanism, and uses the supervisor segment syserr_daemon_stack as a stack. The bit apte.hproc informs the traffic controller not to allow it to participate in stack sharing. .ifi l2h "Treatment of Stacks in Idle Processes" Idle processes have no stack. .spb 2 Idle processes have no call side. They take no segment or linkage faults. Thus, they do not need a ring zero stack. When they take interrupts, they use the PRDS as a stack, as do other processes. When not processing interrupts, a null pointer value is maintained in SP, the Stack Pointer Register. .ifi l1h "Stack Frames" The principal artifact of a stack is the stack frame. The stack frame contains the automatic variables and call-out return point of an active procedure. All PL/I programs "push" (establish) stack frames on the stack in use at the time they are called, when they are called, and "pop" (abandon) them when they return. Some ALM programs push and pop stack frames, but others do not. Those ALM programs that have no automatic variables and do not call other programs via standard call need no stack frames. Traffic control and page control have a complex stack-frame sharing discipline: see AN61 for more detail on this. .spb 2 The first 50 (octal) words are reserved in each stack frame (see the Subsystem Writer's Guide for the layout thereof). The first sixteen words are used by ALM programs making "full standard calls" to other programs to save the pointer registers. The ALM "call" macro saves the pointer registers here and restores them upon return. An ALM program need not save its pointer registers if it cares not to; such a program uses the "short_call" macro instead. Since PL/I in general does not depend upon registers across a call, it does not save all the pointer registers; PL/I uses words 0 through 17 (octal) for internal working storage of the PL/I runtime and operators. Locations 40-47 (octal) are used similarly by ALM programs executing "full calls" to store the A, Q, and index registers, to be restored upon return. PL/I does not use this area (stack_frame.regs) at all. .spb 2 Words 20 through 23 (octal) contain the pointers that thread the stack frames together on a stack in a forward and backward manner.
One can start from the stack begin pointer in the stack header and, using the forward pointer in each frame, "trace" the stack. Similarly, starting at any given stack frame, it is possible to trace forward or backward using the forward or backward pointers in each stack frame. The stack header pointers (stack_header.stack_begin_ptr and stack_header.stack_end_ptr, respectively) define where the first frame lies and where the next frame is to be laid down. .spb 2 Pointers in stack frames (the standard part thereof) are constrained to be standard ITS pointers. In the case of some special frames, bits 24-29 of stack_frame.prev_sp are used for flags. Listed below are some of the more important ones to notice: .spb 2 .inl 22 .unl 22 BIT OCTAL,dl Meaning, if O_N_ .spf .unl 22 24 4000 This frame belongs to signal_ .unl 22 26 1000 The _n_e_x_t frame was laid down by signaller. .unl 22 28 0200 This frame belongs to an environment support procedure. .unl 22 29 0100 At least one condition is established in this frame. .inl 0 .spb 2 All of the stack frame flags may be found in the Subsystem Writer's Guide. .spb 2 At location 24 (octal) a program stores the address to which it expects control to be returned when it calls another program. This quantity (stack_frame.return_ptr) is only valid if the program had called out to another program at the time the dump was taken, and is not even meaningful if the program has never called another program. The PL/I entry operator initializes this variable to a pointer to the base of a procedure's text segment at procedure entry time; this case can be recognized by a pointer to location zero of a segment here. When tracing stacks to determine call histories, this quantity is of primal significance. By interpreting this quantity, i.e., identifying the segment pointed to, and resolving the pointer with respect to the bindmap of the appropriate version of the segment, the problem analyst can identify the particular statement at which each active procedure was exited. .spb 2 At location 26 (octal) is stack_frame.entry_ptr, which is set by the PL/I and ALM entry sequences to point to the actual entry point which was called. By resolving these pointers with respect to the linkinfo's of segments they describe (the online interactive dump analysis tools do this automatically), the identities of procedures and entrypoints active at the time of the crash may be determined. The analyst should be aware that some ALM programs do NOT set this field, in particular, those that do not push a stack frame. While stack_frame.entry_ptr is to be considered invaluable for identifying PL/I procedures active, it is _n_o_t to be considered authoritative; only stack_frame.return_ptr can be relied upon. .spb 2 Location 32 (octal) (stack_frame.op_ptr) contains a pointer to the operator segment for most translators. However, ALM programs use this double word as a place to store the linkage section pointer. When an ALM program does a call, the call operator reloads pointer register 4 (the linkage section pointer) from this location (it is saved there by the ALM push operator). The reason it is reloaded before calling is in case an ALM program is calling a Version 1 PL/I program that is bound into the same segment as the ALM program. In this case, the standard Version 1 entry sequence that loads the linkage section pointer register is not invoked, so the ALM program must ensure that it is correct.
When Version 1 PL/I programs cease to exist, this will no longer be a requirement. .spb 2 Following the entry pointer is a pointer to the argument list of the procedure that owns the stack frame. The format of an argument list is discussed below. .spb 2 At location 36 octal in the stack frame are two 18-bit relative addresses. These addresses are relative to the base of the stack frame. The first relative address points to a series of 16-word on unit blocks in the stack frame. Each on unit block contains a 32-character condition name, a chain pointer, and some flags. Listed below is the PL/I declaration for an on unit block:
.spb 2
.inl 5
.fif
.bbk
dcl 1 on_unit based aligned,
      2 name ptr,
      2 body ptr,
      2 size fixed bin,
      2 next bit (18) unaligned,
      2 flags unaligned,
        3 pl1_snap bit (1) unaligned,
        3 pl1_system bit (1) unaligned,
        3 pad bit (16) unaligned,
      2 file ptr;
.bek
.fin
.inl
.spb 2
At location 31 in the stack frame header is a word known as the operator return offset. In fact, this word really consists of two 18-bit halves. The left-most 18 bits contain the translator ID. This is a number that tells what translator compiled the program that owns the stack frame. The various IDs are as follows:
.spb 2
.inl 5
.fif
0    Version 2 PL/I
1    ALM
2    Version 1 PL/I
3    signal caller frame
4    signaller frame
.fin
.inl
.spb 2
The right half of the word holds the address in the program at which a called operator returns. This is useful in debugging, for it describes the return address for certain calls to pl1_operators_. If a crash occurs and the machine conditions show that some fault or interrupt was taken by pl1_operators_, X0 contains the return address for the operator that took the fault or interrupt. If the operator was forced to use X0 itself, then the operator return offset in the stack frame contains the return address: if this cell is nonzero, it contains the return address; if zero, X0 contains the return address. Given this, one can look at the program that called the operator to determine why the fault occurred. This applies only to PL/I (and FORTRAN) programs. .spb 2 It is to be noted that PL/I programs reserve the first 100 octal (64 decimal) words of the stack frame for use by the operators and runtime. Automatic variables for PL/I programs begin at location 100 of the stack frame, and all PL/I programs have a minimum stack frame size of 100 (octal) words. .ifi l1h "Argument Lists" An _a_r_g_u_m_e_n_t _l_i_s_t is an array of pointers (to locations in a process), set up by one procedure when it calls another, specifying the addresses of the _a_r_g_u_m_e_n_t_s of the first procedure to the second, which are also known as the _p_a_r_a_m_e_t_e_r_s of the second procedure. PL/I, which provides the basic runtime semantics of Multics, specifies call-by-reference as an argument-passing methodology, and thus, a procedure's parameters are referenced via indirection through elements of an argument list prepared by its caller. .spb 2 Every standard Multics call must construct a standard argument list. All PL/I code in the supervisor calling PL/I or other code, or any code at all calling a PL/I program, uses the standard Multics call. Non-standard calling sequences are used only within complex ALM-coded subsystems in the supervisor, such as page control and traffic control. PL/I's calling of "quick" internal procedures is also non-standard, but standard argument lists are used in such calls. .spb 2 A standard argument list begins on an even word boundary.
It consists of a two-word _a_r_g_u_m_e_n_t _l_i_s_t _h_e_a_d_e_r, followed by a pointer to each argument, in order, spaced two words apart. The address of the pointer to the n'th argument, where the first argument is number 1, is thus at an offset of 2*n from the address of the argument list. The _a_r_g_u_m_e_n_t _p_o_i_n_t_e_rs may thus be two-word or single-word pointers, although by far the most common case is that of two-word ITS pointers. Usually, argument lists and pointers are prepared "on the fly" at the time of procedure call, but applicable PL/I internal calls, as well as some calls in ALM-coded subsystems, take advantage of the ability to specify single-word and ITP pointers to present constant argument lists, which do not contain ITS pointers. Argument (and descriptor, see below) pointers _a_r_e constrained to be indirectable-through by the hardware: they must be valid indirect address words of any form. Thus, unaligned (packed) pointers may not be used in an argument list. .spb 2 The argument list header gives the number of arguments in the argument list it heads: twice this number (being the offset to the last argument) is stored in the uppermost halfword thereof. The argument list header also specifies if _a_r_g_u_m_e_n_t _d_e_s_c_r_i_p_t_o_r_s are provided in the argument list. When parametric-length strings (e.g., char!(*)) or adjustable arrays are passed, or an "options!(variable)" entry is called, the callee must have sufficient information to determine at runtime the extents (or lengths, and sometimes data-types) of its parameters. If descriptors are passed, the argument list header, in the upper halfword of its second word, contains twice the number of descriptor pointers: the descriptor pointers follow the argument pointers, and are subject to the same restrictions. The only useful datum to bear in mind about descriptors is that the low halfword of descriptors for strings tells the length of the string. When a PL/I internal entry (an entry point of a non-quick internal procedure) is called, the argument list includes a quantity known as the _d_i_s_p_l_a_y _p_o_i_n_t_e_r after the argument pointers, but before the descriptor pointers (if any). This quantity is a pointer to the correct active stack frame of the procedure's containing block, .bbf A PL/I internal procedure can only be called from the block (procedure or begin block) in which it appears, or from some block contained in that block, or via an entry variable. In the first two cases, these are the only blocks in a program (or set of programs) in which the name of the procedure is known (it is not declared, and thus undefined in other blocks). Thus, in the first case, the stack frame of the calling block is the "correct" stack frame. In the second case, inductively, a contained block of the procedure's containing block was active, and this implies that the containing block was active: the stack frame of the containing block that called the procedure that called the procedure.. etc., is the "correct one". In the third case, the entry variable had to be assigned a value in an environment such as either of the above cases, and can only be used validly when the block that assigned it is active: the correct display pointer is embedded at "closure time" (i.e., assignment to the entry variable) in the value of the entry variable. .bef and allows it to access the variables of its lexically containing blocks. 
If a display pointer (also called an _e_n_v_i_r_o_n_m_e_n_t _p_o_i_n_t_e_r) is provided, the last halfword of the argument list header contains a 10 (octal). .spb 2 The format of the argument list, as well as the format of the header and all descriptors, may be seen in the Multics Subsystem Writer's Guide, Order #AK92. .spb 2 When one procedure calls another, Pointer Register 0 (AP) is set to the address of the argument list. In standard Multics calling, a null argument list (a header specifying no arguments) must be passed if there are no arguments. The callee has the responsibility of saving the argument pointer for his own use if he so requires; if the callee sets up a stack frame, the argument pointer (the address of the argument list with which he was called) is stored in location 32 (stack_frame.arg_ptr) of the callee's stack frame. In future references to arguments, the callee is not constrained to use pointer register 0: PL/I programs never do, and tend to use PR4 for this purpose (PR0 being reserved for pointing to the PL/I Operator transfer vector.) .spb 2 When tracing call histories in ring zero, during crash analysis, (or for that matter, when tracing _a_n_y call histories in Multics) argument lists for each procedure active at the time should be inspected routinely. The arguments to each active procedure should be determined (marking them in colored pen or circling them on a paper dump helps; the online dump analysis tools will print out argument lists of procedures active in a dump). Often, simple inspection of character strings appearing as arguments, or pointers, identifies pathnames of relevant objects, entries in system tables, and so forth. A skilled dump reader and problem analyst can often infer what objects were passed for action to relevant system routines simply by looking at the arguments, and knowing the name of the called routine (sometimes even the latter is not necessary): it is rarely necessary for a skilled problem analyst to refer to listings to ascertain the identities of system objects implicated in call histories. .ifi l1h "Machine Conditions" Whenever any type of fault or interrupt occurs, the state of the processor is saved. This involves saving all live registers and the state of the Control Unit. The 48-word block in which these registers are stored completely encodes the state of the interrupted/faulted process. It has a standard format (see Figure 2-1 below), and is known as a set of "Machine Conditions". .spb 2 There are five areas in a set of machine conditions: sixteen words of pointer registers stored by the SPRI instruction, eight words of A/Q/index/Exponent/Timer/RALR stored by the SREG instruction, eight words of "SCU Data", eight words of EIS "pointer and length" information, and eight words of non-register software information, which includes a 52-bit clock fault time in its last two words. By far the most important item is the SCU data, which encodes the state of the instruction which was being executed, as well as identifying _w_h_i_c_h instruction was being executed (i.e., the PSR/IC (segment number and word number of the instruction counter)). .spb 2 When inspecting machine conditions, look at once at the second word of the SCU data: the low two digits specify the fault code. Here are some common fault codes as they appear in SCU data in machine conditions. The complete list may be found in the Hardware/Software Formats manual (Order #AN87): .spf .bbk .inl +13 .unl 13 21 Connect, and thus pre-empt, crash, etc. 
.unl 13 23 Parity .unl 13 25 Illegal Procedure, including illegal opcode, illegal modifier, etc. .unl 13 27 Op Not complete .unl 13 30 IOM terminate interrupt. .unl 13 41 DF0 (Segment fault) .unl 13 43 DF1 (Page fault) .unl 13 51 Access violation, including boundsfault, ring violations, and ring alarm faults. .unl 13 61 FT2 (Linkage fault) .unl 13 77 Trouble fault (hardware error in fault vector) .inl -13 .bek .spb 2 The next quantity in the Machine Conditions which should be inspected is the PRR (procedure ring register), the first three bits of the first word of the SCU data. This tells in what ring the process which took the fault or interrupt was executing. Faults other than page faults (or segment faults on directories) taken in ring zero are to be viewed with some suspicion. .spb 2 The identity of the procedure taking the fault, or procedure that was interrupted, is the next most important quantity, as it will ultimately direct the analyst to the correct listing. The segment number of this procedure is found in the second through sixth octal digits of the SCU data (mc.scu.psr); the location counter (instruction counter, IC) is the upper halfword of word 4 of the SCU data (zero-relative). For a fault, this is usually the address of the faulting instruction .bbf More correctly, it is the address of the sequentially-fetched instruction responsible for causing the execution of the current instruction. For instance, in the case of an instruction executed via an XEC or XED taking a fault, it is the address of the latter which appears. See also the note on port-logic faults below. .bef and for an interrupt (other than a mid-instruction interrupt in a long EIS instruction) it is the address of the next instruction to be fetched. The identity of the faulting procedure may be determined by finding out (from the Descriptor Segment listing or the interactive dump analysis tools) its segment name, and referring to bindmaps (which should a_l_w_a_y_s_ be kept on line) to find the correct program. .spb 2 The next most important item is the identity of the segment faulted on. This datum appears as scu.tsr, the second through sixth octal digits of word 2 (zero relative) of the SCU data. This quantity is _o_n_l_y _v_a_l_i_d _f_o_r _f_a_u_l_t_s; it is not meaningful and should not be interrogated in any way in the machine conditions for an interrupt or group VII .bbf Timer runout, connect, or execute. .bef fault. The Control Unit Computed Address, scu.ca, tells what location of the segment faulted on was being referenced, .bbf Again, this is valid and meaningful when and only when scu.tsr is valid and meaningful. .bef and is stored in the upper half of the fifth (zero-relative) word of the SCU data. In the case of boundsfaults, page faults, and linkage faults, the value of this quantity is central to the determination of appropriate fault-handling action (i.e., how long, what page, which link). Faults taken in ring zero (other than page faults) upon ring zero segments are to be viewed with great suspicion. .spb 2 Due to the asynchronous nature of the Level 68 processor, parity, store, and most classes of command fault are detected by the processor long after it has finished address preparation of the failing instruction. Thus, the segment number and word number being referenced when a parity error (for example) is detected are _n_o_t _a_v_a_i_l_a_b_l_e to the CPU. 
Therefore, the SCU data for these "port logic" faults cannot be used to deterministically establish the identity of a segment upon which a parity error has been detected. What is worse, the overlap of the processor (viz., the fact that the Control Unit will be preparing the effective address of the _n_e_x_t instruction (or further) at the time such errors are detected at the port logic) implies that the machine conditions for such faults cannot even be used to deterministically identify the _i_n_s_t_r_u_c_t_i_o_n which encountered the error, let alone the operand. .spb 2 The APU (Appending Unit) status bits in the first word of the SCU data are often of interest when an APU-detected fault (e.g., segment fault, page fault, boundsfault) has occurred: they tell whether a PTW or SDW was being analyzed, a PTW being accessed for EIS prepage (scu.ptw2), and so forth. If hardware action culminating in a segment or page fault becomes implicated in a system problem, a summary check of the correspondence between these cycle bits and the PTW's and SDW's thought to be involved may be productive. .bbf For example, an old class of problems in the pre-4.0 storage system produced zero-containing PTW's, resulting in "Directed fault 0's" where page faults would have occurred. Multics separates page faults from segment faults by using different directed faults, i.e., 0 for segment fault, 1 for page fault, _n_o_t by dispatching on APU state in stored fault data. .bef .spb 2 It is to be noted that the SCU data encodes two partially-executed instructions, the _c_u_r_r_e_n_t and _o_d_d instructions. The Level 68 processor fetches two words at a time in normal sequential instruction fetch, an even/odd pair. As the CPU Control Unit (that portion of the processor responsible for indirection and indexing, among other things) proceeds with the effective address development of an instruction, it modifies its holding registers for the "active instruction". The "odd instruction" is held in another register until it is ready to be decoded, when it becomes the "active instruction". .bbf It is at this time that the CPU sends out its request for the next two instructions. .bef Thus, when a fault is taken, both the active instruction (in the control unit) and the odd instruction, which has not yet been decoded, are saved in the machine conditions; words 6 and 7 (zero-relative) of the SCU data may be seen to contain these two instructions. Note that word 6 may show the active instruction in altered form if address modification has taken place. Note further that if a fault occurs during the address development of the odd instruction, the same instruction will be reflected in words 6 and 7. .bbf Although the processor could have been designed to store only one instruction, the intricacies of the XED and RPD instructions require that two instructions be stored at fault time and reloaded into the control unit at restart time. In the case of these two instructions, an actual even/odd pair is always stored by the control unit. .bef .spb 2 The pointer registers in machine conditions are stored (by the SPRI instruction) as eight two-word ITS pairs. Along with the SCU data and registers stored by SREG, these data items are invaluable in determining the state of faulting programs by the standard machine-level debugging paradigm.
Beyond this, when looking at machine conditions one should _a_l_w_a_y_s inspect the value of the sixth (zero-relative; words 14 and 15 octal) pointer-register image, that of SP (PR6), the stack pointer register. The value of this register identifies the segment being used as a stack (see "Stacks" above), and the location of the stack frame of the faulted or interrupted procedure. Sometimes erroneous values detected here are potent clues to the cause of a crash; various forms of environment failure can lead to such symptoms. If ALM code was interrupted, Pointer Register 7 (SB, stack base) should correspond to that segment as well. The ring fields of the pointer registers (bits 18-21 of the first word of the stored ITS pair) should also be checked if access violations are implicated. .spb 2 The "registers" (stored by the SREG instruction), following the pointer registers, are not often of interest, except in the machine-language analysis of failing programs. In the ALM code of page control (see AN61, Chapter 8) and in traffic control, the index registers are extremely significant, and are necessary to trace call flow as well as to identify relevant data objects. The ring alarm is sometimes also of interest in potential appending unit problems. .spb 2 The EIS pointer and length information encodes the state of the Decimal Unit (which is responsible for decimal, character, and bit instructions), and the state of the address preparation of long Decimal Unit instructions when they are interrupted or encounter faults (e.g., page faults or boundsfaults). Multics software does not look at or interpret the contents of this data in any way, and it is thus rarely interesting. Its format, for those interested, may be seen in AN87, the Hardware and Software Formats PLM. Although the EIS data need not be saved and restored unless an EIS instruction was interrupted (as evidenced by the bit scu.mif being on in the SCU data), proper resetting of the CPU fault-taking mechanism requires that the SPL instruction (which stores this data at fault/interrupt time) be executed regardless. However, Multics software occasionally optimizes out reloading of this data if scu.mif is not on. .spb 2 The software information in the last eight words of machine conditions contains several interesting items. The 72-bit cell mc.mask contains the System Controller Mask value associated with the process at the time of the fault or interrupt; in cases of suspicious interrupts (e.g., interrupts recognized when interrupts are not to be taken, as during page control), this can be of great interest. The fault software stores a 52-bit clock time in the last words of the software data: this is often useful for sorting out the time-sequence of events leading to a system failure. The "error code" (mc.errcode) is never stored at fault time, but is placed in the machine conditions by system fault handlers (see the discussion of system and user faults below), and specifies (via an error_table_ code) the system's disposition (which is always some subcase of "unsuccessful" if this code be non-zero) of the handling of the fault. .ifi l1h "Finding the Right Fault Data" There are several "machine condition areas" where machine conditions are stored. These locations are either per-process or per-processor: there are no "global" machine condition areas. The segmentation mechanism and the notion of process are used to direct fault data to unique places.
For instance, the fault vector location for page faults (directed fault 1) indicates that SCU data is to be stored at (for example) location 0 of segment 56. Since segment 56 in one process is not necessarily the same as segment 56 in another process, two processes (and thus, two processors, for it is _c_r_i_t_i_c_a_l and definitional that no two processors can ever be in the same process at the same time) can take page faults and store fault data at 56|0 _s_i_m_u_l_t_a_n_e_o_u_s_l_y. This is one of the most elegant uses of Multics segmentation to solve what is a large problem in other operating systems. .spb 2 Thus, by design, all machine-condition areas are placed in the "segments" pds and prds. We say "segments" because the pds of one process is not the pds of another, although the pds's of all processes have the same segment number. Similarly, there is one prds per processor: the SDW for the prds is "carried around" by the processor during process-switching, and inserted into the identically-numbered descriptor segment slot in each process as the processor enters it. .spb 2 There are six normally-used machine-condition areas, three in each process' PDS and three in each processor's PRDS. On the PDS we find pds$fim_data, pds$page_fault_data, and pds$signal_data, and on the PRDS, prds$fim_data, prds$interrupt_data, and prds$sys_trouble_data. Among all possible faults and interrupts, some have their fault vectors (and subsequent storage of live registers) directed to various members of this set. The rule for remembering where faults and interrupts are directed is simple: if the exception/interruption is something that happens to a _p_r_o_c_e_s_s, it is the PDS; if it happens to a _p_r_o_c_e_s_s_o_r, it is the PRDS. For example, a page fault happens to a _p_r_o_c_e_s_s. In the course of processing a page fault, a process may give up one processor, and be resumed later by another. The page fault data encodes the state of the process: thus it is stored on the PDS, at pds$page_fault_data. An interrupt is taken by an arbitrary processor (the system controller distributes interrupts among processors anonymously). The processor processes an interrupt and returns to the interrupted activity without switching processes. The interrupt is thus something that happened to the processor, not the interrupted process, and thus, _a_l_l interrupts store their machine conditions at prds$interrupt_data. .spb 2 From these principles, we may deduce the entire fault/interrupt distribution scheme of Multics. Connect faults can be for any of several process or processor-related reasons, e.g., pre-empt (a process), clear associative memory or crash (a processor), and so forth. Thus, connect faults go to a processor-related place (prds$fim_data) until it is determined (from flags in the scs) what the occasion for the connect was. The stored data may be moved (by wired_fim, or sys_trouble) to prds$sys_trouble_data (for a crash), pds$signal_data (to signal QUIT or another IPS signal in a process), or pds$page_fault_data (when a pre-empting of a process is intended). This multiplexing of pds$page_fault_data relies upon the fact that both page faults and pre-empts remove a processor from a process, and the two are mutually exclusive. .spb 2 Thus, page faults go to pds$page_fault_data. Timer runouts go to prds$fim_data. Connect faults and timer-runouts mutually exclude each other via use of the inhibit bit, so the multiplexing of prds$fim_data can succeed.
Pre-empts and timer runouts detected as occurring in ring zero are "postponed" via ring alarms, and thus, the exclusivity of page faults (which are processed in ring zero) and pre-empts is assured. .spb 2 All of the remaining faults (e.g., overflow, access violation, segment and linkage faults, parity faults) deposit their data at pds$fim_data or pds$signal_data: we will draw the distinction shortly. These faults are all handled by the module "fim", the fault interceptor module. If fim sees one of these faults in an idle process, or while the PRDS is being used as a stack, or while the page table lock is locked, an error in the system is indicated: no meaningful processing of the fault is possible, so a "magic number" is set in scs$sys_trouble_pending (see Section I) and a connect is sent to the running processor, crashing the system as described in the earlier section. When this is done, fim has loaded Pointer Register 2 (bp) with the address of fault machine conditions, and this value of bp will appear in the fault data for the "sys trouble" connect fault. When a crash due to a fault in invalid circumstances (i.e., the above) is suspected, chasing pointer register 2 from the sys-trouble data will usually lead immediately to the relevant fault data. .spb 2 Faults handled by fim are subdivided into two categories, system and user faults (hence the name, "bound_system_faults"). System faults are those faults intended to be handled transparently in ring zero, by the supervisor, for example, linkage faults and segment faults. All machine conditions for system faults are stored at pds$fim_data. When fim processes such a fault, it sets up a stack frame, so that it can call the appropriate supervisor fault-handling routine. It also copies the machine conditions from pds$fim_data into its stack frame before calling any procedure, so that other system faults can be processed (recursively, thus overwriting pds$fim_data) while processing this system fault. In a crash or problem resulting from a chain of (say) segment faults, one must look for "fim frames" (see S__t_a_c_k F__r_a_m_e_s, below) in the ring zero stack history, and find fault data there. Once you have seen fault data enough times, you will be able to identify it in a dump by sight, without knowing its exact location in a frame or in the fixed region of the PRDS or PDS. Assuming all processing succeeds, system fault processing ends with the fim copying the machine conditions back into pds$fim_data, and restarting them from there. Note that this implies the proper LIFO protocol on recursive system faults. Note also that no action while in the fim is allowed to cause a fim-type (system or user, but _n_o_t page, timer-runout, etc.) fault, for the integrity of pds$fim_data would thus be violated. That is why a fim-processed fault detected in the fim is grounds for immediate process termination. .spb 2 User faults are those faults processed by the fim which are not handled transparently in ring zero, and usually represent a user error, such as overflow. In these cases, the fim causes an action known as "signalling the fault in the faulting ring". This means that a PL/I condition will be signalled to the program executing the faulting instruction. The type of signal will be derived from the type of fault (a table in fim defines this mapping).
The technique for signalling in the faulting ring consists of building a special stack frame ("a signaller frame") on the stack being used at the time of the fault (this is determined by the value of pointer register 6 (sp) in the fault machine conditions), .bbf There is an obscure case involving GCOS-emulator BAR mode where the word-offset of the SP register cannot be trusted, as the emulated GCOS slave program can destroy this field. By contract with the mechanism being described, the GCOS emulator stores the last valid SP value in the bar_mode_sp field of the stack frame header for just this reason, and this is utilized if a signallable fault in BAR mode is processed. .bef storing the fault machine conditions there, and effecting a transfer to the PL/I signal-propagating-module (signal_) in the faulting ring. The program that performs these actions is named "signaller", not to be confused with signal_. .spb 2 The protocol for invoking signaller (or "the signaller") consists of placing the "machine conditions to be signalled" in pds$signal_data, abandoning any regnant Ring Zero stack history, and transferring (non-returnably) to signaller$signaller. The condition name is placed in pds$condition_name. User faults that involve _a_b_s_o_l_u_t_e_l_y no ring zero processing .bbf E.g., store, mme1, derail, directed fault 3. .bef store their machine conditions _d_i_r_e_c_t_l_y in pds$signal_data as an optimization, since nothing can ever become of such a fault except a signal. .spb 2 System faults can turn into user faults if their ring zero processing is unsuccessful, for instance, a linkage fault where the segment or entry point cannot be found, or a segment fault where the segment faulted upon has been deleted. Parity errors and other hardware malfunctions (which are processed in ring zero by a hardware-fault processing utility, hardware_fault) are always treated as "unsuccessful" system faults. When the system fault handler returns to fim, fim checks the "error code" in the machine conditions (mc.errcode) and, if non-zero, treats the fault as a user fault, signalling it in the user ring. Page faults which encounter record_quota_overflow or page read errors, as well as connects which map into QUITs (or other IPS's), are also mapped into user faults in this way. .spb 2 When analyzing process terminations, or crashes resulting from process terminations (e.g., "attempt to terminate initializer process"), repeated or failing signalling of some fault will usually be found close to the root of the problem. pds$signal_data will always contain the machine conditions for the last fault signalled in the process, and this is a prime candidate for inspection to find the problem. Now it is often the case that some fault is signalled over and over again due to some problem in stack format, or error in the operator segment, or the like. In such cases, pds$signal_data will be found to be relatively uninformative, showing, for example, a boundsfault near the end of the running stack. The stack in the ring encountering the problem will then be laden with signaller frames, which are readily identified by the machine conditions they contain. In these cases, the signaller frame (and thus the set of machine conditions) closest to the base of the stack, i.e., the oldest, will probably indicate the primal cause of the problem. .spb 2 Be sure to check the error code (mc.errcode) in machine conditions for signalled faults: it gives the Storage System error which caused the fault to be signalled instead of restarted.
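.spb 2
When working online, such an error code can be turned into printable form with the standard subroutine convert_status_code_. The following is only a sketch of that step (the extraction of mc.errcode from the machine conditions is not shown, and the procedure shown here is not part of any released tool):
.spb
.inl 5
.fif
.bbk
show_errcode: proc (code);
dcl  code fixed bin (35) parameter;     /* mc.errcode from the signalled machine conditions */
dcl  convert_status_code_ entry (fixed bin (35), char (8) aligned, char (100) aligned);
dcl  ioa_ entry options (variable);
dcl  shortinfo char (8) aligned;
dcl  longinfo char (100) aligned;
     call convert_status_code_ (code, shortinfo, longinfo);
     call ioa_ ("mc.errcode = ^d (^a): ^a", code, shortinfo, longinfo);
end show_errcode;
.bek
.fin
.inl 0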
s1.an53.compin 10/22/84 1058.1rew 03/17/81 1459.5 327744 .ifi init_plm "AN53-01" .srv section "1" .ifi l0h "Crash Procedures" This section covers the information necessary to understand how Multics crashes (i.e., returns to BOS), how dumps are taken, and how these dumps are processed. .ifi l1h "Returning to BOS" There are six ways in which Multics can crash. The first, and most common, way is for some type of fatal error to be discovered by Multics, and reported via the syserr (operator's console message) facility. When this happens, a message that describes the surface cause of the crash (e.g., LOCK NOT EQUAL PROCESSID) is typed on the operator's console and the system "returns to BOS". This means that all processors other than the "BOS processor" halt, and the BOS processor transfers control to the BOS toehold. To effect this reentry, syserr (which typed the message on the operator's console) calls privileged_mode_ut$bos_and_return. This program sets the flag scs$sys_trouble_pending, and issues a connect fault (via a cioc instruction) at the CPU on which it is running. .spb 2 The handler for connect faults is the module wired_fim. Upon seeing the flag scs$sys_trouble_pending set, wired_fim perceives a system crash in progress, and transfers to sys_trouble$sys_trouble. The ALM module sys_trouble is charged with the responsibility of organizing crashes and returns to and from BOS. All processors will be made to enter this code, and either stop or enter BOS as appropriate. The first processor entering sys_trouble$sys_trouble notices that it is the first processor to do so (the flag scs$trouble_flags is used to establish first entry into this code), and "broadcasts" connect faults to all other processors. .spb 2 Processors entering sys_trouble$sys_trouble copy the machine conditions (see Section 2) of the connect fault from prds$fim_data (in the per-processor data segment prds) to prds$sys_trouble_data. The fault vector for the connect fault stored the SCU data there, and wired_fim stored the rest of the particular processor's machine state, before the interpretation of the connect fault (i.e., "system crash") was determined. This interpretation having now been made, the data is moved to prds$sys_trouble_data. This data area (see Section 2 for heuristics on finding it) is of _c_r_u_c_i_a_l importance to the Multics crash analyst; for every process running on a processor at the time of the crash, it tells _p_r_e_c_i_s_e_l_y where that process was. In this fashion, erroneous process behavior can be identified, and potential misinteraction between processes can be understood. .spb 2 Each processor entering sys_trouble$sys_trouble determines whether or not it is the "BOS processor", meaning the one which will actually enter and execute BOS. The identity of the BOS processor is defined by scs$bos_processor_tag. The "BOS processor" is initially defined at bootload time as the CPU on which BOS last executed; however, should this processor be deleted, reconfiguration will assign the responsibility of being BOS processor to some other CPU. One criterion for being the BOS CPU is having an SCU mask assigned to it: SCU-directed mask-setting instructions in BOS must actually work, and not fault, and this requires having an actual SCU mask directed at the potential BOS CPU. .spb 2 The processor which determines it is the BOS CPU proceeds to enter BOS, as the others execute a DIS instruction in sys_trouble.
The processor which enters BOS need not be the same as the processor which started the crash. DIS (Delay until Interrupt Signal) is an effective HALT instruction. If a GO (the BOS command which restarts Multics) is subsequently executed, connects will be re-broadcast to all CPU's, "interrupting" them out of their DIS state. Note how the copying of the machine conditions to prds$sys_trouble_data was _n_e_c_e_s_s_a_r_y to prevent this possible subsequent connect fault from overwriting the first set of machine conditions (the "sys trouble data")! .spb 2 The method used to enter BOS is as follows: the BOS CPU loops for a while to allow all pending I/O operations to quiesce. Since this loop is inhibited, the lockup fault vector is patched to ignore any lockup faults taken during this loop. Once the loop is completed, the two instructions, an SCU and a TRA instruction pair, as picked up from location 4 of the BOS toehold (absolute location 10004, accessed via the segment bos_toehold$), are patched into the DERAIL fault vector. The previous contents of this fault vector are saved prior to doing so. Finally, a DERAIL instruction is executed. Since the DERAIL fault vector was patched, the SCU is executed and BOS is entered in absolute mode via the TRA instruction. .spb 2 The second way that Multics can crash and enter BOS is for an error to occur that cannot be reported by a syserr message on the operator's console. These errors are: the arrival of an interrupt while the processor is using the PRDS as a stack; a page fault while the PRDS is in use as a stack; certain faults taken while the PRDS is in use as a stack or in an idle process; or premature faults during initialization. In these situations, it is not clear that enough of the Multics environment in Ring 0 of the erring process is operative to run the Operator's Console software, a bulky and complex mechanism requiring many things to operate properly. In these cases, the low-level code of the operating system (usually ALM code, or some PL/I code in initialization) places a "magic number" in the low bits of scs$sys_trouble_pending, and initiates the connect with the "sys trouble" interpretation, as above. When sys_trouble sees this non-zero number, the BOS processor copies the correct canned message (see the listing of sys_trouble for an enumeration of them) into flagbox.message, and sets the bit flagbox.alert in the "BOS flagbox" segment. BOS will print out this message (all in upper case: this is a good way of identifying such messages) under its own control. When such a crash occurs, the contents of the flagbox (bos_flagbox$) hold the clue to what happened, and all the sys_trouble_data's in the PRDS's of all CPU's tell the whereabouts of the CPU's when the error occurred. The synchronization of crashing CPU's and return to BOS are effected as above. .spb 2 Another way that BOS can be entered is by an execute fault. An execute fault is caused by the depression of the EXECUTE FAULT button on any processor maintenance panel. The handler for this fault is wired_fim$xec_fault, and the fault is treated in exactly the same way as a system trouble connect (which is to say, the first entering CPU broadcasts connects, etc.). A "magic number", as above, stored in scs$sys_trouble_pending identifies this as the crash reason. The fault data stored by the execute fault is moved to prds$sys_trouble_data in the same way as for a sys_trouble connect.
.spb 2 Another way that BOS can be entered is by the manual execution of an XED instruction directed at location 10000 octal (the BOS toehold). This operational tactic is known as "executing the switches". The two instructions in the toehold are executed by placing an inhibited XED 10000 in the processor DATA switches (octal pattern 010000717200). The processor on which the switches are executed indeed returns to BOS at once, oblivious to other activity in the system. On a running multi-CPU system, this is a very dangerous technique for just this reason: it is _c_r_u_c_i_a_l to place ALL CPU's in MEM STEP before executing the switches. Recommended operational technique for executing the switches to return to BOS is given in the following section.
.spb 2 It should be pointed out that of the last two ways mentioned for entering BOS, the execute fault is the normal way used to crash the system. This would be done, for example, when it is noticed that Multics is in a loop. The execute fault method ensures that all processors are stopped via system trouble connects before returning to BOS. The manual execution of switches should only be used when running one processor, in a debugging or development situation, or when an execute fault has appeared not to work. If done carefully on a single-CPU system, it is possible to start Multics again with a BOS GO command after perhaps doing some patching or dumping of Multics from BOS. Manual execution of the switches is to be considered a last resort on a multi-processor system, and should be performed with the issues outlined above fully in mind.
.spb 2 It is also possible to return to BOS in a restartable manner during initialization, under control of patterns in the processor DATA switches. This causes the sending of a sys_trouble connect as described above. In particular, bootstrap1 will place messages directly in the BOS flagbox, and other initialization programs will call syserr$panic to place messages there. The syserr mechanism will place messages there in those phases of initialization executed before the operator's console (and implicitly, those mechanisms upon which it relies) is operable. For more details see the S__y_s_t_e_m I__n_i_t_i_a_l_i_z_a_t_i_o_n PLM, Order No. AN70.
.spb 2 The final way that BOS can be entered is via an explicit call to hphcs_$call_bos, which may be invoked by the Initializer "bos" command or by certain failures in administrative ring initialization. Any sufficiently privileged process can call this entry from the user ring. The effect is precisely the same as if the syserr mechanism (the first case covered above) had initiated a crash.
.ifi l2h "Technique for Executing the Switches" The procedure for executing the switches is described in this section.
.spb 2 First, A_L_L_ processors are put in MEM STEP. Then a CPU is chosen to return to BOS. If at all possible, this should be the system's choice of BOS CPU (the original bootload CPU, if it is still configured and has not been deleted, is best), or some CPU known to have SCU masks assigned via EIMA switches on the SCU's. For the reasons outlined above under the discussion of choice of BOS (bootload) CPU, failure to use a processor with a mask will cause BOS to fault and fail. Under extreme emergencies, SCU switches can be reorganized to assign a mask at RTB (return to BOS) time, but the implicit danger grows. On a multi-CPU system, there is no restarting (GO) from a manual execution of switches.
.spb 2 Now the processor selected to return to BOS, which should be in MEM STEP, is approached, and the correct configuration (010000717200) verified in the DATA switches. The EXECUTE SWITCHES/EXECUTE FAULT switch is verified to be in the EXECUTE SWITCHES position, and then the EXECUTE SWITCHES pushbutton is depressed _o_n_c_e. The STEP button is then depressed several times. Then the processor is taken out of MEM STEP mode and the STEP button is depressed once more to cause BOS execution to continue. The console should unlock and appear at BOS command level. If it does not, the procedure should be repeated. By the time this level of desperation is necessary, the chances of a successful ESD (emergency shutdown) are to be considered slim.
.spb 2 Due to the design of the Level 68 processor, manual execution of the switches at an arbitrary stopping point does not always succeed: there is no way around this. Success of executing the switches is indicated by the pattern in the processor DATA switches appearing in the _t_o_p _r_o_w of the Control Unit display on the left-hand door of the processor maintenance panel. Failure is indicated by a mostly-zero pattern appearing. If you press "EXECUTE SWITCHES" and see this pattern, allowing continued execution will _g_u_a_r_a_n_t_e_e that an illegal procedure fault will follow and overwrite fault data critically needed for analysis. Should you see this mostly-zero pattern, _i_n_i_t_i_a_l_i_z_e the CPU (some state will be lost, but not as much as if you were to proceed as before), and re-execute the switches.
.spb 2 One technique which has been found useful to increase the probability of successful switch execution is getting the Level 68 processor Control Unit into an INS FCH (instruction fetch) state before executing the switches; after the CPU is placed in MEM STEP, if the INS FCH light on the control unit state display on the left-hand maintenance panel is not on, STEP a few times until it is. T__h_e_n execute the switches.
.spb 2 One very dangerous trap, which will lose machine state and possibly the chance of a successful ESD, is laid by the INITIALIZE CONTROL/INITIALIZE DATA & CONTROL switch next to the EXECUTE SWITCHES/EXECUTE FAULT switch on the CPU maintenance panel. This switch M_U_S_T_ be in the INITIALIZE CONTROL position at all times. Should it be in the other position at the time the switches are executed, dozens of critical machine registers will have their contents irretrievably replaced by zeros when the switches are executed.
.ifi l1h "On Entering BOS" As perceived by BOS, there are three entries from Multics to BOS. They are performed by the absolute-mode execution by the BOS processor of the instruction pairs in locations 0, 2, and 4 of the BOS toehold (in Release 9.0, locations 10000, 10002, and 10004 absolute). Location 10000 is intended to be used by manual execution of the switches, as described above. Location 10004 is intended to be used by Multics when returning to BOS under program control, as described above. It is necessary to have two different entries because the "from Multics" entry requires incrementing the stored location counter (which will indicate the location in sys_trouble of the "DERAIL" instruction described in the previous section) to the instruction _a_f_t_e_r the derail instruction if Multics is to be successfully restarted. The instruction counter stored by executing the switches should be usable for restarting as is.
.spb 2 The use of the instruction pair in 10002 will be explained below.
For now, suffice it to say that it is there to facilitate debugging BOS.
.spb 2 Upon entry, the toehold writes the first 60000 (octal) locations of main memory to a fixed location in the BOS partition. This is to allow BOS programs to be read in and executed without destroying Multics data. The BOS main control program (SETUP) and the BOS communications area are read into this low area of memory, and the BOS command loop is entered. The combination of the saved area on disk, the remainder of main memory, and machine conditions saved by the toehold upon entry constitute (to BOS) an entity called the "Main Memory Image" (formerly, and still sometimes, called "Core Image"). Restarting Multics involves copying the region saved on disk back to main memory (overwriting all of BOS except the toehold) and carefully restoring the saved machine conditions into the BOS processor.
.spb 2 A significant subset of BOS commands (e.g., PATCH, DUMP, FDUMP, GO) is concerned with inspecting, modifying, or dumping the core image. The ESD and BOOT commands are actually implemented as commands which modify the machine state in the core image to a fixed or partially fixed quantity, and restart the core image (via "chaining" to the "GO" command). The BOOT command, for example, zeroes the core image, and reads the first record of the MST into location 10000, and sets this as the location counter (among a few other frills) before GOing.
.spb 2 Some BOS commands (e.g., SAVE, RESTOR, LOADDM, FWLOAD) require more main memory (often for buffers) to operate than provided in the initial main memory area saved upon entry to BOS. These commands attempt to obtain an area of main memory usually used for active Multics data for their buffers. If, however, there is an active core image, BOS utilities will inform the operator of this fact in strong language and query him as to whether he wishes to proceed. If he does, the Multics core image will be destroyed, and will no longer be restartable, patchable, dumpable, or capable of emergency shutdown. The BOS CORE command (CORE SAVE, CORE RESTOR) can be used to save and restore the entire core image to/from magnetic tape should it become essential to run one of these core-image-destroying commands when Multics has not been shut down. It is recommended that _s_e_v_e_r_a_l copies (CORE SAVE) be made if you must do this; the integrity of your file system and possibly many, many hours of down time hinge on this.
.spb 2 If the switches are executed while BOS is in control, with the normal pattern (010000717200), BOS will _n_o_t save its region of main memory to the BOS partition when the toehold is re-entered. This is to say, it will _n_o_t consciously overwrite a Multics main memory image. Thus, execution of the switches may be used to break BOS out of loops, problems, and perceived loops. This would seem to make it impossible to obtain dumps of BOS; it is for this reason that the instruction pair in location 10002 exists. Execution of the switches directed at this instruction pair _w_i_l_l _a_l_w_a_y_s overwrite the main memory image disk buffer (otherwise, the effect of BOS entry via 10002 is identical to entry via 10000). For this reason, one must be particularly careful not to leave this dangerous pattern in the DATA switches after debugging BOS or otherwise using it. (This pattern can also be used as a very last resort technique if entry from Multics via 10000 fails, but full appreciation of this paragraph ought to be kept in mind before so doing).
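.spb 2 It is easy to check mechanically that a DATA-switch pattern is the instruction you intend. The following is a minimal sketch, in Python rather than PL/I, and is not a Multics or BOS tool; it assumes the standard instruction word layout (18-bit address, 9-bit opcode plus opcode-extension bit, interrupt-inhibit bit, pointer-register bit, and 6-bit tag).
.spb
.fif
# Minimal sketch: split a 36-bit instruction word, such as the
# "execute the switches" pattern 010000717200 described above,
# into its fields.

def decode_instruction(word):
    """Break a 36-bit instruction word into its fields."""
    address = (word >> 18) & 0o777777   # bits 0-17: address
    opcode  = (word >> 9) & 0o777       # bits 18-26: opcode (717 = XED)
    op_ext  = (word >> 8) & 1           # bit 27: opcode extension
    inhibit = (word >> 7) & 1           # bit 28: interrupt inhibit
    a_bit   = (word >> 6) & 1           # bit 29: pointer-register flag
    tag     = word & 0o77               # bits 30-35: address modifier
    return address, opcode, op_ext, inhibit, a_bit, tag

# decode_instruction(0o010000717200) returns
# (0o10000, 0o717, 0, 1, 0, 0), i.e., an inhibited XED 10000.
.fin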
.ifi l1h "Taking a Dump" Taking a Multics Dump means producing a partial snapshot of the Multics virtual memory, including the data bases of "interesting" processes, for later analysis, at the time of an entry to BOS. There are three places one can put such dumps, viz., paper, magnetic tape, and disk. .spb 2 The first two media are dealt with by the BOS DUMP command (see the MOH, Order #AM81 for a full usage description). This command is intended for developers and system programmers either debugging new hardcore systems, or in machine room "disaster" situations where FDUMP, the normal program for crash dumps (see below) is inoperable. One enters the DUMP command from BOS command level. Its "attention" is directed at the process in which the processor which entered BOS was executing (i.e., the DBR value at BOS entry times defines the default process). Requests to dump specific segments to the printer are interpreted relative to that process. The DBR request to DUMP (See the MOH for usage) directs DUMP's "attention" to other processes. If one has to dump many processes to analyze a problem, it is usually a good idea to dump the segment "tc_data" first, and inspect the Active Process Table (APT) to obtain the DBR values of "interesting" processes. The DBR value is currently at locations 44 and 45 (octal) in the APT entry, and the Initializer's APT entry is always the first one. In release 9.0, location 1260 starts the APT array. .spb 2 The TAPE request of the DUMP command directs its output to tape; the default tape drive is Drive 1, see the MOH for more details. The DMP355 command can be used to dump communications processors to printer or tape. .spb 2 If you are attempting to analyze a crash (i.e., a return to BOS) with the DUMP command, you will probably want to know the state of the processor which entered BOS (see Section 2 for more detail on chasing down crash causes). The REG request to DUMP prints this (or puts it on tape). The output of the REG request includes an interpreted display of the descriptor segment identified by the selected process' DBR: the names of segments (pathnames or SLT names as appropriate) in the BOS processor's process' address space are displayed in this way. .spb 2 You will also want to dump the PRDS's of interesting processes, for they contain prds$sys_trouble_data, which indicates where the "sys trouble connect" was encountered. In a single-cpu system, the only PRDS in the system is the one currently in the process that returned to BOS, and the request "STACK PRDS" will dump this out. (Although we could use "SEG PRDS", the PRDS is also used as a stack for interrupt and page-fault processing, and if either of these were an issue at the time of the crash, the DUMP "STACK" request affords the maximum automatic interpretation available). For multi-CPU systems, you must analyze tc_data to find the other "interesting" processes, and request "SEG PRDS" (or "STACK PRDS") after using the "DBR" request to get there. .spb 2 You will also likely want to dump the Process Data Segment (PDS) of interesting processes; "SEG PDS" in the appropriate processes will get you the right data. The Ring 0 stack is almost always important too, as it tells what the "call side" of the supervisor (actions in response to hcs_ and other system calls) was doing in that process. The request "STACK RING 0" to BOS DUMP will get you this. 
(Segment and linkage fault processing as well as calls are handled by Ring 0 using the "Ring Zero Stack", as opposed to the PRDS: the issue as to which stack is appropriate is rooted in the design of the locking hierarchy, and is not relevant here).
.spb 2 What constitutes an "interesting process," or an "interesting segment"? We can't answer that here. If you are debugging a new version of the supervisor, you will likely know what segments of the supervisor you will want to dump. If you are dumping a crash via DUMP because FDUMP has failed, in an unknown situation, dump to tape, and use one of the PROC keywords to DUMP to produce a large, comprehensive dump.
.spb 2 Tapes produced by BOS DUMP (or other BOS dumping tools) can be printed by the Multics print_dump_tape (pdt) command, described in Section VIII. The BOS "PRINT" command can also be used to print such tapes.
.spb 2 The more usual procedure for taking dumps is to use the BOS FDUMP (Fast DUMP) command. FDUMP writes selected segments from selected processes in the crashed Multics to the DUMP partition on disk, where online Multics tools may be used to inspect the "fdump" (as the resulting compound image is called) or produce a DUMP-like dprintable output file. The format of the fdump consists of a header followed by images of segments. The header describes what segments have been dumped (by number and length; processes are dumped sequentially), and contains the machine conditions from the BOS core image. The exact selection of which processes and which segments are dumped is controlled by the keywords supplied to the FDUMP command (see the MOH). The FDUMP command scans the APT (Active Process Table, located in the segment tc_data) of the crashed Multics, and selects processes based on the criteria specified by the keywords. Normally, all of the supervisor data bases are dumped, as well as supervisor data in running processes. Pure segments (procedures and fixed data) are never dumped, for they cannot contain clues to what went wrong. (Now obviously it is conceivable that severe hardware failure could damage a "pure" segment, but the extremely rare need to dump them does not justify the vast expense of dumping them regularly).
.spb 2 To dump a process, the FDUMP command scans the descriptor segment of that process and dumps appropriate segments (as selected by the keyword arguments to the FDUMP command). FDUMP tries to avoid dumping segments multiple times, dumping unpaged segments (which must be supervisor data bases) only in the first process dumped, and maintaining a bit map of the page table area (the Active Segment Table, or AST, in the segment sst_seg) for all other segments. FDUMP calls BOS utilities (APND) to simulate paging, for not all pages of each segment are in main memory. Thus, FDUMP interprets each Page Table Word (PTW) for a segment. If the fault bit is off, FDUMP interprets the secondary storage or Paging Device address that is stored in the PTW and it reads the page from that location and dumps it. Thus, whole segments (not just "in core portions") are placed in the fdump.
.spb 2 Figure 1-1 below depicts the layout of the DUMP partition following execution of the BOS FDUMP and FD355 commands. Once the FDUMP and/or FD355 commands are executed, standard crash recovery procedures can be initiated (e.g., Emergency Shutdown (ESD) and reboot). To process the fdump, the command copy_dump (described in Section VIII) must be used.
This command, which uses the gate hphcs_ and therefore is generally executed by Initializer.SysDaemon, determines whether the DUMP partition contains a valid dump (dump.valid = "1"b--see include file bos_dump.incl.pl1). If it does, the Multics fdump is copied into one or more segments in the directory >dumps. These segments have the name date.time.n.erf_no, where date is in the form MMDDYY, time is in the form HHMMSS, n is a number, starting at 0, incremented by one for each segment of a multi-segment dump, and erf_no is a number (the "Error Report Form" number, from Multics' historical past at MIT) incremented each time an FDUMP is taken, as extracted from dump.erfno. If there is a valid FNP dump (dump.valid_355 = "1"b), it is copied into a segment in >dumps named date.time.0.erf_no.355. The error report form number is maintained (incremented each time) by the BOS FDUMP command. The number can be set to a new value (e.g., 1) at any time by typing FDUMP n where n is the new error report form (ERF) number (crash number).
.spf 2
.bbk
.fif
   0          FDUMP header (including the segment map)
   2000       FNP core image (optional)
   (following) segment images: copies of the segments of the
               processes dumped
.fin
.ifi fig "Format of the DUMP Partition"
.bek
.spb 2
.ifi l1h "Dumping the Initializer Process" Perceived problems in the Initializer process (nobody can log in or out, "sac" fails, etc.) leading to crashes (usually manual crashes via execute fault) require special handling. The Initializer process is not dumped by FDUMP by default; if you wish it to be dumped, your site RUNCOM's should include the INZR keyword to FDUMP. This is strongly recommended. If they do not, and you are about to crash the system intentionally due to a perceived Initializer problem, change the console switches to prevent RUNCOM execution, and perform an appropriate FDUMP "manually".
.ifi l1h "Processing an Fdump" If a crash occurs, the Multics and possibly the FNP core images are in several segments in the directory >dumps. There are two commands that can be used to print these dumps. One command, online_dump, or od, is used to print the Multics dump. The other command, online_dump_355, or od_355, is used to process the FNP dump. These command descriptions can be found in Section VIII.
.spb 2 If it is desired to merely examine portions of the fdump from a terminal, the command ol_dump (see Section VIII) should be used. Online dump analysis is highly preferable to analysis of printed octal dumps: powerful tools running in the Multics environment greatly simplify the task of analyzing the millions of words of data involved. Paper dump analysis should only be necessary in machine-room debugging situations where new versions of the supervisor are being debugged, or hardware failure is involved.
.ifi l1h "Examination of Registers" The first block of information available in either an fdump or a dump printed by the BOS DUMP command is the state of various processor registers, excerpted from the BOS machine image. These will be the registers of the processor which actually returned to BOS.
On a multi-CPU system, this may well _n_o_t be the processor which encountered whatever problem caused the crash.
.spb 2 The first register of interest is the Procedure Segment Register (PSR). The PSR contains the segment number of the procedure that actually returned to BOS. In all but one case, this should be the segment number of bound_interceptors, which contains the module sys_trouble. The only case in which this is not true is when BOS is entered by a manual execution of switches, in which case it is whatever it was when the processor was stopped to perform the manual execution of the XED instruction. Listed along with PSR is the instruction counter (IC), the Ring Alarm Register (RALR), the A and Q registers, the exponent register, the Interval Timer register, and the index registers. In analyzing program loops, the values printed here should be carefully correlated with the object code identified by the PSR and instruction counter.
.spb 2 Since Multics can enter BOS and be subsequently restarted with a GO command, BOS saves all registers. It also saves all interrupts that come in after Multics has entered BOS. These interrupts are set again (via a SMIC instruction) when the core image is restarted via a GO command. The interrupts are printed in the dump in the form of the word INTER followed by 12 octal digits. The first 32 bits of the 12 octal digits correspond to the setting of interrupt cells 0-31. See Section I of the Hardware and Software Formats manual (Order #AN87) for a detailed description of the data format for the SMIC instruction.
.spb 2 Following the interrupt word in the dump are the values in each of the eight pointer registers. When BOS is entered from sys_trouble (in Multics, which is to say, by any kind of RTB other than manual execution of the switches) pointer register 2 (bp) points to the actual machine conditions that were stored when the cause of the crash actually happened. These machine conditions are usually stored in prds$sys_trouble_data, and, for a fully-running Multics, will always be the conditions for a "sys trouble connect" or an execute fault. On a system being bootloaded, which crashed due to a premature fault, the actual fault data will be pointed to by Pointer Register 2 as reported by BOS.
.spb 2 After the pointer registers, the contents of the PTW and SDW associative memories are printed. This data is printed in an interpreted format. Figures containing the layout of the associative memories as stored in memory can be found in Section I of the Hardware and Software Formats manual (Order #AN87). The figures of interest are: SDW Associative Memory Register Format, PTW Associative Memory Register Format, and PTW Associative Memory Match Logic Register Format. Generally, the associative memory contents are of little use except in debugging hardware problems. One thing to check for if associative memory problems are suspected is nonunique usage counts (i.e., two associative memory slots having the same usage number). Another possibility is for two slots to have the same contents (e.g., two slots in the SDW associative memory pointing to the same segment).
.spb 2 Following the associative memory printout is an interpreted description of what memories are attached to the bootload processor and the size of each memory. The information is printed in two columns. The first column contains the beginning address, in 64-word blocks (two digits of zeroes dropped off the end), of a memory. The second column contains the size of that memory in 64-word blocks. There are eight entries in each column, one for each processor port. Listed below is a sample printout for a system with 128k on each of the first three processor ports.
.spb 2
.fif
Coreblocks:    First     NUM
               0         4000
               4000      4000
               10000     4000
               NO MEM
               NO MEM
               NO MEM
               NO MEM
               NO MEM
.fin
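.spb 2 The entries are easily converted back to word addresses and memory sizes, since each value is in units of 64 words (the two dropped octal digits). The following is a minimal sketch of that arithmetic, in Python rather than PL/I, and is not a Multics tool; it is applied here to the sample above.
.spb
.fif
def interpret_coreblock(first, num):
    """Convert a Coreblocks entry (octal strings) to a word address
    and a size in K (1024-word units)."""
    base_words = int(first, 8) * 64    # restore the two dropped digits
    size_words = int(num, 8) * 64      # size of the memory in words
    return oct(base_words), size_words // 1024

# interpret_coreblock("0", "4000")     -> ('0o0',        128)
# interpret_coreblock("4000", "4000")  -> ('0o400000',   128)
# interpret_coreblock("10000", "4000") -> ('0o1000000',  128)
.fin
.spb 2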
Following the display of the memory layout is a printout of the memory controller masks for the memory on each processor port. A memory mask is stored as 72 bits. Bits 0-15 contain the settings of bits 0-15 of the interrupt enable register for a memory. Bits 32-35 contain bits 0-3 of the port enable register for a memory. Bits 36-51 contain the settings of bits 16-31 of the interrupt enable register. Bits 68-71 contain bits 4-7 of the port enable register. See the description of the System Controller Interrupt Mask Register in Section II of the Hardware and Software Formats manual (Order #AN87).
.spb 2 The last set of registers consists of the history registers. These are processor type dependent.
.spb 2 On the Level 68 and DPS processors there is one set of history registers for each of the four operational units of the processor: the Operations Unit (OU), Control Unit (CU), Appending Unit (APU), and Decimal Unit (DU) or EIS portion of the processor.
.spb 2 On DPS 8 processors the Operations Unit (OU) and Decimal Unit (DU) history registers have been combined into a single Operations Unit/Decimal Unit (OU/DU) history register set.
.spb 2 See Section I of the Hardware and Software Formats manual (Order #AN87) for detailed figures of: CU History Register Format, APU History Register Format, DU History Register Format, and OU/DU History Register Format. In the case of a manual execution of switches, they may contain information leading to valuable insight as to what the BOS processor was doing.
.spb 2 The last set of information that is printed with the registers is an interpretive layout of the descriptor segment. Each SDW is printed in an expanded format. SDWs that are all zero (directed fault zero or segment fault) are not printed.
.spb 2 Along with each SDW a name is printed. For a segment in the Storage System hierarchy, this name is a pathname reconstructed from the names of segments and directories retrieved from the "SST Name Table". These names were copied out from the VTOC entries of active segments by the BOS "SSTN" utility (that which prints out "FILLING SSTNT" at FDUMP time). The names in the VTOC entry are put there by Multics for this reason only, and thus, Multics does not go through much expense to keep them accurate. Therefore, they reflect the primary name of segments at the time they were created. This should be kept in mind when "shriek" names and other apparent anomalies appear in the Descriptor Segment listing. For supervisor segments not in the hierarchy, the principal segment name as stored in the SLT (Segment Loading Table, the effective "directory" of supervisor segments) is printed.
s3.an53.compin 10/22/84 1058.2rew 03/17/81 1506.1 113922 .ifi init_plm "AN53-02" .srv section "3" .ifi l0h "Crashes With No Message or BOS-printed message" Multics no longer returns to BOS without printing some form of message; in earlier versions of the system (before MR5.0), it sometimes did. Currently, at the time of a crash, Multics either prints a message itself, via syserr, the operator's console mechanism, or returns to BOS leaving information in the flagbox (see Section I) so that BOS can print a message on return.
BOS-printed messages can be identified on the operator's console as being all in upper-case (e.g., "FAULT/INTERRUPT WITH PTL SET"), and not being followed by the "Multics not in operation" message.
.spb 2 Multics can also, on account of serious failures, "hang", with processors looping in an inutile manner, where no users are accomplishing useful work. In such a case, the operations staff must initiate an execute fault, or "execute the switches", as described in Section 1. In this case, the BOS-printed message will indicate nothing more than the fact that this has occurred. Analysis of this type of crash is similar to that where BOS prints the error message.
.spb 2 This section describes a deterministic algorithm for ascertaining the immediate reason for a system return to BOS with a BOS-printed message, or a forced return to BOS. It is the intent of this section and the next to describe an appropriate course of action for determining the immediate cause of a crash, when presented with a dump.
.spb 2 It goes without saying that the "ultimate" cause of a crash usually requires insight, experience, and patient investigation to determine, even if these procedures for locating the "immediate" cause of the crash produce it readily. The "ultimate" cause of a crash is taken from the domain of "hardware problem", "system bug", "system design limitation", and so forth. Saying that the system crashed because of "a boundsfault taken on the PRDS" is an "immediate reason", but not an ultimate reason. Possible ultimate reasons might be "a site-supplied MCS multiplexer had too large a stack frame, and ran off the end of the PRDS", or "the CPU hardware dropped bit 29 and interpreted the '6' for 'SP' as an offset of 600000".
.ifi l2h "How did the system return to BOS?" When inspecting a paper dump, or an online (FDUMP) dump, one may not have the message produced by BOS available. One must determine how the system returned to BOS from the machine conditions stored by BOS.
.spb 2 The first quantity to inspect is the PSR in the registers printed by BOS, or online_dump. If the PSR contains the segment number of any segment other than that of bound_interceptors (which contains the instruction in the module sys_trouble that returns to BOS), the system did not return to BOS of its own volition, and a manual transfer to BOS ("Execution of the switches", i.e., XED 10000 or XED 10002) was made by the operator. (Specifically, the PSR/Instruction counter should point to the instruction in the module sys_trouble following the derail instruction that causes the return to BOS.) One may assume that the system was looping, and the general analysis for loops should be undertaken: the state of the processor whose switches were executed can be determined from the BOS-stored registers, and the state of other processors will probably not be available.
.spb 2 If the PSR/ICT points to the correct place in sys_trouble, pointer register 2, as printed by BOS or online_dump, points to the machine conditions that caused the bootload processor to return to BOS. This is prds$sys_trouble_data. If the fault/interrupt code in the second word of the SCU data in these machine conditions is anything other than a connect fault (octal 21 as it appears there), these machine conditions represent one of the following cases:
.spb 1 .inl +10 .unl 5 1. An execute fault was taken by the bootload processor, i.e., the operator pressed the EXECUTE FAULT pushbutton on this processor. This fault/interrupt code is 37 (octal).
Presumably, the bootload processor was looping, and diagnosis should proceed from the analysis of its state as encoded in these machine conditions.
.spb 2 .unl 5 2. A fault is detected during initialization, before the mechanism to handle that fault is set up. The fault/interrupt code will be that of a fault (an odd number). There should be no premature faults during initialization; a misconstructed system tape or hardware error is to be suspected.
.spb 2 .inl -10 If the fault/interrupt code in the SCU data pointed to by pointer register 2 as given by BOS reflects a sys_trouble connect, some other module or processor caused this interrupt. All modules that send sys_trouble connects execute DIS instructions (octal 000000616000, although sometimes with modifiers) immediately after issuing the CIOC instruction. Hence, if the SCU current/odd instruction words (6 and 7) do not have NOP instructions or a CIOC instruction, one concludes that some processor other than the bootload processor first sent a sys_trouble connect. The machine conditions at prds$sys_trouble_data for all running processes should be inspected to find one that was faulted out of NOPs or a CIOC, or records an execute fault (37), the latter case indicating that the operator pressed "execute" on that CPU. (It is possible, however, for a processor to be executing some NOP loop, such as certain locking code, at the time a sys_trouble connect is received from another processor. If, in a multi-CPU dump, many such sets of sys_trouble data are found, this should be suspected, and the set of conditions that identifies NOPs following a CIOC instruction sending sys_trouble should be found.)
.spb 2 When the processor that started the sys_trouble broadcast has been found, the program that sent the first sys_trouble connect must be identified. If the sys_trouble_data indicates an execute fault, then the operator pressed the "execute" button on the CPU in that process, to deliberately crash the system. This should be analyzed as a loop on that processor, with the execute-fault machine conditions in prds$sys_trouble_data indicating the state of that process.
.spb 2 If not an execute fault, some procedure deliberately crashed the system because of circumstances it detected, and that procedure must be identified. This can be done by inspecting the PSR in the machine conditions for this first sys_trouble connect. If it is the segment number of the fault interceptors in bound_interceptors, a fault or interrupt encountered in invalid circumstances was detected. This can be a fault (generally a user-fault type fault, see Sec. 2) while running with the PRDS as a stack, or taken in an idle process, or an interrupt while processing an interrupt. .bbf See the section "Crash Policy vis-a-vis faults" below. .bef (fim-type) faults and interrupts with the page table lock set will always crash the system immediately in this way, too. In all these cases, pointer register 2 in the sys_trouble machine conditions points to the machine conditions stored at the time of the problematic fault or interrupt.
.spb 2 If the PSR identifies bound_page_control, a page fault was taken in an invalid circumstance. Pointer register 0 in the system trouble machine conditions points to the page fault machine conditions. Such circumstances are time spent in idle processes, and times that the PRDS is being used as a stack.
.spb 2 If the PSR identifies privileged_mode_ut (in bound_priv_1), an explicit call was made to pmut$call_bos.
This is always done in the case of fatal crashes with a message, in which case syserr makes this call. One should identify the owner of the stack frame pointed to by sp (pointer register 6) in the sys_trouble data. By _o_w_n_e_r, we mean the procedure indicated by the return pointer (location 24 octal). (pmut does _n_o_t push a frame in this case.) If the owner is bound_error_wired (which contains syserr), then a call was probably made to print out a crash message. The arguments to syserr in a preceding frame should be inspected. In this case, either the message was printed out by the operator's console, or some difficulty may have been encountered in trying to print it out. Otherwise, it may be assumed that privileged_mode_ut$call_bos was invoked (through hphcs_) by some program in the outer rings, and a stack trace should determine the problem (for instance, the operator might have issued the "BOS" command to the Initializer).
.ifi l2h "Crash Policy vis-a-vis Faults" Some words of design motivation are felt to be appropriate here.
.spb 2 At first, it seems as though a crash for the reason of a fault having been detected "while using the PRDS as a stack" indicates a deficiency in the operating system. Why is there not adequate code to process these faults even if the PRDS is being used as a stack? What is the relevance of the PRDS to the processing of faults anyway?
.spb 2 The problem here is not one of fault-processing, or lack of fault-processing facilities. Looking a bit more deeply into the meaning of "a fault", we must ask, about a fault taken "on the PRDS", what kind of fault was it? Was it a hardware error? Was it a programming violation (illegal opcode, out of bounds, or so forth)? Was it a page fault? Although for different reasons, all of these lead to the same conclusion: the operation of this bootload of the operating system must be terminated.
.spb 2 Consider a hardware error, such as parity, detected, say, "while the PRDS is being used as a stack". From Section 2, we learn that we cannot even tell where a parity error was encountered, much less restart or retry the instruction (which we cannot even identify) that encountered the fault. Even if we could, the hardware is telling us that some data fetched from memory is faulty. In either case, we can no longer continue execution of the procedure which encountered the fault: we have lost its control point. When using the PRDS as a stack, we are, of necessity, either handling an interrupt or a page fault, or in the traffic controller. In any of these cases, losing the control point of one of these critical procedures is guaranteed to leave their data bases in an inconsistent state.
.spb 2 Hardware errors detected in the user ring by user programs create user-visible errors. The user can retry his command, logout, or take other corrective action. But in the case of a fault taken in traffic control, page control, or interrupt handling, an arbitrary amount of data involving an unknowable number of users is involved, and there is no one to tell what went wrong. The consistency of the system for the remainder of the bootload is at stake, as is the ability of the system to perform services on behalf of _a_n_y users.
.spb 2 For faults taken in other parts of ring zero, there are often recovery procedures. For instance, all calls into ring zero are on behalf of the user who called in. Many calls affect only that user's per-process data.
Many affect directory control, and directory control has an extensive recovery system: when a directory-control program loses its state, directory control knows precisely which directories may have been left inconsistent, and has, and executes, procedures to ensure the integrity (salvage) of affected directories.
.spb 2 The subsystems serviced at interrupt, page fault, and traffic control time are very complex, and damage to their data bases is not readily localizable. Thus, there are no dynamic reconstruction procedures for the data bases of these subsystems .bbf There is a dynamic reconstructor (pc_recover_sst) for page control which is used at ESD time, but recovery "on the fly" at fault time is still a much harder problem. .bef at this time (MR8.0).
.spb 2 In the case of program errors such as illegal instructions or bounds violations detected while executing in critical subsystems, the very integrity of the code of the supervisor (perhaps it was destroyed by hardware, or bad site-supplied code was involved) is to be questioned as well as all of the above issues. In the case of a page fault or interrupt during a page fault or interrupt, some gross error in environment management must have occurred, such as the loss of a mask, or a lock failing, or stack header pointers lost, or so forth: in such circumstances, it is unclear that the system can even function effectively enough to produce a syserr message, _l_e_t _a_l_o_n_e continue to serve users, _l_e_t _a_l_o_n_e perform an analysis, develop a diagnosis, and apply a remedy to itself.
s4.an53.compin 10/22/84 1058.2rew 03/12/81 1823.5 21519 .ifi init_plm "AN53-01" .srv section "4" .ifi l0h "Crashes With a Message" When Multics crashes after printing a message on the operator's console, that message is always printed by syserr. After printing the message, syserr calls privileged_mode_ut$bos_and_return (in bound_priv_1), which sends a system trouble connect to the current processor. The receipt of this system trouble connect causes similar connects to be sent to all other processors in the system. When analyzing a dump, look at the system trouble machine conditions on the PRDS of each processor. One set of such machine conditions has a PSR equal to the segment number of bound_priv_1. In addition, the even and odd instructions in the SCU data are both NOP instructions since privileged_mode_ut executes NOPs waiting for the system trouble connect to go off.
.spb 2 Once the correct machine conditions have been found, pointer register 6 (the stack pointer) contains a pointer to the syserr stack frame. If the segment number in pointer register 6 is for the PRDS, the previous stack frame belongs to the caller of syserr. If, however, the segment number is for another segment, i.e., a ring 0 stack, syserr uses a different convention. syserr makes its stack frame at location 30000 octal on that stack. It does this so that possibly valuable stack history is not overwritten by its stack frame. This would happen if it laid its frame down right after the frame of its caller. An examination of the stack frame at 30000 shows that it has two frames following it. The first is for wire_stack, a program that wires pages of the ring 0 stack so that syserr does not take a page fault running on it. The second is for syserr_real, the program that actually prints the message. Further examination of the stack frame at 30000 shows that the back pointer points to the stack frame of the caller of syserr. This frame is usually quite far back on the stack with the intervening area holding the stack history. To examine this history, it is necessary to know the old forward pointer in the stack frame of the syserr caller, since the current forward pointer points to 30000 now. The old forward pointer is saved in location 26 octal of the frame of the caller of syserr. Given this old forward pointer then, it is possible to examine the stack history to see the last set of calls before the syserr call.
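.spb 2 The frame chasing just described can be sketched in a few lines. The following is a minimal sketch, in Python rather than PL/I, and is not a Multics tool. It assumes a helper, fetch_its, that returns the word offset of the ITS pair stored at a given offset in the dumped ring 0 stack, and it assumes the standard stack frame layout in which the back (previous-frame) pointer is at offset 20 octal and the forward (next-frame) pointer is at offset 22 octal; the return pointer at offset 24 octal and the saved old forward pointer at offset 26 octal are as stated above.
.spb
.fif
SYSERR_FRAME = 0o30000        # syserr's frame on a ring 0 stack

def calls_before_syserr(fetch_its, stack):
    """List (frame offset, return pointer) pairs for the frames
    lying between syserr's caller and the frame at 30000."""
    # The back pointer of the frame at 30000 locates syserr's caller.
    caller = fetch_its(stack, SYSERR_FRAME + 0o20)
    # The caller's old forward pointer (offset 26 octal) restores
    # the pre-syserr thread of frames.
    frame = fetch_its(stack, caller + 0o26)
    history = []
    while frame and frame < SYSERR_FRAME:       # stop at syserr's frame
        history.append((oct(frame), fetch_its(stack, frame + 0o24)))
        frame = fetch_its(stack, frame + 0o22)  # follow forward pointers
    return history
.fin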
.brp .inl 0 .fin
s5.an53.compin 10/22/84 1058.2rew 04/15/81 1158.2 367236 .ifi init_plm "AN53-01" .srv section "5" .ifi l0h "Major System Data Bases" This section describes those parts of the system data bases that one might wish to examine after a crash.
.ifi l1h "System Segment Table" The System Segment Table (SST) is a variable size (via configuration card) unpaged segment. It holds all the page tables in the system. In addition, it holds control blocks for main memory management, for paging device management, and for active segment management. Many of the data items in the SST contain the addresses of other items. These addresses are expressed as 18-bit relative pointers to the SST. Figure 5-1 below gives the general layout of the SST.
.ifi l2h "SST Header" The SST header consists of various control variables and meters. The meters are defined in the include file sst.incl.pl1. These meters are not discussed further in this document. It would be useful to have a copy of this include file in hand before reading further.
.spb 2 The first item of interest in the SST header is the page table lock, sst.ptl. This lock is set when taking a page fault and remains set until page control has completed processing the page fault (e.g., initiated a disk read for the page). The page table lock is also used at other times, and it locks the header, core map, paging device map, and page tables. It is also a lock on those parts of an ASTE needed by page control. If any _p_r_o_c_e_s_s_o_r attempts to lock the page table and it is already locked, that processor either loops, dispatches to the traffic controller, or places elements in the "coreadd queue". .spb 45 .ifi fig "Layout of System Segment Table (SST)"
.spb 2 Following the page table lock is the AST lock (sst.astl). This lock is generally used only by segment control. Instead of being a loop lock, the AST lock is a wait lock. This means that if a _p_r_o_c_e_s_s finds the AST lock locked, it gives up the processor and informs the traffic controller that it wishes to WAIT on this lock. When the process that locked the AST lock is finished, it notifies all processes that are waiting on the lock. The SST variable, sst.astl_event, contains the event upon which processes contending for the AST lock should wait.
.spb 2 .unl -5 The next item of interest is the pointer to the beginning of the AST, sst.astap. As described below, there is a page table following each ASTE. The maximum size of a page table is 256 PTWs. Clearly it would be wasteful to allocate a maximum size page table for every active segment. Consequently, the ASTEs are broken up into pools where all the ASTEs in a pool have the same number of PTWs. The current pool sizes are 4, 16, 64, and 256. Each element in the array sst.level consists of a pointer to the used list (described below) of ASTEs of a pool, and the number of ASTEs in the pool. There are also special pools for various classes of supervisor and initialization segments (see the S__t_o_r_a_g_e S__y_s_t_e_m PLM, Order No.
AN61 and the Initialization PLM).
.spb 2 The next item of general interest in the SST header is the set of control words for the core map. The variable sst.cmp is a pointer to the start of the core map, and sst.usedp is a relative pointer to the least-recently-used core map entry of the used list (described briefly below and fully in the Storage System PLM). There is also a pointer to the ASTE for the root in sst.root_astep. There is room now in the header for a block of meters. Following the meters is a block of information used in management of the paging device. The variable sst.pdmap is a pointer to the paging device map in the SST, and sst.pdusedp points to the least-recently-used entry of the paging device used list.
.ifi l2h "Core Map" .unl -5 Directly following the SST header is the core map. The core map consists of a 4-word Core Map Entry (CME) for each 1024-word core block that is configured. Even if a memory is not online, if it is to be added dynamically, each page in the memory must be represented by a CME. Figure 5-2 below depicts a CME. .spb 25 .ifi fig "Core Map Entry"
.spb 2 The first word contains the forward and backward threading pointers. These pointers (actually, they are addresses relative to the base of the SST) are used in the implementation of the Least Recently Used (LRU) algorithm for core management. The header variable, sst.usedp, points to the head of this circular list and in fact points to the CME that represents the block of core least recently used. The LRU algorithm is described fully in the Storage System PLM. One important thing to be checked in a dump analysis is that the CMEs are all threaded together correctly. In Release 2.2 and later systems, CMEs for out-of-service pages and RWS (read-write sequence) buffers are _n_o_t threaded in.
.spb 2 .unl -5 Each CME holds the device address for the page that resides in the core block represented by that CME. A device address consists of an 18-bit record number and a 4-bit device identification. (The first bit of this device ID indicates the paging device.) The one exception is when the page occupying the core block associated with the CME is a new page and has not yet been assigned a disk address. In this case, a null device address is stored as the device address. Null device addresses may also appear in PTWs. Null device addresses are coded for debugging purposes to be able to tell which program created the null address. Listed below are the null addresses (any address greater than 777000 octal is considered to be a null address):
.spb 2
.fif
.inl 5
777777   created by append
777001   created by pc$truncate
777002   created by pc$truncate
777003   created by salv_check_map
777004   created by salv_check_map
777005   created by salv_truncate
777006   created when page is zero
777007   created by pc$move_page_table
777010   created by pc$move_page_table
777011   created by get_aste
777012   created by make_sdw
777013   created by deactivate
777014   created by move_file_map
777015   created when page is bad on paging device
.fin
.inl
.spb 1 Listed below are the Multics device ID numbers:
.spb 1
.fif
.inl 5
1   Bulk Store
2   D191
3   E191
4   D190
5   E190
6   D181
.fin
.inl
.spb 2 .unl -5 If the paging device indicator is not on, then the device address is a disk address. The only consistency check one can make in this case is to look at the PTW pointed to by the PTW pointer in the CME and make sure that the core address in the PTW corresponds to the core block represented by the CME.
During a read/write sequence, the PTW pointer is replaced by a pointer to a Paging Device Map Entry (PDME). A simple algorithm to perform this check is:
.spb 2
.fif
.inl 5
Let x = ptw.add   (18-bit, 0 mod 64, core address)
Let y = sst.cmp   (y is offset of core map in SST)
Then address of CME = y + x/4
.fin
.inl
.spb .unl -5 If this relationship is not true, then either a read/write sequence is in progress (in which case the PTW pointer no longer points to a PTW, but to a PDME), or there is an inconsistency in the SST. It can easily be determined if a read/write sequence is in progress since there is a flag in the CME (cme.rws as defined in cmp.incl.pl1) that indicates this.
.spb 2 .unl -5 If the paging device indicator bit is on, then the record address is an address on the paging device. As described below, there is a 4-word PDME for each record on the paging device. It is possible to find the PDME associated with a particular CME by taking the paging device record number, multiplying it by four, and adding in the offset portion of the pointer in sst.pdmap. It is important to note that this can be a _n_e_g_a_t_i_v_e offset. This is true, for example, when Multics is only using the last 1000 pages of a 2000-page paging device. Rather than having 1000 empty PDMEs for pages 0-999, the pointer in sst.pdmap is backed up so that the first PDME in the Paging Device Map represents record 1000. Once the PDME is located, several consistency checks can be made on it. Figure 5-3 depicts the format of a PDME. The PDME is defined in cmp.incl.pl1. .spb 10 .ifi fig "Paging Device Map Entry"
.spb 2 .unl -5 One check to be made is to make sure that the PTW pointer points to the same PTW as the CME. Another check is to see if the device address is for a disk address. If not, there is an error. Other checks are listed below.
.ifi l2h "Paging Device Map" .unl -5 The Paging Device Map directly follows the core map in the SST. It has a very similar function in that it is used to manage the 1024-word pages on the paging device in such a manner that the least recently used page on the paging device is the one selected for removal when a new page must be placed on the paging device. This removal process is called a Read-Write Sequence (RWS). It involves reading a page from the paging device and writing it to its secondary storage (disk) address. It is presumed that the reader is familiar with the use of the paging device as described in the Storage System PLM. There are various consistency checks that can be made on the Paging Device Map. First, all PDMEs must be correctly forward and backward threaded. The thread starts with the PDME pointed to by sst.pdusedp. There is one exception to this rule. When an RWS is in progress for a page, its PDME has a zero forward pointer and its back pointer contains the address of the associated CME. Both the CME and the PDME should have the RWS flag on (cme.rws and pdme.rws in cmp.incl.pl1).
.spb 2 .unl -5 Another consistency check one can make is to see whether the secondary storage address stored in the PDME is correct. One can do this by applying the paging device hashing algorithm to that secondary storage address to see if the PDME in question is on the hash thread. As described in the Storage System PLM, when the paging device hashing algorithm is applied to a disk address, a PDME address is produced. If the disk address in question is not stored in that PDME, the value in pdme.ht is the address of another PDME to look at.
Thus, there exists a thread of PDMEs all of which hold disk addresses that produce the same value when the paging device hashing algorithm is applied. To perform the consistency check, take the 18-bit secondary storage record address stored in the PDME, perform a logical AND with the value stored in sst.pd_hash_mask (which is a function of the paging device size), and divide the result by 2. The quotient gives the offset from the base of the hash table (as pointed to by sst.pdhtp) of a pair of hash table addresses. If the remainder of the previous division is 0, use the upper address; otherwise use the lower address. The selected address is either zero (no secondary storage addresses have copies on the paging device) or it is the address of the first PDME in a chain of one or more PDMEs. By following the chain (pdme.ht), the secondary storage address in question should be found, or the consistency check has failed. Of course, if the selected hash table address is zero, the check has failed.
.spb 2 .unl -5 Another useful consistency check is to confirm the correctness of the PDME, PTW, and CME association if the page is in core or of the PDME and PTW association if the page is not in core (as determined by the setting of the PDME flag pdme.incore). If the page is not in core, then look at the PTW pointed to by pdme.ptwp (if pdme.ptwp is zero, the segment containing that page is not active and hence has no active PTWs). The device address in the PTW must be for the paging device or there is an error. To determine if it is the correct paging device address, multiply the 18-bit paging device record number by 4 (the size of a PDME) and add the offset portion of the pointer stored in sst.pdmap. This should yield as a result the offset of the associated PDME in the SST. If the page is in core, compute the CME address from the PTW pointed to by pdme.ptwp as described earlier. The device address in the CME must be for the paging device, and the address of the associated PDME can be computed as just described.
.spb 2 .unl -5 For any PTW that has ptw.df on, the PTW must, of necessity, contain a core address. If ptw.df is off, it always contains a device address for all systems earlier than Release 2.2. In the case that this page is being read in (ptw.df = "0"b, ptw.os = "1"b), there is always a CME associated with the PTW which, in systems prior to Release 2.2, must be searched for. In Release 2.2 and later systems, a PTW for a page being read in contains a core address, which allows quick location of the CME. In all other cases, the PTW contains a device address.
.spb 2 .unl -5 Another quick consistency check is that all PDMEs that are free (last three words are zero) must be at the head of the used list. The used list is traced by following forward pointers. The address of the first PDME is stored in sst.pdusedp. Also, the number of free PDMEs in the used list plus the number of PDMEs that have an RWS active (stored in sst.pd_wtct) should equal the value in sst.pd_free.
.spb 2 .unl -5 The last type of check that can be made is really more of a heuristic one. The pdme.abort, pdme.truncated, and pdme.notify_requested flags are rarely on and may be symptomatic when looking for the reason for a crash. Also, the pdme.removing flag should only be on when the associated paging device record is being explicitly deleted by the operator.
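.spb 2 The offset arithmetic used in the checks just described can be summarized as follows. This is a minimal sketch, in Python rather than PL/I, and is not a Multics tool; the sst_cmp, sst_pdmap, sst_pdhtp, and sst_pd_hash_mask values are assumed to have been read out of the SST header of the copy being examined, expressed as offsets relative to the base of the SST.
.spb
.fif
def cme_offset(ptw_add, sst_cmp):
    """Offset of the CME for the core address in a PTW (0 mod 64)."""
    return sst_cmp + ptw_add // 4

def pdme_offset(pd_record_no, sst_pdmap):
    """Offset of the PDME for a paging device record number;
    sst_pdmap is effectively negative when low records are unused."""
    return sst_pdmap + 4 * pd_record_no

def pd_hash_bucket(disk_record_no, sst_pd_hash_mask, sst_pdhtp):
    """Hash-table word (and half) heading the PDME chain for a
    secondary storage record address."""
    quotient, remainder = divmod(disk_record_no & sst_pd_hash_mask, 2)
    half = "upper" if remainder == 0 else "lower"
    return sst_pdhtp + quotient, half
.fin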
.ifi l2h "Active Segment Table" .unl -5 The Active Segment Table (AST) described earlier contains a number of Active Segment Table Entries (ASTE) and associated page tables. The ASTE is eight words long and basically contains copies of some pieces of directory information about a segment. This information, which can change quite rapidly, may be updated in the ASTE rather than paging in the directory to do the updating each time. Figure 5-4 below shows the format of an ASTE. .spb 45 .ifi fig "Active Segment Table Entry"
.spb 2 There are not a large number of consistency checks that one can make on an ASTE. One thing that can be checked is the consistency of the last word of the ASTE. The field aste.marker (as defined in ast.incl.pl1) must have the value 02 in it. Also, aste.ptsi must contain the page table size index. An index of 0 means a page table size of 4 PTWs. An index of 1 is a size of 16, 2 is a size of 64, and 3 is a size of 256. (These indices are used for the array of page table sizes in sst.pts.) Another useful check is to compare the value in aste.np (the number of pages in core) with the PTWs associated with that ASTE. The number of PTWs with the directed fault bit on (in core) should be equal to the value in aste.np.
.spb 2 Another item of interest is that an ASTE with the flag aste.gtms on is almost always an ASTE for a directory. Since the Backup Facility uses this flag, this is not a foolproof indicator. Of course, if aste.ic is nonzero, then that ASTE is guaranteed to be for a directory. Only a directory can have inferior entries. Another check one might want to perform is to see how the information in an ASTE compares with the branch information in the directory (e.g., to compare secondary storage addresses in the PTWs with those in the filemap for the segment). To do this, one must find the ASTE for the containing directory using aste.par_ring. Then the descriptor segments that are dumped must be searched for an SDW whose page table address is equal to the address of the first PTW following the original ASTE in question. If this SDW can be found (if that process wasn't dumped, it can't be found), then the directory pathname is printed in the dump and the branch information in that directory can be found using the value in aste.rep in the original ASTE.
.ifi l2h "SST Analysis Tools" There are two commands, dump_pdmap and check_sst (described in Section VIII), that perform many of the checks mentioned above. A copy of the SST in an fdump should be extracted using the extract command (described in Section VIII). Then the commands should be run.
.ifi l1h "Active Process Table" The Active Process Table (APT) is a variable size (via configuration card) data base. It is contained in the unpaged segment tc_data (for traffic controller data). It holds the control blocks called Active Process Table Entries (APTE) for each process in the system as well as some interprocess communication control blocks. Figure 5-5 below gives the general layout of tc_data. .spb 45 .ifi fig "tc_data"
.spb 2 The header contains a number of meters and variables needed by the traffic controller. This information is given extensive coverage in the Multiprogramming PLM and is not discussed further other than to point out the variable tcm1.ready_q_head (defined in cm.incl.pl1). Using this variable (also segdefed at tc_data$ready_q_head), it is possible to trace through the ready list finding the APTEs for all processes that are running, eligible to run, ready to run, or waiting. Figure 5-6 below describes the ready list. All other APTEs in the APT that are not threaded into the ready list are in the blocked state or unused state.
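.spb 2 When DBR values must be pulled out of a dumped copy of tc_data by hand, the Release 9.0 offsets quoted in Section 1 (the APT array starts at location 1260 octal in tc_data, the DBR occupies words 44 and 45 octal of each APTE, and the Initializer's APTE is first) can be applied directly. The following is a minimal sketch, in Python rather than PL/I, and is not a Multics tool; the APTE size is release dependent and is an assumed parameter here, to be checked against apte.incl.pl1 for the system being analyzed.
.spb
.fif
APT_BASE = 0o1260     # start of the APT array (Release 9.0)
DBR_WORD = 0o44       # offset of the DBR within an APTE

def apte_dbr(tc_data_words, n, apte_size):
    """Return the two DBR words of the n'th APTE (n = 0 is the
    Initializer).  tc_data_words is a dumped copy of tc_data as a
    list of 36-bit integers; apte_size is the APTE size in words
    (ASSUMED -- see apte.incl.pl1 for the release in question)."""
    base = APT_BASE + n * apte_size + DBR_WORD
    return oct(tc_data_words[base]), oct(tc_data_words[base + 1])
.fin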
.ifi l2h "The APTE" Generally, when analyzing a crash involving some type of loop, the APT is examined. One usually looks for APTEs waiting for strange events, APTEs in inconsistent states, etc. The format of an APTE is defined in apte.incl.pl1. Generally the thing one looks at first in an APTE is the flags and state word. The states are: .fif .spb 2 .inl 5 0 Empty (not in use) 1 Running 2 Ready 3 Waiting 4 Blocked 5 Stopped .fin .inl .unl -5 .spb 2 The flags are covered in the Multiprogramming PLM. In the field apte.processid is stored the process ID for that user . Processids consist of two 18-bit items. The left item is the offset in the APT of the user's APTE. The right most 18 bits hold the last value of a number maintained by the answering service that is incremented each time a user logs in and rolls over at 262144. The next item of general interest in the APTE is the word apte.ipc_pointers. The upper half of this word, if nonzero, is the address of the first of one or more event messages waiting for the process. Event messages are stored in the ITT area of the APT. The format is shown in Figure 5-7 below.  .unl -5 Directly after the ipc thread is a word called apte.ips.message. This word holds 1-bit flags for each of the system-defined ips signals. (These system events are stored in sys_info$ips_mask_data.) There are three types of ips signals. Bit 0 of apte.ips_message is used for the ips signal QUIT. Bit 1 is for CPUT (cpu timer runout). Bit 2 is for ALRM (real-time timer runout). .spf 45 .ifi fig "Ready List Format" .spb 2 .spf 45 .ifi fig "Format of ITT Message" .spb 2 The next field of interest in the APTE is labeled apte.asteps. This two-word field holds the relative offset in the SST of the ASTE for the PDS of this process, the offset of the ASTE for the descriptor segment of this process, and the offset in pxss (the traffic controller) of the last call (a TSX7 instruction) to the getwork subroutine. The getwork subroutine of pxss is called when a process must give up the processor it is running on to some other process. By seeing what other subroutine in pxss called the getwork subroutine, it is possible to tell what event caused that process to give up the processor (e.g., end of time quantum, process going blocked, page fault, etc.). .spb 2 Another item of interest in the APTE is the wait event (apte.wait_event). Listed below in Table 5-1 is the current set of wait events: .ifi tab "Wait Events" .spb .inl 30 .unl 25 E__v_e_n_t_s M__e_a_n_i_n_g .unl 25 000000000071 ttydim waiting for per channel lock. .unl 25 400000000000 (octal) Waiting on AST lock. .unl 25 "dtm_" Disk metering waiting on lock. .unl 25 "free" Waiting on system_free_seg lock. .unl 25 "ioat" Wait on ioat lock. .unl 25 777777777776 (octal) Online salvager waiting on lock in salv_data. .unl 25 777777777777 (octal) Waiting on lock in the root. .unl 25 000000xxxxxx If xxxxxx>sst.astap+1, then wait event is offset in SST of PTW for which I/O has been started. Otherwise, the wait event is the offset in the SST of a PDME for which an RWS has been started. .unl 25 Valid processid Loop wait in ttydim. .unl 25 xxxxxxxxxxxx Directory unique ID. .spb 2 .inl 0 The last two items of general interest in an APTE are the Descriptor Base Register (DBR) value in apte.dbr and the clock reading in apte.state_change_time. This is the clock reading taken at the last time a process changed its execution state (see the Multiprogramming PLM). 
Hence, it may be an indicator of trouble if it has been a long time (current time minus apte.state_change_time) since a currently waiting or blocked process ran. .ifi l1h "Fault Vector" While the fault vector is not a data base of general interest, one would look at the fault vector if a trouble fault occurred since a typical reason for a trouble fault is a bad instruction pair in a fault or interrupt vector. The fault vector actually consists of interrupt and fault vectors. It begins at absolute location zero. There are 32 double-word interrupt vectors followed immediately by 32 double-word fault vectors. Each vector consists of an SCU instruction indirect through an ITS pointer and a TRA instruction indirect through an ITS pointer. The ITS pointers come directly after the fault vectors. They are ordered in the following way: ITS pointers for TRA in interrupt vectors, ITS pointers for SCU in interrupt vectors, ITS pointers for TRA in fault vectors, and ITS pointers for SCU in fault vectors. .ifi l1h "Known Segment Table" The Known Segment Table (KST) is described in detail in the Storage System PLM and so is not described here. About the only use one has for the KST when analyzing a crash is finding out the names associated with a segment number when that segment has been deactivated. In the case of active segments, the BOS DUMP program and the online_dump program both print the names associated with each SDW in the descriptor segment. To find out the names associated with a given nonhardcore segment number (hardcore segment names are not in the KST and hardcore segments are never deactivated anyway), one can use the following algorithm to find the address of the Known Segment Table Entry (KSTE) for the segment: .fif .spb .inl 5
1.   Assume y is the segment number in question.
2.   Let x = y-kst.hcscnt-1 (kst.incl.pl1)
3.   Let i = x/kst.acount
4.   Let j = mod(x,kst.acount)
5.   Let kstarrayaddress = kst.kstap(i)
6.   Let ksteaddress = kstarrayaddress+j*4
.fin .spb .inl .unl -5 Step 2 above subtracts the highest hardcore segment number from the original segment number since hardcore segments are not represented in the KST. Step 3 divides the result of step 2 by kst.acount (the number of KSTEs in a KST array) to determine which KST array holds the KSTE. Step 4 finds the number of the KSTE within that KST array. The KST array address is found by indexing into kst.kstap in step 5 with the result of step 3. Finally, the KSTE is found in step 6 by adding the KSTE number times the size of a KSTE (4) to the address of the KST array. Given a KSTE address now, use the name address in the KSTE to find the list of names associated with the segment number. Figure 5-8 below gives the format of a KSTE (see kste.incl.pl1) while Figure 5-9 gives the format of a name (see kst_util.incl.pl1). .spb 2 .unl -5 One should be aware that the reference names resulting from this algorithm are only a heuristic help in identifying the segment. The branch pointer in the KSTE, identifying a directory entry, can be of help too. .ifi l1h "Linkage Section" .unl -5 Quite often when analyzing a dump, it is necessary to examine internal static or to find the name associated with an unsnapped link. In the user rings, all linkage information usually exists in one combined linkage segment. In general, at the base of the linkage segment is a Linkage Offset Table (LOT). (The stack header contains a pointer to the LOT.) To find the linkage section for a given segment number, that segment number is used as an index into the LOT, which is an array of packed pointers.
The packed pointer, if nonzero, points to the base of the linkage section for that segment. Usually this linkage section is somewhere within a combined linkage segment. Once this address is known, internal static can be located by using the linkage section offset given to the internal static variable by the translator or binder. Within the hardcore, there are two combined linkage segments and a separate segment to hold the LOT. One combined linkage segment, wired_sup_linkage, holds the linkage sections for most wired segments while the other combined linkage segment, active_sup_linkage, holds the linkage sections of most paged segments. The exceptions to the wired and paged segments having their linkage sections in wired_sup_linkage or active_sup_linkage are due purely to reasons of antiquity. In any case, the LOT entries for these special cases point to the correct segment. (For example, the LOT entry for the fim points to a segment called fim.link.) .spb 2 .unl -5 The following discussion describes the procedure involved in associating a segment name and entryname with an unsnapped link. This is expanded upon in the Binding, Linking, and Namespace Management PLM, Order No. AN81. The reader should refer to Figure 5-10 while reading this material. Assume you are presented with machine conditions indicating a fault tag 2 (linkage fault). The TSR contains the segment number of the linkage segment and the computed address holds the offset in the linkage segment of the unsnapped link. The fault/interrupt code in the SCU data is 61, octal. Finding the names involves a 4-step process. .spb 20 .ifi fig "Format of KSTE" .spb 25 .ifi fig "Format of name in KST" .spb 2 .unl -5 Step 1 is to find the linkage section header for the linkage section. This can be done in two ways. The most common way is to add the value labeled "header relp" in the link to the address of the link. This value is negative: it is the negative of the link's offset from the linkage section header. The other way to find the linkage section header is to index into the LOT using the value of the PSR in the machine conditions. The LOT entry is a packed pointer pointing to the linkage section header. The item of interest in the header is an ITS pointer to the definitions section of the object segment that took the linkage fault. This pointer occupies the first two words of the linkage section header. .spb 2 .unl -5 Step 2 is to add the 18-bit "expression relp" in the link to the offset portion of the definitions pointer. This produces a pointer to an expression word. .spb 2 .unl -5 Step 3 is to add the 18-bit "type pair relp" of the expression word to the offset portion of the definitions pointer. This produces a pointer to a double word type pair block. .spb 2 .unl -5 The last step is to add the 18-bit "segname relp" to the offset portion of the definitions pointer to produce a pointer to an ACC segname string. Also add the 18-bit "offsetname relp" to the offset portion of the definitions pointer to produce a pointer to an ACC entryname string. ACC strings consist of a 9-bit length field followed by 9-bit ASCII characters. .spb 2 .unl -5 As a final aid in understanding this, listed below is a PL/I program fragment that encodes the algorithm just described.
.inl 5 .spb 2 .fif
/* Assume linkp points to unsnapped link */
headerp = addrel (linkp, linkp->link.header_relp);
.unl -5 /*point to link sect hdr*/
defp = headerp->header.def_pointer;
.unl -5 /*copy definition section pointer*/
expp = addrel (defp, linkp->link.exp_word_relp);
.unl -5 /*point to expression word*/
typrp = addrel (defp, expp->exp_word.type_pr_relp);
.unl -5 /*point to type pair block*/
segnamep = addrel (defp, typrp->ty_pr.segname_relp);
.unl -5 /*point to ACC segname*/
entryp = addrel (defp, typrp->ty_pr.entryname_relp);
.unl -5 /*point to ACC entryname*/
.fin .inl 0 .spb 2 .spf 45 .ifi fig "Association of Name with Link" .spb 2 .ifi l1h "Lock Seg" One data base to examine when analyzing crashes related to locking problems (deadly embrace, idle loop) is lock_seg. This segment is a wrap-around history queue of all attempts to lock wait type locks in ring 0. The segment consists of an index into the wrap-around queue and a 127-entry queue. The index is in the eighth word of the segment (the first seven are zero) and is the index of the oldest entry in the wrap-around queue. Indexing is from 0 - 126. Figure 5-11 shows the format of one eight-word entry in the array. .spf 15 .ifi fig "Format of Lock Entry in lock_seg" .spb 2 .unl -5 The wait event is one of those listed previously (a directory unique ID, "ioat", etc.). The error code is the rightmost 18 bits of any error_table_ code that lock returned to its caller. The call type occupies bits 21-26 of the word and is the type of entry into lock. (These can be determined by examining the listing of lock.pl1.) The fail switch occupies bit 35 of the word and is on if a lock try failed. The last word holds the value of sst.total_locks_set at the time the entry was made. The value in sst.total_locks_set is the total number of locks being waited on by all processes in the system. It is checked for zero when the system is shut down and a warning is printed on the operator's console if it is nonzero. This is done to indicate that perhaps a directory lock still remains locked, and so the salvager should be run. .ifi l1h "Pds" .unl -5 Another important data base is the PDS. This per-process segment is used as a ring 0 stack and has a number of per-process items in the header that are useful in dump analysis. All of the data items are defined by segdefs and are referenced by name in this discussion. Of course, various sets of machine conditions are stored on the PDS. These have already been discussed. For most faults handled by the fim, the history registers are stored in pds$history_reg_data and the associative memories are stored in pds$am_data. This information should be examined if a fault occurs and it looks like a hardware error may be involved. (In some releases, these may be on the PRDS, however.) .spb 2 The PDS holds the process ID in pds$processid. Given this, the APTE can be located as described previously (pds$apt_ptr also locates the APTE directly). The PDS also holds the process group ID (e.g., Jones.Project_id.a) in the variable pds$process_group_id so that the name of the user may be associated with the PDS being examined. Another useful variable is pds$lock_id. This is the unique 36-bit value used in locking outer ring locks. It is also kept in the APTE for the process so that if an outer ring lock is locked, a call can be made to ring 0 to look at all the APTEs to discover if the process associated with the lock ID still exists. If it does not exist, then that outer ring lock may be zeroed.
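.spb 2
Because the layout of a process ID was given with the APTE earlier (the offset of the APTE in the upper 18 bits, the answering service login count in the lower 18), locating the APTE from pds$processid can be sketched in PL/I as follows. The fragment is illustrative only: it assumes that the process ID has already been copied into pid, that the segdef tc_data$ names the base of tc_data, and that aptep would then be used with the declarations of apte.incl.pl1.
.spb
.fif
/* Sketch only:  split a process ID and point at the corresponding APTE. */
dcl tc_data$ fixed bin (35) ext static; /* assumed segdef at the base of tc_data */
dcl pid bit (36) aligned;               /* a copy of pds$processid or apte.processid */
dcl aptep ptr;
dcl (apte_offset, login_count) fixed bin (18);

apte_offset = fixed (substr (pid, 1, 18), 18);  /* upper half:  offset of the APTE in the APT */
login_count = fixed (substr (pid, 19, 18), 18); /* lower half:  answering service counter, mod 262144 */
aptep = ptr (addr (tc_data$), apte_offset);     /* pointer to the APTE itself */
.fin
.spb 2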
.spb 2 .unl -5 When investigating locking problems, pds$lock_array should be examined. This is a 20-entry array that contains information about each ring 0 wait-type lock currently locked by that process. Each eight-word lock array entry holds a pointer to the lock, the 36-bit wait event associated with the lock (e.g., unique ID for a directory), the fixed binary (35) type of data base being locked (directory = 1), a pointer to the caller of lock, a fixed bin (2) number that if 0 means the locked data base (a directory) is being read and if 1 means it is being written, and a fixed bin (2) modify switch that if 1 means the locked data base (a directory) was modified while locked. Figure 5-12 shows the format of a lock array entry. .spb 2 .unl -5 The main use of the lock array besides debugging is to provide a record of any directories that were locked and modified by a process. If the process attempts to crawl out of ring 0, the program verify_lock can determine whether or not to call the online salvager. The online salvager uses the lock array to discover what directories to salvage. If no directories in the lock array have the modify switch on, then verify_lock does not call the online salvager. .spf 25 .ifi fig "Format of Lock Array Entry" .spb 2 .unl -5 Also of general interest in performing dump analysis is the trace table (pds$trace) that is maintained by the system trace facility. The system trace facility was originally put in the system both to "trace" page faults in a process and to provide data to the post purging mechanisms of the system. Since the original implementation, the trace has been extended to include many events, other than page faults, that occur in a process. A complete list of traced events appears below with the corresponding trace type code. .spb 2 .unl -5 The actual trace data is stored in a wrap-around buffer in the PDS at the segdefed location pds$trace. The data is stored into the trace buffer by the primitive page$enter_data (part of the machine language kernel of page control for efficiency reasons) for nonpage fault entries and by the subroutine page$enter for page fault entries. .spb 2 .unl -5 The buffer is managed by several indices enabling the system (and the user) to determine the beginning and end of a quantum. A bound of the trace buffer is kept in the header of the buffer enabling correct wrap-around handling. The actual buffer must be wired down (it is referenced at page fault time) and hence must be limited in size. In fact, the size of the buffer is a function of other variables in the PDS and the buffer is assembled so that it fills out the remainder of the first page of the PDS not needed by other wired variables in the PDS. .ifi l2h "Data Structure" .unl -5 The following declarations describe the structure of the trace buffer.
.spb 1 .unl -5 declare 1 pds$trace aligned ext like trace_data; .spb 1 .fif .unl -5
dcl 1 trace_data aligned,
.inl 9
2 next_usable fixed bin (16) unal,
2 pad1 bit (19) unal,
2 first_unusable fixed bin (16) unal,
2 pad2 bit (19) unal,
2 time fixed bin (71),
2 first_used_in_quantum fixed bin (16) unal,
2 pad3 bit (19) unal,
2 pad4 (3) fixed bin (35),
2 entry (0:1 refer (trace_data.first_unusable - 1)),
.unl -2
3 code_word,
.inl 13
4 astep bit (18) unal,
4 ring_number bit (3) unal,
4 segment_number bit (15) unal,
.unl 2
3 trace_word,
4 type bit (6) unal,
4 page_number bit (12) unal,
4 time fixed bin (17) unal;
.fin .spb 2 .inl The following descriptions refer to the items declared above: .spb 1 .inl 30 .unl 25 next_usable is an index into the trace array (trace_data.entry) to the next entry to be used by page$enter or page$enter_data. This value may be zero but can never equal first_unusable. .spb 1 .unl 25 first_unusable is the number of entries in the trace array. Since the trace array is indexed from 0, first_unusable cannot be used as a valid index into the array. This item should be used when implementing a wrap-around technique for looking at the data. .spb 1 .unl 25 time is the system clock reading at the time the most recent trace data was entered. It is used to determine the (real) time difference between trace entries. .spb 1 .unl 25 first_used_in_quantum is an index to the first entry in the list for the quantum in which it occurs. The variable only gets set if post purging is being done and usually points to the first entry after a rescheduling entry. .spb 1 .unl 25 astep is used by the post purging (and process swapping) software to help locate the PTW associated with the page fault that caused this trace entry. This is only set if type = "000000"b, i.e., for page fault entries. .spb 1 .unl 25 ring_number is the validation level at the time of the reference that caused the traced page fault, i.e., scu.trr. This is only valid for page fault entries. .spb 1 .unl 25 segment_number is the segment number of the segment referenced at the time of the fault that caused the trace entry. .spb 1 .unl 25 type is the coded type of the trace entry. The following types are currently defined: .spb 1 .fif
                      Binary      Decimal
T__y_p_e              V__a_l_u_e  V__a_l_u_e
.spb 1
page fault            000000      0
seg fault start       000010      2
seg fault end         000011      3
link fault start      000100      4
link fault end        000101      5
bound fault start     000110      6
bound fault end       000111      7
*signaller            001000      8
restart fault         001001      9
reschedule            001010      10
*user marker          001011      11
.spb 1 .fin .unl 25 page_number is the page number of the page faulted upon. This value is only filled in for page fault entries. .spb 1 .unl 25 time is the real (wall-clock) time that elapsed between this and the previous entry. The time is in units of 64 microseconds. If the time difference is too large (about 2**24 microseconds or 15 seconds) a value of 0 is placed in the entry. .spb 2 .inl 0 .unl -5 For the starred trace types above, the code word should be declared as follows: .spb 1 .unl -5 3 code_word char (4) aligned, .spb 2 .unl -5 The trace data can be looked at (and modified in a limited way) from the user ring. To read the data the following code should be used: .spb 1 .unl -5 declare hcs_$get_page_trace entry (ptr); .spb 1 .unl -5 call hcs_$get_page_trace (addr (trace_data)); .spb 1 .unl -5 trace_data is declared as above. (Output) .spb 2 .unl -5 The primitive returns the entire trace structure and the caller must provide enough storage.
Although the caller cannot know initially how much storage is required, he can depend on it being less than 1024 words. (On subsequent calls the size of the structure is known.) .spb 2 .unl -5 An interface exists to allow the user to place an entry in the trace structure as a marker to delineate events. This primitive is used as follows: .spb 1 .unl -5 declare hcs_$trace_marker entry (char (4) aligned); .spb 1 .unl -5 call hcs_$trace_marker (message); .spb 1 where message is a user-specified character string that is placed in the trace structure. .ifi l2h "Trace Entry Types" .unl -5 The following is a list of defined entry types and a description of the events they represent. .spb 1 .inl 30 .unl 25 page fault means that a page fault occurred on the indicated page of the indicated segment. This entry is filled in for all page faults that occur in the process. .spb 1 .unl 25 seg fault start indicates the start of handling a segment fault. The code word of the trace entry is filled in with the segment number of the segment faulted on. .spb 1 .unl 25 seg fault end indicates the end of handling a segment fault. The code word is filled in with the segment number of the segment faulted on. .spb 1 .unl 25 link fault start indicates the start of handling a linkage fault or the start of processing an hcs_$make_ptr call. For linkage faults the code word contains the segment number of the procedure causing the fault. For make_ptr entries the code word is set to 0. .spb 1 .unl 25 link fault end indicates the end of handling a linkage fault or the end of processing an hcs_$make_ptr call. For linkage faults the code word is filled in with a packed pointer value equal to the snapped link. For make_ptr calls the code word is set to 0. .spb 1 .unl 25 bound fault start indicates the start of processing of a bounds fault. The code word is filled in with the segment number of the segment faulted on. .spb 1 .unl 25 bounds fault end indicates the end of processing for a bounds fault. The code word is filled in with the segment number of the segment faulted on. .spb 1 .unl 25 signaller indicates the occurrence of an event signalled by the supervisor. The first four characters of the condition name of the signalled event are placed in the code word. .spb 1 .unl 25 restart fault indicates an attempt to restart or continue a signalled event. .spb 1 .unl 25 reschedule indicates that the process was rescheduled. It indicates the time the process's quantum expired. .spb 1 .unl 25 user marker is an entry generated by the user with the hcs_$trace_marker primitive. The code word is specified (and interpreted) by the user but is generally a character string four characters long. .inl .brp .fin  s6.an53.compin 10/22/84 1058.2rew 04/15/81 1152.7 239823 .ifi init_plm "AN53-01" .srv section "6" .ifi l0h "Types Of Crashes" .unl -5 This section describes some heuristics one might use when faced with various types of crashes. It is impossible to provide a complete list since much of dump reading involves an intuitive process that is not easily defined. .ifi l1h "Loops" The type of crash termed a loop generally covers two cases. The first is an "idle loop" where all processors are just idling even though there is work to be done. One can recognize when a processor is running an idle process because an idle process displays a pattern that alternates in the A and Q registers. The pattern is an octal 777777000000 in the A register and an octal 000000777777 in the Q. In later releases, a "rotating chain" pattern may be observed.
This pattern is flipped periodically (whenever an interrupt occurs) so that if the processor display is set up to show the AQ, the idle loop situation is easily verified. The other type of loop is a loop within ring 0. This, of course, ties up the processor so that no useful work can be done. In both cases, an execute fault is the usual way to crash the system so that a dump can be taken. .spb 2 .unl -5 In the case of a ring 0 loop, little can be said here to allow one to discover the cause of the loop. What must be done is to see what program was running at the time of the execute fault, see what the value of the stack pointer was (pointer register 6 in the execute fault machine conditions), and using this information examine the stack history and see what the program was doing when it was stopped. .spb 2 .unl -5 In the case of an idle loop, the execute fault data is probably of little use. What one should look at first is the APT to discover why no process is running. Of interest are the wait events of those processes in wait state. If the wait events are unique IDs (uid) of directories, one must find the directory associated with the unique ID. There are three ways to find this out. The first is to scan the lock array in the PDS of one of the processes waiting on that event. If the uid is found, then that lock array entry contains a pointer to the lock and, given the pointer, one can of course find the lock to get the process ID of the process that did the locking. The second way is to scan lock_seg looking for an entry with a wait event equal to the uid. If such an entry is found, the lock pointer in the entry can be used to examine the lock. The final method is to search the KST. There is a hashing algorithm to produce the address of the KSTE for a segment given its uid. However, the algorithm is difficult to do by hand and so the best alternative is to scan the KST looking for a KSTE containing the uid (last word of the four-word KSTE). There is sufficient information in the KSTE to find the lock and to extract the process ID of the process that locked that lock. Once the process ID has been learned, the APTE for that process as well as the PDS can be examined. Hopefully, the PDS contains enough history in its stack to allow one to determine the reason why that process did not unlock the lock. In a crash of this sort, a normal FDUMP that dumps only the running process is not sufficient since the running processes are idle processes. Hence, one should use the SHORT option to dump the descriptor segment, PDS, and KST of _a_l_l processes. .spb 2 P__a_g_e C__o_n_t_r_o_l C__r_a_s_h_e_s .spb 2 In general, when there is a page control problem a syserr message is printed, although there are a few cases where page control loops (TRA *) when an error condition is encountered. This causes a lockup fault that results in a crash with no message. A common syserr message is "Fatal error in page_fault at location n" where n is the octal location within bound_page_control where some error was noticed. A listing provides further information as to the cause of the crash.
.spb 2 .unl -5 There are several conventions one should be aware of when analyzing page control problems. The first is the coding conventions used in that portion of page control that is written in ALM. Subroutines are not called using the normal call/push/return sequences due to the overhead involved. Instead, all "calls" are done via a TSX7 instruction and all subroutines within the ALM portion of page control share the same stack frame. In addition, page$done calls pxss to perform a NOTIFY when a page has been read in and so pxss also shares this stack frame. The stack frame is defined by pxss_page_stack.incl.alm. There is a small save stack for use by page control and one for use by pxss. These save stacks are used to push and pop values of x7. The stack variable "stackp" is a tally word that points to the next place to store a value of x7 on the page control x7 save stack, save_stack. Hence, when reading a dump, the value in stackp is the upper bound on what x7 values are valid in save_stack. The value in the word of save_stack just before the word pointed to by stackp is the address of the last subroutine that was "called" via a TSX7. The same conventions are true of the pxss x7 save stack, pxss_save, and its stack pointer, pxss_stackp. .spb 2 .unl -5 Also of interest in the stack frame are save areas for four sets of index registers. The index registers saved in notify_regs are stored when the done_subroutine of page_fault calls pxss to do a NOTIFY when a page read is complete. (Index registers are also saved in notify_regs when meter_disk is called, even if no disk metering is going on.) The registers stored in bulk_reg are stored there when bulk_store_control calls page$done after a bulk store read (or write in some cases) has completed. The registers stored in page_reg_bs are stored when bulk_store_control is entered. The final items of general interest in the stack frame are the variables did (the device ID), devadd (the device address), and ptp_astep, which holds the address of the ASTE for the page being read (lower half). This cell is also used by the RWS code. .spb 2 .unl -5 Certain conventions have been established for the use of index registers by page control. The following register assignments are used when running in all parts of page control written in ALM except for bulk_store_control: .spb 2 .fif
x0        temporary (may be used at any time)
x1        pointer to PDME and temporary
x2, pr2   pointer to PTW
x3        pointer to ASTE
x4        pointer to CME, also used in PDME hashing code
x5        temporary
x6        temporary
x7        used for subroutine calls
pr3       pointer to the base of the SST segment
.fin .spb 2 .unl -5 The register conventions for bulk_store_control are not of general interest and are documented in the listing. .ifi l1h "Attempt To Terminate Initializer Process" .unl -5 Whenever a process takes a fatal error (e.g., runs off its stack), it is terminated by the signaller. The signaller accomplishes this by referencing through a pointer with a segment number of -2. A word offset of -4 means that the signaller or restart_fault faulted while processing a fault. A word offset of -8 means the user's stack is in an inconsistent state. Other values can be found in the listing of terminate_proc, which is the program called by the fim when an attempt is made to reference through a pointer with a segment number of -2. If terminate_proc is called to terminate the initializer process, it crashes the system. If this happens, examine the fault data on the PDS of the initializer.
The computed address in the fault data is the word offset of the pointer with the -2 segment number. This is an indication of why the initializer was terminated. The other way to discover this is to find the stack frame of terminate_proc. It takes one argument that is either a standard error_table_ error code or the negative word offset from the fault caused by the attempt to reference through the pointer described above. Quite often, the initializer is terminated due to an overflow of its stack caused by recursive faults. In this case, the _f_i_r_s_t set of fault conditions on the ring 4 initializer stack is for the original fault that caused the problems. These fault conditions can be located by tracing the stack in a forward direction (starting at the stack location pointed to by the stack begin pointer in the stack header), looking for a frame with a return pointer for the program return_to_ring_0_. .spb 2 .unl -5 It is unfortunate that the data from the fault that resulted from the use of the pointer to segment -2 overwrites pds$fim_data. Thus, the machine conditions for the original fault cannot be found there. With luck, a heuristic search of the PDS for old stack frames owned by fim or return_to_ring_0_, or data that appears to be fault data, may be of use. The 30000 stack words skipped by syserr when crashing help facilitate this search. It may also prove useful to inspect pds$signal_data. .spb 2 .unl -5 Once the original cause for the termination of the initializer process has been determined, there remains the task of discovering why the fault occurred in the first place. Since it would be impossible to list all the possible causes, the important data bases have been described so that they may be checked for consistency in a dump. It is assumed that the reader has previously read the material in the S__y_s_t_e_m A__d_m_i_n_i_s_t_r_a_t_i_o_n PLM, Order No. AN72. .ifi l1h "Teletype DIM Problems" .unl -5 Teletype DIM (ttydim) problems usually fall in one of two classes. Either the FNP crashes or the ttydim within the Multics Processor notices a problem and crashes. In the former case, a syserr message of the form "Emergency Interrupt from 355 A" or "dn355: mailbox timeout, please dump the 355." indicates that the FNP has crashed. In the case of the ttydim, the most common errors are of the form "tty_free n" where n is an error number. It is not the intent of this material to describe the internal structure of the FNP software or the ttydim software since it is described in the S__u_p_e_r_v_i_s_o_r I__n_p_u_t/O__u_t_p_u_t PLM, Order No. AN65. What is given here are a few hints to offer some direction to the crash analyzer. .spb 2 .unl -5 When an FNP fdump is printed (using od_355), the dump is broken up into three sections. The first section gives the cause for the crash and the registers at the time of the crash. Registers are listed as IC (instruction counter), IR (indicator register), A (A register), Q (Q register), X1 (index register 1), X2 (index register 2), X3 (index register 3), ER (interrupt enable register) and ET (elapsed timer register). Possible crash causes are: .spb 2 .fif
power off
power on
mem parity     memory parity error
ill opcode     illegal (invalid) operation code
overflow
ill mem op     illegal (invalid) memory operation
divide chk
ill prg in     illegal (invalid) program interrupt
unexp int      unexpected interrupt
iom ch flt     iom channel fault
console
.fin .spb 2 .unl -5 Many of these faults are self-explanatory.
The fault "ill mem op" refers to the fact that one of the following conditions has occured: .spb 2 .inl +10 .unl 5 1. The memory controller on the FNP times out (hardware error). .spb 2 .unl 5 2. There was an invalid command to the memory controller (hardware error). .spb 2 .unl 5 3. Out of bounds address. .spb 2 .unl 5 4. Attempt to alter storage in a projected region (protection not used by Multics currently). .spb 2 .unl 5 5. A character address of seven. .spb 2 .inl -10 The fault "ill prg in" refers to a hardware error in which the processor attempts to answer an interrupt when there was no interrupt present or a valid interrupt occured but the interrupt sublevel word for that interrupt was all zeros.. The error "unexpint" means that an interrupt from an unconfigured device occured. The message "iom ch flt" indicates that an iom channel fault occurred. The fault word can be found in locations 420-437 of the FNP. The location selected is 420 (8) + iom channel number (0-17). Figure 6-1 below depicts the format of the fault status word. Finally, the message "console" indicates that an ABORT command was typed on the control console for the FNP (see the Supervisor I/0 PLM). .spb 2 .unl -5 The next section of the dump is the formatted contents of the internal trace table. This table contains the last fifty or so events printed in ascending chronological order. Each trace entry consists of a type and the value of the elapsed timer at the time the trace entry was made. The elapsed timer increments every millisecond. Listed below in Table 6-1 are the items printed for each trace type. .spb 2 .unl -5 The final section consists of the FNP dump itself. The dump is formatted eight words per line. Preceding the octal dump of the eight words is the absolute location being dumped, the module (if any) being dumped, and the relative location being dumped within that module. The Supervisor I/0 PLM describes the internal logic of the software within the FNP. .spb 2 .unl -5 As far as ttydim problems within the Multics Processor go, the two problems most commonly found are syserr crashes of the form "tty_free error n" or "tty inter error n". Tables 6-3 and 6-4 list the meanings of these errors. A brief statement should be made at this point about the layout of the buffer pool. The unpaged segment tty_buf is used to hold several data bases necessary to the ttydim. Figure 6-4 depicts the format of tty-buf. The include file tty.incl.pl1 describes, among other things, the header area of tty_buf. The bleft variable contains the number of free buffers remaining in the buffer pool and the free variable contains the address of the first free buffer. All free buffers are threaded together in a forward direction only by an address in the first 18 bits of each buffer. his address is relative to the base of tty_buf. All free buffers are marked by a 36-bit pattern of alternating binary 1's and 0's in the last word of the 16-word buffer. When a buffer is allocated, this 36-bit pattern is changed to alternating binary 0's an d 1's. The program tty_free makes certain checks whenever a buffer is allocated or freed. Table 6-3 lists the possible errors that can be found. .spb 2 .unl -5 The other common form of t tydim crashes is a message of the form "tty inter error n". Table 6-4 lists all the error codes from the program tty_inter. 
.spb 2 .fif
D__a_t_a C__o_m_m_a_n_d_s          I__n_t_e_r_r_u_p_t C__o_m_m_a_n_d_s
.spb 2
000  none                000  None
001  P                   001  Unconditional
010  Store               010  Conditional or TRO
011  Add                 011  Conditional or PTRO
100  Subtract            100  Conditional or Data Neg.
101  Add                 101  Conditional or Zero
110  Or                  110  Conditional or Overflow
111  Fault               111  Fault
.spb 2
F__a_u_l_t
.spb 2
0000  None
0100  All other Memory Illegal actions
1000  Memory Parity
1100  Illegal command to IOM
1101  IOM bus channel parity error
1110  IOM adder parity error
1111  IOM priority break
.fin .spb 2 .brp .ifi tab "Trace Types" .spb 2 .inl 30 .unl 25 T__r_a_c_e T__y_p_e D__a_t_a .spb 1 .unl 25 1 (I/O Interrupt) Instruction counter at time of interrupt and a coded word indicating what device interrupted (see Figure 6-2). .spb .unl 25 2 (Idle) Interrupt enable mask at time processor went idle. .spb .unl 25 3 (HSLA Activity) See Table 6-2 .spb .unl 25 4 (DIA Activity) Subtype and varying data. For subtype 1, the data is the transaction control word (see Figure 6-3). For subtype 2, the data is either a Multics Processor address and line number, or an error code and transaction control word. This is seen only when the FNP is crashing due to a bad opcode in a mailbox.
.spb .unl 25 7 (LSLA Activity) The data consists of a subtype (1 only), a Terminal Information Block (TIB) address, the value of the flag word in the TIB, the value of the line status word in the TIB, the value of the LSLA status character which occasioned the trace entry, and the value of the flag word in the RMX table entry for the terminal. .spb .unl 25 10 (System Crash) The data consists of a coded word, the rightmost 15 bits of which are the instruction counter at the time of crash, and the value of the indicator register at the time of the crash. .inl 0 .brp .ifi tab "HSLA Trace Subtypes" .brp .ifi tab "Format of tty_buf" .brp where: .spb 2 .inl +15 .unl 10 a is 4 bits representing the iom channel number. .unl 10 .spb 2 b is a 7-bit device number, coded for specific devices, i.e., HSLAs and LSLAs. .unl 10 .spb 2 c is a 6-bit module number, indicating which module should handle this interrupt. .inl -15 .ifi tab "Format of Coded Interrupt word" .spb 3 D_I_A_ A__c_t_i_o_n F__l_a_g_s .spb 2 B__i_t N__u_m_b_e_r M__e_a_n_i_n_g .spb 0 Performed list service. 1 Read data. 2 Wrote data. 3 Sent special. 4 Read mailbox. 5 Processed a get status command. 6 Processed a connect. .spb 2 F__o_r_m_a_t _o_f L__i_n_e N__u_m_b_e_r .spb 1 .inl +20 .unl 15 B__i_t_s M__e_a_n_i_n_g .spf .unl 15 8 0=LSLA line, 1=HSLA line .unl 15 9-11 LSLA number (0-5) or HSLA number (0-2) .unl 15 12-17 HSLA subchannel (0-31) or LSLA slot (0-52) .inl -20 .spb 2 .ifi tab "Format of Transaction Control Word for DIA" .spb 2 .inl 25 .unl 20 E__r_r_o_r N__u_m_b_e_r (8) C__a_u_s_e .spf 1 .unl 20 1. Out of buffers. .unl 20 2. Free buffer does not have the free pattern in it. .unl 20 3. Thread in a buffer is not 0 mod 16 (buffer size). .unl 20 4. The address of the buffer being freed is not 0 mod 16. .unl 20 5. Buffer being freed has the free pattern in it (already free). .unl 20 6. The thread in one buffer of a chain being freed is not 0 mod 16. .unl 20 7. A buffer in a chain being freed has the free pattern in it (already free). .unl 20 10. The thread in a free buffer is not 0 mod 16. .unl 20 11. The address of a buffer being freed is not in the buffer pool area in tty_buf. .unl 20 12. The address of the first buffer in a chain being freed is not in the buffer pool area in tty_buf. .unl 20 13. The address in the transfer dcw of one buffer in a chain being freed is 0. .inl 0 .brp .ifi tab "Errors from tty_inter" .spb 2 .inl 30 .unl 30 E__r_r_o_r N__u_m_b_e_r C__a_u_s_e .spf 1 .unl 25 1. Global lock (tty_buf.slock) not locked prior to causing a time out. .unl 25 2. Global lock not locked prior to clearing it. .unl 25 3. Per tty lock (fctl.lock) not locked prior to clearing it. .unl 25 4. FNP still processing read dcws when request made to free the read buffer chain. .unl 25 5. FNP still processing write dcws when request made to free the write buffer chain. .unl 25 6. Transfer dcw in read buffer does not point correctly to another read buffer. .unl 25 9. There is an extra read block in the read chain. .unl 25 10. Attempt to put more than eight dcws in a dcw block. .unl 25 11. Global lock not locked prior to clearing it in order to process a status word. .unl 25 12. Apparent bad fixed control block (fctl). .inl 0 .spb 2 .ifi l1h "Hardware Problems" .unl -5 This section cannot hope to describe all possible hardware problems but is instead offered as a first step to be taken by the person performing system problem analysis prior to consulting with engineering support personnel.
.spb 2 .ifi l1h "Bulk Store Problems" Bulk Store Problems usually can be designated as hardware status errors, data errors, or lack of response to a connect. In the last case, the message "page: bulk store timeout" is printed and the operation is retired. If it fails three times, a fatal read or write error is reported. In the case of a hardware status error, a message is printed of the form: .spb 2 .brf "bulk store err, status = x x x" "csb status = y,addr = n" .spb 2 .unl -5 The first line of the message prints the bulk store status, next address of data transfer, tally residue, and hardware indicators (the first three words of the status block). The second line of the message prints the status bits in the Current Status Block (CSB), followed by the bulk store address that was acessed. In release 2.2 and later systems, only one word of DCM status is given. .spb 2 .unl -5 Another type of bulk store is a checksum error. If the proper option is specified on a DEBG configuration card, the bulk store software performs checksum calcuation and checking for every bulk st ore operation. If an error is detected, a message of the form "bulk store cksm err, addr = x, core = y" is printed where x is the bulk store address and y is the 24-bit absolute memory address. .spb 2 .unl -5 The final type of bulk store error to be described is the nonfatal error detection and correction (EDAC) error. This is a 1-bit error detected and corrected by the bulk store. All such errors are counted by the bulk store software and are reported by the metering program file_system_meters that is described in the Tools PLM. Further informationis kept by the bulk store software on which Core Storage Module (CSM) is getting the EDAC dumping the words (one per CSM) starting at the symbol mbx.edac_buckets as defined in the include file bulk_store_mailbox.incl.alm. There is one word (EDAC corrected error counter) for each 256k CSM of bulk store. .spb 2 .ifi l1h "IOM Problems" .unl -5 No attempt is made to try to list possible IOM problems. Instead, all that is presented is the format of the various control words used by the IOM and the format of an IOM status word and IOM system fault word. M ore information may be found in the Supervisor I/O PLM. .spb 2 .ifi l1h "Disk Problems" .unl -5 Disk errors are reported by a syserr message in the following: .spb 2 "dnnn error: ch=c, cmd=cm, stat=s" area=a, sect=sc, cyl=cy, hd=h, addrr=ad" "dnnn_control: detailed status =xxxxxxxxx" .spb 2 .unl -5 If the major status is "device attention," the message "dnn_control:device attention -- please check disk unit" is printed following the area, sector, etc., information. In the rest of the message, c is IOM channel number, cm is IOM command, s is IOM status, a is area or logical drive number, sc is sector, cy is cylinder, h is head, and ad is the Multics address. In each message, dnn is actually d190, d191, etc. depending on which type of dick got the error. The last line of the message nine bits may be found in the documentation for each type of disk subsystem. Figure 5-1(DSS181 Extended Status), Figure 5-2 (DSS190a Extended Status), and figure 5-3 (DSS190B Extended Status) in the Debuggers Handbook PLM describe the meanings of the various major status and substatus bits to be found in a disk status word. More information about the disk software data bases, etc., may be found in the Supervisor I/O PLM. .spb 2 .ifi l1h "Memory Parity Errors" .unl -5 Multics tries to recover from parity errors in almost all cases. 
If a parity error occurs while "running" on the PRDS (using the PRDS as a stack), however, Multics crashes. Parity errors are reported on the operator's console after first reading the locations of the instruction and operand to see which location (if any) had the parity error. The results are printed in a message as follows: .spb 2 .brf "parity fault in process-group-id." "xxxx" "xxxx" "abs tsr loc: n, contents; c" "abs psr loc: n, contents; c" .spb 2 .unl -5 where process-group-id is the process group ID of the process (e.g., Fred.Bassett.a) and the eight x's are the eight words of SCU data. .brp .fin .inl 0  s7.an53.compin 10/22/84 1058.2rew 04/15/81 1158.1 51183 .ifi init_plm "AN53-01" .srv section "7" .ifi l0h "System Performance Degradation" This section is intended to give some direction to the reader who is trying to discover why Multics is running, but running with very poor response. As is the case with crashes, the possible causes for poor system performance are myriad but it is often possible to discover _w_h_a_t is wrong (if not _w_h_y) using one or more of the Multics metering tools. All of these tools are described either within this PLM, the S__y_s_t_e_m M__e_t_e_r_i_n_g PLM, Order No. AN52, or within the Tools PLM. A generally useful tool is total_time_meters (ttm). This metering command indicates how the processor(s) is being allocated in terms of what percentage of total processor time is being used for interrupt processing, for page fault processing, etc. Hence, if some device is generating excessive interrupts, this time will show up in the output produced by ttm. If ttm shows that an excessive paging percentage is probably what is causing the performance degradation, then the metering command page_multilevel_meters (pmlm) should be used to check to see what percentage of page faults is being satisfied by a page on the paging device. If this number is abnormally low, the paging device map is probably inconsistent. If the speed of the paging device or disks is suspect, the device_meters (dvm) command should be invoked. This command indicates if excessive device errors are occurring, if the paging device or disks are being overloaded, or if the paging device or disks are running abnormally slowly. .spb 2 .unl -5 If ttm shows that most of the system is being tied up in the processing of interrupts, the interrupt_meters (intm) command should be used to see what IOM channel appears to be tying up the system. If ttm shows that the system is spending much of its time in an idle condition, this is probably an indication of a poorly tuned system. The print_tuning_parameters (ptp) command prints out all the generally settable tuning parameters. If max_eligible is too low, this can cause excessive idling (conversely, when it is too high, it can cause excessive thrashing).
The command print_apt_entry (pae) (see Section 7) may be used to print the APT entry of that user for further examination. .spb 2 If much of the system time seems to be spent in overhead activities, the file_system_meters (fsm) command indicates if that over head is due to the thrashing caused by too many pages being wired or incorrest allocation of AST entries (as indicated by the AST grace time). If this proves fruitless, run the meter_gate (mg) command to see if some ring 0 gate entry appears to be using a vast portion of the processor time. If there are still no indicators, perhaps a processor itself is running incorrectly. The set processor required (sprq) command can be used to force execution on only particular processor. Then, execution of the instr_speed command shows if that processor is running below normal performance levels. It should be noted parenthetically that the EIS tester program, et, and the test_cpu program can be used to discover if a processor is in fact working _c_o_r_r_e_c_t_l_y if not slowly. (See H__a_r_d_w_a_r_e D__i_a_g_n_o_s_t_i_c A__i_d_s, Order No. AR97, for the use of these programs.) .spb 2 .unl -5 There are several other tools available to the investigator of system problems. If the paging device map is suspected of being in an inconsistent state, it can be copied out of ring 0 using the copy_out (cpo) command (see Section VIII) and then the dump_pdmap command can be used to confirm or deny these suspicions. Another command, check_sst, can be used to perform consistency checks on the core map, and the various AST pools. he command ring_zero_dump (rzd) (see Section VIII) can be used to dump various data bases in octal format for quick examination. If a patch to a ring 0 data base will restore the system to proper operation, the patch_ring_zero (prz) command (see Section VIII) can be used as lo ng as the user process has access to the hphcs_gate. .spb 2 .unl -5 One last note should be made here about another type of system problem. When Multics crashes and the Salvager is run, some key system directories may be partially destroyed so that it is impossible to bring Multics up again. If the system can be brought up to command level in the initializer process, the command comp_dir_info (described in the Tools PLM) can be used to see what is missing from certain critical directories. This, of course, presumes that the command save_dir_info (described in the Tools PLM) is run regularly on these critical directories. If a directory has been changes, then the command rebuild_dir (described in the Tools PLMf) may be used to reconstruct the directory, preventing a large amount of system down time for a restore or a reload. .brp .fin .inl 0  s9.an53.compin 10/22/84 1058.2rew 03/16/81 1654.2 2691 .ifi init_plm "AN53-01" .srv section 9 .ifi l0h "Subroutine Descriptions" This section contains the subroutine descriptions needed to analyze dumps. Some of these subroutines s have been referenced in previous sections. .spb 2 The subroutine descriptions are arranged alphabetically. .brp  convert_v1_fdump.an53.compin 10/22/84 1058.2rew 04/14/81 1624.0 12015 .ifi init_plm "AN53-01" .cba .ifi l1h "N__a_m_e: convert_v1_fdump" .unl -5 This program converts a version one FDUMP image into version two. All Multics dump analysis tools were changed to understand only version two FDUMPs in MR9.0, requiring that any existing version one FDUMP images, created prior to MR9.0, be converted to version two. 
.ifi l2h "Usage" .unl -5 convert_v1_fdump comp_zero_path {equal_path} .spb 2 where: .spb .inl 12 .unl 1. comp_zero_path .brf is the pathname of component zero of the FDUMP image, such as >dumps>030881.1341.0.243. .spb .unl 2. equal_path .brf is the pathname in which will be placed the version two FDUMP image created by convert_v1_fdump. It must end in an equal name, since it is applied to the name of each segment in the original FDUMP image. If this argument is not supplied, the FDUMP image is converted in place. .inl .ifi l2h "Notes" .unl -5 Access required: read access to the original FDUMP image, and write access to the output segments. .ifi l2h "Examples" .unl -5 To convert ERF 243 to version two, in place: .spb .unl -10 convert_v1_fdump >dumps>030881.1341.0.243 .spb .unl -5 To convert ERF 258 to version two, renaming it to ERF 259 and retaining the old FDUMP image for checking: .spb .unl -10 convert_v1_fdump >dumps>040181.2318.0.258 >dumps>==.259 .cbf .brp .inl  appb.an53.compin 10/22/84 1058.3r w 02/24/82 1657.2 998127 .ifi init_plm "MR8.2 SRB" .srv section "B" .srv draft "" .srv draft_date "" .ifi l0h "Dump Analysis Documentation" The infomration in this appendix has been provided as an interim step until it can be incoporated in a Multics techinical manual. It is subject to change. .ifi l1h "Crash Procedures" This section covers the information necessary to understand how Multics crashes (i.e., returns to BOS), how dumps are taken, and how these dumps are processed. .ifi l1h "Returning to Bos" There are six ways in which Multics can crash. The first, and most common, way is for some type of fatal error to be discovered by Multics, and reported via the syserr (operator's console message) facility. When this happens, a message that describes the surface cause of the crash (e.g., LOCK NOT EQUAL PROCESSID) is typed on the operator's console and the system "returns to BOS". This means that all processors other than the "BOS processor" halt, and the BOS processor transfers control to the BOS toehold. To effect this reentry, syserr (which typed the message on the operator's console) calls privileged_mode_ut$bos_and_return. This program sets the flag scs$sys_trouble_pending, and issues a connect fault (via a cioc instruction) at the CPU on which it is running. .spb 2 The handler for connect faults is the module wired_fim. Upon seeing the flag scs$sys_trouble_pending set, wired_fim perceives a system crash in progress, and transfers to sys_trouble$sys_trouble. The ALM module sys_trouble is charged with the responsibility of organizing crashes and returns to and from BOS. All processors will be made to enter this code, and either stop or enter BOS as appropriate. The first processor entering sys_trouble$sys_trouble notices that it is the first processor to do so, (the flag scs$trouble_flags is used to establish first entry into this code) and "broadcasts" connect faults to all other processors. .spb 2 Processors entering sys_trouble$sys_trouble copy the machine conditions (see Section 2) of the connect fault from prds$fim_data (in the per-processor data segment prds) to prds$sys_trouble_data. The fault vector for the connect fault stored the SCU data there, and wired_fim stored the rest of the particular processor's machine state, before the interpretation of the connect fault (i.e., "system crash") was determined. This interpretation having now been made, the data is moved to prds$sys_trouble_data. 
This data area (see Section 2 for heuristics on finding it) is of _c_r_u_c_i_a_l importance to the Multics crash analyst; for every process running on a processor at the time of the crash, it tells _p_r_e_c_i_s_e_l_y where that process was. In this fashion, erroneous process behavior can be identified, and potential misinteraction between processes can be understood. .spb 2 Each processor entering sys_trouble$sys_trouble determines whether or not it is the "BOS processor", meaning the one which will actually enter and execute BOS. The identity of the BOS processor is defined by scs$bos_processor_tag. The "BOS processor" is initially defined at bootload time as the CPU on which BOS last executed; however, should this processor be deleted, reconfiguration will assign the responsibility of being BOS processor to some other CPU. One criterion for being the BOS CPU is having an SCU mask assigned to the potential BOS CPU: SCU-directed mask-setting instructions in BOS must actually work, and not fault. This requires having an actual SCU mask directed at the potential BOS CPU. .spb 2 The processor which determines it is the BOS CPU proceeds to enter BOS, as the others execute a DIS instruction in sys_trouble. The processor which enters BOS need not be the same as the processor which started the crash. DIS (Delay until Interrupt Signal) is an effective HALT instruction. If a GO (the BOS command which restarts Multics) is subsequently executed, connects will be re-broadcast to all CPU's, "interrupting" them out of their DIS state. Note how the copying of the machine conditions to prds$sys_trouble_data was _n_e_c_e_s_s_a_r_y to prevent this possible subsequent connect fault from overwriting the first set of machine conditions (the "sys trouble data"). .spb 2 The method used to enter BOS is as follows: the BOS CPU loops for a while to allow all pending I/O operations to quiesce. Since this loop is inhibited, the lockup fault vector is patched to ignore any lockup faults taken during this loop. Once the loop is completed, the two instructions, an SCU and a TRA instruction pair, as picked up from location 4 of the BOS toehold (absolute location 10004, accessed via the segment bos_toehold$), are patched into the DERAIL fault vector. The previous contents of this fault vector are saved prior to doing so. Finally, a DERAIL instruction is executed. Since the DERAIL fault vector was patched, the SCU is done and BOS is entered in absolute mode via the TRA instruction. .spb 2 The second way that Multics can crash and enter BOS is for an error to occur that cannot be reported by a syserr message on the operator's console. These errors are the arrival of an interrupt while the processor is using the PRDS as a stack, a page fault while the processor is using the PRDS as a stack, certain faults while the processor is using the PRDS as a stack or in an idle process, or premature faults during initialization. In these situations, it is not clear that enough of the Multics environment in Ring 0 of the erring process is operative to run the Operator's Console software, a bulky and complex mechanism requiring many things to operate properly. In these cases, the low-level code of the operating system (usually ALM code, or some PL/I code in initialization) places a "magic number" in the low bits of scs$sys_trouble_pending, and initiates the "sys trouble (interpretation of) connect" as above.
When sys_trouble sees this non-zero number, the BOS processor copies the correct canned message (see the listing of sys_trouble for an enumeration of them) into flagbox.message, and sets the bit flagbox.alert in the "BOS flagbox" segment. BOS will print out this message (all in upper case: this is a good way of identifying such messages) under its own control. When such a crash occurs, the contents of the flagbox (bos_flagbox$) hold the clue to what happened, and all the sys_trouble_data's in the PRDS's of all CPU's tell the whereabouts of the CPU's when the error occurred. The synchronization of crashing CPU's and return to BOS are effected as above. .spb 2 Another way that BOS can be entered is by an execute fault. An execute fault is caused by the depression of the EXECUTE FAULT button on any processor maintenance panel. The handler for this fault is wired_fim$xec_fault, and the fault is treated in exactly the same way as a system trouble connect (which is to say, the first entering CPU broadcasts connects, etc.). A "magic number", as above, stored in scs$sys_trouble_pending identifies this as the crash reason. The fault data stored by the execute fault is moved to prds$sys_trouble_data in the same way as for a sys_trouble connect. .spb 2 Another way that BOS can be entered is by the manual execution of an XED instruction directed at location 10000 octal (the BOS toehold). This operational tactic is known as "executing the switches". The two instructions in the toehold are executed by placing an inhibited XED 10000 in the processor DATA switches (octal pattern 010000717200). The processor on which the switches are executed indeed returns to BOS at once, oblivious to other activity in the system. On a running multi-CPU system, this is a very dangerous technique for just this reason: it is _c_r_u_c_i_a_l to place ALL CPU's in MEM STEP before executing the switches. Recommended operational technique for executing the switches to return to BOS is given in the following section. .spb 2 It should be pointed out that of the last two ways mentioned for entering BOS, the execute fault is the normal way used to crash the system. This would be done, for example, when it is noticed that Multics is in a loop. The execute fault method ensures that all processors are stopped via system trouble connects before returning to BOS. The manual execution of switches should only be used when running one processor, in a debugging or development situation, or when an execute fault has appeared not to work. If done carefully on a single-CPU system, it is possible to start Multics again with a BOS GO command after perhaps doing some patching or dumping of Multics from BOS. Manual execution of the switches is to be considered a last resort on a multi-processor system, and should be performed with the issues outlined above fully in mind. .spb 2 It is also possible to return to BOS in a restartable manner during initialization, under control of patterns in the processor DATA switches. This causes the sending of a sys_trouble connect as described above. In particular, bootstrap1 will place messages directly in the BOS flagbox, and other initialization programs will call syserr$panic to place messages there. The syserr mechanism will place messages there in those phases of initialization executed before the operator's console (and implicitly, those mechanisms upon which it relies) is operable. For more details see the S__y_s_t_e_m I__n_i_t_i_a_l_i_z_a_t_i_o_n PLM, Order No. AN70.
.spb 2 The final way that BOS can be entered is via an explicit call to hphcs_$call_bos, which may be invoked by the Initializer "bos" command, or by certain failures in administrative ring initialization. Any sufficiently privileged process can call this entry from the user ring. The effect is precisely the same as if the syserr mechanism (the first case covered above) had initiated a crash. .ifi l2h "Technique for Executing the Switches" The procedure for executing the switches is as follows: ALL PROCESSORS ARE PUT IN MEM STEP. A CPU is chosen to return to BOS. If at all possible, this should be the system's choice of BOS CPU (the original bootload CPU, if it is still configured and has not been deleted, is best), or some CPU known to have SCU masks assigned via EIMA switches on the SCU's. For the reasons outlined above under the discussion of choice of BOS (bootload) CPU, failure to use a processor with a mask will cause BOS to fault and fail. Under extreme emergencies, SCU switches can be reorganized to assign a mask at RTB (return to BOS) time, but the implicit danger grows. On a multi-CPU system, there is no restarting (GO) from a manual execution of switches. .spb 2 Now the processor selected to return to BOS, which should be in MEM STEP, is approached, and the correct configuration (010000717200) verified in the DATA switches. The EXECUTE SWITCHES/EXECUTE FAULT switch is verified to be in the EXECUTE SWITCHES position, and then the EXECUTE SWITCHES pushbutton is depressed _o_n_c_e. The STEP button is then depressed several times. Then the processor is taken out of MEM STEP mode and the STEP button is depressed once more to cause BOS execution to continue. The console should unlock and appear at BOS command level. If it does not, the procedure should be repeated. By the time this level of desperation is necessary, the chances of a successful ESD (emergency shutdown) are to be considered slim. .spb 2 Due to the design of the Level 68 processor, manual execution of the switches at an arbitrary stopping point does not always succeed: there is no way around this. Success of executing the switches is indicated by the pattern in the processor DATA switches appearing in the _t_o_p _r_o_w of the Control Unit display on the left-hand door of the processor maintenance panel. Failure is indicated by a mostly-zero pattern appearing. If you press "EXECUTE SWITCHES" and see this pattern, allowing continued execution will _g_u_a_r_a_n_t_e_e that an illegal procedure fault will follow and overwrite critically needed (for analysis) fault data. Should the former occur (by the time the latter happens, it is too late), _i_n_i_t_i_a_l_i_z_e the CPU (some state will be lost, but not as much as if execution were allowed to proceed), and re-execute the switches. .spb 2 One technique which has been found useful to increase the probability of successful switch execution is getting the Level 68 Processor Control Unit into an INS FCH (instruction fetch) state before executing the switches; after the CPU is placed in MEM STEP, if the INS FCH light on the control unit state display on the left-hand maintenance panel is not on, STEP a few times until it is. T__h_e_n execute the switches. .spb 2 One very dangerous trap, which will lose machine state and possibly the chance of a successful ESD, is laid by the INITIALIZE CONTROL/ INITIALIZE DATA & CONTROL switch next to the EXECUTE SWITCHES/EXECUTE FAULT switch on the CPU maintenance panel. This switch _M_U_S_T be in the INITIALIZE CONTROL position at all times.
Should it be in the other position at the time the switches are executed, dozens of critical machine registers will have their contents irretrievably replaced by zeros when the switches are executed. .ifi l1h "On Entering Bos" As perceived by BOS, there are three entries from Multics to BOS. They are performed by the absolute-mode execution by the BOS processor of the instruction pairs in locations 0, 2, and 4 of the BOS toehold (in Release 9.0, locations 10000, 10002, and 10004 absolute). Location 10000 is intended to be used by manual execution of the switches, as described above. Location 10004 is intended to be used by Multics when returning to BOS under program control, as described above. It is necessary to have two different entries because the "from Multics" entry requires incrementing the stored location counter (which will indicate the location in sys_trouble of the "DERAIL" instruction described in the previous section) to the instruction _a_f_t_e_r the derail instruction if Multics is to be successfully restarted. The instruction counter stored by executing the switches should be usable for restarting as is. .spb 2 The use of the instruction pair in 10002 will be explained below. For now, suffice it to say that it is there to facilitate debugging BOS. .spb 2 Upon entry, the toehold writes the first 60000 (octal) locations of main memory to a fixed location in the BOS partition. This is to allow BOS programs to be read in and executed without destroying Multics data. The BOS main control program (SETUP) and the BOS communications area are read into this low area of memory, and the BOS command loop is entered. The combination of the saved area on disk, the remainder of main memory, and the machine conditions saved by the toehold upon entry constitute (to BOS) an entity called the "Main Memory Image" (formerly, and still sometimes, called the "Core Image"). Restarting Multics involves copying the region saved on disk back to main memory (overwriting all of BOS except the toehold) and carefully stuffing the saved machine conditions into the BOS processor. .spb 2 A significant subset of BOS commands (e.g., PATCH, DUMP, FDUMP, GO) are concerned with inspecting, modifying, or dumping the core image. The ESD and BOOT commands are actually implemented as commands which modify the machine state in the core image to a fixed or partially fixed quantity, and restart the core image (via "chaining" to the "GO" command). The BOOT command, for example, zeroes the core image, reads the first record of the MST into location 10000, and sets this as the location counter (among a few other frills) before GOing. .spb 2 Some BOS commands (e.g., SAVE, RESTOR, LOADDM, FWLOAD) require more main memory (often for buffers) to operate than is provided in the initial main memory area saved upon entry to BOS. These commands attempt to obtain an area of main memory usually used for active Multics data for their buffers. If, however, there is an active core image, BOS utilities will inform the operator of this fact in strong language and query him as to whether he wishes to proceed. If he does, the Multics core image will be destroyed, and will no longer be restartable, patchable, dumpable, or capable of emergency shutdown. The BOS CORE command (CORE SAVE, CORE RESTOR) can be used to save and restore the entire core image to/from magnetic tape should it become essential to run one of these core-image destroying commands when Multics has not been shut down.
It is our recommendation that _s_e_v_e_r_a_l copies (CORE SAVE) be made if you must do this; the integrity of your file system and possibly many, many hours of down time hinge on this. .spb 2 If the switches are executed while BOS is in control, with the normal pattern (010000717200), BOS will _n_o_t save its region of main memory to the BOS partition when the toehold is re-entered. This is to say, it will _n_o_t consciously overwrite a Multics main memory image. Thus, execution of the switches may be used to break BOS out of loops, problems, and perceived loops. This would seem to make it impossible to obtain dumps of BOS; it is for this reason that the instruction pair in location 10002 exists. Execution of the switches directed at this instruction pair _w_i_l_l _a_l_w_a_y_s overwrite the main memory image disk buffer. (Otherwise, the effect of BOS entry via 10002 is identical to entry via 10000.) For this reason, one must be particularly careful not to leave this dangerous pattern lying around in the DATA switches after debugging BOS or otherwise using it. (This pattern can also be used as a very last resort technique if entry from Multics via 10000 fails, but full appreciation of this paragraph ought to be kept in mind before so doing.) .ifi l1h "Taking a Dump" Taking a Multics Dump means producing a partial snapshot of the Multics virtual memory, including the data bases of "interesting" processes, for later analysis, at the time of an entry to BOS. There are three places one can put such dumps, viz., paper, magnetic tape, and disk. .spb 2 The first two media are dealt with by the BOS DUMP command (see the MOH, Order #AM81, for a full usage description). This command is intended for developers and system programmers either debugging new hardcore systems, or in machine room "disaster" situations where FDUMP, the normal program for crash dumps (see below), is inoperable. One enters the DUMP command from BOS command level. Its "attention" is directed at the process in which the processor which entered BOS was executing (i.e., the DBR value at BOS entry time defines the default process). Requests to dump specific segments to the printer are interpreted relative to that process. The DBR request to DUMP (see the MOH for usage) directs DUMP's "attention" to other processes. If one has to dump many processes to analyze a problem, it is usually a good idea to dump the segment "tc_data" first, and inspect the Active Process Table (APT) to obtain the DBR values of "interesting" processes. The DBR value is currently at locations 44 and 45 (octal) in the APT entry, and the Initializer's APT entry is always the first one. In release 9.0, location 1260 starts the APT array. .spb 2 The TAPE request of the DUMP command directs its output to tape; the default tape drive is Drive 1; see the MOH for more details. The DMP355 command can be used to dump communications processors to printer or tape. .spb 2 If you are attempting to analyze a crash (i.e., a return to BOS) with the DUMP command, you will probably want to know the state of the processor which entered BOS (see Section 2 for more detail on chasing down crash causes). The REG request to DUMP prints this (or puts it on tape). The output of the REG request includes an interpreted display of the descriptor segment identified by the selected process' DBR: the names of segments (pathnames or SLT names as appropriate) in the BOS processor's process' address space are displayed in this way.
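.spb 2
As a concrete illustration of the APT locations quoted above, the following is a minimal sketch (written in C; it is emphatically not one of the BOS or Multics tools) of how an offline reader of a dumped copy of tc_data might locate the Initializer's DBR. The representation of the dump as 36-bit words widened into 64-bit integers, and every identifier in the sketch, are assumptions.
.spb
.fif
/* Illustrative sketch only: locate the Initializer's DBR in a dumped
 * copy of tc_data, using the release 9.0 offsets quoted above.  Assumes
 * the dump reader has widened each 36-bit Multics word into the low
 * bits of a uint64_t.
 */
#include <stdio.h>
#include <stdint.h>

#define APT_ARRAY_BASE  01260  /* start of the APT array in tc_data (rel. 9.0)       */
#define APTE_DBR_OFFSET 044    /* DBR occupies words 44 and 45 (octal) of an APT entry */

void print_initializer_dbr(const uint64_t *tc_data_words)
{
    /* The Initializer's APT entry is always the first one. */
    const uint64_t *apte = tc_data_words + APT_ARRAY_BASE;

    printf("Initializer DBR words: %012llo %012llo\n",
           (unsigned long long) apte[APTE_DBR_OFFSET],
           (unsigned long long) apte[APTE_DBR_OFFSET + 1]);
}
.fin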
.spb 2 You will also want to dump the PRDS's of interesting processes, for they contain prds$sys_trouble_data, which indicates where the "sys trouble connect" was encountered. In a single-CPU system, the only PRDS in the system is the one currently in the process that returned to BOS, and the request "STACK PRDS" will dump this out. (Although we could use "SEG PRDS", the PRDS is also used as a stack for interrupt and page-fault processing, and if either of these were an issue at the time of the crash, the DUMP "STACK" request affords the maximum automatic interpretation available.) For multi-CPU systems, you must analyze tc_data to find the other "interesting" processes, and request "SEG PRDS" (or "STACK PRDS") after using the "DBR" request to get there. .spb 2 You will also likely want to dump the Process Data Segment (PDS) of interesting processes; "SEG PDS" in the appropriate processes will get you the right data. The Ring 0 stack is almost always important too, as it tells what the "call side" of the supervisor (actions in response to hcs_ and other system calls) was doing in that process. The request "STACK RING 0" to BOS DUMP will get you this. (Segment and linkage fault processing as well as calls are handled by Ring 0 using the "Ring Zero Stack", as opposed to the PRDS: the issue as to which stack is appropriate is rooted in the design of the locking hierarchy, and is not relevant here.) .spb 2 What constitutes an "interesting process," or an "interesting segment"? We can't answer that here. If you are debugging a new version of the supervisor, you will likely know what segments of the supervisor you will want to dump. If you are dumping a crash via DUMP because of FDUMP failure, in an unknown situation, dump to tape, and use one of the PROC keywords to DUMP to produce a large, comprehensive dump. .spb 2 Tapes produced by BOS DUMP (or other BOS dumping tools) can be printed by the Multics print_dump_tape (pdt) command, described in Section VIII. The BOS "PRINT" command can also be used to print such tapes. .spb 2 The more usual procedure for taking dumps is to use the BOS FDUMP (Fast DUMP) command. FDUMP writes selected segments from selected processes in the crashed Multics to the DUMP partition on disk, where online Multics tools may be used to inspect the "fdump" (as the resulting compound image is called) or produce a DUMP-like dprintable output file. The fdump consists of a header followed by images of segments. The header describes what segments (by number and length; processes are dumped sequentially) have been dumped, and contains the machine conditions from the BOS core image. The exact selection of which processes and which segments are dumped is controlled by the keywords supplied to the FDUMP command (see the MOH). The FDUMP command scans the APT (Active Process Table, located in the segment tc_data) of the crashed Multics, and selects processes based on the criteria specified by the keywords. Normally, all of the supervisor data bases are dumped, as well as supervisor data in running processes. Pure segments (procedures and fixed data) are never dumped, for they cannot contain clues to what went wrong. (Now obviously it is conceivable that severe hardware failure could damage a "pure" segment, but the extremely rare need to dump them does not justify the vast expense of dumping them regularly.)
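.spb 2
By way of illustration only, the following sketch (C) suggests the kind of bookkeeping the fdump header just described must carry. The actual layout is defined by the include file bos_dump.incl.pl1 and is not reproduced here; every field name in this sketch is an invention, not the real declaration.
.spb
.fif
/* Purely hypothetical sketch of the information an fdump header carries,
 * per the description above.  The real layout (bos_dump.incl.pl1) differs;
 * all names here are inventions of the sketch.
 */
#include <stdint.h>

struct sketch_seg_map_entry {
    uint32_t segno;              /* segment number of a dumped segment        */
    uint32_t length_in_words;    /* its length; processes appear sequentially */
};

struct sketch_fdump_header {
    uint32_t erf_no;                    /* error report form number           */
    uint32_t n_map_entries;             /* entries in the segment map         */
    uint64_t machine_conditions[64];    /* from the BOS core image            */
    struct sketch_seg_map_entry map[];  /* followed by the segment images     */
};
.fin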
.spb 2 To dump a process, the FDUMP command scans the descriptor segment of that process and dumps appropriate segments (as selected by the keyword arguments to the FDUMP command). FDUMP tries to avoid dumping segments multiple times, dumping unpaged segments (which must be supervisor data bases) only in the first process dumped, and maintaining a bit map of the page table area (the Active Segment Table, or AST, in the segment sst_seg) for all other segments. FDUMP calls BOS utilities (APND) to simulate paging, for not all pages of each segment are in main memory. Thus, FDUMP interprets each Page Table Word (PTW) for a segment. If the fault bit is off, FDUMP interprets the secondary storage or Paging Device address that is stored in the PTW, reads the page from that location, and dumps it. Thus, whole segments (not just "in core portions") are placed in the fdump. .spb 2 Figure 1-1 below depicts the layout of the DUMP partition following execution of the BOS FDUMP and FD355 commands. Once the FDUMP and/or FD355 commands are executed, standard crash recovery procedures can be initiated (e.g., Emergency Shutdown (ESD) and reboot). To process the fdump, the command copy_dump (described in Section VIII) must be used. This command uses the gate hphcs_ and therefore is generally executed by Initializer.SysDaemon. It determines whether the DUMP partition .spb .fif
      -------------------------------------------
      |  segment map   |    FDUMP header         |
      |-----------------------------------------|
      |                                         |
      |                                         |
      |                                         |
 2000 -------------------------------------------
      |                                         |
      |        DATANET 6600 Front-end           |
      |        Network Processor                |
      |        core image (optional)            |
      -------------------------------------------
      |                                         |
      |            segment image                |
      |                                         |     copies of
      -------------------------------------------     segments of
      |                                         |     processes
      |            segment image                |     dumped
      |                                         |
      |----------------                         |
      |                                         |
      |                                         |
.ifi fig "Format of the DUMP Partition" .fin .spb .inl 0 contains a valid dump (dump.valid = "1"b--see include file bos_dump.incl.pl1). If it does, the Multics fdump is copied into one or more segments in the directory >dumps. These segments have the name date.time.n.erf_no where date is in the form MMDDYY, time is in the form HHMMSS, n is a number, starting at 0, incremented by one for each segment of a multi-segment dump, and erf_no is a number (the "Error Report Form" number, from Multics' historical past at MIT) incremented each time an FDUMP is taken, as extracted from dump.erfno. If there is a valid FNP dump (dump.valid_355 = "1"b), it is copied into a segment in >dumps named date.time.0.erf_no.355. The error report form number is maintained (incremented each time) by the BOS FDUMP command. The number can be set to a new value (e.g., 1) at any time by typing FDUMP n where n is the new error report form (ERF) number (crash number). .ifi l1h "Dumping the Initializer Process" Perceived problems in the Initializer process (nobody can log in or out, "sac" fails, etc.) leading to crashes (usually manual crashes via execute fault) require special handling. The Initializer process is not dumped by FDUMP by default; if you wish it to be dumped, your site RUNCOM's should include the INZR keyword to FDUMP. We strongly recommend this. If your RUNCOM's do not include it, and you are about to crash the system intentionally due to a perceived Initializer problem, change the console switches to prevent RUNCOM execution, and perform an appropriate FDUMP "manually".
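.spb 2
Returning to the naming convention for segments in >dumps described above, the following is a minimal sketch (C; it is not copy_dump) of how an entryname of that form might be assembled. The function name, the caller-supplied date and time strings, and the use of snprintf are assumptions of the sketch.
.spb
.fif
/* Sketch only: build a >dumps entryname of the form date.time.n.erf_no
 * described above (with ".355" appended for an FNP dump).  Not copy_dump;
 * it merely illustrates the convention.
 */
#include <stdio.h>

void dump_entryname(char *buf, size_t buflen,
                    const char *date,    /* "MMDDYY" */
                    const char *time,    /* "HHMMSS" */
                    int n,               /* component number, starting at 0 */
                    int erf_no,          /* error report form number        */
                    int is_fnp_dump)     /* nonzero: append ".355"          */
{
    snprintf(buf, buflen, "%s.%s.%d.%d%s",
             date, time, n, erf_no, is_fnp_dump ? ".355" : "");
}
.fin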
.ifi l1h "Processing an Fdump" If a crash occurs, the Multics and possibly the FNP core images are in several segments in the directory >dumps. There are two commands that can be used to print these dumps. One command, online_dump, or od, is used to print the Multics dump. The other command, online_dump_355, or od_355, is used to process the FNP dump. These command descriptions can be found in Section VIII. .spb 2 If it is desired to merely examine portions of the fdump from a terminal, the command ol_dump (see Section VIII) should be used. Online dump analysis is highly preferable to analysis of printed octal dumps: powewrful tools running in the Multics environment greatly simplify the task of analyzing the millions of words of data involved. Paper dump analysis should only be necessary in machine-room debugging situations where new versions of the supervisor are being debugged, or hardware failure is involved. .spb 2 .ifi l1h "Examination of Registers" The first block of information available in either an fdump or a dump printed by the BOS DUMP command is the state of various processor registers, excerpted from the BOS machine image. These will be the registers of the processor which actually returned to BOS. On a multi-CPU system, this may well _n_o_t be the processor which encountered whatever problem caused the crash. .spb 2 The first register of interest is the Procedure Segment Register (PSR). The PSR contains the segment number of the procedure that actually returned to BOS. In all but one case, this should be the segment number of bound_interceptors, which contains the module sys_trouble. The only case in which this is not true is when BOS is entered by a manual execution of switches, in which case it is whatever it was when the processor was stopped to perform the manual execution of the XED instruction. Listed along with PSR is the instruction counter (IC), the Ring Alarm Register (RALR), the A and Q registers, the exponent register, the Interval Timer register, and the index registers. In analyzing program loops, the values printed here should be carefully correlated with the object code identified by the PSR and instruction counter. .spb 2 Since Multics can enter BOS and be subsequently restarted with a GO command, BOS saves all registers. It also saves all interrupts that come in after Multics has entered BOS. These interrupts are set again (via a SMIC instruction) when the core image is restarted via a GO command. The interrupts are printed in the dump in the form of the word INTER followed by 12 octal digits. The first 32 bits of the 12 octal digits correspond to the setting of interrupt cells 0-31. (This is the hardware format required by the SCU for the SMIC instruction). .spb 2 Following the interrupt word in the dump are the values in each of the eight pointer registers. When BOS is entered from sys_trouble (in Multics, which is to say, by any kind of RTB other than manual execution of the switches) pointer register 2 (bp) points to the actual machine conditions that were stored when the cause of the crash actually happened. These machine conditions are usually stored in prds$sys_trouble_data, and, for a fully-running Multics, will always be the conditions for a "sys trouble connect" or an execute fault. On a system being bootloaded, which crashed due to a premature fault, the actual fault data will be pointed to by Pointer Register 2 as reported by BOS. .spb 2 After the pointer registers, the contents of the PTW and SDW associative memories are printed. 
This data is printed in an interpreted format. Figure 1-10 (SDW Associative Memory Register Format), Figure 1-13 (PTW Associative Memory Register Format), and Figure 1-13 (PTW Associative Memory Match Logic Register Format) in the System Formats PLM (Order #AN87) contain the layout of the associative memories as stored in memory. Generally, the associative memory contents are of little use except in debugging hardware problems. One thing to check for if associative memory problems are suspected is nonunique usage counts (i.e., two associative memory slots having the same usage number). Another possibility is for two slots to have the same contents (e.g., two slots in the SDW associative memory pointing to the same segment). .spb 2 Following the associative memory printout is an interpreted description of what memories are attached to the bootload processor and the size of each memory. The information is printed in two columns. The first column contains the beginning address, in 64-word blocks (two digits of zeroes dropped off the end), of a memory. The second column contains the size of that memory in 64-word blocks. There are eight entries in each column, one for each processor port. Listed below is a sample printout for a system with 128k on each of the first three processor ports. .spb .fif
Coreblocks:   First     NUM
              0         4000
              4000      4000
              10000     4000
              NO MEM
              NO MEM
              NO MEM
              NO MEM
              NO MEM
.fin .spb Following the display of the memory layout is a printout of the memory controller masks for the memory on each processor port. A memory mask is stored as 72 bits. Bits 0-15 contain the settings of bits 0-15 of the interrupt enable register for a memory. Bits 32-35 contain bits 0-3 of the port enable register for a memory. Bits 36-51 contain the settings of bits 16-31 of the interrupt enable register. Bits 68-71 contain bits 4-7 of the port enable register. .spb 2 The last set of registers is the history registers. There is one set of history registers for each of the four operational units of the processor: the Operations Unit (OU), Control Unit (CU), Appending Unit (APU), and Decimal Unit (DU) or EIS portion of the processor. See Figure 1-34 (CU History Register Format), Figure 1-36 (APU History Register Format), and "DU History Register Format" in Section I of the Formats PLM for the formats of these history registers. In the case of a manual execution of switches, they may contain information leading to valuable insight as to what the BOS processor was doing. .spb 2 The last set of information that is printed with the registers is an interpretive layout of the descriptor segment. Each SDW is printed in an expanded format. SDWs that are all zero (directed fault zero or segment fault) are not printed. .spb 2 Along with each SDW a name is printed. For a segment in the Storage System hierarchy, this name is a pathname reconstructed from the names of segments and directories retrieved from the "SST Name Table". These names were copied out from the VTOC entries of active segments by the BOS "SSTN" utility (that which prints out "FILLING SSTNT" at FDUMP time). The names in the VTOC entry are put there by Multics for this reason only, and thus, Multics does not go through much expense to keep them accurate. Therefore, they reflect the primary name of segments at the time they were created. This should be kept in mind when "shriek" names and other apparent anomalies appear in the Descriptor Segment listing.
For supervisor segments not in the hierarchy, the principal segment name as stored in the SLT (Segment Loading Table, the effective "directory" of supervisor segments) is printed. .ifi l1h "Crash Analysis in the Ring Zero Environment" This section provides some basic knowledge about the supervisor environment necessary to anyone analyzing a dump, regardless of the cause of the crash. Some fundamental knowledge about Multics process environments relevant to this task is reiterated here as well; the reader should be reasonably conversant with most of the material in the MPM Reference Guide [ORDER] and the MPM Subsystem Writer's Guide, Order #AK92. .ifi l1h "Segment Numbers in the Supervisor" The entire supervisor environment is built by system initialization at bootload time. Because the supervisor is at a lower level than the standard linkage fault mechanism, it cannot use it, and thus, all of the supervisor must be "prelinked" at bootload time. The supervisor (unless constructed improperly while debugging) will never take linkage faults. .spb 2 Thus, the correspondence of segment number to segment in the supervisor is fixed at bootload time, and is in fact known at system tape generation time (the check_mst tool will display this correspondence under the heading "Loading Summary"). The segment numbers of all parts of the supervisor are the _s_a_m_e in _a_l_l _p_r_o_c_e_s_s_e_s running under the same system tape. When a process is created, the "supervisor region" (lower-numbered segments) of its descriptor segment is copied from that of the Initializer (with the exception of a handful of per-process SDWs, to be discussed below). This ensures that pointers made and stored by the supervisor in supervisor data bases, including links in the supervisor's linkage sections, have the same meaning in all processes. Therefore, while one must qualify a statement like "The segment number of bound_pl1_ is 332" with a statement of which process this correspondence was observed in, one can say unambiguously "the segment number of bound_page_control is 34" given a single system tape. The correspondence of segment number to segment is defined in the Segment Loading Table (SLT). .spb 2 The supervisor has one set of linkage segments, built at bootload time. Of the two which survive past initialization time, a distinction is drawn between the segments active_sup_linkage and wired_sup_linkage. The latter is fully resident in main memory at all times, being the combined linkage of procedures which must (at times) operate without taking page faults. The linkage sections of the supervisor are found via the supervisor's LOT (Linkage Offset Table), which, like the linkage sections themselves, is one per system. All stacks used in ring zero designate this LOT as the LOT to be used. Since there is only one set of supervisor linkage sections, symbolic references such as "tc_data$initializer_id" designate the same object, no matter in what process the supervisor code runs. .spb 2 As said above, a small set of segment numbers used by the supervisor do _n_o_t designate the same object in every process. For example, the SDW for the descriptor segment itself is placed into a fixed slot in a descriptor segment being created. This is currently (but _n_o_t _o_f _n_e_c_e_s_s_i_t_y) slot 0, which is the segment number corresponding to the name "dseg".
Therefore, a reference to "dseg$" in any supervisor program will always reference segment #0 in whatever process it runs, although this will be a different physical _s_e_g_m_e_n_t in each process. Similarly, the process data segment (PDS) of a process, which contains identifying information and critical state information (see below), the Known Segment Table (KST), and the various multiplexed stacks (see below) also fall into this category. Although "they" have the same segment numbers in each process, the segment numbers refer to different objects. A reference to "pds$processid" in a program in the supervisor is a reference to (for example) segment 56, word 320, no matter what process it runs in, but as to what segment 56 _i_s, which is to say, _w_h_o_s_e PDS, this is a function of process. .ifi l1h "Per-Process Supervisor Segments" Each process has several segments used by the supervisor unique to that process. They are created at the time the process is created, by the Initializer, as segments in the new process' process directory. They are forced to be permanently (until process destruction) active, so that SDW's describing them may be used without the possibility of segment fault. .ifi l2h "The Descriptor Segment" Perhaps the most important of these is the Descriptor Segment, which is used by the hardware to define exactly which segments constitute the process. Because the descriptor segment is used by the hardware, there is no space in it for auxiliary information about the process or its segments; it consists only of Segment Descriptor Words (SDW's), some put there at process creation time, some swapped at scheduling time, and the rest managed by segment control in response to segment faults, deactivations, and the like (see the Storage System PLM, Order #AN61, for a full description of these activities). .spb 2 The first page of the descriptor segment is wired (locked in main memory) when a process is made eligible by the traffic controller. The wiring of this page (along with the first page of the PDS, see below) at eligibility-awarding time is called _l_o_a_d_i_n_g, and is a complex multipass operation involving a subtle interaction between traffic control and page control (see AN61 for page control's view of it). .spb .inl 10 .unl 5 1. Since a page fault or interrupt (or connect fault or timer runout) can occur at any time in a process without warning, the SDW's for those segments that must be involved in the handling of these faults and interrupts must be in main memory _i_n _a_d_v_a_n_c_e. Thus, we group all SDW's for such segments in the first page of the descriptor segment (segment numbers 0 to 511 decimal) and ensure the residency of that page in main memory when a process runs. .inl 0 .ifi l1h "The PDS (Process Data Segment)" The most important supervisor software data base in a process is surely the PDS (Process Data Segment). Like the Descriptor Segment, it is created with the process, its SDW placed in the Descriptor Segment at process creation time, and referenced symbolically by the supervisor by a system-wide segment number, viz., the SLT value of the symbol "pds$". The PDS contains _a_l_l supervisor information about a process which
Is not the stack frames of procedures (stacks are used for that, and are discussed below). .inl 0 .spb 2 Among the information stored in the PDS is the identity of the process (its 36-bit process ID and "group ID" (e.g., "Qaddafi.Libya.a")), machine conditions for page faults and pre-empts (see the discussion "Finding the right fault data" below), the per-process page-fault trace, the per-process lock array, and so on. The first page of the PDS is wired when a process is loaded, as a process can take a page fault at _a_n_y _t_i_m_e, and page fault data can be stored in this page of the PDS .spb .inl 10 .unl 5 1. Loading is not actually performed at eligibility grant time, but at the time a process which is not loaded, but ought to be, is picked up by the traffic controller. The code of pxss is the only reference source for those wishing to master this subtlety. .spb .inl 0 with no advance warning or preparation. The end of the PDS trails off in an array of SCU data and EIS data for unrestarted faults which have been signalled. It is used to validate user changes to signalled machine conditions prior to restart. A discussion of what to look for in the PDS when analyzing crashes appears in Section V. .inl 0 .ifi l1h "The KST (Known Segment Table)" The remaining per-process data base of the supervisor is the Known Segment Table, or KST. Like the PDS and Descriptor Segment, it is created in a nascent process' process directory, by the Initializer, and its SDW placed in a fixed slot in the new descriptor segment. In addition to a small amount of header information describing its contents, the KST contains an array defining the non-supervisor segment numbers in the process. It is managed by address space management in the process, in response to (sometimes implicit) making-knowns and terminations of segments. It is used by the segment fault mechanism to resolve segment faults, which is to say, to find out what segment in the permanent Storage Hierarchy is meant when the hardware faults on a missing SDW the first time a segment is referenced in a process. More information about the KST and its format is found in Section V. .ifi l1h "Stacks" The Multics PL/I environment, which is used not only by all Multics user programs but by all parts of the Operating System as well, requires a LIFO organization of procedure activation records. This is implemented as a _s_t_a_c_k, a dedicated segment upon which an array of _s_t_a_c_k _f_r_a_m_e_s, or activation records, is laid out. A _s_t_a_c_k _h_e_a_d_e_r resides at the beginning of the stack segment (see stack_header.incl.pl1), and not only defines the extent of valid data on the stack, but contains pointers needed by the PL/I program environment to locate critical data (for example, the LOT (Linkage Offset Table), which locates the linkage section for each procedure in a process). .spb 2 Stack frames contain a fixed header (see below), which includes threads to the previous and next stack frame. The stack header defines where the first frame is, or is to be placed (stacks may be empty), via the pointer stack_header.stack_begin_ptr, and where the next frame is to be placed (stack_header.stack_end_ptr). .spb 2 A process has one stack segment for each ring it has ever used, other than Ring Zero. The obviously crucial case of Ring Zero will be dealt with shortly.
A set of segment numbers defined by the hardware value of dbr.stack_base (eight times that value, for ring 0, through eight times that value plus seven, for ring 7) defines the stack segment numbers of a process (currently, this is the same for all processes, and is a number over 200 octal). When the hardware crosses into an inner ring, Pointer Register 7 (SB) is loaded with a pointer to the base of the appropriate stack segment for the target ring, this segment number being computed from dbr.stack. The first time the process tries to reference a segment number in this range, the segment fault handler (seg_fault) notices this (from a null pointer in the relevant element of the array pds$stacks), creates a stack segment, and initializes it properly. Stacks are normal Storage System segments named stack_1, stack_2, etc. (per ring) in the process directory of the process to which they belong. .spb 2 A variety of segments of different types are used by the supervisor as stacks in ring zero. This is because unusual requirements, resulting from the nature of what it is that the supervisor does, are placed upon such segments. For instance, at the time a page fault is taken, a stack which is resident in main memory is needed. At the time the page fault occurs, it is too late to "wire down" (assure main memory residency dynamically for) the segment to be used as a stack (what shall we use for a stack during this latter proposed operation?). Thus, there must be a wired (resident in main memory) stack for each processor.(1) .spb 2 The same argument applies to segment faults. At the time a segment fault occurs, the supervisor needs a segment to use as a stack during segment fault handling. It cannot deal with a stack segment which may not be active: what shall it run on as a stack while activating _t_h_a_t? So it would seem that we need an "always-active" segment per process, for use as a stack too. .spb 2 The argument stops there. There are basically two ring zero stacks needed by a process, one which is simply guaranteed active and present at all times, and one which is additionally guaranteed to be resident in main memory. (Were consumption of main memory not an issue, these two could as well be one, totally wired in main memory.) As to the rest of the PL/I environment, such as the linkage sections and data bases of Ring Zero, these things are dealt with on a per-system basis, and do not require the dynamic treatment afforded stacks. .spb .inl 10 .unl 5 1. Only a processor can take a page fault: processes that are not currently on a processor can't take page faults. Thus, with a bit of cleverness, only one "wired" stack per processor is needed, which requires us not to lose a processor, or put process state-information in it, while using it. Note that the very early versions of Multics indeed had a wired stack per process. .fin .inl 0 .ifi l2h "The PRDS as Stack" The PRDS, the per-processor data segment, is multiplexed for use as a stack by page fault and interrupt handlers. This is because it is both per-processor and guaranteed fully resident in main memory. This makes it ideal for such use. All of the per-processor data items in the PRDS of a processor reside in a region between the stack header and the place where the first frame would go (as pointed to by prds|stack_header.stack_begin_ptr).
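.spb 2
As an aside, the ring-to-segment-number rule stated at the beginning of this discussion (eight times dbr.stack_base, plus the target ring number) is simple enough to write down directly. The following is an illustrative sketch in C, not supervisor code; the sample value of stack_base is hypothetical.
.spb
.fif
/* Sketch of the per-ring stack segment number computation described
 * above: the ring-r stack of a process is segment number
 * 8*dbr.stack_base + r, so the eight stack segments occupy the range
 * 8*stack_base .. 8*stack_base + 7.  Not supervisor code; the example
 * stack_base value is made up.
 */
#include <stdio.h>

static unsigned stack_segno(unsigned stack_base, unsigned ring)
{
    return stack_base * 8 + ring;
}

int main(void)
{
    unsigned base = 040;    /* hypothetical dbr.stack_base value */

    for (unsigned ring = 0; ring < 8; ring++)
        printf("ring %u stack is segment %o (octal)\n",
               ring, stack_segno(base, ring));
    return 0;
}
.fin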
.spb 2 Since the PRDS is a per-processor resource, those programs that use it must ensure that the binding between process and processor of the process in which they are running does not change while they are using it: thus, interrupts occur, set up stack frames on the PRDS (the interrupt interceptor module, ii, does this), are processed, and return, abandoning the PRDS stack history. Similarly for page faults. To ensure this condition, as well as for other reasons, procedures using the PRDS as a stack may not take page faults, and must mask against interrupts.(1) .spb 2 The case of traffic control's use of the PRDS as a stack is a special one. There are two cases of interest: those entries to traffic control which can give up the processor (e.g., block, wait), and all others. In the latter case, the traffic controller entry mechanism "switches" to the PRDS as a stack(2) as a cheaper technique for what is usually accomplished by stack-wiring (see below). As a price for this optimization, it must be coded in ALM and go to great pains to ensure that it can reference its arguments without taking page faults after it has switched stacks. (This is the purpose of the PDS "variables" pds$arg_1, pds$arg_2, etc.) It must go to even greater pains to ensure the semantic validity of its arguments because of the technique just described. .spb 2 In the case of those entries that _c_a_n give up the processor, the traffic controller encodes the state of the process in a slew of variables (apte.savex7, pds$last_sp, pds$pc_call, .spb .inl 10 .unl 5 1. Note that timer runouts and connects are not only acceptable, but necessary, if page control is to be able to manage the associative memory, processors reconfigured, etc. Connect faults and timer runouts received while "running on the PRDS" are simply a case of "timer runouts and connect faults received while running in ring zero", and not a special case at that. .spb .unl 5 2. As a matter of fact, the name of the traffic controller, pxss, originally stood for "process exchange switch stack": there was another module named "px" (process exchange) to which it interfaced. pds$page_fault_data, pds$ipc_block_return), making sure that _n_o _p_a_r_t _o_f _t_h_e _s_t_a_t_e _o_f _t_h_e _p_r_o_c_e_s_s is embedded in the PRDS stack history. This enables the traffic controller to switch into another process, place the SDW for the PRDS it was using in the new process' DSEG (thus ensuring the continuity of the PRDS/processor association), and continue using that _p_r_o_c_e_s_s_o_r's stack history on the PRDS. .spb 2 .inl 0 The case of page control abandoning its stack history on the PRDS is a subcase of the traffic controller PRDS policy described above. .spb 2 The length of the PRDS (all PRDS's) is fixed at bootload time. Occasionally, it is "not enough". Given the site-dependent responsibility of writing communications demultiplexers, which can consume arbitrary extents of stack at interrupt time, a site can easily encounter "boundsfaults on the PRDS" (which will be detected by fim and crash the system with a BOS-reported message). Occasionally pathological call sequences in Ring Zero can cause this, too. The proper solution to such an occurrence is modification of the MST header and tc_data$prds_length. .ifi l2h "Paged Ring Zero Stacks" Segments to be used by the supervisor as stacks in Ring Zero when there is no wiredness requirement must still not be subject to segment faults, as discussed above.
Before Release 8.0, the PDS had a region used as a stack in this way, for it satisfied the requirements in this regard. However, the need for these segments to be maintained active, as well as the paging implications of paging new PDS's in when processes became eligible, led to the design of the current strategy, for performance reasons. .spb 2 A group of segments named stack_0.nnn, where nnn runs from 1 to the largest number (tc_data.max_max_eligible) of processes allowed to be simultaneously eligible during the bootload, is created in >system_library_1 at initialization time. They are forced active, and their SDW's placed in the segment stack_0_data. They are assigned to processes as processes become eligible, and taken back as eligibility is lost. As the free list is a LIFO stack, the chances of a stack just freed being largely paged-in when it is given to a new process are intentionally high. .spb 2 This policy relies upon the fact that only eligible processes can have a ring zero stack history. While it is true that only _r_u_n_n_i_n_g processes can have a processor, or take page faults, and need a PRDS, processes which are _w_a_i_t_i_n_g on page faults in Ring Zero, or locks, may well have a Ring Zero stack history, and not be running. Except for the Syserr Log Daemon process (which must be special-cased, discussed below), processes do not call the traffic controller "block" entry except from outer rings. Processes running in Ring Zero cannot be pre-empted (wired_fim delays a potential pre-empt via the Ring Alarm register in this case). Thus, with the noted exception, it is impossible for a process running in ring zero to lose eligibility. Therefore, only eligible processes have Ring Zero stack histories, and thus, only eligible processes need Ring Zero stacks. .spb 2 When a process is given eligibility, a usable SDW for a stack_0 is placed in its descriptor segment in the slot that the hardware assigns to the Ring Zero stack, as defined by dbr.stack (see above). Processes abandoning stacks at eligibility loss time, or terminating, have a responsibility to reset these stacks appropriately. The header information in the stack_0 segments is valid across all processes. .spb 2 The processing of calls to hcs_$block must be complex to allow this trick to work, i.e., the abandonment of the stack_0 and its return to the pool when the traffic controller goes blocked. The target of the gate hcs_$block is fast_hc_ipc, an ALM program that does not use stack_0 at all to do its processing. If, when it so determines from its various PDS flags, it must call hc_ipc_, a large PL/I program that copies ITT messages to outer rings, it establishes a stack frame on the process' stack_0, calls this utility subroutine, and upon return, reverts to its previous state of not using a stack. When it actually calls the traffic controller block entry, it has no stack history on stack_0, and the traffic controller is aware of this. The return of the traffic controller upon wakeup is to a prearranged location in fast_hc_ipc. .ifi l1h "Stack Wiring" Often, supervisor programs have to call page control, or other programs that lock very-low-level locks, for services. These programs are always running using the "Ring Zero Stack" just described. Page control, or whatever is being called, must ensure that it will not take page faults (or otherwise lose the processor) while it has its low-level locks locked. As a consequence, the region of the Ring Zero stack to be used as a stack during these activities must be temporarily wired.
.spb 2 For this reason, privileged_mode_ut$wire_and_mask is provided. When called by some procedure which is about to lock a critical lock, it attempts to wire the current and next three pages of the current stack, and page 0, which contains the stack header (which will be needed during call, return, and entry sequences). It also "masks the current process against interrupts".(1) Wire_and_mask avoids wiring and masking if it senses that Page 0 of its current stack is already wired; this facilitates recursive use. It also avoids any action if it senses that it was invoked while the PRDS was being used as a stack. Wire_and_mask encodes its state in two arguments that it expects to be passed back to pmut$unwire_unmask, which undoes this action. If unwire_unmask senses that wire_and_mask did nothing for the reasons just discussed, it too will do nothing. .spb 2 The technique used by wire_and_mask to wire is simply turning on the ptw.wired bits of the stack's PTW's and touching the pages. Page control is aware that this can happen without the page-table lock being locked; there is no problem here (see AN61 for more detail on why this is valid). .spb 2 The amount of stack wired by wire_and_mask must be known at the time it is called. Since no generally-written program can know who its ultimate callees will be, this number is a system constant, currently 4. If, via some pathological call sequence, or bug, or site modification, more pages are found to be needed in the midst of processing, nothing can be done, and the system will crash, either with a page control MYLOCK ON GLOBAL LOCK or with miscellaneous esoteric hangs. If, in analyzing a crash, the system is found to be hanging, or all processes waiting, and some eligible process' pds$page_fault_data implicates privileged_mode_ut, this may be your problem. The fixed number has grown over the years; there is no way to determine it dynamically. .spb 2 1. This means, "set to zeroes that Interrupt Mask of the bootload memory (which receives all system interrupts) that is directed at the processor we are running on". The smcm instruction does this. As this mask is read by the traffic controller before a process loses a processor, and stored and set back when the process is picked up, an effective "process mask" is maintained. For processors which have no masks assigned (and thus cannot process interrupts), a simulation of mask processing is accomplished via ldaq/staq instructions into prds$simulated_mask. Executing the smcm/rmcm or ldaq/staq in a processor-dependent way is accomplished by xec'ing an array of processor-dependent instructions in the SCS. .ifi l2h "Initialization/Shutdown Stack" Bootload uses the supervisor segment inzr_stk0(1) as a stack. It cannot use stack_0's, for they are not created until late in the bootload (when the Storage System is fully operative). The Initializer process uses inzr_stk0 until it comes out to Ring One, and then never uses it again. Of course, during page faults and interrupts in the course of initialization, the PRDS is used as normal, for the stack-switching mechanisms are operative. .spb 2 The segment inzr_stk0 is kept for use at emergency shutdown time. When the system is re-entered from BOS for Emergency Shutdown, the PRDS of the processor which re-entered Multics is used as a stack while the I/O mechanism and page control are reinitialized and made consistent. No page faults are taken during this processing. When Emergency Shutdown must enter the Paged Environment, inzr_stk0 is switched to as a stack.
It is used because it is always there (i.e., at all phases of initialization and system running), and has an SDW (and identical segment number) in all processes, as it is a supervisor segment. .spb 2 Privileged_mode_ut$wire_and_mask operates upon inzr_stk0 in the same way as it does upon stack_0 segments; it sees no distinction. .ifi l1h "Syserr Daemon Stack" As the Ring Zero syserr-log Daemon(2) indeed calls pxss$block during its processing, it cannot lose its ring zero stack when it loses eligibility. Thus, it does not participate in the stack-sharing mechanism, and uses the supervisor segment syserr_daemon_stack as a stack. The bit apte.hproc informs the traffic controller not to allow it to participate in stack sharing. .spb .inl 10 .unl 5 1. The cryptic acronym, which clearly means "initializer_stack_0", is spelled this way because of the bootstrap1 name-finding mechanism. Segments that must be located by bootstrap1 must have names unique to 8 characters. .spb .unl 5 2. The process which picks up syserr messages from the wired syserr data segments and places them in the paged syserr log. It is woken up by syserr when messages are placed in the wired syserr data segment. Since wired and masked code can call syserr, the caller cannot in general touch the paged syserr log; hence the need for the Daemon. .inl 0 .ifi l2h "Treatment of Stacks in Idle Processes" Idle processes have no stack. .spb 2 Idle processes have no call side. They take no segment or linkage faults. Thus, they do not need a ring zero stack. When they take interrupts, they use the PRDS as a stack, as do other processes. When not processing interrupts, a null pointer value is maintained in SP, the Stack Pointer Register. .ifi l1h "Stack Frames" The principal and most important artifact of a stack is the stack frame. The stack frame contains the automatic variables and call-out return point of an active procedure. All PL/I programs "push" (establish) stack frames on the stack in use at the time they are called, when they are called, and "pop" (abandon) them when they return. Some ALM programs push and pop stack frames, but others do not. Those ALM programs that have no automatic variables and do not call other programs via standard call need no stack frames. Traffic control and page control have a complex stack-frame sharing discipline: see AN61 for more detail on this.
Similarly, starting at any given stack frame, it is possible to trace forward or backward using the forward or backward pointers in each stack frame. The stack header pointers (stack_header.stack_begin_ptr and stack_header.stack_end_ptr, respectively) define where the first frame lies and where the next frame will be laid down. .spb 2 Pointers in stack frames (the standard part thereof) are constrained to be standard ITS pointers. In the case of some special frames, bits 24-29 of stack_frame.prev_sp are used for flags. Listed below are some of the more important ones to notice: .spb .fif
BIT   OCTAL   Meaning, if _O_N
24    4000    This frame belongs to signal_.
26    1000    The _n_e_x_t frame was laid down by the signaller.
28    0200    This frame belongs to an environment support procedure.
29    0100    At least one condition is established in this frame.
.spb .fin .inl 0 All of the stack frame flags may be found in the Subsystem Writer's Guide. .spb 2 At location 24 (octal) a program stores the address to which it expects control to be returned when it calls another program. This quantity (stack_frame.return_ptr) is only valid if the program had called out to another program at the time the dump was taken, and is not even meaningful if the program has never called another program. The PL/I entry operator initializes this variable to a pointer to the base of a procedure's text segment at procedure entry time; this case can be recognized by a pointer to location zero of a segment here. When tracing stacks to determine call histories, this quantity is of primary significance. By interpreting this quantity, i.e., identifying the segment pointed to and resolving the pointer with respect to the bindmap of the appropriate version of the segment, the problem analyst can identify the particular statement at which each active procedure was exited. .spb 2 At location 26 (octal) is stack_frame.entry_ptr, which is set by the PL/I and ALM entry sequences to point to the actual entry point that was called. By resolving these pointers with respect to the linkinfo's of the segments they describe (the online interactive dump analysis tools do this automatically), the identities of the procedures and entrypoints active at the time of the crash may be determined. The analyst should be aware that some ALM programs do NOT set this field, in particular, those that do not push a stack frame. While stack_frame.entry_ptr is invaluable for identifying the PL/I procedures that were active, it is _n_o_t to be considered authoritative; only stack_frame.return_ptr can be relied upon. .spb 2 Location 32 (octal) (stack_frame.op_ptr) contains a pointer to the operator segment for most translators. However, ALM programs use this double word as a place to store the linkage section pointer. When an ALM program does a call, the call operator reloads pointer register 4 (the linkage section pointer) from this location (it is saved there by the ALM push operator). The reason it is reloaded before calling is in case an ALM program is calling a Version 1 PL/I program that is bound into the same segment as the ALM program. In this case, the standard Version 1 entry sequence that loads the linkage section pointer register is not invoked, so the ALM program must ensure that it is correct. When Version 1 PL/I programs cease to exist, this will no longer be a requirement. .spb 2 Following the entry pointer is a pointer to the argument list of the procedure that owns the stack frame. The format of an argument list is discussed below.
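.spb 2 The tracing procedure described above (following the forward thread from the stack header and interpreting stack_frame.return_ptr and stack_frame.entry_ptr in each frame) can be modeled by the short sketch below. It is a modern C model of frames as they might be reconstructed from a dump, not Multics code; the structure and field names are illustrative, and the octal word offsets cited in the comments are those given in the text. .spb .fif
/* Illustrative model of tracing a stack.  Not Multics source. */

#include <stdio.h>

struct frame {
    /* Words 20-23 (octal): the backward and forward threads. */
    struct frame *prev_sp;
    struct frame *next_sp;
    void *return_ptr;    /* word 24 (octal): where the owner will resume */
    void *entry_ptr;     /* word 26 (octal): entry point that was called */
};

/* Walk forward from stack_begin_ptr; stack_end_ptr marks where the next
   frame would be laid down, and so terminates the trace. */
void trace_stack (struct frame *stack_begin_ptr, struct frame *stack_end_ptr)
{
    struct frame *f;

    for (f = stack_begin_ptr; f != NULL && f != stack_end_ptr; f = f->next_sp)
        printf ("frame %p: entry %p, called out from %p\n",
                (void *) f, f->entry_ptr, f->return_ptr);
}
.fin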
.spb 2 At location 36 (octal) in the stack frame are two 18-bit relative addresses. These addresses are relative to the base of the stack frame. The first relative address points to a series of 16-word on unit blocks in the stack frame. Each on unit block contains a 32-character condition name, a chain pointer, and some flags. Listed below is the PL/I declaration for an on unit block: .spb .fif
dcl 1 on_unit          based aligned,
      2 name           ptr,
      2 body           ptr,
      2 size           fixed bin,
      2 next           bit (18) unaligned,
      2 flags          unaligned,
        3 pl1_snap     bit (1) unaligned,
        3 pl1_system   bit (1) unaligned,
        3 pad          bit (16) unaligned,
      2 file           ptr;
.spb 2 .fin At location 31 (octal) in the stack frame header is a word known as the operator return offset. In fact, this word consists of two 18-bit halves. The left-most 18 bits contain the translator ID. This is a number that tells which translator compiled the program that owns the stack frame. The various IDs are as follows: .spb .fif
0    Version 2 PL/I
1    ALM
2    Version 1 PL/I
3    signal caller frame
4    signaller frame
.fin .spb The right half of the word holds the address in the program at which a called operator returns. This is useful in debugging, for it describes the return address for certain calls to pl1_operators_. If a crash occurs and the machine conditions show that some fault or interrupt was taken by pl1_operators_, X0 normally contains the return address for the operator that took the fault or interrupt. If the operator was forced to use X0 for its own purposes, the operator return offset in the stack frame contains the return address instead. Thus, if this cell is nonzero, it contains the return address; if it is zero, X0 contains the return address. Given this, one can look at the program that called the operator to determine why the fault occurred. This applies only to PL/I (and FORTRAN) programs. .spb 2 It is to be noted that PL/I programs reserve the first 100 octal (64 decimal) words of the stack frame for use by the operators and runtime. Automatic variables for PL/I programs begin at location 100 (octal) of the stack frame, and all PL/I programs have a minimum stack frame size of 100 (octal) words.
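.spb 2 As an aid to interpreting the operator return offset word at location 31 (octal), the following sketch splits it into its two 18-bit halves and names the translator. It is a modern C model with an invented sample value, not Multics code; the 36-bit word is simply carried in the low bits of a larger integer. .spb .fif
/* Illustrative model of decoding stack frame word 31 (octal):
   upper 18 bits = translator ID, lower 18 bits = operator return offset.
   Not Multics source. */

#include <stdio.h>
#include <stdint.h>

static const char *translator_name (unsigned id)
{
    switch (id) {
    case 0:  return "Version 2 PL/I";
    case 1:  return "ALM";
    case 2:  return "Version 1 PL/I";
    case 3:  return "signal caller frame";
    case 4:  return "signaller frame";
    default: return "unknown";
    }
}

int main (void)
{
    uint64_t word31 = 01234;   /* hypothetical sample value, in octal */
    unsigned translator_id = (unsigned) (word31 >> 18) & 0777777;
    unsigned return_offset = (unsigned) word31 & 0777777;

    printf ("translator: %s\n", translator_name (translator_id));
    if (return_offset != 0)
        printf ("operator return offset: %06o\n", return_offset);
    else
        printf ("operator return address is in X0\n");
    return 0;
}
.fin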
.ifi l1h "Argument Lists" An _a_r_g_u_m_e_n_t _l_i_s_t is an array of pointers (to locations in a process), set up by one procedure when it calls another, specifying the addresses of the _a_r_g_u_m_e_n_t_s of the first procedure to the second, which are also known as the _p_a_r_a_m_e_t_e_r_s of the second procedure. PL/I, which provides the basic runtime semantics of Multics, specifies call-by-reference as an argument-passing methodology; thus, a procedure's parameters are referenced via indirection through elements of an argument list prepared by its caller. .spb 2 Every standard Multics call must construct a standard argument list. All PL/I code in the supervisor calling PL/I or other code, or any code at all calling a PL/I program, uses the standard Multics call. Non-standard calling sequences are used only within complex ALM-coded subsystems in the supervisor, such as page control and traffic control. PL/I's calling of "quick" internal procedures is also non-standard, but standard argument lists are used in such calls. .spb 2 A standard argument list begins on an even word boundary. It consists of a two-word _a_r_g_u_m_e_n_t _l_i_s_t _h_e_a_d_e_r, followed by a pointer to each argument, in order, spaced two words apart. The address of the pointer to the n'th argument, where the first argument is number 1, is thus at an offset of 2*n from the address of the argument list. The _a_r_g_u_m_e_n_t _p_o_i_n_t_e_r_s may thus be two-word or single-word pointers, although by far the most common case is that of two-word ITS pointers. Usually, argument lists and pointers are prepared "on the fly" at the time of procedure call, but applicable PL/I internal calls, as well as some calls in ALM-coded subsystems, take advantage of the ability to specify single-word and ITP pointers to present constant argument lists, which do not contain ITS pointers. Argument (and descriptor, see below) pointers _a_r_e constrained to be indirectable-through by the hardware: they must be valid indirect address words of any form. Thus, unaligned (packed) pointers may not be used in an argument list. .spb 2 The argument list header gives the number of arguments in the argument list it heads: twice this number (being the offset of the pointer to the last argument) is stored in the uppermost halfword thereof. The argument list header also specifies whether _a_r_g_u_m_e_n_t _d_e_s_c_r_i_p_t_o_r_s are provided in the argument list. When parametric-length strings (e.g., char (*)) or adjustable arrays are passed, or an "options (variable)" entry is called, the callee must have sufficient information to determine at runtime the extents (or lengths, and sometimes data types) of its parameters. If descriptors are passed, the argument list header, in the upper halfword of its second word, contains twice the number of descriptor pointers: the descriptor pointers follow the argument pointers, and are subject to the same restrictions. The only useful datum to bear in mind about descriptors is that the low halfword of a descriptor for a string gives the length of the string. When a PL/I internal entry (an entry point of a non-quick internal procedure) is called, the argument list includes a quantity known as the _d_i_s_p_l_a_y _p_o_i_n_t_e_r after the argument pointers, but before the descriptor pointers (if any). This quantity is a pointer to the correct active stack frame of the procedure's containing block,(1) and allows it to access the variables of its lexically containing blocks. If a display pointer (also called an _e_n_v_i_r_o_n_m_e_n_t _p_o_i_n_t_e_r) is provided, the last halfword of the argument list header contains a 10 (octal). .spb 2 The format of the argument list, as well as the format of the header and all descriptors, may be seen in the Multics Subsystem Writer's Guide, Order #AK92. .spb .inl 10 .unl 5 1. A PL/I internal procedure can only be called from the block (procedure or begin block) in which it appears, or from some block contained in that block, or via an entry variable. In the first two cases, these are the only blocks in a program (or set of programs) in which the name of the procedure is known (it is not declared, and is thus undefined, in other blocks). Thus, in the first case, the stack frame of the calling block is the "correct" stack frame. In the second case, inductively, a contained block of the procedure's containing block was active, and this implies that the containing block was active: the stack frame of the containing block that called the procedure that called the procedure, etc., is the "correct" one. In the third case, the entry variable had to be assigned a value in an environment such as either of the above cases, and can only be used validly when the block that assigned it is active: the correct display pointer is embedded at "closure time" (i.e., assignment to the entry variable) in the value of the entry variable. .inl 0
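.spb 2 The argument list layout described above may be summarized by the sketch below, which computes the word offsets at which the argument pointers, the display pointer (if any), and the descriptor pointers lie. It is a modern C model intended for dump analysis, not Multics code; the routine name and parameters are illustrative, and a two-word slot is assumed for the display pointer. .spb .fif
/* Illustrative model of locating the pointers of a standard argument
   list.  Offsets are in words from the base of the list.  Not Multics
   source; all names here are hypothetical. */

#include <stdio.h>

/* twice_arg_count and twice_desc_count are the upper halfwords of the
   two header words; has_display says whether a display (environment)
   pointer follows the argument pointers. */
void show_arg_list_layout (unsigned twice_arg_count,
                           unsigned twice_desc_count,
                           int has_display)
{
    unsigned n_args  = twice_arg_count / 2;
    unsigned n_descs = twice_desc_count / 2;
    unsigned n, next;

    /* The pointer to argument n (1-origin) is at offset 2*n. */
    for (n = 1; n <= n_args; n++)
        printf ("argument %u pointer at word offset %u\n", n, 2 * n);

    /* The display pointer, if any, follows the argument pointers. */
    next = 2 + 2 * n_args;
    if (has_display) {
        printf ("display pointer at word offset %u\n", next);
        next += 2;
    }

    /* The descriptor pointers come last. */
    for (n = 1; n <= n_descs; n++)
        printf ("descriptor %u pointer at word offset %u\n",
                n, next + 2 * (n - 1));
}

int main (void)
{
    /* Example: an internal entry with three arguments and descriptors. */
    show_arg_list_layout (6, 6, 1);
    return 0;
}
.fin .inl 0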