09/21/87  hardcore
Known errors in the current release of hardcore.
#  Associated TR's  Description
922 phx20933 Some hardcore module needs to know if the disk is operative. This is done by calling disk_control$test_disk or dctl$test_disk. The module loops until the IO is complete. The problems come when the hardware is broken in such a way that the IO never completes, so pvte.testing is never reset. I guess that this is another place where the disk DIM should give up, because it knows that the IO did not complete and that it is a test-type IO. One of the problems with making disk_control smarter is that more pages need to be wired in ring 0.
921 phx20930 During a BCE restore, tape record sequence errors are occurring at the end of the tape. Sometimes the sequence error shows an actual disk record missing; other times only the tape record numbers appear to be in error, with the disk record numbers still in sequence (no gap).
920 phx20152 vacate_pv is setting pvte.pc_vacating and pvte.vacating. The use of pvte.vacating is to keep new segments from being created on this pv. pc_vacating will inhibit any new pages being created on this pv. The contract of vacate_pv is only to keep new segments from being created. Therefore pvte.pc_vacating should not be set in vacate_pv.pl1.
919 phx20908 Another call in disk_queue to code in >udd>m>lib. The fix will be to remove the -interpret support from the disk_queue command. This should not present any great problems because it could not be working at sites other than System M.
918 phx20868 The TR claims a 17th level can exist in the hierarchy and problems exist if the pack is demounted when this segment is active? deactivate_for_demount.pl1 line 261.
917 phx20922 disk_control will, on certain types of disk errors such as MPC data alerts, continuously retry the failing IO. The main complaint is for bootload_io type at BCE. This includes such things as copy_disk and save and restore commands.
The reason for this is that disk_control determines that this is a "bad_path" status; its job is then to delete this channel, after which another will be tried, until all channels save one have been deleted. Then it adds them all back and just keeps doing it over and over again.
897 phx13424 phx17773 phx17819 Problems with directory quota management/enforcement.
895 No automatic hierarchy salvage is occurring when "boot rpvs" or "boot rlvs" is done.
894 phx20661 Linkage error at bce early loading firmware in mpcs.
891 delete_ calls hcs_$get_segment_ptr_path to determine if a segment is known in the calling ring (it wants to call term_ only when known segments are being deleted). The hcs_ gate target is initiate_$get_segment_ptr_path, which currently calls dc_find$obj_initiate to find the object's directory entry. This can cause a superfluous GRANT audit message, since $get_segment_ptr_path only returns a pointer to the segment if it is already known (in any ring) to the process. And it can cause a superfluous DENY audit message, since no operation is performed unless the segment is known. The fix involves creating a new entrypoint, dc_find$obj_initiate_priv, which bypasses access checks and auditing, and changing initiate_$get_segment_ptr_path to call this new entrypoint. The intent of the fix would be to never audit the operation of hcs_$get_segment_ptr_path. This is true even if the caller asks about a segment known only in a ring other than the caller's ring. Since the original audit message included the ring brackets of the segment, it documents the caller's access to the segment from all rings within those ring brackets.
890 phx19527 ioa_$ioa_stream prints garbage or blows up when no control string is given.
887 phx19986 The disk_control$test_drive entry does not wait for an interrupt for its I/O, but polls the status word. For FIPS devices or those on a DAU, this will not work since the status words are not valid until the interrupt is sent.
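Note on 917: the retry cycle described above can be modelled in a few lines. This is only a sketch of one reading of the entry (the last remaining channel is tried but never deleted, then all channels are added back); the function name, the attempt callback, and the max_cycles bound are hypothetical. The entry itself describes no bound, which is exactly the complaint.

```python
def retry_with_bad_path(channels, attempt, max_cycles):
    """Model of the channel-deletion retry cycle described in entry 917.

    channels   -- list of channel names (hypothetical)
    attempt    -- callable(channel) -> True if the IO completed
    max_cycles -- assumed bound; the behaviour in the entry has none
    """
    for cycle in range(max_cycles):
        active = list(channels)      # all channels added back
        while active:
            ch = active[0]
            if attempt(ch):
                return ("ok", ch, cycle)
            if len(active) == 1:
                break                # the last channel is never deleted
            active.pop(0)            # "bad_path": delete channel, try another
    return ("gave_up", None, max_cycles)
```

With a device that fails on every path, the only thing that stops the loop in this model is the assumed bound.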
885 The program install_ttt_ does no auditing.
884 The hcs_$truncate_file entry logs a DENIED message even though other entries log GRANTED, as the reason the call fails (this operation is not allowed for a directory) has nothing to do with access control.
882 It appears that hcs_$make_entry does not null its output argument when it returns an error code, although the documentation states that it does. Since it doesn't modify the output argument at all in this case, this is not a security problem.
881 Several problems with hcs_$fs_move_file and hcs_$fs_move_seg. They return an error code if the caller has rw access to both the source and destination segments, but null access to the directory in which they are contained. The audit messages show various GRANTED and DENIED fs_obj_prop_read's. The reason is that the inner ring module attempts to get the status on the destination to find out its current length. Unfortunately it uses an entry in status_ which returns more information (which requires S on the parent). Since the entries are considered obsolete, it's not worth fixing this silly restriction. Another, more serious problem with hcs_$fs_move_file is that if the user does not have RW access to the destination, error_table_$no_move is returned, but no DENIED is logged. It audits GRANTED read of fs_obj prop, and GRANTED initiation of FS_obj. This was in a case where the user's authorization was greater than the access class of the existing destination segment, so the process had R effective access to the segment and S effective access to the containing dir. This bug should be fixed, but it requires a new entry into dc_find.
880 Many filesystem operations consist of a name lookup followed by an access check.
The way dc_find implements these, an operation which requires more than S access to a directory can fail (with error_table_$namedup or error_table_$seg_not_found) and generate no audit message, even though the caller has insufficient access to perform the operation. This occurs when the eventual failure of the operation can be determined from the name lookup.
879 hcs_$tty_get_name returns a channel name for a channel belonging to a process other than the caller.
877 None of the entries in the dm_hcs_ gate do any auditing.
876 Several file system attribute setting operations generate audit messages which say GRANTED even though the operation is later denied. This happens when M access is required to the parent and the process must be in the write bracket of the entry. Worse, no DENIED audit message is ever generated. The entries in question are: set_$(copysw volume_dump_switches safety_sw_ptr safety_sw_path synchronized_sw max_length_ptr max_length_path entry_bound_ptr entry_bound_path). With the fixing of the bug described in entry 23, the entries set$(damaged_sw_path damaged_sw_ptr dnzp_sw_path dnzp_ptr) must be added to the list.
875 Upgraded directories created under dir privilege are left in a process's address space after dir privilege is turned off. The suspected cause is that the pathname associative memory is not being flushed when dir privilege is turned off. This poses no security problems since only a person with privileges could have gotten into this position.
874 log_read_$position_time will not find any messages later than the latest message in the log at the time that the log was opened. log_read_$position_sequence has the equivalent problem.
872 There is an ambiguity in the definition of "security auditing" that is particularly apparent in the case of append. The ambiguity is this: some system operations make both security-related and non-security-related checks. Either check can fail.
If the security check passes, but the non-security check fails, it is unclear what the "correct" security audit message is: Grant, or Deny? The ideal implementation would probably be to indicate the exact situation in the audit message: that access would have been granted, but was not. The current implementation of append (and others) is to audit the access grant, but later abort the operation if the non-security check fails. This is particularly confusing in the case where the requested multi-class max authorization is above the process authorization or in the case that the requested authorization is below the containing directory access class. This is considered to be a non-security-related failure (no attempt was made to access information or destroy it) but the error code, ai_restricted, appears security-related. Nonetheless, the audit is a GRANT. This behavior should be documented in MDD004 and in the MDD on Ring 0 Auditing and Logging.
863 phx19695 If Data Management has not yet been used during a bootload, and a fault while in ring-0 causes verify_lock to be invoked, a ring-0 loop will result because verify_lock attempts to reference dm_journal_seg without first checking the switch sst$dm_enabled to determine if data management has been enabled.
861 phx19582 The entry dc_find$dir_move_quota performs a superfluous and incorrect AIM check. It is superfluous because the KST access modes will ensure there is no writedown path, and it is incorrect because the call to aim_check_ attempts to compare the access class in the directory header with the access class in the entry for the directory -- both of these should always be equal. The check may be safely removed.
858 phx19491 The alarm_clock_meters command is missing its addname, "acm". The documentation claims the addname exists.
856 phx19472 ioi_page_table$ptx_to_ptp may return an invalid pointer if the supplied ptx is invalid.
The verify_ptx internal subroutine causes a non-local return (via procedure quit) if the ptx is invalid; this will result in a return to the caller of ioi_page_table$ptx_to_ptp with an invalid return pointer.
852 phx19433 The check_vtoce dir salvager and the volume retriever can both produce segments whose security out of service switch is set on. reset_soos, however, refuses to work on non-directory segments.
851 phx19285 sys_trouble.alm lacks message documentation for "Fault while in masked environment".
850 phx16984 Nothing in MDC will replace missing add-names in >lv. This can cause various inconsistencies.
849 phx17979 Disk MPC's get confused when individual drives generate many, many errors, and begin to report errors for other drives. This is reported here to cover the TR and to record it for future reference.
841 phx19270 Because page control will not decrement a quota through zero, this can invalidate the assumptions made by fix_quota_used with respect to the constancy of the quota error during operation.
839 phx19254 initiate_ does not distinguish calls from phcs_$initiate's gate target (ring0_init_$initiate) from calls to hcs_$initiate. For the former, attempts to initiate a directory should return error_table_$moderr if the user does not have proper ACL or AIM access to the directory. For the latter, it should return the "traditional" error_table_$dirseg, since directories can never be initiated (via hcs_$initiate) from an outer ring. Fixing this may require a change to dc_find$obj_initiate and $obj_initiate_raw since these entrypoints currently map error_table_$moderr into error_table_$dirseg. And the fix may require separating the entrypoint in initiate_ used by ring 0 modules (e.g., ring0_init_) from that used by hcs_$initiate.
836 phx19180 vtoc buffer allocation and usage can too easily crash the system from lack of buffers.
A more graceful way to warn about pending doom appears in the TR, along with a suggestion for avoiding the problem at ast flush time.
835 phx15923 hc_ipc$send_wakeup should protest if a non-null info pointer is supplied for a fast channel.
833 phx19071 quota uses error_table_$invalid_qmax for any error. It should be more informative.
832 phx19073 You can set maxe as high as max_maxe. Unfortunately, this is too high (does not count max stopped stack_0's) and therefore crashes the system when the system runs out of stack_0's.
831 phx19074 The two calls to range in hc_tune for setting mine are out of order. As such, attempts to set mine above maxe produce the wrong error message.
829 phx18779 add_bit_offset_ (and the corresponding addbitno pl1 builtin) do not properly handle negative bit offsets. Similarly, add_char_offset_ (and the corresponding addcharno pl1 builtin function) do not properly handle negative character offsets. The failure lies in the abd and a9bd instructions, which assume that only positive offsets will be used. These instructions assume that negative offsets will be handled by negating the offset and using the sbd or s9bd instruction to subtract from the bit or character displacement. The proper solution is to detect negative offsets, negate the offsets and use the sbd or s9bd instruction.
828 phx15340 terminate_proc should not truncate the ring 0 stack; it should leave it around for analysis. terminate_proc needs cleanup in general.
827 phx18873 Inner rings should not be allowed to set search rules or working dirs.
825 phx15219 Attempts to type start after a call to sub_err_ with the can't restart option cause an illegal return.
822 phx18837 make_msf_ copies the IACL from a dir onto the components of an MSF it creates. If the IACL does not give the specified user w access to these components, then copy/move will fail to be able to copy/move the MSF into the directory.
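Note on 829: the proposed solution (detect a negative offset, negate it, and subtract) can be sketched as follows. This is a model only, not the add_bit_offset_ code; a pointer is represented as a hypothetical (word, bit) pair with 36-bit words, and the two helpers stand in for the add (abd) and subtract (sbd) paths.

```python
WORD_BITS = 36  # bits per word

def _add_bits(word, bit, n):
    """Stand-in for the add path: n is assumed non-negative."""
    total = bit + n
    return word + total // WORD_BITS, total % WORD_BITS

def _sub_bits(word, bit, n):
    """Stand-in for the subtract path: n is assumed non-negative."""
    total = word * WORD_BITS + bit - n
    if total < 0:
        raise ValueError("offset moves before the start of the segment")
    return divmod(total, WORD_BITS)

def add_bit_offset(word, bit, offset):
    """Move a (word, bit) position by a signed bit offset."""
    if offset < 0:
        # Detect the negative offset, negate it, and subtract instead.
        return _sub_bits(word, bit, -offset)
    return _add_bits(word, bit, offset)
```

The character case would be the same dispatch with 9-bit units.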
815 phx18756 Having any AIM privilege on makes RCP think that you are a system_high process.
810 phx18607 If an SCU's size (as correctly described by its config card) is less than the port switches on the CPUs (i.e., it is 3M whereas the CPU says 4M, as it must), running ISOLTS (memory tests) in this case can crash the system with a store fault.
809 phx18517 The system has been known to crash in ioi_masked while processing a channel time-out.
806 phx18566 Typos in fim, et al, misinterpret the hregs bits associated with parity faults.
805 phx18565 The history registers for a parity fault that crashes the system do not appear in the pds. See the TR for details.
798 phx18352 sct_manager_$get is supposed to return a null pointer for non-set sct values. However, it checks for the sct entry being null after converting the null value (a zero packed pointer) into an unpacked pointer. This unpacked pointer is not all zero, so sct_manager_'s zero check fails. The fix is to check for zero before the pointer assignment.
783 phx09958 The default potential attributes for a resource in the RTDT can be mistreated when the RTDT is installed. The symptoms are that the attributes are shifted in the attributes word, causing all attempts to access the resource to fail.
775 phx17026 The limit and process_limit fields in the rtdt are ignored. (Actually, only the values for the fields in the default_rtdt on the MST are used.)
765 phx18243 The ring zero derail fault mechanism needs improvement. In particular, it should save as much information as other faults (fault_time especially) so that azm displays this fault in proper order with the others.
760 phx18185 Calling hcs_$grow_lot makes your lot of max size. Calling it again causes a FPE even if you have more room left in the lot.
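Note on 798: the ordering problem can be shown with a small model. The pointer layout, the TAG value, and the function names here are assumptions made for illustration; the only property taken from the entry is that a zero packed pointer converts to an unpacked pointer that is not all zero.

```python
TAG = 0o43  # illustrative non-zero tag; only its being non-zero matters here

def unpack(packed):
    """Convert a packed pointer (int) to a hypothetical unpacked form."""
    segno = packed >> 18
    offset = packed & 0o777777
    return (segno, TAG, offset)

def get_buggy(table, index):
    """Checks for zero after conversion, so the check can never succeed."""
    ptr = unpack(table[index])
    if ptr == (0, 0, 0):
        return None
    return ptr

def get_fixed(table, index):
    """Checks the packed value for zero before converting it."""
    if table[index] == 0:
        return None
    return unpack(table[index])
```

In this model get_buggy hands back a non-null value for an unset entry, while get_fixed returns None.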
754 phx17875 It has been observed, on logical volumes consisting of a single physical volume, that when the volume becomes full (and a user encounters the logical volume full error), deleting segments from the volume does not seem to reset the logical volume full condition for some number of minutes afterwards. This is not well understood.
751 phx17482 msf_manager_ does not understand multiclass msfs. For such an msf, msf_manager_ will add new components at the aim level of the dir that is the msf, rather than at the aim class of the components of the msf.
749 phx17981 IPS signals are not correctly masked in mrd_util_. As a result, it is possible to hit QUIT or have other conditions which can cause operations to fail, killing off the daemon in question. A fix is known.
744 phx17943 phx18054 status_ won't allow the allocated return structures to be in a different segment than the segment supplied as the return area (that is, it doesn't allocate into extensible areas).
742 phx17838 The volume salvager should report page and vtoce bit map inconsistencies.
735 phx17815 set_mdir_quota correctly sets the quota in the vtoce, but incorrectly sets the value in the aste, when inferior dirs have terminal quotas.
733 phx17690 If an error is indicated when an i/o completion of a volmap page is posted, volmap_page does not strip the state away from the page number, producing a bogus error message.
732 phx15640 Hardcore sets damage switches for directories and there is no way for users to turn them off. The Salvager should be changed to salvage directories that have the damage switch set and turn it off once salvaging is complete.
731 phx17688 Hardcore should validate pds$stacks (validation_level) before using it.
730 phx17662 A second call to delmain to delete a frame previously deleted will cause the calling process to hang on a bogus page wait event.
723 phx17551 More errors in hdx (not copying args, not terminating segments correctly).
722 phx17553 More errors in mdx (not copying parameters, not terminating disk_table_).
721 phx17615 init_disk_pack_ (actually, calling countervalidate_label_) produces an error message not documented within init_disk_pack_.
720 phx17614 init_disk_pack_ references an unreferenced variable when looking for the undocumented copy option.
718 phx17552 mdc_status_ does not properly copy all of its args. For that matter, it doesn't even compile.
717 phx17597 io_syserr_msg is declared to be three words long, but is overlaid with a structure which is five words long.
712 phx17186 You will die if another process deletes your working dir.
711 phx16992 A page error uses mc.errcode to encode the relevant information. Unfortunately, system_startup_ cannot decipher this and crashes the system (which would probably have happened anyway).
708 phx17416 hcs_$status_mins does not work on the root.
707 phx17413 act_proc uses the wrong value when determining maximum possible access class.
705 phx17394 A timeout from resetting a channel from a timeout will cause a fault while in masked environment, crashing the system.
704 phx17374 hcs_$quota_read returns "Some directory in path..." instead of "Entry not found" when the target does not exist but its parent does.
701 phx17302 hcs_$fs_get_brackets will not return the ring brackets of an inner ring object.
700 phx17259 attach_lv references the non-existent error_table_$notacted.
699 phx17257 scavenge_vol refers to the non-existent error_table_$no_arg.
698 phx17219 disk_rebuild examines too many bits in a vtoce file map when examining it to see if it is free, when performing volmap compression. This sometimes causes the compression to fail.
696 phx17141 The aste/vtoce.dtm fields are examined to set the dbm_map bits used by the volume dumper when dumping objects. For directories, these fields lead to an incorrect interpretation as to whether a directory has been modified, leading to extraneous directory dumping.
694 phx17132 The volume retriever does not collect enough AIM related information. To process a retrieval request, it needs to store, in ring 1, the user auth, and max auth. Now it only stores the auth, which is automatically stored by message_segment_. The volume retriever needs its own gate to ring 1 which will store the ring, auth, and max auth securely in the message.
693 phx17132 append$retv_append cannot possibly append a multi-class object, since it only has two of the three quantities: user auth, user max auth, desired object max acc. The structure passed to it needs to be changed.
692 phx17141 The volume dumper examines the wrong field when determining if it should dump a directory, thus dumping unneeded directories.
691 phx16992 A page_fault_error occurring at the Initializer's ring-1 command level causes a crash, but the attempt to produce the crash message itself produces a crash because the ring-1 condition handler cannot interpret the mc.errcode value.
690 phx15255 The SCU can return the same value for the clock twice. Some software uniquification is needed.
689 phx14716 When the directory salvager determines that the sons LVID in a directory header is different from the value in the branch for the directory, it mindlessly copies the value from the branch into the directory header. This has the effect that if the value is wrong in the branch, it will be wrong everywhere afterwards. At least, the salvager should check the value to see whether it's zero (and obviously invalid) before propagating it. This is a genuine problem, and not already on the hardcore error list. The particular problem that provoked this report has been fixed elsewhere, and is no longer relevant, but the general problem remains.
685 phx17055 Various modules, in particular sys_trouble, are missing some error message documentation.
684 phx15585 A situation (not understood) exists in which the records used exceeds the current length, preventing further access to the segment.
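Note on 690: one possible form of software uniquification is to remember the last value handed out and bump any reading that does not exceed it. This is a sketch under that assumption, not a description of any existing fix; the class name and the read_clock callback are hypothetical, and a real version shared between processors would need locking.

```python
class UniqueClock:
    """Wrap a clock reader so that no two returned values are equal."""

    def __init__(self, read_clock):
        self._read = read_clock   # callable returning an integer clock value
        self._last = None

    def next(self):
        now = self._read()
        if self._last is not None and now <= self._last:
            # The hardware repeated (or fell behind) the last value handed
            # out, so advance by one tick to keep values strictly increasing.
            now = self._last + 1
        self._last = now
        return now
```

Feeding it the readings 100, 100, 100, 105 yields 100, 101, 102, 105.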
682 phx15752 core flushing (for pleasure from the as) should not flush pdir segs. Also, the scheduling of the core flush is not at precise times.
681 phx15833 reclassify_seg should avoid the work if what it is reclassifying is already at the level it needs to be.
679 phx15852 Both illegal_procedure.pl1 and the documentation suggest that illegal op_code, illegal addr/modifier and other illegal procedure faults should be audited. This third group, however, is not.
678 phx15172 syserr_real should check its error code parameter for non-zero-ness when producing the message text.
676 phx14420 The ascii_to_ebcdic_ and ebcdic_to_ascii_ tables and routines should handle the 256 character ebcdic set and map it onto some extended ascii set.
664 phx17116 The vtoce_checksum implementation is hamstrung by two problems:
   1) the "checksum_valid" flag is quite likely to be turned off by damage, causing the checksum to be recalculated for invalid data.
   2) part 3 has no checksum, and disk damage quite frequently fries it.
663 phx17010 Hot buffers can fill up vtoc_buffer_seg, crashing the system. The retry fix for 662 reduces the problem, but not all of it, since an authentically broken disk can fill up the segment.
657 phx17050 No gullibility checking, checksumming, or other protection against damage exists for
   Record 6 -- the vtoc map
   Record 0 -- the label (except for "Multics Storage System Volume")
Damage to these areas can cause widespread disaster, due to confusion as to the location of the paging region! We need:
   1) Sentinels on all records of the label
   2) Checksums on all records of the label
   3) A (or multiple) safe-store records that store only permanent information for recovery from damage to one of the records (like the vtoc map) that contain both permanent and dynamic information.
656 phx17052 Detaching a device with I/O in progress can cause a fault while in wired environment due to an uninitialized pointer in the reset_device entry of the program ioi_masked.
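Note on 657: the first two requests (a sentinel and a checksum on each record) can be sketched as below. The record layout, the reuse of the label string as a per-record sentinel, and the choice of CRC-32 are all assumptions made for illustration; the entry does not specify any of them.

```python
import zlib

# Assumed sentinel: the entry mentions this string only as the existing
# check on the label, not as a per-record sentinel.
SENTINEL = b"Multics Storage System Volume"

def seal(payload):
    """Return sentinel + payload + 4-byte CRC-32 over both."""
    body = SENTINEL + payload
    return body + zlib.crc32(body).to_bytes(4, "big")

def verify(record):
    """Return the payload if sentinel and checksum hold, else None."""
    if len(record) < len(SENTINEL) + 4:
        return None
    body, tail = record[:-4], record[-4:]
    if not body.startswith(SENTINEL):
        return None
    if zlib.crc32(body).to_bytes(4, "big") != tail:
        return None
    return body[len(SENTINEL):]
```

A record that fails verify would be the trigger for falling back to a safe-store copy, which this sketch does not model.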
654 phx16046 No re-verification of the label of an offline disk is made when it comes back online. As a result, mistakes with patch plugs are extremely dangerous. disk_control should not declare a disk back to life unless the label checks out in some simple fashion.
652 phx16979 The ring 0 portion of the three-ring circus (volume management) is not protected by a cleanup handler, and can leave pvtes in an inconsistent state.
647 phx16929 See the TR for a complete exposition of this. When all 4K aste's have a page in memory, get_aste behaves very badly (very slowly).
643 phx16592 master directory acs checking should use raw access. Otherwise, it is impossible to get e access to work right in both ring 4 and ring 1.
642 phx16743 disk_pack.incl.pl1 has the wrong include file listed as the home of the dumper bit map.
634 phx16905 boundsfault.pl1 does not recognize the case where the bound is less than the msl but still within the page table size. This breaks setting the max length within the page table but larger for active segments, since the 10.2 performance optimization for set_max_length took out the setfaults in this case.
628 phx14990 Volume backup to an IO disk does not work with the current implementation of rdisk_ stream IO. The current version has no buffering ability and no sense of logical End of Space (a la EOT on tape) and physical End of Space on the pack, which is needed to allow flushing of IO when this (EOT) is detected.
626 phx16692 append$retv_append has a bug wherein it misuses the "max_authorization" field of the structure. It should just consider that the max to put in the multi-class segment max. There is a companion bug in the retriever (volume) that fills in the structure wrong to begin with. The field has to be filled in with the authorization out of the message segment for the retrieval request.
625 phx16548 When you try to terminate a segment with more than about 250 ref names, the call aborts with the message "The RNT is in an inconsistent state."
623 phx02779 Because of a problem with accepting a zero buffer size, it has been found that a returned hardware status that contains channel or central fault status is being overlooked and assumed to be good.
614 phx16489 ring0_get_ misdeclares code parameters as fixed bin.
613 phx16351 set_bc should not let you set a negative bit count (set or change).
611 phx16506 append only checks mountedp when segments are appended, not dirs or links. While this may be convenient, it is inconsistent. The marginal utility of creating dirs and links on unmounted LV's is outweighed by:
   1) the inconsistency: some operations work, some don't.
   2) for private LV's: the desire to have NOTHING happen to the LV when unmounted. Even if your access to attach a private logical volume has been taken away, you can still append links and dirs.
   3) If we ever move dirs onto the LV that they describe, this will clearly have to have the restriction.
   4) LV aim restrictions cannot be enforced if the LV is not mounted.
605 phx16501 check_mdcs does not salvage quota inconsistencies between master directories and their registration in the mdcs. Only register_mdir does this. This requires the administrator to run register_mdir over each mdir on a logical volume to be sure that everything is consistent. Also, check_mdcs does not validate that a master directory actually has the correct sons logical volume.
604 phx16500 Master directory control allows up to fixed bin (35) worth of quota for an entire logical volume, but many fields are only declared fixed bin. This creates periodic disasters in the control segments.
603 phx16499 Master directory control was not updated when quota was increased to 18 bits. This can cause a wide variety of misbehaviors.
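Note on 603 and 604: the range mismatch is easy to see numerically. The sketch below assumes that a bare fixed bin declaration takes the default precision of 17 and that an 18-bit quota is unsigned; the function name is hypothetical.

```python
def fits_fixed_bin(value, precision):
    """True if value is representable with the given binary precision."""
    return abs(value) < 2 ** precision

QUOTA_18_BIT_MAX = 2 ** 18 - 1   # largest 18-bit quota, assuming unsigned
```

Under these assumptions the largest 18-bit quota does not fit a default-precision field but does fit fixed bin (35), so any field left at the default silently limits the quota it can hold.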
593 phx16015 The file system should log or meter invalid quota changes (attempts to decrement used below 0).
592 phx16093 quota_received is not supported very nicely. The TR complains that it is not reported by any existing hcs_ entry. There are other problems, such as failure of salvagers to correct it, a way to forcibly set it.
587 phx15298 phx16005 peruse_crossref bugs: does not detect LV not mounted; does not initialize brief_sw; does not print satisfactory message when module is not referenced.
583 phx15258 phx15275 Invalid iacl terms cause append to fail. asd_ allows acl terms that are invalid, like R..*, to be added to an initial acl. append fails trying to copy then the assumption that the entire RLV will be mounted, else you will be doing 1pack recovery (a risky assumption). This is a limitation rather than a suggestion since we really ought to have such a mechanism.
581 phx15044 fim should not save history registers that have just been freshly cleared by fim_util.
572 phx14942 act_proc$create fails to return the empty APT entry in almost all error cases.
569 phx14225 Incorrect warning message from scas_init.
568 phx14877 It is impossible to run hc_pf_meters without phcs_ access; metering gate access should be sufficient.
566 phx14824 sweep_pv (segment_mover actually) cannot move rpv-only segments. This makes it difficult or impossible to compress the RPV VTOC.
565 phx14875 When the operator does an x deny (using RCPRM at site) the process still thinks it has the drive.
561 phx14705 The accept_fs_disk check for partitions overlapping gets confused by HIGH hardcore partitions.
557 phx14657 ebcdic_to_ascii_ and ascii_to_ebcdic_ should be in the same bound segment, and not bound in with anything that uses them. This will allow people to replace them when reading tapes with nonstandard (or non-Multics) EBCDIC encoding.
529 phx10098 save_dir_info fails if any of the entries in the dir are connection failures.
527 phx08068 Strange things are done with the IC for certain faults in the FIM. Perhaps they should be improved. In particular, the IC reported in the machine conditions for dfmp taking underflows is unexpected.
523 phx05319 ioa_ ^( and ^) execute at least once, instead of zero times, when fed zero things to iterate over.
520 phx14440 page_error displays an erroneous disk address in the error message for an I/O error on the volume map. The fix is to ANA -1,dl before saving the Areg, which contains the disk address in the lower.
518 phx14405 print_configuration_deck does not display negative numbers correctly. It prints them as very large positive numbers. This is not currently a problem, since the BOS command parser does not understand negative numbers completely (and marks them as octal in the config deck). It will be a problem when BOS is fixed or superseded.
516 phx14381 copy_out will fail if requested to copy a segment whose length is larger than 255K. In this case, it should attempt to set the max length to 256K via phcs_ (or hcs_$something, when this operation becomes non-privileged).
514 phx14387 rebuild_disk for the RPV may not copy the root directory correctly. Specifically, modified pages in memory will not be copied; the earlier instances on disk will be copied instead. This may cause a crash during the subsequent initialization until the root is salvaged (due to bad_dir_). The problem is that disk_rebuild (the ring-0 module which does the rebuild) does not call pc$cleanup for entry-held segments (indeed, it should not do so in general). The root directory is entry-held, and so it goes.
513 phx14276 If a trouble fault occurs at a point where it is not caught by fim_util$check_fault, the history registers from the trouble fault will be over-written by those from the subsequent sys_trouble connect. This destroys potentially useful diagnostic data.
501 phx14181 There is a window in ring-0 ITT message processing.
If a fault occurs in that window, ITT entries are lost for the bootload. Further, they are lost in a way which disables the logic in pxss which prevents ITT overflow. The likely result is a crash in pxss when the system runs out of ITT entries.
498 phx05686 The time-record product maintained for a directory with a terminal quota account is only an approximation to an ideal space-time integral of disk usage. This approximation is reasonably accurate for accounts which have stable usage, but it has several anomalies for more volatile accounts. The problem is that the cumulative time-record product is updated only when the directory VTOCE is updated (it is incremented by the product of the instantaneous quota used and the delta-time since the last update). If, for example, a large amount of space was used and returned in the interval between updates, there is no accounting for that space. A visible anomaly results from a further approximation when get_quota is invoked. At this time, the time-record product is reported as the value it would have if the VTOCE were being updated at that time (although it is not). For the reasons cited, this can cause time-record product to decrease with time. The only reasonable solution is to maintain time-record product continuously. This would not be expensive computationally, but it would require significantly more wired storage per active segment.
497 phx14069 Most store faults should be recorded into the Syserr Log, as they are usually indicative of faulty hardware [sic.]. hardware_fault should filter out store faults in BAR mode, however, as they are caused by program error.
490 phx13931 Values for select_switch parameters to hcs_$star_XXX entries in star_structures.incl.pl1 are declared as fixed bin (2) (e.g., star_LINKS_ONLY). They should be fixed bin (3).
487 phx13896 It should be possible to change the size of the AST pools while the system is running (well, it should be possible to increase them, anyway).
If the SST is expanded to multiple segments, this could be done with moderately more work. 486 phx13897 phx14320 A volume which is inoperative cannot be demounted. There should be a way to do this, such as abandoning everything associated with the volume which is in memory (VTOC buffers, ASTEs, pages, etc.) and marking it as demounted. Also, disk I/O error processing should be smarter about detecting inoperative devices, particularly devices which appear operative but cannot do I/O without errors. Note that this is the one case where it is safe to abandon VTOCE buffers, since nobody will do an await_vtoce afterwards and lose (if demounting does things in the proper order). If there are I/O errors and the volume remains mounted, it is never safe to abandon VTOCE buffers. 468 phx13716 The various tables used in disk volume management (ring-0, ring-1, and ring-4) can become inconsistent. Several instances of this problem have been corrected. One which has not shows itself after an "alv" followed by an "av -all". The ring-4 copy of the disk table is not updated after the second command, preventing pdir_volume_manager_ from knowing that the logical volume is mounted (and hence eligible for pdirs). 460 phx13544 master directory control can become confused if a master directory has a subordinate directory with quota. A set_mdir_quota {plus or minus} X will cause the page control quota of the master directory to be the same as the master directory quota. 448 phx12864 KST overflow has strange effects, not readily traceable to this problem. KST overflow should probably be signalled, rather than indicated by an error code. 436 phx05497 When signaller.alm pushes a stack frame, it first extends the previous frame by 48 words to allow for interrupted push operations. If a non-local goto is used to transfer control back into that extended stack frame, it never gets shrunk. Repeated occurrences of this will eventually use up the stack.
The fix should be to change signaller.alm to put the new frame 48 words up the stack without doing an extension of the existing frame. This requires hand-coding the push, but that's not too hard. The alternative is to try to use a cleanup handler to shrink it, which would be awfully hard since the cleanup handler would be associated with the frame above, which would still be on the stack. It's hard to shrink your caller's stack frame. 429 phx12689 When cpt is invoked with the -lg control argument, it does not print full pathnames in the summary report. It does, however, print full pathnames in the detailed trace file if -trace is also specified. 410 phx12355 Attempted logins to ring-6 or ring-7 fail, since makestack requires non-null effective access (at the validation level of the initial ring) to signal_, unwinder_, operator_pointers_, and pl1_operators_. These have ring brackets of 0,5,5. The general solution is not clear. Rings 6 and 7 are supposed to be available for totally encapsulated subsystems, with only facilities provided explicitly by the subsystem available. The difficulty is to balance this against the need to provide a rudimentary environment to initialize the subsystem. 409 phx12251 A more compact method of logging I/O errors is needed. Currently, each I/O error is logged into the syserr log. This can flood the log with largely meaningless I/O error messages (for example, when reading a tape of marginal quality). An approach is to write summary records, periodically (based on time or on error thresholds), and optionally record detailed messages. 407 phx12250 Deletion of a segment with wired pages causes the segment not to be deleted, left active, with PTWs for the wired pages having nulled addresses and wired bits on. Under some circumstances, this can cause a system crash. This situation can be caused by a user wiring pages (through hphcs_). This can also happen if a process terminates with an active ioi buffer.
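The summary-record approach suggested in entry 409 might look roughly like the following. This is only a sketch in Python, not Multics code: the class name, the thresholds, and the device names are all hypothetical, and the optional detailed messages mentioned in the entry are left out.

```python
from collections import Counter


class IOErrorSummarizer:
    """Hypothetical accumulator: one summary record per window instead of
    one log message per I/O error."""

    def __init__(self, count_threshold=100, interval=3600.0):
        self.count_threshold = count_threshold  # errors per summary (assumed)
        self.interval = interval                # seconds per summary (assumed)
        self._reset()

    def _reset(self):
        # Start a fresh window with no errors recorded.
        self.counts = Counter()
        self.total = 0
        self.window_start = None

    def record(self, device, now):
        """Count one error; return a summary dict when the error threshold
        or the time interval is reached, otherwise None."""
        if self.window_start is None:
            self.window_start = now
        self.counts[device] += 1
        self.total += 1
        if (self.total >= self.count_threshold
                or now - self.window_start >= self.interval):
            summary = {"start": self.window_start, "end": now,
                       "counts": dict(self.counts)}
            self._reset()
            return summary
        return None
```

A caller would write the returned summary to the log in place of the individual messages. Note that in this sketch the time threshold is only checked when an error arrives, so a quiet device never forces a summary.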
399 phx12134 append$retv should validate the entry supplied more carefully. An instance of the problem is that the cross-retrieval of an object with multiple names will contain a non-null forward name thread in the primary name field. 393 phx12070 phx10495 Segments should be created with access of r to *.SysDaemon, rather than rw. 383 There should be a system-maintained database which keeps track of recent crash history, and types of shutdowns. Possibly it could be as simple as logging, at bootload, the time and type of the last shutdown. The syserr log is probably robust enough, and can easily be scanned to find the information. 382 phx04847 fix_quota_used should also adjust TRP totals in accordance with the adjustment being applied to quota used and the length of time since the last ESD failure crash. This should be automatically driven from the last crash info, and be manually overridable if necessary. 378 phx12013 setfaults should have a recovery strategy for page_fault_errors on a target dseg; probably it should kill the other process, rather than crashing the system with a crawlout with AST lock set. 376 phx12003 trace_mc should use a hardcore segment for the buffer, to avoid problems with recursive faults caused by flushing trailers or dseg ptw misses. 364 phx01612 The iocb structure in iocb.incl.pl1 contains an implicit word of padding between iocb.name and iocb.actual_iocb_ptr, which should be explicitly declared as pad. 362 phx11904 verify_lock should check all ring-0 locks which could be held on call-side. It should not allow a process to crawlout with any ring-0 lock held. For some locks detected by verify_lock, the system should be crashed immediately; for others (vtoc buffer lock), some recovery is possible. 360 phx11870 On a multi-process salvage, one of the processes may take an unexpected error (page_fault_error, for example). This will cause the process to go to a new command level and wait for terminal input.
Eventually, all other processes will hang (blocked) waiting for this process to respond to the dispatch wakeup. The solution is probably for do_subtree to establish an any_other handler and do something appropriate on unexpected signals. 357 phx11839 The supervisor should take more pains to ensure that a setfaults operation is performed on segments dynamically marked as damaged, either when the damage is detected, or soon thereafter. 356 phx10004 The primitive for setting the damaged switch should perform a setfaults operation, since it operates in a better environment than page control does when doing so, and it is desirable to provide damage notification as quickly as possible to other processes. 352 phx11831 If a directory hash table overflows while the directory is being rebuilt by salv_dir_checker_, some names on the entry which caused the overflow may not be hashed in correctly. This is because the special-case code to keep hash from faulting on the partially rebuilt directory does not ensure that all the names already processed are rehashed. 306 phx11600 The entry structure (dir_entry.incl.pl1) is misdeclared; the structure takes only 37 words, despite the comment claiming that it takes 38. This seems to be benign, but should be rectified. 305 phx11593 Although there are hcs_ entries to set it, the DNZP switch is not reported by any status_ entrypoints. 303 phx11555 phx06112 phx04846 The quota salvager should correct inconsistencies in quota allocated and quota received fields, as well as quota used. There is presently no way to repair these fields other than BOS PATCH. 300 phx11553 Damage to >lv and >disk_table_ should be detected and acted upon automatically at bootload, rather than requiring use of BOOT NOLV and NODT. 272 phx11009 traffic_control_queue should never be reporting a negative value for tssc. It does so because the snap of the APTEs consumes non-negligible time (due to paging) with no locks held. 
A fix is to read the current time immediately after copying out the APTEs. 260 phx10996 A volume administrator can adjust the quota on a master directory of which he is not the owner, if he has sma access. This use charges the quota account of the Initializer, which is clearly bogus. 239 phx10114 Although the salvager can set the security-out-of-service bit for segment branches as well as directories, the privileged gate entry to reset the switch works only on directories. It should work on segments as well. 229 phx09675 There should be a mechanism for establishing hardcore crash handlers which would be executed by sys_trouble before crashing the system, so that (for instance) the IMPDIM could shut itself down, by establishing a handler to send a going-down connect to the IMP. 223 phx09383 Attempting to add a memory which is already online causes an OOB fault in reconfigure (line 193) because it fumbles one of the error codes. 222 phx09341 The error message for incorrect access should be specific about the type of access which the process lacks: ACL, ring bracket, or AIM. Presently, some primitives distinguish between ring bracket and ACL violations, and others do not. AIM violations would have to be detected specially; there is no error code for this today. See also entries 78 and 157. 219 phx09240 phx11009 system_performance_graph cannot properly represent more than 100 logged-in users. It should use a different scale, or wrap around. 217 phx09162 When walking the AST to demount a volume, demount_pv gives up upon encountering very minor anomalies, causing ESD to fail completely when it should have almost succeeded. It needs a better way of walking the AST, to eliminate the "demount_pv: AST out of sync" message. The AST pools should be described by pointers and counts kept in the SST, rather than just by count. 215 phx09082 phx12302 Checking of CPUs which are being added should be both more complete and more flexible.
Proper settings for both cache and associative memories should be checked. It should also be possible for a site to over-ride these checks (by arguments to add_cpu). 214 phx09047 There should be a DRL instruction at the beginning of page_fault, so that history registers would be saved if a wild transfer occurred. 213 phx08965 There should be more state recorded in the PVT when a volume cannot be accessed, such as the real fsdisk error code rather than just pvte.device_inoperative. This lack causes add_vol to be unable to distinguish between "drive in protect" and "drive offline". 212 phx08963 The check_trailers procedure can only be enabled by recompilation. It should be possible to simply patch something. 211 phx10123 Messages from hardcore (disk_control, get_aste, hc_dmpr_primitives, etc.) should include the physical volume name where appropriate. This must be preceded by putting the name into ring zero. (see entry 210) 210 phx11769 phx08952 The ring one volume management tables should be direct copies of the ring zero PVT and LVT, which should be changed to include all the information (names and special flags) now only in the disk_table. This is the only real way to fix the problems due to inconsistencies between these databases. 203 phx11765 hcs_$fs_get_mode always returns the 4 bit set in directory modes. It should leave this bit off, like hcs_$get_user_effmode. 199 phx11761 The ioa controls ^e and ^f have difficulty formatting integers. For instance, ^.2f gives completely inappropriate results when given 1234567, though it does fine with 1234567.12. 193 phx08451 phx11705 There should be special entries to status_ for the primary name, the link path, and the list of names. The existing status_ interfaces are seriously defective here (see entry 192). See phx11705 for interface details. 189 phx08286 There should be a way to turn on the audit flag in the branch. A primitive mechanism, but better than nothing.
Now that the audit flag does nothing, this will become a limitation until a proper per branch audit mechanism is created. 188 phx08284 The privileged quota-setting primitives should log a message when used, to aid in keeping track of the operations. 187 phx08076 When a process running ISOLTS is terminated abnormally, the CPU and memory it was using for the test are not released. This, despite the code in deact_proc which appears to do just that. 186 phx08263 phx03859 phx06694 There should be a way to interrupt the Initializer process, "no matter what". Perhaps a tiny debugging environment entered on receipt of an execute fault. 184 phx10589 The MPC error counters should be read out and stored in the syserr log when a pack is mounted or dismounted; this would make it much easier to keep track of per-drive error histories. 183 phx07983 phx11700 The system should perform probabilistic verification of disk writes, checking some small fraction of them for success. The fraction would be increased if errors occurred, decreased as the drive was seen to operate, and be manually tunable, as well. 181 phx08237 There should be a way to change the time zone (CLOK card and sys_info correction constant) while the system is running. 179 phx07814 verify_lock will recurse, faulting, if it tries to unlock a directory which is no longer accessible due to seg_fault_error or page_fault_error problems. It should have condition handlers for this. 176 phx07711 The traffic_control_queue command should display the states of all the interesting APTE flags; pre_empt_pending, in particular. 170 phx06979 The system should further analyze the MOS EDAC error messages to the extent that it determines which pages in the SCU are affected by the error, so that the pages can be removed, either manually or automatically. This will also save syserr log space.
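The adaptive sampling proposed in entry 183 could be sketched as below. This is an illustration in Python only, not a description of any existing Multics mechanism. The class, the multiplicative raise and decay policy, and every constant are assumptions. The entry itself says only that the fraction rises on errors, falls as the drive is seen to operate, and is manually tunable.

```python
import random


class WriteVerifySampler:
    """Hypothetical per-drive state for probabilistic write verification."""

    def __init__(self, fraction=0.01, floor=0.001, ceiling=1.0,
                 raise_factor=4.0, decay=0.99, rng=None):
        self.floor = floor                # never sample less than this (assumed)
        self.ceiling = ceiling            # 1.0 means verify every write
        self.raise_factor = raise_factor  # multiplier after a failed check (assumed)
        self.decay = decay                # multiplier after a good check (assumed)
        self.rng = rng or random.Random()
        self.fraction = self._clamp(fraction)

    def _clamp(self, value):
        # Keep the sampling fraction within [floor, ceiling].
        return min(self.ceiling, max(self.floor, value))

    def should_verify(self):
        """Decide whether this particular write gets read back and checked."""
        return self.rng.random() < self.fraction

    def record_result(self, ok):
        """Raise the fraction after an error, lower it slowly after success."""
        factor = self.decay if ok else self.raise_factor
        self.fraction = self._clamp(self.fraction * factor)

    def set_fraction(self, fraction):
        """Manual tuning, for example from an operator command."""
        self.fraction = self._clamp(fraction)
```

Raising quickly and decaying slowly keeps a suspect drive under close watch for a while after its last error; a different policy would serve equally well as an illustration.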
167 phx06374 When a hardware fault occurs as a result of an Illegal Action from an SCU, software should unlock the SCU history registers on that SCU, to allow data from a fault which crashes the system (later) to be retained. Unfortunately, it is not possible for software to read these registers. 166 phx06326 The hp_delete command tries to set some AIM flags in the directory it is trying to delete. This will not work if the directory is connection-failed. Since initiate was changed to activate directories immediately, this problem is masked, but hp_delete shouldn't do this anyway. 164 phx04854 phx05954 The UID generator and pxss should check the difference between the last clock reading and the current one periodically, and crash the system if it is too large. This situation arises when a clock makes a sudden jump, and could otherwise seriously damage the file system. 163 phx04854 phx05954 Dates in VTOCEs and directories should be corrected by the volume and hierarchy salvagers. Dates in the future should be set to the current time, and dates from before NSS should be set to some early date. This situation can arise either from damage, or because the clock was incorrectly set. UIDs should also be checked for validity, and reset to new UIDs (from getuid) if they fall outside the range of acceptable times. 161 phx07238 The system should make some attempt to determine whether all the configured IOMs can access a memory module being added. This is probably difficult to do, since it would have to be done by experiment, which might prove disastrous if the IOM configuration panel were not set properly. 157 phx06101 When attempting to append an entry, if the append cannot be performed because of containing directory ring brackets, the error message should be Validation level not in ring bracket, rather than Incorrect access to directory containing entry.
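The checks described in entries 163 and 164 reduce to simple comparisons, sketched here in Python for illustration. The function names, the parameters, and the choice to treat a backward jump the same as a forward one are assumptions; the entries do not say what the NSS cutoff, the replacement "early date", or the acceptable clock difference should be, so all three are passed in by the caller.

```python
def correct_date(stamp, now, earliest_valid, early_default):
    """Entry 163 sketch: pull an implausible date back into range.

    All arguments are clock readings in the same (unspecified) unit.
    earliest_valid stands in for the start of NSS and early_default for
    the "early date" of the entry; both are placeholders.
    """
    if stamp > now:
        return now            # date in the future: use the current time
    if stamp < earliest_valid:
        return early_default  # date from before NSS: use some early date
    return stamp              # plausible date: leave it alone


def clock_jump_too_large(last_reading, current_reading, max_delta):
    """Entry 164 sketch: report a sudden clock jump.

    Assumption: a jump in either direction counts, hence the abs().
    """
    return abs(current_reading - last_reading) > max_delta
```

A salvager would apply correct_date to each date field it visits. What to do when clock_jump_too_large returns true (the entry proposes crashing the system) is left to the caller.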
155 phx06075 When a name on a branch is changed, it should be changed in place, so it remains in the same place in the list of names, rather than behave as if it had been deleted and added back. 145 phx03708 The attach_lv command should accept -a as well as -all. 142 phx03109 The FIM should distinguish (via different error codes for termination) between an out-of-bounds on the ring zero stack and one on an outer ring stack, to aid in identifying situations which cause this particular ring zero error condition. 139 phx07240 When there is bad parity in memory, the resulting error messages are very verbose. Especially at ESD time, they should simply be flushed. This requires more specific info about the messages in question to solve well. 137 phx08082 The reclassify_sys_seg primitive doesn't work when system_high equals system_low, because it requires that the segment end up with an access class greater than that of the containing directory. This is a limitation derived from the implementation of multi-class segments, which are required by various modules of directory control to really be multi-class. 135 phx07543 When a directory is deleted from another process, strange things happen when it is referenced. Most often, lock takes a fault trying to look at the UID. Perhaps it should have a handler for that condition. 130 phx05245 It is possible for a user's virtual CPU time to become very inaccurate as the result of a large number of faults, because of the adjustment which must be applied to compensate for fault processing time. There is no real way to fix this. 121 A crawlout may leave a directory initiated which really should be terminated, cluttering the KST. 119 Reference names for inner ring segments can be made available to outer ring programs; a violation of security. Not well understood. 118 copy_on_write makes the copy unencachable until the next setfaults restores access. Not well understood.
114 The messages in the syserr log describing page control errors are truncated when printed. This appears to be a problem in the printing routines, rather than in page_error or the log itself. 108 phx04071 phx04955 The cleanup handler in an absentee job is never executed if the absentee terminates by a call to cu_$cl. This mechanism should be considerably more robust. 102 phx03345 phx09268 The fim does not properly handle EIS decimal overflows and underflows, in that it does not respect the values to be reset, and also does not reset the IC properly. 95 phx03943 The machine conditions resulting from inability to add a processor should be saved somewhere for later analysis. Presently they are just discarded by init_processor. 80 phx03232 The write_limit is reset at each memory reconfiguration, resulting in the PARM WLIM value apparently being ignored if reconfigurations occur. Should fix it by having reconfiguration not reset it. 77 phx11596 The error code from hcs_$fs_move_xxx is not specific enough, partly due to the lack of a corresponding source/target switch. 73 Pathnames can be much longer than 168 characters (max is 16*32+1, 513). This causes problems for all the interfaces which use the standard char (168) declarations. Fortunately, find_ can handle it, but many user ring programs behave inconsistently. The solution is not easy. 69 phx03152 The initializer can "find" directories by its linker search rules, due to the special-casing in access_mode$effective. This leads to surprising, though harmless, behaviour. 68 phx11588 The structure for hcs_$create_branch_ has not kept up to date with file system changes, and no longer contains all the values which might want to be set when a branch is created. It should be upgraded whenever the file system is changed. 65 The SST, limited in size to but one segment, cannot be made large enough to optimally support the largest configurations available today, and this situation can only get worse. 
The fix is to split it up into several tables, possibly using more than one segment for the AST itself. This is very hard. 83-01-18: well, try this. Get a pointer register back by changing all references to sst|foo to sst$foo (use pr4, that is). Now, make a wired table of packed pointers to astes. Interpret the aste threads as indexes into this table. This costs only 1 word per aste, as opposed to changing all 6 threads to packed pointers (3 words). It should just be grunt work to implement. 60 There is no general mechanism for determining how many pages should be wired by pmut$wire_and_mask, since error cases (calls to syserr, mainly) may use up a large amount of stack space not normally required. This has been partially fixed by changing syserr to run on the PRDS when called masked. 53 phx01533 phx01978 ESD will fail if an MPC is broken. Multics should be more robust about dealing with bad hardware, and delete the devices more rapidly. 32 Many system meters overflow when the system stays up for a long time. This causes faults in the idle process, and in various places in ring zero. This is a catch-all error list entry, to be reserved for the general solution if we ever invent one. Other specific entries address specific instances of the problem. 22 phx02203 The quota moving primitives sometimes fail to adjust things properly when working on active directories. More details are not known at this time. 19 If the HC partition on the RPV is not large enough, it may not be possible to boot with a partial RLV. 11 A bad error message is provided if process initialization fails; for instance, if the user has incorrect access to the process overseer. This is possibly an answering service problem, actually. 10 The linker and the fim look at instructions in the object segment itself, rather than in the SCU data. This is just one more reason why execute-only code does not work. 9 The system loops or otherwise misbehaves when the permanent syserr log is damaged.
(>sc1>perm_syserr_log) This is partly a vfile_ problem in dealing with trashed keyed vfiles. Should fix syserr_log_man_ to be better about dealing with problems in >sc1>perm_syserr_log. If it has difficulty, it should rename the old one and create a new one, rather than simply giving up and not copying the partition. ----------------------------------------------------------- Historical Background This edition of the Multics software materials and documentation is provided and donated to Massachusetts Institute of Technology by Group BULL including BULL HN Information Systems Inc. as a contribution to computer science knowledge. This donation is made also to give evidence of the common contributions of Massachusetts Institute of Technology, Bell Laboratories, General Electric, Honeywell Information Systems Inc., Honeywell BULL Inc., Groupe BULL and BULL HN Information Systems Inc. to the development of this operating system. Multics development was initiated by Massachusetts Institute of Technology Project MAC (1963-1970), renamed the MIT Laboratory for Computer Science and Artificial Intelligence in the mid 1970s, under the leadership of Professor Fernando Jose Corbato. Users consider that Multics provided the best software architecture for managing computer hardware properly and for executing programs. Many subsequent operating systems incorporated Multics principles. Multics was distributed in 1975 to 2000 by Group Bull in Europe , and in the U.S. by Bull HN Information Systems Inc., as successor in interest by change in name only to Honeywell Bull Inc. and Honeywell Information Systems Inc. . 
----------------------------------------------------------- Permission to use, copy, modify, and distribute these programs and their documentation for any purpose and without fee is hereby granted,provided that the below copyright notice and historical background appear in all copies and that both the copyright notice and historical background and this permission notice appear in supporting documentation, and that the names of MIT, HIS, BULL or BULL HN not be used in advertising or publicity pertaining to distribution of the programs without specific prior written permission. Copyright 1972 by Massachusetts Institute of Technology and Honeywell Information Systems Inc. Copyright 2006 by BULL HN Information Systems Inc. Copyright 2006 by Bull SAS All Rights Reserved