01/30/86  sort_seg, ss

Syntax:  ss path {-control_args}

Function:  orders the contents of a segment according to the ASCII
collating sequence.

Arguments:
path
   specifies the pathname of an input segment. The star convention is
   NOT allowed.

Control arguments:
-output_file path, -of path
   places the sorted units in a segment whose pathname is path. The
   use of this control argument is incompatible with -replace. The
   equal convention is allowed.
-replace, -rp
   replaces the original contents of the input segment with the sorted
   units. The default is to ask the user if the input segment should
   be replaced with its sorted contents.
-all, -a
   makes the primary (and only) sort field the entire sort unit; i.e.,
   the entire sort unit is considered when sorting. This is the
   default mode of operation.
-block N, -bk N
   makes the sort unit a block of N strings where N must be a positive
   integer. The default for N is 1 (see "Examples" below).
-delimiter /REGEXP/, -dm /REGEXP/
   uses REGEXP as a regular expression as the string delimiter.
   Strings to be sorted are delimited by the characters which match
   the regular expression. See the description of regular expressions
   under the qedx command.
-delimiter L, -dm L
   makes each L characters of the input segment a delimited string
   where L is a positive integer. This essentially divides the input
   into character strings of length L.
-delimiter {-string} STR, -dm {-str} STR
   uses STR concatenated with a newline character as the string
   delimiter. The character STR can be any sequence of ASCII
   characters. It can be preceded by -string (-str) to distinguish it
   from an integer or a regular expression. The default is a single
   newline character (see "Examples" below).
-ascending, -asc
   makes the sort in ascending order, according to the ASCII collating
   sequence. This is the default mode of operation.
-descending, -dsc
   makes the sort in descending order, according to the ASCII collating
   sequence.
   The use of this control argument is incompatible with the use of the
   -ascending control argument.
-case_sensitive, -cs
   makes the sort by comparing sort fields without translating letters
   to lowercase. This is the default.
-non_case_sensitive, -ncs
   makes the sort by translating letters in the sort fields to
   lowercase when comparing one sort unit to another. The actual
   sorted results remain unchanged.
-character, -ch
   makes the sort based on the character representation of the sort
   field. This is the default.
-integer, -int
   makes the sort by converting the sort fields to fixed binary(71,0)
   integers when comparing one sort unit to another (see "Notes"
   below).
-numeric, -num
   makes the sort by converting the sort fields to float decimal(59)
   numbers when comparing one sort unit to another (see "Notes"
   below).
-field field_specs, -fl field_specs
   specifies the field (or fields) to be used when comparing two sort
   units. This allows units to be sorted based upon comparison of only
   a part of each sort unit. Refer to "Arguments for field
   specification" for a description of the field_spec operand. The
   use of this control argument is incompatible with -all. Multiple
   -field control arguments may be used to specify multiple fields.
-duplicates, -dup
   retains duplicate sort units in the sorted results. This is the
   default.
-only_duplicates, -odup
   only sort units which occur more than once in the segment appear in
   the sorted results. One unit from each set of duplicate sort units
   is placed in the output segment, in sorted order.
-only_duplicate_keys, -odupk
   only sort units which have duplicate sort fields appear in the
   sorted results. All such units having duplicate sort fields are
   placed in the output segment, since the non-sort field portions of
   the units may differ.
-only_unique, -ouq
   only sort units which are unique appear in the sorted results.
   Whenever a set of duplicate units is found, they are removed
   entirely from the output segment.
-only_unique_keys, -ouqk
   only sort units which have unique sort fields appear in the sorted
   results. All units having duplicate sort fields are removed
   entirely from the output segment.
-unique, -uq
   deletes duplicate sort units from the sorted results. For each set
   of duplicate sort units, only the first appears in the sorted
   results, along with nonduplicate sort units.
-unique_keys, -uqk
   deletes sort units having duplicate sort fields from the sorted
   results. For each set of sort units having duplicate fields, only
   the first appears in the sorted results, along with nonduplicate
   sort units.

Arguments for field specification:
The field_spec arguments of the -field control argument define the
fields within each sort unit by which the unit is sorted. The first
field_spec defines the primary sort field, the second defines a
secondary sort field, and so forth. Each field_spec consists of a
field start location, field length, and optional sorting controls:

   -field field_start field_length {sort_controls}

List of field_start formats:
The field start location can be specified in one of the following
formats:
S
   a positive integer, giving the character position of the start of
   the field in the sort unit (e.g., 1 if the field begins at the first
   character). If the sort unit contains fewer than S characters, then
   the unit is sorted as if space characters appeared in the sort
   field.
-from S, -fm S
   where S is a positive integer giving the character position of the
   start of the field in the sort unit.
-from STR, -fm STR
   where STR is a character string which identifies the beginning of
   the sort field. The field begins with the first character of the
   sort unit which follows STR. If STR does not appear in the sort
   unit, then the unit is sorted as if the sort field contains space
   characters.
-from /REGEXP/, -fm /REGEXP/
   where REGEXP is a regular expression which identifies the beginning
   of the sort field.
   The field begins with the first character of the sort unit which
   follows the part of the sort unit matching REGEXP. See the writeup
   of the qedx command for the definition of regular expressions. If
   no match for REGEXP is found in the sort unit, then the unit is
   sorted as if the sort field contains space characters.
-from -string STR, -fm -str STR
   treats STR as a character string which identifies the beginning of
   the sort field, even though STR may look like an integer or a
   regular expression. For example, -from -string 25 identifies a sort
   field which begins with the character following "25" in the sort
   unit.

List of field_length formats:
The sort field length can be specified in one of the following ways:
L
   a positive integer, giving the length of the sort field in
   characters. If the sort unit is too short to hold a sort field of L
   characters (that is, if the number of characters from the first
   character of the sort field to the end of the sort unit is less
   than L), then the unit is sorted as if the field were extended on
   the right with space characters to a length of L characters.
   Alternatively, L can be -1 to indicate that the remainder of the
   sort unit is to be used as the sort field.
-for L
   where L is a positive integer giving the length of the sort field in
   characters, or -1 to use the remainder of the sort unit as the sort
   field.
-to E
   where E is a positive integer giving the character position of the
   end of the sort field in the sort unit (e.g., 5 if the field stops
   after the fifth character of the sort unit). If the sort unit
   contains fewer than E characters, then the unit is sorted as if
   space characters were added on the right to extend the unit to E
   characters.
-to STR
   where STR is a character string which identifies the end of the sort
   field. The field ends with the first character of the sort unit
   preceding STR.
   If STR does not appear in the sort unit after the starting position
   of the sort field, then the unit is sorted as if space characters
   appeared in the sort field.
-to /REGEXP/
   where REGEXP is a regular expression which identifies the end of the
   sort field. The field ends with the first character of the sort
   unit which precedes the part of the sort unit matching REGEXP. See
   the writeup of the qedx command for the definition of regular
   expressions. If no match for REGEXP is found in the sort unit after
   the starting position of the sort field, then the unit is sorted as
   if space characters appeared in the sort field.
-to -string STR
   treats STR as a character string which identifies the end of the
   sort field, even though STR may look like an integer or a regular
   expression.

Note that when -to is used to specify the end of the field, sort_seg
examines all sort units to determine the length of the longest
instance of this sort field in any sort unit. It then sorts units as
if the sort field in each unit were extended on the right with space
characters to the length of the longest sort field instance.

List of sort_controls:
The sort controls may be one from each of the following sets of
arguments. If no sort control is given, then the default is specified
by the corresponding control argument (-ascending or -descending,
-case_sensitive or -non_case_sensitive, -character or -integer or
-numeric).
ascending, asc
   sort units with this field in ascending order. This sort control is
   incompatible with descending.
descending, dsc
   sort units with this field in descending order. This sort control
   is incompatible with ascending.
non_case_sensitive, ncs
   sort units by translating this field to lowercase. This sort
   control is incompatible with case_sensitive.
case_sensitive, cs
   sort units by treating uppercase letters in this field as being
   different from lowercase letters. This sort control is incompatible
   with non_case_sensitive.
character, ch
   sort units with this field by the character representation. This
   sort control is incompatible with integer or numeric.
integer, int
   sort units with this field by converting the character
   representation to its integer value (fixed binary(71,0)). This sort
   control is incompatible with character or numeric.
numeric, num
   sort units with this field by converting the character
   representation to its numeric value (float decimal(59)). This sort
   control is incompatible with character or integer.

Notes:
Using the control arguments, the segment is broken down into separate
sort units, which are strings or blocks of strings. A string can
comprise one or more lines. These sort units are then sorted, and the
ordered units either replace the original segment or are placed in a
new segment.

If the sort_seg command is invoked without any control_args, the
-replace, -ascending, -all, -character, and -delimiter control
arguments are assumed, and the default delimiter of a newline
character is used. That is, the sort_seg command, when invoked with
path as the only argument, sorts the lines of that segment as
character strings in ascending ASCII collating sequence, replacing the
original segment with the sorted result. As a safety measure, the
following question is asked when -replace is not specified:

   Do you really want to sort the contents of PATH?

This helps avoid accidental sorting of segments.

The start position of a sort field is calculated relative to the
beginning of a sort unit. If the blocking factor is N = 1, the start
position is calculated relative to the beginning of a string. If the
blocking factor is N > 1, the start position is calculated relative to
the beginning of the first string of a block. When calculating field
specifications within a sort unit of N > 1 strings (blocking factor
N > 1), string delimiters internal to the sort unit should not be
considered (see "Examples" below).
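
The start-position rule above can be modelled outside Multics. The
following is a minimal Python sketch, not the sort_seg implementation:
the function names (split_units, field, sort_text) are hypothetical,
only literal-string delimiters and numeric start/length field
specifications are handled, and comparison is character, ascending, and
case sensitive. It forms sort units from blocks of delimited strings
and counts field positions with the internal delimiters left out.

```python
def split_units(text, delimiter="\n", block=1):
    """Split text into delimited strings, then group them into sort
    units of `block` strings each. Delimiters are not kept in the strings."""
    strings = text.split(delimiter)
    # Whatever follows the final delimiter is not a sort unit. This sketch
    # simply discards it (it is empty in the sample data below).
    strings.pop()
    return [strings[i:i + block] for i in range(0, len(strings), block)]


def field(unit, start, length):
    """Return the sort field of `unit` (a list of strings) that begins at
    1-based character position `start`, space-padded to `length`."""
    body = "".join(unit)  # positions ignore delimiters inside the block
    return body[start - 1:start - 1 + length].ljust(length)


def sort_text(text, delimiter, block, specs):
    """Order the units by the (start, length) pairs in `specs` and
    reassemble the text, restoring the delimiter after every string."""
    units = split_units(text, delimiter, block)
    # list.sort is stable, so units with equal fields keep their order.
    units.sort(key=lambda u: tuple(field(u, s, n) for s, n in specs))
    return "".join(s + delimiter for u in units for s in u)


SEGMENT = "ABCDEFGHXY\nABCDEFXY\nABCDEFGHIJXY\nABCXY\n"

# Models "-dm Y -bk 2 -fl 6 4 4 2": the delimiter is "Y" plus a newline,
# and each sort unit is a block of two strings.
units = split_units(SEGMENT, "Y\n", 2)
fields = [field(u, 6, 4) for u in units]
result = sort_text(SEGMENT, "Y\n", 2, [(6, 4), (4, 2)])
```

With the sample segment, the two units are the string pairs
ABCDEFGHX/ABCDEFX and ABCDEFGHIJX/ABCX, and the primary fields are
"FGHX" and "FGHI", so the second block sorts ahead of the first.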

Sort fields/units of unequal length are compared by assuming the
shorter field/unit to be padded on the right with space characters,
immediately following the rightmost character. If a field/unit
contains non-printing graphic characters (such as BS, HT, NL, VT, FF,
CR, etc.) which precede the space character in ASCII collating
sequence, they will be sorted accordingly, with sometimes unexpected
results. The string delimiter is never considered when padding (see
"Examples" below).

The numeric sort mode converts the sort field character string to a
float decimal(59) value for sorting purposes. Similarly, the integer
sort mode converts the sort field character string to a fixed
bin(71,0) value. The character string representation must be
acceptable to the PL/I or Fortran language conversion rules. The
actual sort field remains unchanged in the sorted results.

If characters are detected in the input segment following the final
delimited sort unit, they are ignored for the purposes of sorting, but
appear in the sorted output immediately following the final delimited
sort unit. An error message specifies the location of the first
nondelimited character.

A maximum of 261,119 units can be sorted.

The sort is stable, i.e., duplicate units appear in the same order in
the sorted segment as in the original segment.

The input segment is sorted using temporary segments in the process
directory. If the -output_file control argument is specified, and
path is the pathname of an already existing segment, its contents are
destroyed upon beginning the sort. If the sorted results are to
replace the original contents of the input file, that replacement does
not occur until the last possible moment.

The -unique control argument deletes duplicate sort units from the
sorted results.
The determination of whether or not a sort unit is to be deleted is
independent of sort field specifications; i.e., given a number of
nonidentical sort units that contain identical sort fields, all the
units do appear in the sorted results.

The following groups of control arguments are mutually exclusive with
other control arguments in the same group. If more than one from a
group is given in a single command, the last one specified in the
command overrides the others. The groups are:

   -all, -field
   -ascending, -descending
   -case_sensitive, -non_case_sensitive
   -character, -integer, -numeric
   -duplicates, -only_duplicates, -only_duplicate_keys, -unique,
      -unique_keys
   -replace, -output_file

In addition, if -delimiter is specified several times, the final
specification overrides the earlier -delimiter control arguments.

Examples:
Suppose a segment contains the following lines (where nl represents
the ASCII newline character):

   ABCDEFGHXYnl
   ABCDEFXYnl
   ABCDEFGHIJXYnl
   ABCXYnl

The display below shows how the sort_seg command sorts the contents of
this segment, according to the arguments specified in the first column
(nl stands for the ASCII newline character and # stands for the ASCII
space character).

    these     |   define these   |   sorted on    |    giving
  arguments   |    sort units    |  these fields  | these results
--------------|------------------|----------------|--------------
-dm XY        |ABCDEFGH          |ABCDEFGH##      |ABCXYnl
              |ABCDEF            |ABCDEF####      |ABCDEFXYnl
              |ABCDEFGHIJ        |ABCDEFGHIJ      |ABCDEFGHXYnl
              |ABC               |ABC#######      |ABCDEFGHIJXYnl
--------------|------------------|----------------|--------------
-bk 2         |ABCDEFGHABCDEF    |ABCDEFGHABCDEF  |ABCDEFGHXYnl
-dm XY        |ABCDEFGHIJABC     |ABCDEFGHIJABC#  |ABCDEFXYnl
              |                  |                |ABCDEFGHIJXYnl
              |                  |                |ABCXYnl
--------------|------------------|----------------|--------------
-fl 6 4       |ABCDEFGHXY        |FGHX            |ABCXYnl
              |ABCDEFXY          |FXY#            |ABCDEFGHIJXYnl
              |ABCDEFGHIJXY      |FGHI            |ABCDEFGHXYnl
              |ABCXY             |####            |ABCDEFXYnl
--------------|------------------|----------------|--------------
              |                  |first   second  |
-fl 1 4 7 2   |ABCDEFGHXY        |ABCD    GH      |ABCDEFGHXYnl
              |ABCDEFXY          |ABCD    XY      |ABCDEFGHIJXYnl
              |ABCDEFGHIJXY      |ABCD    GH      |ABCDEFXYnl
              |ABCXY             |ABCX    ##      |ABCXYnl
--------------|------------------|----------------|--------------
-dm Y         |ABCDEFGHXABCDEFX  |FGHX    DE      |ABCDEFGHIJXYnl
-bk 2         |ABCDEFGHIJXABCX   |FGHI    DE      |ABCXYnl
-fl 6 4 4 2   |                  |                |
              |                  |                |ABCDEFGHXYnl
              |                  |                |ABCDEFXYnl
--------------|------------------|----------------|--------------
              |                  |first   second  |
-fl 6 4 dsc   |ABCDEFGHXY        |FGHX    CDE     |ABCDEFXYnl
    3 3 asc   |ABCDEFXY          |FXY#    CDE     |ABCDEFGHXYnl
              |ABCDEFGHIJXY      |FGHI    CDE     |ABCDEFGHIJXYnl
              |ABCXY             |####    CXY     |ABCXYnl
--------------|------------------|----------------|--------------
-fl 1 3       |ABCDEFGH          |ABC             |ABCDEFGHXYnl
-unique_keys  |ABCDEF            |ABC             |
-dm XY        |ABCDEFGHIJ        |ABC             |
              |ABC               |ABC             |
--------------|------------------|----------------|--------------
              |                  |first   second  |
-fl 1 3 5 2   |ABCDEFGH          |ABC     EF      |ABCDEFGHXYnl
-odupk        |ABCDEF            |ABC     EF      |ABCDEFXYnl
-dm XY        |ABCDEFGHIJ        |ABC     EF      |ABCDEFGHIJXYnl
              |ABC               |ABC     ##      |

-----------------------------------------------------------

Historical Background

This edition of the Multics software materials and documentation is
provided and donated to Massachusetts Institute of Technology by
Group BULL including
BULL HN Information Systems Inc. as a contribution to computer science
knowledge. This donation is made also to give evidence of the common
contributions of Massachusetts Institute of Technology, Bell
Laboratories, General Electric, Honeywell Information Systems Inc.,
Honeywell BULL Inc., Groupe BULL and BULL HN Information Systems Inc.
to the development of this operating system. Multics development was
initiated by Massachusetts Institute of Technology Project MAC
(1963-1970), renamed the MIT Laboratory for Computer Science and
Artificial Intelligence in the mid 1970s, under the leadership of
Professor Fernando Jose Corbato. Users consider that Multics provided
the best software architecture for managing computer hardware properly
and for executing programs. Many subsequent operating systems
incorporated Multics principles. Multics was distributed in 1975 to
2000 by Group Bull in Europe, and in the U.S. by Bull HN Information
Systems Inc., as successor in interest by change in name only to
Honeywell Bull Inc. and Honeywell Information Systems Inc.

-----------------------------------------------------------

Permission to use, copy, modify, and distribute these programs and
their documentation for any purpose and without fee is hereby granted,
provided that the below copyright notice and historical background
appear in all copies and that both the copyright notice and historical
background and this permission notice appear in supporting
documentation, and that the names of MIT, HIS, BULL or BULL HN not be
used in advertising or publicity pertaining to distribution of the
programs without specific prior written permission.

   Copyright 1972 by Massachusetts Institute of Technology and
      Honeywell Information Systems Inc.
   Copyright 2006 by BULL HN Information Systems Inc.
   Copyright 2006 by Bull SAS
   All Rights Reserved