01/30/86  sort_seg, ss

Syntax:  ss path {-control_args}


Function:  orders the contents of a segment according to the ASCII
collating sequence.


Arguments:
path
   specifies the pathname of an input segment.  The star convention is
   NOT allowed.


Control arguments:
-output_file path, -of path
   places the sorted units in a segment whose pathname is path.  The
   use of this control argument is incompatible with -replace.  The
   equal convention is allowed.
-replace, -rp
   replaces the original contents of the input segment with the sorted
   units.  The default is to ask the user if the input segment should
   be replaced with its sorted contents.
-all, -a
   makes the primary (and only) sort field the entire sort unit; i.e.,
   the entire sort unit is considered when sorting.  This is the
   default mode of operation.


-block N, -bk N
   makes the sort unit a block of N strings where N must be a positive
   integer.  The default for N is 1 (see "Examples" below).
-delimiter /REGEXP/, -dm /REGEXP/
   uses the regular expression REGEXP as the string delimiter.
   Strings to be sorted are delimited by the characters which match the
   regular expression.  See the description of regular expressions
   under the qedx command.
-delimiter L, -dm L
   makes each L characters of the input segment a delimited string
   where L is a positive integer.  This essentially divides the input
   into character strings of length L.


-delimiter {-string} STR, -dm {-str} STR
   uses STR concatenated with a newline character as the string
   delimiter.  The string STR can be any sequence of ASCII
   characters.  It can be preceded by -string (-str) to distinguish it
   from an integer or a regular expression.  The default is a single
   newline character (see "Examples" below).
-ascending, -asc
   makes the sort in ascending order, according to the ASCII collating
   sequence.  This is the default mode of operation.
-descending, -dsc
   makes the sort in descending order, according to the ASCII collating
   sequence.  The use of this control argument is incompatible with the
   use of the -ascending control argument.


-case_sensitive, -cs
   makes the sort by comparing sort fields without translating letters
   to lowercase.  This is the default.
-non_case_sensitive, -ncs
   makes the sort by translating letters in the sort fields to
   lowercase when comparing one sort unit to another.  The actual
   sorted results remain unchanged.
-character, -ch
   makes the sort based on the character representation of the sort
   field.  This is the default.
-integer, -int
   makes the sort by converting the sort fields to fixed binary(71,0)
   integers when comparing one sort unit to another (see "Notes"
   below).


-numeric, -num
   makes the sort by converting the sort fields to float decimal(59)
   numbers when comparing one sort unit to another (see "Notes"
   below).
-field field_specs, -fl field_specs
   specifies the field (or fields) to be used when comparing two sort
   units.  This allows units to be sorted based upon comparison of only
   a part of each sort unit.  Refer to "Arguments for field
   specification" for a description of the field_spec operand.  The use
   of this control argument is incompatible with -all.  Multiple -field
   control arguments may be used to specify multiple fields.
-duplicates, -dup
   retains duplicate sort units in the sorted results.  This is the
   default.


-only_duplicates, -odup
   only sort units which occur more than once in the segment appear in
   the sorted results.  One unit from each set of duplicate sort units
   is placed in the output segment, in sorted order.
-only_duplicate_keys, -odupk
   only sort units which have duplicate sort fields appear in the
   sorted results.  All such units having duplicate sort fields are
   placed in the output segment, since the non-sort field portions of
   the units may differ.
-only_unique, -ouq
   only sort units which are unique appear in the sorted results.
   Whenever a set of duplicate units is found, the whole set is
   removed from the output segment.


-only_unique_keys, -ouqk
   only sort units which have unique sort fields appear in the sorted
   results.  All units having duplicate sort fields are removed
   entirely from the output segment.
-unique, -uq
   deletes duplicate sort units from the sorted results.  For each set
   of duplicate sort units, only the first appears in the sorted
   results, along with nonduplicate sort units.
-unique_keys, -uqk
   deletes sort units having duplicate sort fields from the sorted
   results.  For each set of sort units having duplicate fields, only
   the first appears in the sorted results, along with nonduplicate
   sort units.
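
The duplicate-handling control arguments above can be compared side by
side with a short Python sketch.  This is an illustration of the
documented behavior only, not the sort_seg implementation; the function
name filter_duplicates, its key argument, and the mode strings are
hypothetical names introduced for the example.

```python
from collections import Counter

def filter_duplicates(sorted_units, key, mode):
    """Filter already-sorted units according to a duplicate-handling mode.

    key maps a unit to its sort fields (hypothetical helper argument).
    mode is the long control argument name without the leading hyphen.
    """
    if mode == "duplicates":
        return list(sorted_units)
    # The *_keys modes compare sort fields; the others compare whole units.
    ident = key if mode.endswith("_keys") else (lambda unit: unit)
    counts = Counter(ident(u) for u in sorted_units)
    seen = set()
    kept = []
    for unit in sorted_units:
        k = ident(unit)
        first = k not in seen
        seen.add(k)
        repeated = counts[k] > 1
        if mode in ("unique", "unique_keys"):
            keep = first
        elif mode in ("only_unique", "only_unique_keys"):
            keep = not repeated
        elif mode == "only_duplicates":
            keep = repeated and first
        elif mode == "only_duplicate_keys":
            keep = repeated
        else:
            raise ValueError("unknown mode: " + mode)
        if keep:
            kept.append(unit)
    return kept

# Units from the "Examples" table (delimiter XY).  With -fl 1 3 every
# key is "ABC", so the input order is already the sorted order.
units = ["ABCDEFGH", "ABCDEF", "ABCDEFGHIJ", "ABC"]
first_three = lambda unit: unit[0:3]
print(filter_duplicates(units, first_three, "unique_keys"))
print(filter_duplicates(units, first_three, "only_unique_keys"))
print(filter_duplicates(units, first_three, "unique"))
```

With these units, unique_keys keeps only the first unit, while unique
keeps all four because no two whole units are identical.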


Arguments for field specification:
   The field_spec arguments of the -field control argument define the
   fields within each sort unit by which the unit is sorted.  The first
   field_spec defines the primary sort field, the second defines a
   secondary sort field, and so forth.

   Each field_spec consists of a field start location, field length,
   and optional sorting controls:

-field field_start field_length {sort_controls}


List of field_start formats:  The field start location can be specified
   in one of the following formats:
S
   a positive integer, giving the character position of the start of
   the field in the sort unit (e.g., 1 if the field begins at the
   first character).  If the sort unit contains fewer than S
   characters, then the unit is sorted as if space characters appeared
   in the sort field.
-from S, -fm S
   where S is a positive integer giving the character position of the
   start of the field in the sort unit.


-from STR, -fm STR
   where STR is a character string which identifies the beginning of
   the sort field.  The field begins with the first character of the
   sort unit which follows STR.  If STR does not appear in the sort
   unit, then the unit is sorted as if the sort field contains space
   characters.
-from /REGEXP/, -fm /REGEXP/
   where REGEXP is a regular expression which identifies the beginning
   of the sort field.  The field begins with the first character of the
   sort unit which follows the part of the sort unit matching REGEXP.
   See the writeup of the qedx command for the definition of regular
   expressions.  If no match for REGEXP is found in the sort unit, then
   the unit is sorted as if the sort field contains space characters.


-from -string STR, -fm -str STR
   treats STR as a character string which identifies the beginning of
   the sort field, even though STR may look like an integer or a
   regular expression.  For example,

      -from -string 25

   identifies a sort field which begins with the character following
   "25" in the sort unit.


List of field_length formats:  The sort field length can be specified
   in one of the following ways:
L
   a positive integer, giving the length of the sort field in
   characters.  If the sort unit is too short to hold a sort field of L
   characters (that is, if the number of characters from the first
   character of the sort field to the end of the sort unit is less than
   L), then the unit is sorted as if the field were extended on the
   right with space characters to a length of L characters.
   Alternatively, L can be -1 to indicate that the remainder of the
   sort unit is to be used as the sort field.
-for L
   where L is a positive integer giving the length of the sort field in
   characters, or -1 to use the remainder of the sort unit as the sort
   field.


-to E
   where E is a positive integer giving the character position of the
   end of the sort field in the sort unit (e.g., 5 if the field stops
   after the fifth character of the sort unit).  If the sort unit
   contains fewer than E characters, then the unit is sorted as if
   space characters were added on the right to extend the unit to E
   characters.
-to STR
   where STR is a character string which identifies the end of the sort
   field.  The field ends with the first character of the sort unit
   preceding STR.  If STR does not appear in the sort unit after the
   starting position of the sort field, then the unit is sorted as if
   space characters appeared in the sort field.


-to /REGEXP/
   where REGEXP is a regular expression which identifies the end of the
   sort field.  The field ends with the first character of the sort
   unit which precedes the part of the sort unit matching REGEXP.  See
   the writeup of the qedx command for the definition of regular
   expressions.  If no match for REGEXP is found in the sort unit after
   the starting position of the sort field, then the unit is sorted as
   if space characters appeared in the sort field.
-to -string STR
   treats STR as a character string which identifies the end of the
   sort field, even though STR may look like an integer or a regular
   expression.


   Note that when -to is used to specify the end of the field, then
   sort_seg examines all sort units to determine the length of the
   longest instance of this sort field in any sort unit.  It then sorts
   units as if the sort field in each unit were extended on the right
   with space characters to the length of the longest sort field
   instance.
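
   The padding rule for -to can be pictured with the following Python
   sketch.  It is an illustration under stated assumptions, not the
   sort_seg code: fields_with_to is a hypothetical helper, the start
   is given as a character position, and the end is given as a
   string.

```python
def fields_with_to(units, start, end_str):
    """Extract a field from each unit and pad all fields to equal width.

    start is a 1-based character position; end_str marks the end of
    the field (the field stops just before it).
    """
    raw = []
    for unit in units:
        begin = start - 1
        stop = unit.find(end_str, begin)
        # If end_str is not found after the start position, the field is
        # left empty here, so after padding it consists only of spaces.
        raw.append(unit[begin:stop] if stop != -1 else "")
    width = max((len(f) for f in raw), default=0)
    return [f.ljust(width) for f in raw]

padded = fields_with_to(["ABCDEFGHXY", "ABCDEFXY", "ABCXY", "ABC"], 4, "XY")
for f in padded:
    print(repr(f))
```

   Every field comes back five characters wide, the length of the
   longest instance ("DEFGH").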


List of sort_controls: The sort controls may be one from each of the
   following sets of arguments.  If no sort control is given, then the
   default is specified by the corresponding control argument
   (-ascending or -descending, -case_sensitive or -non_case_sensitive,
   -character or -integer or -numeric).
ascending, asc
   sort units with this field in ascending order.  This sort control is
   incompatible with descending.
descending, dsc
   sort units with this field in descending order.  This sort control
   is incompatible with ascending.


non_case_sensitive, ncs
   sort units by translating this field to lowercase.  This sort
   control is incompatible with case_sensitive.
case_sensitive, cs
   sort units by treating uppercase letters in this field as being
   different from lowercase letters.  This sort control is incompatible
   with non_case_sensitive.
character, ch
   sort units with this field by the character representation.  This
   sort control is incompatible with integer or numeric.
integer, int
   sort units with this field by converting the character
   representation to its integer value (fixed binary(71,0)).  This sort
   control is incompatible with character or numeric.


numeric, num
   sort units with this field by converting the character
   representation to its numeric value (float decimal(59)).  This sort
   control is incompatible with character or integer.


Notes:  Using the control arguments, the segment is broken down into
separate sort units, which are strings or blocks of strings.  A string
can comprise one or more lines.  These sort units are then sorted, and
the ordered units either replace the original segment or are placed in
a new segment.
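
As a rough picture of how strings and blocks become sort units, the
Python sketch below splits a segment on a delimiter and groups the
resulting strings into blocks of N.  It is an illustration only; the
name split_units is hypothetical, and how sort_seg treats a final
partial block is not stated here, so the sketch simply keeps it as a
shorter unit.

```python
def split_units(text, delimiter="\n", block=1):
    """Return (units, trailing); each unit is a list of `block` strings."""
    pieces = text.split(delimiter)
    trailing = pieces.pop()          # text after the final delimiter
    units = [pieces[i:i + block] for i in range(0, len(pieces), block)]
    return units, trailing

# The segment from "Examples", with -dm XY -bk 2.  The delimiter is
# "XY" concatenated with a newline character.
segment = "ABCDEFGHXY\nABCDEFXY\nABCDEFGHIJXY\nABCXY\n"
units, trailing = split_units(segment, delimiter="XY\n", block=2)
print(units)
print(repr(trailing))
```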


If the sort_seg command is invoked without any control_args, the
-replace, -ascending, -all, -character, and -delimiter control
arguments are assumed, and the default delimiter of a newline character
is used.  That is, the sort_seg command, when invoked with path as the
only argument, sorts the lines of that segment as character strings in
ascending ASCII collating sequence, replacing the original segment with
the sorted result.  As a safety measure, the following question is
asked when -replace is not specified:

   Do you really want to sort the contents of PATH?

This helps avoid accidental sorting of segments.


The start position of a sort field is calculated relative to the
beginning of a sort unit.  If the blocking factor is N = 1, the start
position is calculated relative to the beginning of a string.  If the
blocking factor is N > 1, the start position is calculated relative to
the beginning of the first string of a block.  When calculating field
specifications within a sort unit of N > 1 strings (blocking factor
N > 1), string delimiters internal to the sort unit should not be
considered (see "Examples" below).
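
The sketch below shows one way to read this rule, in Python: the
strings of a block are joined without their delimiters before the
field positions are counted.  The helper name field is hypothetical,
and the code illustrates the rule rather than the actual
implementation.

```python
def field(unit_strings, start, length):
    """Extract a field from a sort unit given as a list of strings.

    Positions are 1-based and counted across the strings of the block
    with the internal delimiters left out.
    """
    body = "".join(unit_strings)
    return body[start - 1:start - 1 + length].ljust(length)

# -dm Y -bk 2 -fl 6 4 4 2 applied to the segment in "Examples".
block_one = ["ABCDEFGHX", "ABCDEFX"]
block_two = ["ABCDEFGHIJX", "ABCX"]
print(field(block_one, 6, 4), field(block_one, 4, 2))
print(field(block_two, 6, 4), field(block_two, 4, 2))
```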


Sort fields/units of unequal length are compared by assuming the
shorter field/unit to be padded on the right with space characters,
immediately following the rightmost character.  If a field/unit
contains nonprinting control characters (such as BS, HT, NL, VT, FF,
or CR) that precede the space character in the ASCII collating
sequence, they are sorted accordingly, sometimes with unexpected
results.  The string delimiter is never considered when padding (see
"Examples" below).
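
The effect of space padding can be tried out with a small Python
sketch.  It only mimics the comparison rule described above (Python
compares ASCII strings by character code); padded_sort is a
hypothetical name.

```python
def padded_sort(units):
    """Sort units as if shorter ones were padded on the right with spaces."""
    width = max((len(unit) for unit in units), default=0)
    return sorted(units, key=lambda unit: unit.ljust(width))

plain = padded_sort(["ABCDEFGH", "ABCDEF", "ABCDEFGHIJ", "ABC"])
print(plain)

# HT (horizontal tab) precedes the space character, so "AB<HT>C" sorts
# ahead of "AB", which is compared as "AB" followed by two spaces.
tabbed = padded_sort(["AB", "AB\tC"])
print(tabbed)
```

The second call shows the kind of unexpected result mentioned above:
the longer unit containing HT is placed before the shorter one.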


The numeric sort mode converts the sort field character string to a
float decimal(59) value for sorting purposes.  Similarly, the integer
sort mode converts the sort field character string to a fixed bin(71,0)
value.  The character string representation must be acceptable to the
PL/I or Fortran language conversion rules.  The actual sort field
remains unchanged in the sorted results.
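
The difference between the three sort modes can be seen in the Python
sketch below.  Python's int and Decimal stand in for the fixed
bin(71,0) and float decimal(59) conversions, so the set of acceptable
input strings and the precision are assumptions and will differ from
the PL/I rules.

```python
from decimal import Decimal

fields = ["10", "9", "100"]

# Character mode: compare the space-padded character strings.
by_character = sorted(fields, key=lambda f: f.ljust(3))
# Integer mode: compare the converted values; the strings are unchanged.
by_integer = sorted(fields, key=lambda f: int(f.strip()))

mixed = ["1.5", "10", "-2", "3e1"]
# Numeric mode: compare as decimal floating-point values.
by_numeric = sorted(mixed, key=lambda f: Decimal(f.strip()))

print(by_character)
print(by_integer)
print(by_numeric)
```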


If characters are detected in the input segment following the final
delimited sort unit, they are ignored for the purposes of sorting, but
appear in the sorted output immediately following the final delimited
sort unit.  An error message specifies the location of the first
nondelimited character.


A maximum of 261,119 units can be sorted.  The sort is stable, i.e.,
duplicate units appear in the same order in the sorted segment as in
the original segment.
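
Stability can be observed in the -fl 1 4 7 2 case from "Examples".  The
Python sketch below uses Python's own stable sort as a stand-in for
sort_seg; the helper name field is hypothetical.

```python
def field(unit, start, length):
    """Extract a space-padded field using a 1-based start position."""
    return unit[start - 1:start - 1 + length].ljust(length)

units = ["ABCDEFGHXY", "ABCDEFXY", "ABCDEFGHIJXY", "ABCXY"]
# -fl 1 4 7 2: primary field is characters 1-4, secondary is 7-8.
ordered = sorted(units, key=lambda u: (field(u, 1, 4), field(u, 7, 2)))
print(ordered)
```

ABCDEFGHXY and ABCDEFGHIJXY have the same two sort fields, and they
keep their original relative order in the result.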


The input segment is sorted using temporary segments in the process
directory.  If the -output_file control argument is specified, and path
is the pathname of an already existing segment, its contents are
destroyed upon beginning the sort.  If the sorted results are to
replace the original contents of the input segment, that replacement
does not occur until the last possible moment.


The -unique control argument deletes duplicate sort units from the
sorted results.  The determination of whether or not a sort unit is to
be deleted is independent of sort field specifications; i.e., given a
number of nonidentical sort units that contain identical sort fields,
all the units do appear in the sorted results.


The following groups of control arguments are mutually exclusive with
other control arguments in the same group.  If more than one from a
group is given in a single command, the last one specified in the
command overrides the others.  The groups are:

   -all, -field

   -ascending, -descending

   -case_sensitive, -non_case_sensitive

   -character, -integer, -numeric

   -duplicates, -only_duplicates, -only_duplicate_keys, -unique,
      -unique_keys

   -replace, -output_file

In addition, if -delimiter is specified several times, the final
specification overrides the earlier -delimiter control arguments.


Examples:
Suppose a segment contains the following lines (where nl represents the
ASCII newline character):
   ABCDEFGHXYnl
   ABCDEFXYnl
   ABCDEFGHIJXYnl
   ABCXYnl

The display below shows how the sort_seg command sorts the contents of
this segment, according to the arguments specified in the first column
(nl stands for the ASCII newline character and # stands for the ASCII
space character).


     these    |   define these   |   sorted on    |    giving
   arguments  |    sort units    |  these fields  | these results
--------------|------------------|----------------|--------------
-dm XY        |ABCDEFGH          |ABCDEFGH##      |ABCXYnl
              |ABCDEF            |ABCDEF####      |ABCDEFXYnl
              |ABCDEFGHIJ        |ABCDEFGHIJ      |ABCDEFGHXYnl
              |ABC               |ABC#######      |ABCDEFGHIJXYnl
--------------|------------------|----------------|--------------
-bk 2         |ABCDEFGHABCDEF    |ABCDEFGHABCDEF  |ABCDEFGHXYnl
-dm XY        |ABCDEFGHIJABC     |ABCDEFGHIJABC#  |ABCDEFXYnl
              |                  |                |ABCDEFGHIJXYnl
              |                  |                |ABCXYnl
--------------|------------------|----------------|--------------
-fl 6 4       |ABCDEFGHXY        |FGHX            |ABCXYnl
              |ABCDEFXY          |FXY#            |ABCDEFGHIJXYnl
              |ABCDEFGHIJXY      |FGHI            |ABCDEFGHXYnl
              |ABCXY             |####            |ABCDEFXYnl
--------------|------------------|----------------|--------------
              |                  |first   second  |
-fl 1 4 7 2   |ABCDEFGHXY        |ABCD    GH      |ABCDEFGHXYnl
              |ABCDEFXY          |ABCD    XY      |ABCDEFGHIJXYnl
              |ABCDEFGHIJXY      |ABCD    GH      |ABCDEFXYnl
              |ABCXY             |ABCX    ##      |ABCXYnl
--------------|------------------|----------------|--------------
              |                  |first   second  |
-dm Y         |ABCDEFGHXABCDEFX  |FGHX    DE      |ABCDEFGHIJXYnl
-bk 2         |ABCDEFGHIJXABCX   |FGHI    DE      |ABCXYnl
-fl 6 4 4 2   |                  |                |ABCDEFGHXYnl
              |                  |                |ABCDEFXYnl
--------------|------------------|----------------|--------------
              |                  |first   second  |
-fl 6 4 dsc   |ABCDEFGHXY        |FGHX    CDE     |ABCDEFXYnl
 3 3 asc      |ABCDEFXY          |FXY#    CDE     |ABCDEFGHXYnl
              |ABCDEFGHIJXY      |FGHI    CDE     |ABCDEFGHIJXYnl
              |ABCXY             |####    CXY     |ABCXYnl
--------------|------------------|----------------|--------------
-fl 1 3       |ABCDEFGH          |ABC             |ABCDEFGHXYnl
-unique_keys  |ABCDEF            |ABC             |
-dm XY        |ABCDEFGHIJ        |ABC             |
              |ABC               |ABC             |
--------------|------------------|----------------|--------------
              |                  |first   second  |
-fl 1 3 5 2   |ABCDEFGH          |ABC     EF      |ABCDEFGHXYnl
-odupk        |ABCDEF            |ABC     EF      |ABCDEFXYnl
-dm XY        |ABCDEFGHIJ        |ABC     EF      |ABCDEFGHIJXYnl
              |ABC               |ABC     ##      |
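
The -fl 6 4 dsc 3 3 asc row of the table can be reproduced with the
Python sketch below.  It sorts once per field, from the last field to
the primary one, relying on a stable sort; this is one common way to
combine per-field directions and is not claimed to be how sort_seg
works internally.  The names field and sort_by_fields are
hypothetical.

```python
def field(unit, start, length):
    """Extract a space-padded field using a 1-based start position."""
    return unit[start - 1:start - 1 + length].ljust(length)

def sort_by_fields(units, specs):
    """specs is a list of (start, length, descending), primary field first."""
    result = list(units)
    # Sort on the least significant field first; each later (more
    # significant) pass keeps the earlier order among equal fields.
    for start, length, descending in reversed(specs):
        result.sort(key=lambda u: field(u, start, length), reverse=descending)
    return result

units = ["ABCDEFGHXY", "ABCDEFXY", "ABCDEFGHIJXY", "ABCXY"]
mixed = sort_by_fields(units, [(6, 4, True), (3, 3, False)])
print(mixed)
```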


                                          -----------------------------------------------------------


Historical Background

This edition of the Multics software materials and documentation is provided and donated
to Massachusetts Institute of Technology by Group BULL including BULL HN Information Systems Inc. 
as a contribution to computer science knowledge.  
This donation is made also to give evidence of the common contributions of Massachusetts Institute of Technology,
Bell Laboratories, General Electric, Honeywell Information Systems Inc., Honeywell BULL Inc., Groupe BULL
and BULL HN Information Systems Inc. to the development of this operating system. 
Multics development was initiated by Massachusetts Institute of Technology Project MAC (1963-1970),
renamed the MIT Laboratory for Computer Science and Artificial Intelligence in the mid 1970s, under the leadership
of Professor Fernando Jose Corbato. Users consider that Multics provided the best software architecture 
for managing computer hardware properly and for executing programs. Many subsequent operating systems 
incorporated Multics principles.
Multics was distributed in 1975 to 2000 by Group Bull in Europe , and in the U.S. by Bull HN Information Systems Inc., 
as successor in interest by change in name only to Honeywell Bull Inc. and Honeywell Information Systems Inc. .

                                          -----------------------------------------------------------

Permission to use, copy, modify, and distribute these programs and their documentation for any purpose and without
fee is hereby granted,provided that the below copyright notice and historical background appear in all copies
and that both the copyright notice and historical background and this permission notice appear in supporting
documentation, and that the names of MIT, HIS, BULL or BULL HN not be used in advertising or publicity pertaining
to distribution of the programs without specific prior written permission.
    Copyright 1972 by Massachusetts Institute of Technology and Honeywell Information Systems Inc.
    Copyright 2006 by BULL HN Information Systems Inc.
    Copyright 2006 by Bull SAS
    All Rights Reserved