02/06/84  lex_string_

The lex_string_ subroutine provides a facility for parsing an ASCII
character string into tokens (character strings delimited by break
characters) and statements (groups of tokens).  It supports the parsing
of comments and quoted strings.  It parses an entire character string
during one invocation, creating a chain of descriptors for the tokens
and statements in a temporary segment.  The cost per token of
lex_string_ is significantly lower than that of parse_file_ because the
overhead of calling parse_file_ to obtain each token is eliminated.
Therefore, the lex_string_ subroutine is recommended for translators
that deal with moderate to large amounts of input.


The descriptors generated when the lex_string_ subroutine parses a
character string can be used as input to translators generated by the
reduction_compiler command, as well as in other applications.  In
addition, the information in the statement and token descriptors can be
used in error messages printed by the lex_error_ subroutine.

Refer to the Subroutines manual for details on the operation of the
lex_string_ subroutine.


Entry points in lex_string_:
   (List is generated by the help command)


:Entry:  init_lex_delims:  02/06/84 lex_string_$init_lex_delims


Function: constructs two character strings from the set of break
characters and the comment, quoting, and statement delimiters.  The
first string contains the first character of every delimiter or break
character defined by the language to be parsed; the second string
contains one character of control information for each character in
the first string.  Together, these two strings form the break tables
that the lex_string_ subroutine uses to parse an input string.  The
delimiter and control strings are intended to be internal static
variables of the program that calls lex_string_, initialized only once
per process and then reused in successive calls to lex_string_$lex.


Syntax:
declare lex_string_$init_lex_delims entry (char(*), char(*), char(*),
     char(*), char(*), bit(*), char(*) varying aligned,
     char(*) varying aligned, char(*) varying aligned,
     char(*) varying aligned);
call lex_string_$init_lex_delims (quote_open, quote_close,
     comment_open, comment_close, statement_delim, Sinit, break_chars,
     ignored_break_chars, lex_delims, lex_control_chars);


Arguments:
quote_open
   is the character string delimiter that begins a quoted string.
   (Input).  It can contain up to four characters.  If it is a null
   character string, then quoted strings are not supported during the
   parsing of an input string.
quote_close
   is the character string delimiter that ends a quoted string.
   (Input).  It can be the same character string as quote_open, and can
   contain up to four characters.
comment_open
   is the character string delimiter that begins a comment.  (Input).
   It can contain up to four characters.  If it is a null character
   string, then comments are not supported during the parsing of a
   character string.


comment_close
   is the character string delimiter that ends a comment.  (Input).  It
   can be the same character string as comment_open, and can contain up
   to four characters.
statement_delim
   is the character string delimiter that ends a statement.  (Input).
   It can contain up to four characters.  If it is a null character
   string, then statements are not delimited during the parsing of a
   character string.


Sinit
   is a bit string that controls the creation of statement descriptors
   and token descriptors for quoting delimiters.  (Input)  The bit
   string consists of two bits in the order listed below.
   Ssuppress_quoting_delims
      is "1"b if token descriptors for the quote opening and closing
      delimiters of a quoted string are to be suppressed.  A token
      descriptor is still created for the quoted string itself, and the
      quoted_string switch in this descriptor is turned on.  If
      Ssuppress_quoting_delims is "0"b, then token descriptors are
      returned for the quote opening and closing delimiters, as well as
      for the quoted string.


   Ssuppress_stmt_delims
      is "1"b if the token descriptor for a statement delimiter is to
      be suppressed.  The end_of_stmt switch in the descriptor of the
      token that precedes the statement delimiter is turned on,
      instead.  If Ssuppress_stmt_delims is "0"b, then a token
      descriptor is returned for a statement delimiter, and the
      end_of_stmt switch in this descriptor is turned on.


break_chars
   is a character string containing all of the characters that can be
   used to delimit tokens.  (Input).  The string can include characters
   used also in the quoting, comment, or statement delimiters, and
   should include any ASCII control characters that are to be treated
   as delimiters.
ignored_break_chars
   is a character string containing all of the break_chars that can be
   used to delimit tokens but that are not tokens themselves.  (Input).
   No token descriptors are created for these characters.


lex_delims
   is an output character string containing all of the delimiters that
   the lex_string_ subroutine uses to parse an input string.  (Output)
   This string is constructed by the init_lex_delims entry from the
   preceding arguments.  It must be long enough to contain all of the
   break_chars, plus the first character of the quote_open delimiter,
   the comment_open delimiter, and the statement_delim delimiter, plus
   30 additional characters.  This length must not exceed 128
   characters, the number of characters in the ASCII character set.
lex_control_chars
   is an output character string containing one character of control
   information for each character in lex_delims.  (Output).  This
   string is also constructed by init_lex_delims from the preceding
   arguments.  It must be as long as lex_delims.
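

Examples:
The following sketch shows one way the break tables might be set up
once per process, as the Function section describes.  The procedure
name my_translator, the variable names, the first_call switch, and
the particular delimiters and break characters (PL/I-style comments,
double-quote strings, semicolon statement delimiter) are illustrative
assumptions, not requirements of lex_string_.

```pli
my_translator: procedure;

/* Sketch only: names and delimiter choices here are assumptions. */

dcl  lex_string_$init_lex_delims entry (char(*), char(*), char(*),
          char(*), char(*), bit(*), char(*) varying aligned,
          char(*) varying aligned, char(*) varying aligned,
          char(*) varying aligned);
dcl (collate, substr) builtin;

/* Break tables live in internal static storage so that they are */
/* built only once per process.  A maximum length of 128 always  */
/* satisfies the length rule given for lex_delims.               */
dcl  first_call bit(1) aligned internal static initial ("1"b);
dcl (BREAKS, IGBREAKS, LEXDLM, LEXCTL)
          char(128) varying aligned internal static;

     if first_call then do;
          /* Space, HT, NL, VT and NP delimit tokens but are not  */
          /* tokens themselves.                                   */
          IGBREAKS = " " || substr (collate (), 10, 4);
          /* The punctuation characters below are also tokens.    */
          BREAKS = IGBREAKS || "()=,:;";
          /* "10"b: suppress token descriptors for the quoting    */
          /* delimiters, keep those for statement delimiters.     */
          call lex_string_$init_lex_delims ("""", """", "/*", "*/",
               ";", "10"b, BREAKS, IGBREAKS, LEXDLM, LEXCTL);
          first_call = "0"b;
     end;

     /* ... calls to lex_string_$lex using the tables go here ... */

end my_translator;
```

With these choices, break_chars holds 11 characters, so lex_delims
needs at least 11 + 3 + 30 = 44 characters, well under the limit of
128.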


:Entry:  lex:  02/06/84 lex_string_$lex


Function: parses an input string according to the delimiters, break
characters, and control information given as its arguments.  The input
string consists of two parts: the first part is a set of characters
that the parser ignores except to count lines; the second part
contains the characters to be parsed.  Lines must be counted in the
ignored part so that accurate line numbers can be stored in the token
and statement descriptors for the parsed part of the string.


Syntax:
declare lex_string_$lex entry (ptr, fixed bin(21), fixed bin(21), ptr,
     bit(*), char(*), char(*), char(*), char(*), char(*),
     char(*) varying aligned, char(*) varying aligned,
     char(*) varying aligned, char(*) varying aligned, ptr, ptr,
     fixed bin(35));
call lex_string_$lex (Pinput, Linput, Lignored_input, Psegment,
     Slex, quote_open, quote_close, comment_open, comment_close,
     statement_delim, break_chars, ignored_break_chars, lex_delims,
     lex_control_chars, Pfirst_stmt_desc, Pfirst_token_desc, code);


Arguments:
Pinput
   is a pointer to the string to be parsed.  (Input)
Linput
   is the length (in characters) of the second part of the input
   string, the part that is actually to be parsed.  (Input)
Lignored_input
   is the length (in characters) of the first part of the input string,
   the part that is ignored except for line counting.  (Input).  This
   length can be 0 if none of the input characters are to be ignored.
Psegment
   is a pointer to a temporary segment created by the translator_temp_
   subroutine.  (Input)


Slex
   is a bit string that controls the creation of statement and comment
   descriptors, the handling of doubled quotes within a quoted string,
   and the interpretation of a comment_close delimiter that equals the
   statement_delim.  (Input).  The bit string consists of four bits in
   the order listed below.
   Sstatement_desc
      is "1"b if statement descriptors are to be created along with the
      token descriptors.  If Sstatement_desc is "0"b, or if the
      statement delimiter is a null character string, then no statement
      descriptors are created.
   Scomment_desc
      is "1"b if comment descriptors are to be created for any comments
      that appear in the input string.  If Scomment_desc is "0"b, if
      comment_open is a null character string, or if statement
      descriptors are not being created, then no comment descriptors
      are created.


   Sretain_doubled_quotes
      is "1"b if doubled quote_close delimiters that appear within a
      quoted string are to be retained.  If Sretain_doubled_quotes is
      "0"b, then a copy of each quoted string containing doubled
      quote_close delimiters is created in the temporary segment with
      all doubled quote_close delimiters changed to single quote_close
      delimiters.
   Sequate_comment_close_stmt_delim
      is "1"b if the comment_close and statement_delim character
      strings are the same, and if the closing of a comment is to be
      treated as the ending of the statement containing the comment.
      This setting is useful when parsing line-oriented languages that
      have only one statement per line and one comment per statement.


quote_open
   is the character string delimiter that begins a quoted string.
   (Input).  It can contain up to four characters.  If it is a null
   character string, then quoted strings are not supported during the
   parsing of an input string.
quote_close
   is the character string delimiter that ends a quoted string.
   (Input).  It can be the same character string as quote_open, and can
   contain up to four characters.
comment_open
   is the character string delimiter that begins a comment.  (Input).
   It can contain up to four characters.  If it is a null character
   string, then comments are not supported during the parsing of a
   character string.


comment_close
   is the character string delimiter that ends a comment.  (Input).  It
   can be the same character string as comment_open, and can contain up
   to four characters.
statement_delim
   is the character string delimiter that ends a statement.  (Input).
   It can contain up to four characters.  If it is a null character
   string, then statements are not delimited during the parsing of a
   character string.
break_chars
   is a character string containing all of the characters that can be
   used to delimit tokens.  (Input).  The string can include characters
   used also in the quoting, comment, or statement delimiters, and
   should include any ASCII control characters that are to be treated
   as delimiters.


ignored_break_chars
   is a character string containing all of the break_chars that can be
   used to delimit tokens but that are not tokens themselves.  (Input).
   No token descriptors are created for these characters.
lex_delims
   is the character string initialized by lex_string_$init_lex_delims.
   (Input)
lex_control_chars
   is the character string initialized by lex_string_$init_lex_delims.
   (Input)
Pfirst_stmt_desc
   is a pointer to the first in the chain of statement descriptors.
   (Output).  This is a null pointer on return if no statement
   descriptors have been created.


Pfirst_token_desc
   is a pointer to the first in the chain of token descriptors.
   (Output).  This is a null pointer on return if no tokens were found
   in the input string.
code
   is one of the following status codes:  (Output)
   0
      the parsing was completed successfully.
   error_table_$zero_length_seg
      no tokens were found in the input string.
   error_table_$no_stmt_delim
      the input string did not end with a statement delimiter, when
      statement delimiters were used in the parsing.
   error_table_$unbalanced_quotes
      the input string ended with a quoted string that was not
      terminated by a quote_close delimiter.
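

Examples:
The following sketch shows how a call to lex_string_$lex and a check
of the returned status code might look.  The procedure name
lex_sketch, its parameter list, and the error message texts are
hypothetical.  The declarations of translator_temp_$get_segment and
translator_temp_$release_all_segments are assumptions taken from
memory of the translator_temp_ documentation and should be checked
there.  The four break table strings are passed as parameters only to
keep the sketch short; they are assumed to have been built by
lex_string_$init_lex_delims with the same delimiters used here.

```pli
lex_sketch: procedure (Pinput, Linput, BREAKS, IGBREAKS, LEXDLM,
          LEXCTL);

/* Sketch only: names and calling conventions are assumptions. */

dcl  Pinput ptr parameter;
dcl  Linput fixed bin(21) parameter;
dcl (BREAKS, IGBREAKS, LEXDLM, LEXCTL)
          char(*) varying aligned parameter;

dcl  lex_string_$lex entry (ptr, fixed bin(21), fixed bin(21), ptr,
          bit(*), char(*), char(*), char(*), char(*), char(*),
          char(*) varying aligned, char(*) varying aligned,
          char(*) varying aligned, char(*) varying aligned, ptr, ptr,
          fixed bin(35));
/* Assumed declarations; see the translator_temp_ description. */
dcl  translator_temp_$get_segment
          entry (char(*) aligned, ptr, fixed bin(35));
dcl  translator_temp_$release_all_segments
          entry (ptr, fixed bin(35));
dcl  com_err_ entry options (variable);
dcl  error_table_$zero_length_seg fixed bin(35) external static;
dcl  null builtin;

dcl  ME char(10) aligned internal static options (constant)
          initial ("lex_sketch");
dcl (Psegment, Pfirst_stmt_desc, Pfirst_token_desc) ptr;
dcl (code, release_code) fixed bin(35);

     call translator_temp_$get_segment (ME, Psegment, code);
     if code ^= 0 then do;
          call com_err_ (code, ME, "Getting a temporary segment.");
          return;
     end;

     /* Lignored_input is 0: parse the whole string.  Slex is     */
     /* "1000"b: create statement descriptors, no comment         */
     /* descriptors, undouble quotes, no comment/statement        */
     /* equivalence.                                              */
     call lex_string_$lex (Pinput, Linput, 0, Psegment, "1000"b,
          """", """", "/*", "*/", ";", BREAKS, IGBREAKS, LEXDLM,
          LEXCTL, Pfirst_stmt_desc, Pfirst_token_desc, code);

     if code = error_table_$zero_length_seg then
          call com_err_ (code, ME, "No tokens in the input.");
     else if code ^= 0 then
          call com_err_ (code, ME, "Lexing the input.");

     if Pfirst_token_desc ^= null then do;
          /* ... process the chain of token descriptors here ...  */
     end;

     /* The descriptors reside in the temporary segment, so they  */
     /* must not be referenced after it is released.              */
     call translator_temp_$release_all_segments (Psegment,
          release_code);

end lex_sketch;
```

The sketch tests Pfirst_token_desc rather than code before processing
tokens because, as described above, that pointer is null whenever no
tokens were found.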


                                          -----------------------------------------------------------


Historical Background

This edition of the Multics software materials and documentation is provided and donated
to Massachusetts Institute of Technology by Group BULL including BULL HN Information Systems Inc. 
as a contribution to computer science knowledge.  
This donation is made also to give evidence of the common contributions of Massachusetts Institute of Technology,
Bell Laboratories, General Electric, Honeywell Information Systems Inc., Honeywell BULL Inc., Groupe BULL
and BULL HN Information Systems Inc. to the development of this operating system. 
Multics development was initiated by Massachusetts Institute of Technology Project MAC (1963-1970),
renamed the MIT Laboratory for Computer Science and Artificial Intelligence in the mid 1970s, under the leadership
of Professor Fernando Jose Corbato. Users consider that Multics provided the best software architecture 
for managing computer hardware properly and for executing programs. Many subsequent operating systems 
incorporated Multics principles.
Multics was distributed in 1975 to 2000 by Group Bull in Europe, and in the U.S. by Bull HN Information Systems Inc.,
as successor in interest by change in name only to Honeywell Bull Inc. and Honeywell Information Systems Inc.

                                          -----------------------------------------------------------

Permission to use, copy, modify, and distribute these programs and their documentation for any purpose and without
fee is hereby granted,provided that the below copyright notice and historical background appear in all copies
and that both the copyright notice and historical background and this permission notice appear in supporting
documentation, and that the names of MIT, HIS, BULL or BULL HN not be used in advertising or publicity pertaining
to distribution of the programs without specific prior written permission.
    Copyright 1972 by Massachusetts Institute of Technology and Honeywell Information Systems Inc.
    Copyright 2006 by BULL HN Information Systems Inc.
    Copyright 2006 by Bull SAS
    All Rights Reserved