02/06/84  lex_string_

The lex_string_ subroutine provides a facility for parsing an ASCII
character string into tokens (character strings delimited by break
characters) and statements (groups of tokens). It supports the
parsing of comments and quoted strings. It parses an entire
character string during one invocation, creating a chain of
descriptors for the tokens and statements in a temporary segment.

The cost per token of lex_string_ is significantly lower than that of
parse_file_ because the overhead of calling parse_file_ to obtain
each token is eliminated. Therefore, the lex_string_ subroutine is
recommended for translators that deal with moderate to large amounts
of input.

The descriptors generated when the lex_string_ subroutine parses a
character string can be used as input to translators generated by the
reduction_compiler command, as well as in other applications. In
addition, the information in the statement and token descriptors can
be used in error messages printed by the lex_error_ subroutine.

Refer to the Subroutines manual for details on the operation of the
lex_string_ subroutine.


Entry points in lex_string_:
   (List is generated by the help command)


:Entry:  init_lex_delims:  02/06/84  lex_string_$init_lex_delims

Function: constructs two character strings from the set of break
characters and comment, quoting, and statement delimiters: one string
contains the first character of every delimiter or break character
defined by the language to be parsed; the second string contains a
character of control information for each character in the first
string. These two character strings form the break tables that the
lex_string_ subroutine uses to parse an input string.

It is intended that these two (delimiter and control) character
strings be internal static variables of the program that calls
lex_string_, and that they be initialized only once per process. They
can then be used in successive calls to lex_string_$lex.
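
The once-per-process initialization described above might be sketched
as follows. This is an illustration only, not taken from the
Subroutines manual: the procedure name my_translator, the first_call
switch, the PL/I-style quote, comment, and statement delimiters, and
the particular break characters are all assumptions made for the
example. The entry declaration follows the Syntax section of this
entry point.

```pl1
/* Sketch only: my_translator, first_call, and the delimiter and
   break character choices are hypothetical. */

my_translator: procedure;

dcl  lex_string_$init_lex_delims entry (char(*), char(*), char(*),
          char(*), char(*), bit(*), char(*) varying aligned,
          char(*) varying aligned, char(*) varying aligned,
          char(*) varying aligned);
dcl  (collate, substr) builtin;

/* Internal static, so the break tables are built once per process
   and remain available to later invocations. */
dcl  first_call bit(1) aligned internal static initial ("1"b);
dcl  break_chars char(128) varying aligned internal static;
dcl  ignored_break_chars char(128) varying aligned internal static;
dcl  lex_delims char(128) varying aligned internal static;
dcl  lex_control_chars char(128) varying aligned internal static;

     if first_call then do;
          /* Space, horizontal tab, and newline delimit tokens but
             are not tokens themselves. */
          ignored_break_chars = " " || substr (collate (), 10, 2);

          /* Punctuation breaks are also returned as tokens. */
          break_chars = ignored_break_chars || "(),;";

          /* Sinit = "10"b: suppress tokens for the quoting
             delimiters, keep tokens for statement delimiters. */
          call lex_string_$init_lex_delims ("""", """", "/*", "*/",
               ";", "10"b, break_chars, ignored_break_chars,
               lex_delims, lex_control_chars);
          first_call = "0"b;
     end;

     /* Calls to lex_string_$lex would follow here, passing the same
        delimiters and the four static strings. */

end my_translator;
```

Declaring lex_delims and lex_control_chars with the maximum length of
128 characters avoids having to compute the minimum length required
for lex_delims.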

Syntax:
declare lex_string_$init_lex_delims entry (char(*), char(*), char(*),
     char(*), char(*), bit(*), char(*) varying aligned,
     char(*) varying aligned, char(*) varying aligned,
     char(*) varying aligned);
call lex_string_$init_lex_delims (quote_open, quote_close,
     comment_open, comment_close, statement_delim, Sinit, break_chars,
     ignored_break_chars, lex_delims, lex_control_chars);


Arguments:
quote_open
   is the character string delimiter that begins a quoted string.
   (Input) It can contain up to four characters. If it is a null
   character string, then quoted strings are not supported during the
   parsing of an input string.
quote_close
   is the character string delimiter that ends a quoted string.
   (Input) It can be the same character string as quote_open, and can
   contain up to four characters.
comment_open
   is the character string delimiter that begins a comment. (Input)
   It can contain up to four characters. If it is a null character
   string, then comments are not supported during the parsing of a
   character string.
comment_close
   is the character string delimiter that ends a comment. (Input)
   It can be the same character string as comment_open, and can
   contain up to four characters.
statement_delim
   is the character string delimiter that ends a statement. (Input)
   It can contain up to four characters. If it is a null character
   string, then statements are not delimited during the parsing of a
   character string.
Sinit
   is a bit string that controls the creation of statement
   descriptors and token descriptors for quoting delimiters. (Input)
   The bit string consists of two bits in the order listed below.
   Ssuppress_quoting_delims
      is "1"b if token descriptors for the quote opening and closing
      delimiters of a quoted string are to be suppressed. A token
      descriptor is still created for the quoted string itself, and
      the quoted_string switch in this descriptor is turned on.
      If Ssuppress_quoting_delims is "0"b, then token descriptors are
      returned for the quote opening and closing delimiters, as well
      as for the quoted string.
   Ssuppress_stmt_delims
      is "1"b if the token descriptor for a statement delimiter is to
      be suppressed. The end_of_stmt switch in the descriptor of the
      token that precedes the statement delimiter is turned on,
      instead. If Ssuppress_stmt_delims is "0"b, then a token
      descriptor is returned for a statement delimiter, and the
      end_of_stmt switch in this descriptor is turned on.
break_chars
   is a character string containing all of the characters that can be
   used to delimit tokens. (Input) The string can include characters
   used also in the quoting, comment, or statement delimiters, and
   should include any ASCII control characters that are to be treated
   as delimiters.
ignored_break_chars
   is a character string containing all of the break_chars that can
   be used to delimit tokens but that are not tokens themselves.
   (Input) No token descriptors are created for these characters.
lex_delims
   is an output character string containing all of the delimiters
   that the lex_string_ subroutine uses to parse an input string.
   (Output) This string is constructed by the init_lex_delims entry
   from the preceding arguments. It must be long enough to contain
   all of the break_chars, plus the first character of the quote_open
   delimiter, the comment_open delimiter, and the statement_delim
   delimiter, plus 30 additional characters. This length must not
   exceed 128 characters, the number of characters in the ASCII
   character set.
lex_control_chars
   is an output character string containing one character of control
   information for each character in lex_delims. (Output) This string
   is also constructed by init_lex_delims from the preceding
   arguments. It must be as long as lex_delims.


:Entry:  lex:  02/06/84  lex_string_$lex

Function: parses an input string according to the delimiters, break
characters, and control information given as its arguments.
The input string consists of two parts: the first part is a set of
characters that are ignored by the parser except for the counting of
lines; the second part is the characters to be parsed. It is
necessary to count lines in the part that is otherwise ignored so
that accurate line numbers can be stored in the token and statement
descriptors for the parsed section of the string.


Syntax:
declare lex_string_$lex entry (ptr, fixed bin(21), fixed bin(21), ptr,
     bit(*), char(*), char(*), char(*), char(*), char(*),
     char(*) varying aligned, char(*) varying aligned,
     char(*) varying aligned, char(*) varying aligned, ptr, ptr,
     fixed bin(35));
call lex_string_$lex (Pinput, Linput, Lignored_input, Psegment, Slex,
     quote_open, quote_close, comment_open, comment_close,
     statement_delim, break_chars, ignored_break_chars, lex_delims,
     lex_control_chars, Pfirst_stmt_desc, Pfirst_token_desc, code);


Arguments:
Pinput
   is a pointer to the string to be parsed. (Input)
Linput
   is the length (in characters) of the second part of the input
   string, the part that is actually to be parsed. (Input)
Lignored_input
   is the length (in characters) of the first part of the input
   string, the part that is ignored except for line counting. (Input)
   This length can be 0 if none of the input characters are to be
   ignored.
Psegment
   is a pointer to a temporary segment created by the
   translator_temp_ subroutine. (Input)
Slex
   is a bit string that controls the creation of statement and
   comment descriptors, the handling of doubled quotes within a
   quoted string, and the interpretation of a comment_close delimiter
   that equals the statement_delim. (Input) The bit string consists
   of four bits in the order listed below.
   Sstatement_desc
      is "1"b if statement descriptors are to be created along with
      the token descriptors. If Sstatement_desc is "0"b, or if the
      statement delimiter is a null character string, then no
      statement descriptors are created.
   Scomment_desc
      is "1"b if comment descriptors are to be created for any
      comments that appear in the input string. If Scomment_desc is
      "0"b, if comment_open is a null character string, or if
      statement descriptors are not being created, then no comment
      descriptors are created.
   Sretain_doubled_quotes
      is "1"b if doubled quote_close delimiters that appear within a
      quoted string are to be retained. If Sretain_doubled_quotes is
      "0"b, then a copy of each quoted string containing doubled
      quote_close delimiters is created in the temporary segment with
      all doubled quote_close delimiters changed to single
      quote_close delimiters.
   Sequate_comment_close_stmt_delim
      is "1"b if the comment_close and statement_delim character
      strings are the same, and if the closing of a comment is to be
      treated as the ending of the statement containing the comment.
      This switch can be used when parsing line-oriented languages
      that have only one statement per line and one comment per
      statement.
quote_open
   is the character string delimiter that begins a quoted string.
   (Input) It can contain up to four characters. If it is a null
   character string, then quoted strings are not supported during the
   parsing of an input string.
quote_close
   is the character string delimiter that ends a quoted string.
   (Input) It can be the same character string as quote_open, and can
   contain up to four characters.
comment_open
   is the character string delimiter that begins a comment. (Input)
   It can contain up to four characters. If it is a null character
   string, then comments are not supported during the parsing of a
   character string.
comment_close
   is the character string delimiter that ends a comment. (Input)
   It can be the same character string as comment_open, and can
   contain up to four characters.
statement_delim
   is the character string delimiter that ends a statement. (Input)
   It can contain up to four characters. If it is a null character
   string, then statements are not delimited during the parsing of a
   character string.
break_chars
   is a character string containing all of the characters that can be
   used to delimit tokens. (Input) The string can include characters
   used also in the quoting, comment, or statement delimiters, and
   should include any ASCII control characters that are to be treated
   as delimiters.
ignored_break_chars
   is a character string containing all of the break_chars that can
   be used to delimit tokens but that are not tokens themselves.
   (Input) No token descriptors are created for these characters.
lex_delims
   is the character string initialized by
   lex_string_$init_lex_delims. (Input)
lex_control_chars
   is the character string initialized by
   lex_string_$init_lex_delims. (Input)
Pfirst_stmt_desc
   is a pointer to the first in the chain of statement descriptors.
   (Output) This is a null pointer on return if no statement
   descriptors have been created.
Pfirst_token_desc
   is a pointer to the first in the chain of token descriptors.
   (Output) This is a null pointer on return if no tokens were found
   in the input string.
code
   is one of the following status codes: (Output)
   0
      the parsing was completed successfully.
   error_table_$zero_length_seg
      no tokens were found in the input string.
   error_table_$no_stmt_delim
      the input string did not end with a statement delimiter, when
      statement delimiters were used in the parsing.
   error_table_$unbalanced_quotes
      the input string ended with a quoted string that was not
      terminated by a quote_close delimiter.


-----------------------------------------------------------


Historical Background

This edition of the Multics software materials and documentation is
provided and donated to Massachusetts Institute of Technology by
Group BULL including BULL HN Information Systems Inc. as a
contribution to computer science knowledge.
This donation is made also to give evidence of the common
contributions of Massachusetts Institute of Technology, Bell
Laboratories, General Electric, Honeywell Information Systems Inc.,
Honeywell BULL Inc., Groupe BULL and BULL HN Information Systems Inc.
to the development of this operating system.
Multics development was initiated by Massachusetts Institute of
Technology Project MAC (1963-1970), renamed the MIT Laboratory for
Computer Science and Artificial Intelligence in the mid 1970s, under
the leadership of Professor Fernando Jose Corbato.
Users consider that Multics provided the best software architecture
for managing computer hardware properly and for executing programs.
Many subsequent operating systems incorporated Multics principles.
Multics was distributed in 1975 to 2000 by Group Bull in Europe, and
in the U.S. by Bull HN Information Systems Inc., as successor in
interest by change in name only to Honeywell Bull Inc. and Honeywell
Information Systems Inc.

-----------------------------------------------------------

Permission to use, copy, modify, and distribute these programs and
their documentation for any purpose and without fee is hereby
granted, provided that the below copyright notice and historical
background appear in all copies and that both the copyright notice
and historical background and this permission notice appear in
supporting documentation, and that the names of MIT, HIS, BULL or
BULL HN not be used in advertising or publicity pertaining to
distribution of the programs without specific prior written
permission.

Copyright 1972 by Massachusetts Institute of Technology and Honeywell
Information Systems Inc.
Copyright 2006 by BULL HN Information Systems Inc.
Copyright 2006 by Bull SAS
All Rights Reserved