Incompatibilities with lex and POSIX
flex is a rewrite of the Unix tool lex (the two
implementations do not share any code, though), with some extensions
and incompatibilities, both of which are of concern to those who wish to
write scanners acceptable to either implementation. At present, the
POSIX lex draft is very close to the original lex
implementation, so some of these incompatibilities are also in
conflict with the POSIX draft. The intent, however, is that, except as noted
below, flex as it presently stands will ultimately be POSIX
conformant (i.e., that those areas of conflict with the POSIX draft will
be resolved in flex's favor). Please bear in mind that all the
comments which follow are with regard to the POSIX draft standard of
Summer 1989, and not the final document (or subsequent drafts); they are
included so flex users can be aware of the standardization issues
and those areas where flex may in the near future undergo changes
incompatible with its current definition.
flex is fully compatible with lex with the following exceptions:
The lex scanner internal variable yylineno is
not supported. It is difficult to support this option efficiently,
since it requires examining every character scanned and reexamining the
characters when the scanner backs up. Things get more complicated when
the end of buffer or file is reached or a NUL is scanned (since
the scan must then be restarted with the proper line number count), or
the user uses the yyless, unput, or REJECT actions,
or the multiple input buffer functions.
The fix is to add rules which, upon seeing a newline, increment
yylineno. This is usually an easy process, though it can be a
drag if some of the patterns can match multiple newlines along with
other characters.
yylineno is not part of the POSIX draft.
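A minimal sketch of that fix, assuming the user declares yylineno as an ordinary int (flex does not provide it). For a pattern that can match several newlines at once, an action can count the newlines in the matched text instead of adding one, e.g. `yylineno += count_newlines( yytext, yyleng );'. The helper count_newlines is hypothetical, not part of flex.

```c
/* Declared by the user, since flex does not supply yylineno. */
int yylineno = 1;

/* Hypothetical helper: count the newlines in a matched token so that a
   rule whose pattern spans several lines still keeps the count exact. */
static int count_newlines(const char *text, int len) {
    int n = 0;
    for (int i = 0; i < len; i++)
        if (text[i] == '\n')
            n++;
    return n;
}
```

A rule matching a single newline can simply do `++yylineno;'; the helper only matters for patterns that mix newlines with other characters.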
The input routine is not redefinable, though it may be called to
read characters following whatever has been matched by a rule. If
input encounters an end-of-file the normal yywrap
processing is done. A "real" end-of-file is returned by input
as EOF.
Input is instead controlled by redefining the YY_INPUT macro.
The flex restriction that input cannot be redefined is in
accordance with the POSIX draft, but YY_INPUT has not yet been
accepted into the draft (and probably won't be; it looks like the draft
will simply not specify any way of controlling the scanner's input other
than by making an initial assignment to `yyin').
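As a sketch of controlling input through YY_INPUT, assume the scanner should read from an in-memory string rather than a file. YY_INPUT(buf,result,max_size) is expected to place up to max_size characters in buf and set result to the number read, or to YY_NULL (normally 0) at end of input. The string source and read_from_string are hypothetical; in a real scanner the #define would go in the definitions section.

```c
#include <string.h>

/* Hypothetical in-memory source the scanner reads instead of yyin. */
static const char *scan_source = "";
static size_t scan_pos = 0;

/* Copy up to max_size characters into buf; return how many were copied,
   which is 0 once the string is exhausted (end of input). */
static int read_from_string(char *buf, int max_size) {
    size_t left = strlen(scan_source) - scan_pos;
    size_t n = left < (size_t) max_size ? left : (size_t) max_size;
    memcpy(buf, scan_source + scan_pos, n);
    scan_pos += n;
    return (int) n;
}

/* Redirect the scanner's input to the string above. */
#define YY_INPUT(buf, result, max_size) \
    ((result) = read_from_string((buf), (max_size)))
```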
flex scanners do not use stdio for input. Because of
this, when writing an interactive scanner one must explicitly call
fflush on the stream associated with the terminal after writing
out a prompt. With lex such writes are automatically flushed
since lex scanners use getchar for their input. Also,
when writing interactive scanners with flex, the `-I' flag
must be used.
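A minimal sketch of the explicit flush, assuming the prompt goes to an ordinary stdio stream such as stdout; the helper name prompt is hypothetical.

```c
#include <stdio.h>

/* Write a prompt and flush it immediately. A flex scanner does not read
   through stdio, so the prompt will not be flushed as a side effect of
   reading input, as it may be when a lex scanner calls getchar. */
static void prompt(FILE *out, const char *text) {
    fputs(text, out);
    fflush(out);
}
```

Typical use is `prompt( stdout, "> " );' just before control returns to the scanner to read the user's reply.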
flex scanners are not as reentrant as lex scanners. In
particular, if you have an interactive scanner and an
interrupt handler which long-jumps out of the scanner,
and the scanner is subsequently called again, you may
get the following message:
fatal flex scanner internal error--end of buffer missed
To reenter the scanner, first use
yyrestart( yyin );
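One way to arrange that is sketched below, assuming the interrupt handler long-jumps back to a point in the caller. The idea is to record that a scan was abandoned and to reset the scanner before the next call. All names are hypothetical: restart_scanner stands in for `yyrestart( yyin );' so the sketch compiles without a generated scanner, and the interrupt is simulated by a direct call rather than a real signal.

```c
#include <setjmp.h>

static jmp_buf scan_recover;      /* where the interrupt handler jumps to */
static int scanner_abandoned = 0; /* set when a scan was cut short */
static int restart_calls = 0;

/* Stand-in for `yyrestart( yyin );' in a real flex scanner. */
static void restart_scanner(void) { restart_calls++; }

/* Simulated interrupt handler: long-jumps out of the scanner. */
static void interrupt_handler(void) {
    scanner_abandoned = 1;
    longjmp(scan_recover, 1);
}

/* Wraps one call into the scanner; returns -1 if it was interrupted. */
static int guarded_scan(int interrupt) {
    if (setjmp(scan_recover) != 0)
        return -1;
    if (scanner_abandoned) {
        restart_scanner();        /* reset the buffer before reentering */
        scanner_abandoned = 0;
    }
    if (interrupt)
        interrupt_handler();      /* in reality this happens mid-scan */
    return 0;                     /* in reality: return yylex(); */
}
```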
The output routine is not supported. Output from the ECHO macro is
done to the file-pointer yyout (default stdout).
The POSIX draft mentions that an output routine exists but
currently gives no details as to what it does.
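Without an output routine, matched text reaches the output through ECHO or through explicit writes to yyout. The sketch below approximates the effect of ECHO, assuming it writes the yyleng characters of yytext to yyout. echo_token is a hypothetical name, and its parameters stand in for those scanner globals.

```c
#include <stdio.h>

/* Approximates ECHO: copy the matched text to the output stream. */
static void echo_token(FILE *out, const char *text, int len) {
    fwrite(text, 1, (size_t) len, out);
}
```

Pointing yyout at another stream before scanning, e.g. `yyout = fopen( "out.txt", "w" );', redirects everything ECHO writes.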
lex does not support exclusive start conditions (`%x'),
though they are in the current POSIX draft.
When definitions are expanded, flex encloses them in
parentheses. With lex, the following:
NAME [A-Z][A-Z0-9]*
%%
foo{NAME}? printf( "Found it\n" );
%%
will not match the string `foo' because, when the macro is
expanded, the rule is equivalent to `foo[A-Z][A-Z0-9]*?' and the
precedence is such that the `?' is associated with
`[A-Z0-9]*'. With flex, the rule will be expanded to
`foo([A-Z][A-Z0-9]*)?' and so the string `foo' will match.
Note that because of this, the `^', `$', `<s>',
`/', and `<<EOF>>' operators cannot be used in a flex
definition.
The POSIX draft interpretation is the same as in flex.
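The difference comes down to simple textual substitution, as sketched below. expand_rule is a hypothetical helper that builds the rule text for `foo{NAME}?' either by pasting the definition in as-is (lex) or by wrapping it in parentheses (flex).

```c
#include <stdio.h>

/* Hypothetical helper: expand a rule of the form prefix{NAME}suffix.
   wrap == 0 pastes the definition in as-is (lex); wrap != 0 encloses it
   in parentheses (flex). Returns the length of the expanded rule. */
static int expand_rule(char *out, size_t size, const char *prefix,
                       const char *definition, const char *suffix, int wrap) {
    if (wrap)
        return snprintf(out, size, "%s(%s)%s", prefix, definition, suffix);
    return snprintf(out, size, "%s%s%s", prefix, definition, suffix);
}
```

With prefix "foo", definition "[A-Z][A-Z0-9]*" and suffix "?", the lex-style result is `foo[A-Z][A-Z0-9]*?' and the flex-style result is `foo([A-Z][A-Z0-9]*)?'.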
To specify a character class that matches anything but a right bracket, in lex one can use `[^]]' but with flex
one must use `[^\]]'. The latter works with lex, too.
If you provide your own yywrap routine, you must include a
`#undef yywrap' in the definitions section (section 1). Note that
the `#undef' must be enclosed in `%{' and `%}'.
The POSIX draft specifies that yywrap is a function, and this is
very unlikely to change; so flex users are warned that
yywrap is likely to be changed to a function in the near future.
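Below is a sketch of a user-supplied yywrap that chains several inputs. It assumes the extra streams have been queued in the hypothetical more_inputs array. Returning 0 after pointing yyin at new input tells the scanner to keep going, and returning 1 lets it report end-of-file. In the flex version described here, the `#undef yywrap' above is also required.

```c
#include <stdio.h>

/* Defined by the flex-generated scanner; declared here only so this
   sketch compiles on its own. */
FILE *yyin;

/* Hypothetical queue of further inputs to scan after the current one. */
static FILE **more_inputs;
static int more_count;

/* Called at end of input: switch yyin to the next queued stream and
   return 0 to continue, or return 1 when nothing is left. */
int yywrap(void) {
    if (more_count == 0)
        return 1;
    yyin = *more_inputs++;
    more_count--;
    return 0;
}
```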
After a call to unput, yytext and yyleng are
undefined until the next token is matched. This is not the case with
lex or the present POSIX draft.
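An action that still needs the token after calling unput can copy it first. A minimal sketch, using a hypothetical helper save_token; in an action it would be called as `char *tok = save_token( yytext, yyleng );' before unput, with the copy freed once it is no longer needed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy the current token into freshly allocated,
   NUL-terminated storage so it survives unput. Returns NULL if the
   allocation fails; the caller frees the copy. */
static char *save_token(const char *text, int len) {
    char *copy = malloc((size_t) len + 1);
    if (copy != NULL) {
        memcpy(copy, text, (size_t) len);
        copy[len] = '\0';
    }
    return copy;
}
```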
lex interprets `abc{1,3}' as "match one, two, or three
occurrences of `abc'," whereas flex interprets it as
"match `ab' followed by one, two, or three occurrences of
`c'." The latter is in agreement with the current POSIX draft.
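POSIX extended regular expressions, as compiled by regcomp, bind the interval operator in the same way. The small check below uses the C regex library rather than a flex scanner, so it illustrates the binding rule, not flex itself. ere_matches is a hypothetical helper.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if s matches the POSIX extended regular expression pattern,
   0 if it does not, and -1 if the pattern fails to compile. */
static int ere_matches(const char *pattern, const char *s) {
    regex_t re;
    int result;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    result = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return result;
}
```

Here `^abc{1,3}$' accepts "abccc" but rejects "abcabc", while the parenthesized `^(abc){1,3}$' accepts "abcabc".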
lex
interprets `^foo|bar' as "match either `foo' at the
beginning of a line, or `bar' anywhere", whereas flex
interprets it as "match either `foo' or `bar' if they
come at the beginning of a line". The latter is in
agreement with the current POSIX draft.
To refer to yytext outside of the scanner source file, the
correct definition with flex is `extern char *yytext' rather
than `extern char yytext[]'. This is contrary to the current POSIX
draft but a point on which flex will not be changing, as the
array representation entails a serious performance penalty. It is
hoped that the POSIX draft will be amended to support the flex
variety of declaration (as this is a fairly painless change to require
of lex users).
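A sketch of a separate source file that inspects the current token through the pointer declaration. It assumes yyleng is an int, and token_is is a hypothetical helper. The two definitions at the end only stand in for those in the generated scanner, so that the sketch links on its own.

```c
#include <string.h>

/* In a file other than the scanner: declare yytext as a pointer,
   matching flex's definition, not as `char yytext[]'. */
extern char *yytext;
extern int yyleng;

/* Returns nonzero if the most recently matched token equals word. */
int token_is(const char *word) {
    return strlen(word) == (size_t) yyleng &&
           strncmp(yytext, word, (size_t) yyleng) == 0;
}

/* Stand-ins for the scanner's own definitions (sketch only). */
char *yytext;
int yyleng;
```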
`yyin' is initialized by lex to be stdin;
flex, on the other hand, initializes `yyin' to NULL
and then sets it to stdin the first time the scanner is called,
provided `yyin' has not already been assigned a non-NULL
value. The difference is subtle, but the net effect is that with
flex scanners, `yyin' does not have a valid value until the
scanner has been called.
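Code that looks at yyin before the first call to the scanner has to allow for NULL in the same way. The sketch below restates the fallback rule described above; effective_input is a hypothetical helper.

```c
#include <stdio.h>

/* The stream a flex scanner will actually read on its first call:
   yyin if it has been set, otherwise stdin. */
static FILE *effective_input(FILE *current_yyin) {
    return current_yyin != NULL ? current_yyin : stdin;
}
```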
The special table-size declarations (such as `%a') supported by lex are not required by flex scanners; flex ignores
them.
FLEX_SCANNER is #define'd so scanners may be
written for use with either flex or lex.
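A minimal sketch of such a test, as it might appear in a scanner's definitions or user-code section. generated_by_flex is a hypothetical name. Compiled on its own, outside any generated scanner, FLEX_SCANNER is not defined and the function returns 0.

```c
/* FLEX_SCANNER is #define'd in scanners generated by flex, so code
   shared between flex and lex builds can branch on it. */
static int generated_by_flex(void) {
#ifdef FLEX_SCANNER
    return 1;
#else
    return 0;
#endif
}
```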
The following flex features are not included in lex or the
POSIX draft standard:
yyterminate()
<<EOF>>
YY_DECL
#line directives
`%{}' around actions
yyrestart()
comments beginning with `#' (deprecated)
multiple actions on a line
This last feature refers to the fact that with flex you can put
multiple actions on the same line, separated with semicolons, while with
lex, the following
foo handle_foo(); ++num_foos_seen;
is (rather surprisingly) truncated to
foo handle_foo();
flex does not truncate the action. Actions that are not enclosed
in braces are simply terminated at the end of the line.
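A small model of the difference, written in plain C rather than scanner syntax. The two functions stand for what each tool executes when `foo' is matched. handle_foo and num_foos_seen come from the example above, and the handle_calls counter is a hypothetical addition. Enclosing the action in braces, as in `foo { handle_foo(); ++num_foos_seen; }', keeps both statements under either tool.

```c
static int handle_calls = 0;   /* hypothetical: counts calls to handle_foo */
static int num_foos_seen = 0;

static void handle_foo(void) { handle_calls++; }

/* flex: every statement on the action line runs. */
static void foo_action_flex(void) { handle_foo(); ++num_foos_seen; }

/* lex: the action is truncated after the first statement. */
static void foo_action_lex(void) { handle_foo(); }
```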