This appendix contains information mainly of interest to implementors and
maintainers of gawk. Everything in it applies specifically to
gawk, and not to other implementations.
See section Extensions in gawk not in POSIX awk,
for a summary of the GNU extensions to the awk language and program.
All of these features can be turned off by invoking gawk with the
`-W compat' option, or with the `-W posix' option.
If gawk is compiled for debugging with `-DDEBUG', then there
is one more option available on the command line:
This option is intended only for serious gawk developers,
and not for the casual user. It probably has not even been compiled into
your version of gawk, since it slows down execution.
This section briefly lists extensions that indicate the directions we are
currently considering for gawk. The file `FUTURES' in the
gawk distributions lists these extensions, as well as several others.
RS as a regexp
RS may be generalized along the lines of FS.
Changes made in gawk to the array ENVIRON may be
propagated to subprocesses run by gawk.
awk array.
The null string, "", as a field separator, will cause field
splitting and the split function to separate individual characters.
Thus, split("abcd", a, "") would yield a[1] == "a",
a[2] == "b", and so on.
lint warnings
RECLEN variable for fixed length records
Along with FIELDWIDTHS, this would speed up the processing of
fixed-length records.
RT variable to hold the record terminator
The RT variable would hold the actual characters that terminated the
record, as matched by the RS variable.
restart keyword
After $0 is modified, restart would restart the pattern
matching loop, without reading a new record from the input.
getline and
print and printf).
IGNORECASE affecting all comparisons
The effect of the IGNORECASE variable may be generalized to
all string comparisons, and not just regular expression operations.
GNU-style long options may be added to gawk for compatibility with other GNU programs.
(For example, `--field-separator=:' would be equivalent to
`-F:'.)
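As a rough illustration of the long-option equivalence mentioned above, the
following C sketch accepts both `-F:' and `--field-separator=:' using the
getopt_long function from the GNU C library. The helper name
parse_field_separator is hypothetical, and nothing here is taken from the
gawk sources; it only shows one way the two spellings could be mapped onto
the same option.

```c
#include <assert.h>
#include <getopt.h>   /* getopt_long, struct option, optarg, optind */
#include <stddef.h>

/*
 * Hypothetical helper (not part of gawk): return the field separator
 * requested on the command line, or NULL if none was given.
 * Both -F: and --field-separator=: select the same option code 'F'.
 */
const char *parse_field_separator(int argc, char **argv)
{
    static const struct option long_opts[] = {
        { "field-separator", required_argument, NULL, 'F' },
        { NULL, 0, NULL, 0 }
    };
    const char *fs = NULL;
    int c;

    assert(argc > 0 && argv != NULL);

    optind = 1;  /* start scanning at the first argument after argv[0] */
    while ((c = getopt_long(argc, argv, "F:", long_opts, NULL)) != -1) {
        if (c == 'F')
            fs = optarg;  /* text after -F or after the '=' sign */
    }
    return fs;
}
```

Because the long name is simply an alias for the short option code, the rest
of the program would not need to know which spelling the user typed.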
Here are some projects that would-be gawk hackers might like to take
on. They vary in size from a few days to a few weeks of programming,
depending on which one you choose and how fast a programmer you are. Please
send any improvements you write to the maintainers at the GNU
project.
Compiling awk programs: gawk uses a Bison (YACC-like)
parser to convert the script given it into a syntax tree; the syntax
tree is then executed by a simple recursive evaluator. This method incurs
a lot of overhead, since the recursive evaluator performs many procedure
calls to do even the simplest things.
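To make the overhead concrete, here is a minimal sketch of a recursive
evaluator over a toy syntax tree. The node layout and the name eval_tree are
assumptions made for illustration; gawk's real tree and evaluator are far
richer. The point is only that every node visited costs at least one C
function call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds for a toy expression language. */
enum node_kind { N_NUM, N_ADD, N_MUL };

struct node {
    enum node_kind kind;
    double value;              /* used by N_NUM only */
    const struct node *left;   /* used by N_ADD and N_MUL */
    const struct node *right;  /* used by N_ADD and N_MUL */
};

/* Evaluate the tree rooted at n: one recursive call per node. */
double eval_tree(const struct node *n)
{
    assert(n != NULL);
    switch (n->kind) {
    case N_NUM:
        return n->value;
    case N_ADD:
        return eval_tree(n->left) + eval_tree(n->right);
    case N_MUL:
        return eval_tree(n->left) * eval_tree(n->right);
    }
    return 0.0;  /* not reached for well-formed trees */
}
```

Even the expression 1 + 2 * 3 takes five calls here, one for each of its
three constants and two operators.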
It should be possible for gawk to convert the script's parse tree
into a C program which the user would then compile, using the normal
C compiler and a special gawk library to provide all the needed
functions (regexps, fields, associative arrays, type coercion, and so
on).
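A very small sketch of the tree-to-C idea follows. It walks a toy expression
tree and writes the equivalent C expression into a buffer. The node layout
and the name emit_c are hypothetical, and a real translator would also have
to emit calls into the supporting library for fields, regexps, arrays, and
type coercion; none of that is attempted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical node kinds for a toy expression language. */
enum node_kind { N_NUM, N_ADD, N_MUL };

struct node {
    enum node_kind kind;
    double value;              /* used by N_NUM only */
    const struct node *left;   /* used by N_ADD and N_MUL */
    const struct node *right;  /* used by N_ADD and N_MUL */
};

/*
 * Write C source for the expression rooted at n into buf (size bytes).
 * Returns the number of characters written, not counting the final
 * NUL, or -1 if the buffer is too small.
 */
int emit_c(const struct node *n, char *buf, size_t size)
{
    int used = 0, len;

    assert(n != NULL && buf != NULL);
    if (n->kind == N_NUM) {
        len = snprintf(buf, size, "%g", n->value);
        return (len < 0 || (size_t)len >= size) ? -1 : len;
    }

    /* Binary operator: emit "(", left operand, operator, right, ")". */
    if (size < 2)
        return -1;
    buf[used++] = '(';
    len = emit_c(n->left, buf + used, size - used);
    if (len < 0)
        return -1;
    used += len;
    len = snprintf(buf + used, size - used, " %c ",
                   n->kind == N_ADD ? '+' : '*');
    if (len < 0 || (size_t)len >= size - used)
        return -1;
    used += len;
    len = emit_c(n->right, buf + used, size - used);
    if (len < 0)
        return -1;
    used += len;
    len = snprintf(buf + used, size - used, ")");
    if (len < 0 || (size_t)len >= size - used)
        return -1;
    return used + len;
}
```

The generated text could then be wrapped in a function body and handed to
the C compiler, which is where the speed gain over tree walking would come
from.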
An easier possibility might be for an intermediate phase of gawk to
convert the parse tree into a linear byte code form like the one used
in GNU Emacs Lisp. The recursive evaluator would then be replaced by
a straight-line byte code interpreter that would be intermediate in speed
between running a compiled program and doing what gawk does
now.
This may actually happen for the 3.0 version of gawk.
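The byte code approach can be sketched as a small stack machine. The opcodes,
the instruction layout, and the name run_bytecode below are assumptions for
illustration only and do not describe any actual gawk design. The whole
program runs inside a single loop, so no C function call is made per
operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine. */
enum opcode { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

struct instr {
    enum opcode op;
    double operand;   /* used by OP_PUSH only */
};

#define STACK_MAX 64

/* Execute a linear instruction sequence that ends with OP_HALT. */
double run_bytecode(const struct instr *code)
{
    double stack[STACK_MAX];
    size_t sp = 0;   /* number of values currently on the stack */
    size_t pc = 0;   /* index of the next instruction */

    assert(code != NULL);
    for (;;) {
        const struct instr *in = &code[pc++];

        switch (in->op) {
        case OP_PUSH:
            assert(sp < STACK_MAX);
            stack[sp++] = in->operand;
            break;
        case OP_ADD:
            assert(sp >= 2);
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        case OP_MUL:
            assert(sp >= 2);
            stack[sp - 2] *= stack[sp - 1];
            sp--;
            break;
        case OP_HALT:
            assert(sp == 1);
            return stack[0];
        default:
            assert(0 && "unknown opcode");
            return 0.0;
        }
    }
}
```

For 1 + 2 * 3, a translator would emit the sequence PUSH 1, PUSH 2, PUSH 3,
MUL, ADD, HALT, and the loop above would leave 7 on the stack.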