
Implementation Notes

This appendix contains information mainly of interest to implementors and maintainers of gawk. Everything in it applies specifically to gawk, and not to other implementations.

Downward Compatibility and Debugging

See section Extensions in gawk not in POSIX awk, for a summary of the GNU extensions to the awk language and program. All of these features can be turned off by invoking gawk with the `-W compat' option, or with the `-W posix' option.

If gawk is compiled for debugging with `-DDEBUG', then there is one more option available on the command line:

`-W parsedebug'
Print out the parse stack information as the program is being parsed.

This option is intended only for serious gawk developers, and not for the casual user. It probably has not even been compiled into your version of gawk, since it slows down execution.

Probable Future Extensions

This section briefly lists extensions that indicate the directions we are currently considering for gawk. The file `FUTURES' in the gawk distribution lists these extensions, as well as several others.

RS as a regexp
The meaning of RS may be generalized along the lines of FS.

Control of subprocess environment
Changes made in gawk to the array ENVIRON may be propagated to subprocesses run by gawk.

Databases
It may be possible to map a GDBM/NDBM/SDBM file into an awk array.

Single-character fields
The null string, "", as a field separator, would cause field splitting and the split function to separate individual characters. Thus, split("abcd", a, "") would yield a[1] == "a", a[2] == "b", and so on.

More lint warnings
There are more things that could be checked for portability.

RECLEN variable for fixed length records
Along with FIELDWIDTHS, this would speed up the processing of fixed-length records.

RT variable to hold the record terminator
It is occasionally useful to have access to the actual string of characters that matched the RS variable. The RT variable would hold these characters.

A restart keyword
After modifying $0, restart would restart the pattern matching loop, without reading a new record from the input.

A `|&' redirection
The `|&' redirection, in place of `|', would open a two-way pipeline for communication with a sub-process (via getline and print and printf).

IGNORECASE affecting all comparisons
The effects of the IGNORECASE variable may be generalized to all string comparisons, and not just regular expression operations.

A way to mix command line source code and library files
There may be a new option that would make it possible to easily use library functions from a program entered on the command line.

GNU-style long options
We will add GNU-style long options to gawk for compatibility with other GNU programs. (For example, `--field-separator=:' would be equivalent to `-F:'.)
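
As an illustration of the long-option item above, here is a minimal sketch of how a long option can be mapped onto an existing short one with the GNU C library's getopt_long. The function name parse_field_separator is hypothetical and is not taken from the gawk sources; only the option spelling `--field-separator' comes from the text above.

```c
#include <getopt.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of gawk: scan the command line and
 * return the field separator the user asked for, or awk's default
 * of a single space when no option is given.
 */
const char *parse_field_separator(int argc, char **argv)
{
    static const struct option long_opts[] = {
        /* name, has_arg, flag, val: the long form reports itself as 'F' */
        { "field-separator", required_argument, NULL, 'F' },
        { NULL, 0, NULL, 0 }
    };
    const char *fs = " ";
    int c;

    optind = 1;   /* restart scanning so the function can be called again */
    while ((c = getopt_long(argc, argv, "F:", long_opts, NULL)) != -1) {
        if (c == 'F')
            fs = optarg;   /* same code path for -F: and --field-separator=: */
    }
    return fs;
}
```

Because the long option's val field is 'F', both spellings reach the same branch, so the rest of the program does not need to know which form the user typed.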

Suggestions for Improvements

Here are some projects that would-be gawk hackers might like to take on. They vary in size from a few days to a few weeks of programming, depending on which one you choose and how fast a programmer you are. Please send any improvements you write to the maintainers at the GNU project.

  1. Compilation of awk programs: gawk uses a Bison (YACC-like) parser to convert the script given to it into a syntax tree; the syntax tree is then executed by a simple recursive evaluator. This method incurs a lot of overhead, since the recursive evaluator performs many procedure calls to do even the simplest things.

    It should be possible for gawk to convert the script's parse tree into a C program which the user would then compile, using the normal C compiler and a special gawk library to provide all the needed functions (regexps, fields, associative arrays, type coercion, and so on).

    An easier possibility might be for an intermediate phase of gawk to convert the parse tree into a linear byte code form like the one used in GNU Emacs Lisp. The recursive evaluator would then be replaced by a straight-line byte code interpreter that would be intermediate in speed between running a compiled program and doing what gawk does now.

    This may actually happen for the 3.0 version of gawk.

  2. An error message section has not been included in this version of the manual. Perhaps some nice beta testers will document some of the messages for the future.

  3. The programs in the test suite could use documenting in this manual.

  4. The programs and data files in the manual should be available in separate files to facilitate experimentation.

  5. See the `FUTURES' file for more ideas. Contact us if you would seriously like to tackle any of the items listed there.
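
The byte code idea in item 1 can be illustrated with a small sketch. It is not gawk's design: the opcodes, the instr structure, and the function run_bytecode are all hypothetical, and the example handles only numeric addition and multiplication. It shows the general shape of a linear interpreter, where a single loop with an explicit stack replaces one recursive procedure call per tree node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction set for a tiny stack machine. */
enum opcode { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

struct instr {
    enum opcode op;
    double operand;   /* used only by OP_PUSH */
};

#define STACK_MAX 64

/* Execute instructions in order until OP_HALT; return the result. */
double run_bytecode(const struct instr *code)
{
    double stack[STACK_MAX];
    size_t sp = 0;    /* index of the next free stack slot */

    for (const struct instr *pc = code; ; pc++) {
        switch (pc->op) {
        case OP_PUSH:
            assert(sp < STACK_MAX);
            stack[sp++] = pc->operand;
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp == 1);   /* exactly one value must remain */
            return stack[0];
        }
    }
}
```

For the expression 1 + 2 * 3, a translation pass over the parse tree would emit PUSH 1, PUSH 2, PUSH 3, MUL, ADD, HALT, and run_bytecode would return 7 for that sequence.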
