Chapter 3: Implementation Languages

He took his vorpal sword in hand:
Long time the manxome foe he sought --

The choice of implementation language has a major effect on the design of a text editor. In some environments, only one language is available. In such environments, you do the best that you can, and your editor may end up different from what it would have been if the ideal language were available. However, most environments offer at least two languages. You thus have a choice, and this chapter offers guidance in making that choice. Of course, this may be a choice between Scylla and Charybdis...

General Considerations

The general considerations in selecting a language to use for implementing a text editor are:

availability and implementation quality
text handling power
support for extensibility
support for large projects
efficiency

Each of these considerations will be explored in detail.

Availability and Implementation Quality

You can only use those languages that are supported on the system that your text editor is first implemented upon. Nonetheless, you should be thinking about the second, third, and later systems that your text editor will be ported to, and which languages all of those systems support in common.

In addition to the mere presence of a language processor on a system, you should take into consideration the quality of that implementation. An implementation's speed of operation, quality of diagnostics, quality of code produced, and other such factors can make a large difference in the usability of the language on a particular system.

Text Handling Power

It may appear redundant to say that a text editor must handle text, but consider a spreadsheet program: most of its work is in handling control flow, figuring redisplay, and setting up to execute commands. Only a small fraction of its time is spent in the floating point instructions that most users think are the program's "real work."

At any given moment, a text editor -- or most any other similar interactive program -- is mainly doing one of the following:

waiting for user input
parsing that input
setting up to execute the commands
executing the commands
determining the effect of those commands on the screen
updating the screen

Most of these operations involve processing text in some way or other. Text editors differ from other applications only in that the "executing the commands" item also involves manipulating text.

It is important to note that "text handling" does not necessarily mean "string handling." In many cases, the language's native string operations are not sufficient, and you must write your own string primitives. For example:

Fortran does not support strings with dynamically varying lengths.
C does not support strings that contain the NUL (0 decimal) character.
Many implementations of Pascal do not support arbitrarily long strings (the leading byte count is often only 8 or 16 bits wide).
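As an illustration of writing your own string primitives, here is a minimal sketch in C of a length-counted string. The type Text and the functions text_init, text_append, text_find_byte, and text_free are hypothetical names invented for this example, not part of any library. Because the length is stored explicitly in a size_t, the text may contain NUL characters and is not restricted to an 8 or 16 bit count.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: the length is stored explicitly, so the
   text may contain NUL bytes and is limited only by size_t and memory. */
typedef struct {
    size_t len;   /* bytes in use */
    size_t cap;   /* bytes allocated */
    char  *data;  /* NOT NUL-terminated */
} Text;

/* Make t an empty string that owns no memory. */
void text_init(Text *t)
{
    assert(t != NULL);
    t->len = 0;
    t->cap = 0;
    t->data = NULL;
}

/* Append n bytes, which may include NULs.
   Returns 0 on success, -1 if memory could not be allocated.
   (Sketch only: overflow of the doubled capacity is not checked.) */
int text_append(Text *t, const char *bytes, size_t n)
{
    assert(t != NULL && (bytes != NULL || n == 0));
    if (n > t->cap - t->len) {
        size_t new_cap = t->cap ? t->cap : 16;
        while (new_cap - t->len < n)
            new_cap *= 2;
        char *p = realloc(t->data, new_cap);
        if (p == NULL)
            return -1;
        t->data = p;
        t->cap = new_cap;
    }
    if (n > 0)
        memcpy(t->data + t->len, bytes, n);
    t->len += n;
    return 0;
}

/* Return the index of the first byte equal to c at or after start,
   or t->len if there is none. */
size_t text_find_byte(const Text *t, size_t start, char c)
{
    assert(t != NULL);
    if (start >= t->len)
        return t->len;
    const char *p = memchr(t->data + start, c, t->len - start);
    return p ? (size_t)(p - t->data) : t->len;
}

/* Release the storage and leave t empty. */
void text_free(Text *t)
{
    assert(t != NULL);
    free(t->data);
    text_init(t);
}
```

Appending the five bytes 'a', 'b', NUL, 'c', 'd' with text_append leaves a length of 5, and text_find_byte can then locate the NUL at index 2. The standard strlen function would have stopped there.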

Support for Extensibility

If they do nothing else, text editors change. The language should make it easy to make and maintain changes. In some cases, the source code must be changed and the editor recompiled. However, it is very desirable to allow users to change some of the editor to suit their tastes. The language should offer such support.

This support can take many forms:

late binding of names to procedures through indirect calls, dynamic linking, or other techniques
retaining and using the symbol table information at run time so that the user can think of changes in terms of names, not addresses
internal error and consistency checking under program control so that users can be protected from their mistakes
the ability to add code to the executing editor

Not all languages offer these features. You will have to simulate the missing features when using those languages that lack them.
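As one example of such a simulation, the sketch below shows how late binding, run-time names, and a simple consistency check might be approximated in C. Everything in it is an assumption made for illustration: the names Command, command_table, command_lookup, command_rebind, and command_call, the two sample commands, and the convention that a command takes the current point and returns the new one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command signature: takes the point, returns the new point. */
typedef int (*CommandFn)(int point);

typedef struct {
    const char *name;  /* name retained at run time */
    CommandFn   fn;    /* current binding; may be replaced later */
} Command;

/* Two sample commands, invented for this example. */
int cmd_forward(int point)  { return point + 1; }
int cmd_backward(int point) { return point > 0 ? point - 1 : 0; }

/* A small stand-in for the symbol table that C discards after linking. */
Command command_table[] = {
    { "forward-character",  cmd_forward  },
    { "backward-character", cmd_backward },
};

#define COMMAND_COUNT (sizeof command_table / sizeof command_table[0])

/* Find a command by name; returns NULL if the name is unknown. */
Command *command_lookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < COMMAND_COUNT; i++)
        if (strcmp(command_table[i].name, name) == 0)
            return &command_table[i];
    return NULL;
}

/* Late binding: change what a name means while the editor is running.
   Returns 0 on success, -1 if the name is unknown or fn is NULL. */
int command_rebind(const char *name, CommandFn fn)
{
    Command *c = command_lookup(name);
    if (c == NULL || fn == NULL)
        return -1;
    c->fn = fn;
    return 0;
}

/* Call a command by name through its current binding.
   An unknown name is reported (-1) instead of crashing the editor. */
int command_call(const char *name, int *point)
{
    Command *c = command_lookup(name);
    assert(point != NULL);
    if (c == NULL)
        return -1;
    *point = c->fn(*point);
    return 0;
}
```

With this table, the user refers to "forward-character" by name, never by address, and a call to command_rebind changes what that name does without recompiling. What the sketch cannot do is add new compiled code to the running editor. That feature requires dynamic linking or an interpreter.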

Large Project Support

Text editors are apt to grow quite large. All of the techniques useful for any large project are useful here. Examples of these techniques are:

division of the program into separate modules
division of the program into separate files
separate compilation
a way to organize the global name space
a way to keep objects out of the global name space
automatic verification of procedure call/declaration compatibility
conditional compilation
compilation constants
a way of constructing "data abstractions" that package procedures and private state information
a way of dynamically allocating memory

Add your own favorites to the list.

Efficiency

Programs spend most of their time doing simple operations such as:

A = B
A = B +/- 1
A = B +/- C

No other expressions occur often enough to matter (Knuth 1971). Thus, the language should support these common operations well. Control structure implementations -- in particular, a procedure call -- should be kept efficient. Most languages do all right in this respect: the main thing is to ensure that they keep simple things simple.

Specific Language Notes

This section briefly examines a number of popular and/or interesting language choices. It is important to keep in mind that at some level, all languages are equivalent: anything that you can do in one, you can do in any other, given sufficient CPU time, memory, and programmer elbow grease. However, each language is intended to make solving one type of problem easy, and in most cases that type of problem is not text editing.

TECO

TECO (Text Editor and COrrector) was developed at the Massachusetts Institute of Technology. It was one of the first text editors ever written. It grew over the years, gaining both popularity and features. During one of its more stable periods, Digital Equipment Corporation took a "snapshot" of its commands and produced (subset) versions for all of DEC's computers.

But TECO kept growing. Along the way, it turned into a Turing-complete programming language. Several sets of editor macros were developed and used. Sometime around 1975, Richard Stallman organized these Editor MACroS into the first Emacs-type text editor.

TECO is clearly a language capable of supporting text editing. However, unless you have a DECSYSTEM 20 computer to run it on, you're out of luck: M.I.T.'s version of TECO is written in assembly language and only runs on such systems.

The TECO command set is described in Appendix D. There are two reasons why it is not a good choice as an implementation language:

As has been mentioned, its only implementation is on the PDP-10/DEC 20 series of computers. Implementations on other machines involve answering the question of what you write the TECO in -- the very question that this chapter discusses.
It is the only language less readable than APL. A listing of a TECO program has a more than passing resemblance to transmission line noise. Writing and maintaining TECO programs is a definite problem.

Lisp

Lisp -- especially Common Lisp -- is an excellent choice. It is readily extensible, as even compiled Lisp code usually has provisions for evaluating new expressions. It thus provides an interpretive language that can be readily used to write even complex editing macros. Modern implementations usually have excellent string support. The language has features such as macros and packages that support large projects well, and Lisp programs are fairly readable (if you don't mind lots of parentheses (like these (and these))). Compiled Lisp code is usually as efficient as that of any other language.

Its view of memory management makes it well suited to the linked line form of buffer management (described in Chapter 7).

C

C was designed by people who wrote operating systems and utilities. Since text editors are among those utilities, it is not surprising that C would be a good choice.

C supports extensibility as well as any other compiled language, and better than most. For example, it provides the ability to call procedures through a pointer.

C lacks a built-in string type, but this lack is not a hindrance, as you would probably need to re-implement strings anyway. There is a strong tradition in C of creating new data types, so the requirement is well supported.

C supports many of the features needed for large projects. In addition, as the language was designed by its users, and only came into widespread use after it was stable, there is a large existing base of compatible implementations. Due to this heritage, you don't need "improvements" in the language in order to get useful work done.

C's basic data types are focused around characters, integers, and pointers. These are exactly the core data types needed by text editors. C allows the ready manipulation of complicated data structures and yet remains generally readable.

C++ is a variant of C that provides much improved support for object-oriented programming. It, too, is a good choice.

PL/1

PL/1 is another example of a "systems language." Thus, most of the comments regarding C also apply to PL/1. However, its main failing is a lack of multiple implementations: the only vendor that seriously supports it is IBM Corp.

Other Systems Languages

There are a number of other systems languages (e.g., Modula). However, like PL/1, they have only limited availability. Many were designed as research projects. None of them even distantly approaches C in the number of implementations or trained programmers.

Fortran

Well, some people think that it's a great language for writing astronomy programs. I have even written a text editor in it. Not by choice.

Pascal

Many people consider this language to be a good (read "better") alternative to C. It is worth reviewing Pascal's history: it was originally intended as a language to present (relatively small) algorithms in an academic setting. It was also targeted to introductory programming courses. For those purposes it is an excellent choice.

However, the standard language is not targeted towards developing large projects and does not provide the features that make developing a large project practical. On the other hand, each Pascal vendor has supplied those features. Unfortunately, they have in general chosen different ways to provide the features, leading to incompatible implementations that make porting code difficult.

Basic

Basic has Pascal's problems, only more so: the core version is not even standardized (by the industry: there is an ANSI standard which is honored in the breach). Implementations range from "Tiny Basic," which can be run in only a few kilobytes of memory, to "True Basic," as defined by Kemeny & Kurtz (Kemeny 1985), which offers all the advanced features that you could want and all but omits line numbers. But "True Basic" bears little resemblance to what most programmers think of as the Basic language.

Ada

Ada was designed as a language to support embedded, real-time systems. It has many features which allow compilers to validate code and use external information to produce small, reliable object modules. However, these features do not mesh well with the need for extensibility (for example, there is rarely a need to reprogram an altimeter while in flight). Further, the general computing environment that is the home for most text editors is simply outside the scope of what Ada is intended for. However, it should be seriously examined as a choice if the text editor is to execute in an embedded, real-time system.

Sine

Sine (Anderson 1979) was a Lisp-like language tailored for text applications. Its only implementation to date is on Interdata 7/32 (or Perkin-Elmer 3200) minicomputers running the MagicSix operating system developed at M.I.T.'s Architecture Machine Group. It is interesting because it is a language tailored for implementing editors. It is an example of an "ideal" implementation language.

Sine is composed of two parts. Sine source code is assembled into a compact format. This object code is then interpreted. It allows function rebinding and other such niceties. In addition, the interpreter implements such things as memory management and screen redisplay. Thus, the resulting editor is nicely structured, with "irrelevant" details hidden away. This mention of Sine leads nicely into...

Custom Editor Languages

No traditional language (except perhaps for Common Lisp) offers complete support for text editing. The solution, used by virtually every implementation of Emacs-type text editors, as well as many implementations of other editors, is the creation of a custom editor language.

An existing language -- very often C -- is selected. This language is used to write an interpreter for the custom editor language. The interpreter manages memory, handles display refresh, and in general provides all of the necessary utility functions. The editor language is then used to write the logic of all the user-visible commands.
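The arrangement just described can be sketched in a few dozen lines of C. This is a toy under stated assumptions, not the design of any particular editor: the fixed-size Buffer, the five opcodes, the run function, and the sample command insert_parens are all invented for illustration, and an opcode-oriented interpreter is only one of several possible designs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BUF_MAX = 256 };

/* Hypothetical buffer: a fixed array with a length and a point. */
typedef struct {
    char   text[BUF_MAX];
    size_t len;    /* characters in use */
    size_t point;  /* insertion position, 0..len */
} Buffer;

/* The "editor language": a handful of text-oriented opcodes. */
typedef enum { OP_INSERT, OP_FORWARD, OP_BACKWARD, OP_DELETE, OP_END } Opcode;

typedef struct {
    Opcode op;
    char   arg;  /* used only by OP_INSERT */
} Instr;

/* The interpreter, written in the host language. It owns the buffer
   details, so programs in the editor language never touch memory directly.
   Returns 0 on success, -1 as soon as any primitive fails. */
int run(Buffer *b, const Instr *prog)
{
    assert(b != NULL && prog != NULL);
    for (; prog->op != OP_END; prog++) {
        switch (prog->op) {
        case OP_INSERT:    /* insert arg at point, then move past it */
            if (b->len >= BUF_MAX)
                return -1;
            memmove(b->text + b->point + 1, b->text + b->point,
                    b->len - b->point);
            b->text[b->point++] = prog->arg;
            b->len++;
            break;
        case OP_FORWARD:
            if (b->point >= b->len)
                return -1;
            b->point++;
            break;
        case OP_BACKWARD:
            if (b->point == 0)
                return -1;
            b->point--;
            break;
        case OP_DELETE:    /* delete the character after point */
            if (b->point >= b->len)
                return -1;
            memmove(b->text + b->point, b->text + b->point + 1,
                    b->len - b->point - 1);
            b->len--;
            break;
        default:
            return -1;
        }
    }
    return 0;
}

/* A user-visible command written in the editor language, not in C logic:
   insert "()" and leave the point between the parentheses. */
const Instr insert_parens[] = {
    { OP_INSERT, '(' }, { OP_INSERT, ')' }, { OP_BACKWARD, 0 }, { OP_END, 0 },
};
```

Running insert_parens on an empty buffer leaves the text "()" with the point at 1. Note that the command itself is only data: a new command can be built and handed to run while the editor is executing, with no recompilation of the interpreter.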

As the editor language is implemented using an interpreter, the command set is readily extensible. Also, because the editor language is designed around text editing, it can offer excellent text-handling power.

The division of the programming tasks into two components provides an excellent base for supporting large projects. And, since the interpreter is usually implemented in a language such as C, the interpreter can be quite efficient.

For these reasons, custom editor languages are the preferred method for implementing text editors.

Questions to Probe Your Understanding

What is a good way of implementing a command dispatch table in C? Fortran? Pascal? Ada? (Easy)

Why is a string-oriented language such as SNOBOL not a good choice? (Easy)

How much compilation is appropriate for the custom editor language (none, just interpret the text; tokenization; full)? (Medium)

Following on the previous question, how would an opcode-oriented interpreter compare to a threaded-code interpreter? (Medium)
