Chapter 8: User-Oriented Commands: The Command Loop

He left it dead, and with its head
  He went galumphing back.

The previous two chapters described a way of dividing an implementation into parts and covered the internal sub-editor and redisplay. This chapter describes the last part: the user-oriented commands. This last part is what gives the editor its "feel." It determines the overall command structure (the syntax) and what each of the commands does (the semantics).

Command structure is a large enough topic to be divided into two chapters. This chapter describes how to implement the command structure. The next chapter covers command set design issues. It thus describes what commands should be implemented.

The Core Loop: Read, Evaluate, Print

The command loop is built around a basic core. This core reads in commands, evaluates (or executes) them, and prints the results.

Reading commands is the process of accepting user input and determining what operations the user wishes to perform.

Evaluating commands is the process of carrying out the user's wishes. In general, this is done by executing a series of sub-editor calls.

Printing the results is the redisplay.

The core loop looks like this:

	char c;

	while (1) {
		c = Key_Get();
		if (Evaluate(c)) break;
		Redisplay();
		}

This loop accepts user input (a single character), evaluates it, exits (if the user has requested to quit the editor, causing Evaluate to return True), and invokes Redisplay. This, like all program examples in this chapter, is a simplified version of just one of the many ways you can implement these functions. They are meant as examples, not as limits.

The Evaluate Procedure

	FLAG Evaluate(char c)
		{
		FLAG is_exit = FALSE;
		FLAG is_arg = FALSE;
		int arg = 1;

		while (!(*commands[c])(&is_arg, &arg, &is_exit, c)) {
			c = Key_Get();
			Redisplay();
			}
		return(is_exit);
		}

This is the core of the Evaluate routine. The is_exit flag records whether the command is one to exit the editor. The is_arg flag records whether the user has specified a repeat-count argument. The arg variable records the repeat-count argument.

This routine -- and the editor implementation -- is built around a set of command dispatch tables. Each table is an array of pointers to procedures, indexed by command characters. Thus, the element:

	commands['a']

would specify the procedure to handle the command designated by the "a" character. These procedures all have the same interface. This interface is:

	FLAG Command_Procedure(FLAG *is_argptr, int *argptr,
		FLAG *is_exitptr, char c);

The first three arguments are pointers to the three state variables. They are pointers instead of the values so that the command procedures can alter their values. The fourth argument is the character that is used to invoke the procedure. The procedure returns True if the command has completed, or False if the command is incomplete.

The reasons for selecting this interface will be made clear through four sample command procedures: "move by character," "insert character," "second-level dispatch," and "accept an argument."

Move by a Character

This procedure moves forward by arg characters if arg is positive or backward by arg characters if arg is negative. It looks like this:

	FLAG Move_by_Character(FLAG *is_argptr, int *argptr,
		 FLAG *is_exitptr, char c)
		{
		Point_Move(*argptr);
		return(TRUE);
		}

Insert a Character

This procedure inserts arg copies of the character used to invoke it. If arg is negative, its absolute value is used. It looks like this:

	FLAG Insert_A_Character(FLAG *is_argptr, int *argptr,
		 FLAG *is_exitptr, char c)
		{
		int arg = *argptr;

		if (arg < 0) arg = -arg;
		while (arg-- > 0) Insert_Char(c);
		return(TRUE);
		}

Second-Level Dispatch

This procedure doesn't implement a command itself. Rather, it accepts a second character and uses that to select a command from a second dispatch table. In this case, the "^X" character will be used as the dispatch.

	FLAG Ctrl_X_Dispatch(FLAG *is_argptr, int *argptr,
		 FLAG *is_exitptr, char c)
		{
		char c;

		c = Delayed_Display(CTRL_X_DELAY, "^X ");
		return(*ctrl_x_commands[c])(is_argptr, argptr,
			is_exitptr, c));
		}

The Delayed_Display routine waits for a character and returns it. If more than a specified amount of time passes with no input, the prompt string is displayed.

Note that this routine passes the arguments and exit status to and from the second-level command routine.

Accept an Argument

Again, this procedure doesn't implement a command itself. Rather, it performs one step of accepting a numeric argument. For the purposes of this example, we will assume that all digit characters specify an argument and do not insert themselves.

	FLAG Argument(FLAG *is_argptr, int *argptr,
		 FLAG *is_exitptr, char c)
		{
		if (!*is_argptr) {	/* no arg yet */
			*is_argptr = TRUE;
			*argptr = 0;
			}
		*argptr = *argptr * 10 + c - '0';
		return(FALSE);
		}

This routine is the first one that does not completely execute the command. Rather, it modifies the state information that is passed to the command procedure itself.

(Note: this routine does not implement the Emacs "universal argument" command, but is a simplified version for the purposes of this example only. It actually performes vi-style argument handling.)

Philosophy

The loop as described puts few (theoretical) restrictions on the command syntax. Each character, in its raw form, is mapped to a procedure which is in turn evaluated. State information is passed to and from this procedure, which can either update the state information, perform an operation, or both. Arbitrary syntax and semantics can be implemented with this base.

In theory, a syntax of commands being words (e.g., "delete," "move," etc.) could be implemented in this structure by having either a large number of dispatch tables (and thus implementing a symbol-state table architecture) or a procedure which parses the syntax of the command via conditional statements. If you really want to do one of these, you will want to invent your own -- different -- internal structure.

A Minimalist Command Set Design

Consider the thought that every character that is typed at the keyboard causes a procedure to be executed. The first conclusion that results is that it is silly to type "insert x" or anything like that when you want "x" to be inserted. As this is a very common operation, it makes more sense to bind the key "x" to an "InsertX" function (or, more probably, the Insert_A_Character procedure just defined).

This architecture binds all of the straight, printing ASCII characters to commands that insert the character. The remaining things that can be entered from most keyboards are the control characters, the delete key, and the break key. These could be bound to functions that implement a complex syntax, but why bother? It is not too difficult for users to learn even a large number of key bindings, so let us bind the control keys directly to useful functions. For example, ^F could be "move forward a character," ^D could be "delete the following character," and so forth. Note that the "break" key does not have an ASCII value and is therefore difficult to use without writing operating system-specific code.

Thirty-three functions (the 32 control characters plus the Delete character) are not enough for even the commonly used functions. Thus, some of the keys should be bound to functions which temporarily rebind the dispatch table. For each of these rebinding functions, 128 new functions are made available (there is no reason for the printing characters in those second-level tables to be bound to "self insert").

Thus, even though we began with a structure for the command loop that did not impose any constraints on the syntax of commands (and thus was as general as possible), we arrived at a specific syntax for commands. This syntax is to bind the printing characters to "self insert," bind the control characters to a mixture of useful functions and second-level dispatch tables, and to have three or four alternate dispatch tables (enough to supply many hundreds of commands). Thus, commands are rarely more than two keystrokes long. The price that is paid for this brevity is a possibly longer time learning to use the editor effectively.

Note that most of the increased time spent learning the editor is not from the brevity of the commands, but because there are more commands to learn. Given a "conventional" editor of some other command set design (e.g., insert/replace modes or command lines) and an equivalent subset of this "minimalist" editor, learning times will probably be comparable when the same number of commands from each are covered (assuming sensible command assignments in both cases).

Errors

There are two main types of errors: internal and external. Internal errors are those that occur in the editor itself. Examples are a subscript being out of range and division by zero. External errors are those that are caused by the user. Examples of these are an attempt to delete off the end of the buffer. There are also "non-error" errors, such as a normal exit condition. Errors can be detected both from within the editor and from outside the editor (for example, by the operating system).

Internal Errors

Internal errors will be considered first. These errors cause an immediate exit to the operating system with no questions asked and no delays tolerated. They will be internally generated by such things as arithmetic overflows and bad subscripts. (While the editor might catch and process some of these, it will not in general process them all. This section only discusses the non-processed ones.) These errors are unpredictable and the state of the editor should remain intact.

The user should also be able to signal such an error to abort out of the editor. He or she might want to do this signaling because of a problem with the editor itself (e.g., infinite loop) or because he or she wants to do something else (e.g., suspend this process and do another task). This signaling is usually done with the help of the operating system. In any case, the precise state of the editor should be retained so that it can be resumed exactly where it left off. Most operating systems have some facility for doing this; they differ principally in the freedom of action that they allow before losing the state. This freedom ranges from nothing to doing arbitrarily many other things.

At the user's discretion, the editor should be restartable either from exactly where it left off or at a safe restart point. This point is ordinarily a portion of the editor which recovers the buffers and other current state information and then resumes the command loop. Note that in many implementations, the editor must perform actions both on the process suspension and when it resumes. These actions must handle saving and restoring the state, restoring and saving the display modes, and taking note of any changes in the environment, such as a window resizing.

External Errors

External errors are principally user errors. The action ordinarily taken is the display of an error message and a return to command level. The implementation of this level of recovery is built into the procedures which implement the commands.

There is a variation of external errors which are generated manually by the user. Typically, these involve backing out of an undesired state (e.g., the unwanted invoking of a dispatch table rebinding or aborting an undesired argument). The ^G character has often been used for this purpose. In this case, the procedures will know that this character has been typed and will implement the back-out protocol.

Exiting

Finally, provisions to exit the editor must be made. These provisions often take the form of a flag variable such as the is_exit variable described earlier.

Note that various other uses might be multiplexed onto this flag, signifying varying levels of "exiting." For example, one level could be used by buffer switching in order to rebind the dispatch tables (see the section on modes later in this chapter). Alternatively, the different functions could use multiple flag variables.

Ordinary exiting involves several types of processing. The editor might ask the user what to do with buffers that have been modified but not written out. If, as is ordinarily assumed, the state of the editor is preserved across invocations, the state must be saved. If not, it must be sure that all memory is deallocated. Finally, the user's environment should be restored as it was found. This implies such varied things as cleaning up the stack, closing files, deallocating unneeded storage, and resetting terminal parameters.

Arguments

Arguments are specified by the user to modify the behavior of a function. The Emacs argument mechanism will be described as an example of three diverse ways in which arguments are obtained.

There are three standard argument types. First are numeric (prefix) arguments. These are invoked by a string of functions (which are in turn invoked by characters typed before the "actual" command character) and are an example of using the key/function binding to implement a more complicated syntax. Next are string (suffix) arguments. When obtaining a string argument, the editor is invoked recursively on an argument buffer, and upon return from the recursive invocation the contents of that buffer are given to the requesting procedure. Last are positional arguments. These are the internal variables of the editor.

Numeric (Prefix) Arguments

Prefix arguments are entered before the command whose behavior the arguments are modifying, thus, their syntax does not depend upon the command. The interpretation of prefix arguments can vary from command to command. Emac type editors limit these arguments to numeric values.

Ordinarily, commands will have an internal variable available to them named something like arg, and it will have a value of one. Prefix arguments allow the user to change that value to any other positive or negative integer. It is useful to provide a mechanism for command procedures to determine whether an argument has been given at all. This mechanism allows the procedures to handle the default case where no arguments are supplied differently than the case where an argument is supplied.

Each command uses arguments for different, but related, purposes.

The first purpose is to specify a repeat count for a command. Thus, specifying an argument of "12" to the "forward character" command would cause the command to move forward 12 characters.

The second purpose is to tell a command to use a specific value. For example, it doesn't make sense to say "move to the end of the buffer" 12 times. Instead, that command might interpret its argument as a line number and move to the specified line of the buffer. In this case, the "default" value would be the (end of the) last line.

An Emacs-type text editor uses the ^U character as the "universal argument" function. It can be used in either of two ways. ^U command means to supply an argument of "4" to command. Adding another ^U means to multiply the current argument by four. Thus, ^U ^U ^U command means to supply an argument of 64 to the command. The factor of 4 was selected because 5 is too large (1, 5, 25, 125 goes up too fast) and, while 3 might have better spacing (1, 3, 9, 27, 81, 243), the powers of 4 are known by all people who are likely to be around computers. In addition, on a 24 x 80 display, 64 is about the number of characters per line and 16 is 2/3 of the screen height.

The other use is to specify a value exactly. ^U number command means to supply an argument of number to the command. For example, ^U - 1 4 7 command means to supply an argument of -147 to the command. The ^U in this case serves as an "escape" to logically rebind the digit and "-" keys. If you want to supply an argument to the commands normally invoked by the digit and '-' characters, you use the quote command, located on ^Q.

On some terminals, there are two sets of numeric keys. One set is across the top row and always sends the ASCII code for the corresponding digit character. Another set may form a numeric pad and its keys can be configured to send either the ASCII codes for the digit characters or different codes. In this case, these "other numbers" can be bound directly to functions that set up the implied arguments and the initial ^U is not needed.

String (Suffix) Arguments

Numeric arguments are made available in the same way to all commands. Suffix arguments, however, must be explicitly requested by the commands that use them. A command may also request multiple suffix arguments. Most suffix arguments are for strings, not numbers.

The program notifies the user of the string argument by displaying a prompt. This prompt indicates the type of argument that is requested. The user responds by entering the value up through and including a terminating character. The command then proceeds to execute, using the value in whatever way is appropriate.

The following points should be taken into consideration regarding string arguments.

First, the prompt should clearly state what is being asked for -- for example, "Name of the file to be read."

Second, the key or key sequence used to terminate the end of the string should be able to vary and should be indicated in the prompt -- for example, "Name of the file to be read (Return): " or "String to search for (ESC): " There should be a way to cleanly abort out of the prompt and its requesting command. This should be the same command used for the "abort" command (e.g., ^G).

Third, in order to facilitate the abort process, the command should first ask for all user input, and only then perform any actions. This organization means that any abort will result in no effect rather than leave inconsistent state information.

Each command that requests a string prompt should provide a default value for the prompt. This value should be used if the user enters a null response. The value should be the program's best guess about what value the user would most likely want to enter. If no other guess is available, the last value entered should be used.

Here are some examples of string arguments:

Here are some example prompts:

While these examples were requesting a character string, this need not always be the case. For example, to enter numeric values, the requesting procedure merely has to convert the read-in character string to a numeric value. An example of such a command would be a "go to line" command.

One way to implement the routine that accepts string arguments is to use a variation of the Get_Line routine defined in the Introduction. However, a better way to implement this routine is to create an argument buffer in a new window, display a prompt, and call the editor recursively with that as the current buffer. By following this scheme, the full power of the editor is available to correct typing mistakes or otherwise make the entry process easier. It has the additional advantage of not creating a new "mode": the user is free to continue editing while responding to the prompt.

Further, the full power of the editor can be brought to bear on a problem. For example, suppose that someone sends you a mail message that says "the answer is in file X." While reading the mail message, you give the "find file" command. This command prompts you to enter a file name. You switch buffers (from the prompt buffer to the mail message buffer), copy the file name, switch back, and paste it into the prompt, then type the prompter terminator. Voila! A fully integrated, modeless environment.

Finally, the prompts need not be "lifeless" and "passive." A passive prompt just accumulates the input until complete, then passes it back as a block. It has no interaction. A "lively" and "active" prompt offers interaction with the user. For example:

Positional Arguments

Positional arguments are not directly specifiable by the user. They are the editor's internal state variables. Such variables include both those required by the editor (e.g., the length of the buffer, the locations of the point and the mark, etc.) and those which have a specialized purpose (e.g., the current value of the right-hand margin, the tab spacing, etc.).

Often these values are used in unusual ways. For example, the horizontal position (column) of the point can often be a more pleasant way of specifying a value than entering a number. The user can indicate that "this is where I want the right margin to be" instead of having to count characters to get a number.

A specialized positional argument is the region. This is the range of text delimited by the point and the mark. By convention, it does not matter whether the point or the mark is placed earlier in the buffer.

Selection Arguments

The use of graphical input devices opens up new ways of issuing commands and specifying arguments. For example, the cursor can be moved by a graphical input device as well as the more traditional point-motion commands. In addition, a region can be specified by a "click and drag" operation (or whatever sequence is used by the operating system).

Rebinding

Binding is the act of connecting a name and a meaning, rebinding the act of changing the binding. In the case of editors, there are two different levels that binding (and rebinding) can occur on.

The first is at the key level. Binding in this case mean attaching an operation to a key. These bindings are often implemented by means of a dispatch table.

The second is at the function level. Binding in this case means attaching a procedure to an operation. Again, these bindings are often implemented by means of a dispatch table.

For example, the alphabetic keys may be bound to the "insert" operation. This operation, in turn, can be bound to a variety of procedures:

Implementations can perform at one of two levels of rebinding: static and dynamic. Static rebinding is when the new procedure is known about at the time that the editor is invoked. All implementations can perform this level of rebinding. Dynamic rebinding is possible when the new procedure can be defined after the editor is invoked. Unless otherwise stated, this discussion assumes dynamic rebinding.

To a first approximation, editors that are written in compiled languages (e.g., C and Pascal) can only perform static rebinding, and editors that are written in interpreted languages (e.g., Lisp) can also perform dynamic rebinding. Dynamic linking, however, allows compiled editors to include new procedures at run time, and so this distinction is not always a proper one to make. Dynamic bindings are also possible when a compiled language is used to implement an interpreted language, which in turn implements at least the user command portion of the editor.

Rebinding Keys

The process of key rebinding is relatively simple and is done essentially the same way in all implementations. A set of dispatch tables is used to map keys (represented by their ASCII values) to their respective functions.

In languages such as C and Lisp, the table can contain the pointer to the procedures themselves. In less powerful languages such as Fortran and Pascal, the dispatch table branches to a different part of the same routine that contains the table. There, the procedure call is made. In languages that supply it, a case statement can be used instead of the n-way branch.

All of these command procedures have the same formal parameters, and so they can all be invoked with the same calling sequence. Thus, the C and Lisp direct invocations can work properly. Note also that simple commands do not have to have a separate procedure assigned to them, but the code to execute them can be placed in-line in place of a call (where the case-statement equivalent is used). Making this substitution loses some potential flexibility.

Rebinding Functions

Dynamic rebinding is ordinarily a language-supplied feature and so it will not be discussed in depth. Two comments will, however, be made on how to simulate it.

If the underlying operating system has dynamic linking (e.g., Multics, OS/2, and some new UNIX systems), a procedure may be rebound at run time. Dynamic linking is a way of linking procedures together in which the actual link is not made until the procedure is about to be executed. At that time, the procedure is located in the file system and brought into memory. The link may either be left alone, in which case the next call will have the procedure re-located (a relatively expensive process) or it may be snapped. Snapping a link is the process of converting the general call instruction (which is kept in a special, writeable part of the program) into a call instruction to the appropriate address. If a link is snapped, it must be explicitly unsnapped before any rebinding is done.

If the operating system does not support dynamic linking, you might choose to simulate it manually. Such a process is complex, and some thought will have to be given to the desirability of rebinding functions. The process is tantamount to explicit overlaying.

This all has a straightforward bearing on rebinding functions. Rebinding a function involves changing the definition of the procedure that is invoked by referencing it. What has been discussed are ways of changing such a procedure definition. Note that if the code to execute a function is inserted in-line in the basic editor, it cannot be rebound by any of these methods.

If dynamic linking is not available and is not feasible to simulate, there is still one way out. This way will only provide static rebinding. Instead of just using one dispatch table which indicates a procedure to be called directly, use two. Use use the first table to map from keys to the operation to be performed (e.g., ^F is mapped to "moving forward one character") and the second table to map from the operation to be performed to a procedure that will perform it (e.g., "moving forward one character" is mapped to the Forward_Char procedure).

Modes

A mode is a collection of command rebindings. Modes can be invoked implicitly, explicitly, or automatically.

An implicitly invoked mode is one that is not visible to the user. Implicit modes are used to support large, infrequently used commands. For example, suppose that you had an editor command that played the game Adventure. You probably wouldn't want the code for that command to be occupying resources whenever you were using your editor for editing. However, you might still want to make your "adventure" command available at all times. In this case, you would use an implicit mode. The "adventure" command would then take these steps:

From now on, whenever the user gives the "adventure" command, the editor will directly execute that code.

An explicitly invoked mode is one that the user asks to use. Examples of such modes are "auto fill" mode, "auto save" mode, and alternate command sets. The common element is that the user gives a command, knowing that that command itself has no function other than to persistently alter the key bindings.

An automatically invoked mode is one which the implementation determines is appropriate to invoke, based on a command given by the user that "appeared" to do something else.

One example of an automatically invoked mode is a language mode (for example, a "C" mode). This mode will automatically be invoked whenever the user edits a C source file (by convention, one whose name ends in ".c" or ".h"). Such a mode might do the following:

And so forth. Another example of an automatically invoked mode is the specialized mode that the editor places you in when executing such commands as "help," "read mail," and "view a directory." In these commands, the user is effectively placed in a specialized application that shares as much as possible with the regular editor commands, but does have its own extra commands. For example, in the "view a directory" application, the "d" key might delete a file, the "r" key might rename a file, and so forth. However, the "buffer" or "window" switch command should still be available so that the user can perform other editing while the special application is active.

Modes and Dynamic Rebinding

The function rebindings that are commonly done by an editor are known in advance and so they can be done by any implementation (see the preceding section for a discussion of the difficulties involved in function rebinding). Fully dynamic rebinding (the new definition of the procedure is not known until run time) is desirable for several reasons:

Implementing Modes

Modes are defined on a per-buffer basis and so an implementation must provide for changing these bindings as the current buffer is switched. The general technique for doing this is to have a set of default bindings for the editor, a set of current bindings for each buffer, and a set of procedures that can be invoked to change the former into the latter. When a buffer switch is made, the current bindings are used to dispatch all commands.

Whenever a change to the mode list is made -- especially one that removes a mode -- the editor must initialize the current bindings to the default bindings, then invoke each mode procedure in turn to make its changes.

Changing Your Mind

This section discusses the methods used to help users who want to change their minds about an editing command.

Command Set Design

By far the most effective step that you can take is to design the command set to minimize both state variables and changes of perspective. Such proper design is far more effective than any other tool. However, as these topics are covered in the next chapter and in Chapter 1, they won't be discussed here.

Kill Ring

The most basic way in which a user can change his or her mind is to delete something, then say "oops, I didn't want to delete that." After all, if the user inserted extra text, deleting it is easy and straightforward. However, retyping accidentally deleted text is in general neither easy nor straightforward. Hence, an important feature to provide is the ability to save that text for the user. This feature can be added as a single-level save (referred to as the kill buffer) or a multiple-level save (the kill ring).

As an extra benefit, once the deleted text is saved, that feature can be used for "cut and paste" operations. In addition, the saved text can be tied into the system "clipboard" or similar facility.

An Emacs-type editor records multiple text-deletions in a "kill ring" (for historical reasons, commands that save the deleted text were called "kill" commands). Some small but fixed number of the successive deletions are stored together. The "yank" or "paste" command retrieves the last such deletion (inserting it at the point). Alternate "yank" commands are available that cycle through the kill ring, re-deleting the last-yanked text and replacing it with the next item. (A ring is better than a single kill buffer because it can store multiple deletions. It is superior to a stack of separate buffers because of the ease with which various "undeletions" can be tried out.)

It is easy to implement such a kill ring. A buffer is designated as the one to hold the deleted text and the commands that perform the deletion simply copy the text that they are about to delete to that buffer before performing the actual deletion. There are two fine points to the implementation.

First, successive deletion commands should add to the current deletion, not create a new one. Thus, the Emacs commands:

	^U ^[ d

which deletes the next four words, should have exactly the same effect as:

	^[ d ^[ d ^[ d ^[ d

i.e., four successive "delete word" commands. In both cases, all four words should be part of the same deletion.

Second, your implementation must take care to delete properly. Deleting following items (characters, words, sentences, etc.) should add to the end of the deleted text. Deleting previous items should add to the beginning of the deleted text. Not coincidentally, all deletion commands in the command set of Emacs-type editors are of the form "delete following..." or "delete previous..." with the exception of the "delete current line" command. This command must take the text from the point to the end of the line and add it to the end of the deleted text, and take the text from the point to the beginning of the line and add it to the start.

Undo

The "kill ring" approach requires explicit support from the user commands and provides considerable power, yet there are many ways in which it does not make it easy for a user to change his or her mind. An "undo" facility is the most general way that you help a user when he or she changes his or her mind.

In principle, an undo facility provides a mechanism for reversing changes made by the user. These effects can be as simple as moving the point or inserting or deleting a character, or as complex as a file write or global replace.

Note, however, that you may still want to provide the kill ring, as it offers both one intuitive (if limited) type of undo, as well as operations that undo cannot perform. For example, you cannot implement "cut and paste" with undo, except for the limited case where you only want to paste the text exactly where you cut it from!

Unlike the kill ring, which requires explicit support in the user commands, the best place to provide support for undo is in the sub-editor interface. Note that this is in the interface, not the sub-editor itself, although in general the undo facility will work closely with the sub-editor.

The support works like this: each sub-editor procedure that makes any change to a state variable or the buffer first makes a note of what is to be changed, then records the pre-change value, then finally makes the change. For example, the Point_Set routine would record "the point is about to change," and the old (current) location of the point. It would then change the point's location. Note that some procedures (such as the ones to delete a block of text) must record arbitrarily large amounts of state information.

This type of recording allows you to back up as far as you like. Since, in essence, each change to the state information is recorded, all earlier states are recoverable by reversing each state change (in reverse order, of course).

There is more to implementing undo than just recording state changes, but the additional items are more icing than cake.

First, most undo commands operate on a user-command, not sub-editor call, basis. Thus you must also record when a new user command is given. Thus, each undo then undoes consecutive changes until it reaches such a command marker. In this way, even a complex global replace can be undone. Note that in general you will wish to have repeated undo commands undo successively-earlier other commands.

Second, it probably makes sense to undo an entire consecutive set of newly typed characters as a single command.

Third, the resources available to retain the undo information may be limited. This design minimizes that problem, as the "excess" undo state can simply drop off the end. The part that is retained will still be consistent.

Fourth, the operating system may not support the undoing of file-level operations or other commands. In some cases, you can simulate such undoing (say, by making a backup file), but in general you will have to live with some limits to undo. (Undoing a print operation after the printing is complete is quite difficult.)

Fifth, state information is kept in places other than in the sub-editor. These other places must also be incorporated into the undo facility.

An Undo Heresy

Is undo a nice feature to offer? Yes. Is it vital to an editor? Probably not. Will adding it make a poorly designed editor into a good one? No. Will it make such an editor acceptable? Maybe. As was said earlier, the best way to help the user is with a good command set design: it will minimize the need or desire for an undo.

While undo is a general-purpose facility that has good applications, it is not clear that a text editor is one of them. The "good design" approach (using the Emacs command set as an example) and the undo approach will now be compared in their approach to moving around in text and deleting text.

Moving around in text is simply solving the problem "I am at X and I want to be at Y." The good design solution involves translating this difference into a sequence of commands to move the point from X to Y. If a mistake is made in the process of implementing the solution, the problem is merely restated to "I am at X' and I want to be at Y" and it is re-solved. The undo solution differs by detecting the error (i.e., deviation from the intended solution), saying "undo" to put you back on the original path, and proceeding. Ordinarily this difference between the two solutions is not very great.

If the user has accidentally moved a large distance (say, to the start of the buffer), it becomes a little more difficult for the user to recover his or her earlier position. Emacs-type editors resolve this issue by having the large-movement commands set the mark to where you were. Thus, an interchange point and mark sequence will recover from the error. Keep in mind that almost all of the time, the user does not care where the mark happens to be.

The two approaches are all but identical in the text deletion case. On the one hand, accidentally-deleted text is recovered with a "yank" command, and on the other hand, with an "undo" command.

In conclusion, undo is a nice extra feature, but is no substitute for a good design.

Redo

Redo is the mechanism for undoing an undo. Conceptually, the record of undos is:

  1. most-recent command changes
  2. next-most-recent command changes
  3. next-next-most-recent command changes
  4.      ...

The first invocation of the "undo" command undoes #1. The next invocation undoes #2, and so forth. Redo redoes the most recent undo, with repeated "redo" commands moving back up the undo stack. Let's look at an implementation:

	FLAG Undo(FLAG *is_argptr, int *argptr, FLAG *is_exitptr, char c)
		{
		if (undo_ptr == NULL) undo_ptr = last_command_undo;
		Undo_Command(undo_ptr);
		undo_ptr = Previous(undo_ptr);
		return(TRUE);
		}

	FLAG Redo(FLAG *is_argptr, int *argptr, FLAG *is_exitptr, char c)
		{
		if (undo_ptr == NULL)
			Error("No undo to redo");
		else {
			Redo_Command(undo_ptr);
			undo_ptr = Next(undo_ptr);
			}
		return(TRUE);
		}

and, in the main command loop:

	if (last_cmd != Undo && last_cmd != Redo) undo_ptr = NULL;

The Undo procedure checks and, if it hasn't been called "recently," starts at the latest command. It undoes the command, then sets a pointer to point to the undo information for the previous command.

The Redo procedure checks and, if Undo hasn't been called "recently," can't run, as there is no undo to redo. Otherwise, it redoes the command and sets a pointer to point to the undo information for the next command.

The main command loop determines whether the Undo or Redo procedures have been called "recently." Basically, if the last command wasn't undo or redo, it resets the undo pointer to null.

This implementation allows arbitrary undoing and redoing in any combination, so long as the commands are given sequentially. Bear in mind that with the "change in state" recording of undo information, it is only legal to apply those changes in the correct order.

This implementation ignored the arguments to the undo and redo commands. Feel free to assign reasonable interpretations to the arguments in your implementations.

Macros

In a sense, macros allow a user to give commands by specifying them implicitly instead of explicitly. Macros fall into three general levels.

Again

The "again" facility allows a user to say "do what I just did again." For it to be most useful (i.e., easier to type than retyping the command), it should be assigned to a short key sequence (i.e., one shifted key).

The "repeat count" argument to the "again" command should be used instead of the previous "repeat count" argument to the command being repeated.

Keystroke Recording

This facility allows the user to say "start recording," then give a series of commands (observing their effects as they are typed), then say "stop recording." Later, the entire sequence of keystrokes can be replayed with a "play recording" command. A "repeat count" argument to the "play recording" command will cause the recording to be replayed the specified number of times. Note that commands within the recording can have repeat count arguments of their own.

Macro Languages

Finally, the editor should provide the user full access to the editor's macro language (if any). This language will in general provide a full programming language, thus allowing the user to specify an arbitrary set of editing operations as well as a way of naming these procedures for later invocation and key rebinding.

Redisplay Interaction

The introduction of keystroke recording and macro languages only servers to underscore the separation between the redisplay code and the rest of the editor described in the previous chapter. The playback of recorded keystrokes will almost certainly complete with no intervening redisplay. Thus, if any of the code based its actions on the current window contents, it would almost certainly execute incorrectly.

Questions to Probe Your Understanding

Expand the main command loop to include other types of input such as mouse operations. (Easy)

Generalize the main command loop and related code to use a general event-driven mechanism. (Medium)

Write routines to move by a word and to delete a sentence, but be sure to consider all of the punctuation and white space aspects of the problem. (Medium)

What is a good memory management scheme to use for holding the undo information? (Medium)

What fundamental support is required in the editor to best implement the keystroke recorder? (Easy)




Copyright 1999 by Craig A. Finseth.

Back to Top.

Back to Contents.

Back to Home.