6.102
6.102 — Software Construction
Spring 2024

Reading 19: Little Languages

Software in 6.102

Safe from bugsEasy to understandReady for change
Correct today and correct in the unknown future. Communicating clearly with future programmers, including future you. Designed to accommodate change without rewriting.

Objectives

In this reading we will begin to explore the design of a little language for constructing and manipulating music. Here’s the bottom line: when you need to solve a problem, instead of writing a program to solve just that one problem, build a language that can solve a range of related problems.

The goal for this reading is to introduce the idea of representing code as data and familiarize you with an initial version of the music language.

In the process, we will introduce the Visitor pattern, which is another example of treating functions as first-class values, and an alternative implementation strategy for operations on recursive data types. Visitors are a common feature of little languages implemented as recursive data types.

Representing code as data

Recall the Formula data type from Recursive Data Types:

Formula = Variable(name:String)
          + Not(formula:Formula)
          + And(left:Formula, right:Formula)
          + Or(left:Formula, right:Formula)

We used instances of Formula to take propositional logic formulas, e.g. (p ∧ q), and represent them in a data structure, e.g.:

And(Variable("p"), Variable("q"))

In the parlance of grammars and parsers, formulas are a language, and Formula is an abstract syntax tree.

But why did we define a Formula type? TypeScript already has a way to represent expressions of Boolean variables with logical and, or, and not. For example, given boolean variables p and q, we can just write:

p && q

Done!

The answer is that the code expression p && q is evaluated as soon as we encounter it in our running program. The Formula value And(...) is a first-class value that can be stored, passed and returned from one function to another, manipulated, and evaluated now or later (or more than once) as needed.

The Formula type is an example of representing code as data, and we’ve seen others. Here is a first-class function:

const andFunction = (p: boolean, q: boolean) => p && q;

Building languages to solve problems

When we define an abstract data type, we’re extending the universe of built-in types provided by TypeScript to include a new type, with new operations, appropriate to our problem domain. This new type is like a new language: a new set of nouns (values) and verbs (operations) we can manipulate. Of course, those nouns and verbs are abstractions built on top of the existing nouns and verbs which were themselves already abstractions.

A language has greater flexibility than a mere program, because we can use a language to solve a large class of related problems, instead of just a single problem.

  • That’s the difference between writing p && q and devising a Formula type to represent the semantically-equivalent Boolean formula.

  • And it’s the difference between writing an integer addition function and devising a IntegerExpression type to represent integer arithmetic expressions — and store them, manipulate them, rearrange them, evaluate them, and so on.

This kind of language is called a domain-specific language (DSL), because it solves problems in a narrower domain than a general-purpose programming language like TypeScript or Python. DSLs can be further classified into external and internal. External DSLs have custom syntax and semantics, independent of any general-purpose programming language. Examples that we’ve already seen in this class include regular expressions, ParserLib grammars, and the language of problem set 3. By contrast, an internal DSL is embedded in a general-purpose programming language, using the syntax and abstraction mechanisms of the host language rather than inventing its own. First-class functions enable us to create particularly powerful internal DSLs because we can capture patterns of computation as reusable abstractions. The music language that we’ll develop in this class is an example of an internal DSL.

Music language

In class, we will design and implement a language for generating and playing music. We’ll first see how to write a program to play music with the MIDI synthesizer. Then we’ll begin to develop our music language by writing a recursive abstract data type for simple musical tunes. We’ll choose a notation for writing music as a string of characters, and we’ll implement a parser to create instances of our Music type.

See the full source code for the basic music language.

Download the ZIP file at that link, unpack it, and prepare it with npm install, so that you can run the code and follow the discussion below.

Playing MIDI music

MidiSequencePlayer uses the Jazz MIDI API to play sequences of notes. This code is somewhat involved, and you don’t need to understand how it works.

MidiSequencePlayer implements the SequencePlayer interface, allowing clients to use it without depending on the particular MIDI implementation. We do need to understand this interface and the types it depends on:

addNote : SequencePlayer × Instrument × Pitch × number × number → void (SequencePlayer.ts) is the workhorse of our music player. Calling this method schedules a musical pitch to be played at some time during the piece of music.

start : SequencePlayer → Promise<void> (SequencePlayer.ts) actually plays the music. Until we call this method, we’re just scheduling music that will, eventually, be played.

The addNote operation depends on two more types:

Instrument is an enumeration of all the available MIDI instruments.

Pitch is an abstract data type for musical pitches (think keys on the piano keyboard).

Read and understand the Pitch documentation and the specifications for its public factory function and all its public methods.

Our music data type will rely on Pitch in its rep, so be sure to understand the Pitch spec as well as its rep and abstraction function.

Using the MIDI sequence player and Pitch, we’re ready to write code for our first bit of music!

Read and understand the examples/ScaleSequence code.

Run ScaleSequence using npm run ScaleSequence. You should hear a one-octave scale!

If MIDI does not work on your machine in Node.js, you can try using a web browser instead.
Run npm run WebServer, browse to the localhost URL it prints out, follow the ScaleSequence link, and click run main!

reading exercises

transpose

Pitch.transpose(number) is a:

(missing explanation)

Pitch

Which observers could MidiSequencePlayer feasibly use to determine what frequency an arbitrary Pitch represents?

(missing explanation)

addNote

SequencePlayer.addNote(..) is a:

(missing explanation)

Music data type

The Pitch data type is useful, but if we want to represent a whole piece of music using Pitch objects, we should create an abstract data type to encapsulate that representation.

To start, we’ll define the Music type with a few operations:

notes : String × Instrument → Music (MusicLanguage.ts) makes a new Music from a string of simplified abc notation, described below.

duration : Music → number (Music.ts) returns the duration, in beats, of the piece of music.

play : Music × SequencePlayer × number → void (Music.ts) plays the piece of music using the given sequence player.

We’ll implement duration and play as instance methods of Music, so we declare them in the Music interface.

notes will be a factory function; rather than put it in Music (which we could do, as with Pitch), we’ll put it in a separate file: MusicLanguage will be our place for all the functions we write to operate on Music.

Now that we’ve chosen some operations in the spec of Music, let’s choose a representation.

  • Looking at ScaleSequence, the first concrete variant that might jump out at us is one to capture the information in each call to addNote: a particular pitch on a particular instrument played for some amount of time. We’ll call this a Note.

  • The other basic element of music is the silence between notes: Rest.

  • Finally, we need a way to glue these basic elements together into larger pieces of music. We’ll choose a tree-like structure: Concat(first:Music,second:Music) represents first followed by second, where first and second are any music.

    This tree structure turns out to be an elegant decision as we further develop our Music type later on. In a real design process, we might have had to iterate on the recursive structure of Music before we found the best implementation.

Here’s the data type definition:

Music = Note(duration:number, pitch:Pitch, instrument:Instrument)
        + Rest(duration:number)
        + Concat(first:Music, second:Music)

Composite

Music is an example of the composite pattern, in which single objects (primitives, e.g. Note and Rest) and groups of objects (composites, e.g. Concat) all belong to the same type, so they can be treated the same way.

  • The recursive data types we’ve been using, like Formula, have been examples of the composite pattern.

  • HTML’s Document Object Model (DOM) relies heavily on the composite pattern: there are primitive elements like <img> and <input> that don’t have children, and composite elements like <div> and <span> that do contain other elements as children. All of them implement the common Element interface.

The composite pattern gives rise to a tree data structure, with primitives at the leaves and composites at the internal nodes.

Emptiness

One last design consideration: how do we represent the empty music? It’s always good to have a representation for nothing, and we’re certainly not going to use null or undefined.

We could introduce an Empty variant, but instead we’ll use a Rest of duration 0 to represent emptiness.

Implementing basic operations

First we need to create the Note, Rest, and Concat variants. All three are straightforward to implement, starting with constructors, checkRep, some observers, toString, and the equality methods.

  • Since the duration operation is an instance method, each variant implements duration appropriately.

  • The play operation is also an instance method; we’ll discuss it below under implementing the player.

We’ll discuss the notes operation, implemented as a static method in MusicLanguage, under implementing the parser.

Read and understand the Note, Rest, and Concat classes.

To avoid representation dependence, let’s add some additional factory functions for building Music instances:

note : number × Pitch × Instrument → Music (MusicLanguage.ts)

rest : number → Music (MusicLanguage.ts)

concat : Music × Music → Music (MusicLanguage.ts) is our first producer operation.

All three of them are easy to implement by constructing the appropriate variant.

reading exercises

Music rep

Which of the following represent a middle C followed by A above middle C?

(missing explanation)

Music notation

We will write pieces of music using a simplified version of abc notation, a text-based music format.

We’ve already been representing pitches using their familiar letters. Our simplified abc notation represents sequences of notes and rests with syntax for indicating their duration, accidental (sharp or flat), and octave.

For example:

C D E F G A B C' B A G F E D C represents the one-octave ascending and descending C major scale we played in ScaleSequence. C is middle C, and C' is C one octave above middle C. Each note is a quarter note.

C/2 D/2 _E/2 F/2 G/2 _A/2 _B/2 C'/2 is a different ascending scale, played twice as fast. The E, A, and B are flat (so this is a C minor scale). Each note is an eighth note.

Read and understand the specification of notes in MusicLanguage.

You don’t need to understand the parser implementation yet, but you should understand the simplified abc notation enough to make sense of the examples.

If you’re not familiar with music theory — why is an octave 8 notes but only 12 semitones? — don’t worry. You might not be able to look at the abc strings and guess what they sound like, but you can understand the point of choosing a convenient textual syntax.

reading exercises

Simplified abc syntax

Which of these notes are twice as long as E/4?

(missing explanation)

Implementing the parser

The notes method parses strings of simplified abc notation into Music.

notes : String × Instrument → Music (MusicLanguage.ts) splits the input into individual symbols (e.g. A,,/2, .1/2). We start with the empty Music, rest(0), parse symbols individually, and build up the Music using concat. Can you think of how to accomplish this same thing with some combination of map, filter, reduce?

parseSymbol : String × Instrument → Music (MusicLanguage.ts) returns a Rest or a Note for a single abc symbol (symbol in the grammar). It only parses the type (rest or note) and duration; it relies on parsePitch to handle pitch letters, accidentals, and octaves.

parsePitch : String → Pitch (MusicLanguage.ts) returns a Pitch by parsing a pitch grammar production. You should be able to understand the recursion — what’s the base case? What are the recursive cases?

reading exercises

parsePitch

Which of the following are valid inputs to parsePitch that are handled by its base case, without any recursion?

(missing explanation)

Implementing the player

Recall our operation for playing music:

play : Music × SequencePlayer × number → void (Music.ts) plays the piece of music using the given sequence player after the given number of beats delay.

Why does this operation take atBeat? Why not simply play the music now?

If we define play in that way, we won’t be able to play sequences of notes over time unless we actually wait during the play operation, for example by setting a timer. Our sequence player’s addNote operation is already designed to schedule notes in the future — it handles the delay.

With that design decision, it’s straightforward to implement play in every variant of Music.

Read and understand the Note.play, Rest.play, and Concat.play methods.

You should be able to follow their recursive implementations.

Just one more piece of utility code before we’re ready to jam: MusicPlayer plays a Music using the MidiSequencePlayer. Music doesn’t know about the concrete type of the sequence player, so we need a bit of code to bring them together.

Bringing this all together, let’s use the Music ADT:

Read and understand the examples/ScaleMusic code.

Run ScaleMusic using npm run ScaleMusic. You should hear the same one-octave scale again.

That’s not very exciting, so read examples/RowYourBoatInitial and run it using npm run RowYourBoatInitial. You should hear Row, row, row your boat!

Can you follow the execution of the code from calling notes(..), to returning an instance of Music, to the recursive play(..) call, and then individual addNote(..) calls? If you have trouble understanding the flow of the code, using a debugger to step through the code may help.

reading exercises

One note

Consider a Music object created by parsing abc for just the first note of Row, row, row your boat:

const oneNote: Music = notes("C", PIANO);

Use Snapdown to draw a snapshot diagram showing the oneNote after this line of code has executed.

oneNoteTODONotedurationpitchinstrument1.0Pitchvalue0InstrumentPIANO

How many Music objects are used in the representation of oneNote?

(missing explanation)

Two notes

Now consider this Music object:

const twoNotes: Music = notes("C D", PIANO);

How many Music objects are used in the representation of twoNotes?

(missing explanation)

Many notes

There are 27 notes in Row, row, row your boat.

Given the implementation, how many Music objects will be created by the notes call in RowYourBoat­Initial?

(missing explanation)

duration

What should be the result of rowYourBoat.duration?

(missing explanation)

Music

Assume we have

const r: Music = rest(1);
const p: Pitch = Pitch.make('A').transpose(6);
const n: Music = note(1, p, Instrument.GLOCKENSPIEL);
const s: Array<Music> = [r, n];

Which of the following is a valid Music?

(missing explanation)

Functions on recursive types

Since our introduction to recursive data types, we’ve been implementing functions on these types using the Interpreter pattern:

  • declare the operation as an instance method in the interface that defines the data type, and
  • implement the operation in each concrete variant.

In terms of the Composite pattern, composite variants (e.g. Concat in Music, Not in Formula) will implement the operation recursively, and primitive variants (e.g. Note in Music, Variable in Formula) will implement base cases.

For example, here’s the Boolean formula recursive data type again:

Formula = Variable(name:string)
          + Not(formula:Formula)
          + And(left:Formula, right:Formula)
          + Or(left:Formula, right:Formula)

Suppose we want to add an operation variables to Formula that returns the set of all variable names mentioned in the formula. We saw how to do this in a reading exercise in Recursive Data Types (at the time, using immutable set type ImSet).

reading exercises

Implementing the variables operation

We would implement the operation by modifying the interface to specify it:

interface Formula {
    // ... other operations ...
    /** @returns all the variables that appear in this Formula */
    variables(): Set<string>;
}

and modifying all the variant classes, implementing the base case:

class Variable implements Formula {
    public readonly name: string;
    // ... constructor, other operations ...

    public variables(): Set<string> {
        return new Set([this.name]);
    }
}

and the recursive cases (fill in the blank):

// assume we have this helper function:
/** @returns the union of set1 and set2 */
function setUnion(set1: Set<E>, set2: Set<E>): Set<E>;
class Not implements Formula {
    public readonly formula: Formula;
    // ... constructor, other operations ...
    public variables(): Set<string> {
} } class And implements Formula { public readonly left: Formula; public readonly right: Formula; // ... constructor, other operations ... public variables(): Set<string> { return setUnion(this.left.variables(), this.right.variables()); } } class Or implements Formula { public readonly left: Formula; public readonly right: Formula; // ... constructor, other operations ... public variables(): Set<string> { return setUnion(this.left.variables(), this.right.variables()); } }

(missing explanation)

Dynamic dispatch of method calls is what powers the Interpreter pattern.

Understanding dynamic dispatch requires recalling the distinction between static type and dynamic type. The static type of a variable or method call return is the type found in its declaration. Because it is declared in the code, it is known to the compiler at compile-time.

But at runtime, an object variable or method call return refers to an actual object value in memory. This object has a type, called its dynamic type, which is the class it was constructed from.

Dynamic dispatch means that the dynamic type of the object value is used to decide which method body to execute. If a variable is declared as Formula f, but the dynamic type of its value is a concrete type that implements Formula, then the code that runs when we call f.variables() is that concrete type’s implementation of the variables method. If f points to a Not object, then f.variables() runs Not.variables().

The dynamic type of an object value does not have to be identical to the static type of a variable that points to it. When the static type is an interface, in fact, then the dynamic type is always different, because a dynamic type must be a class. (Recall that interfaces don’t have constructors, so only a class can produce an object value at runtime.)

But the dynamic type must be consistent with the static type in an important way: the dynamic type is always a subtype of the static type. This guarantees that any method we can call for the static type also exists in the dynamic type, with the same or stronger spec. Not is a concrete variant class that implements the Formula interface, so it is a subtype of Formula. TypeScript enforces this requirement by static checking; you can’t, for example, just write f = "not", because string is not a subtype of Formula.

reading exercises

Dynamic dispatch

Consider this code:

const f1: Formula = new And(new Variable("x"), new Not(new Variable("y")));
const f2: Not = new Not(new Variable("y"));

What is the static type of the variable f1?

(missing explanation)

After this code executes, what is the dynamic type of the value of f1?

(missing explanation)

Assuming this code has executed, then for each of the following calls to variables(), which method body will be called first by dynamic dispatch?

f1.variables()

f2.variables()

f1.left.variables()

f2.formula.variables()

(missing explanation)

Dynamic dispatch vs. static checking

When we say that TypeScript uses dynamic dispatch for a method call object.method(...), it means that when the program is running, i.e. dynamically, the method call executes the:

of method that is found in the:

of object’s:

(missing explanation)

When we say that TypeScript uses static checking for a method call object.method(...), it means that before the program runs, i.e. statically, the TypeScript compiler checks for the existence of method and the type-compatibility of argument expressions against the:

of method that is found in the:

of object’s:

(missing explanation)

Downsides of Interpreter pattern

Let’s consider two disadvantages of the Interpreter pattern:

  1. For a complicated operation on a data type with many variants, the code to implement the operation will be distributed across all the different variant classes. If we want to inspect all the code, refactor it, or fix a bug, we must find it in all those different places: the code is harder to understand. Bugs in our recursive implementations are easier to introduce and harder to find.

  2. If we wish to add a new operation to our type, we must modify the interface and every implementing class. If there are many concrete variants that already have substantial code, perhaps written and maintained by other programmers, this change will be more difficult to make than if we could keep the code for the new operation in its own separate location.

Pattern matching

Going back to the functional definitions you wrote in the Recursive Data Types reading exercise:

variables(Variable(x)) = setAdd(x, emptySet())
variables(Not(f))      = variables(f)
variables(And(f1, f2)) = setUnion(variables(f1), variables(f2))
variables(Or(f1, f2))  = setUnion(variables(f1), variables(f2))

Translating those cases directly into code on our classes from above, we’d like to write something like this:

Bad pattern matching in TypeScript

If we attempt to transform the code above into working TypeScript, we might write something like this:

function variables(f: Formula): Set<string> {
    // inside each conditional, TypeScript narrows the type of f
    if (f instanceof Variable) {
        return new Set([f.name]);
    } else if (f instanceof Not) {
        return variables(f.formula);
    } else if (f instanceof And) {
        return setUnion(variables(f.left), variables(f.right));
    } else if (f instanceof Or) {
        return setUnion(variables(f.left), variables(f.right));
    } else {
        throw new Error("don't know what " + f + " is");
    }
}

We inspect the type of input f against all the concrete variants we know about, but static checking cannot guarantee that there are not other kinds of Formula we have not handled. This code is not safe from bugs.

reading exercises

Don’t throw that at me

What would happen if we removed the final else clause, which throws an exception?

(missing explanation)

Throw in another one

Returning to the original version of this code, including the else clause:

function variables(f: Formula): Set<string> {
    // inside each conditional, TypeScript narrows the type of f
    if (f instanceof Variable) {
        return new Set([f.name]);
    } else if (f instanceof Not) {
        return variables(f.formula);
    } else if (f instanceof And) {
        return setUnion(variables(f.left), variables(f.right));
    } else if (f instanceof Or) {
        return setUnion(variables(f.left), variables(f.right));
    } else {
        throw new Error("don't know what " + f + " is");
    }
}

What would this code do if we added a new concrete variant to the Formula type – perhaps Xor to represent exclusive-or, or Literal to represent fixed true/false values?

(missing explanation)

Good pattern matching in TypeScript

In cases where the set of subtypes is fixed, it is possible to use this pattern safely in TypeScript. We saw an example of this when we used discriminated unions to describe the types of messages in Worker process message-passing.

A function on a recursive type

To find our way to a different approach, consider what we want: a function on a recursive data type. Instead of thinking about the function as a piece of code, let’s represent the function as data.

Our example operation has the type signature:

variables : Formula → Set<‍string>

Can we represent it as a lambda expression instead?

const variables = (f: Formula) => {
    // ... did that help?
};

This doesn’t get us anywhere. The body of the lambda still only knows that f is a Formula, and that’s not enough to implement an appropriate recursive or base case of extracting variables.

Let’s try defining a new type instead, specifically to represent functions on Formula, implemented case-wise:

/**
 * Represents a function on different kinds of Formulas.
 * @param <R> the type of the result of the function
 */
interface FormulaFunction<R> {

    /** @returns this applied to `variable` */
    onVariable(variable: Variable): R;

    /** @returns this applied to `not` */
    onNot(not: Not): R;

    /** @returns this applied to `and` */
    onAnd(and: And): R;

    /** @returns this applied to `or` */
    onOr(or: Or): R;
}

Now define the variables operation as a class that implements FormulaFunction:

class VariablesInFormula implements FormulaFunction<Set<string>> {

    // The base case works nicely:

    public onVariable(variable: Variable): Set<string> {
        return new Set([variable.name]);
    }

    // But we're not sure how to implement recursive calls:

    public onNot(not: Not): Set<string> {
        return ???; // how to apply this to not.formula?
    }

    public onAnd(and: And): Set<string> {
        return setUnion(???); // how to apply this to and.left & and.right?
    }

    public onOr(or: Or): Set<string> {
        return setUnion(???); // how to apply this to or.left & or.right?
    }
}

The code for onNot, onAnd, and onOr would be easy to write, but only once we understand how to make the recursive calls that apply the function — this — to their arguments’ Formula children. If we can answer that question, we’ll also answer the question of how to use one of these Variables­In­Formula objects in the first place:

const f: Formula = ...;
const getVariables: FormulaFunction<Set<string>> = new VariablesInFormula();
const theVariables: Set<string> = ...; // how to call getVariables on f?

How can we go from calling some method on a Formula object — like f in this code — to calling the variant-specific method we need? We use double dispatch, which means two method calls in succession:

  • the first method call uses dynamic dispatch to reach a concrete variant (e.g. And, Or, etc.)
  • the variant then makes a second method call, also using dynamic dispatch, on the object representing the recursive function (e.g. VariablesInFormula, or some other implementation of FormulaFunction).
  • for ASTs like Formula, this pattern continues over the recursive structure of the AST, alternating between using dynamic dispatch in the FormulaFunction object and in the concrete variant to walk over the data structure.

The first method call serves as an entry point for clients to invoke their function by passing the function as a parameter:

interface Formula {

    // ... other operations ...

    /**
     * Call a function on this Formula.
     * @param <R> the type of the result
     * @param fn the function to call
     * @returns fn applied to this
     */
    callFunction<R>(fn: FormulaFunction<R>): R;
}

And we’ll implement this callFunction operation in each variant to make the second function call, with each concrete variant now taking responsibility for revealing its dynamic type to the Formula­Function:

reading exercises

Implementing callFunction
class Variable implements Formula {
    // ... fields, other methods, etc. ...
    
    public callFunction<R>(fn: FormulaFunction<R>): R {
        return fn.onVariable(this);
    }
}
class Not implements Formula {
    // ... fields, other methods, etc. ...
    
    public callFunction<R>(fn: FormulaFunction<R>): R {
        return fn.onNot(this);
    }
}
class And implements Formula {
    // ... fields, other methods, etc. ...
    
    public callFunction<R>(fn: FormulaFunction<R>): R {
} } class Or implements Formula { // ... fields, other methods, etc. ... public callFunction<R>(fn: FormulaFunction<R>): R {
} }

So a call to callFunction, as implemented by all four variants, immediately turns around with a call to the appropriate onVariant method of the passed-in function. Now we can complete our recursive implementations in Variables­In­Formula, and call it:

reading exercises

Implementing VariablesInFormula
class VariablesInFormula implements FormulaFunction<Set<string>> {
    
    public onVariable(variable: Variable): Set<string> {
        return new Set([variable.name]);
    }
    
    public onNot(not: Not): Set<string> {
} public onAnd(and: And): Set<string> { return setUnion(and.left.callFunction(this), and.right.callFunction(this)); } public onOr(or: Or): Set<string> { return setUnion(or.left.callFunction(this), or.right.callFunction(this)); } }
Calling VariablesInFormula
const f: Formula = ...;
const getVariables: FormulaFunction<Set<string>> = new VariablesInFormula();
const theVariables: Set<string> = ??? ;

Fill in the ???:

(missing explanation)

Visitor pattern

Suppose we represent the formula (P ∨ Q) ∧ ¬R. Following the data type definition, the structure is:

And( Or(Variable("P"), Variable("Q")),
     Not(Variable("R")) )

Let f be that Formula instance.

reading exercises

What is the dynamic type of f?

(missing explanation)

class And implements Formula {
    public callFunction<R>(fn: FormulaFunction<R>): R {
      return fn.onAnd(this);
    } /* ... rest of class not shown */
}
class VariablesInFormula implements FormulaFunction<Set<string>> {
    public onAnd(and: And): Set<string> {
        return setUnion(and.left.callFunction(this),
                        and.right.callFunction(this));
    } /* ... rest of class not shown */
}
class Or implements Formula {
    public callFunction<R>(fn: FormulaFunction<R>): R {
      return fn.onOr(this);
    } /* ... rest of class not shown */
}
class VariablesInFormula implements FormulaFunction<Set<string>> {
    public onOr(or: Or): Set<string> {
        return setUnion(or.left.callFunction(this),
                        or.right.callFunction(this));
    } /* ... rest of class not shown */
}
class Variable implements Formula {
    public callFunction<R>(fn: FormulaFunction<R>): R {
        return fn.onVariable(this);
    } /* ... rest of class not shown */
}
class VariablesInFormula implements FormulaFunction<Set<string>> {
    public onVariable(variable: Variable): Set<string> {
        return new Set([variable.name]);
    }  /* ... rest of class not shown */  
}
class Not implements Formula {
    public callFunction<R>(fn: FormulaFunction<R>): R {
        return fn.onNot(this);
    } /* ... rest of class not shown */
}
class VariablesInFormula implements FormulaFunction<Set<string>> {
    public onNot(not: Not): Set<string> {
      return not.formula.callFunction(this);
    } /* ... rest of class not shown */
}
class Variable implements Formula {
    public callFunction<R>(fn: FormulaFunction<R>): R {
        return fn.onVariable(this);
    } /* ... rest of class not shown */
}
class VariablesInFormula implements FormulaFunction<Set<string>> {
    public onVariable(variable: Variable): Set<string> {
        return new Set([variable.name]);
    } /* ... rest of class not shown */   
}


If we run f.callFunction(getVariables), its execution will unfold following the structure of the expression.

Hover or tap on each step to see corresponding highlights in the snapshot diagram and the code on the right.

  • f.callFunction runs onAnd
    • callFunction runs onOr
      • callFunction runs onVariable
        • which returns the set { "P" }
      • callFunction runs onVariable
        • which returns the set { "Q" }
      • returns { "P", "Q" }
    • callFunction runs onNot
      • callFunction runs onVariable
        • which returns the set { "R" }
      • returns the set { "R" } unchanged
    • returns { "P", "Q", "R" }

What we have created with the FormulaFunction type and callFunction operation is a special kind of iterator, one that walks over the tree structure of a Formula and processes nodes in the tree one at a time. The name for this pattern is Visitor:

  • A visitor implements switch over types: the code above to implement Variables­In­Formula reads pretty closely to our original goal of a switch statement, but each case is given its own onVariant method. The dynamic dispatch in the visitor implicitly acts like a switch statement, executing the callFunction for the appropriate concrete variant corresponding to the dynamic type.

  • A visitor represents a function over a recursive type: we can create an instance of a particular Formula­Function, pass it around, apply it as needed, etc. If the function we’re representing takes additional parameters, we can make them arguments to the visitor constructor and store those values in fields to be referenced as we go. The exercise at the end of this section explores this idea.

We’ll make a few changes to nomenclature to arrive at a final version of our Formula visitor:

  • We’ll call it FormulaVisitor instead of FormulaFunction.
  • Walking a visitor over an instance is called accepting it, so we’ll rename callFunction to accept.

Here is the code:

interface FormulaVisitor<R> {  // renamed from FormulaFunction<R>
    onVariable(variable: Variable): R;
    onNot(not: Not): R;
    onAnd(and: And): R;
    onOr(or: Or): R;
}

interface Formula {

    accept<R>(visitor: FormulaVisitor<R>): R; // renamed from callFunction()

    // ... other operations ...
}

class Variable implements Formula {
    // ...
    public accept<R>(visitor: FormulaVisitor<R>):R {
        return visitor.onVariable(this);
    }

}
class Not implements Formula {
    // ...
    public accept<R>(visitor: FormulaVisitor<R>):R {
        return visitor.onNot(this);
    }
}
class And implements Formula {
    // ...
    public accept<R>(visitor: FormulaVisitor<R>):R {
        return visitor.onAnd(this);
    }
}
class Or implements Formula {
    // ...
    public accept<R>(visitor: FormulaVisitor<R>):R {
        return visitor.onOr(this);
    }
}

class VariablesInFormula implements FormulaVisitor<Set<string>> {
    public onVariable(variable: Variable): Set<string> {
        return new Set([variable.name]);
    }
    public onNot(not: Not): Set<string> {
        return not.formula.accept(this);
    }
    public onAnd(and: And): Set<string> {
        return setUnion(and.left.accept(this), and.right.accept(this));
    }
    public onOr(or: Or): Set<string> {
        return setUnion(or.left.accept(this), or.right.accept(this));
    }
}

Calling it with:

const f: Formula = ...;
const variables: Set<string> = f.accept(new VariablesInFormula());

reading exercises

A visitor with more parameters

Our visitor example so far, variables, is a recursive function taking a single Formula parameter:

variables: FormulaSet<string>

effects
returns the set of variables in the formula

Let’s look at how to use the Visitor pattern for a function that not only takes a Formula parameter but a second parameter too:

evaluate: Formula × Map<string,boolean>boolean

requires
all variables in the formula appear as keys in the map
effects
evaluates the formula with variables substituted with their values in the map

For example, abstractly speaking, evaluate(a∧¬b, {a:true, b:false, c:true}) is true.

We can implement evaluate as a visitor by passing the additional parameter to the constructor of the visitor object, which then stores it until needed. Here is a visitor for evaluate:

class EvaluateVisitor implements FormulaVisitor<▶▶A◀◀> {
    /** Make a new evaluator that always uses the current values in map. */
    public constructor(
        private readonly map: ▶▶B◀◀
    ) {
    }
    public onVariable(variable: Variable): ▶▶A◀◀ {
        return ▶▶C◀◀(variable.name) ?? assert.fail(variable.name+' undefined');
    }
    public onNot(not: Not): ▶▶A◀◀ {
        return ! not.formula.accept(this);
    }
    public onAnd(and: And): ▶▶A◀◀ {
        return and.left.accept(this) ▶▶D◀◀ and.right.accept(this);
    }
    public onOr(or: Or): ▶▶A◀◀ {
        return or.left.accept(this)  ▶▶E◀◀ or.right.accept(this);
    }
}

Fill in the blanks in the implementation above:

(missing explanation)

Now fill in the blanks in this method that invokes the visitor:

/**
 * @param formula
 * @param map must have a key for every variable in formula
 * @returns the value of the formula with variables substituted with their values in the map
 */
function evaluate(formula: Formula, map: Map<string,boolean>): boolean {
    return ▶▶F◀◀.accept(new EvaluateVisitor(▶▶G◀◀));
}

(missing explanation)

A visitor literal

We can go one step further in making visitors easy to write in TypeScript, by making use of structural subtyping and constructing an object literal in which each case of the recursive function is a method.

Now we can write evaluate() in a way that more closely resembles the switch statement we were looking for at the start of this reading — but the compiler checks our work, requiring that we include all the methods of FormulaVisitor:

function evaluate(formula: Formula, map: Map<string,boolean>): boolean {
    return formula.accept({
        onVariable(variable: Variable) { return map.get(variable.name); },
        onNot(not: Not) { ▶▶A◀◀ },
        onAnd(and: And) { return evaluate(and.left, map) && evaluate(and.right, map); },
        onOr(or: Or)    { return evaluate(or.left,  map) || evaluate(or.right,  map); },
    });
}

Fill in the blanks in the code above:

(missing explanation)

Why visitor?

To understand the contrast between the Visitor pattern and the Interpreter pattern, it helps to imagine all the code for all the operations of Formula as if it were written down in a table, with the variant classes across the top and the operations down the side. An entry in the table is the code that implements that operation for that variant. Here is part of the table (written compactly in mathematical notation rather than TypeScript):

Variable(name) And(f1,f2) Or(f1,f2) Not(f)
variables :
 Formula → Set<string>
{ name } variables(f1) ∪ variables(f2) variables(f1) ∪ variables(f2) variables(f)
evaluate :
Formula × Map<string,boolean>
    → boolean
map.get(name) evaluate(f1, map)
 ∧ evaluate(f2, map)
evaluate(f1, map)
 ∨ evaluate(f2, map)
¬ evaluate(f, map)
conjunctiveNormalForm :
 Formula → Formula
... ... ... ...
... etc. ...

Now we can compare the Interpreter pattern and the Visitor pattern for implementing functions over a recursive data type:

Interpreter
Code is grouped by column. Each variant has all the code for that variant, implementing all the operations.
Visitor
Code is grouped by row. All the code for an operation will reside in a visitor class, with code to handle all the variants.

The Interpreter pattern makes it easier to add new variants, because we don’t have to change any of our existing code: we just need to implement all the various operations as methods in the new variant class. All the code for the new variant class can be neatly added in one place.

The Visitor pattern makes it easier to add new operations. Instead of having to modify both the interface and every variant class, we just need to create a new, e.g., FormulaVisitor implementation with all the code for our new operation. There is no change to existing classes or interfaces. All the code for the new operation can be neatly added in one place.

Therefore, if adding new operations is a change we want to be most ready for, we might choose to define a visitor interface and write our functions as visitors.

We will also use the Visitor pattern when we intend clients of our type to know and understand its underlying structure and implement their own operations — and we’ve already seen a common and important use case for this: parse trees! Looking at code we’ve written to transform a concrete syntax tree into an abstract syntax tree, notice a smelly pattern:

function makeAST(parseTree: ParseTree<IntegerGrammar>): IntegerExpression {
    if (parseTree.name === IntegerGrammar.Expr) {
        // ... call makeAST recursively on child ...
    } else if (parseTree.name === IntegerGrammar.Sum) {
        // ... iterate over children and create Plus expressions ...
    } else if (parseTree.name === IntegerGrammar.Primary) {
        // ... call makeAST recursively on child ...
    } else if (parseTree.name === IntegerGrammar.Constant) {
        const n = parseInt(parseTree.text);
        return new Constant(n);
    } else {
        assert.fail('cannot make AST for unexpected node');
    }
}
Static checking for covering all ParseTree variants

We said above it is possible to get static checking when the set of subtypes is fixed. This is true for enums like Integer­Grammar, which have a fixed set of values. If you want to explore TypeScript type checking a bit:

  • What is the inferred type of parseTree.name in the else clause? What is it if we add a case for Whitespace?
  • How could we use that inferred type to get static checking that we’ve covered all the cases?

There’s that default case throwing an exception again: we don’t have static checking to make sure we’ve covered all the variants (although this problem can be fixed with a little work, see the sidebar).

ParserLib has the advantage of simplicity, but in a more complex parsing package, we are likely to find makeAST implemented with a visitor. The different concrete syntax tree nodes will likely have different types, for example so that the number and types of their children can be included in the static type, according to the shape required by the grammar.

And what works for concrete syntax trees is often very helpful for abstract syntax trees, where the implementor provides clients of the AST a visitor interface so that they can define their own operations by walking recursively over the tree.

Summary

  • ADTs can be used to make a domain-specific language (DSL), which solves a class of related problems instead of a single one
    • DSLs can be external, like regular expressions, or internal, like the music language developed in this class
  • The composite pattern treats single objects and groups of objects the same way, belonging to the same type
    • Music is a composite pattern, with Note, Rest, and Concat all implementing Music
  • The interpreter pattern groups code by column:
    • Declare each operation as an instance method in the interface that defines the data type
    • Implement the operation in each concrete variant
    • This makes adding new variants easier
  • The visitor pattern groups code by row, the opposite of the interpreter pattern
    • Using double dispatch, a visitor is like a switch over the concrete variants of the ADT
    • The acceptor method in each concrete variant will call the visitor with the appropriate visitor operation, which will in turn call the acceptor method for a child concrete variant, walking over the recursive structure of the ADT
    • This makes adding new operations easier

Thinking about our three main measures of code quality:

  • Safe from bugs. The Visitor pattern overcomes a missing feature in many object-oriented languages, giving us a type-safe way to represent and implement functions over a recursive data type as self-contained modules.

  • Easy to understand. If we expect to add new operations to our recursive ADT in the future, using a visitor makes our code easier to understand since all the code for the new operation is in the same place, without sacrificing the safety from bugs provided by static checking.

  • Ready for change. If we are building a little language where the ADT is designed for the purpose of allowing clients to define their own operations on our type from outside our implementation classes, the visitor pattern makes this possible and feasible to implement.