Reading 27: Little Languages II
Software in 6.031
Objectives
The goal of this reading is to introduce the Visitor pattern, another example of treating functions as first-class values, and an alternative implementation strategy for operations on recursive data types.
Visitors are a common feature of little languages implemented as recursive data types.
Functions on recursive types
Since our introduction to recursive data types, we’ve been implementing functions on these types using the Interpreter pattern:
- declare the operation as an instance method in the interface that defines the datatype, and
- implement the operation in each concrete variant.
In terms of the Composite pattern, composite variants (e.g. Concat in Music, Not in Formula) will implement the operation recursively, and primitive variants (e.g. Note in Music, Variable in Formula) will implement base cases.
For example, here’s the Boolean formula recursive datatype again:
Formula = Variable(name:String)
          + Not(formula:Formula)
          + And(left:Formula, right:Formula)
          + Or(left:Formula, right:Formula)Suppose we want to add an operation variables to Formula that returns the set of all variable names mentioned in the formula.
We saw how to do this in a reading exercise in Recursive Data Types (at the time, using immutable set type ImSet).
reading exercises
We would implement the operation by modifying the interface to specify it:
public interface Formula {
    // ... other operations ...
    /** @return all the variables that appear in this Formula */
    public Set<String> variables();
}and modifying all the variant classes, implementing the base case:
// assume we have this library method:
/** @return a set containing only elt */
public static <E> Set<E> singletonSet(E elt);
public class Variable implements Formula {
    public final String name;
    // ... constructor, other operations ...
    @Override public Set<String> variables() {
        return singletonSet(this.name);
    }
}and the recursive cases (fill in the blank):
// assume we have this library method:
/** @return the union of set1 and set2 */
public static <E> Set<E> 
    setUnion(Set<E> set1, Set<E> set2);public class Not implements Formula {
    public final Formula formula;
    // ... constructor, other operations ...
    @Override public Set<String> variables() {
    }
}
public class And implements Formula {
    public final Formula left, right;
    // ... constructor, other operations ...
    @Override public Set<String> variables() {
        return setUnion(this.left.variables(), this.right.variables());
    }
}
public class Or implements Formula {
    public final Formula left, right;
    // ... constructor, other operations ...
    @Override public Set<String> variables() {
        return setUnion(this.left.variables(), this.right.variables());
    }
}Dynamic dispatch of method calls in Java is what powers the Interpreter pattern.
Understanding dynamic dispatch requires recalling the distinction between declared type and actual type. The declared type of a variable or method call return is the type found in its declaration. Because it is declared in the code, it is known to the compiler at compile-time.
But at runtime, an object variable or method call return refers to an actual object value in memory. This object has a type, called its actual type, which is the class it was constructed from.
Dynamic dispatch means that the actual type of the object value is used to decide which method body to execute.
If a variable is declared as Formula f, but the actual type of its value is a concrete type that implements Formula, then the code that runs when we call f.variables() is that concrete type’s implementation of the variables method.
If f points to a Not object, then f.variables() runs Not.variables().
The actual type of an object value does not have to be identical to the declared type of a variable that points to it. When the declared type is an interface, in fact, then the actual type is always different, because an actual type must be a class. (Recall that Java interfaces don’t have constructors, so only a class can produce an object value at runtime.)
But the actual type must be consistent with the declared type in an important way: the actual type is always a subtype of the declared type. 
This guarantees that any method we can call for the declared type also exists in the actual type, with the same or stronger spec. 
Not is a concrete variant class that implements the Formula interface, so it is a subtype of Formula.
Java enforces this requirement by static checking; you can’t, for example, just write f = "not", because String is not a subtype of Formula.
reading exercises
Formula f1 = new And(new Variable("x"), new Not(new Variable("y")));
Not f2 = new Not(new Variable("y"));
Formula f3 = new Formula() {
  public Set<String> variables() {
    return new HashSet<>();
  }
};What is the type of the variable f1?
(missing explanation)
After this code executes, what is the actual type of the value of f1?
(missing explanation)
Assuming this code has executed, then for each of the following calls, which method body will be called first by dynamic dispatch?
(missing explanation)
When we say that Java uses dynamic dispatch for a method call object.method(...),
it means that when the program is running, i.e. dynamically, the method call executes the:
of method that is found in the:
of object’s:
(missing explanation)
When we say that Java uses static checking for a method call object.method(...),
it means that before the program runs, i.e. statically, the Java compiler checks for the existence of method and the type-compatibility of argument expressions against the:
of method that is found in the:
of object’s:
(missing explanation)
Downsides of Interpreter pattern
Let’s consider two disadvantages of the Interpreter pattern:
- For a complicated operation on a datatype with many variants, the code to implement the operation will be distributed across all the different variant classes. If we want to inspect all the code, refactor it, or fix a bug, we must find it in all those different places: the code is harder to understand. Bugs in our recursive implementations are easier to introduce and harder to find. 
- If we wish to add a new operation to our type, we must modify the interface and every implementing class. If there are many concrete variants that already have substantial code, perhaps written and maintained by other programmers, this change will be more difficult to make than if we could keep the code for the new operation in its own separate location. 
Pattern matching
Going back to the functional definitions you wrote in the Recursive Data Types reading exercise:
variables(Variable(x)) = set-add(x, set-empty())
variables(Not(f))      = variables(f)
variables(And(f1, f2)) = set-union(variables(f1), variables(f2))
variables(Or(f1, f2))  = set-union(variables(f1), variables(f2))Translating those cases directly into code on our classes from above, we’d like to write something like this:
public static Set<String> variables(Formula f) {
    switch(f) {
    case Variable var:
        return singletonSet(var.name);
    case Not not:
        return variables(not.formula);
    case And and:
        return setUnion(variables(and.left), variables(and.right));
    case Or or:
        return setUnion(variables(or.left), variables(or.right));
    }
}Even putting aside the direct access to the rep of each variant, a switch structure like this is not (yet?) valid Java.
There are many languages that do support pattern matching of instances to determine their underlying structure, but Java is not one of them.
Bad pattern matching in Java
If we attempt to transform the code above into working Java, we might write this terrible beast:
public static Set<String> variables(Formula f) {
    if (f instanceof Variable) {
        return singletonSet(((Variable)f).name);
    } else if (f instanceof Not) {
        return variables(((Not)f).formula);
    } else if (f instanceof And) {
        And and = (And)f;
        return setUnion(variables(and.left), variables(and.right));
    } else if (f instanceof Or) {
        Or or = (Or)or;
        return setUnion(variables(or.left), variables(or.right));
    } else {
      throw new IllegalArgumentException("don't know what " + f + " is");
    }
}This code stinks — even still putting aside the rep exposure.
Static checking has been thrown out the window: we inspect the type of input f against all the concrete variants we know about and cast when we find a match.
The compiler is not checking our work.
This code is not safe from bugs.
reading exercises
A function on a recursive type
To find our way to a different approach, consider what we want: a function on a recursive data type. Instead of thinking about the function as a piece of code, consider representing the function as data.
Our example operation has the type signature:
variables : Formula → Set<String>
Can we use the Function type from the standard library, which represents functions of a single argument, and is parameterized by the argument type and the return type?
Function<Formula, Set<String>> variables = f -> {
    // ... did that help?
};This doesn’t get us anywhere.
The body of the lambda still only knows that f is a Formula, and that’s not enough to implement an appropriate recursive or base case of extracting variables.
Let’s try defining a new type instead, specifically to represent functions on Formula:
/**
 * Represents a function on different kinds of Formulas
 * @param <R> the type of the result of the function
 */
interface FormulaFunction<R> {
    /**  @return this applied to var */
    public R onVariable(Variable var);
    /**  @return this applied to not */
    public R onNot(Not not);
    /**  @return this applied to and */
    public R onAnd(And and);
    /**  @return this applied to or */
    public R onOr(Or or);
}Now define the variables operation as a class that implements FormulaFunction:
class VariablesInFormula implements FormulaFunction<Set<String>> {
    // The base case works nicely:
    @Override public Set<String> onVariable(Variable var) {
        return singletonSet(var.name);
    }
    // But we're not sure how to implement recursive calls:
    @Override public Set<String> onNot(Not not) {
        return ???; // how to apply this to not.formula?
    }
    @Override public Set<String> onAnd(And and) {
        return setUnion(???); // how to apply this to and.left & and.right?
    }
    @Override public Set<String> onOr(Or or) {
        return setUnion(???); // how to apply this to or.left & or.right?
    }
}The code for onNot, onAnd, and onOr would be easy to write, but only once we understand how to make the recursive calls that apply the function — this — to their arguments’ Formula children.
If we can answer that question, we’ll also answer the question of how to use one of these VariablesInFormula objects in the first place:
Formula f = ...;
VariablesInFormula getVariables = new VariablesInFormula();
Set<String> theVariables = ...; // how to call getVariables on f?How can we go from calling some method on a Formula object — like f in this code, or like and.left and or.right in the previous code — to calling the variant-specific method we need?
We use double dispatch, which means two method calls in succession:
- the first method call uses dynamic dispatch to reach a concrete variant (e.g. And,Or, etc.)
- the variant then makes a second method call to the object representing the recursive function (e.g. VariablesInFormula, or some otherFormulaFunction).
The first method call serves as an entry point for clients to invoke their function by passing the function as a parameter:
public interface Formula {
    // ... other operations ...
    /**
     * Call a function on this Formula.
     * @param <R> the type of the result
     * @param function the function to call
     * @return function applied to this
     */
    public <R> R callFunction(FormulaFunction<R> function);
}And we’ll implement this callFunction operation in each variant to make the second function call, with each concrete variant now taking responsibility for revealing its actual type to the FormulaFunction:
reading exercises
public class Variable implements Formula {
    // ... fields, other methods, etc. ...
    
    @Override public <R> R callFunction(FormulaFunction<R> function) {
        return function.onVariable(this);
    }
}
public class Not implements Formula {
    // ... fields, other methods, etc. ...
    
    @Override public <R> R callFunction(FormulaFunction<R> function) {
        return function.onNot(this);
    }
}
public class And implements Formula {
    // ... fields, other methods, etc. ...
    
    @Override public <R> R callFunction(FormulaFunction<R> function) {
    }
}
public class Or implements Formula {
    // ... fields, other methods, etc. ...
    
    @Override public <R> R callFunction(FormulaFunction<R> function) {
    }
}So a call to callFunction, as implemented by all four variants, immediately turns around with a call to the appropriate onVariant method of the passed-in function.
Now we can complete our recursive implementations in VariablesInFormula, and call it:
reading exercises
class VariablesInFormula implements FormulaFunction<Set<String>> {
    
    @Override public Set<String> onVariable(Variable var) {
        return singletonSet(var.name);
    }
    
    @Override public Set<String> onNot(Not not) {
    }
    
    @Override public Set<String> onAnd(And and) {
        return setUnion(and.left.callFunction(this),
                        and.right.callFunction(this));
    }
    
    @Override public Set<String> onOr(Or or) {
        return setUnion(or.left.callFunction(this),
                        or.right.callFunction(this));
    }
}Visitor pattern
Suppose we represent the formula (P ∨ Q) ∧ ¬R. Following the datatype definition, the structure is:
And( Or(Variable("P"), Variable("Q")),
     Not(Variable("R")) )Let f be that Formula instance.
If we run f.callFunction(getVariables), how will the execution proceed?
- f.callFunctionruns- VariablesInFormula.onAnd- and.left.callFunctionruns- VariablesInFormula.onOr
- and.right.callFunctionruns- VariablesInFormula.onNot
- returns setUnion({ "P", "Q" }, { "R" }) = { "P", "Q", "R" }
 
What we have created with the FormulaFunction type and callFunction operation is a special kind of iterator, one that walks over the tree structure of a Formula and processes nodes in the tree one at a time.
The name for this pattern is Visitor:
- A visitor implements - switchover types: the code above to implement- VariablesInFormulareads pretty closely to our original goal of a- switchstatement, but each case is given its own- onVariantmethod.
- A visitor represents a function over a recursive type: we can create an instance of a particular - FormulaFunction, pass it around, apply it as needed, etc. If the function we’re representing takes additional parameters, we can make them arguments to the constructor and store those values in fields to be referenced as we go. The exercise at the end of this section explores this idea.
- A visitor acts as an iterator: the implementor of a function such as variables does not write explicit control structures to pull out each node in the tree one-by-one. Instead, the implementor makes recursive calls to - callFunction(this)and relies on the- Formulavariants to call it back.
We’ll make a few changes to nomenclature to arrive at a final version of our Formula visitor:
- We’ll call it Formula.Visitorinstead ofFormulaFunction.
- Walking a visitor over an instance is called accepting it, so we’ll rename callFunctiontoaccept.
- In the visitor interface, since the onVariantmethods each take a different static type, we can use a single overloaded name for all of them:on.
Finally, let’s stop referring directly to our fields and implement observer operations our visitors can use.
We don’t need to add these operations to the Formula interface: they are variant-specific, and will only be used by clients who are aware of what our concrete types are.
public interface Formula {
    public interface Visitor<R> {
        public R on(Variable var);
        public R on(Not not);
        public R on(And and);
        public R on(Or or);
    }
    public <R> R accept(Visitor<R> visitor);
    // ... other operations ...
}
class Variable implements Formula {
    private final String name;
    // ...
    public String name() { ... }
    @Override public <R> R accept(Visitor<R> visitor) {
        return visitor.on(this);
    }
}
class Not implements Formula {
    private final Formula formula;
    // ...
    public Formula formula() { ... }
    @Override public <R> R accept(Visitor<R> visitor) {
        return visitor.on(this);
    }
}
class And implements Formula {
    private final Formula left, right;
    // ...
    public Formula left() { ... }
    public Formula right() { ... }
    @Override public <R> R accept(Visitor<R> visitor) {
        return visitor.on(this);
    }
}
class Or implements Formula {
    private final Formula left, right;
    // ...
    public Formula left() { ... }
    public Formula right() { ... }
    @Override public <R> R accept(Visitor<R> visitor) {
        return visitor.on(this);
    }
}
class VariablesInFormula implements Formula.Visitor<Set<String>> {
    @Override public Set<String> on(Variable var) {
        return singletonSet(var.name());
    }
    @Override public Set<String> on(Not not) {
        return not.formula().accept(this);
    }
    @Override public Set<String> on(And and) {
        return setUnion(and.left().accept(this), and.right().accept(this));
    }
    @Override public Set<String> on(Or or) {
        return setUnion(or.left().accept(this), or.right().accept(this));
    }
}Calling it with:
Formula f = ...;
Set<String> variables = f.accept(new VariablesInFormula());reading exercises
Our visitor example so far, variables, is a recursive function taking a single Formula parameter:
variables: Formula → Set<String>- effects
- returns the set of variables in the formula
Let’s look at how to use the Visitor pattern for a function that not only takes a Formula parameter but a second parameter too:
evaluate: Formula × Map<String,Boolean> → Boolean- requires
- all variables in the formula appear as keys in the map
- effects
- evaluates the formula with variables substituted with their values in the map
For example, abstractly speaking, evaluate(a∧¬b, {a:true, b:false, c:true}) is true.
We can implement evaluate as a visitor by passing the additional parameter to the constructor of the visitor object, which then stores it until needed.
Here is a visitor for evaluate:
class EvaluateVisitor implements Formula.Visitor<▶▶A◀◀> {
    private ▶▶B◀◀ map;
    public EvaluateVisitor(▶▶B◀◀ map) {
        this.map = map;
    }
    @Override public ▶▶A◀◀ on(Variable var) {
        return ▶▶C◀◀(var.name());
    }
    @Override public ▶▶A◀◀ on(Not not) {
        return !not.formula().accept(this);
    }
    @Override public ▶▶A◀◀ on(And and) {
        return and.left().accept(this) ▶▶D◀◀ and.right().accept(this);
    }
    @Override public ▶▶A◀◀ on(Or or) {
        return or.left().accept(this)  ▶▶E◀◀ or.right().accept(this);
    }
}Fill in the blanks in the implementation above:
(missing explanation)
Now fill in the blanks in this method that invokes the visitor:
/**
 * @param formula
 * @param map must have a key for every variable in formula
 * @return the value of the formula with variables substituted with their values in the map
 */
public static boolean evaluate(Formula formula, Map<String,Boolean> map) {
    return ▶▶F◀◀.accept(new EvaluateVisitor(▶▶G◀◀));
}(missing explanation)
We can go one step further in making visitors easy to write in Java, by representing each case of the recursive function as a lambda expression.
The makeVisitor() method below takes n functional object arguments, one for each concrete variant in the type, and uses them as the implementations of the corresponding cases of an anonymous visitor class:
/**
 * @return a visitor object whose on(T) method calls the onT function parameter, 
 *         for all T that are concrete variants of Formula
 */
public static <R> Formula.Visitor<R> makeVisitor(
        Function<Variable,R> onVariable,
        Function<Not,R> onNot,
        Function<And,R> onAnd,
        Function<Or,R> onOr
) {
    return new ▶▶A◀◀() {
        public R on(Variable var) { return onVariable.apply(var); }
        public R on(Not not) { return onNot.apply(not); }
        public R on(And and) { return onAnd.apply(and); }
        public R on(Or or) { return onOr.apply(or); }
    };
}Now we can write evaluate() in a way that more closely resembles the switch statement we were looking for at the start of this reading:
public static boolean evaluate(Formula formula, Map<String,Boolean> map) {
    return formula.accept(makeVisitor(
        (Variable var) -> map.get(var.name()),
        (Not not) -> ▶▶B◀◀,
        (And and) -> evaluate(and.left(), map)  &&  evaluate(and.right(), map),
        (Or or)   -> evaluate(or.left(),  map)  ||  evaluate(or.right(),  map) 
    ));
}Fill in the blanks in the code above:
(missing explanation)
Suppose the program runs the call evaluate(new Variable("x"), map), assuming map contains a value for "x".
Put the following events in the order that they occur during the execution of evaluate().
(missing explanation)
Why visitor?
Consider all the code that will implement all the operations of Formula:
| Variable | Not | And | Or | |
| variables : Formula → Set<String> | ||||
| negationNormal : Formula → Formula | ||||
| evaluate : Formula × Environment → boolean | ||||
| ... etc. ... | 
And compare the Interpreter pattern and the Visitor pattern for implementing functions over a recursive data type:
- Interpreter
- Code is grouped by column. Each variant has all the code for that variant, implementing all the operations.
- Visitor
- Code is grouped by row. All the code for an operation will reside in a visitor class, with code to handle all the variants.
The Interpreter pattern makes it easier to add new variants, because we don’t have to change any of our existing code: we just need to implement all the various operations as methods in the new variant class.
The Visitor pattern makes it easier to add new operations.
Instead of having to modify both the interface and every variant class, we just need to create a new, e.g., Formula.Visitor implementation with all the code for our new operation.
There is no change to existing classes or interfaces.
Therefore, if adding new operations is a change we want to be most ready for, we might choose to define a visitor interface and write our functions as visitors.
We will also use the Visitor pattern when we intend clients of our type to know and understand its underlying structure and implement their own operations — and we’ve already seen a common and important use case for this: parse trees! Looking at code we’ve written to transform a concrete syntax tree into an abstract syntax tree, notice a smelly pattern:
static IntegerExpression makeAST(ParseTree<IntegerGrammar> parseTree) {
    switch (parseTree.name()) {
    case ROOT:
        // ... call makeAST recursively on child ...
    case SUM:
        // ... iterate over children and create Plus expressions ...
    case PRIMARY:
        // ... call makeAST recursively on child ...
    case NUMBER:
        final int n = Integer.parseInt(parseTree.text());
        return new Number(n);
    default:
        throw new AssertionError("should never get here");
}There’s that default case throwing an exception again: we don’t have static checking to make sure we’ve covered all the variants.
We’re also stuck talking about generic ParseTree nodes.
If there are multiple concrete variants of ParseTree, we would need to use unsafe casting to call any helpful operations they define.
ParserLib has the advantage of simplicity, but in a more complex parsing package, we are likely to find makeAST implemented with a visitor.
And what works for concrete syntax trees is often very helpful for abstract syntax trees, where the implementor offers AST clients a visitor interface to define their own operations by walking recursively over the tree.
Summary
The Visitor pattern overcomes a missing feature in the Java language, giving us a type-safe way to represent and implement functions over a recursive data type as self-contained modules. The visitor instances are also functional objects, so we can naturally pass them around our program as values.
If we expect to add new operations to our recursive ADT in the future, using visitor makes our code ready for change without sacrificing the safety from bugs provided by static checking.
If we are building a little language where the ADT is designed for the purpose of allowing clients to define their own operations on our type from outside our implementation classes, visitor makes this possible.