6.102
6.102 — Software Construction
Spring 2024

# Reading 11: Recursive Data Types

#### Software in 6.102

Safe from bugsEasy to understandReady for change
Correct today and correct in the unknown future. Communicating clearly with future programmers, including future you. Designed to accommodate change without rewriting.

#### Objectives

• Understand recursive data types
• Read and write data type definitions
• Understand and implement functions over recursive data types
• Understand immutable lists and know the standard operations on immutable lists

## Introduction

In this reading we’ll look at recursively-defined types, how to specify operations on such types, and how to implement them. Our main example will be immutable lists.

## Recursion

Before we introduce recursive data types — which have a recursive structure of both data and computation — take a minute to review what you know about recursive functions. You should have already practiced this heavily in 6.101 (formerly called 6.009). Here are some 6.101 readings to review:

• Recursion discusses base cases and recursive cases, recursive helper functions, and shows how to implement several functions recursively.
• Recursion and Iteration discusses the duality of recursion (a function calling itself) and iteration (loops) as ways to express repeated computation. It introduces three common patterns for recursive computation: list-like, tree-like, and graph-like. List-like and tree-like patterns will be very important in the rest of this reading.

If you are at all uncertain about what these terms mean and how to use them, take some time to review this 6.101 material before going on.

Just as a recursive function is defined in terms of itself, a recursive data type is defined in terms of itself. We’ll see the same need for base and recursive cases, which will now appear as different variants of the abstract type. Some of our recursive types will be list-like and some will be tree-like.

Recursive factorial

Consider this recursive implementation of the factorial function.

``````function factorial(n: number): number {
if (n === 0) {
return 1; // this is called the base case
} else {
return n * factorial(n-1); // this is the recursive step
}
}``````

For `factorial(3)`, how many times will the base case `return 1` be executed?

(missing explanation)

Recursive Fibonacci

Consider this recursive implementation of the Fibonacci sequence.

``````function fibonacci(n: number): number {
if (n === 0 || n === 1) {
return 1; // base cases
} else {
return fibonacci(n-1) + fibonacci(n-2); // recursive step
}
}``````

For `fibonacci(3)`, how many times will the base case `return 1` be executed?

(missing explanation)

Danger zone

Here is a buggy recursive implementation:

``````/**
* @param array must be nonempty
* @returns the maximum integer in array
*/
function maxArray(array: Array<number>): number {
return maxOfRange(array, 0, array.length-1);
}

// helper function -- finds the max of array[start]...array[end]
function maxOfRange(array: Array<number>, start: number, end: number): number {
if (start === end) {
const elem = array[start];
array.splice(start, 1);
return elem;
} else {
const midpoint = Math.floor((start + end + 1) / 2);
return Math.max(maxOfRange(array, start, midpoint), maxOfRange(array, midpoint+1, end));
}
}``````

Which of the following bugs does this implementation suffer from? If a single bug could be described in more than one way, select all the matching choices.

(missing explanation)

## Example: Immutable lists

Immutability is powerful not just because of its safety, but also because of the potential for sharing. Sharing actually produces performance benefits: less memory consumed, less time spent copying. Here we’re going to look at how to represent list data structures differently than in our familiar array lists or linked lists.

Let’s define a data type for an immutable list, `ImList<E>`. The data type has four fundamental operations:

empty: void → ImList // returns an empty list // returns a new list formed by adding an element to the front of another list // returns the first element of a list, requires the list to be nonempty // returns the list of all elements of this list except for the first, requires the list to be nonempty

These four operations have a long and distinguished pedigree. They are fundamental to the list-processing languages Lisp and Scheme (where for historical reasons they are called `nil`, `cons`, `car`, and `cdr`, respectively). They are widely used in functional programming, where `first` and `rest` are sometimes called head and tail instead.

Before we design TypeScript classes to implement this data type, let’s get used to the operations a bit, using lists of integers. We’ll write lists with square brackets, like `[ 1, 2, 3 ]`, and we’ll write the operations as if they are simple functions. Once we get to TypeScript, the syntax will look different but the operations will have the same meaning.

``````empty() → [ ]
cons(0, empty()) → [ 0 ]
cons(0, cons(1, cons(2, empty()))) → [ 0, 1, 2 ]

x = cons(0, cons(1, cons(2, empty())))  → [ 0, 1, 2 ]
first(x) → 0
rest(x) → [ 1, 2 ]

first(rest(x)) → 1
rest(rest(x)) → [ 2 ]
first(rest(rest(x)) → 2
rest(rest(rest(x))) → [ ]``````

The fundamental relationship between `first`, `rest`, and `cons` is:

``````first(cons(x, y)) = x
rest(cons(x, y)) = y``````

What `cons` puts together, `first` and `rest` peel back apart.

### Immutable lists in TypeScript

To implement this data type in TypeScript, we’ll use an interface:

``````// todo: empty() returning ImList<E>
interface ImList<E> {
cons(first: E): ImList<E>;
}``````

This interface declares a generic type `ImList<E>` that can be instantiated for any type `E`: `ImList<number>`, `ImList<string>`, etc. The `E` in these declarations is a placeholder that the compiler will fill in when it checks our code for type safety.

And we’ll write two classes that implement this interface:

• `Empty` represents the result of the `empty` operation (an empty list)
• `Cons` represents the result of a `cons` operation (an element glued together with another list)
``````class Empty<E> implements ImList<E> {
public constructor() {
}
public cons(first: E): ImList<E> {
return new Cons<E>(first, this);
}
public get first(): E {
throw new Error("unsupported operation");
}
public get rest(): ImList<E> {
throw new Error("unsupported operation");
}
}``````

Note the use of getter methods for the `first` and `rest` properties, which allow these operations to fail fast when called on an empty list.

``````class Cons<E> implements ImList<E> {

public constructor(first: E, rest: ImList<E>) {
this.first = first;
this.rest = rest;
}
public cons(first: E): ImList<E> {
return new Cons<E>(first, this);
}
}``````

So we have operations for `cons`, `first`, and `rest`, but where is the fourth operation of our data type, `empty`?

One way to implement `empty` is to have clients call the `Empty` class constructor to obtain empty lists. This sacrifices representation independence — clients have to know about the `Empty` class!

As we saw in an earlier reading, a better way to do it is as a factory function that takes no arguments and produces an instance of `Empty`:

``````function empty<E>(): ImList<E> {
return new Empty<E>();
}``````

Now that we have all the operations, here’s some actual TypeScript code that parallels the abstract examples we wrote earlier. Hover or tap on each row in the table to update the snapshot diagram:

TypeScript syntax Functional syntax Result
`const nil: ImList<number> = empty();` `nil = empty()` `[ ]`
`nil.cons(0)` `cons(0, nil)` `[ 0 ]`
`nil.cons(2).cons(1).cons(0)` `cons(0, cons(1, cons(2, nil)))` `[ 0, 1, 2 ]`
`const x = nil.cons(2).cons(1).cons(0);` `x = cons(0, cons(1, cons(2, nil)))` `[ 0, 1, 2 ]`
`x.first` `first(x)` `0`
`x.rest` `rest(x)` `[ 1, 2 ]`
`x.rest.first` `first(rest(x))` `1`
`x.rest.rest` `rest(rest(x))` `[ 2 ]`
`x.rest.rest.first` `first(rest(rest(x)))` `2`
`x.rest.rest.rest` `rest(rest(rest(x)))` `[ ]`
`const y = x.rest.cons(4);` `y = cons(4, rest(x))` `[ 4, 1, 2 ]`

The key thing to note here is the sharing of structure that immutable list provides. In the last example above, `x` and `y` are sharing their representation of the sublist `[ 1, 2 ]`. Only one copy of this sublist exists in memory, and both `x` and `y` point to it, but this aliasing is perfectly safe because the list is immutable.

### Two classes implementing one interface

For `ImList`, the two implementations `Empty` and `Cons` cooperate in order to implement the data type — you need them both.

Terrible

We wrote our `ImList` implementations without documenting the abstraction function and representation invariant, which makes us terrible people.

``````class Empty<E> implements ImList<E> {
public constructor() { }
public cons(first: E): ImList<E> { return new Cons<E>(first, this); }
public get first(): E { throw new Error("unsupported operation"); }
public get rest(): ImList<E> { throw new Error("unsupported operation"); }
}``````

Select the best abstraction function for `Empty`:

(missing explanation)

And select lines to include in the rep invariant:

(missing explanation)

Terrible II
``````class Cons<E> implements ImList<E> {
public constructor(first: E, rest: ImList<E>) {
this.first = first;
this.rest = rest;
}
public cons(first: E): ImList<E> { return new Cons<E>(first, this); }
}``````

Select all the good abstraction functions for `Cons`:

(missing explanation)

And select lines to include in the rep invariant:

(missing explanation)

Short layover

Given this code:

``const airports = empty().cons("SFO").cons("IAD").cons("BOS");``

(missing explanation)

### TypeScript shortcut: parameter properties

A common pattern in object-oriented programming is a constructor that puts one or more of its parameters directly into the rep, as we saw with `first` and `rest`:

``````class Cons<E> implements ImList<E> {

public constructor(first: E, rest: ImList<E>) {
this.first = first;
this.rest = rest;
}
public cons(first: E): ImList<E> {
return new Cons<E>(first, this);
}
}``````

TypeScript offers a shortcut for this common pattern, called parameter properties. Attaching `public/private/readonly` keywords to a constructor parameter makes it into an instance variable, and eliminates the need for separate declaration and initialization:

``````class Cons<E> implements ImList<E> {

public constructor(
) {
this.first = first;
this.rest = rest;
}
public cons(first: E): ImList<E> {
return new Cons<E>(first, this);
}
}
``````

The same pattern can be used for private rep fields that are initialized from constructor arguments. But remember the dangers of putting parameters directly into the rep, and use this power carefully!

Danger: shortcut

Consider this ADT, whose implementation is appealingly short because the rep has been implemented using public readonly parameter properties:

``````/**
* Represents an immutable tweet from Twitter.
*/
class Tweet {
public constructor(
) {}
}``````

This not only automatically initializes the `author`, `text`, and `timestamp` fields of the rep, but also makes them accessible to the client as observer operations, which look syntactically like instance variables with the same names.

Suppose `tweet: Tweet` points to a `Tweet` object. How can a client threaten the immutability of `tweet`?

(missing explanation)

Half of the danger: shortcut

To fix the immutability issue in the previous exercise, we make a private instance variable `_timestamp` and provide a public getter for `timestamp` that defensively copies the Date object:

``````/**
* Represents an immutable tweet from Twitter.
*/
class Tweet {

public constructor(
timestamp: Date
) {
this._timestamp = timestamp;
}
public get timestamp(): Date {
return new Date(this._timestamp);
}
}``````

Assume a client creates an immutable tweet like this:

``````const author = "6.102";
const text = "SFB, RFC, ETU";
const timestamp = new Date(); // current time

const tweet = new Tweet(author, text, timestamp);``````

Choose all the ways that `tweet` can be mutated:

(missing explanation)

## Recursive data type definitions

The abstract data type `ImList`, and its two concrete classes `Empty` and `Cons`, form a recursive data type. `Cons` is an implementation of `ImList`, but it also uses `ImList` inside its own rep (for the `rest` field), so it recursively requires an implementation of `ImList` in order to successfully implement its contract.

To make this fact clearly visible, we’ll write a data type definition:

``ImList<E> = Empty + Cons(first: E, rest: ImList<E>)``

This is a recursive definition of `ImList` as a set of values. Here’s the high-level meaning: the set `ImList` consists of values represented in two ways: either by an `Empty` object (which has no fields), or by a `Cons` object whose fields are an element `first` and an `ImList` `rest`.

A more detailed reading: `ImList<E>` is a generic type where for any type `E`, the set `ImList<E>` consists of the values represented either by an `Empty` object, or by a `Cons` object, where a `Cons` object has fields consisting of `first` of type `E` and `rest` of type `ImList<E>`.

The recursive nature of the data type becomes far more visible when written in a data type definition.

We can write any `ImList` value as a term or expression using this definition, e.g. the list `[ 0, 1, 2 ]` is written as

``Cons(0, Cons(1, Cons(2, Empty)))``

And we can build up the infinite set of all `ImList` values by starting from the base case that is `Empty` and recursively applying `Cons`. For example, with `ImList<boolean>`:

 base case 1st recursive application 2nd recursive application 3rd ... ```ImList ``` ```= Empty ``` ```+ Cons(true, Empty) + Cons(false, Empty) ``` ```+ Cons(true, Cons(true, Empty)) + Cons(true, Cons(false, Empty)) + Cons(false, Cons(true, Empty)) + Cons(false, Cons(false, Empty)) ``` ```+ Cons(true, ...) + Cons(true, ...) + Cons(true, ...) + Cons(true, ...) + Cons(false, ...) + Cons(false, ...) + Cons(false, ...) + Cons(false, ...) ``` ```+ ... ```

Formally, a data type definition has:

• an abstract data type on the left, defined by its representation (or concrete data type) on the right
• the representation consists of variants of the data type combined by a union operator, which we’ll write as `+`
• each variant is a class name with zero or more fields, written with name and type separated by a colon (`:`).

A recursive data type definition is a data type definition where the abstract type (on the left) appears in its own definition (as the type of a field on the right). Another example is a binary tree:

``Tree<E> = Empty + Node(e: E, left: Tree<E>, right: Tree<E>)``

The `Tree` type is defined as the set of values formed either by `Empty` or by `Node`, which takes an element `e` (of type `E`) and `left` and `right` subtrees.

We’ll see more examples below.

Data type definitions are called algebraic data types in functional programming. The syntax we are using here is designed for object-oriented programming languages like TypeScript, Python, and Java, in which classes and interfaces are used to implement the recursive data type. Functional programming languages like Haskell and ML use a different syntax, but the idea is the same: the type consists of a union of variants, some of which may use the type recursively.

When you write a recursive data type, document the recursive data type definition as a comment in the interface:

``````interface ImList<E> {

// Data type definition:
//   ImList<E> = Empty + Cons(first: E, rest: ImList<E>)

// ...``````

It’s rocks all the way down

Consider the data type definition:

``````Geology = Core(a: number, b: number, c: number, d: number)
+ Planet(core: Core, a: number, b: number, c: number, d: number)
+ System(geology: Geology, a: number, b: number)``````

Suppose you have a reference to a `Geology` object. How many numbers might it have in its representation?

(missing explanation)

Given their definitions, which of the following immutable data types would be able to represent a pair of numbers?

(missing explanation)

I heard you like poison, so…

For the same data type definitions as the previous question:

``````T = A(x: number) + B(a1: A, a2: A)
U = D(x: number, y: number)
V = F(z: number, v: V) + G(z: number, v: V)
W = K(n: number) + L(n: number, m: W)
X = M + N(here: number, there: X)``````

…which data types are recursive?

(missing explanation)

VVVVVV

What is the problem with type `V`?

``V = F(z: number, v: V) + G(z: number, v: V)``

(missing explanation)

Bottom of the barrel

Let’s compare type `W` with type `X`:

``````W = K(n: number) + L(n: number, m: W)
X = M + N(here: number, there: X)``````

We saw that type `X` is just our `ImList` in disguise (and only for numbers). Which are possible problems with type `W` compared to type `X`, if we also want to think of `W` as a list?

(missing explanation)

Defining Optional

In TypeScript, to specify that a value of type `E` may be present or may be missing, we might use a union type like `E|undefined` (see below for why we might not).

Languages that don’t support union types typically have a library class to provide the same ability in a typesafe way. For example, the Java library has Optional<E>, an immutable type that either contains a value of type `E`, or doesn’t.

Suppose you wanted to define `Optional` in TypeScript, as an interface with two variant classes. Write a data type definition for `Optional<E>`, following the data type definition pattern of `ImList<E>` above.

(missing explanation)

## Functions over recursive data types

This way of thinking about data types — as a recursive definition of an abstract data type with concrete variants — is appealing not only because it can handle recursive and unbounded structures like lists and trees, but also because it provides a convenient way to describe operations over the data type, as functions with one case per variant.

First, notice how the data type definition maps to the abstract interface and concrete variants that we’ve already seen:

 Data type definition TypeScript code ```ImList ``` ``````interface ImList { ... } `````` ` =` ```Empty ``` ``````class Empty implements ImList { public constructor() { } } `````` ` +` ```Cons(first: E, rest: ImList) ``` ``````class Cons implements ImList { private readonly first: E; private readonly rest: ImList; public constructor(first: E, rest: ImList) { this.first = first; this.rest = rest; } } ``````

Now: let’s add a new operation by thinking about it as a recursive function with one case per variant: the size of the list, which is certainly an operation we’ll want in `ImList`. We can define it like this:

size : ImList → number  // returns the size of the list

and then fully specify its meaning by defining size for each variant of `ImList`:

``````size(Empty) = 0
size(Cons(first: E, rest: ImList)) = 1 + size(rest)``````

This function is recursive. We can think about the execution of size on a particular list as a series of reduction steps:

``````size(Cons(0, Cons(1, Empty)))
= 1 + size(Cons(1, Empty))
= 1 + (1 + size(Empty))
= 1 + (1 + 0)
= 1 + 1
= 2``````

And the cases from the definition can be translated directly into TypeScript in `ImList`, `Empty`, and `Cons`:

``````interface ImList<E> {
// ...
}

class Empty<E> implements ImList<E> {
// ...
}

class Cons<E> implements ImList<E> {
// ...
public get size(): number { return 1 + this.rest.size; }
}``````

This pattern of implementing an operation over a recursive data type by

• declaring the operation in the abstract data type interface
• implementing the operation (recursively) in each concrete variant

is a very common and practical design pattern. It sometimes goes by the unhelpful name interpreter pattern.

Let’s try a few more examples:

isEmpty : ImList → boolean

``````isEmpty(Empty) = true
isEmpty(Cons(first: E, rest: ImList)) = false``````

contains : ImList × E → boolean

``````contains(Empty, e: E) = false
contains(Cons(first: E, rest: ImList), e: E) = (first=e) or contains(rest, e)``````

get: ImList × number → E

``````get(Empty, n: number) = undefined
get(Cons(first: E, rest: ImList), n: number) = if n=0 then first else get(rest, n-1)``````

append: ImList × ImList → ImList

``````append(Empty, that: ImList) = that
append(Cons(first: E, rest: ImList), that: ImList) = cons(first, append(rest, that))``````

reverse: ImList → ImList

``````reverse(Empty) = empty()
reverse(Cons(first: E, rest: ImList)) = append(reverse(rest), cons(first, empty()))``````

For reverse, it turns out that the recursive definition produces a pretty bad implementation in TypeScript, with performance that’s quadratic in the length of the list you’re reversing. We could rewrite it using an accumulator helper function if needed.

isEmpty? No, isImplemented!

isEmpty : ImList → boolean

isEmpty(Empty) = true
isEmpty(Cons(first: E, rest: ImList)) = false

Let’s implement `ImList.isEmpty`.

``````interface ImList<‍E> {
// ...
/**
* @returns true iff this list is empty
}``````

``````class Empty<‍E> implements ImList<‍E> {
// ...

}
}``````

(missing explanation)

``````class Cons<‍E> implements ImList<‍E> {
// ...

}
}``````

(missing explanation)

append: ImList × ImList → ImList

append(Empty, that: ImList) = that
append(Cons(first: E, rest: ImList), that: ImList) = cons(first, append(rest, that))

Let’s implement `ImList.append`.

``````interface ImList<‍E> {
// ...
/**
* @param that list to append to this list
* @returns list with the elements of this followed by the elements of that
}``````

``````class Empty<‍E> implements ImList<‍E> {
// ...

}
}``````

(missing explanation)

``````class Cons<‍E> implements ImList<‍E> {
// ...

{``````

This time, pick all the correct implementations:

``````    }
}``````

(missing explanation)

### Tuning the rep

Getting the size of a list is a common operation. Right now our implementation of `size` takes O(n) time, where n is the length of the list — that’s linear in the number of list items. We can make it better with a simple change to the rep of the list that caches the size the first time we compute it, so that subsequently it costs only O(1) time — constant time, independent of the number of list items — to get:

``````class Cons<E> implements ImList<E> {
private cachedSize: number|undefined = undefined;
// rep invariant:
//   cachedSize is either a positive integer, equal to 1 + the number of E instances
//   found in this.rest, or undefined

// ...
public get size(): number {
if (this.cachedSize === undefined) this.cachedSize = 1 + this.rest.size;
return this.cachedSize;
}
}``````

There’s something interesting happening here: this is an immutable data type, and yet it has a mutable rep. It’s modifying its own `cachedSize` field, in this case to cache the result of an expensive operation. This is an example of a benevolent side-effect, a state change that doesn’t change the abstract value represented by the object, so the type is still immutable.

## Rep independence and rep exposure revisited

Does our TypeScript implementation of `ImList` still have rep independence? We’ve concealed the `Empty` constructor behind the factory function `empty()`, and clients should never need to use the `Empty` or `Cons` constructors directly.

We have a great deal of freedom to change our implementation — indeed, we just added a `cachedSize` field to the internal rep of `Cons`. We could even have an extra array in there to make `get()` run fast! This might get expensive in space, however, but we are free to make those tradeoffs.

What about an operation like `isEmpty` above: does that operation break rep independence by revealing the concrete variant to clients? No! The specification of `isEmpty` is not, e.g., “return true iff this is an instance of the Empty concrete class.” Instead, like any operation of an abstract type, we specify `isEmpty` abstractly, e.g., “return true iff this list contains no elements.” Any implementation of immutable lists, whether a recursive Empty-plus-Cons implementation, an array-backed list, a linked list, or something else, can support an `isEmpty` operation with that spec.

Any time you write an ADT, its specs must not talk about the rep. The concrete variants of a recursive ADT are its rep, so the specs must not mention them.

Finally, is there rep exposure because `Cons.rest()` returns a reference to its internal list? Could a clumsy client add elements to the rest of the list? If so, this would threaten two of `Cons`’s invariants: that it’s immutable, and that the cached size is always correct. But there’s no risk of rep exposure, because the internal list is immutable. Nobody can threaten the rep invariant of `Cons`.

## Null vs. empty

It might be tempting to get rid of the `Empty` class and just use `null` or `undefined` instead. Resist that temptation.

Using an object, rather than a null reference, to signal the base case or endpoint of a data structure is an example of a design pattern called sentinel objects. The enormous advantage that a sentinel object provides is that it acts like an object in the data type, so you can call methods on it. So we can call the `size` operation even on an empty list. If empty lists were represented by `null` or `undefined`, then we wouldn’t be able to do that, and as a consequence our code would be full of tests like:

``if (list !== null) n = list.size;``

which clutter the code, obscure its meaning, and are easy to forget. Better the much simpler

``n = list.size;``

which will always work, including when an empty `list` refers to an `Empty` object.

Keep `null` values out of your data structures, and your life will be happier.

## Static type vs. dynamic type

Now that we’re using interfaces and classes, it’s worth taking a moment to reinforce an important point about how TypeScript’s type-checking works. In fact every statically-checked object-oriented language works this way.

There are two worlds in type checking: compile time before the program runs, and run time when the program is executing. We saw this distinction when we discussed dynamic dispatch.

At compile time, every variable has a static type, also called its declared type or compile-time type, which is either stated in its declaration or inferred from its initializer. The compiler uses the static types of variables (and method return values) to deduce static types for every expression in the program. For example, after a variable declaration `s: string`, the expression `s` has static type `string`, `s[0]` has static type `string`, and `s.length` has static type `number`. Note that TypeScript allows a variable’s static type to be inferred from its initializer: so, for example, in `let n = 5`, the static type of `n` is `number`.

At run time, every object has a dynamic type, also called its actual type or runtime type, imbued in it by the constructor that created the object. For example, `new String()` makes an object whose dynamic type is `String`. `new Empty()` makes an object whose dynamic type is `Empty`. `new ImList()`, on the other hand, is forbidden by TypeScript, because `ImList` is an interface — it has no object values of its own, and no constructors.

In short, variables at compile time have static types, whereas objects at run time have dynamic types.

I do declare
``const hello = "Hello";``

(missing explanation)

Words, words, words
``````const words1: Array<string> = [];
words1.push("hello");

const words2: ImList<string> = empty();
words2.cons("hello");``````

(missing explanation)

## Dynamic type inspection

Dynamic dispatch makes it possible to run different code depending on the dynamic type of `this`, and we’re using it to run the variant-specific implementations when clients call the operations of our recursive type.

You may find that you are tempted to inspect the runtime type of an object, especially with recursive types, as you implement an algorithm. This is a bad idea, an anti-pattern that will quickly get us mired in weeds.

For example, suppose we want to implement:

last: ImList → E
// returns the last element of the list, requires the list to be nonempty

Do not do it this way:

``````class Cons<E> implements ImList<E> {
// ...
public last(): E {
if (this.rest instanceof Empty) { return this.first; } // do not do this
return this.rest.last();
}
}``````

The `instanceof` operator tests whether an object is an instance of a particular type. It is dynamic type checking, and it is both less safe from bugs and less ready for change than static type checking. The fundamental reason is that new variants might be added to the type. For operations implemented as instance methods of the interface, like `first` and `rest`, those new variants will be required to make sure that the operations are correctly implemented, or the code won’t even compile. (Static checking!) But for operations implemented like this – using `instanceof` – the compiler won’t help you find the places that you need to change to handle the new variants, and some of those places might quietly return wrong answers instead of throwing an error.

For example, suppose someone were to later on introduce a `CachingList` variant that cached the results of some otherwise expensive operations, like `size`. This is a different way of doing the caching we did above, but now using a common design pattern called a wrapper (also known as a decorator):

``````class CachingList<E> implements ImList<E> {
// rep invariant:
//   cachedSize === wrappedList.size

// ... constructor and other operations omitted

public get first(): E {
return wrappedList.first;
}
public get rest(): ImList<E> {
return wrappedList.rest;
}
public get size(): number {
return cachedSize;
}
}``````

With `CachingList` added to the system, there are now multiple ways to represent an empty list: an instance of `Empty`, or an instance of `CachingList` wrapped around an `Empty`. In general, any `instanceof` checks that assumed that `Cons` and `Empty` were the only ways to make lists, are now broken, without any static type error. Do not use `instanceof`.

How, then, can we implement the `last` operation on `ImList`? Any time the thought of `instanceof` seems appealing, take a step back and rethink the problem.

In trying to decide whether its `first` element is, in fact, also the `last` element, `Cons` doesn’t care about the representation of `rest`, it only cares about its abstract value. If our type doesn’t provide enough operations for clients (in this case, we are both implementers of `ImList` and, recursively, clients of it) to do what they want, we can make the type more powerful by adding operations:

A little help here

We’ve already proposed many additional operations for `ImList`. Which can we use as helpers to implement `last` without checking `instanceof`?

(missing explanation)

For the last time

If we decide to use `size`, what expression will fill the blank in the new and improved implementation of `Cons.last()`, taking the place of `this.rest instanceof Empty`?

``````class Cons<E> implements ImList<E> {
// ...
public last(): E {
if (this.rest instanceof Empty ________________) { return this.first; }
return this.rest.last();
}
}``````

(missing explanation)

## Equality

One place where using `instanceof` may be the least-bad option is defining an `equalValue()` operation for immutable types.

In Equality when we defined `equalValue(..)` for our immutable TypeScript types, we required that the type of argument `that` match the type of `this`. And on the surface, that makes sense.

But in languages where equality is defined as part of the language, as we saw with Python’s `__eq__`, the spec of the equality operation may allow clients to call it with any other object as an input. Since you can’t make any useful determination about equality without knowing the type, you must first use `instanceof` (in Python, `isinstance`).

And a similar problem exists for a recursive type. Suppose we want to define `equalValue(..)` for `ImList`: what will be the type of the argument? We might wish to define only `equalValue(Empty<E>)` in `Empty`, and only `equalValue(Cons<E>)` in `Cons`. But clients only know about the type `ImList`, so we need a signature like `equalValue(ImList<E>)` in the `ImList` interface that both concrete variants will implement.

There are two implementation options:

1. Define observers that allow us to make the equality determination without knowing the concrete variant of `that`. For example, if we have both `size` and `get(..)` as defined above:

``````interface ImList<E> {
// ... includes size and get(..) ...
equalValue(that: ImList<E>): boolean;
}
class Empty<E> implements ImList<E> {
public equalValue(that: ImList<E>): boolean {
return that.size === 0;
}
}
class Cons<E> implements ImList<E> {
public equalValue(that: ImList<E>): boolean {
if (this.size !== that.size) { return false; }
for (let ii = 0; ii < this.size; ii++) {
if (this.get(ii) !== that.get(ii)) { return false; }
}
return true;
}
}``````

This approach is quite clunky (harder to understand), but effective, and keeps everything above the abstraction barrier (safer from bugs, if all those operations are well-tested).

2. Check the runtime type:

``````interface ImList<E> {
// ...
equalValue(that: ImList<E>): boolean;
}
class Empty<E> implements ImList<E> {
public equalValue(that: ImList<E>): boolean {
return that instanceof Empty;
}
}
class Cons<E> implements ImList<E> {
public equalValue(that: ImList<E>): boolean {
if ( ! (that instanceof Cons)) { return false; }
return this.first === that.first && this.rest.equalValue(that.rest);
}
}``````

This approach has the frisson of runtime type inspection (less safe from bugs), but is elegant and with a recursive structure that matches the data (easier to understand). It’s also far less likely that introducing new variants will change the checking of equality in existing variants.

Note that this implementation relies on the fact that a given abstract immutable list can ony be represented by exactly one rep value: if there are alternative reps, with a different structure of concrete variant instances, we need to correctly account for those alternatives.

## Another example: Boolean formulas

Another useful sort of recursive data type in computer science is for Boolean formulas. For instance, here’s a formula of propositional logic:

(P Q) (¬P R)

which means “either P or Q is true, and either P is false or R is true.”

We can give a data type definition suitable for representing all formulas of propositional logic.

``````Formula = Variable(name: String)
+ Not(formula: Formula)
+ And(left: Formula, right: Formula)
+ Or(left: Formula, right: Formula)``````

(P ∨ Q) ∧ (¬P ∨ R) would be

``````And( Or(Variable("P"), Variable("Q")),
Or(Not(Variable("P")), Variable("R")) )``````

Not

Alyssa and Ben are studying the recursive data type for Boolean formulas.

Ben suggests changing the `Not` variant from `Not(formula:Formula)` to `Not(var: Variable)`. Alyssa doesn’t think that will work. Which of these formulas should she offer as counterexamples?

(missing explanation)

Variables

Alyssa decides she would like an operation to retrieve all of the variable names in a formula as a set, using the immutable set type `ImSet`:

variables : Formula → ImSet<‍string> // returns the set of variables used in the formula

Suppose immutable `ImSet<E>` provides three operations:

emptySet : void → ImSet<‍E>
// returns an empty set

setAdd : E × ImSet<‍E> → ImSet<‍E>
// takes a set and an element and returns the (duplicate-free) set with that element added

setUnion : ImSet<‍E> × ImSet<‍E> → ImSet<‍E>
// returns the (duplicate-free) union of two sets

Use those operations to define variables. We’ve defined two of the cases for you:

 variables(Variable(x)) = setAdd(x, emptySet()) variables(Not(f)) = (missing answer) variables(And(f1, f2)) = setUnion(variables(f1), variables(f2)) variables(Or(f1, f2)) = (missing answer)

(missing explanation)

ImSet

How might we implement immutable `ImSet`? Let’s use `ImList`!

Here’s the data type definition, then, for `ImSet`:

``ImSet<E> = ImList<E>``

Let’s define the operations using function notation. Remember the four original operations of `ImList`:

empty: void → ImList
cons: E × ImList → ImList
first: ImList → E
rest: ImList → ImList

Plus these additional ones we defined:

size : ImList → number
contains : ImList × E → boolean

We’ll define emptySet for you:

emptySet : void → ImSet<‍E>

 emptySet() = empty()

(missing explanation)

And suppose we implement setSize using the size operation:

setSize : ImSet<‍E> → number

 setSize = size

(missing explanation)

Complete these recursive definitions:

setAdd : E × ImSet<‍E> → ImSet<‍E>

(missing explanation)

setUnion : ImSet<‍E> × ImSet<‍E> → ImSet<‍E>

 setUnion(Empty, set2) = set2 setUnion(Cons(first, rest), set2) = if contains(set2, first) then (missing answer) else cons(first, (missing answer) )

Hint for setUnion: make the first argument closer to the `Empty` base case.

(missing explanation)

A key operation for Boolean formulas is testing whether they are satisfiable, that is, whether some assignment of true/false values to the variables leads the formula to evaluate to true. There is a simple but slow algorithm for finding a satisfying assignment if it exists:

1. Extract the set of variables from the formula.
We’ve already implemented this with the variables operation.

2. Try all possible assignments of true/false values to those variables.
We can represent an assignment with an `Environment`: a list of variables and their values. We could use `ImList` to implement the `Environment`, or develop an immutable map type.

3. Evaluate the formula for each environment.
For this, we’ll define evaluate : Formula × Environment → boolean.

4. Return the first environment in which the formula evaluates to true, or fail if it’s not possible.

Defining these pieces and putting them together into a satisfiable : Formula → boolean function is an exercise for another time.

SAT Solver enthusiasts may be quick to point out that a faster algorithm exists which makes use of backtracking. Can our `Formula` recursive data type handle this improvement? As you’ll see in the next section, the answer is it can, and quite well!

## Backtracking search with immutability

We started out this reading with immutable lists, which are a representation that permits a lot of sharing between different list instances. Sharing of a particular kind, though: only the ends of lists can actually be shared. If two lists are identical at the beginning but then diverge from each other, they have to be stored separately. (Why?)

It turns out that backtracking search is a great application for these lists, and here’s why. A search through a space (like the space of assignments to a set of Boolean variables) generally proceeds by making one choice after another, and when a choice leads to a dead end, you backtrack.

Mutable data structures are typically not a good approach for backtracking. If you use a mutable `Map`, say, to keep track of the current variable bindings you’re trying, then you have to undo those bindings every time you backtrack. That’s error-prone and painful compared to what you do with immutable maps — when you backtrack, you just throw the map away!

But immutable data structures with no sharing aren’t a great idea either, because the space you need to keep track of where you are (in the case of the satisfiability problem, the environment) will grow quadratically if you have to make a complete copy every time you take a new step. You need to hold on to all the previous environments on your path, in case you need to back up.

Immutable lists have the nice property that each step taken on the path can share all the information from the previous steps, just by adding to the front of the list. When you have to backtrack, you stop using the current step’s state — but you still have references to the previous step’s state.

Finally, a search that uses immutable data structures is immediately ready to be parallelized. You can delegate multiple processors to search multiple paths at once, without having to deal with the problem that they’ll step on each other in a shared mutable data structure. We’ll talk about this more when we get to concurrency.

## Immutability and performance

Since we’re talking about immutable types here, this is a good moment to discuss performance as well. One common reason for choosing mutability rather than immutability is performance optimization. But it’s not correct to conclude that a mutable type is always more efficient than an immutable type.

The key question to think about is how much sharing is possible, and how much copying is required. An immutable value may be safely shared by many different parts of a program, where a mutable value in the same context would have to be defensively copied over and over, at a cost of both time and space. On the other hand, if a value needs to be edited, in the sense of having many small mutations made to it, then a mutable value may be more efficient, because it doesn’t require the whole value to be copied on every edit.

But even when an immutable value needs to be heavily edited, it may still be possible to design it in a way that exploits sharing in order to reduce copying. For example, if you have an immutable string that is a million characters long, editing a single character in the middle of the string seems to require copying all the unchanged characters too. But a clever `string` implementation might internally share the unchanged regions of characters before and after the edit, so that even though you get a new `string` object as a result of the edit, it actually occupies very little additional memory space. We will see an example of this kind of implementation in a few classes, when we talk about abstract data types. This kind of internal sharing is only possible because of immutability.

The git object graph is another example of the benefit of sharing. Because commits are immutable, other commits can point to them without fear that they will become broken because the parent commit is modified (e.g. reverted or undone). Without immutability, this kind of structural sharing is impossible.

## Summary

In addition to the big idea of recursive data types, we saw in this reading:

• data type definitions: a powerful way to think about abstract types, particularly recursive ones
• functions over recursive data types: declared in the specification for the type, and implemented with one case per concrete variant
• immutable lists: a classic, canonical example of an immutable data type

As always, we ask how these ideas make our code safer from bugs, easier to understand, and more ready for change. Look again at the definition and implementation of `size` in `ImList`. The definition is little more than the mathematical definition of size. The code is little more than the definition, with some syntactic changes to placate the compiler.

If we examine the definitions for further methods — `isEmpty`, `contains`, etc. — in each case we see a safe, easy-to-read implementation waiting to be coded. Since we’ve taken the time to specify these operations, if we avoid rep exposure and maintain rep independence, we know our code is ready for change: different clients will be able to reuse our data type, and we will be able to update the implementation without breaking them.