6.031TS
6.031TS — Software Construction
TypeScript Pilot — Spring 2021

Reading 10: Abstract Data Types

Software in 6.031

Safe from bugsEasy to understandReady for change
Correct today and correct in the unknown future. Communicating clearly with future programmers, including future you. Designed to accommodate change without rewriting.

Objectives

Today’s class introduces two ideas:

  • Abstract data types
  • Representation independence

Introduction

In this reading, we look at a powerful idea: abstract data types. This idea enables us to separate how we use a data structure in a program from the particular form of the data structure itself.

Abstract data types address a particularly dangerous problem: clients making assumptions about the type’s internal representation. We’ll see why this is dangerous and how it can be avoided. We’ll also discuss the classification of operations, and some principles of good design for abstract data types.

Access control in TypeScript

reading exercises

The following questions use the code below. Study it first, then answer the questions.

      class Wallet {
          private amount: number = 0;

          public loanTo(that: Wallet): void {
/*A*/         that.amount += this.amount;
/*B*/         amount = 0;
          }
      }

      class Person {
          private w: Wallet;

          public getNetWorth(): number {
/*C*/         return this.w.amount;
          }

          public isBroke(): boolean {
/*D*/         return Wallet.amount === 0;
          }
      }


      function main():void {
/*E*/     const w = new Wallet();
/*F*/     w.amount = 100;
/*G*/     w.loanTo(w);
      }

      main();
Snapshot

Suppose the program has paused after running the line marked /*A*/ but before reaching /*B*/. A partial snapshot diagram of its internal state is shown at the right, with numbered gray boxes as placeholders for you to fill in. What should each of those boxes be?

1

2

3

4

(missing explanation)

Access control A
that.amount += this.amount;

(missing explanation)

Access control B
amount = 0;

(missing explanation)

Access control C
return this.w.amount;

(missing explanation)

Access control D
return Wallet.amount === 0;

(missing explanation)

Access control E
const w = new Wallet();

(missing explanation)

Access control F
w.amount = 100;

(missing explanation)

Access control G
w.loanTo(w);

(missing explanation)

What abstraction means

Abstract data types are an instance of a general principle in software engineering, which goes by many names with slightly different shades of meaning. Here are some of the names that are used for this idea:

  • Abstraction. Omitting or hiding low-level details with a simpler, higher-level idea.
  • Modularity. Dividing a system into components or modules, each of which can be designed, implemented, tested, reasoned about, and reused separately from the rest of the system.
  • Encapsulation. Building a wall around a module so that the module is responsible for its own internal behavior, and bugs in other parts of the system can’t damage its integrity.
  • Information hiding. Hiding details of a module’s implementation from the rest of the system, so that those details can be changed later without changing the rest of the system.
  • Separation of concerns. Making a feature (or “concern”) the responsibility of a single module, rather than spreading it across multiple modules.

As a software engineer, you should know these terms, because you will run into them frequently. The fundamental purpose of all of these ideas is to help achieve the three important properties that we care about in 6.031: safety from bugs, ease of understanding, and readiness for change.

We have in fact already encountered some of these ideas in previous classes, in the context of writing methods that take inputs and produce outputs:

  • Abstraction: A spec is an abstraction in that the client only has to understand its preconditions and postconditions to use it, not the full internal behavior of the implementation.
  • Modularity: Unit testing and specs help make methods into modules.
  • Encapsulation: The local variables of a function are encapsulated, since only the function itself can use or modify them. Contrast with global variables, which are quite the opposite, or local variables pointing to mutable objects that have aliases, which also threaten encapsulation.
  • Information hiding: A spec uses information-hiding to leave the implementer some freedom in how the function is implemented.
  • Separation of concerns: A good function spec is coherent, meaning it is responsible for just one concern.

Starting with today’s class, we’re going to move beyond abstractions for functions, and look at abstractions for data as well. But we’ll see that functions will still play a crucial role in how we describe data abstraction.

User-defined types

In the early days of computing, a programming language came with built-in types (such as integers, booleans, strings, etc.) and built-in functions, e.g., for input and output. Users could define their own functions: that’s how large programs were built.

A major advance in software development was the idea of abstract types: that one could design a programming language to allow user-defined types, too. This idea came out of the work of many researchers, notably Dahl, who invented the Simula language; Hoare, who developed many of the techniques we now use to reason about abstract types; and Parnas, who coined the term information hiding and first articulated the idea of organizing program modules around the secrets they encapsulated.

Here at MIT, Barbara Liskov and John Guttag did seminal work in the specification of abstract types, and in programming language support for them – and developed the original 6.170, the predecessor to 6.005, predecessor to 6.031. Barbara Liskov earned the Turing Award, computer science’s equivalent of the Nobel Prize, for her work on abstract types.

The key idea of data abstraction is that a type is characterized by the operations you can perform on it. A number is something you can add and multiply; a string is something you can concatenate and take substrings of; a boolean is something you can negate, and so on. In a sense, users could already define their own types in early programming languages: you could create a record type date, for example, with integer fields for day, month, and year. But what made abstract types new and different was the focus on operations: the user of the type would not need to worry about how its values were actually stored, in the same way that a programmer can ignore how the compiler actually stores integers. All that matters is the operations.

reading exercises

Abstract data types

Consider an abstract data type Bool. The type has the following operations:

true : Bool
false : Bool

and : Bool × Bool → Bool
or : Bool × Bool → Bool
not : Bool → Bool

… where the first two operations construct the two values of the type, and the last three operations have the usual meanings of logical and, logical or, and logical not on those values.

Which of the following are possible ways that Bool might be implemented, and still be able to satisfy the specs of the operations? Choose all that apply.

(missing explanation)

Classifying types and operations

Types, whether built-in or user-defined, can be classified as mutable or immutable. The objects of a mutable type can be changed: that is, they provide operations which when executed cause the results of other operations on the same object to give different results. So Date is mutable, because you can call setMonth and observe the change with the getMonth operation. But string is immutable, because its operations create new string objects rather than changing existing ones. Sometimes a type will be provided in two forms, a mutable and an immutable form. ReadonlyArray, for example, is an immutable form of Array.

The operations of an abstract type are classified as follows:

  • Creators create new objects of the type. A creator may take values of other types as arguments, but not an object of the type being constructed.
  • Producers also create new objects of the type, but require one or more existing objects of the type as input. The concat method of string, for example, is a producer: it takes two strings and produces a new string representing their concatenation.
  • Observers take objects of the abstract type and return objects of a different type. The length property of Array, for example, returns a number.
  • Mutators change objects. The push method of Array, for example, mutates an array by adding an element to the end.

We can summarize these distinctions schematically like this (explanation to follow):

  • creator : t* → T
  • producer : T+, t* → T
  • observer : T+, t* → t
  • mutator : T+, t* → void | t | T

These show informally the shape of the signatures of operations in the various classes. Each T is the abstract type itself; each t is some other type. The + marker indicates that the type may occur one or more times in that part of the signature, and the * marker indicates that it occurs zero or more times. The | indicates or. For example, a producer may take two values of the abstract type T, like String.concat() does:

  • concat : string × string → string

Some observers take zero arguments of other types t, such as:

… and others take several:

  • compare : Buffer × Buffer × number × number × number × number → number

A creator operation is often implemented as a constructor, like new Date(). But a creator can simply be a static method instead, like Array.of(). A creator implemented as a static method or standalone function is often called a factory.

Mutators are often signaled by a void return type. A method that returns void must be called for some kind of side-effect, since it doesn’t otherwise return anything. But not all mutators return void. For example, Set.add() returns the set itself, so that multiple add() calls can be chained together.

Abstract data type examples

Here are some examples of abstract data types, along with some of their operations, grouped by kind.

number is TypeScript’s primitive numeric type. number is immutable, so it has no mutators.

  • creators: the numeric literals 0, 1, 2, …
  • producers: arithmetic operators +, -, *, /, **
  • observers: comparison operators ===, !==, <, >
  • mutators: none (it’s immutable)

Array is TypeScript’s array type. Array is mutable.

string is TypeScript’s string type. String is immutable.

  • creators: String constructor, "..." literal syntax, String.fromCharCode static method
  • producers: concat, substring, toUpperCase
  • observers: length, charAt, [i] indexing syntax
  • mutators: none (it’s immutable)

This classification gives some useful terminology, but it’s not perfect. In complicated data types, there may be an operation that is both a producer and a mutator, for example. We will refer to such a method as both a producer and a mutator, but some people would prefer to just call it a mutator, reserving the term producer only for operations that do no mutation.

reading exercises

Operations

Each of the methods below is an operation on an abstract data type from the JavaScript library. Click on the link to look at its documentation. Think about the operation’s type signature. Then classify the operation.

Hints: pay attention to whether the type itself appears as a parameter or return value. And remember that instance methods (lacking the static keyword) have an implicit parameter.

 

Number.parseInt()

(missing explanation)

 

Array.slice()

(missing explanation)

 

Array.shift()

(missing explanation)

 

String.toUpperCase()

(missing explanation)

 

Set.has()

(missing explanation)

 

Map.keys()

(missing explanation)

An abstract type is defined by its operations

The essential idea here is that an abstract data type is defined by its operations. The set of operations for a type T, along with their specifications, fully characterize what we mean by T.

So, for example, when we talk about the Array type, what we mean is not a linked list, or a block of memory, or a hash table, or any other specific data structure that might represent a sequence of values.

Instead, the Array type is a set of opaque values — the possible objects that can have Array type — that satisfy the specifications of all the operations of Array: […] (indexing), length, push(), etc. The values of an abstract type are opaque in the sense that a client can’t examine the data stored inside them, except as permitted by operations.

Expanding our metaphor of a specification firewall, you might picture values of an abstract type as hard shells, hiding not just the implementation of an individual function, but of a set of related functions (the operations of the type) and the data they share (the private fields stored inside values of the type).

The operations of the type constitute its abstraction. This is the public part, visible to clients who use the type.

The fields of the class that implements the type, as well as related classes that help implement a complex data structure, constitute a particular representation. This part is private, visible only to the implementer of the type.

Designing an abstract type

Designing an abstract type involves choosing good operations and determining how they should behave. Here are a few rules of thumb.

It’s better to have a few, simple operations that can be combined in powerful ways, rather than lots of complex operations.

Each operation should have a well-defined purpose, and should have a coherent behavior rather than a multitude of special cases. We probably shouldn’t add a sum operation to Set, for example. It might help clients who work with sets of numbers, but what about sets of strings? Or nested sets? All these special cases would make sum a hard operation to understand and use.

The set of operations should be adequate in the sense that there must be enough to do the kinds of computations clients are likely to want to do. A good test is to check that every property of an object of the type can be extracted. For example, if string had no indexing operation [i], we would not be able to find out what the individual characters of the string are. Basic information should not be inordinately difficult to obtain. For example, the length property is not strictly necessary for string, because we could apply [i] for increasing indices i until we get undefined, but this is inefficient and inconvenient.

The type may be generic: a list, or a set, or a graph, for example. Or it may be domain-specific: a street map, an employee database, a phone book, etc. But it should not mix generic and domain-specific features. A Deck type intended to represent a sequence of playing cards shouldn’t have a generic add method that accepts arbitrary objects like integers or strings. Conversely, it wouldn’t make sense to put a domain-specific method like dealCards into the generic type Array.

Representation independence

Critically, a good abstract data type should be representation independent. This means that the use of an abstract type is independent of its representation (the actual data structure or data fields used to implement it), so that changes in representation have no effect on code outside the abstract type itself. For example, the operations offered by Array are independent of whether the array is represented internally as a contiguous block of memory, a linked list, or a hash table.

As an implementer, you will only be able to safely change the representation of an ADT if its operations are fully specified with preconditions and postconditions, so that clients know what to depend on, and you know what you can safely change.

Example: different representations for strings

Let’s look at a simple abstract data type to see what representation independence means and why it’s useful. The MyString type below has far fewer operations than the primitive TypeScript string, and their specs are a little different, but it’s still illustrative. Here are the specs for the ADT:

/** MyString represents an immutable sequence of characters. */
class MyString { 

    //////////////////// Example of a creator operation ///////////////
    /**
     * @param s 
     * @returns MyString representing the sequence of characters in s
     */
    public constructor(s:string) { ... }

    //////////////////// Examples of observer operations ///////////////
    /**
     * @returns number of characters in this string
     */
    public length(): number { ... }

    /**
     * @param i character position (requires 0 <= i < string length)
     * @returns character at position i
     */
    public charAt(i: number): string { ... }

    //////////////////// Example of a producer operations ///////////////    
    /** 
     * Get the substring between start (inclusive) and end (exclusive).
     * @param start starting index
     * @param end ending index.  Requires 0 <= start <= end <= string length.
     * @returns string consisting of charAt(start)...charAt(end-1)
     */
    public substring(start: number, end: number): MyString { ... }

    /////// no mutator operations (why not?)
}

These public operations and their specifications are the only information that a client of this data type is allowed to know. But implementing the data type requires a representation. For now, let’s look at a simple representation for MyString: just an array of characters. Each character is represented by a 16-bit number, its Unicode value, which we can get from a primitive string using String.charCodeAt() and String.fromCharCode(). The appropriate type for an array of characters in TypeScript is Uint16Array, which means “unsigned 16-bit integer array.” This type largely behaves like an Array, except that is limited to storing 16-bit unsigned integers, and it cannot grow or shrink. In our MyString representation, the array will have exactly the length of the string, with no extra room at the end.

Here’s how that internal representation might be declared, as an instance variable within the class:

private a: Uint16Array;

With that choice of representation, the operations would be implemented in a straightforward way:

public constructor(s: string) {
  this.a = new Uint16Array(s.length);
  for (let i = 0; i < s.length; ++i) {
    this.a[i] = s.charCodeAt(i);
  }
}

public length(): number {
  return this.a.length;
}

public charAt(i: number): string {
  return String.fromCharCode(this.a[i]);
}

public substring(start: number, end: number): MyString {
  const that = new MyString("");      // make a temporarily-empty MyString object
  that.a = this.a.slice(start, end);  // ... and immediately replace its array
  return that;
}

Question to ponder: Why don’t charAt and substring have to check whether their parameters are within the valid range? What do you think will happen if the client calls these implementations with illegal inputs?

Here’s a snapshot diagram showing what this representation looks like for a couple of typical client operations:

const s = new MyString("true");
const t = s.substring(1,3);

One problem with this implementation is that it’s passing up an opportunity for performance improvement. Because this data type is immutable, the substring operation doesn’t really have to copy characters out into a fresh array. It could just point to the original MyString object’s character array and keep track of the start and end that the new substring object represents. (In some versions of Java, the built-in String implementation does exactly this.)

To implement this optimization, we could change the internal representation of this class to:

private a: Uint16Array;
private start: number;
private end: number;

With this new representation, the operations are now implemented like this:

public constructor(s: string) {
  this.a = new Uint16Array(s.length);
  for (let i = 0; i < s.length; ++i) {
    this.a[i] = s.charCodeAt(i);
  }
  this.start = 0;
  this.end = this.a.length;
  return s;
}

public length(): number {
  return this.end - this.start;
}

public charAt(i: number): string {
  return String.fromCharCode(this.a[this.start + i]);
}

public substring(start: number, end: number): MyString {
  const that = new MyString(""); // make a temporarily-empty MyString object
  that.a = this.a;               // ... and immediately replace its instance variables 
  that.start = this.start + start;
  that.end = this.start + end;
  return that;
}

Now the same client code produces a very different internal structure:

const s = new MyString("true");
const t = s.substring(1,3);

Because MyString’s existing clients depend only on the specs of its public methods, not on its private fields, we can make this change without having to inspect and change all that client code. That’s the power of representation independence.

reading exercises

Representation 1

Consider the following abstract data type.

/**
 * Represents a family that lives in a household together.
 * A family always has at least one person in it.
 * Families are mutable.
 */
class Family {
    // the people in the family, sorted from oldest to youngest, with no duplicates.
    public people: Array<Person>;

    /**
     * @returns an array containing all the members of the family, with no duplicates.
     */
    public getMembers(): Array<Person> {
        return this.people;
    }
}

Here is a client of this abstract data type:

function client1(f: Family): void {
    // get youngest person in the family
    const baby = f.people[f.people.length-1];
    ...
}

Assume all this code works correctly (both Family and client1) and passes all its tests.

Now Family’s representation is changed from a Array to Set, as shown:

/**
 * Represents a family that lives in a household together.
 * A family always has at least one person in it.
 * Families are mutable.
 */
class Family {
    // the people in the family, sorted from oldest to youngest, with no duplicates.
    public people: Set<Person>;

    /**
     * @returns an array containing all the members of the family, with no duplicates.
     */
    public getMembers(): Array<Person> {
        return Array.from(this.people);
    }
}

Assume that Family compiles correctly after the change.

(missing explanation)

Representation 2

Original version:

/**
 * Represents a family that lives in a 
 * household together. A family always 
 * has at least one person in it.
 * Families are mutable.
 */
class Family {
    // the people in the family
    public people: Array<Person>;

    /**
     * @returns an array containing all 
     *          the members of the family, 
     *          with no duplicates.
     */
    public getMembers(): Array<Person> {
        return this.people;
    }
}

Changed version:

/**
 * Represents a family that lives in a 
 * household together. A family always 
 * has at least one person in it.
 * Families are mutable.
 */
class Family {
    // the nuclear family (just parents and their children)
    public people: Array<Person>;

    // the extended family (grandparents, aunts/uncles, cousins, etc.)
    public others: Array<Person>;

    /**
     * @returns an array containing all 
     *          the members of the family, 
     *          with no duplicates.
     */
    public getMembers(): Array<Person> {
        return this.people.concat(this.others);
    }
}

Now consider client2:

function client2(f: Family): void {
    // get size of the family
    const familySize = f.people.length;
    ...
}

(missing explanation)

Representation 3

Original version:

/**
 * Represents a family that lives in a 
 * household together. A family always 
 * has at least one person in it.
 * Families are mutable.
 */
class Family {
    // the people in the family
    public people: Array<Person>;

    /**
     * @returns an array containing all 
     *          the members of the family, 
     *          with no duplicates.
     */
    public getMembers(): Array<Person> {
        return this.people;
    }
}

Changed version:

/**
 * Represents a family that lives in a 
 * household together. A family always 
 * has at least one person in it.
 * Families are mutable.
 */
class Family {
    // the people in the family
    public people: Set<Person>;

    /**
     * @returns an array containing all 
     *          the members of the family, 
     *          with no duplicates.
     */
    public getMembers(): Array<Person> {
        return Array.from(this.people);
    }
}

Now consider client3:

function client3(f: Family): void {
    // get any person in the family
    const anybody: Person = f.getMembers()[0];
    ...
}

(missing explanation)

Representation 4

For each section of the Family data type’s code shown below, is it part of the ADT’s specification, its representation, or its implementation?

1

/**
 * Represents a family that lives in a household together.
 * A family always has at least one person in it.
 * Families are mutable.
 */

2

class Family {

3

    // the people in the family, sorted from oldest to youngest, with no duplicates.

4

    private people: Array<Person>;

5

    /**
     * @returns an array containing all the members of the family, with no duplicates.
     */

6

    public getMembers(): Array<Person> {

7

        return people;
    }
}

(missing explanation)

Realizing ADT concepts in TypeScript

Let’s summarize some of the general ideas we’ve discussed in this reading, which are applicable in general to programming in any language, and their specific realization using TypeScript language features. The point is that there are several ways to do it, and it’s important to both understand the big idea, like a creator operation, and different ways to achieve that idea in practice. We’ll also include three items that haven’t yet been discussed in this reading, with notes about them below:

ADT concept

Ways to do it in TypeScript

Examples

Abstract data type

Class

Date

Interface + class(es) 1

ArrayLike and Array

Enum 2

PenColor

Creator operation

Constructor

Array()

Static (factory) method

Array.of()

Constant 3

Number.POSITIVE_INFINITY

Observer operation

Instance method

String.charAt()

Static method

Object.entries()

Producer operation

Instance method

String.trim()

Static method

Math.floor()

Mutator operation

Instance method

Array.push()

Static method

Object.assign()

Representation

private fields

Notes:

  1. Defining an abstract data type using an interface + class(es). We’ll discuss interfaces in a future reading.
  2. Defining an abstract data type using an enumeration (enum). Enums are ideal for ADTs that have a small fixed set of values, like the days of the week. We’ll discuss enumerations in a future reading.
  3. Using a constant object as a creator operation. This pattern is commonly seen in immutable types, where the simplest or emptiest value of the type is simply a public constant, and producers are used to build up more complex values from it.

Testing an abstract data type

We build a test suite for an abstract data type by creating tests for each of its operations. These tests inevitably interact with each other. The only way to test creators, producers, and mutators is by calling observers on the objects that result, and likewise, the only way to test observers is by creating objects for them to observe.

Here’s how we might partition the input spaces of the four operations in our MyString type:

// testing strategy for each operation of MyString:
//
// constructor:
//    partiton on string length: 0, 1, >1
// length(): 
//    partition on string length: 0, 1, >1
//    partition on this: produced by constructor, produced by substring()
// charAt(): 
//    partition on string length: 0, 1, >1
//    partition on i=0, 0<i<len-1, i=len-1
//    partition on this: produced by constructor, produced by substring()
// substring():
//    partition on string length: 0, 1, >1
//    partition on start=0, 0<start<len, start=len
//    partition on end=0, 0<end<len, end=len
//    partition on end-start: 0, >0
//    partition on this: produced by constructor, produced by substring()

Since several operations share the same partitions, we can also write this more DRYly:

// testing strategy for each operation of MyString:
//
// constructor, length(), charAt(), substring():
//    partiton on string length: 0, 1, >1
// length(), charAt(), substring():
//    partition on this: produced by constructor, produced by substring()
// charAt(): 
//    partition on i=0, 0<i<len-1, i=len-1
// substring():
//    partition on start=0, 0<start<len, start=len
//    partition on end=0, 0<end<len, end=len
//    partition on end-start: 0, >0

Now we want test cases that cover these partitions. Note that writing test cases that use assert.strictEqual or assert.deepStrictEqual directly on MyString objects wouldn’t be appropriate, because we haven’t defined the notion of equality for MyString. We’ll talk about how to implement equality carefully in a later reading. For now, the only operations we can perform with MyStrings are the ones we’ve defined above: the constructor, length, charAt, and substring.

Given that constraint, a compact test suite that covers all these partitions might look like:

it('should work for n-character string', () => {
  const s = new MyString('yes')
  assert.strictEqual(s.length(), 3);
  assert.strictEqual(s.charAt(0), 'y');
  assert.strictEqual(s.charAt(1), 'e');
  assert.strictEqual(s.charAt(2), 's');
});

it('should work for 1-character string', () => {
  const s = new MyString('L');
  assert.strictEqual(s.length(), 1);
  assert.strictEqual(s.charAt(0), 'L');
});

it('should work for empty string', () => {
  const s = new MyString('');
  assert.strictEqual(s.length(), 0);
  const t = s.substring(0,0);
  assert.strictEqual(t.length(), 0);
});

it('should work for substring() of middle of string', () => {
  const s = new MyString('hello').substring(1, 2);
  assert.strictEqual(s.length(), 1);
  assert.strictEqual(s.charAt(0), 'e');
});

it('should work for substring() of whole string', () => {
  const s = new MyString('bye').substring(0, 3);
  assert.strictEqual(s.length(), 3);
  assert.strictEqual(s.charAt(0), 'b');
  assert.strictEqual(s.charAt(1), 'y');
  assert.strictEqual(s.charAt(2), 'e');
});

it('should work for substring() of empty string returned by substring()', () => {
  const s = new MyString('again').substring(1, 1).substring(0, 0);
  assert.strictEqual(s.length(), 0);
});

Try to match each test case to the subdomains it covers.

reading exercises

Partition covering

Which test cases cover the part “charAt() with string length = 1”?

(missing explanation)

Which test cases cover the part “substring() of string produced by substring()“?

(missing explanation)

Which test cases cover the part “constructor returns string length > 1”?

(missing explanation)

Unit testing an ADT

What “units” are being unit-tested by ‘should work for n-character string’?

(missing explanation)

Summary

  • Abstract data types are characterized by their operations.
  • Operations can be classified into creators, producers, observers, and mutators.
  • An ADT’s specification is its set of operations and their specs.
  • A good ADT is simple, coherent, adequate, and representation independent.
  • An ADT is tested by generating tests for each of its operations, but using the creators, producers, mutators, and observers together in the same tests.

These ideas connect to our three key properties of good software as follows:

  • Safe from bugs. A good ADT offers a well-defined contract for a data type, so that clients know what to expect from the data type, and implementers have well-defined freedom to vary.

  • Easy to understand. A good ADT hides its implementation behind a set of simple operations, so that programmers using the ADT only need to understand the operations, not the details of the implementation.

  • Ready for change. Representation independence allows the implementation of an abstract data type to change without requiring changes from its clients.