Reading 11: Abstraction Functions & Rep Invariants

Software in 6.031

Safe from bugs	Easy to understand	Ready for change
Correct today and correct in the unknown future.	Communicating clearly with future programmers, including future you.	Designed to accommodate change without rewriting.

Objectives

Today’s reading introduces several ideas:

invariants
representation exposure
abstraction functions
representation invariants

In this reading, we study a more formal mathematical idea of what it means for a class to implement an ADT, via the notions of abstraction functions and rep invariants. These mathematical notions are eminently practical in software design. The abstraction function will give us a way to cleanly define the equality operation on an abstract data type (which we’ll discuss in more depth in a future class). The rep invariant will make it easier to catch bugs caused by a corrupted data structure.

Invariants

Resuming our discussion of what makes a good abstract data type, the final, and perhaps most important, property of a good abstract data type is that it preserves its own invariants. An invariant is a property of a program that is always true, for every possible runtime state of the program. Immutability is one crucial invariant that we’ve already encountered: once created, an immutable object should always represent the same value, for its entire lifetime. But there are many other kinds of invariants, including types of variables (int i means that i is always an integer), or relationships between variables (e.g. when i is used to loop over a list, then 0 <= i < list.size() is an invariant within the body of the loop).

Saying that the ADT preserves its own invariants means that the ADT is responsible for ensuring that its own invariants hold. This is done by hiding or protecting the variables involved in the invariants (e.g., using language features like private), and allowing access only through operations with well-defined contracts.

When an ADT preserves its own invariants, reasoning about the code becomes much easier. If you can count on the fact that Strings are immutable, for example, then you can rule out that possibility when you’re debugging code that uses Strings – or when you’re trying to establish an invariant for another ADT that uses Strings. Contrast that with a mutable string type, which can be mutated by any code that has access to it. To reason about a bug or invariant involving a mutable string, you’d have to check all the places in the code where the string might be accessible.

reading exercises

Statically-checked invariants

We’re already accustomed to declaring some useful invariants. Consider this line of code:

List<String> names = List.of("Huey", "Dewey", "Louie");

Which of the following are invariants expressed by this code?

names refers to an object of type List

every element of names is the name of a real person

every element of names is an object of type String

the first element of names is the string "Huey"

(missing explanation)

Immutability

Later in this reading, we’ll see many interesting invariants. Let’s focus on immutability for now. Here’s a specific example:

/**
 * This immutable data type represents a tweet from Twitter.
 */
public class Tweet {

    public String author;
    public String text;
    public Date timestamp;

    /**
     * Make a Tweet.
     * @param author    Twitter user who wrote the tweet
     * @param text      text of the tweet
     * @param timestamp date/time when the tweet was sent
     */
    public Tweet(String author, String text, Date timestamp) { 
        this.author = author;
        this.text = text;
        this.timestamp = timestamp;
    }
}

This example uses the java.util.Date class. As we’ll see, it is not a good choice, and the Java API docs mark most of Date as deprecated: new code shouldn’t use Date at all.

At the end of the section, we suggest at least one better alternative, but for now, we want an instructive example.

How do we guarantee that these Tweet objects are immutable – that, once a tweet is created, its author, message, and date can never be changed?

The first threat to immutability comes from the fact that clients can — in fact must — directly access its fields. So nothing’s stopping us from writing code like this:

Tweet t = new Tweet("justinbieber", 
                    "Thanks to all those beliebers out there inspiring me every day", 
                    new Date());
t.author = "rbmllr";

This is a trivial example of representation exposure, meaning that code outside the class can modify the representation directly. Rep exposure like this threatens not only invariants, but also representation independence. We can’t change the implementation of Tweet without affecting all the clients who are directly accessing those fields.

Fortunately, Java gives us language mechanisms to deal with this kind of rep exposure:

public class Tweet {

    private final String author;
    private final String text;
    private final Date timestamp;

    public Tweet(String author, String text, Date timestamp) {
        this.author = author;
        this.text = text;
        this.timestamp = timestamp;
    }

    /**
     * @return Twitter user who wrote the tweet
     */
    public String getAuthor() {
        return author;
    }

    /**
     * @return text of the tweet
     */
    public String getText() {
        return text;
    }

    /**
     * @return date/time when the tweet was sent
     */
    public Date getTimestamp() {
        return timestamp;
    }

}

The private and public keywords indicate which fields and methods are accessible only within the class and which can be accessed from outside the class. The final keyword also helps by guaranteeing that the fields of this immutable type won’t be reassigned after the object is constructed.

But that’s not the end of the story: the rep is still exposed! Consider this perfectly reasonable client code that uses Tweet:

/**
 * @return a tweet that retweets t, one hour later
 */
public static Tweet retweetLater(Tweet t) {
    Date d = t.getTimestamp();
    d.setHours(d.getHours()+1);
    return new Tweet("rbmllr", t.getText(), d);
}

retweetLater takes a tweet and should return another tweet with the same message (called a retweet) but sent an hour later. The retweetLater method might be part of a system that automatically echoes funny things that Twitter celebrities say.

What’s the problem here? The getTimestamp call returns a reference to the same Date object referenced by tweet t. t.timestamp and d are aliases to the same mutable object. So when that date object is mutated by d.setHours(), this affects the date in t as well, as shown in the snapshot diagram.

Tweet’s immutability invariant has been broken. The problem is that Tweet leaked out a reference to a mutable object that its immutability depended on. We exposed the rep, in such a way that Tweet can no longer guarantee that its objects are immutable. Perfectly reasonable client code created a subtle bug.

We can patch this kind of rep exposure by using defensive copying: making a copy of a mutable object to avoid leaking out references to the rep. Here’s the code:

public Date getTimestamp() {
    return new Date(timestamp.getTime());
}

Mutable types often have a copy constructor that allows you to make a new instance that duplicates the value of an existing instance. In this case, Date’s copy constructor uses the timestamp value, measured in milliseconds since January 1, 1970. As another example, StringBuilder’s copy constructor takes a String. Another way to copy a mutable object is clone(), which is supported by some types but not all. There are unfortunate problems with the way clone() works in Java, and you generally shouldn’t use it. For more, see Josh Bloch, Effective Java, item 11, or Copy Constructor vs. Cloning.

So we’ve done some defensive copying in the return value of getTimestamp. But we’re not done yet! There’s still rep exposure. Consider this (again perfectly reasonable) client code:

/**
 * @return a list of 24 inspiring tweets, one per hour today
 */
public static List<Tweet> tweetEveryHourToday () {
    List<Tweet> list = new ArrayList<Tweet>(); 
    Date date = new Date();
    for (int i = 0; i < 24; i++) {
        date.setHours(i);
        list.add(new Tweet("rbmllr", "keep it up! you can do it", date));
    } 
    return list;
}

This code intends to advance a single Date object through the 24 hours of a day, creating a tweet for every hour. But notice that the constructor of Tweet saves the reference that was passed in, so all 24 Tweet objects end up with the same time, as shown in this snapshot diagram.

Again, the immutability of Tweet has been violated. We can fix this problem, too, by using judicious defensive copying, this time in the constructor:

public Tweet(String author, String text, Date timestamp) {
    this.author = author;
    this.text = text;
    this.timestamp = new Date(timestamp.getTime());
}

In general, you should carefully inspect the argument types and return types of all your ADT operations. If any of the types are mutable, make sure your implementation doesn’t store those arguments directly in the rep, or return direct references to the rep, because that causes rep exposure.

You may object that this seems wasteful. Why make all these copies of dates? Why can’t we just solve this problem by a carefully written specification, like this?

/**
 * Make a Tweet.
 * @param author    Twitter user who wrote the tweet
 * @param text      text of the tweet
 * @param timestamp date/time when the tweet was sent. Caller must never 
 *                   mutate this Date object again!
 */
public Tweet(String author, String text, Date timestamp) {

This approach is sometimes taken when there isn’t any other reasonable alternative – for example, when the mutable object is too large to copy efficiently. But the cost in your ability to reason about the program, and your ability to avoid bugs, is enormous. In the absence of compelling arguments to the contrary, it’s almost always worth it for an abstract data type to guarantee its own invariants, and preventing rep exposure is essential to that.

An even better solution is to prefer immutable types. If – as recommended in Mutability & Immutability’s Groundhog Day example – we had used an immutable date object, like java.time.ZonedDateTime, instead of the mutable java.util.Date, then we would have ended this section after talking about public and private. No further rep exposure would have been possible.

Immutable wrappers around mutable data types

The Java collections classes offer an interesting compromise: immutable wrappers.

Collections.unmodifiableList() takes a (mutable) List and wraps it with an object that looks like a List, but whose mutators are disabled – set(), add(), remove(), etc. throw exceptions. So you can construct a list using mutators, then seal it up in an unmodifiable wrapper (and throw away your reference to the original mutable list, as discussed in Mutability & Immutability), and get an immutable list.

The downside here is that you get immutability at runtime, but not at compile time. Java won’t warn you at compile time if you try to sort() this unmodifiable list. You’ll just get an exception at runtime. And there will be no error at all if the original mutable list is modified, changing the value of the “unmodifiable” list. But the runtime exception for attempting to mutate the wrapper is still better than nothing, so using unmodifiable lists, maps, and sets can be a very good way to reduce the risk of bugs.

Because the Java library uses these wrappers and doesn’t provide separate types for immutable collections, other creator operations that return unmodifiable lists, like Collections.emptyList() or List.of(..), have the same downside: immutability at runtime only, with no compile-time checking.

reading exercises

Rep exposure

Consider the following problematic datatype:

      /** Represents an immutable right triangle. */
      class RightTriangle {
/*A*/     private double[] sides;

/*B*/     public final double hypotenuse;

          /** 
           * Make a right triangle.
           * @param legA, legB  the two legs of the triangle
           * @param hypotenuse  the hypotenuse of the triangle,
 *C*       *           requires hypotenuse^2 = legA^2 + legB^2
           *           (within the error tolerance of double arithmetic)
           */
          public RightTriangle(double legA, double legB, double hypotenuse) {
/*D*/         this.sides = new double[] { legA, legB };
/*D*/         this.hypotenuse = hypotenuse;
          }

          /** 
           * Get the two sides of the right triangle.
           * @return two-element array with the triangle's side lengths
           */
          public double[] getAllSides() {
/*E*/         return sides;
          }

          /**
           * @param factor to multiply the sides by
           * @return a triangle made from this triangle by 
           * multiplying all side lengths by factor.
           */
          public RightTriangle scale(double factor) {
              return new RightTriangle(sides[0]*factor, sides[1]*factor, hypotenuse*factor);
          }

          /**
           * @return a regular triangle made from this triangle.
           * A regular right triangle is one in which
           * both legs have the same length.
           */
          public RightTriangle regularize() {
              double bigLeg = Math.max(side[0], side[1]);
              return new RightTriangle(bigLeg, bigLeg, hypotenuse);
          }

      }

Which of the following statements are true?

The line marked /*A*/ is a problem for rep exposure because arrays are mutable.

(missing explanation)

The line marked /*B*/ is a problem for representation independence because it makes clients dependent on the rep.

(missing explanation)

The line marked *C* is a problem because creator operations should not have preconditions.

(missing explanation)

The two lines marked /*D*/ are a problem because they put legA, legB, and hypotenuse into the rep without doing a defensive copy first.

(missing explanation)

The line marked /*E*/ is a problem because it threatens the class’s immutability.

(missing explanation)

Sign here

Suppose you want to build a version of Java where users can choose to only use classes that have been cryptographically signed by identities they trust. You want each class to manage its own list of signatures, so you provide the operation:

public Identity[] getSigners()

and implement it by simply returning a reference to your private Identity[] field of identities whose signatures you’ve all checked and validated.

Which bugs do you have?

the client can append more identities to the array

the client can replace identities in the array with different ones

the client can reassign that field to a different array

the client can break a security-related invariant

(missing explanation)

Rep invariant and abstraction function

We now take a deeper look at the theory underlying abstract data types. This theory is not only elegant and interesting in its own right; it also has immediate practical application to the design and implementation of abstract types. If you understand the theory deeply, you’ll be able to build better abstract types, and will be less likely to fall into subtle traps.

In thinking about an abstract type, it helps to consider the relationship between two spaces of values.

The space of abstract values consists of the values that the type is designed to support, from the client’s point of view. For example, an abstract type for unbounded integers, like Java’s BigInteger, would have the mathematical integers as its abstract value space.

The space of representation values (or rep values for short) consists of the Java objects that actually implement the abstract values. For example, a BigInteger value might be implemented using an array of digits, represented as primitive int values. The rep space would then be the set of all such arrays.

In simple cases, an abstract type will be implemented as a single Java object, but more commonly a small network of objects is needed. For example, a rep value for List might be a linked list, a group of objects linked together by next and previous pointers. So a rep value is not necessarily a single object, but often something rather complicated.

Now of course the implementer of the abstract type must be interested in the representation values, since it is the implementer’s job to achieve the illusion of the abstract value space using the rep value space.

Suppose, for example, that we choose to use a string to represent a set of characters:

public class CharSet {
    private String s;
    ...
}

the abstract space and rep space of CharSet

Then the rep space R contains Strings, and the abstract space A is mathematical sets of characters. We can show the two value spaces graphically, with an arrow from a rep value to the abstract value it represents. There are several things to note about this picture:

Every abstract value is mapped to by some rep value. The purpose of implementing the abstract type is to support operations on abstract values. Presumably, then, we will need to be able to create and manipulate all possible abstract values, and they must therefore be representable.
Some abstract values are mapped to by more than one rep value. This happens because the representation isn’t a tight encoding. There’s more than one way to represent an unordered set of characters as a string.
Not all rep values are mapped. In this case, the string “abbc” is not mapped, because we have decided that the rep string should not contain duplicates. This will allow us to terminate the remove method when we hit the first instance of a particular character, since we know there can be at most one.

In practice, we can only illustrate a few elements of the two spaces and their relationships; the graph as a whole is infinite. So we describe it by giving two things:

1. An abstraction function that maps rep values to the abstract values they represent:

AF : R → A

The arrows in the diagram show the abstraction function. In the terminology of functions, the properties we discussed above can be expressed by saying that the function is surjective (also called onto), not necessarily injective (one-to-one) and therefore not necessarily bijective, and often partial.

2. A rep invariant that maps rep values to booleans:

RI : R → boolean

For a rep value r, RI(r) is true if and only if r is mapped by AF. In other words, RI tells us whether a given rep value is well-formed. Alternatively, you can think of RI as a set: it’s the subset of rep values on which AF is defined.

the abstract space and rep space of CharSet using the NoRepeatsRep

For example, the diagram at the right showing a rep for CharSet that forbids repeated characters, RI(“a”) = true, RI(“ac”) = true, and RI(“acb”) = true, but RI(“aa”) = false and RI(“abbc”) = false. Rep values that obey the rep invariant are shown in the green part of the R space, and must map to an abstract value in the A space. Rep values that violate the rep invariant are shown in the red zone, and have no equivalent abstract value in the A space.

Both the rep invariant and the abstraction function should be documented in the code, right next to the declaration of the rep itself:

public class CharSet {
    private String s;
    // Rep invariant:
    //   s contains no repeated characters
    // Abstraction function:
    //   AF(s) = {s[i] | 0 <= i < s.length()}
    ...
}

A common confusion about abstraction functions and rep invariants is that they are determined by the choice of rep and abstract value spaces, or even by the abstract value space alone. If this were the case, they would be of little use, since they would be saying something redundant that’s already available elsewhere.

The abstract value space alone doesn’t determine AF or RI: there can be several representations for the same abstract type. A set of characters could equally be represented as a string, as above, or as a bit vector, with one bit for each possible character. Clearly we need two different abstraction functions to map these two different rep value spaces.

It’s less obvious why the choice of both spaces doesn’t determine AF and RI. The key point is that defining a type for the rep, and thus choosing the values for the space of rep values, does not determine which of the rep values will be deemed to be legal, and of those that are legal, how they will be interpreted. Rather than deciding, as we did above, that the strings have no duplicates, we could instead allow duplicates, but at the same time require that the characters be sorted, appearing in nondecreasing order. This would allow us to perform a binary search on the string and thus check membership in logarithmic rather than linear time. Same rep value space – different rep invariant:

the abstract space and rep space of CharSet using the SortedRep

public class CharSet {
    private String s;
    // Rep invariant:
    //   s[0] <= s[1] <= ... <= s[s.length()-1]
    // Abstraction function:
    //   AF(s) = {s[i] | 0 <= i < s.length()}
    ...
}

Even with the same type for the rep value space and the same rep invariant, we might still interpret the rep differently by using different abstraction functions. Suppose the rep invariant admits any string of characters. Then we could define the abstraction function, as above, to interpret the array’s elements as the elements of the set. But there’s no a priori reason to let the rep decide the interpretation. Perhaps we’ll interpret consecutive pairs of characters as subranges, so that the string rep "acgg" is interpreted as two range pairs, [a-c] and [g-g], and therefore represents the set {a,b,c,g}. Here’s what the AF and RI would look like for that representation:

the abstract space and rep space of CharSet using the SortedRangeRep

public class CharSet {
    private String s;
    // Rep invariant:
    //   s.length() is even
    //   s[0] <= s[1] <= ... <= s[s.length()-1]
    // Abstraction function:
    //   AF(s) = union of { c | s[2i] <= c <= s[2i+1] } 
    //           for all 0 <= i < s.length()/2
    ...
}

The essential point is that implementing an abstract type means not only choosing the two spaces – the abstract value space for the specification and the rep value space for the implementation – but also deciding which rep values are legal, and how to interpret them as abstract values.

It’s critically important to write down these assumptions in your code, as we’ve done above, so that future programmers (and your future self) are aware of what the representation actually means. Why? What happens if different implementers disagree about the meaning of the rep?

You can look at example code for three different CharSet implementations.

reading exercises

Exploring a rep

Consider the last rep for CharSet proposed above:

public class CharSet {
    private String s;
    // Rep invariant:
    //   s.length() is even
    //   s[0] <= s[1] <= ... <= s[s.length()-1]
    // Abstraction function:
    //   AF(s) = union of { c | s[2i] <= c <= s[2i+1] } 
    //           for all 0 <= i < s.length()/2
    ...
}

Which of the following values of s satisfy this rep invariant?

(missing explanation)

Which of the following does AF("acfg") map to?

{a,b,c,d,e,f,g}

{a,b,c,f,g}

{a,c,f,g}

some other abstract value

no abstract value, because "acfg" does not satisfy the rep invariant

(missing explanation)

Which of these values does the abstraction function map to the same abstract value as it maps "tv"?

"ttv"

"ttuuvv"

"ttuv"

"tuv"

(missing explanation)

Rep invariant pieces

Suppose C is an abstract data type whose representation has two String fields:

class C {
    private String s;
    private String t;
    ...
}

Assuming you don’t know anything about C’s abstraction, which of the following might be statements in a rep invariant for C?

s contains only letters

s.length() == t.length()

s represents a set of characters

C’s observers

s is the reverse of t

s+t

(missing explanation)

Trying to implement without an AF/RI

Suppose Louis Reasoner has created CharSet with the following rep:

public class CharSet {
    private String s;
    ...
}

But Louis unfortunately neglects to write down the abstraction function (AF) and rep invariant (RI). Here are four possible AF/RI pairs that might have been what Louis had in mind. All of them were also mentioned in the reading above.

SortedRep:

// AF(s) = {s[i] | 0 <= i < s.length()}
// RI: s[0] < s[1] < ... < s[s.length()-1]

SortedRangeRep:

// AF(s) = union of { c | s[2i] <= c <= s[2i+1] } 
//         for all 0 <= i < s.length()/2
// RI: s.length() is even, and s[0] <= s[1] <= ... <= s[s.length()-1]

NoRepeatsRep:

// AF(s) = {s[i] | 0 <= i < s.length()}
// RI: s contains no character more than once

AnyRep:

// AF(s) = {s[i] | 0 <= i < s.length()}
// RI: true

Louis has three teammates helping him implement CharSet, each working on a different operation: add(), remove(), and contains(). Each teammate might have their own idea about what the AF/RI should be.

Which of the AF/RI pairs are consistent with this implementation of add()?

/**
 * Modifies this set by adding c to the set.
 * @param c character to add
 */
public void add(char c) {
    s = s + c;
}

SortedRep

SortedRangeRep

NoRepeatsRep

AnyRep

(missing explanation)

Trying to implement without an AF/RI #2

Which of the AF/RI pairs are consistent with this implementation of remove()?

/**
 * Modifies this set by removing c, if found.
 * If c is not found in the set, has no effect.
 * @param c character to remove
 */
public void remove(char c) {
    int position = s.indexOf(c);
    if (position >= 0) {
        s = s.substring(0, position) + s.substring(position+1, s.length());
    }
}

SortedRep

SortedRangeRep

NoRepeatsRep

AnyRep

(missing explanation)

Trying to implement without an AF/RI #3

Which of the AF/RI pairs are consistent with this implementation of contains()?

/**
 * Test for membership.
 * @param c a character
 * @return true iff this set contains c
 */
public boolean contains(char c) {
    for (int i = 0; i < s.length(); i += 2) {
        char low = s.charAt(i);
        char high = s.charAt(i+1);
        if (low <= c && c <= high) {
            return true;
        }
    }
    return false;
}

SortedRep

SortedRangeRep

NoRepeatsRep

AnyRep

(missing explanation)

the abstraction function and rep invariant of RatNum

Example: rational numbers

Here’s an example of an abstract data type for rational numbers. Look closely at its rep invariant and abstraction function. It would be completely reasonable to design another implementation of this same ADT with a more permissive RI. With such a change, some operations might become more expensive to perform, and others cheaper.

public class RatNum {

    private final int numerator;
    private final int denominator;

    // Rep invariant:
    //   denominator > 0
    //   numerator/denominator is in reduced form,
    //       i.e. gcd(|numerator|,denominator) = 1

    // Abstraction function:
    //   AF(numerator, denominator) = numerator/denominator

    /** 
     * Make a new RatNum == n.
     * @param n value
     */
    public RatNum(int n) {
        numerator = n;
        denominator = 1;
        checkRep();
    }

    /**
     * Make a new RatNum == (n / d).
     * @param n numerator
     * @param d denominator
     * @throws ArithmeticException if d == 0
     */
    public RatNum(int n, int d) {
        // reduce ratio to lowest terms
        int g = gcd(n, d);
        n = n / g;
        d = d / g;

        // make denominator positive
        if (d < 0) {
            numerator = -n;
            denominator = -d;
        } else {
            numerator = n;
            denominator = d;
        }
        checkRep();
    }
}

reading exercises

Checking the rep invariant

The rep invariant isn’t just a neat mathematical idea. If your implementation asserts the rep invariant at run time, then you can catch bugs early. Here’s a method for RatNum that tests its rep invariant:

// Check that the rep invariant is true
// *** Warning: this does nothing unless you turn on assertion checking
//     by running Java with -enableassertions (or -ea)
private void checkRep() {
    assert denominator > 0;
    assert gcd(Math.abs(numerator), denominator) == 1;
}

You should certainly call checkRep() to assert the rep invariant at the end of every operation that creates or mutates the rep – in other words, creators, producers, and mutators. Look back at the RatNum code above, and you’ll see that it calls checkRep() at the end of both constructors.

Observer methods don’t normally need to call checkRep(), but it’s good defensive practice to do so anyway. Why? Calling checkRep() in every method, including observers, means you’ll be more likely to catch rep invariant violations caused by rep exposure.

Why is checkRep private? Who should be responsible for checking and enforcing a rep invariant – clients, or the implementation itself?

reading exercises

Checking the rep invariant

Which of the following are true?

checkRep() is the abstraction function

checkRep() asserts the rep invariant

it’s good for an implementer to call checkRep() just before returning from a public method of an ADT class

it’s good for a client to call checkRep() just after calling a public method of an ADT class

(missing explanation)

No null values in the rep

Recall from the Specifications reading that null values are troublesome and unsafe, so much so that we try to remove them from our programming entirely. Preconditions and postconditions implicitly require that objects and arrays be non-null.

We extend that prohibition to the reps of abstract data types. The rep invariant implicitly includes x != null for every reference x in the rep that has object type (including references inside arrays or lists). So if your rep is:

class CharSet {
    String s;
}

then its rep invariant automatically includes s != null, and you don’t need to state it in a rep invariant comment. But checkRep() needs to include those null checks, because they are part of the rep invariant, and you want to catch null bugs as early as possible.

If some field in the rep is allowed to be null, then the rep invariant should say so explicitly. But think twice before designing a rep like this. Optional<T> may be a better way to do what you want.

reading exercises

Null check?

Consider this rep for some ADT named Thing:

class Thing {
    int i;
    String s;
    ...
}

The rep is accompanied by one of the rep invariants below. For each proposed rep invariant, translate it into a minimum set of assertions that you would need to place into checkRep() to check that rep invariant.

// Rep invariant:
//    i is positive

private void checkRep() {

assert i > 0;

assert i == s.length();

assert s != null;

assert i != null;

none of the above (checkRep() can be empty or omitted entirely)

(missing explanation)

// Rep invariant:
//    the length of s is even

private void checkRep() {

assert s.length() % 2 == 0;

assert !s.equals("");

assert s != null;

assert i != null;

none of the above (checkRep() can be empty or omitted entirely)

(missing explanation)

// Rep invariant:
//    true

private void checkRep() {

assert s == true;

assert i == true;

assert s != null;

assert i != null;

none of the above (checkRep() can be empty or omitted entirely)

(missing explanation)

Beneficent mutation

Recall that a type is immutable if and only if a value of the type never changes after being created. With our new understanding of the abstract space A and rep space R, we can refine this definition: the abstract value should never change. But the implementation is free to mutate a rep value as long as it continues to map to the same abstract value, so that the change is invisible to the client. This kind of change is called beneficent mutation.

Here’s a simple example using an alternative rep for the RatNum type we saw earlier. This rep has a weaker rep invariant that doesn’t require the numerator and denominator to be stored in lowest terms:

public class RatNum {

    private int numerator;
    private int denominator;

    // Rep invariant:
    //   denominator != 0

    // Abstraction function:
    //   AF(numerator, denominator) = numerator/denominator

    /**
     * Make a new RatNum == (n / d).
     * @param n numerator
     * @param d denominator
     * @throws ArithmeticException if d == 0
     */
    public RatNum(int n, int d) {
        if (d == 0) throw new ArithmeticException();
        numerator = n;
        denominator = d;
        checkRep();
    }

    ...
}

This weaker rep invariant allows a sequence of RatNum arithmetic operations to simply omit reducing the result to lowest terms. But when it’s time to display a result to a human, we first simplify it:

    /**
     * @return a string representation of this rational number
     */
    @Override
    public String toString() {
        int g = gcd(numerator, denominator);
        numerator /= g;
        denominator /= g;
        if (denominator < 0) {
            numerator = -numerator;
            denominator = -denominator;
        }
        checkRep();
        return (denominator > 1) ? (numerator + "/" + denominator) 
                                 : (numerator + "");
    }

Notice that this toString implementation reassigns the private fields numerator and denominator, mutating the representation – even though it is an observer method on an immutable type! But, crucially, the mutation doesn’t change the abstract value. Dividing both numerator and denominator by the same common factor, or multiplying both by -1, has no effect on the result of the abstraction function, AF(numerator, denominator) = numerator/denominator. Another way of thinking about it is that the AF is a many-to-one function, and the rep value has changed to another that still maps to the same abstract value. So the mutation is harmless, or beneficent.

We’ll see other examples of beneficent mutation in future readings. This kind of implementer freedom often permits performance improvements like caching, data structure rebalancing, and lazy cleanup.

Documenting the AF, RI, and safety from rep exposure

It’s good practice to document the abstraction function and rep invariant in the class, using comments right where the private fields of the rep are declared. We’ve been doing that above.

When you describe the rep invariant and abstraction function, you must be precise.

It is not enough for the RI to be a generic statement like “all fields are valid.” The job of the rep invariant is to explain precisely what makes the field values valid or not.

It is not enough for the AF to offer a generic explanation like “represents a set of characters.” The job of the abstraction function is to define precisely how the concrete field values are interpreted.

As a function, if we take the documented AF and substitute in actual (legal) field values, we should obtain out a complete description of the single abstract value they represent. For example, for the CharSet abstraction function AF(s) = {s[i] | 0 <= i < s.length()}, we can substitute a specific (legal) rep value like s="abbc", and get AF("abbc") = {"abbc"[i] | 0 <= i < "abbc".length } = { 'a', 'b', 'c' }.

But if we instead had a vague and ill-defined abstraction function like AF(s) = a set of characters, then substituting for s would not affect the righthand side at all. This definition does not say precisely which abstract set of characters corresponds to the rep value s="abbc".

Another piece of documentation that 6.031 asks you to write is a rep exposure safety argument. This is a comment that examines each part of the rep, looks at the code that handles that part of the rep (particularly with respect to parameters and return values from clients, because that is where rep exposure occurs), and presents a reason why the code doesn’t expose the rep.

Here’s an example of Tweet with its rep invariant, abstraction function, and safety from rep exposure fully documented:

// Immutable type representing a tweet.
public class Tweet {

    private final String author;
    private final String text;
    private final Date timestamp;

    // Rep invariant:
    //   author is a Twitter username (a nonempty string of letters, digits, underscores)
    //   text.length <= 280
    // Abstraction function:
    //   AF(author, text, timestamp) = a tweet posted by author, with content text, 
    //                                 at time timestamp 
    // Safety from rep exposure:
    //   All fields are private;
    //   author and text are Strings, so are guaranteed immutable;
    //   timestamp is a mutable Date, so Tweet() constructor and getTimestamp() 
    //        make defensive copies to avoid sharing the rep's Date object with clients.

    // Operations (specs and method bodies omitted to save space)
    public Tweet(String author, String text, Date timestamp) { ... }
    public String getAuthor() { ... }
    public String getText() { ... }
    public Date getTimestamp() { ... }
}

Notice that we don’t have any explicit rep invariant conditions on timestamp (aside from the conventional assumption that timestamp!=null, which we have for all object references). But we still need to include timestamp in the rep exposure safety argument, because the immutability property of the whole type depends on all the fields remaining unchanged.

Compare the argument above with an example of a broken argument involving mutable Date objects.

Here are the arguments for RatNum.

// Immutable type representing a rational number.
public class RatNum {
    private final int numerator;
    private final int denominator;

    // Rep invariant:
    //   denominator > 0
    //   numerator/denominator is in reduced form, i.e. gcd(|numerator|,denominator) = 1
    // Abstraction function:
    //   AF(numerator, denominator) = numerator/denominator
    // Safety from rep exposure:
    //   All fields are private, and all types in the rep are immutable.

    // Operations (specs and method bodies omitted to save space)
    public RatNum(int n) { ... }
    public RatNum(int n, int d) { ... }
    ...
}

Notice that an immutable rep is particularly easy to argue for safety from rep exposure.

You can look at the full code for RatNum.

reading exercises

Arguing against rep exposure

Consider the following ADT:

// Mutable type representing Twitter users' followers.
public class FollowGraph {
    private final Map<String,Set<String>> followersOf;

    // Rep invariant:
    //    all Strings in followersOf are Twitter usernames
    //           (i.e., nonempty strings of letters, digits, underscores)
    //    no user follows themselves, i.e. x is not in followersOf.get(x)
    // Abstraction function:
    //    AF(followersOf) = the follower graph where Twitter user x is followed by user y
    //                      if and only if followersOf.get(x).contains(y)
    // Safety from rep exposure:
    //    All fields are private, and ..???..

    // Operations (specs and method bodies omitted to save space)
    public FollowGraph() { ... }
    public void addFollower(String user, String follower) { ... }
    public void removeFollower(String user, String follower) { ... }
    public Set<String> getFollowers(String user) { ... }
}

For each statement below: assuming the omitted method bodies are consistent with the statement, can we use the statement in place of ..???.. to make a complete and persuasive safety-from-rep-exposure comment?

1. “Strings are immutable.”

Yes

(missing explanation)

2. “followersOf is a mutable Map containing mutable Set objects, but getFollowers() makes a defensive copy of the Set it returns, and all other parameters and return values are immutable String or void.”

Yes

(missing explanation)

3. “This class is mutable, so rep exposure isn’t an issue.”

Yes

(missing explanation)

4. “followersOf is a mutable Map, but it is never passed or returned from an operation.”

Yes

(missing explanation)

5. “FollowGraph() does not expose the rep; addFollower() does not expose the rep; removeFollower() does not expose the rep; getFollowers() does not expose the rep.”

Yes

(missing explanation)

6. “String is immutable, and the Set objects in the rep are made immutable by unmodifiable wrappers. The Map type is mutable, but that type is never passed or returned from an operation.”

Yes

(missing explanation)

What an ADT specification may talk about

Since we’ve just discussed where to document the rep invariant and abstraction function, now is a good moment to update our notion of what a specification may talk about.

The specification of an abstract type T – which consists of the specs of its operations – should only talk about things that are visible to the client. This includes parameters, return values, and exceptions thrown by its operations. Whenever the spec needs to refer to a value of type T, it should describe the value as an abstract value, i.e. mathematical values in the abstract space A.

The spec should not talk about details of the representation, or elements of the rep space R. It should consider the rep itself (the private fields) invisible to the client, just as method bodies and their local variables are considered invisible. That’s why we write the rep invariant and abstraction function as ordinary comments within the body of the class, rather than Javadoc comments above the class. Writing them as Javadoc comments would commit to them as public parts of the type’s specification, which would interfere with rep independence and information hiding.

ADT invariants replace preconditions

Now let’s bring a lot of pieces together. An enormous advantage of a well-designed abstract data type is that it encapsulates and enforces properties that we would otherwise have to stipulate in a precondition. For example, instead of a spec like this, with an elaborate precondition:

/** 
 * @param set1 is a sorted set of characters with no repeats
 * @param set2 is likewise
 * @return characters that appear in one set but not the other,
 *  in sorted order with no repeats 
 */
static String exclusiveOr(String set1, String set2);

We can instead use an ADT that captures the desired property:

/**
 * @return characters that appear in one set but not the other
 */
static SortedSet<Character> exclusiveOr(SortedSet<Character> set1, SortedSet<Character> set2);

This hits all our targets:

it’s safer from bugs, because the required condition (sorted with no repeats) can be enforced in exactly one place, the SortedSet type, and because Java static checking comes into play, preventing values that don’t satisfy this condition from being used at all, with an error at compile-time.
it’s easier to understand, because it’s much simpler, and the name SortedSet conveys what the programmer needs to know.
it’s more ready for change, because the representation of SortedSet can now be changed without changing exclusiveOr or any of its clients.

Many of the places where we used preconditions on the early problem sets in this course would have benefited from a custom ADT instead.

reading exercises

Encapsulating preconditions in ADTs

Consider this method:

/**
 * Find tweets written by a particular user.
 * 
 * @param tweets a list of tweets with distinct timestamps, not modified by this method.
 * @param username Twitter username (a nonempty sequence of letters, digits, and underscores)
 * @return all and only the tweets in the list whose author is username,
 *         in the same order as in the input list.
 */
public static List<Tweet> writtenBy(List<Tweet> tweets, String username) { ... }

Which ADTs would you create to eliminate the preconditions of this method?

TweetsAndUsername

TweetList

Username

UsernameCharacter

(missing explanation)

How to establish invariants

An invariant is a property that is true for the entire program – which in the case of an invariant about an object, reduces to the entire lifetime of the object.

To make an invariant hold, we need to:

make the invariant true in the initial state of the object; and
ensure that all changes to the object keep the invariant true.

Translating this in terms of the types of ADT operations, this means:

creators and producers must establish the invariant for new object instances; and
mutators, observers, and producers must preserve the invariant for existing object instances.

The risk of rep exposure makes the situation more complicated. If the rep is exposed, then the object might be changed anywhere in the program, not just in the ADT’s operations, and we can’t guarantee that the invariant still holds after those arbitrary changes.

So the full rule for proving invariants is as follows:

If an invariant of an abstract data type is

established by creators and producers;

preserved by mutators, observers, and producers; and

no representation exposure occurs,

then the invariant is true of all instances of the abstract data type.

This rule applies structural induction over the abstract data type.

reading exercises

Structural induction

Recall this data type from the first exercise in this reading, now with a different rep (and new bugs):

/** Represents an immutable right triangle. */
class RightTriangle {
    private double[] sides;

    // Rep invariant:
    //   sides.length = 3
    //   sides[i] > 0 for all i
    //   sides[0]^2 + sides[1]^2 = sides[2]^2 (within the error tolerance of double arithmetic)

    // Abstraction function:
    //   AF(sides) = the right triangle with legs sides[0] and sides[1], and hypotenuse sides[2]


    /**
     * Make a right triangle.
     * @param legA, legB  the two legs of the triangle, must be positive
     * @param hypotenuse  the hypotenuse of the triangle, also positive,
     *           requires hypotenuse^2 = legA^2 + legB^2
     *           (within the error tolerance of double arithmetic)
     */
    public RightTriangle(double legA, double legB, double hypotenuse) {
        this.sides = new double[] { legA, legB, hypotenuse };
    }

    /** 
     * Get all the sides of the triangle.
     * @return three-element array with the triangle's side lengths
     */
    public double[] getAllSides() {
        return sides;
    }

    /**
     * @return length of the triangle's hypotenuse
     */
    public double getHypotenuse() {
        return sides[2];
    }

    /**
     * @param factor to multiply the sides by, must be >0
     * @return a triangle made from this triangle by 
     * multiplying all side lengths by factor.
     */
    public RightTriangle scale(double factor) {
        return new RightTriangle(sides[0]*factor, sides[1]*factor, sides[2]*factor);
    }

    /**
     * @return a regular triangle made from this triangle.
     * A regular right triangle is one in which
     * both legs have the same length.
     */
    public RightTriangle regularize() {
        double bigLeg = Math.max(sides[0], sides[1]);
        return new RightTriangle(bigLeg, bigLeg, sides[2]);
    }

}

This datatype has an important invariant: the relationship between the legs and hypotenuse, as stated in the Pythagorean theorem.

Assuming the client obeys the contracts when using RightTriangle, which of the following statements are true about this invariant?

The creator RightTriangle() establishes the invariant if the client obeys all contracts.

(missing explanation)

The observer getAllSides() preserves the invariant if the client obeys all contracts.

(missing explanation)

The observer getHypotenuse() preserves the invariant if the client obeys all contracts.

(missing explanation)

The producer scale() preserves the invariant if the client obeys all contracts.

(missing explanation)

The producer regularize() preserves the invariant if the client obeys all contracts.

(missing explanation)

Summary

An invariant is a property that is always true of an ADT object instance, for the lifetime of the object.
A good ADT preserves its own invariants. Invariants must be established by creators and producers, and preserved by observers and mutators.
The rep invariant specifies legal values of the representation, and should be checked at runtime with checkRep().
The abstraction function maps a concrete representation to the abstract value it represents.
Representation exposure threatens both representation independence and invariant preservation.
An invariant should be documented, by comments or even better by assertions like checkRep(), otherwise the invariant is not safe from bugs.

The topics of today’s reading connect to our three properties of good software as follows:

Safe from bugs. A good ADT preserves its own invariants, so that those invariants are less vulnerable to bugs in the ADT’s clients, and violations of the invariants can be more easily isolated within the implementation of the ADT itself. Stating the rep invariant explicitly, and checking it at runtime with checkRep(), catches misunderstandings and bugs earlier, rather than continuing on with a corrupt data structure.
Easy to understand. Rep invariants and abstraction functions explicate the meaning of a data type’s representation, and how it relates to its abstraction.
Ready for change. Abstract data types separate the abstraction from the concrete representation, which makes it possible to change the representation without having to change client code.

More practice

If you would like to get more practice with the concepts covered in this reading, you can visit the question bank. The questions in this bank were written in previous semesters by students and staff, and are provided for review purposes only – doing them will not affect your classwork grades.