6.031
6.031 — Software Construction
Spring 2021

Reading 15: Equality

Java Tutor exercises

Keep making progress on Java by completing this category in the Java Tutor:

Software in 6.031

Safe from bugsEasy to understandReady for change
Correct today and correct in the unknown future. Communicating clearly with future programmers, including future you. Designed to accommodate change without rewriting.

Objectives

  • Understand the properties of an equivalence relation.
  • Understand equality for immutable types defined in terms of the abstraction function and in terms of observation.
  • Differentiate between reference equality and object equality.
  • Differentiate between strict observational and behavioral equality for mutable types.
  • Understand the Object contract and be able to implement equality correctly for mutable and immutable types.

Introduction

In the previous readings we’ve developed a rigorous notion of data abstraction by creating types that are characterized by their operations, not by their representation. For an abstract data type, the abstraction function explains how to interpret a concrete representation value as a value of the abstract type, and we saw how the choice of abstraction function determines how to write the code implementing each of the ADT’s operations.

In this reading we turn to how we define the notion of equality of values in a data type: the abstraction function will give us a way to cleanly define the equality operation on an ADT.

In the physical world, every object is distinct – at some level, even two identical snowflakes are different, even if the distinction is just the position they occupy in space. (This isn’t strictly true of all subatomic particles, but true enough of large objects like snowflakes and baseballs and people.) So two physical objects are never truly “equal” to each other; they only have degrees of similarity.

In the world of human language, however, and in the world of mathematical concepts, you can have multiple names for the same thing. So it’s natural to ask when two expressions represent the same thing: 1+2, √9, and 3 are alternative expressions for the same ideal mathematical value.

The first part of this reading focuses on defining equality for immutable types, so that we can understand the concepts without the complication of mutation. Then we’ll expand to consider mutable types as well.

Equivalence relation

Let’s start with a look at the mathematical properties that need to be satisfied by a sensible notion of equality.

An equality operation defined on a type T can be regarded as a binary relation E ⊆ T x T, where a pair of values (x,y) ∈ E if and only if x and y are considered equal according to E.

Furthermore, for an equality operation, E must be an equivalence relation, meaning that it is:

reflexive:
(t,t) ∈ E  ∀ t ∈ T
symmetric:
(t,u) ∈ E  ⇒  (u,t) ∈ E
transitive:
(t,u) ∈ E ∧ (u,v) ∈ E  ⇒  (t,v) ∈ E

For ==, the equivalence E is the set of pairs (x,y) for which x == y. So for ==, these properties can also be written as:

reflexive:
t == t  ∀ t ∈ T
symmetric:
t == u  ⇒  u == t
transitive:
t == u ∧ u == v  ⇒  t == v

And similarly for the equals() method.

An equality operation should always be an equivalence relation, or surprising behavior and bugs will result.

reading exercises

No not None

To see how violating the equivalence relations can produce surprising results, let’s imagine we’re the designers of Python trying to decide how == should behave in various situations. To illustrate the consequences of our decisions, we’ll use these set operations, implemented in a straightforward way, which of course depend on the semantics of ==:

def is_member(x, set):
    '''
    Returns True iff x is a member of set.
    Requires that set contain only unique elements.
    '''
    for y in set:
        if x == y:
            return True
    return False

def insert(x, set):
    '''
    Insert an element into a set.
    Modifies the set so that set = set U { x }.
    Requires and guarantees that set contain only unique elements.
    '''
    if not is_member(x, set):
        set.append(x)

Suppose we decide that Python None (like Java null) is so bad that it should never compare equal to anything. So None == x is false for all possible values of x.

This decision violates:

(missing explanation)

With this rule, what is the outcome of this code that inserts the same value x twice into an empty set?

set = []
insert(x, set)
insert(x, set)
print(len(set))
print(is_member(set[0], set))

When x is 5, this code prints:

(missing explanation)

When x is None, this code prints:

(missing explanation)

Convertibles are overrated

Now suppose we decide that == should automatically convert different types before comparing them. If the righthand side of == has a different type than the lefthand side, then the righthand side converts itself to the type of the lefthand side before testing for equality.

So if s is a string and n is an integer, then s == n would be the same as s == str(n), and n == s would be the same as n == int(s). If the conversion fails, the equality test returns false. Thus, for example:

  • "5" == 5 would be the same as "5" == str(5), which is true.
  • 5 == "5" would be the same as 5 == int("5"), which is true.
  • "abc" == 5 would be the same as "abc" == str(5), which is false.
  • 5 == "abc" would be the same as 5 == int("abc"), which fails to convert, so the result is false.

But Ben Bitdiddle points out that the int() operation that converts a string to an integer ignores leading zeroes in a string like "001".

What is the result of these expressions?

7 == "007"

(missing explanation)

"007" == 7

(missing explanation)

These expressions show that this design decision violates:

(missing explanation)

Reading the implementations of insert() and is_member() above carefully, what is printed by this code?

set = []
insert("007", set)
insert(7, set)
print(len(set))

(missing explanation)

And what is printed by this code, whose only difference is the order in which the elements were inserted in the set?

set = []
insert(7, set)
insert("007", set)
print(len(set))

(missing explanation)

Equality of immutable types

Now that we have a mathematical foundation for all equality operators, let’s look at equality in the context of abstract data types, starting with immutable types.

Formally, we can define equality on immutable types in two ways.

Using the abstraction function. Recall that an abstraction function AF: R → A maps concrete instances of a data type to their corresponding abstract values. To use AF as a definition for equality, we would say that a equals b if and only if AF(a) = AF(b).

Using observation. We can say that two objects are equal when they cannot be distinguished by observation – every operation we can apply produces the same result for both objects. Consider the set expressions {1,2} and {2,1}. Using the observer operations available for sets, cardinality |…| and membership ∈, these expressions are indistinguishable:

  • |{1,2}| = 2 and |{2,1}| = 2
  • 1 ∈ {1,2} is true, and 1 ∈ {2,1} is true
  • 2 ∈ {1,2} is true, and 2 ∈ {2,1} is true
  • 3 ∈ {1,2} is false, and 3 ∈ {2,1} is false
  • … and so on

In terms of abstract data types, “observation” means calling operations on the objects. So two objects are equal if and only if they cannot be distinguished by calling any operations of the abstract data type.

Note that “operations of the abstract data type” means the operations that are part of the spec of the ADT. Java has language features that allow a client to break the abstraction barrier and observe differences between objects that are irrelevant to their abstract values. For example, Java’s == can detect that two objects representing identical sets are actually stored at different places in memory. System.identityHashCode() computes a hash code based on an object’s memory address, so it could observe the same distinction. But neither of these operations would be part of the spec of the set ADT, so the distinctions they make are not considered when deciding if two set objects are equal by observation.

The two definitions of equality – by abstraction function, and by observation – should be consistent with each other, or something is wrong.

Example: Duration

Here’s a simple example of an immutable ADT.

public class Duration {
    private final int mins;
    private final int secs;
    // Rep invariant:
    //    mins >= 0, secs >= 0
    // Abstraction function:
    //    AF(min, secs) = the span of time of mins minutes and secs seconds

    /** Make a duration lasting for m minutes and s seconds. */
    public Duration(int m, int s) {
        mins = m; secs = s;
    }
    /** @return length of this duration in seconds */
    public long getLength() {
        return (long)mins*60 + secs;
    }
}

Now which of the following values should be considered equal?

Duration d1 = new Duration (1, 2);
Duration d2 = new Duration (1, 3);
Duration d3 = new Duration (0, 62);
Duration d4 = new Duration (1, 2);

Think in terms of both the abstraction-function definition of equality, and the observational equality definition.

reading exercises

Any second now

Consider the code for Duration and the objects d1, d2, d3, d4 just created above.

Using the abstraction-function notion of equality, which of the following would be considered equal to d1?

(missing explanation)

Eye on the clock

Using the observational notion of equality, which of the following would be considered equal to d1?

(missing explanation)

Letters

Consider the following rep for an abstract data type:

/** Immutable type representing a set of letters, ignoring case */
class LetterSet {
    private String s;

    // Abstraction function:
    //    AF(s) = the set of the letters that are found in s
    //            (ignoring non-letters and alphabetic case)
    // Rep invariant:
    //    true

    /**
     * Make a LetterSet consisting of the letters found in chars 
     * (ignoring alphabetic case and non-letters).
     */
    public LetterSet(String chars) {
        s = chars;
    }

    ... // observer and producer operations
}

Using the abstraction-function definition of equality for LetterSet, which of the following should be considered equal to new LetterSet("abc")?

(missing explanation)

Consider the following observer and producer operations that might be included in the LetterSet type.

/**
 * @return the size of this set
 */
public int size() { ... }

/**
 * @param letter must be a letter 'a'...'z' or 'A'...'Z'
 * @return true iff this set contains letter, ignoring alphabetic case
 */
public boolean contains(char letter) { ... }

/**
 * @return the length of the string that this set was constructed from
 */
public int length() { ... }

/**
 * @return the union of this and that
 */
public LetterSet union(LetterSet that) { ... }

/**
 * @return true if and only if all the letters in this set are lowercase
 */
public boolean isAllLowercase() { ... }

/**
 * @return the first letter in s */
public char first() { ... }

Which choices of observers and producers would yield an observational definition of equality that is consistent with the abstraction-function definition of equality?

(missing explanation)

Which operations are definitely inconsistent with the abstraction function?

(missing explanation)

Lines

Consider this abstract data type, which has a rep and a complete set of operations, but is missing its AF/RI:

/** Immutable type representing lines in the plane that are neither horizontal nor vertical. */
public class MyLine {
    private double a;
    private double b;

    // Abstraction function:
    //    TODO
    // Rep invariant:
    //    TODO

    /**
     *  Make a MyLine that passes through all the points (p[2i], p[2i+1]).
     *  p must contain at least 2 points and all points in p must be collinear
     *  on a line that is neither horizontal nor vertical.
     *  For example, MyLine({ 0,0,  1,2 }) passes through both (0,0) and (1,2).
     */
    public MyLine(int[] p) { ... }

    /**
     * Get the slope of the line.
     */
    public double slope() { ... }
}

Using the observational definition of equality, which of the following should be considered equal to new MyLine(new int[] { 0,0, 1,1 })?

(missing explanation)

Which of these abstraction functions would yield an abstraction-function definition of equality that consistent with the observational definition of equality? (Assume that the rep invariant is defined appropriately in each case.)

(missing explanation)

== vs. equals()

Like many languages, Java has two different operations for testing equality between objects, with different semantics.

  • The == operator compares references. More precisely, it tests reference equality. Two references are == if they point to the same storage in memory. In terms of the snapshot diagrams we’ve been drawing, two references are == if their arrows point to the same object bubble.
  • The equals() operation compares object contents – in other words, object equality, in the sense that we’ve been talking about in this reading. The equals operation has to be defined appropriately for every abstract data type.

For comparison, here are the equality operators in several languages:

reference
equality

object
equality

Java

==

equals()

Objective C

==

isEqual:

C#

==

Equals()

Python

is

==

JavaScript

==

n/a

Note that == unfortunately flips its meaning between Java and Python. Don’t let that confuse you: == in Java just tests reference identity, it doesn’t compare object contents.

Note also that this table is only about equality for object types, like we use to define a new ADT. Built-in primitive types, like int in Java or number in JavaScript, may follow different rules. For int in Java, for example, you can’t use equals(), the == operator does object equality, and there is no notion of reference equality at all.

As programmers in any of these languages, we can’t change the meaning of the reference equality operator. In Java, == always means reference equality. But when we define a new data type, it’s our responsibility to decide what object equality means for values of the data type, and implement the equals() operation appropriately.

Implementing equals()

The equals() method is defined by Object, and its default implementation looks like this:

public class Object {
    ...
    public boolean equals(Object that) {
        return this == that;
    }
}

In other words, the default meaning of equals() is the same as reference equality. For immutable data types, this is almost always wrong. So you have to override the equals() method, replacing it with your own implementation.

Implementing equals() correctly can be subtle. As a prelude, you may want to refresh your understanding of dynamic dispatch.

The wrong way to implement equals()

Here’s our first try at overriding equals() for Duration:

public class Duration {
    ...   
    // Problematic definition of equals()
    public boolean equals(Duration that) {
        return this.getLength() == that.getLength();        
    }
}

There’s a subtle problem here. Why doesn’t this work? Let’s try this code:

Duration d1 = new Duration (1, 2);
Duration d2 = new Duration (1, 2);
Object o2 = d2;
d1.equals(d2) → true
d1.equals(o2) → false

It produces the snapshot diagram on the right, and you can see this code in action. You’ll see that even though d2 and o2 end up referring to the very same object in memory, you still get different results for them from equals().

What’s going on? It turns out that Duration has overloaded the equals() method, because the method signature was not identical to Object’s. We actually have two equals() methods in Duration: an implicit equals(Object) inherited from Object, and the new equals(Duration).

public class Duration extends Object {
    // explicit method that we declared:
    public boolean equals(Duration that) {
        return this.getLength() == that.getLength();
    }
    // implicit method inherited from Object:
    public boolean equals(Object that) {
        return this == that;
    }
}

We’ve seen overloading since the very beginning of the course in Static Checking. Importantly, the Java compiler selects between overloaded operations using the static type of the parameter expressions. For example, when you use the / operator, the compiler chooses either integer division or float division based on whether the arguments are ints or floats. The same compile-time selection happens here. If we pass an Object reference, as in d1.equals(o2), we end up calling the equals(Object) implementation. If we pass a Duration reference, as in d1.equals(d2), we end up calling the equals(Duration) version. This happens even though o2 and d2 both point to the same object at runtime! Equality has become inconsistent.

It’s easy to make a mistake in the method signature, and overload a method when you meant to override it. This is such a common error that Java has a language feature, the annotation @Override, which you should use whenever your intention is to override a method in your superclass. With this annotation, the Java compiler will check that a method with the same signature actually exists in the superclass, and give you a compiler error if you’ve made a mistake in the signature.

A better way to implement equals()

Here’s a better way to implement Duration’s equals() method:

@Override
public boolean equals(Object that) {
    return that instanceof Duration && this.sameValue((Duration)that);
}

// returns true iff this and that represent the same abstract value
private boolean sameValue(Duration that) {
    return this.getLength() == that.getLength();
}

The first method overrides and replaces the equals(Object) method inherited from Object. It tests the type of the that object passed to it to make sure it’s a Duration, and then calls a private helper method sameValue() to test equality. The expression (Duration)that is a typecast expression, which asserts to the Java compiler that you are confident that that points to a Duration object.

This fixes the problem:

Duration d1 = new Duration(1, 2);
Duration d2 = new Duration(1, 2);
Object o2 = d2;
d1.equals(d2) → true
d1.equals(o2) → true

You can see this code in action.

instanceof

The instanceof operator tests whether an object is an instance of a particular type. Using instanceof is dynamic type checking, not the static type checking we vastly prefer. In general, using instanceof in object-oriented programming is a bad smell. In good object-oriented programming, instanceof is disallowed anywhere except for implementing equals. This prohibition also includes other ways of inspecting objects’ runtime types. For example, getClass is also disallowed.

We’ll see examples of when you might be tempted to use instanceof, and how to write alternatives that are safer from bugs and more ready for change, in a future reading.

The Object contract

The specification of the Object class is so important that it is often referred to as the Object Contract. The contract can be found in the method specifications for the Object class. Here we will focus on the contract for equals. When you override the equals method, you must adhere to its general contract. It states that:

  • equals must define an equivalence relation – that is, a relation that is reflexive, symmetric, and transitive;
  • equals must be consistent: repeated calls to the method must yield the same result (as long as neither object has been mutated in a way that affects the equals comparison);
  • for a non-null reference x, x.equals(null) should return false;
  • hashCode must produce the same result for two objects that are deemed equal by the equals method.

Breaking the equivalence relation

Let’s start with the equivalence relation. We have to make sure that the definition of equality implemented by equals() is actually an equivalence relation as defined earlier: reflexive, symmetric, and transitive. If it isn’t, then operations that depend on equality (like set membership or list searching) will behave erratically and unpredictably. You don’t want to program with a data type in which sometimes a equals b, but b doesn’t equal a. Subtle and painful bugs will result.

Here’s an example of how an innocent attempt to make equality more flexible can go wrong. Suppose we wanted to allow for a tolerance in comparing Duration objects, because different computers may have slightly unsynchronized clocks:

@Override
public boolean equals(Object that) {
    return that instanceof Duration && this.sameValue((Duration)that);
}

private static final int CLOCK_SKEW = 5; // seconds

// returns true iff this and that represent the same abstract value within a clock-skew tolerance
private boolean sameValue(Duration that) {
    return Math.abs(this.getLength() - that.getLength()) <= CLOCK_SKEW;
}

Which property of the equivalence relation is violated?

reading exercises

Equals-ish

Consider the latest implementation of Duration in the reading, reprinted here for convenience:

public class Duration {
    private final int mins;
    private final int secs;
    // Rep invariant:
    //    mins >= 0, secs >= 0
    // Abstraction function:
    //    AF(min, secs) = the span of time of mins minutes and secs seconds

    /**
     * Make a duration lasting for m minutes and s seconds.
     */
    public Duration(int m, int s) {
        mins = m; secs = s;
    }
    /**
     * @return length of this duration in seconds
     */
    public long getLength() {
        return (long)mins*60 + secs;
    }

    @Override
    public boolean equals(Object that) {
        return that instanceof Duration && this.sameValue((Duration)that);
    }

    private static final int CLOCK_SKEW = 5; // seconds

    // returns true iff this and that represent the same abstract value within a clock-skew tolerance
    private boolean sameValue(Duration that) {
        return Math.abs(this.getLength() - that.getLength()) <= CLOCK_SKEW;
    }
}

Suppose these Duration objects are created:

Duration d_0_60 = new Duration(0, 60);
Duration d_1_00 = new Duration(1, 0);
Duration d_0_57 = new Duration(0, 57);
Duration d_1_03 = new Duration(1, 3);

(missing explanation)

Skewed up

(missing explanation)

Buggy equality

(missing explanation)

Null, null, null

The Object contract explicitly allows null as a parameter, and specifies the resulting behavior in its postcondition:

  • for a non-null reference x, x.equals(null) should return false;

Suppose Louis Reasoner decides, contrary to this contract, that zero-length Durations should compare equal to null. So new Duration(0, 0).equals(null) should return true.

Alyssa points out the obvious problem: null.equals(new Duration(0, 0)) can’t return true as well.

What property of an equivalence relation is being violated here?

(missing explanation)

In the correct implementation of Duration.equals() below, which line of code below causes equals() to return false when that is null?

1 @Override
2 public boolean equals(Object that) {
3     return that instanceof Duration 
4            && this.sameValue((Duration)that);
  }

  // returns true iff this and that represent the same abstract value
5 private boolean sameValue(Duration that) {
6     return this.getLength() == that.getLength();
  }

(missing explanation)

Oh no

Here is one tempting way to override equals:

@Override
public boolean equals(Object that) {
    return this.toString().equals(that.toString());
}

This immediately has a problem because the default implementation inherited from Object is not useful. So let’s override toString too:

@Override
public String toString() {
    return mins + ":" + secs;  // e.g. "2:25" for 2 minutes, 25 seconds
}

Which of the following expressions would return true?

(missing explanation)

Breaking hash tables

To understand the part of the contract relating to the hashCode method, you’ll need to have some idea of how hash tables work. Two very common collection implementations, HashSet and HashMap, use a hash table data structure, and depend on the hashCode method to be implemented correctly for the objects stored in the set and used as keys in the map.

A hash table is a representation for a mapping: an abstract data type that maps keys to values. Hash tables offer constant time lookup, so they tend to perform better than trees or lists. Keys don’t have to be ordered, or have any particular property, except for offering equals and hashCode.

Here’s how a hash table works. It contains an array that is initialized to a size corresponding to the number of elements that we expect to be inserted. When a key and a value are presented for insertion, we compute the hash code of the key, and convert it into an index in the array’s range (e.g., by a modulo division). The value is then inserted at that index.

The rep invariant of a hash table includes the fundamental constraint that a key can be found by starting from the slot determined by its hash code.

Hash codes are designed so that the keys will be spread evenly over the indices. But occasionally a conflict occurs, and two keys are placed at the same index. So rather than holding a single value at an index, a hash table actually holds a list of key/value pairs, usually called a hash bucket. A key/value pair is implemented in Java simply as an object with two fields. On insertion, you add a pair to the list in the array slot determined by the hash code. For lookup, you hash the key, find the right slot, and then examine each of the pairs until one is found whose key equals the query key.

Now it should be clear why the Object contract requires equal objects to have the same hash code. If two equal objects had distinct hash codes, they might be placed in different slots. So if you attempt to lookup a value using a key equal to the one with which it was inserted, the lookup may fail.

Object’s default hashCode() implementation is consistent with its default equals():

public class Object {
  ...
  public boolean equals(Object that) { return this == that; }
  public int hashCode() { return /* the memory address of this */; }
}

For references a and b, if a == b, then the address of a equals the address of b. So the Object contract is satisfied.

But immutable objects need a different implementation of hashCode(). For Duration, since we haven’t overridden the default hashCode() yet, we’re currently breaking the Object contract:

Duration d1 = new Duration(1, 2);
Duration d2 = new Duration(1, 2);
d1.equals(d2) → true
d1.hashCode() → 2392
d2.hashCode() → 4823

d1 and d2 are equals(), but they have different hash codes. So we need to fix that.

A simple and drastic way to ensure that the contract is met is for hashCode to always return some constant value, so every object’s hash code is the same. This satisfies the Object contract, but it would have a disastrous performance effect, since every key will be stored in the same slot, and every lookup will degenerate to a linear search along a long list.

The standard way to construct a more reasonable hash code that still satisfies the contract is to compute a hash code for each component of the object that is used in the determination of equality (usually by calling the hashCode method of each component), and then combining these, throwing in a few arithmetic operations. For Duration, this is easy, because the abstract value of the class is already an integer value:

@Override
public int hashCode() {
    return (int) getLength();
}

Josh Bloch’s fantastic book, Effective Java, explains this issue in more detail, and gives some strategies for writing decent hash code functions. The advice is summarized in a good StackOverflow post. Recent versions of Java now have a utility method Objects.hash() that makes it easier to implement a hash code involving multiple fields.

Note, however, that as long as you satisfy the requirement that equal objects have the same hash code value, then the particular hashing technique you use doesn’t make a difference to the correctness of your code. It may affect its performance, by creating unnecessary collisions between different objects, but even a poorly-performing hash function is better than one that breaks the contract.

Most crucially, note that if you don’t override hashCode at all, you’ll get the one from Object, which is based on the address of the object. If you have overridden equals, this will mean that you will have almost certainly violated the contract. So as a general rule:

Always override hashCode when you override equals.

Many years ago, in an earlier version of this course, a student spent hours tracking down a bug in a project that amounted to nothing more than misspelling hashCode as hashcode. This created a new method that didn’t override the hashCode method of Object at all, and strange things happened. Use @Override!

reading exercises

Give me the code

Consider the following ADT class:

class Person {
  private String firstName;
  private String lastName;
  ...

  @Override
  public boolean equals(Object that) {
      return that instanceof Person && this.sameValue(that);
  }

  // returns true iff this and that represent the same abstract value
  private boolean sameValue(Person that) {
      return this.lastName.toUpperCase().equals(that.lastName.toUpperCase());
  }

  @Override
  public int hashCode() {
      // TODO
  }
}

(missing explanation)

Equality of mutable types

We’ve been focusing on equality of immutable objects so far in this reading. What about mutable objects?

Equality must still be an equivalence relation, as required by the Object contract.

We also want equality to respect the abstraction function and respect operations. But with mutable objects, there is a new possibility: by calling a mutator on one of the objects before doing the observation, we may change its state and thus create an observable difference between the two objects.

So let’s refine our definition and allow for two notions of equality based on observation:

  • observational equality means that two references cannot be distinguished now, in the current state of the program. A client can try to distinguish them only by calling operations that don’t change the state of either object (i.e. only observers and producers, not mutators) and comparing the results of those operations. This tests whether the two references “look” the same for the current state of the object.
  • behavioral equality means that two references cannot be distinguished now or in the future, even if a mutator is called to change the state of one object but not the other. This tests whether the two references will “behave” the same, in this and all future states.

For immutable objects, observational and behavioral equality are identical, because there aren’t any mutator methods that can change the state of the objects.

For mutable objects, it can be very tempting to use observational equality as the design choice. Java uses observational equality for most of its mutable data types, in fact. If two distinct List objects contain the same sequence of elements, then equals() reports that they are equal.

reading exercises

Observational equality on lists
List<Integer> listA = List.of(1, 2, 3);
List<Integer> listB = new ArrayList<>(listA);
List<Integer> listC = listB;

If we want to explore whether the List values referred to by listA, listB, and listC are equal by observational equality, which of the following thought experiments could we try? (Assume you catch exceptions, so that the thought experiment can keep running without crashing the program.)

(missing explanation)

Which objects are equal by observational equality?

(missing explanation)

Behavioral equality on lists

As before:

List<Integer> listA = List.of(1, 2, 3);
List<Integer> listB = new ArrayList<>(listA);
List<Integer> listC = listB;

If we want to explore whether listA, listB, and listC would be equal by behavioral equality, which of the following thought experiments could we try? (Assume you catch exceptions, so that the thought experiment can keep running without crashing the program.)

(missing explanation)

Which objects are equal by behavioral equality?

(missing explanation)

Java arrays

We just saw that the Java List type uses observational equality. But for Java arrays, equals() uses behavioral equality.

int[] arrayA = new int[] { 1, 2, 3 };
int[] arrayB = arrayA;
int[] arrayC = new int[] { 1, 2, 3 };

If we now want to explore whether arrayA, arrayB, and arrayC would be equal by behavioral equality, which of the following thought experiments could we try?

(missing explanation)

Since Java arrays use behavioral equality, which of the following are true?

(missing explanation)

Observational equality on sets

Let’s look at sets now:

Set<Integer> setA = Set.of(1, 2, 3);
Set<Integer> setB = Set.of(3, 2, 1);
Set<Integer> setC = new HashSet<>(setA);

If we want to explore whether the Set values referred to by setA, setB, and setC are equal by observational equality, which of the following thought experiments could we try?

(missing explanation)

Which objects are equal by observational equality?

(missing explanation)

Breaking a HashSet’s rep invariant

Using observational equality for mutable types seems at first like a reasonable idea. It allows, for example, two Lists of the same integers in the same order to be equal().

But the presence of mutators unfortunately leads to subtle bugs, because it means that equality isn’t consistent over time. Two objects that are observationally equal at one moment in the program may stop being equal after a mutation.

This fact allows us to easily break the rep invariants of other collection data structures. Suppose we make a List, and then drop it into a HashSet:

List<String> list = new ArrayList<>();
list.add("a");

Set<List<String>> set = new HashSet<List<String>>();
set.add(list);

We can check that the set contains the list we put in it, and it does:

set.contains(list) → true

But now we mutate the list:

list.add("goodbye");

And it no longer appears in the set!

set.contains(list) → false!

It’s worse than that, in fact: when we iterate over the members of the set, we still find the list in there, but contains() says it’s not there!

for (List<String> l : set) { 
    set.contains(l) → false! 
}

If the set’s own iterator and its own contains() method disagree about whether an element is in the set, then the set clearly is broken. You can see this code in action.

What’s going on? List<String> is a mutable object. In the standard Java implementation of collection classes like List, mutations affect the result of equals() and hashCode(). When the list is first put into the HashSet, it is stored in the hash bucket corresponding to its hashCode() result at that time. When the list is subsequently mutated, its hashCode() changes, but HashSet doesn’t realize it should be moved to a different bucket. So it can never be found again.

When equals() and hashCode() can be affected by mutation, we can break the rep invariant of a hash table that uses that object as a key.

Here’s a telling quote from the specification of java.util.Set:

Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set.

The Java library is unfortunately inconsistent about its interpretation of equals() for mutable classes. Collections like List, Set, and Map use observational equality, but other mutable classes like StringBuilder, and arrays, use behavioral equality.

The final rule for equals() and hashCode()

The lesson we should draw from this example is that equals() should implement behavioral equality. Otherwise instances of the type cannot be safely stored in a data structure that depends on hashing, like HashMap and HashSet.

For immutable types:

  • Behavioral equality is the same as observational equality.
  • equals() must be overridden to compare abstract values.
  • hashCode() must be overriden to map the abstract value to an integer.

For mutable types:

  • Behavioral equality is different from observational equality.
  • equals() should generally not be overriden, but inherit the implementation from Object that compares references, just like ==.
  • hashCode() should likewise not be overridden, but inherit Object’s implementation that maps the reference into an integer.

For a mutable type that needs a notion of observational equality (whether two mutable objects “look” the same in the current state), it’s better to define a completely new operation, which might be called similar() or sameValue(). Its implementation would be similar to the private sameValue() helper method we have been writing for immutable types, but it would be a public operation available to clients. Unfortunately the Java library did not make that design decision.

reading exercises

Bag

Suppose Bag<E> is a mutable ADT representing what is often called a multiset, an unordered collection of objects where an object can occur more than once. It has the following operations:

/** make an empty bag */
public Bag<E>()

/** modify this bag by adding an occurrence of e, and return this bag */
public Bag<E> add(E e)

/** modify this bag by removing an occurrence of e (if any), and return this bag */
public Bag<E> remove(E e)

/** return number of times e occurs in this bag */
public int count(E e)

Suppose we run this code:

Bag<String> b1 = new Bag<>().add("a").add("b");
Bag<String> b2 = new Bag<>().add("a").add("b");
Bag<String> b3 = b1.remove("b");
Bag<String> b4 = new Bag<>().add("b").add("a"); // swap!

(missing explanation)

Bag behavior

If Bag is implemented with behavioral equality, which of the following expressions are true?

(missing explanation)

Bean bag

If Bag were part of the Java API, it would probably implement observational equality, counter to the recommendation in the reading.

If Bag implemented observational equality despite the dangers, which of the following expressions are true?

(missing explanation)

Autoboxing and equality

One more instructive pitfall in Java. We’ve talked about primitive types and their object type equivalents – for example, int and Integer. The object type implements equals() in the correct way, so that if you create two Integer objects with the same value, they’ll be equals() to each other:

Integer x = new Integer(3);
Integer y = new Integer(3);
x.equals(y) → true

But there’s a subtle problem here; == is overloaded. For reference types like Integer, it implements reference equality:

x == y // returns false

But for primitive types like int, == implements behavioral equality:

(int)x == (int)y // returns true

So you can’t really use Integer interchangeably with int. The fact that Java automatically converts between int and Integer (this is called autoboxing and autounboxing) can lead to subtle bugs! You have to be aware what the compile-time types of your expressions are. Consider this:

Map<String, Integer> a = new HashMap<>(), b = new HashMap<>();
String c = "c";
a.put(c, 130); // put ints into the maps
b.put(c, 130);
a.get(c) == b.get(c) → ?? // what do we get out of the maps? are they equal?

reading exercises

Boxes

In the last code example above…

What is the compile-time type of the expression 130?

(missing explanation)

After executing a.put(c, 130), what is the runtime type that is used to represent the value 130 in the map?

(missing explanation)

What is the compile-time type of a.get(c)?

(missing explanation)

Circles
Map<String, Integer> a = new HashMap<>(), b = new HashMap<>();
String c = "c";
a.put(c, 130); // put ints into the maps
b.put(c, 130);

In the box below, use Snapdown to draw a snapshot diagram that shows the state of the program after the code above has executed. Open the Snapdown help sidebar to get the details of Snapdown syntax – it’s a little language for drawing diagrams with text.

Click to show Snapdown help sidebar

cString"c"aHashMapWhatAmI?"change me!"

Since Snapdown is new: in the box below, please let us know any feedback you have on this exercise and on Snapdown, especially if you find anything challenging or unintuitive:

(missing explanation)

Equals
Map<String, Integer> a = new HashMap<>(), b = new HashMap<>();
String c = "c";
a.put(c, 130); // put ints into the maps
b.put(c, 130);

After this code executes, what would a.get(c).equals(b.get(c)) return?

(missing explanation)

What would a.get(c) == b.get(c) return?

(missing explanation)

Unboxes

Now suppose you assign the get() results to int variables:

int i = a.get(c);
int j = b.get(c);
boolean isEqual = (i == j);

After executing this code, what is the value of isEqual?

(missing explanation)

Summary

  • Equality should be an equivalence relation (reflexive, symmetric, transitive).
  • Equality and hash code must be consistent with each other, so that data structures that use hash tables (like HashSet and HashMap) work properly.
  • The abstraction function is the basis for equality in immutable data types (which means immutable types must override equals(), and therefore hashCode()).
  • Reference equality is the basis for equality in mutable data types; this is the only way to ensure consistency over time and avoid breaking rep invariants of hash tables.

Equality is one part of implementing an abstract data type, and we’ve already seen how important ADTs are to achieving our three primary objectives. Let’s look at equality in particular:

  • Safe from bugs. Correct implementation of equality and hash codes is necessary for use with collection data types like sets and maps. It’s also highly desirable for writing tests. Since every object in Java inherits the Object implementations, immutable types must override them.

  • Easy to understand. Clients and other programmers who read our specs will expect our types to implement an appropriate equality operation, and will be surprised and confused if we do not.

  • Ready for change. Correctly-implemented equality for immutable types separates equality of reference from equality of abstract value, hiding from clients our decisions about whether values are shared. Choosing behavioral rather than observational equality for mutable types helps avoid unexpected aliasing bugs.

More practice

If you would like to get more practice with the concepts covered in this reading, you can visit the question bank. The questions in this bank were written in previous semesters by students and staff, and are provided for review purposes only – doing them will not affect your classwork grades.