Reading 12: Defining ADTs with Interfaces, Generics, Enums, and Functions
Java Tutor exercises
Keep making progress on Java by completing this category in the Java Tutor:
Software in 6.031
Objectives
Today’s class is about various ways to implement abstract data types, including:
- interfaces: separating the interface of an ADT from its implementation;
- generic types: defining a family of ADTs using generic type parameters;
- enumerations: defining an ADT with a small finite set of values;
- global functions operating on an opaque type: rare in Java but common in non-object-oriented languages.
We also discuss subtyping as a relationship between two types determined by their specs, and distinguish it from subclassing, which involves reps as well.
After today’s class, you should be able to:
- define ADTs using classes, interfaces, generics, and enumerations
- determine whether one type is a subtype of another
Interfaces
Java’s interface
is a useful language mechanism for expressing an abstract data type.
An interface in Java is a list of method signatures without method bodies.
A class implements an interface if it declares the interface in its implements
clause, and provides method bodies for all of the interface’s methods.
So one way to define an abstract data type in Java is as an interface, with its implementation as a class implementing that interface.
One advantage of this approach is that the interface specifies the contract for the client and nothing more. The interface is all a client programmer needs to read to understand the ADT. The client can’t create inadvertent dependencies on the ADT’s rep, because fields can’t be put in an interface at all. The implementation is kept well and truly separated, in a different class altogether.
Another advantage is that multiple different representations of the abstract data type can coexist in the same program, as different classes implementing the interface.
When an abstract data type is represented just as a single class, without an interface, it’s harder to have multiple representations.
In the MyString
example from Abstract Data Types, MyString
was a single class.
We explored two different representations for MyString
, but we couldn’t have both representations for the ADT in the same program.
Java’s static type checking allows the compiler to catch many mistakes in implementing an ADT’s contract. For instance, it is a compile-time error to omit one of the required methods, or to give a method the wrong return type. Unfortunately, the compiler doesn’t check for us that the code adheres to the specs of those methods that are written in documentation comments.
About interfaces in Java
Syntactically, a Java interface
contains the spec of an ADT, namely its public method signatures.
Each instance method signature ends with a semicolon.
An interface does not include information about the rep, so it should not declare any instance variables or include instance method bodies*.
Because an interface has no rep, Java doesn’t allow it to be instantiated with new
.
The expression new List<String>()
is a static error.
To make a List
, you need to instead instantiate a class that provides a rep for it, like new ArrayList<String>()
.
List.of()
is a static method in the List
interface.For the same reason, Java interfaces are not allowed to declare constructors.
There is no List()
constructor in the List
interface.
The creator operations of an interface ADT must either be constructors of their implementation classes, like ArrayList()
and LinkedList()
, or static methods like List.of()
**.
reading exercises
Consider this Java interface and Java class, which are intended to represent a curve in the 2D plane:
/** Represents an immutable curve in the plane. */
public interface Curve {
/** make a one-point curve */
/* A */ public Curve(double x, double y);
/** @return true if the point (x,y) lies on this curve */
public boolean contains(double x,y);
/** @return a curve formed by connecting this with that */
/* B */ public ArrayCurve join(Curve that);
}
/** Implementation of Curve. */
public class ArrayCurve implements Curve {
/** make a one-point curve */
public ArrayCurve(double x, double y) { ... }
/** @return a curve formed by connecting this with that */
public ArrayCurve join(Curve that) { ... }
/** extend this curve with a segment to the point (x,y) */
public void add(double x, double y) { ... }
}
The line labeled A
is a problem because Java interfaces can’t have constructors.
(missing explanation)
The line labeled B
is a problem because Curve
mentions ArrayCurve
, but ArrayCurve
also mentions Curve
, which is circular.
(missing explanation)
The line labeled B
is a problem because it isn’t representation-independent.
(missing explanation)
ArrayCurve
doesn’t correctly implement Curve
because it’s missing the contains()
method.
(missing explanation)
ArrayCurve
doesn’t correctly implement Curve
because it includes a method that Curve
doesn’t have.
(missing explanation)
ArrayCurve
doesn’t correctly implement Curve
because ArrayCurve
is mutable while Curve
is immutable.
(missing explanation)
Subtypes
Recall that a type is a set of values.
The Java List
type is defined by an interface.
If we think about all possible List
values, none of them are actually objects of type List
: we cannot create instances of an interface.
Instead, those values are all ArrayList
objects, or LinkedList
objects, or objects of another class that implements List
.
A subtype is simply a subset of the supertype: ArrayList
and LinkedList
are subtypes of List
.
“B is a subtype of A” means “every B is an A.” In terms of specifications: “every B satisfies the specification for A.”
That means B is only a subtype of A if B’s specification is at least as strong as A’s specification. When we declare a class that implements an interface, the Java compiler enforces part of this requirement automatically: for example, it ensures that every method in A appears in B, with a compatible type signature. Class B cannot implement interface A without implementing all of the methods declared in A.
But the compiler cannot check that we haven’t weakened the specification in other ways: strengthening the precondition on some inputs to a method, weakening a postcondition, weakening a guarantee that the interface abstract type advertises to clients. If you declare a subtype in Java — implementing an interface is our current focus — then you must ensure that the subtype’s spec is at least as strong as the supertype’s.
How to declare subtypes in Java
To declare that a class B is a subtype of an interface A, use implements
:
class ArrayList<E> implements List<E> {
...
}
This declaration requires ArrayList
to implement (provide method bodies for) all the method signatures found in List
, with specs at least as strong as the specs in List
.
Java statically checks the method signatures, but a human programmer must check the rest of the spec.
You can also declare that an interface is a subtype of another interface, using extends
:
interface SortedSet<E> extends Set<E> {
...
}
Because SortedSet
is also an interface, it doesn’t provide any implementations of Set
methods.
But it may strengthen the spec of the type, by strengthening existing Set
methods or adding new methods.
- Lesson: Collection Interfaces (1 page)
reading exercises
ArrayList
is mutable, and the type returned by List.of()
is immutable, according to their specs.
Both of these types are legal subtypes of List
.
Assume an immutable element type, like String
, so that it doesn’t enter into consideration.
Pick all that apply:
(missing explanation)
Now consider this (incomplete) spec:
/** ... spec says nothing at all about mutation or mutability ... */
List<String> toUpperCase(List<String> names)
(missing explanation)
/*1*/ List<String> list = new ArrayList<String>();
/*2*/ list = Collections.unmodifiableList(list);
/*3*/ list = List.of("glorp");
/*4*/ list.add("fleeb");
/*5*/ list = new List<String>();
/*6*/ Set<String> set = list; //list has no repeats!
/*7*/ Collection<String> collection = list;
It may help to draw a snapshot diagram as you trace through the code, because the questions ask about the objects that variables point to.
If the answer for some line is “static error” or “dynamic error”, then assume that line is simply commented out so that the rest of the code can still compile and run.
/*1*/ List<String> list = new ArrayList<String>();
list
now points to: (choose most specific answer)
(missing explanation)
/*2*/ list = Collections.unmodifiableList(list);
list
now points to: (choose most specific answer)
(missing explanation)
/*3*/ list = List.of("glorp");
list
now points to: (choose most specific answer)
(missing explanation)
/*4*/ list.add("fleeb");
list
now points to: (choose most specific answer)
(missing explanation)
/*5*/ list = new List<String>();
list
now points to: (choose most specific answer)
(missing explanation)
/*6*/ Set<String> set = list; //list has no repeats!
The new variable set
now points to: (choose most specific answer)
(missing explanation)
/*7*/ Collection<String> collection = list;
The new variable collection
now points to: (choose most specific answer)
(missing explanation)
Let’s define an interface for rectangles:
/** An immutable rectangle. */
public interface ImmutableRectangle {
/** @return the width of this rectangle */
public int getWidth();
/** @return the height of this rectangle */
public int getHeight();
}
We would like to say that every square is a rectangle:
/** An immutable square. */
public class ImmutableSquare implements ImmutableRectangle {
private final int side;
/** Make a new side x side square. */
public ImmutableSquare(int side) { this.side = side; }
/** @return the width of this square */
public int getWidth() { return side; }
/** @return the height of this square */
public int getHeight() { return side; }
}
Does ImmutableSquare.getWidth()
satisfy the spec of ImmutableRectangle.getWidth()
?
Does ImmutableSquare.getHeight()
satisfy the spec of ImmutableRectangle.getHeight()
?
Does the whole ImmutableSquare
spec satisfy the ImmutableRectangle
spec?
(missing explanation)
Now let’s look at mutable versions of rectangles and squares.
/** A mutable rectangle. */
public interface MutableRectangle {
// ... same methods as ImmutableRectangle above ...
/** Set this rectangle's dimensions to width x height. */
public void setSize(int width, int height);
}
Surely every square is still a rectangle?
/** A mutable square. */
public class MutableSquare implements MutableRectangle /* hopefully? */ {
private int side;
// ... same constructor and methods as ImmutableSquare above ...
// TODO implement setSize(..)
}
For each possible MutableSquare.setSize(..)
spec below, is it a valid strengthening of MutableRectangle.setSize()
?
/**
* Set this square's dimensions to width x height.
* Requires width = height.
*/
public void setSize(int width, int height) { ... }
(missing explanation)
/**
* Set this square's dimensions to width x height.
* @throws BadSizeException if width != height
*/
public void setSize(int width, int height) throws BadSizeException { ... }
(missing explanation)
/**
* If width = height, set this square's dimensions to width x height.
* Otherwise, new dimensions are unspecified.
*/
public void setSize(int width, int height) { ... }
(missing explanation)
/**
* Set this square's dimensions to side x side.
*/
public void setSize(int side) { ... }
(missing explanation)
Example: MyString
Let’s revisit MyString
.
Using an interface instead of a class for the ADT, we can support multiple implementations:
/**
* MyString represents an immutable sequence of characters.
*/
public interface MyString {
// We'll skip this creator operation for now
// /**
// * @param b a boolean value
// * @return string representation of b, either "true" or "false"
// */
// public static MyString valueOf(boolean b) { ... }
/**
* @return number of characters in this string
*/
public int length();
/**
* @param i character position (requires 0 <= i < string length)
* @return character at position i
*/
public char charAt(int i);
/**
* Get the substring between start (inclusive) and end (exclusive).
* @param start starting index
* @param end ending index. Requires 0 <= start <= end <= string length.
* @return string consisting of charAt(start)...charAt(end-1)
*/
public MyString substring(int start, int end);
}
We’ll skip the static valueOf
method and come back to it in a minute.
Instead, let’s go ahead using a different technique from our toolbox of ADT concepts in Java: constructors.
Here’s our first implementation:
public class SimpleMyString implements MyString {
private char[] a;
/**
* Create a string representation of b, either "true" or "false".
* @param b a boolean value
*/
public SimpleMyString(boolean b) {
a = b ? new char[] { 't', 'r', 'u', 'e' }
: new char[] { 'f', 'a', 'l', 's', 'e' };
}
// private constructor, used internally by producer operations
private SimpleMyString(char[] a) {
this.a = a;
}
@Override public int length() { return a.length; }
@Override public char charAt(int i) { return a[i]; }
@Override public MyString substring(int start, int end) {
char[] subArray = new char[end - start];
System.arraycopy(this.a, start, subArray, 0, end - start);
return new SimpleMyString(subArray);
}
}
And here’s the optimized implementation:
public class FastMyString implements MyString {
private char[] a;
private int start;
private int end;
/**
* Create a string representation of b, either "true" or "false".
* @param b a boolean value
*/
public FastMyString(boolean b) {
a = b ? new char[] { 't', 'r', 'u', 'e' }
: new char[] { 'f', 'a', 'l', 's', 'e' };
start = 0;
end = a.length;
}
// private constructor, used internally by producer operations.
private FastMyString(char[] a, int start, int end) {
this.a = a;
this.start = start;
this.end = end;
}
@Override public int length() { return end - start; }
@Override public char charAt(int i) { return a[start + i]; }
@Override public MyString substring(int start, int end) {
return new FastMyString(this.a, this.start + start, this.start + end);
}
}
Compare these classes to the implementations of
MyString
in Abstract Data Types. Notice how the code that previously appeared in staticvalueOf
methods now appears in the constructors, slightly changed to refer to the rep ofthis
.Also notice the use of
@Override
. This annotation informs the compiler that the method must have the same signature as one of the methods in the interface we’re implementing. But since the compiler already checks that we’ve implemented all of the interface methods, the primary value of@Override
here is for readers of the code: it tells us to look for the spec of that method in the interface. Repeating the spec wouldn’t be DRY, but saying nothing at all makes the code harder to understand.And notice that we have added a private constructor for producers like
substring(..)
to use to make new instances of the class. Its arguments are the rep fields. We didn’t have to write this special constructor before, because Java provides an empty constructor by default when we don’t declare any other constructors. Adding the constructors that takeboolean b
means we have to declare another constructor explicitly for producer operations to use.
How will clients use this ADT? Here’s an example:
MyString s = new FastMyString(true);
System.out.println("The first character is: " + s.charAt(0));
This code looks very similar to the code we write to use the Java collections classes:
List<String> s = new ArrayList<String>();
...
Unfortunately, this pattern breaks the abstraction barrier we’ve worked so hard to build between the abstract type and its concrete representations. Clients must know the name of the concrete representation class. Because interfaces in Java cannot contain constructors, they must directly call one of the concrete class’ constructors. The spec of that constructor won’t appear anywhere in the interface, so there’s no static guarantee that different implementations will even provide the same constructors.
Fortunately, (as of Java 8) interfaces are allowed to contain static methods, so we can implement the creator operation valueOf
as a static factory method in the interface MyString
:
public interface MyString {
/**
* @param b a boolean value
* @return string representation of b, either "true" or "false"
*/
public static MyString valueOf(boolean b) {
return new FastMyString(b);
}
// ...
Now a client can use the ADT without breaking the abstraction barrier:
MyString s = MyString.valueOf(true);
System.out.println("The first character is: " + s.charAt(0));
Hiding the implementation completely is a tradeoff, because sometimes the client wants a choice of implementations with different characteristics.
That’s why ArrayList
and LinkedList
are exposed by the Java library, because they vary in performance of operations like get()
and insert()
.
reading exercises
Let’s review the code for FastMyString
.
Which of these are useful criticisms:
The abstraction function should be documented
(missing explanation)
The representation invariant should be documented
(missing explanation)
The rep fields should be final
so they cannot be reassigned
(missing explanation)
The private constructor should be public so clients can use it to construct their own arbitrary strings
(missing explanation)
The charAt
specification should not expose that the rep contains individual characters
(missing explanation)
The charAt
implementation should behave more helpfully when i
is greater than the length of the string
(missing explanation)
Why interfaces?
Interfaces are used pervasively in real Java code. Not every class is associated with an interface, but there are a few good reasons to bring an interface into the picture.
- Documentation for both the compiler and for humans. Not only does an interface help the compiler catch ADT implementation bugs, but it is also much more useful for a human to read than the code for a concrete implementation. Such an implementation intersperses ADT-level types and specs with implementation details.
- Allowing performance trade-offs. Different implementations of the ADT can provide methods with very different performance characteristics. Different applications may work better with different choices, but we would like to code these applications in a way that is representation-independent. From a correctness standpoint, it should be possible to drop in any new implementation of a key ADT with simple, localized code changes.
- Methods with intentionally underdetermined specifications. An ADT for finite sets could leave unspecified the element order one gets when converting to a list. Some implementations might use slower method implementations that manage to keep the set representation in some sorted order, allowing quick conversion to a sorted list. Other implementations might make many methods faster by not bothering to support conversion to sorted lists.
- Multiple views of one class. A Java class may implement multiple interfaces. For instance, a user interface widget displaying a drop-down list is natural to view as both a widget and a list. The class for this widget could implement both interfaces. In other words, we don’t always implement an ADT multiple times just because we are choosing different data structures. Those implementations may be various different sorts of objects that can be seen as special cases of the ADT.
- More and less trustworthy implementations. Another reason to implement an interface more than once might be that it is easy to build a simple implementation that you believe is correct, while you can work harder to build a fancier version that is more likely to contain bugs. You can choose implementations for applications based on how bad it would be to get bitten by a bug.
reading exercises
Suppose you have an abstract data type for rational numbers, which is currently represented as a Java class:
public class Rational {
...
}
You decide to change Rational
to a Java interface instead, along with an implementation class called IntFraction
:
public interface Rational {
...
}
public class IntFraction implements Rational {
...
}
For each piece of code below, taken from your original Rational
class, first identify the role it plays in the ADT. Then decide where it should go in your new ADT design: in the interface Rational
, in the implementation class IntFraction
, or both?
Subclassing
Implementing an interface is one way to make a subtype of another type. Another technique, which you may have seen before, is subclassing. Subclassing means defining a class as an extension, or subclass, of another class, so that:
- the subclass automatically inherits the instance methods of its superclass, including their method bodies
- the subclass can override any of those inherited method bodies by substituting its own method body
- the subclass also inherits the fields of its superclass
- the subclass can also add new instance methods and fields for its own purposes
The boldfaced parts in the list above show how subclassing differs from implementing an interface. An interface has neither method bodies nor fields, so it’s not possible to inherit them when you implement an interface.
Subclassing is a common feature of object-oriented programming languages.
You may have already encountered it in Python, where you can write class SpottedTurtle(Turtle): ...
to define a new class SpottedTurtle
as a subclass of the class Turtle
.
Java provides the same effect with the syntax class SpottedTurtle extends Turtle { ... }
.
Just like implementing an interface, subclassing should imply subtyping.
If SpottedTurtle
is a subclass of Turtle
, then SpottedTurtle
needs to have at least as strong a spec as Turtle
, because Java will allow SpottedTurtle
objects to be used anywhere that a Turtle
might be expected.
But unlike implementing an interface, a subclass inherits not only the spec of its superclass but also its rep – the fields that implement it. At first this seems like a great idea. Reusing implementation ought to make our code more DRY, and more ready for change.
But years of experience by API designers have revealed subtle challenges to making subclassing safe from bugs and ready for change. You can read more if you’re interested, but essentially, inheriting the rep means:
- rep exposure between the superclass and all its subclasses
- rep dependence between superclass and subclasses
- superclass and subclass can inadvertently break each other’s rep invariants
Designing for safe subclassing means that the superclass must now offer two contracts: one for interaction with clients, and one for subclasses. These issues simply do not arise with interfaces.
Overriding and dynamic dispatch
In one respect, however, we can’t get away from thinking about subclassing when we write classes in Java.
Every Java class is automatically a subclass of Object
, and automatically inherits methods like toString()
, equals()
, and hashCode()
.
We often need to override these inherited implementations, replacing them with our own implementations, in order for our class to behave correctly.
To illustrate, let’s look at a simpler case: overriding the toString
method for our FastMyString
class.
The default toString()
provided by Object
is not particularly useful for debugging:
FastMyString fms = new FastMyString(true); // recall that this represents the 4-character string "true"
fms.toString() → "FastMyString@504bae78" // not useful! just the class name followed by the object's address in memory
When a method has multiple implementations, such as when a class overrides a method like toString()
that already has an implementation in its superclass Object
, then Java has to determine which of the implementations to use when the method is called.
This process is called dispatching to the method, and Java’s rule is dynamic dispatch – it uses the implementation appropriate to the dynamic type of the object, not the static type of the reference that points to the object.
Let’s make this better by overriding toString()
:
public class FastMyString {
...
@Override
public String toString() {
String s = "";
for (int i = 0; i < this.length(); ++i) {
s += this.charAt(i);
}
return s;
}
}
FastMyString fms = new FastMyString(true);
fms.toString() → "true"
In this example, FastMyString
is the type of both the variable fms
and the object it points to.
But they don’t have to have to be the same type.
The variable can be a supertype, while the object itself is from a subtype.
We have already done this when we used a variable of type MyString
to hold a reference to FastMyString
.
But because Object
is a supertype of all classes in Java, we can also do this:
Object obj = new FastMyString(true);
obj.toString() → ?
But now there seems to be an ambiguity.
Object
has its own implementation of toString()
.
When we call obj.toString()
, does it call the original Object
implementation, or the overridden FastMyString
implementation?
Do we get "FastMyString@504bae78"
or "true"
?
The dynamic dispatch rule says that the type of the object at runtime determines the method implementation to call.
This type is not known until runtime, hence the word “dynamic.”
The type of the obj
variable, by contrast, is known at compile time.
Using dynamic dispatch, here is what we get:
Object obj = new FastMyString(true);
obj.toString() → "true"
…the same as we would get from fms.toString()
.
The effect of the dynamic dispatch rule is that it doesn’t matter what type of reference you have pointing to an object, it will always use the object’s type for method dispatch.
reading exercises
List<String> list = new ArrayList<>(List.of("abc"));
Object obj = list;
/* next expression here */
What does each of the following do, if substituted for /* next expression here */
in the code above?
(missing explanation)
(missing explanation)
(missing explanation)
(missing explanation)
Generic types
A useful feature of the Java type system, and other modern statically-typed languages too, is a generic type: a type whose specification is in terms of a placeholder type to be filled in later.
Java’s collection classes are good examples of this.
Set<E>
is the ADT of finite sets of elements of some other type E
.
Instead of writing separate specifications and implementations for Set<String>
, Set<Integer>
, and so on, we can design one interface Set<E>
, which represents an entire family of set ADTs.
Here is a simplified version of the Set
interface:
/**
* A mutable set.
* @param <E> type of elements in the set
*/
public interface Set<E> {
Let’s start with a creator operation:
// example creator operation
/**
* Make an empty set.
* @param <F> type of elements in the set
* @return a new set instance, initially empty
*/
public static <F> Set<F> make() { ... }
The make
operation is implemented as a static factory method.
Clients will write code like:
Set<String> strings = Set.make();
and the compiler will understand that the new Set
is a set of String
objects.
(An obscure Java syntax requirement arises in the method signature above: a static method like make
needs to independently declare its type parameter at the start of the signature, here shown as public static <F>
.
We called it F
instead of E
just to highlight that it’s a different type parameter.
The instance methods below don’t have this extra <...>
declaration because they always use the type parameter declared by the enclosing class or interface, public interface Set<E>
.)
// example observer operations
/**
* Get size of the set.
* @return the number of elements in this set
*/
public int size();
/**
* Test for membership.
* @param e an element
* @return true iff this set contains e
*/
public boolean contains(E e);
Next we have two observer methods.
Notice how the specs are in terms of our abstract notion of a set; it would be malformed to mention the details of any particular implementation of sets with particular private fields.
These specs should apply to any valid implementation of the Set
ADT.
// example mutator operations
/**
* Modifies this set by adding e to the set.
* @param e element to add
*/
public void add(E e);
/**
* Modifies this set by removing e, if found.
* If e is not found in the set, has no effect.
* @param e element to remove
*/
public void remove(E e);
The story for these mutators is basically the same as for the observers. We still write specs at the level of our abstract model of sets.
reading exercises
Implementing generic interfaces
Suppose we want to implement the generic Set<E>
interface above.
We can either write a non-generic implementation that replaces E
with a specific type, or a generic implementation that keeps the placeholder.
Generic interface, non-generic implementation.
Let’s implement Set<E>
for a particular type E
.
We’ll implement a set of characters:
public class CharSet implements Set<Character>
Wherever the interface mentions placeholder type E
, the CharSet
implementations replace E
with Character
.
For example:
Note that this representation for CharSet
is not suited for representing sets of arbitrary-type elements.
A String
rep cannot represent a Set<Integer>
without careful work to define a new rep invariant and abstraction function that can handle multi-digit numbers.
Generic interface, generic implementation.
We can also implement the generic Set<E>
interface without picking a type for E
.
In that case, we write our code blind to the actual type that clients will choose for E
.
Java’s HashSet
does that for Set
.
Its declaration looks like:
|
|
A generic implementation can only rely on details of a placeholder type E
that are explicitly included in the generic interface specification.
We’ll see, in a future reading about object equality, that HashSet
relies on E
properly implementing certain methods of Object
, specifically hashCode
and equals
.
Every type in Java is automatically a subtype of Object
, so HashSet
can rely on these methods being available.
But HashSet
can’t call methods on its elements that are only declared for some specific type like String
, because E
might be any type.
reading exercises
Consider again our Set
interface:
public interface Set<E> {
public boolean contains(E e);
// ...
}
What is wrong (if anything) with the following pieces of code that try to create new ADTs based on Set
?
/** Represents a set of integers. */
/*1*/ public class IntSet implements Set<Integer> {
/*2*/ public boolean contains(E e) { ... }
// ...
}
(missing explanation)
/** Represents a set that can grow but never shrink. */
/*1*/ public interface GrowingSet<Z> extends Set<E> {
/*2*/ public boolean contains(Z z);
// ...
}
(missing explanation)
/** Represents a set whose for-iteration returns the elements in the order they were added. */
/*1*/ public class OrderedSet<E> implements Set<E> {
/*2*/ public boolean contains(E e) { ... }
// ...
}
(missing explanation)
Consider another possible operation on our Set
interface:
public interface Set<E> {
/**
* Picks an element from the set.
* @return an element in the set
* @throws NoSuchElementException if set is empty
*/
public E pick() throws NoSuchElementException;
// ...
}
This operation is underdetermined – it may have many possible legal return values for a particular set.
Suppose SimpleSet
implements Set
, and wants to write its own spec for pick
:
public class SimpleSet<E> implements Set<E> {
private final List<E> elementList = new ArrayList<>();
// ...
/**
* <new spec here>
*/
@Override
public E pick() throws NoSuchElementException {
...
}
}
(missing explanation)
(missing explanation)
(missing explanation)
In the design of the Java collection classes, in order to use the built-in Java Set
interface, clients must also know about a specific implementation class like HashSet
in order to make a new mutable Set
instance:
Set<String> set = new HashSet<String>();
What if we instead wanted Set
to have its own creator operation empty()
, so that clients didn’t have to know about the existence of HashSet
.
Then creating a set might look like:
Set<String> set = Set.empty();
(missing explanation)
(missing explanation)
Enumerations
Sometimes an ADT has a small, finite set of immutable values, such as:
- months of the year: January, February, …
- days of the week: Monday, Tuesday, …
- compass points: north, south, east, west
- line caps in a line drawing: butt, round, square
A type like this may be used as part of a more complex type (like DateTime
or Latitude
), or as a parameter that changes the behavior of a method (like drawLine
).
When the set of values is small and finite, it makes sense to define all the values as named constants, called an enumeration.
Java has the enum
construct to make this convenient:
public enum Month { JANUARY, FEBRUARY, MARCH, ..., DECEMBER };
This enum
defines a new type name, Month
, in the same way that class
and interface
define new type names.
It also defines a set of named values, which we write in all-caps because they are effectively public static final
constants.
So you can now write:
Month thisMonth = MARCH;
This idea is called an enumeration because you are explicitly listing the elements of the set, and Java is assigning numbers to them as their default rep values.
In the simplest use case for enumerations, the only operation you need is testing equality between values:
if (day.equals(SATURDAY) || day.equals(SUNDAY)) {
System.out.println("It's the weekend");
}
You may see code like this as well, which uses ==
instead of equals()
:
if (day == SATURDAY || day == SUNDAY) {
System.out.println("It's the weekend");
}
If the day were represented as a String
, this code would be very unsafe, because ==
tests whether two expressions refer to the same object at the same memory location, which may or may not be true for two arbitrary strings "Saturday"
.
That’s why we always use equals()
for comparing objects.
But one of the advantages of using an enumeration type is that there is only ever one object in memory representing each value of the enumeration, and there is no way for a client to create more (no constructors!)
So ==
is just as good as equals()
for an enumeration.
In fact ==
is more fail-fast, because it does a static check that both sides have the same enum type, whereas equals()
delays that check until runtime.
In that sense, using an enumeration can feel like you’re using primitive int
constants.
Java even supports using them in switch
statements (which otherwise only allow primitive integer types and their wrappers, and String
, but not other objects):
switch (direction) {
case NORTH: return "polar bears";
case SOUTH: return "penguins";
case EAST: return "elephants";
case WEST: return "llamas";
}
But unlike int
values, enumerations have more static checking:
Month firstMonth = MONDAY; // static error: MONDAY has type DayOfWeek, not type Month
Also unlike primitive values, a variable of enum
type can be null
, which we must avoid the same way we do with other object types.
An enum
declaration can contain all the usual fields and methods that a class
can.
So you can define additional operations for the ADT, and define your own rep as well.
Here is an example that has a rep, an observer, and a producer:
public enum Month {
// the values of the enumeration, written as calls to the private constructor below
JANUARY(31),
FEBRUARY(28),
MARCH(31),
APRIL(30),
MAY(31),
JUNE(30),
JULY(31),
AUGUST(31),
SEPTEMBER(30),
OCTOBER(31),
NOVEMBER(30),
DECEMBER(31);
// rep
private final int daysInMonth;
// enums also have an automatic, invisible rep field:
// private final int ordinal;
// which takes on values 0, 1, ... for each value in the enumeration.
// rep invariant:
// daysInMonth is the number of days in this month in a non-leap year
// abstraction function:
// AF(ordinal,daysInMonth) = the (ordinal+1)th month of the Gregorian calendar
// safety from rep exposure:
// all fields are private, final, and have immutable types
// Make a Month value. Not visible to clients, only used to initialize the
// constants above.
private Month(int daysInMonth) {
this.daysInMonth = daysInMonth;
}
/**
* @param isLeapYear true iff the year under consideration is a leap year
* @return number of days in this month in a normal year (if !isLeapYear)
* or leap year (if isLeapYear)
*/
public int getDaysInMonth(boolean isLeapYear) {
if (this == FEBRUARY && isLeapYear) {
return daysInMonth+1;
} else {
return daysInMonth;
}
}
/**
* @return first month of the semester after this month
*/
public Month startOfNextSemester() {
switch (this) {
case JANUARY:
return FEBRUARY;
case FEBRUARY: // cases with no break or return
case MARCH: // fall through to the next case
case APRIL:
case MAY:
return JUNE;
case JUNE:
case JULY:
case AUGUST:
return SEPTEMBER;
case SEPTEMBER:
case OCTOBER:
case NOVEMBER:
case DECEMBER:
return JANUARY;
default:
throw new RuntimeException("can't get here");
}
}
}
All enum
types also have some automatically-provided operations, defined by Enum
:
ordinal()
is the index of the value in the enumeration, soJANUARY.ordinal()
returns 0.compareTo()
compares two values based on their ordinal numbers.name()
returns the name of the value’s constant as a string, e.g.JANUARY.name()
returns"JANUARY"
.toString()
has the same behavior asname()
.
reading exercises
Consider these three alternative ways to name the semester you’re registering for:
startRegistrationFor("Fall", 2023);
public static final String FALL = "Fall";
...
startRegistrationFor(FALL, 2023);
public enum Semester { IAP, SPRING, SUMMER, FALL };
...
startRegistrationFor(FALL, 2023);
Which of the following are correctly-stated advantages or disadvantages of each approach?
(missing explanation)
(missing explanation)
(missing explanation)
(missing explanation)
(missing explanation)
ADTs in non-OOP languages
One more way to define an abstract data type is simply as a group of globally-accessible functions that operate on an opaque data type. This pattern is rarely seen in object-oriented languages like Java and Python, but often appears in older programming languages like C. Here is an example of some file I/O in C:
FILE* f = fopen("out.txt", "w"); // open a file for writing
fputs("hello", f); // write to the file
fclose(f); // close the file
In this code, the abstract data type is FILE
, representing an open file.
The functions fopen
, fputs
, and fclose
are operations of the type, taking a FILE
as an argument or returning it as a return value.
fputs
expects the file as its second argument, oddly enough.
Because C is not object-oriented, there is no notion of classes, methods, fields, or even private
, but representation independence is still achieved.
The client has no way to look inside a FILE
value.
The only way to use a FILE
is by passing it to operations provided by the type.
One key takeaway from this is that the notion of an abstract data type does not depend on language features like classes, or interfaces, or private
access control.
Data abstraction is a powerful design pattern that is ubiquitous in software engineering.
ADTs in Java
We’ve now completed our Java toolbox of ADT concepts from the first ADTs reading:
Enum |
||
Constructor |
||
Instance method |
||
Static method |
||
Instance method |
||
Static method |
||
Instance method |
||
Static method |
||
Representation |
Summary
Java interfaces help us formalize the idea of an abstract data type as a set of operations that must be supported by a type.
Safe from bugs. An ADT is defined by its operations, and interfaces do just that. When clients use an interface type, static checking ensures that they only use methods defined by the interface. If the implementation class exposes other methods — or worse, has visible representation — the client can’t accidentally see or depend on them. When we have multiple implementations of a data type, interfaces provide static checking of the method signatures.
Easy to understand. Clients and maintainers know exactly where to look for the specification of the ADT. Since the interface doesn’t contain fields or implementations of instance methods, it’s easier to keep details of the implementation out of the specifications.
Ready for change. We can easily add new implementations of a type by adding classes that implement an interface. If we avoid constructors in favor of static factory methods, clients will only see the interface. That means we can switch which implementation class clients are using without changing their code at all.
Java enumerations allow defining an ADT with a small finite set of immutable values. Compared to the old-fashioned alternatives of special integer values or special strings, enumerations help make our code:
Safe from bugs. Static checking ensures that clients can’t use values outside the finite set, or confuse two different enumerated types.
Easy to understand. Named constants are not as magical as integer literals, and named types explain themselves better than
int
orString
.Ready for change. The code that uses a particular enumeration is easy to find by searching for the enumeration’s type name, so that it can be reviewed and changed when the enumeration changes. (IDE refactoring tools can even do some of these changes automatically.) Code that depends on a particular set of string or integer constants is much harder to distinguish from code that uses strings and integers for other purposes.