Reading 19: Writing a Program with Abstract Data Types
Software in 6.031
Objectives
After today’s class, you should:
- Understand how to apply specification, testing, and iteration at different levels of software development: from single methods, to abstract data types, to entire programs
Introduction
Bringing together what we’ve learned about abstract data types and recursive data types, this reading demonstrates the development of a recursive ADT to solve a simple problem.
Recipes for programming
Recall the test-first programming approach for
Writing a method:
- Spec. Write the spec, including the method signature (name, argument types, return types, exceptions), and the precondition and the postcondition as a Javadoc comment.
- Test.
Create systematic test cases and put them in a JUnit class so you can run them automatically.
- You may have to go back and change your spec when you start to write test cases. Just the process of writing test cases puts pressure on your spec, because you’re thinking about how a client would call the method. So steps 1 and 2 iterate until you’ve got a better spec and some good test cases.
- Make sure at least some of your tests are failing at first. A test suite that passes all tests even when you haven’t implemented the method is not a good test suite for finding bugs.
- Implement.
Write the body of the method.
You’re done when the tests are all green in JUnit.
- Implementing the method puts pressure on both the tests and the specs, and you may find bugs that you have to go back and fix. So finishing the method may require changing the implementation, the tests, and the specs, and bouncing back and forth among them.
Let’s broaden this to a recipe for
Writing an abstract data type:
- Spec. Write specs for the operations of the datatype, including method signatures, preconditions, and postconditions.
- Test.
Write test cases for the ADT’s operations.
- Again, this puts pressure on the spec. You may discover that you need operations you hadn’t anticipated, so you’ll have to add them to the spec.
- Implement.
For an ADT, this part expands to:
- Choose rep. Write down the private fields of a class, or the variants of a recursive datatype. Write down the rep invariant and abstraction function as a comment.
- Assert rep invariant.
Implement a
checkRep()
method that enforces the rep invariant. This is critically important if the rep invariant is nontrivial, because it will catch bugs much earlier. - Implement operations.
Write the method bodies of the operations, making sure to call
checkRep()
in them. You’re done when the tests are all green in JUnit.
And let’s broaden it further to a recipe for
Writing a program (consisting of ADTs and static methods):
- Choose datatypes. Decide which ones will be mutable and which immutable.
- Choose static methods.
Write your top-level
main
method and break it down into smaller steps. - Spec. Spec out the ADTs and methods. Keep the ADT operations simple and few at first. Only add complex operations as you need them.
- Test. Write test cases for each unit (ADT or method).
- Implement simply first. Choose simple, brute-force representations. The point here is to put pressure on the specs and the tests, and try to pull your whole program together as soon as possible. Make the whole program work correctly first. Skip the advanced features for now. Skip performance optimization. Skip corner cases. Keep a to-do list of what you have to revisit.
- Iterate. Now that it’s all working, make it work better. Reimplement, optimize, redesign if necessary.
Problem: matrix multiplication
Suppose we want to represent matrix multiplications in a way that allows us to optimize the computation.
For example, if a, b are scalar constants, and X is a big matrix, then
is slow to compute because it loops over the matrix X twice: once to multiply it by a, and then again to multiply it by b. It’d be equivalent and cheaper to do:
That’s just one example of an optimization we could make by rearranging the products in a matrix multiplication. (Remember however that matrix multiplication is associative but not commutative; only scalars commute.)
Spec
Choose datatypes
Let’s call this a MatrixExpression
.
To make the definitions easier to read, we’ll abbreviate that to MatExpr
.
Test
Let’s test optimize()
.
Partitions:
- partition on number of scalars in expression: 0, 1, 2, >2
- partition on position of scalar in expression tree: left child, right child, left-child-of-left-child, left-child-of-right-child, right-child-of-left-child, right-child-of-right-child, more than 2 levels deep
Implement
Choose a rep
This problem is a natural one for a recursive datatype.
MatExpr = Identity + Scalar(value:double) + Matrix(array:double[][]) + Product(m1:MatExpr, m2:MatExpr)
/** Represents an immutable expression of matrix and scalar products. */
public interface MatrixExpression {
// Datatype definition:
// MatExpr = Identity + Scalar(value:double)
// + Matrix(array:double[][]) + Product(m1:MatExpr, m2:MatExpr)
// ...
}
class Identity implements MatrixExpression {
public Identity() {
}
}
class Scalar implements MatrixExpression {
private final double value;
// RI: true
// AF(value) = the real scalar represented by value
public Scalar(double value) {
this.value = value;
}
}
class Matrix implements MatrixExpression {
private final double[][] array;
// RI: array.length > 0, and all array[i] are equal nonzero length
// AF(array) = the matrix with array.length rows and array[0].length columns
// whose (row,column) entry is array[row][column]
public Matrix(double[][] array) {
this.array = array; // note: danger!
}
}
class Product implements MatrixExpression {
private final MatrixExpression m1;
private final MatrixExpression m2;
// RI: m1's column count == m2's row count, or m1 or m2 is scalar
// AF(m1, m2) = the matrix product m1*m2
public Product(MatrixExpression m1, MatrixExpression m2) {
this.m1 = m1;
this.m2 = m2;
}
}
Choose an identity
It’s always good to have a value in the datatype that represents nothing, so that we can avoid using null
.
For a matrix product, the natural choice is the identity matrix — an empty product expression is just the identity anyway.
So let’s define that:
/** Identity for all matrix computations. */
public static final MatrixExpression I = new Identity();
Unfortunately, we’ll see that this is not a perfect situation: other MatExprs might also be the identity.
Implementing make
: use factory methods
Let’s start implementing, starting with the make()
creators.
We don’t want to expose our concrete rep classes Scalar
, Matrix
, and Product
, so that clients won’t depend on them and we’ll be able to change them later (being ready for change).
So we’ll create static factory methods in MatrixExpression
to implement make()
:
/** @return a matrix expression consisting of just the scalar value */
public static MatrixExpression make(double value) {
return new Scalar(value);
}
/** @return a matrix expression consisting of just the matrix given */
public static MatrixExpression make(double[][] array) {
return new Matrix(array);
}
Implementing isIdentity
: don’t use instanceof
Now let’s implement the isIdentity
observer.
Here’s a bad way to do it:
// don't do this
public static boolean isIdentity(MatrixExpression m) {
if (m instanceof Scalar) {
return ((Scalar)m).value == 1;
} else if (m instanceof Matrix) {
// ... check for 1s on the diagonal and 0s everywhere else
} else ... // do the right thing for other variant classes
}
In general, using instanceof
in object-oriented programming is a bad smell.
Break the operation down into pieces that are appropriate for each class, and write instance methods instead:
reading exercises
class Matrix implements MatrixExpression {
private final double[][] array;
// RI: array.length > 0, and all array[i] are equal nonzero length
// AF(array) = the matrix with array.length rows and array[0].length columns
// whose (row,column) entry is array[row][column]
...
public boolean isIdentity() {
if (array.length != array[0].length) {
}
for (int row = 0; row < array.length; row++) {
for (int col = 0; col < array[row].length; col++) {
double expected = (row == col) ? 1 : 0;
if (array[row][col] != expected) {
}
}
}
}
}
(missing explanation)
class Product implements MatrixExpression {
private final MatrixExpression m1, m2;
// RI: m1's column count == m2's row count, or m1 or m2 is scalar
// AF(m1, m2) = the matrix product m1*m2
...
public boolean isIdentity() {
return m1.isIdentity() ❓ m2.isIdentity();
}
}
Pick the best operator to replace ❓:
(missing explanation)
Implementing isIdentity
exposes an issue that we should have first discovered by writing test cases: we will not always return true
for a Product
whose value is the identity matrix (e.g. A × A-1).
We probably want to resolve this by weakening the spec of isIdentity
, so that it doesn’t guarantee to identify every multiplicative identity:
Implementing optimize
without instanceof
Let’s implement optimize()
.
Again, here’s a bad way to do it, which will quickly get us mired in weeds:
// don't do this
class Product implements MatrixExpression {
public MatrixExpression optimize() {
if (m1 instanceof Scalar) {
...
} else if (m2 instanceof Scalar) {
...
}
...
}
}
If you find yourself writing code with instanceof
checks all over the place, you need to take a step back and rethink the problem.
In particular, to optimize the scalars, we’re going to need two recursive helper operations, so we’ll add them to our abstract datatype:
These expressions will allow us to pull the scalars out of an expression and move them together in a single multiplication expression.
/** Represents an immutable expression of matrix and scalar products. */
public interface MatrixExpression {
// ...
/** @return the product of all the scalars in this expression */
public MatrixExpression scalars();
/** @return the product of all the matrices in this expression in order.
* times(scalars(), matrices()) is equivalent to this expression. */
public MatrixExpression matrices();
}
Now we’ll implement them as expected:
class Identity implements MatrixExpression {
// no fields
...
public MatrixExpression scalars() { return this; }
public MatrixExpression matrices() { return this; }
}
class Scalar implements MatrixExpression {
private final double value;
...
public MatrixExpression scalars() { return this; }
public MatrixExpression matrices() { return I; }
}
class Matrix implements MatrixExpression {
private final double[][] array;
...
public MatrixExpression scalars() { return I; }
public MatrixExpression matrices() { return this; }
}
class Product implements MatrixExpression {
private final MatrixExpression m1, m2;
...
public MatrixExpression scalars() {
return times(m1.▶▶A◀◀, m2.▶▶B◀◀);
}
public MatrixExpression matrices() {
return times(m1.▶▶C◀◀, m2.▶▶D◀◀);
}
}
Recall that MatrixExpression.I
is the constant we defined to represent the identity matrix.
The implementations of Scalar.matrices()
and Matrix.scalars()
return I
to represent an empty product.
We could also have returned new Identity()
here, but there is no need to create a new object when we already have an immutable constant I
.
reading exercises
With these helper functions, optimize()
can just separate the scalars and the matrices:
class Identity implements MatrixExpression {
...
public MatrixExpression optimize() { return this; }
}
class Scalar implements MatrixExpression {
...
public MatrixExpression optimize() { return this; }
}
class Matrix implements MatrixExpression {
...
public MatrixExpression optimize() { return ▶▶A◀◀; }
}
class Product implements MatrixExpression {
...
public MatrixExpression optimize() {
return times(▶▶B◀◀);
}
}
Summary
This reading was a case study of developing a recursive data type, so it applies ideas we have already seen – including static typing, testing, specifications, immutability, and interfaces – in support of the three properties of good software.
More practice
If you would like to get more practice with the concepts covered in this reading, you can visit the question bank. The questions in this bank were written in previous semesters by students and staff, and are provided for review purposes only – doing them will not affect your classwork grades.