Reading 9: Avoiding Debugging

Software in 6.031

Safe from bugs	Easy to understand	Ready for change
Correct today and correct in the unknown future.	Communicating clearly with future programmers, including future you.	Designed to accommodate change without rewriting.

Objectives

The topic of today’s class is debugging. More precisely, we’re going to look at ways to write code that either avoids debugging entirely, or at least makes it easy when we have to do it.

First defense: make bugs impossible

The best defense against bugs is to make them impossible by design.

One way that we’ve already talked about is static checking. Static checking eliminates many bugs by catching them at compile time.

We also saw some examples of dynamic checking in earlier readings. For example, Python makes array overflow bugs impossible by catching them dynamically. If you try to use an index outside the bounds of a list, then Python automatically produces an error. TypeScript is not as good, because it quietly returns undefined from a bad read. The situation is even worse in older languages like C and C++, which silently allow bad writes beyond the bounds of a fixed-size array, which leads to bugs and security vulnerabilities.
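To make the contrast concrete, here is a minimal sketch of how a TypeScript program could get Python-style fail-fast behavior on reads. The helper `checkedGet` is a hypothetical name introduced for this illustration, not a built-in or library function.

```typescript
// Hypothetical helper (not a built-in): read an array element, but fail fast
// on a bad index instead of quietly returning undefined.
function checkedGet<T>(array: ReadonlyArray<T>, index: number): T {
    if (!Number.isInteger(index) || index < 0 || index >= array.length) {
        throw new RangeError(`index ${index} out of bounds for length ${array.length}`);
    }
    return array[index] as T;
}

const vowels: Array<string> = ['a', 'e', 'i', 'o', 'u'];

// A plain out-of-bounds read produces undefined, with no error.
const quiet: string | undefined = vowels[5];
console.log(quiet); // undefined

// The checked read throws at the point of the bad access.
let caught = false;
try {
    checkedGet(vowels, 5);
} catch (e) {
    caught = e instanceof RangeError;
}
console.log(caught); // true
```

The quiet `undefined` may travel far from the bad read before it causes a visible failure, while the thrown error points at the line where the index went wrong.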

Immutability is another design principle that prevents bugs.

String is an immutable type. There are no methods that you can call on a string that will change the sequence of characters that it represents. Strings can be passed around and shared without fear that they will be modified by other code.
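As a small illustration (the variable names are ours), string methods return new strings and leave the original untouched, so an alias never sees a surprise change:

```typescript
const original: string = 'bug';
const alias: string = original;          // refers to the same string value

// String methods return new strings; they never modify the receiver.
const shouted: string = original.toUpperCase();
const longer: string = original.concat('s');

// original[0] = 'h';                    // static error: strings only permit reading by index

console.log(original); // bug
console.log(alias);    // bug
console.log(shouted);  // BUG
console.log(longer);   // bugs
```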

TypeScript also gives us unreassignable references: variables declared with the keyword const, which can be assigned once but never reassigned. It’s good practice to use const wherever possible when declaring local variables. Like the type of the variable, these declarations are important documentation, which is useful to the reader of the code and statically checked by the compiler.

Consider this example:

const letters: Array<string> = ['a', 'e', 'i', 'o', 'u'];

The letters variable is declared const, but is it really unchanging? Which of the following statements will be illegal (caught statically by the compiler), and which will be allowed?

letters = ['x', 'y', 'z']; 
letters[0] = 'z';

You’ll find the answers in the exercise below. Be careful about what const means! It only makes the reference unreassignable, but the object that the reference points to may be mutable.
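If you want the object itself protected as well, one option is to give the variable a read-only type. The sketch below uses TypeScript's built-in `ReadonlyArray` type; the variable names are ours. Note that this protection is a static check only: it restricts what code using that type may do, and does not freeze the array at runtime.

```typescript
const mutableLetters: Array<string> = ['a', 'e', 'i', 'o', 'u'];
mutableLetters[0] = 'z';                 // allowed: const does not protect the array's contents
// mutableLetters = ['x', 'y', 'z'];     // static error: cannot reassign a const variable

const readonlyLetters: ReadonlyArray<string> = ['a', 'e', 'i', 'o', 'u'];
// readonlyLetters[0] = 'z';             // static error: index signature only permits reading
// readonlyLetters.push('y');            // static error: push does not exist on ReadonlyArray

console.log(mutableLetters.join(''));  // zeiou
console.log(readonlyLetters.join('')); // aeiou
```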

reading exercises

Unreassignable variables, immutable objects

Consider this (legal) TypeScript code, executed in order:

let text0: string = 'a';
const text1: string = text0;

let text2: string = text1 + "eiou";
const text3: string = text2;

let text4: Array<string> = [text0, 'e', 'i', 'o', 'u'];
const text5: Array<string> = text4;

Which of the following statements are also legal TypeScript (i.e. produce no compiler error if placed after the code above)?

Afterwards

After all the legal statements in the previous exercise’s answers are executed:

let text0: string = 'a';
const text1: string = text0;

let text2: string = text1 + "eiou";
const text3: string = text2;

let text4: Array<string> = [text0, 'e', 'i', 'o', 'u'];
const text5: Array<string> = text4;

text0 = 'y';
text2 = "uoie" + text1;
text4 = text5;
text4[0] = 'x';
text5[0] = 'z';

… what is the resulting value of each variable? Write just the sequence of letters found in the variable’s value, with no punctuation or spaces. For example:

text0

text1

text2

text3

text4

text5

Second defense: localize bugs

If we can’t prevent bugs, we can try to localize them to a small part of the program, so that we don’t have to look too hard to find the cause of a bug. When localized to a single method or small module, bugs may be found simply by studying the program text.

We already talked about fail fast: the earlier a problem is observed (the closer to its cause), the easier it is to fix.

Let’s begin with a simple example:

/**
 * @param x  requires x >= 0
 * @returns approximation to square root of x
 */
function sqrt(x: number): number { ... }

Now suppose somebody calls sqrt with a negative argument. What’s the best behavior for sqrt? Since the caller has failed to satisfy the requirement that x should be nonnegative, sqrt is no longer bound by the terms of its contract, so it is technically free to do whatever it wants: return an arbitrary value, or enter an infinite loop, or melt down the CPU. Since the bad call indicates a bug in the caller, however, the most useful behavior would point out the bug as early as possible. We do this by inserting a runtime check of the precondition. Here is one way we might write that check:

/**
 * @param x  requires x >= 0
 * @returns approximation to square root of x
 */
function sqrt(x: number): number { 
    if (! (x >= 0)) throw new Error("required x >= 0, but was: " + x);
    ...
}

When the precondition is not satisfied, this code terminates the program by throwing an error. The effects of the caller’s bug are prevented from propagating.
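One detail of this check is worth a closer look: it is written as `!(x >= 0)` rather than `x < 0`. The sketch below shows why that matters for `NaN`. Both function names are hypothetical, and `Math.sqrt` is only an assumed stand-in for the elided body of `sqrt`.

```typescript
// Hypothetical variants of the precondition check.
// Math.sqrt stands in for the elided computation.
function sqrtStrict(x: number): number {
    if (!(x >= 0)) throw new Error("required x >= 0, but was: " + x);
    return Math.sqrt(x);
}

function sqrtLoose(x: number): number {
    if (x < 0) throw new Error("required x >= 0, but was: " + x);
    return Math.sqrt(x);
}

// Helper for the demonstration: reports whether calling action throws an Error.
function throwsError(action: () => unknown): boolean {
    try {
        action();
        return false;
    } catch (e) {
        return e instanceof Error;
    }
}

console.log(throwsError(() => sqrtStrict(-1)));   // true
console.log(throwsError(() => sqrtLoose(-1)));    // true
console.log(throwsError(() => sqrtStrict(NaN)));  // true:  NaN >= 0 is false
console.log(throwsError(() => sqrtLoose(NaN)));   // false: NaN < 0 is also false
console.log(Number.isNaN(sqrtLoose(NaN)));        // true:  the bad value propagates
```

Stating the check as the negation of the requirement rejects every value that fails the requirement, including ones we did not think of in advance.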

Checking preconditions is an example of defensive programming. Real programs are rarely bug-free. Defensive programming offers a way to mitigate the effects of bugs even if you don’t know where they are.

Assertions

It is common practice to define a function for these kinds of defensive checks, usually called assert. This approach abstracts away from what exactly happens when the assertion fails. The failed assertion might exit; it might record an event in a log file; it might email a report to a maintainer.

JavaScript/TypeScript does not have a built-in assert, but we will be using Node’s assert package in 6.031. The simplest form of assertion takes a boolean expression and throws AssertionError if the boolean expression evaluates to false:

assert(x >= 0);

Assertions have the added benefit of documenting an assumption about the state of the program at that point. To somebody reading your code, assert(x >= 0) says “at this point, it should always be true that x >= 0.” Unlike a comment, however, an assertion is executable code that enforces the assumption at runtime.

An assertion may also include a string message, which is printed when the assertion fails. This can be used to provide additional details to the programmer about the cause of the failure. For example:

assert(x >= 0, "x is " + x);

If x === -1, then this assertion fails with the error message

x is -1

along with a stack trace that tells you where the assertion was found in your code and the sequence of calls that brought the program to that point. This information is often enough to get started in finding the bug.
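Here is a small runnable sketch of that behavior. The function `requireNonNegative` is hypothetical, and the named-import form shown is one assumed way to load Node's assert module; your project setup may use a different import style.

```typescript
import { strict as assert, AssertionError } from 'node:assert';

// Hypothetical function used only to trigger a failing assertion.
function requireNonNegative(x: number): void {
    assert(x >= 0, "x is " + x);
}

requireNonNegative(4); // passes silently

let failureMessage = '';
try {
    requireNonNegative(-1);
} catch (e) {
    if (e instanceof AssertionError) {
        failureMessage = e.message; // the string passed as the second argument
    } else {
        throw e;
    }
}
console.log(failureMessage); // x is -1
```

Catching the error here is only for demonstration. In normal use you would let a failed assertion propagate, so that the stack trace is reported.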

One thing to note about assertions is that, in many languages (such as Java), they are off by default. This means that if you just run your program as usual, none of your assertions will be checked! One reason a language designer might make this choice is that checking assertions can sometimes be costly to performance. For example, a function that searches an array using binary search requires that the array be sorted. Asserting this requirement means scanning the entire array, however, which turns an operation that should run in logarithmic time into one that takes linear time. You should be willing (eager!) to pay this cost during testing, since it makes debugging much easier, but not after the program is released to users. For most applications, however, assertions are not expensive compared to the rest of the code, and the benefit they provide in bug-checking is worth that small cost in performance.

In languages which have assertions off by default, make sure to enable them explicitly. In Java, you can do this by passing -ea (which stands for enable assertions) to the Java virtual machine. The Node assert library that we are using in 6.031 always runs its assert statements, so there is no need to worry about explicitly turning on assertions.
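The binary search example can be sketched as follows. The function names are ours, and the import form is an assumption about project setup. The assertion is a linear-time scan placed in front of a logarithmic-time search:

```typescript
import { strict as assert } from 'node:assert';

// Linear-time check of the precondition.
function isSorted(values: ReadonlyArray<number>): boolean {
    return values.every((value, i) => i === 0 || (values[i - 1] as number) <= value);
}

/**
 * @param values  requires values sorted in nondecreasing order
 * @param target  value to look for
 * @returns an index i such that values[i] === target, or -1 if there is none
 */
function binarySearch(values: ReadonlyArray<number>, target: number): number {
    assert(isSorted(values), "values must be sorted"); // O(n) check before an O(log n) search
    let low = 0;
    let high = values.length - 1;
    while (low <= high) {
        const mid = Math.floor((low + high) / 2);
        const candidate = values[mid] as number;
        if (candidate === target) return mid;
        if (candidate < target) low = mid + 1;
        else high = mid - 1;
    }
    return -1;
}

console.log(binarySearch([1, 3, 5, 7, 9], 7)); // 3
console.log(binarySearch([1, 3, 5, 7, 9], 4)); // -1
```

Without the assertion, a call on an unsorted array would simply return a wrong answer, and the bug would surface somewhere else, much later.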

What to assert

Here are some things you should assert:

Method argument requirements, like we saw for sqrt above.

Method return value requirements. This kind of assertion is sometimes called a self check. For example, the sqrt method might square its result to check whether it is reasonably close to x:

function sqrt(x: number): number {
    assert(x >= 0);
    let r: number;
    ... // compute result r
    assert(Math.abs(r*r - x) < .0001);
    return r;
}
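For a version you can actually run, here is a sketch that fills in the computation with Newton's method. This is our choice of algorithm for illustration, not necessarily how a real `sqrt` is implemented. The name `newtonSqrt`, the extra finiteness requirement, and the tolerance in the self check are all assumptions of this sketch.

```typescript
import { strict as assert } from 'node:assert';

/**
 * Hypothetical implementation for illustration, using Newton's method.
 * @param x  requires x >= 0 and finite
 * @returns approximation to square root of x
 */
function newtonSqrt(x: number): number {
    // argument requirement
    assert(Number.isFinite(x) && x >= 0, "required finite x >= 0, but was: " + x);
    if (x === 0) return 0;
    let r: number = Math.max(x, 1);          // initial guess, at least sqrt(x)
    while (true) {
        const next = (r + x / r) / 2;        // Newton's iteration
        if (next >= r) break;                // stop once the guess no longer decreases
        r = next;
    }
    // self check, with a tolerance chosen for this sketch
    assert(Math.abs(r * r - x) <= 1e-9 * Math.max(x, 1), "self check failed, r is " + r);
    return r;
}

console.log(newtonSqrt(9).toFixed(6)); // 3.000000
```

The tolerance here scales with x, because a fixed absolute tolerance would reject correct answers for very large inputs, where floating-point rounding alone exceeds it.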

When should you write runtime assertions? As you write the code, not after the fact. When you’re writing the code, you have the invariants in mind. If you postpone writing assertions, you’re less likely to do it, and you’re liable to omit some important invariants.

What not to assert

Runtime assertions are not free. They can clutter the code, so they must be used judiciously. Avoid trivial assertions, just as you would avoid uninformative comments. For example:

// don't do this:
x = y + 1;
assert(x === y+1);

This assertion doesn’t find bugs in your code. It finds bugs in the compiler or the JavaScript interpreter, which are components that you should trust until you have good reason to doubt them. If an assertion is obvious from its local context, leave it out.

Never use assertions to test conditions that are external to your program, such as the existence of files, the availability of the network, or the correctness of input typed by a human user. Assertions test the internal state of your program to ensure that it is within the bounds of its specification. When an assertion fails, it indicates that the program has run off the rails in some sense, into a state in which it was not designed to function properly. Assertion failures therefore indicate bugs. External failures are not bugs, and there is no change you can make to your program in advance that will prevent them from happening. External failures should be handled using exceptions instead.
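The distinction can be sketched like this. All three function names are hypothetical, as is the choice of `RangeError` for bad input. Input typed by a user is checked with an ordinary exception that the program handles, while an assertion guards a requirement that only a bug in our own code could violate.

```typescript
import { strict as assert } from 'node:assert';

/**
 * Hypothetical parser for a value typed by a human user.
 * @param input  text typed by the user
 * @returns the age in years, an integer in [0, 150]
 * @throws RangeError if input is not such an integer
 */
function parseAge(input: string): number {
    const trimmed = input.trim();
    const age = Number(trimmed);
    // External condition: bad user input is not a bug, so signal it with an exception.
    if (trimmed === '' || !Number.isInteger(age) || age < 0 || age > 150) {
        throw new RangeError(`not a valid age: "${input}"`);
    }
    return age;
}

/** @param age  requires an integer in [0, 150] */
function yearsUntilHundred(age: number): number {
    // Internal condition: our own code must already have validated age.
    assert(Number.isInteger(age) && age >= 0 && age <= 150, "age is " + age);
    return Math.max(0, 100 - age);
}

function respondTo(input: string): string {
    try {
        return `${yearsUntilHundred(parseAge(input))} years to go`;
    } catch (e) {
        if (e instanceof RangeError) return 'please enter a whole number from 0 to 150';
        throw e; // anything else, including a failed assertion, indicates a bug
    }
}

console.log(respondTo('42'));    // 58 years to go
console.log(respondTo('forty')); // please enter a whole number from 0 to 150
```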

Many assertion mechanisms are designed so that assertions are executed only during testing and debugging, and turned off when the program is released to users. (As mentioned above, this is not the case with Node assertions, which can’t be disabled; but Java assertions are actually disabled by default.) Since assertions are sometimes disabled, it’s good practice to ensure that the correctness of your program does not depend on whether or not the assertion expressions are executed. In particular, asserted expressions should not have side-effects. For example, if you want to pop an element from a list and assert that the list was non-empty before you popped, don’t write it like this:

// don't do this:
assert(list.pop());

If assertions are disabled, the entire expression is skipped, and no item is popped from the list. Write it like this instead:

const found = list.pop();
assert(found);

Assertions are also useful for covering all cases: if a conditional statement or switch does not cover all the possible cases, it is good practice to use a check to block the illegal cases.

switch (vowel) {
  case 'a':
  case 'e':
  case 'i':
  case 'o':
  case 'u': return "A";
  default: assert.fail("should never get here");
}

The default clause has the effect of asserting that vowel must be one of the five vowel letters.
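When the set of legal values can be expressed as a type, TypeScript can also check the coverage statically. The sketch below is an optional variation on this idea, not part of the example above: the `Vowel` type and the `vowelPosition` function are hypothetical, and the import form is an assumption about project setup.

```typescript
import { strict as assert } from 'node:assert';

// Hypothetical type: the five legal values, known to the compiler.
type Vowel = 'a' | 'e' | 'i' | 'o' | 'u';

function vowelPosition(vowel: Vowel): number {
    switch (vowel) {
        case 'a': return 0;
        case 'e': return 1;
        case 'i': return 2;
        case 'o': return 3;
        case 'u': return 4;
        default: {
            // Static check: this line compiles only if every Vowel case is handled above.
            const impossible: never = vowel;
            // Dynamic check: still fail fast if an illegal value arrives at runtime.
            return assert.fail("should never get here: " + String(impossible));
        }
    }
}

console.log(vowelPosition('i')); // 2
```

If a sixth value were later added to `Vowel` without a matching case, the assignment to `impossible` would stop compiling, which moves the check from runtime to compile time.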

reading exercises

Assertions

Consider this (incomplete) function:

/**
 * Solves quadratic equation ax^2 + bx + c = 0.
 * 
 * @param a quadratic coefficient, requires a != 0
 * @param b linear coefficient
 * @param c constant term
 * @returns a list of the distinct real roots of the equation
 */
function quadraticRoots(a: number, b: number, c: number): Array<number>  {
    const roots: Array<number> = [];
    // A
    ... // compute roots        
    // B
    return roots;
}

What statements would be reasonable to write at position A?

assert(a !== 0);

assert(b !== 0);

assert(c !== 0);

assert(roots.length >= 0);

assert(roots.length <= 2);

for (const x of roots) { assert(Math.abs(a*x*x + b*x + c) < 0.0001); }

What statements would be reasonable to write at position B?

assert(a !== 0);

assert(b !== 0);

assert(c !== 0);

assert(roots.length >= 0);

assert(roots.length <= 2);

for (const x of roots) { assert(Math.abs(a*x*x + b*x + c) < 0.0001); }

Incremental development

A great way to localize bugs to a tiny part of the program is incremental development. Build only a bit of your program at a time, and test that bit thoroughly before you move on. That way, when you discover a bug, it’s more likely to be in the part that you just wrote, rather than anywhere in a huge pile of code.

Our class on testing talked about two techniques that help with this:

Unit testing: when you test a module in isolation, you can be confident that any bug you find is in that unit – or maybe in the test cases themselves.
Regression testing: when you’re adding a new feature to a big system, run the regression test suite as often as possible. If a test fails, the bug is probably in the code you just changed.

Modularity & encapsulation

You can also localize bugs by better software design.

Modularity means dividing up a system into components, or modules, each of which can be designed, implemented, tested, reasoned about, and reused separately from the rest of the system. The opposite of a modular system is a monolithic system – big and with all of its pieces tangled up and dependent on each other.

A program consisting of a single, very long function is monolithic – harder to understand, and harder to isolate bugs in. By contrast, a program broken up into small functions and classes is more modular.

Encapsulation means building walls around a module so that the module is responsible for its own internal behavior, and bugs in other parts of the system can’t damage its integrity.

One kind of encapsulation is access control, using public and private to control the visibility and accessibility of your variables and methods. A public variable or method can be accessed by any code (assuming the class containing that variable or method is also public). A private variable or method can only be accessed by code in the same class. Keeping things private as much as possible, especially for variables, provides encapsulation, since it limits the code that could inadvertently cause bugs.
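As a minimal sketch of access control, consider the hypothetical class `BankAccount` below, invented for this illustration. Because `balance` is private, the only code that can change it is inside the class, so a wrong balance can only be caused by a bug in these few lines.

```typescript
// Hypothetical class for illustration.
class BankAccount {

    private balance: number = 0;

    /** @param amount  requires amount > 0 */
    public deposit(amount: number): void {
        if (!(amount > 0)) throw new Error("required amount > 0, but was: " + amount);
        this.balance += amount;
    }

    public getBalance(): number {
        return this.balance;
    }
}

const account: BankAccount = new BankAccount();
account.deposit(50);
// account.balance = -1000;        // static error: 'balance' is private
console.log(account.getBalance()); // 50
```

Keep in mind that TypeScript checks `private` statically, at compile time, so the protection applies to code that goes through the type checker.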

Another kind of encapsulation comes from variable scope. The scope of a variable is the portion of the program text over which that variable is defined, in the sense that expressions and statements can refer to the variable. A function parameter’s scope is the body of the function. A local variable’s scope extends from its declaration to the next closing curly brace. Keeping variable scopes as small as possible makes it much easier to reason about where a bug might be in the program. For example, suppose you have a loop like this:

for (i = 0; i < 100; ++i) {
    ...
    doSomeThings();
    ...
}

… and you’ve discovered that this loop keeps running forever – i never reaches 100. Somewhere, somebody is changing i. But where? If i is declared as a global variable like this:

let i: number;
...
for (i = 0; i < 100; ++i) {
    ...
    doSomeThings();
    ...
}

… then its scope is the entire program. It might be changed anywhere in your program: by doSomeThings(), by some other method that doSomeThings() calls, by a concurrent thread running some completely different code. But if i is instead declared as a local variable with a narrow scope, like this:

for (let i = 0; i < 100; ++i) {
    ...
    doSomeThings();
    ...
}

… then the only place where i can be changed is within the for statement – in fact, only in the ... parts that we’ve omitted. You don’t even have to consider doSomeThings(), because doSomeThings() doesn’t have access to this local variable.

Minimizing the scope of variables is a powerful practice for bug localization. Here are a few rules that are good for TypeScript:

Always declare a loop variable in the for-loop initializer. So rather than declaring it before the loop:
```
let i: number;
for (i = 0; i < 100; ++i) {
```
which makes the scope of the variable the entire rest of the outer curly-brace block containing this code, you should do this:
```
for (let i = 0; i < 100; ++i) {
```
which makes the scope of i limited just to the for loop.
Always use const or let, never var. The scope of a const or let variable is the smallest curly-brace block that surrounds it, while the scope of a var declaration is the entire function that surrounds it. You will see a lot of old JavaScript code on the web that uses var; avoid it.
Declare a variable only when you first need it, and in the innermost curly-brace block that you can. Put your variable declaration in the innermost block that contains all the expressions that need to use the variable. Don’t declare all your variables at the start of the function – it makes their scopes unnecessarily large.
Avoid global variables. Using global variables is a very bad idea, especially as programs get large. Global variables are often used as a shortcut to provide a parameter to several parts of your program. It’s better to just pass the parameter into the code that needs it, rather than putting it in global space where it can inadvertently be reassigned.
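To see why the wider scope of var matters in practice, here is a small sketch with hypothetical function names. With var there is one variable i shared by the whole function, so every callback sees its final value. With let, each loop iteration gets its own i.

```typescript
function collectWithVar(): Array<number> {
    const callbacks: Array<() => number> = [];
    for (var i = 0; i < 3; ++i) {      // one i for the entire function
        callbacks.push(() => i);
    }
    return callbacks.map(callback => callback());
}

function collectWithLet(): Array<number> {
    const callbacks: Array<() => number> = [];
    for (let i = 0; i < 3; ++i) {      // a fresh i for each iteration
        callbacks.push(() => i);
    }
    return callbacks.map(callback => callback());
}

console.log(collectWithVar().join(',')); // 3,3,3
console.log(collectWithLet().join(',')); // 0,1,2
```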

reading exercises

Variable scope

Consider the following code (which is missing some variable declarations):

 1 class Apartment {
 2
 3     private bathrooms: number;
 4     // declaration of roommates variable
 5
 6     public constructor(bathrooms: number) {
 7         this.bathrooms = bathrooms;
 8         this.roommates = new Set<Person>();
 9     }
10
11     public addRoommate(newRoommate: Person): void {
12         this.roommates.add(newRoommate);
13         if (this.roommates.size > PEOPLE_PER_BATHROOM * this.bathrooms) {
14             this.roommates.delete(newRoommate);
15             throw new Error('too many people');
16         }
17     }
18
19     public getMaximumOccupancy(): number {
20         return PEOPLE_PER_BATHROOM * this.bathrooms;
21     }
22 }

Which of these lines are within the scope of the newRoommate variable?

line 3

line 12

line 14

line 20

Which part of the file is the scope of the (currently undeclared) roommates instance variable?

lines 2-21

lines 7-8

lines 12-16

line 20

Out of the choices below, what is the best declaration for the roommates instance variable?

roommates:Array<Person>;

roommates:Set<Person>;

readonly roommates:Set<Person>;

Out of the choices below, what is the best declaration for PEOPLE_PER_BATHROOM?

let PEOPLE_PER_BATHROOM: number = 5;

public PEOPLE_PER_BATHROOM: number = 5;

const PEOPLE_PER_BATHROOM: number = 5;

Snapshots of scope

const PEOPLE_PER_BATHROOM = 5;

class Apartment {

    private bathrooms: number;
    private roommates: Set<Person>;

    public constructor(bathrooms: number) {
        this.bathrooms = bathrooms;
        this.roommates = new Set<Person>();
    }

    public addRoommate(newRoommate: Person): void {
        this.roommates.add(newRoommate);
        if (this.roommates.size > PEOPLE_PER_BATHROOM * this.bathrooms) {
            this.roommates.delete(newRoommate);
            throw new Error('too many people');
        }
    }

    public getMaximumOccupancy(): number {
        return PEOPLE_PER_BATHROOM * this.bathrooms;
    }
}

function main() {
    const apt: Apartment = new Apartment(1);
    apt.addRoommate(new Person("Sherlock Holmes"));
}
main();

The same code is shown above, now with all variable declarations included, plus a main function.

Suppose we stop the code just inside the call to addRoommate(). An incomplete snapshot diagram for this point is shown at the right. Fill in the snapshot diagram with the names that would appear at each location. If you don’t remember what each labeled box represents, refer back to the kinds of variables in snapshot diagrams.

location A, inside the box labeled addRoommate

location B, inside the box labeled main

location C, inside the oval labeled Apartment

location D, outside of all shapes, labeled global

Which names are out of scope and do not exist in the state of the program at this point?

Which names are out of scope at this point (i.e. inaccessible for the code inside addRoommate()) but still exist in the state of the program?

Summary

In this reading, we looked at some ways to minimize the cost of debugging:

Avoid debugging
- make bugs impossible with techniques like static typing, automatic dynamic checking, and immutability
Keep bugs confined
- failing fast with assertions keeps a bug’s effects from spreading
- incremental development and unit testing confine bugs to your recent code
- scope minimization reduces the amount of the program you have to search

Thinking about our three main measures of code quality:

Safe from bugs. We’re trying to prevent them and get rid of them.
Easy to understand. Techniques like static typing, const/readonly declarations, and assertions are additional documentation of the assumptions in your code. Variable scope minimization makes it easier for a reader to understand how the variable is used, because there’s less code to look at.
Ready for change. Assertions and static typing document the assumptions in an automatically-checkable way, so that when a future programmer changes the code, accidental violations of those assumptions are detected.