6.031
6.031 — Software Construction
Spring 2019

Reading 13: Debugging

Software in 6.031

Safe from bugsEasy to understandReady for change
Correct today and correct in the unknown future. Communicating clearly with future programmers, including future you. Designed to accommodate change without rewriting.

Objectives

The topic of today’s class is systematic debugging.

Sometimes you have no choice but to debug – particularly when the bug is found only when you plug the whole system together, or reported by a user after the system is deployed, in which case it may be hard to localize it to a particular module. For those situations, we can suggest a systematic strategy for more effective debugging.

A good book about systematic debugging is Why Programs Fail by Andreas Zeller. Much of this reading is inspired by that book.

Reproduce the Bug

Start by finding a small, repeatable test case that produces the failure. If the bug was found by regression testing, then you’re in luck; you already have a failing test case in your test suite. If the bug was reported by a user, it may take some effort to reproduce the bug. For graphical user interfaces and multithreaded programs, a bug may be hard to reproduce consistently if it depends on timing of events or thread execution.

Nevertheless, any effort you put into making the test case small and repeatable will pay off, because you’ll have to run it over and over while you search for the bug and develop a fix for it. Furthermore, after you’ve successfully fixed the bug, you’ll want to keep the test case in your regression test suite, so that the bug never crops up again. Once you have a test case for the bug, making this test work becomes your goal.

Here’s an example. Suppose you have written this function:

/**
 * Find the most common word in a string.
 * @param text string containing zero or more words, where a word
 *     is a string of alphanumeric characters bounded by nonalphanumerics.
 * @return a word that occurs maximally often in text, ignoring alphabetic case.
 */
public static String mostCommonWord(String text) {
    ...
}

A user passes the whole text of Shakespeare’s plays into your method, something like mostCommonWord(allShakespearesPlaysConcatenated), and discovers that instead of returning a predictably common English word like "the" or "a", the method returns something unexpected, perhaps "e".

Shakespeare’s plays have 100,000 lines containing over 800,000 words, so this input would be very painful to debug by normal methods, like print-debugging and breakpoint-debugging. Debugging will be easier if you first work on reducing the size of the buggy input to something manageable that still exhibits the same (or very similar) bug:

  • does the first half of Shakespeare show the same bug? (Binary search! Always a good technique. More about this below.)
  • does a single play have the same bug?
  • does a single speech have the same bug?

Once you’ve found a small test case, find and fix the bug using that smaller test case, and then go back to the original buggy input and confirm that you fixed the same bug.

reading exercises

Reducing a bug to a test case

Suppose a user reports that mostCommonWord("chicken chicken chicken beef") returns "beef" instead of "chicken".

Give all answers that make sense, not just the simplest one (because the simplest one sometimes no longer exhibits the bug!)

(missing explanation)

Regression testing

Suppose you reduce the "chicken chicken chicken beef" input down to "c c b", which also has a problem. You find a bug, fix it, and observe that both "c c b" and "chicken chicken chicken beef" are now returning the right answer.

(missing explanation)

Find the Bug Using the Scientific Method

To localize the bug and its cause, you can use the scientific method:

  1. Study the data. Look at the test input that causes the bug, and the incorrect results, failed assertions, and stack traces that result from it.

  2. Hypothesize. Propose a hypothesis, consistent with all the data, about where the bug might be, or where it cannot be. It’s good to make this hypothesis general at first.

  3. Experiment. Devise and run an experiment that tests your hypothesis. It’s good to make the experiment an observation at first – a probe that collects information but disturbs the system as little as possible.

  4. Repeat. Add the data you collected from your experiment to what you knew before, and make a fresh hypothesis. Hopefully you have ruled out some possibilities and narrowed the set of possible locations and reasons for the bug.

This kind of deliberate process isn’t needed for every bug. With good fail-fast design, you’ll hopefully get an exception very close to the source of the bug, the stack trace will lead you right to it, and the mistake will be obvious from inspecting the code. So when do you need to apply the scientific method? A good rule of thumb is the 10-minute rule. If you’ve spent 10 minutes hunting for a bug using ad-hoc, unsystematic inspection, then it’s time to take a step back and start applying the scientific method instead.

As part of this transition, you should also move your debugging process out of your head – which has a very limited working memory of what you’ve tried and what you’ve learned from it – and start taking notes, either on paper or on your laptop. In each iteration of the process, you’ll be writing down:

  • Hypothesis. Based on what you’ve learned so far, what’s your next hypothesis about the location or cause of the bug?
  • Experiment. What are you about to try that will shed light (verify or falsify) on the hypothesis?
  • Predictions. What do you expect, based on your hypothesis, to be the result of the experiment?
  • Observations. What actually happened when you did the experiment?

These kinds of questions should be very familiar to you from your past experience in science classes. In the next few sections, we’ll see what kind of form they take when you’re debugging code. In debugging, some hypotheses and some experiments are better than others.

reading exercises

Scientific method

The process that we already discussed for simplifying a bug report down to a simple test case is an example of applying the scientific method. For each of these moments in a particular test-simplification process, which step of the scientific method above is being applied?

A user reports that mostCommonWord("chicken chicken chicken beef") returns "beef" instead of "chicken".

(missing explanation)

The particular words “chicken” and “beef” don’t cause the failure – what matters is the number of times they occur.

(missing explanation)

Run a test case mostCommonWord("a a a b").

(missing explanation)

1. Study the Data

One important form of data is the stack trace from an exception. Practice reading the stack traces that you get, because they will give you enormous amounts of information about where and what the bug might be.

In a Java stack trace, the latest call is on top, and the oldest call is on the bottom. Sometimes the topmost call is your own code, but the exception may also have been thrown by library code that your code called, so the topmost call is not code that you wrote. Similarly, the bottommost call on the stack may be the main method that you wrote, but it may also be library code that you didn’t write, but which eventually called your code.

So your own code — where the bug is most likely to be — is often somewhere in the middle of the stack trace.

reading exercises

Reading a stack trace

Suppose Ben Bitdiddle is working on a Java program, and gets this stack trace — note that you must scroll the box to see the whole trace:

java.lang.NullPointerException
  at java.util.Objects.requireNonNull(Objects.java:221)
  at java.util.AbstractSet.removeAll(AbstractSet.java:169)
  at turtle.TurtleSoup.drawPersonalArt(TurtleSoup.java:29)
  at turtle.TurtleSoupTest.testPersonalArt(TurtleSoupTest.java:39)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:564)
  at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:436)
  at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:115)
  at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:170)
  at org.junit.jupiter.engine.execution.ThrowableCollector.execute(ThrowableCollector.java:40)
  at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:166)
  at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:113)
  at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:58)
  at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.lambda$executeRecursively$3(HierarchicalTestExecutor.java:112)
  at org.junit.platform.engine.support.hierarchical.SingleTestExecutor.executeSafely(SingleTestExecutor.java:66)
  at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.executeRecursively(HierarchicalTestExecutor.java:108)
  at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.execute(HierarchicalTestExecutor.java:79)
  at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.lambda$executeRecursively$2(HierarchicalTestExecutor.java:120)
  at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
  at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
  at java.util.Iterator.forEachRemaining(Iterator.java:133)
  at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
  at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
  at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
  at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
  at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
  at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
  at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.lambda$executeRecursively$3(HierarchicalTestExecutor.java:120)
  at org.junit.platform.engine.support.hierarchical.SingleTestExecutor.executeSafely(SingleTestExecutor.java:66)
  at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.executeRecursively(HierarchicalTestExecutor.java:108)
  at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.execute(HierarchicalTestExecutor.java:79)
  at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.lambda$executeRecursively$2(HierarchicalTestExecutor.java:120)
  at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
  at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
  at java.util.Iterator.forEachRemaining(Iterator.java:133)
  at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
  at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
  at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
  at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
  at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
  at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
  at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.lambda$executeRecursively$3(HierarchicalTestExecutor.java:120)
  at org.junit.platform.engine.support.hierarchical.SingleTestExecutor.executeSafely(SingleTestExecutor.java:66)
  at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.executeRecursively(HierarchicalTestExecutor.java:108)
  at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.execute(HierarchicalTestExecutor.java:79)
  at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:55)
  at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:43)
  at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:170)
  at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:154)
  at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:90)
  at org.eclipse.jdt.internal.junit5.runner.JUnit5TestReference.run(JUnit5TestReference.java:86)
  at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
  at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:538)
  at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:760)
  at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:460)
  at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:206)

What line of code actually threw the exception?

filename:

line number:

When the exception was thrown, what was the last line being executed in Ben’s own code?

filename:

line number:

What was the entry point to Ben’s code, i.e. what method of Ben’s code was called first in this stack trace?

method name:

(missing explanation)

2. Hypothesize

The point where the program actually failed, by throwing an exception or producing a wrong answer, isn’t necessarily where the bug is. The buggy code may have propagated some bad values through good parts of the program before it eventually failed. So your hypotheses should be about where the bug actually is (or is not), and what might have caused it.

It can help to think about your program as a flow of data, or steps in an algorithm, and try to rule out whole sections of the program at once. Let’s think about this in the context of the mostCommonWord() example, fleshed out a little more with three helper methods:

/**
 * Find the most common word in a string.
 * ...
 */
public static String mostCommonWord(String text) {
    List<String> words = splitIntoWords(text);
    Map<String,Integer> frequencies = countOccurrences(words);
    String winner = findMostFrequent(frequencies);
    return winner;
}
data flowing through the modules of the program

The flow of data in mostCommonWord() is shown at right.

Suppose that we’re getting an unexpected exception in countOccurrences(). That’s the failure that we’re investigating. Then we can rule out everything downstream of that point as a possible location for the bug. There’s no point in looking for the bug in findMostFrequent(), for example, because it hasn’t even been executed yet when the failure occurs.

So here are some hypotheses consistent with the information we have so far. They’re listed in reverse order of time from the failure:

  • the bug is in countOccurrences: its input is valid but then it throws an exception
  • the bug is in the connection between splitIntoWords and countOccurrences: both methods meet their contracts, but the postcondition guaranteed by the former doesn’t satisfy the precondition expected by the latter
  • the bug is in splitIntoWords: its input is valid but it produces bad output
  • the bug is in the original input to mostCommonWord: text doesn’t satisfy the precondition of the whole method

Which hypothesis to try first? Debugging is a search process, and you can use binary search to speed up the process. To do a binary search, you could divide this dataflow in half, perhaps guessing that the bug is in the connection between the first helper method and the second, and use one of the experiments below (like print statements, breakpoints, or assertions) to check the value there. From the answer to that experiment, you would know whether to pursue the bug in the earlier half of the dataflow or the later half.

Slicing

The mostCommonWord() data flow we just looked at is an example of slicing, which means finding the parts of a program that contributed to computing a particular value. When you have a failure — a bad value at a particular point in the program — the slice for that value consists of the lines of the program that helped compute that bad value. The bug that caused the bad value lies somewhere in that slice, so that’s your search space.

Automated tools for program slicing do exist, though they are not very practical yet. But programmers also do slicing naturally, in their heads, when looking at code to generate a hypothesis about where a bug might or might not be. It’s a useful skill for reasoning about programs, so let’s understand it better.

Here’s an example. Suppose x is a local integer variable which is never supposed to be negative. At some point a debugging print statement reports that it has gone bad:

int x = 0; // must be >= 0
...
System.out.println("x=" + x);  // prints a negative number

What is the slice for this bad value? What lines of code helped compute the negative value that x has when it reaches that print statement? Let’s dig into the ... in the code above to find what code might be responsible.

Lines that directly change the value of x are part of the slice:

int x = 0; // must be >= 0
...
    x += bonus;
...
System.out.println("x=" + x);  // prints a negative number

The value of bonus at that point also contributes to the value of x, so that means that its slice does too. So we have to look at the lines that helped compute bonus at that point:

int x = 0; // must be >= 0
final int bonus = getBonus();
...
    x += bonus;
...
System.out.println("x=" + x);  // prints a negative number

So the function getBonus() is now included in the slice, because it was responsible for computing bonus. (Technically we only need to include the slice of its return value, but we can simplify and just say that getBonus() itself is suspect.)

The slice also includes control statements that affected the execution of statements already in the slice:

int x = 0;
final int bonus = getBonus();
...
  if (isWorthABonus(s)) {
    x += bonus;
  }
...
System.out.println("x=" + x);  // prints a negative number

The if statement around x += bonus is part of the slice because it controls whether or not x was actually changed at that point. This also pulls in the function isWorthABonus(), and the value of s and its slice:

int x = 0;
final int bonus = getBonus();
...
for (final Sale s : salesList) {
  ...
  if (isWorthABonus(s)) {
    x += bonus;
  }
  ...
}
...
System.out.println("x=" + x);  // prints a negative number

The enclosing for statement is included in the slice for two reasons: because it’s part of the slice of s, so it affects the if statement which is already in our slice, and because it affects the number of times that statements in our slice are executed (the if statement and x += bonus).

And now, because the for uses the variable salesList, we have to include its slice as well. It happens to be a method parameter:

int calculateTotalBonus(final List<Sale> salesList) {
  ...
  int x = 0;
  final int bonus = getBonus();
  ...
  for (final Sale s : salesList) {
    ...
    if (isWorthABonus(s)) {
      x += bonus;
    }
    ...
  }
  ...
  System.out.println("x=" + x);  // prints a negative number
  ...
}

We could dig further and see where salesList came from in the rest of the program, but let’s stop there for now. We’ve found the lines of code in this method that might be responsible for the bad value of x we discovered. Studying these lines can generate some useful hypotheses:

  • from the x+=bonus statement: maybe bonus is negative, so x+=bonus goes immediately negative. This hypothesis would further imply that getBonus() returned a negative value for bonus.
  • from the if statement: maybe isWorthABonus() returns true for too many sales, so the sum accumulated in x overflows the size of an int and goes negative.
  • from the for loop: maybe the entire loop is executing over too many sales, again making x overflow from all the bonuses.
  • from the method signature: maybe a bad value is being passed in for salesList, with far too many sales on it.

The upshot is that careful slicing – which you can do just by reading the code – can help you systematically identify places where the bug could be, and also where it could not be. We can rule out the ... code in this example, because it doesn’t contribute to the bad value of x.

It’s worth noting that our design choices can help or hinder the efficiency of reasoning by slicing. One such design choice is immutability. When we were slicing for the value of bonus, and saw that its declaration was final int bonus = getBonus(), we immediately knew that we didn’t have to look any further – no other lines can be in the slice for bonus, because it’s an unreassignable reference to an immutable value. Immutability saved us a lot of time in reasoning and searching. When we saw final Sale s, though, we didn’t have quite the same confidence. Certainly s could never be reassigned, but if Sale is a mutable type, we would have to check the ... code to make sure no mutators were called on s.

Another design choice that helps slicing is scope minimization. All the variables in this example were local variables, with minimal scope, so we only had to look close by for lines of code that might affect them. With instance variables, the slicing search might have to expand to include the entire class. For a global variable (gasp), the search expands to include the entire program.

reading exercises

Slicing

In the following code, which lines are part of the slice for the value of the variable tax at the end of the code?

double total = 0.0;
double tax = 0.0;
double taxRate = 0.06;
for (final Item item : items) {
  total += item.getPrice();
  if (isOutOfStateCustomer) {
    taxRate /= 2;
  }
  if (item.isTaxable()) {
    tax += item.getPrice() * taxRate;
  }
}
total += tax;
return total;

(missing explanation)

Slicing 2

In the code below, which lines are part of the slice for the value of a at the end of the code?

int[] incrementAll(int[] a) {
  int[] b = a;
  for (int i = 0; i < a.length; ++i) {
    ++b[i];
  }
  return b;
}

(missing explanation)

Delta debugging

The process of isolating a small test case may also give you data that helps form a hypothesis, if it uncovers two closely-related test cases that bracket the bug, in the sense that one succeeds and one fails. For example, maybe mostCommonWords("c c, b") is broken, but mostCommonWords("c c b") is fine. Now you can examine the difference between the execution of these two test cases to help form your hypothesis. Which code is executed for the passing test case, but skipped for the failing test case, and vice versa? One hypothesis is that the bug lies in those lines of code, the delta between the passing run and the failing run.

This is a specific example of a general idea in bug finding called delta debugging, which defines the cause of a bug as a difference between successful execution and failing execution, and then searches for a minimal cause in order to find the problem. Delta debugging tools can automate this process, though like slicing tools they are not widely used yet.

Prioritize Hypotheses

When making your hypothesis, you may want to keep in mind that different parts of the system have different likelihoods of failure. For example, old, well-tested code is probably more trustworthy than recently-added code. Java library code is probably more trustworthy than yours. The Java compiler and runtime, operating system platform, and hardware are increasingly more trustworthy, because they are more tried and tested. You should trust these lower levels until you’ve found good reason not to.

reading exercises

Priorities

Suppose you are debugging the quadraticRoots function, which appears to be producing wrong answers sometimes.

/**
 * Solves quadratic equation ax^2 + bx + c = 0.
 * 
 * @param a quadratic coefficient, requires a != 0
 * @param b linear coefficient
 * @param c constant term
 * @return a list of the real roots of the equation
 */
public static List<Double> quadraticRoots(int a, int b, int c) { ... }

Put the following items in the order that you should try them: 1, 2, 3, 4.

(missing explanation)

3. Experiment

Your hypothesis should lead to a prediction, such as “I think variable x has a bad value at this point” or even “I think this code is never executed.” Your experiment should be chosen to test the prediction. The best experiment is a probe, a gentle observation of the system that disturbs it as little as possible.

One familiar probe is a print statement. Print debugging has the advantage that it works for virtually every programming language. It has the disadvantage that it makes a change to the program, which you have to remember to revert. It’s too easy to end up with a program littered with print statements after a long debugging session. It’s also wise to be a little thoughtful when you write a debugging print statement. Rather than printing the same hi! in fifteen different places to trace how the program is executing, and losing track of which is which, print something clear and descriptive like start of calculateTotalBonus.

A more elaborate kind of print debugging is logging, which keeps informative print statements permanently in the code and turns them on or off with a global setting, like a boolean DEBUG constant or a log level variable. A logging framework like Log4j can also direct the logging to a file or to a server across the network, can log structured data as well as human-readable strings, and can be used in deployment, not just development. Large, complex systems would be very hard to operate without logging.

Another kind of probe is an assertion that tests variable values or other internal state. In the example above where x is never allowed to be negative, we could insert assert(x >= 0); as a probe at any point. Assertions have the advantage that they don’t require you to inspect output by hand, and can even be left in the code after debugging if the test is universally true (some debugging assertions are only true for the specific test case you’re debugging, and those would need to be removed). Assertions have the disadvantage that they’re not turned on by default, so you can be fooled by an assertion that seems to be passing but is actually not even running. We talked about this problem in a previous reading.

A third kind of probe is setting a breakpoint with a debugger, which stops the running program at that point, and allows you to single-step through the program and inspect values of variables. A debugger is a powerful tool that rewards the effort put into learning how to use it.

reading exercises

Probes

Here is some code that has been successfully debugged, but still has some debugging probes left in it:

/**
 * Convert from one currency to another.
 * @param fromCurrency currency that customer has (e.g. DOLLAR)
 * @param fromValue value of fromCurrency that customer has (e.g. $145.23)
 * @param toCurrency currency that customer wants (e.g. EURO).  
 *                   Must be different from fromCurrency.
 * @return value of toCurrency that customer will get,
 *         after applying the conversion rate and bank fee
 */
public static double convertCurrency(Currency fromCurrency, double fromValue, Currency toCurrency) {
  assert(fromCurrency != null && toCurrency != null);
  assert(! fromCurrency.equals(toCurrency));

  double rate = getConversionRate(fromCurrency, toCurrency);
  System.out.println("conversion rate is " + rate);

  double fee = getFee();
  assert(fee == 0.01); // right now the bank charges 1%

  return fromValue * rate * (1-fee);
}

Which lines should be removed before committing and pushing?

(missing explanation)

Using a debugger

For this exercise, you’ll need to start Eclipse.

Create a new Java class called Hailstone.java. You can make a new project for it, or just put it in a project from in-class exercises. Here is the code for the class:

import java.util.ArrayList;
import java.util.List;

public class Hailstone {
    public static List<Integer> hailstoneSequence(int n) {
        List<Integer> list = new ArrayList<Integer>();
        while (n != 1) {
            list.add(n);
            if (n % 2 == 0) {
                n = n / 2;
            } else {
                n = 3 * n + 1;
            }
        }
        list.add(n);
        System.out.println(maxOfList(list));
        return list;
    }

    public static int maxOfList(List<Integer> list) {
        int max = Integer.MIN_VALUE;
        for (int n : list) {
            max = Math.max(n, max);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println("hailstoneSequence(5)=" + hailstoneSequence(5));
    }

}

Set a breakpoint on line 8 (list.add(n)) and then use Run → Debug to run the program in the debugger. It should stop at the breakpoint.

How many elements are in list at this point? (Use the Variables pane to find out. Clicking on a variable will show its toString value at the bottom of the pane, which is often more useful for complex abstract types like lists, sets, and maps. Expanding the variable shows the type’s concrete rep, which is not what you want to see as a client of the type.)

(missing explanation)

Now use Step Over (Run → Step Over, but also a toolbar button which is more convenient) to step over 8 statements.

How many elements are in list now?

(missing explanation)

Move forward until you reach line 16 System.out.println(...), either by using Step Over repeatedly, or by setting a breakpoint there and using Resume (Run → Resume) to reach the breakpoint.

Now use Step Into to step into the method call on line 16.

What is the name of the method you are now in?

(missing explanation)

Swap Components

If you hypothesize that the bug is in a module, and you happen to have a different implementation of it that satisfies the same interface, then one experiment you can do is to try swapping in the alternative. For example:

  • If you suspect your binarySearch() implementation, then substitute a simpler linearSearch() instead.
  • If you suspect java.util.ArrayList, swap in java.util.LinkedList instead.
  • If you suspect the Java runtime, run with a different version of Java.
  • If you suspect the operating system, run your program on a different OS.
  • If you suspect the hardware, run on a different machine.

You can waste a lot of time swapping unfailing components, however, so don’t do this unless you have good reason to suspect a component. As we discussed under prioritizing hypotheses, the programming language, operating system, or hardware should be very low on your list of suspects.

Don’t Fix Yet

It’s tempting to try to do an experiment that seems to fix the hypothesized bug, instead of a mere probe. This is almost always the wrong thing to do. First, it leads to a kind of ad-hoc guess-and-test programming, which produces awful, complex, hard-to-understand code. Second, your fix may just mask the true bug without actually removing it – treating the symptom rather than the disease.

For example, if you’re getting an ArrayOutOfBoundsException, don’t just add code that catches the exception and ignores it, or avoids the exception by testing the index first. Make sure you’ve understood why that exception was being thrown, and fix the real problem.

reading exercises

Premature Fixes

The following code has been debugged for a while:

/**
 * @return true if and only if word1 is an anagram of word2
 *         (i.e. a permutation of its characters)
 */
boolean isAnagram(String word1, String word2) {
  try {
    if (word1.equals("")) return word2.equals("");

    for (int i = 0; i < word1.length; ++i) {
      if (! word2.contains(word1.charAt(i))) return false;
    }

    if (! isAnagram(word2, word1)) return false;
    else if (word2.length() == word1.length()) return true;
    else return false;

  } catch (StackOverflowError e) { return true; }
}

Which of its six (!) return statements were probably added to patch over bugs, rather than fixing the real problem by rewriting the algorithm?

(missing explanation)

4. Repeat

After the experiment, reflect on the results and modify your hypothesis. If what you observed disagreed with what the hypothesis predicted, then reject that hypothesis and make a new one. If the observation agreed with the prediction, then refine the hypothesis to narrow down the location and cause of the bug. Then repeat the process with this fresh hypothesis.

Make sure your source code and object code are up to date. If none of your observations seem to make sense, one possible hypothesis is that the code you’re running (the object code) doesn’t match the code you’re reading (the source). To test this, pull the latest version from the repository, and delete all your binary files and recompile everything (in Eclipse, this is done by Project → Clean).

Fix the Bug

Once you’ve found the bug and understand its cause, the third step is to devise a fix for it. Avoid the temptation to slap a patch on it and move on. Ask yourself whether the bug was a coding error, like a misspelled variable or interchanged method parameters, or a design error, like an underspecified or insufficient interface. Design errors may suggest that you step back and revisit your design, or at the very least consider all the other clients of the failing interface to see if they suffer from the bug too.

Think also whether the bug has any relatives. If I just found a divide-by-zero error here, did I do that anywhere else in the code? Try to make the code safe from future bugs like this. Also consider what effects your fix will have. Will it break any other code?

Finally, after you have applied your fix, add the bug’s test case to your regression test suite, and run all the tests to assure yourself that (a) the bug is fixed, and (b) no new bugs have been introduced.

Other tips

Get help. It often helps to explain your problem to someone else, even if the person you’re talking to has no idea what you’re talking about. This is sometimes called rubber-duck debugging or teddy-bear debugging. One computer lab had a big teddy bear sitting next to their help desk, with the rule that you had to “tell it to the bear” before you tried the human. The bear dealt with a surprising number of problems all by itself. Talking aloud about why you expect your code to work, and what it’s doing instead, can be good at helping you realize the problem yourself.

When the teddy bear doesn’t help, 6.031 staff and fellow students usually know what you’re talking about, so they’re even better to get help from.

Sleep on it. If you’re too tired, you won’t be an effective debugger. Trade latency for efficiency.

Summary

In this reading, we looked at how to debug systematically:

  • reproduce the bug as a test case, and put it in your regression suite
  • find the bug using the scientific method:
    • generate hypotheses using slicing, binary search, and delta debugging
    • use minimially-invasive probes, like print statements, assertions, or a debugger, to observe program behavior and test the prediction of your hypotheses
  • fix the bug thoughtfully, not slapdash

Thinking about our three main measures of code quality, systematic debugging is essentially about safety from bugs: we’re trying to get rid of a bug, while using regression testing to keep it from coming back.