Reading 4: Code Review

You must complete the exercises in this reading by 10:00 pm US Eastern time the night before class.

Java Tutor exercises

Keep making progress on Java by completing the next three categories in the Java Tutor:

Software in 6.031

Safe from bugs	Easy to understand	Ready for change
Correct today and correct in the unknown future.	Communicating clearly with future programmers, including future you.	Designed to accommodate change without rewriting.

Objectives

In today’s class, we will practice:

code review: reading and discussing code written by somebody else
general principles of good coding: things you can look for in every code review, regardless of programming language or program purpose

Code review

Code review is careful, systematic study of source code by people who are not the original author of the code. It’s analogous to proofreading a paper.

Code review really has two purposes:

Improving the code. Finding bugs, anticipating possible bugs, checking the clarity of the code, and checking for consistency with the project’s style standards.
Improving the programmer. Code review is an important way that programmers learn and teach each other, about new language features, changes in the design of the project or its coding standards, and new techniques. In open source projects, particularly, much conversation happens in the context of code reviews.

Code review is widely practiced in open source projects like Apache and Mozilla. Code review is also widely practiced in industry. In Google’s code review process, for example, you can’t push any code into the main repository until another engineer has signed off on it in a code review.

Why is code review widely used? There are many ideas out there for software development processes (some with buzzwords like Agile and Scrum). But code review is a practice with good evidence that it actually works. Hillel Wayne has collected some of the research highlights, among them that code review can find 70-90% of software defects, and that overwhelming numbers of software engineers at Google and Microsoft find it valuable and worth doing.

In 6.031, we’ll do code review on problem sets, as described in the Code Reviewing document on the course website.

Style standards

Most companies and large projects have coding style standards. These can get pretty detailed, even to the point of specifying whitespace (how deep to indent) and where curly braces and parentheses should go. These kinds of questions often lead to holy wars since they end up being a matter of taste and style.

In 6.031, we have no official style guide. If you are new to Java and want a recommendation, Google Java Style is widely used and appealing for its readability, particularly rules like these:

if (isOdd(n)) {
    n = 3*n + 1;
}

space after keyword (if), but not after function name (isOdd)
{ at the end of a line, } on a line by itself
{…} around all blocks, even empty or one-statement blocks

Eclipse has an autoformatter (Source → Format) whose default rules are similar to Google style.

But we won’t dictate where to put your curly braces in this class. That’s a personal decision that each programmer should make. It’s important to be self-consistent, however, and it’s very important to follow the conventions of the project you’re working on. If you’re the programmer who reformats every module you touch to match your personal style, your teammates will hate you, and rightly so. Be a team player.

But there are some rules that are quite sensible and target our big three properties, in a stronger way than placing curly braces. The rest of this reading talks about some of these rules, at least the ones that are relevant at this point in the course, where we’re mostly talking about writing basic Java. These are some things you should start to look for when you’re code reviewing other students, and when you’re looking at your own code for improvement. Don’t consider it an exhaustive list of code style guidelines, however. Over the course of the semester, we’ll talk about a lot more things — specifications, abstract data types with representation invariants, concurrency and thread safety — which will then become fodder for code review.

Smelly example #1

Programmers often describe bad code as having a “bad smell” that needs to be removed. “Code hygiene” is another word for this. Let’s start with some smelly code.

public static int dayOfYear(int month, int dayOfMonth, int year) {
    if (month == 2) {
        dayOfMonth += 31;
    } else if (month == 3) {
        dayOfMonth += 59;
    } else if (month == 4) {
        dayOfMonth += 90;
    } else if (month == 5) {
        dayOfMonth += 31 + 28 + 31 + 30;
    } else if (month == 6) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31;
    } else if (month == 7) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30;
    } else if (month == 8) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31;
    } else if (month == 9) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31;
    } else if (month == 10) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30;
    } else if (month == 11) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30 + 31;
    } else if (month == 12) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30 + 31 + 31;
    }
    return dayOfMonth;
}

The next few sections and exercises will pick out the particular smells in this code example.

Don’t repeat yourself (DRY)

Duplicated code is a risk to safety. If you have identical or very similar code in two places, then the fundamental risk is that there’s a bug in both copies, and some maintainer fixes the bug in one place but not the other.

Avoid duplication like you’d avoid crossing the street without looking. Copy-and-paste is an enormously tempting programming tool, and you should feel a frisson of danger run down your spine every time you use it. The longer the block you’re copying, the riskier it is.

Don’t Repeat Yourself, or DRY for short, has become a programmer’s mantra.

The dayOfYear example is full of identical code. How would you DRY it out?

reading exercises

Don’t repeat yourself

Some of the repetition in dayOfYear() is repeated values. How many times is the number of days in April written in dayOfYear()?

(missing explanation)

Don’t repeat yourself

One reason why repeated code is bad is because a problem in the repeated code has to be fixed in many places, not just one. Suppose our calendar changed so that February really has 30 days instead of 28. How many numbers in this code have to be changed?

(missing explanation)

Don’t repeat yourself

Another kind of repetition in the code is dayOfMonth+=. Assume you have an array:
int[] monthLengths = new int[] { 31, 28, 31, 30, ..., 31 };
Which of the following code skeletons could be used to DRY the code out enough so that dayOfMonth+= appears only once?

for (int m = 1; m < month; ++m) { ... }

switch (month) { case 1: ...; break; case 2: ...; break; ... }

while (m < month) { ...; m += 1; }

if (month == 1) { ... } else { ... dayOfYear(month-1, dayOfMonth, year) ... }

(missing explanation)

Comments where needed

Good software developers write comments in their code, and do it judiciously. Good comments should make the code easier to understand, safer from bugs (because important assumptions have been documented), and readier for change.

One kind of crucial comment is a specification, which appears above a method or above a class and documents the behavior of the method or class. In Java, this is conventionally written as a Javadoc comment, meaning that it starts with /** and includes @-syntax, like @param and @return for methods. Here’s an example of a spec:

/**
 * Compute the hailstone sequence.
 * See http://en.wikipedia.org/wiki/Collatz_conjecture#Statement_of_the_problem
 * @param n starting number of sequence; requires n > 0.
 * @return the hailstone sequence starting at n and ending with 1.
 *         For example, hailstone(3)=[3,10,5,16,8,4,2,1].
 */
public static List<Integer> hailstoneSequence(int n) {
    ...
}

Specifications document assumptions. We’ve already mentioned specs a few times, and there will be much more to say about them in a future reading.

Another crucial comment is one that specifies the provenance or source of a piece of code that was copied or adapted from elsewhere. This is vitally important for practicing software developers, and is required by the 6.031 collaboration policy when you adapt code you found on the web. Here is an example:

// read a web page into a string
// see http://stackoverflow.com/questions/4328711/read-url-to-string-in-few-lines-of-java-code
String mitHomepage = new Scanner(new URL("http://www.mit.edu").openStream(), "UTF-8").useDelimiter("\\A").next();

One reason for documenting sources is to avoid violations of copyright. Small snippets of code on Stack Overflow are typically in the public domain, but code copied from other sources may be proprietary or covered by other kinds of open source licenses, which are more restrictive. Another reason for documenting sources is that the code can fall out of date; the Stack Overflow answer from which this code came has evolved significantly in the years since it was first answered.

Some comments are bad and unnecessary. Direct transliterations of code into English, for example, do nothing to improve understanding, because you should assume that your reader at least knows Java:

while (n != 1) { // test whether n is 1   (don't write comments like this!)
   ++i; // increment i
   l.add(n); // add n to l
}

But obscure code should get a comment:

int sum = n*(n+1)/2;  // Gauss's formula for the sum of 1...n

// here we're using the sin x ~= x approximation, which works for very small x
double moonDiameterInMeters = moonDistanceInMeters * apparentAngleInRadians;

The dayOfYear code needs some comments — where would you put them? For example, where would you document whether month runs from 0 to 11 or from 1 to 12?

reading exercises

Comments where needed

Which comments are useful additions to the code? Consider each comment independently, as if the other comments weren’t there.

/**
 * @param month month of the year, where January=1 and December=12  [C1]
 */
public static int dayOfYear(int month, int dayOfMonth, int year) {
    if (month == 2) {      // we're in February  [C2]
        dayOfMonth += 31;  // add in the days of January that already passed  [C3]
    } else if (month == 3) {
        dayOfMonth += 59;  // month is 3 here  [C4]
    } else if (month == 4) {
        dayOfMonth += 90;
    }
    ...
    } else if (month == 12) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30 + 31 + 31;
    }
    return dayOfMonth; // the answer  [C5]
}

(missing explanation)

Fail fast

Failing fast means that code should reveal its bugs as early as possible. The earlier a problem is observed (the closer to its cause), the easier it is to find and fix. As we saw in the first reading, static checking fails faster than dynamic checking, and dynamic checking fails faster than producing a wrong answer that may corrupt subsequent computation.

The dayOfYear function doesn’t fail fast — if you pass it the arguments in the wrong order, it will quietly return the wrong answer. In fact, the way dayOfYear is designed, it’s highly likely that a non-American will pass the arguments in the wrong order! It needs more checking — either static checking or dynamic checking.

reading exercises

Fail fast

public static int dayOfYear(int month, int dayOfMonth, int year) {
    if (month == 2) {
        dayOfMonth += 31;
    } else if (month == 3) {
        dayOfMonth += 59;
    } else if (month == 4) {
        dayOfMonth += 90;
    } else if (month == 5) {
        dayOfMonth += 31 + 28 + 31 + 30;
    } else if (month == 6) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31;
    } else if (month == 7) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30;
    } else if (month == 8) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31;
    } else if (month == 9) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31;
    } else if (month == 10) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30;
    } else if (month == 11) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30 + 31;
    } else if (month == 12) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30 + 31 + 31;
    }
    return dayOfMonth;
}

Suppose the date is February 9, 2019. The correct dayOfYear() result for this date is 40, since it’s the fortieth day of the year.

People write dates in various ways, and date libraries can represent dates in various ways. A programmer’s prior experience can easily lead them to make wrong assumptions about the arguments of dayOfYear().

Here are some ways that a programmer might try to call dayOfYear() for February 9, 2019. Which ones are buggy, and does the bug lead to a static error, dynamic error, or wrong answer?

dayOfYear(2, 9, 2019)

(missing explanation)

dayOfYear(1, 9, 2019)

(missing explanation)

dayOfYear(9, 2, 2019)

(missing explanation)

dayOfYear("February", 9, 2019)

(missing explanation)

dayOfYear(2019, 2, 9)

(missing explanation)

dayOfYear(2, 2019, 9)

(missing explanation)

Fail faster

Which of the following design changes (considered separately) would make the code fail faster if it were called with arguments in the wrong order?

public static int dayOfYear(String month, int dayOfMonth, int year) {
    ...
}

(missing explanation)

public static int dayOfYear(int month, int dayOfMonth, int year) {
    if (month < 1 || month > 12) {
        return -1;
    }
    ...
}

(missing explanation)

public static int dayOfYear(int month, int dayOfMonth, int year) {
    if (month < 1 || month > 12) {
        throw new IllegalArgumentException();
    }
    ...
}

(missing explanation)

public enum Month { JANUARY, FEBRUARY, MARCH, ..., DECEMBER };
public static int dayOfYear(Month month, int dayOfMonth, int year) {
    ...
}

(missing explanation)

public static int dayOfYear(int month, int dayOfMonth, int year) {
    if (month == 1) {
        ...
    } else if (month == 2) {
        ...
    }
    ...
    } else if (month == 12) {
        ...
    } else {
        throw new IllegalArgumentException("month out of range");
    }
}

(missing explanation)

Avoid magic numbers

There’s a computer science joke that the only numbers that computer scientists understand are 0, 1, and sometimes 2.

All other constants are called magic because they appear as if out of thin air with no explanation.

One way to explain a number is with a comment, but a far better way is to declare the number as a named constant with a good, clear name.

dayOfYear is full of magic numbers, which illustrate some of the reasons they should be avoided:

A number is less readable than a name. In dayOfYear, the month values 2, …, 12 would be far more readable as FEBRUARY, …, DECEMBER.
Constants may need to change in the future. Using a named constant, instead of repeating the literal number in various places, is more ready for change.
Constants may be dependent on other constants. In dayOfYear, the mysterious numbers 59 and 90 are particularly pernicious examples. Not only are they uncommented and undocumented, they are actually the result of a computation done by hand by the programmer – summing the lengths of certain months. This is less easy to understand, less ready for change, and definitely less safe from bugs. Don’t hardcode numbers that you’ve computed by hand. Use a named constant that visibly computes the relationship in terms of other named constants.

What about constants that seem constant and eternal, like π, or the total angle of a triangle, or the gravitational constant G? Is it necessary to give names to fundamental constants of mathematics and physics that don’t depend on other constants? First, the number itself may be complex to express, so repeating it multiple times is not safe from bugs. Is it easy to tell which is right: 3.14159265358979323846 or 3.1415926538979323846? Second, even these numbers encode design decisions that might change in the future, such as the number of digits of precision, or the units of measurement. A named constant is more ready for change when those design decisions change.

If you have a profusion of magic numbers in your code, it can be a sign that you need to take a step back and treat those numbers as data rather than named constants, and put them into a data structure that allows for simpler code. In dayOfYear, the lengths of the months (30, 31, 28, etc.) would be more readable, and far more DRY, if they were instead stored in a data structure like an array, list, or map, e.g. MONTH_LENGTH[month].

reading exercises

Avoid magic numbers

In the code:

if (month == 2) { ... }

what might a reasonable programmer plausibly assume about the meaning of the magic number 2?

2 might mean January

2 might mean February

2 might mean March

2 might mean the year 2 AD

(missing explanation)

What happens when you assume

Suppose you’re reading some code that uses a turtle graphics library that you don’t know well, and you see the code:

turtle.rotate(3);

Just from reading this line (not consulting the documentation), which of the following are likely assumptions you might make about the meaning of the number 3?

3 might mean 3 degrees clockwise

3 might mean 3 degrees counterclockwise

3 might mean 3 radians clockwise

3 might mean 3 full revolutions

(missing explanation)

Names instead of numbers

Consider this code, which is intended to draw a regular pentagon:

for (int i = 0; i < 5; ++i) {
    turtle.forward(36);
    turtle.turn(72);
}

The magic numbers in this code cause it to fail all three of our measures of code quality: it’s not safe from bugs (SFB), not easy to understand (ETU) and not ready for change (RFC).

For each of the following rewrites, judge whether it improves SFB, ETU, and/or RFC, or none of the above.

final int five = 5;
final int thirtySix = 36;
final int seventyTwo = 72;
for (int i = 0; i < five; ++i) {
    turtle.forward(thirtySix);
    turtle.turn(seventyTwo);
}

no improvement (or worse)

safer from bugs

easier to understand

more ready for change

(missing explanation)

int[] numbers = new int[] { 5, 36, 72 };
for (int i = 0; i < numbers[0]; ++i) {
    turtle.forward(numbers[1]);
    turtle.turn(numbers[2]);
}

no improvement (or worse)

safer from bugs

easier to understand

more ready for change

(missing explanation)

int x = 5;
for (int i = 0; i < x; ++i) {
    turtle.forward(36);
    turtle.turn(360.0 / x);
}

no improvement (or worse)

safer from bugs

easier to understand

more ready for change

(missing explanation)

final double oneRevolution = 360.0;
final int numSides = 5;
final int sideLength = 36;
for (int i = 0; i < numSides; ++i) {
    turtle.forward(sideLength);
    turtle.turn(oneRevolution / numSides);
}

no improvement (or worse)

safer from bugs

easier to understand

more ready for change

(missing explanation)

Now you see it, now you don’t

In which of the following lines of code is 0 a magic number?

for (int i = 0; i < total; ++i) { ... }

if (date.getMonth() == 0) { ... }

if (list.size() == 0) { ... }

int count = 0;

setSortingAlgorithm(0);

(missing explanation)

One purpose for each variable

In the dayOfYear example, the parameter dayOfMonth is reused to compute a very different value — the return value of the function, which is not the day of the month.

Don’t reuse parameters, and don’t reuse variables. Variables are not a scarce resource in programming. Introduce them freely, give them good names, and just stop using them when you stop needing them. You will confuse your reader if a variable that used to mean one thing suddenly starts meaning something different a few lines down.

Not only is this an ease-of-understanding question, but it’s also a safety-from-bugs and ready-for-change question.

Method parameters, in particular, should generally be left unmodified. (This is important for being ready-for-change — in the future, some other part of the method may want to know what the original parameters of the method were, so you shouldn’t blow them away while you’re computing.) It’s a good idea to use final for method parameters, and as many other variables as you can. The final keyword says that the variable should never be reassigned, and the Java compiler will check it statically. For example:

public static int dayOfYear(final int month, final int dayOfMonth, final int year) {
    ...
}

Smelly example #2

There was a latent bug in dayOfYear. It didn’t handle leap years at all. As part of fixing that, suppose we write a leap-year method.

public static boolean leap(int y) {
    String tmp = String.valueOf(y);
    if (tmp.charAt(2) == '1' || tmp.charAt(2) == '3' || tmp.charAt(2) == 5 || tmp.charAt(2) == '7' || tmp.charAt(2) == '9') {
        if (tmp.charAt(3)=='2'||tmp.charAt(3)=='6') return true; /*R1*/
        else
            return false; /*R2*/
    }else{
        if (tmp.charAt(2) == '0' && tmp.charAt(3) == '0') {
            return false; /*R3*/
        }
        if (tmp.charAt(3)=='0'||tmp.charAt(3)=='4'||tmp.charAt(3)=='8')return true; /*R4*/
    }
    return false; /*R5*/
}

What are the bugs hidden in this code? And what style problems that we’ve already talked about?

reading exercises

Mental execution 2016

What happens when you call:

leap(2016)

returns true on line R1

returns false on line R2

returns false on line R3

returns true on line R4

returns false on line R5

error before program starts

error while program is running

(missing explanation)

Mental execution 2017

What happens when you call:

leap(2017)

returns true on line R1

returns false on line R2

returns false on line R3

returns true on line R4

returns false on line R5

error before program starts

error while program is running

(missing explanation)

Mental execution 2050

What happens when you call:

leap(2050)

returns true on line R1

returns false on line R2

returns false on line R3

returns true on line R4

returns false on line R5

error before program starts

error while program is running

(missing explanation)

Mental execution 10016

What happens when you call:

leap(10016)

returns true on line R1

returns false on line R2

returns false on line R3

returns true on line R4

returns false on line R5

error before program starts

error while program is running

(missing explanation)

Mental execution 916

What happens when you call:

leap(916)

returns true on line R1

returns false on line R2

returns false on line R3

returns true on line R4

returns false on line R5

error before program starts

error while program is running

(missing explanation)

Magic numbers

How many magic numbers are in this code? Count every occurrence if some appear more than once.

(missing explanation)

DRYing out

Suppose you wrote the helper function:

public static boolean isDivisibleBy(int number, int factor) { return number % factor == 0; }

If leap() were rewritten to use isDivisibleBy(year, ...), and to correctly follow the leap year algorithm, how many magic numbers would be in the code?

(missing explanation)

Use good names

Good method and variable names are long and self-descriptive. Comments can often be avoided entirely by making the code itself more readable, with better names that describe the methods and variables.

For example, you can rewrite

int tmp = 86400;  // tmp is the number of seconds in a day (don't do this!)

as:

final int secondsPerDay = 86400;

In general, variable names like tmp, temp, and data are awful, symptoms of extreme programmer laziness. Every local variable is temporary, and every variable is data, so those names are generally meaningless. Better to use a longer, more descriptive name, so that your code reads clearly all by itself.

Follow the lexical naming conventions of the language. In both Python and Java, classes typically start with a capital letter, and variable names and method names start with a lowercase letter. But the two languages differ in how multi-word phrases are rendered into a method or variable name. Python uses snake_case (underscores separating the words of the phrase), while Java uses camelCase (capitalizing each word after the first, as in startsWith or getFirstName).

Java also uses capitalization to distinguish global constants (public static final) from variables and local constants. ALL_CAPS_WITH_UNDERSCORES is used for static final constants. But the local variables declared inside a method, including local constants like secondsPerDay above, use camelCaseNames.

In any language, method names are usually verb phrases, like getDate or isUpperCase, while variable and class names are usually noun phrases. Choose short words, and be concise, but avoid abbreviations. For example, message is clearer than msg, and word is so much better than wd. Keep in mind that many of your teammates in class and in the real world will not be native English speakers, and abbreviations can be even harder for non-native speakers.

Avoid single-character variable names entirely except where they are easily understood by convention. For example, x and y make sense for Cartesian coordinates, and i and j as integer variables in for loops. But if your code is full of variables like e, f, g, and h, because you’re just picking them from the alphabet, then it will be incredibly hard to read.

Effectively Naming Software Thingies has some excellent advice about naming. Robert Martin’s Clean Code (chapter 2) is also highly recommended.

The leap method has bad names: the method name itself, and the local variable name. What would you call them instead?

reading exercises

Better method names

public static boolean leap(int y) {
    String tmp = String.valueOf(y);
    if (tmp.charAt(2) == '1' || tmp.charAt(2) == '3' || tmp.charAt(2) == 5 || tmp.charAt(2) == '7' || tmp.charAt(2) == '9') {
        if (tmp.charAt(3)=='2'||tmp.charAt(3)=='6') return true;
        else
            return false;
    }else{
        if (tmp.charAt(2) == '0' && tmp.charAt(3) == '0') {
            return false;
        }
        if (tmp.charAt(3)=='0'||tmp.charAt(3)=='4'||tmp.charAt(3)=='8')return true;
    }
    return false;
}

Which of the following are good names for the leap() method?

leap

isLeapYear

IsLeapYear

is_divisible_by_4

(missing explanation)

Better variable names

Which of the following are good names for the tmp variable inside leap()?

(missing explanation)

Skip the comment

Rename this variable so that the comment can be deleted entirely:

int a = 18;  // this variable stores the age when you can first vote

Rename a to:

(missing explanation)

Types

Good names imply something about their type – what kinds of values they can represent. Which of these names say something to the reader about their type?

(missing explanation)

Use whitespace to help the reader

Use consistent indentation. The leap example is bad at this. The dayOfYear example is much better. In fact, dayOfYear nicely lines up all the numbers into columns, making them easy for a human reader to compare and check. That’s a great use of whitespace.

Put spaces within code lines to make them easy to read. The leap example has some lines that are packed together — put in some spaces.

Never use tab characters for indentation, only space characters. Note that we say characters, not keys. We’re not saying you should never press the Tab key, only that your editor should never put a tab character into your source file in response to your pressing the Tab key. The reason for this rule is that different tools treat tab characters differently — sometimes expanding them to 4 spaces, sometimes to 2 spaces, sometimes to 8. If you run “git diff” on the command line, or if you view your source code in a different editor, then the indentation may be completely screwed up. Just use spaces. Always set your programming editor to insert space characters when you press the Tab key. The 6.031 Eclipse installation has already been set up this way.

Smelly example #3

Here’s a third example of smelly code that will illustrate the remaining points of this reading.

public static int LONG_WORD_LENGTH = 5;
public static String longestWord;

public static void countLongWords(String text) {
    String[] words = text.split(' ');
    if (words.length == 0) {
        System.out.println("0");
        return;
    }
    int n = 0;
    longestWord = "";
    for (String word: words) {
        if (word.length() > LONG_WORD_LENGTH) ++n;
        if (word.length() > longestWord.length()) longestWord = word;
    }
    System.out.println(n);
}

Don’t use global variables

Avoid global variables. Let’s break down what we mean by global variable. A global variable is:

a variable, a name whose value can be changed
that is global, accessible and changeable from anywhere in the program.

Global Variables Are Bad has a good list of the dangers of global variables.

In Java, a global variable is declared public static. The public modifier makes it accessible anywhere, and static means there is a single instance of the variable.

However, with one additional modifier, public static final, and if furthermore the variable’s type is immutable, then the name becomes a global constant. A global constant can be read anywhere in the program but never reassigned or mutated, so the risks go away. Global constants are common and useful.

In general, convert global variables into parameters and return values, or put them inside objects that you’re calling methods on. We’ll see many techniques for doing that in future readings.

reading exercises

Identifying global variables

In Smelly Example #3 above, which of these are global variables?

(missing explanation)

Effect of final

Making a variable into a constant by adding the final keyword can eliminate the risk of global variables. What happens to each of these when the final keyword is added? Look carefully at how the variable is used in Smelly Example #3.

n

(missing explanation)

LONG_WORD_LENGTH

(missing explanation)

longestWord

(missing explanation)

word

(missing explanation)

words

(missing explanation)

Kinds of variables in snapshot diagrams

When we’re drawing snapshot diagrams, it’s important to distinguish between different kinds of variables:

a local variable inside a method
an instance variable inside an instance of an object
- an instance variable may also be called a field (particularly in Java), a property (TypeScript/JavaScript), a member variable (C++), or an attribute (Python).
a static variable associated with a class

A local variable comes into existence when a method is called, and then disappears when the method returns. If multiple calls to the same method are in progress (for example because of recursion), then each call will have its own independent local variables.

An instance variable comes into existence when an object is created with new, and then disappears when the object is no longer accessible and becomes garbage-collected. Each object instance has its own instance variables.

A static variable comes into existence when the program starts (or more precisely when the class containing the variable is first loaded, since that might be delayed), and exists for the rest of the life of the program.

Here is a simple piece of code that uses all three kinds of variables:

class Payment {
    public double value;
    public static double taxRate = 0.05;
    public static void main(String[] args) {
        Payment p = new Payment();
        p.value = 100;
        taxRate = 0.05;
        System.out.println(p.value * (1 + taxRate));
    }
}

In the next two exercises, you will construct a snapshot diagram that shows the state of this program.

reading exercises

Drawing the heap

In the box below, use Snapdown to draw a snapshot diagram with the variables p, taxRate, and the values they point to.

Open the Snapdown help sidebar to get the details of Snapdown syntax – it’s a little language for drawing diagrams with text. The first three sections of the sidebar – Primitives, Objects, and Fields – will be useful for this exercise.

Click to show Snapdown help sidebar

(missing explanation)

Drawing the stack

The variable p is local to the main() method. The variable Payment.taxRate is a static variable of the Payment class.

Using stack frames, draw a snapshot diagram that demonstrates the difference between these two variables. The Stack frames section of the Snapdown help sidebar will be useful here.

In the diagram, make sure to use the label Payment.taxRate to indicate that taxRate is part of the Payment class.

Click to show Snapdown help sidebar

Since Snapdown is new: in the box below, please let us know any feedback you have on this sequence of exercises and on Snapdown, especially if you find anything challenging or unintuitive.

(missing explanation)

Methods should return results, not print them

countLongWords isn’t ready for change. It sends some of its result to the console, System.out. That means that if you want to use it in another context — where the number is needed for some other purpose, like computation rather than human eyes — it would have to be rewritten.

In general, only the highest-level parts of a program should interact with the human user or the console. Lower-level parts should take their input as parameters and return their output as results. The sole exception here is debugging output, which can of course be printed to the console. But that kind of output shouldn’t be a part of your design, only a part of how you debug your design.

Avoid special-case code

Programmers are often tempted to write special code to deal with what seem like special cases – parameters that are 0, for example, or empty lists, or empty strings. The countLongWords example falls into this trap by handling an empty words list specially:

    if (words.size() == 0) {
        System.out.println(0);
        return;
    }
    int n = 0;
    longestWord = "";
    for (String word: words) {
        if (word.length() > LONG_WORD_LENGTH) ++n;
        if (word.length() > longestWord.length()) longestWord = word;
    }
    System.out.println(n);
}

The starting if statement is unnecessary. If it were omitted, and the words list is empty, then the for loop simply does nothing, and 0 is printed anyway. In fact, handling this special case separately has led to a possible bug – a difference in the way an empty list is handled, compared to a nonempty list that happens to have no long words on it.

Actively resist the temptation to handle special cases separately. If you find yourself writing an if statement for a special case, stop what you’re doing, and instead think harder about the general-case code, either to confirm that it can actually already handle the special case you’re worrying about (which is often true!), or put in a little more effort to make it handle the special case. If you haven’t even written the general-case code yet, but are just trying to deal with the easy cases first, then you’re doing it in the wrong order. Tackle the general case first.

Writing broader, general-case code pays off. It results in a shorter method, which is easier to understand and has fewer places for bugs to hide. It is likely to be safer from bugs, because it makes fewer assumptions about the values it is working with. And it is more ready for change, because there are fewer places to update when a change to the method’s behavior is made.

Some programmers justify handling special cases separately with a belief that it increases the overall performance of the method, by returning a hardcoded answer for a special case right away. For example, when writing a sort algorithm, it can be tempting to check whether the size of the list is 0 or 1 at the very start of the method, since you can then return immediately with no need to sort at all. Or if the size is 2, just do a comparison and a possible swap. These optimizations might indeed make sense — but not until you have evidence that they actually would matter to the speed of the program! If the sort method is almost never called with these special cases, then adding code for them just adds complexity, overhead, and hiding-places for bugs, without any practical improvement in performance. Write a clean, simple, general-case algorithm first, and optimize it later, only if it would actually help.

reading exercises

Bug hiding in a special case

Smelly Example #3 (reproduced below) starts with an if statement that handles a special case, prints 0, and returns immediately. But this makes the behavior of the special case inconsistent with other cases where the code would print 0, for example when the list is nonempty but has no long words on it. Which line of code from the general-case code did the special case also need to do?

         if (words.size() == 0) {
             System.out.println(0);
             return;
         }
/*A*/    int n = 0;
/*B*/    longestWord = "";
/*C*/    for (String word: words) {
/*D*/        if (word.length() > LONG_WORD_LENGTH) ++n;
/*E*/        if (word.length() > longestWord.length()) longestWord = word;
         }
/*F*/    System.out.println(n);
}

(missing explanation)

Summary

Code review is a widely-used technique for improving software quality by human inspection. Code review can detect many kinds of problems in code, but as a starter, this reading talked about these general principles of good code:

Don’t Repeat Yourself (DRY)
Comments where needed
Fail fast
Avoid magic numbers
One purpose for each variable
Use good names
Use whitespace to help the reader
Don’t use global variables
Methods should return results, not print them
Avoid special-case code

The topics of today’s reading connect to our three key properties of good software as follows:

Safe from bugs. In general, code review uses human reviewers to find bugs. DRY code lets you fix a bug in only one place, without fear that it has propagated elsewhere. Documenting your assumptions with clear comments makes it less likely that another programmer will introduce a bug. The Fail Fast principle detects bugs as early as possible. Avoiding global variables makes it easier to localize bugs related to variable values, since non-global variables can be changed in only limited places in the code.
Easy to understand. Code review is really the only way to find obscure or confusing code, because other people are reading it and trying to understand it. Using judicious comments, avoiding magic numbers, keeping one purpose for each variable, using good names, and using whitespace well can all improve the understandability of code.
Ready for change. Code review helps here when it’s done by experienced software developers who can anticipate what might change and suggest ways to guard against it. DRY code is more ready for change, because a change only needs to be made in one place. Returning results instead of printing them makes it easier to adapt the code to a new purpose.

Code review is not the only software development practice with strong supporting evidence. Another is: sleep.

Asking questions about the reading

Have a question about the reading, or about a reading exercise? Post your question on Piazza, and include a link that jumps directly to the part of the reading or the reading exercise that you’re asking about. You can get that link by moving your mouse cursor into the left margin until an octothorpe # link appears next to what you want to refer to. Click on that #, and your browser’s address bar now has the link you want.

reading exercises

Every question is a good question

Find the paragraph in this reading that defines camelCase, get the link that jumps right to that paragraph, and paste the link here:

(missing explanation)

Jumping in

Find the link that jumps right to this exercise, and paste the link here:

Remember the exercises

At this point you should have completed all the reading exercises above.

Completing the reading exercises prepares you for the nanoquiz at the beginning of each class meeting, and submitting the exercises is required by 10:00pm US Eastern time the evening before class.

At this point you should have also completed these additional levels of the Java Tutor:

More practice

If you would like to get more practice with the concepts covered in this reading, you can visit the question bank. The questions in this bank were written in previous semesters by students and staff, and are provided for review purposes only – doing them will not affect your classwork grades.