Reading 3: Code Review
Reading exercises and Java Tutor exercises are due the night before class. You must complete the exercises in this reading by Tuesday, September 8 at 10:00 pm MIT time.
Java Tutor exercises
Keep making progress on Java by completing the next three categories in the Java Tutor:
Software in 6.031
Objectives
In today’s class, we will practice:
- code review: reading and discussing code written by somebody else
- general principles of good coding: things you can look for in every code review, regardless of programming language or program purpose
Code review
Code review is careful, systematic study of source code by people who are not the original author of the code. It’s analogous to proofreading a paper.
Code review really has two purposes:
- Improving the code. Finding bugs, anticipating possible bugs, checking the clarity of the code, and checking for consistency with the project’s style standards.
- Improving the programmer. Code review is an important way that programmers learn and teach each other, about new language features, changes in the design of the project or its coding standards, and new techniques. In open source projects, particularly, much conversation happens in the context of code reviews.
Code review is widely practiced in open source projects like Apache and Mozilla. Code review is also widely practiced in industry. In Google’s code review process, for example, you can’t push any code into the main repository until another engineer has signed off on it in a code review.
Why is code review widely used? There are many ideas out there for software development processes (some with buzzwords like Agile and Scrum). But code review is a practice with good evidence that it actually works. Hillel Wayne has collected some of the research highlights, among them that code review can find 70-90% of software defects, and that overwhelming numbers of software engineers at Google and Microsoft find it valuable and worth doing.
In 6.031, we’ll do code review on problem sets, as described in the Code Reviewing document on the course website.
Style standards
Most companies and large projects have coding style standards. These can get pretty detailed, even to the point of specifying whitespace (how deep to indent) and where curly braces and parentheses should go. These kinds of questions often lead to holy wars since they end up being a matter of taste and style.
In 6.031, we have no official style guide. If you are new to Java and want a recommendation, Google Java Style is widely used and appealing for its readability, particularly rules like these:
- space after keyword (
if
), but not after function name (isOdd
) {
at the end of a line,}
on a line by itself{
…}
around all blocks, even empty or one-statement blocks
Eclipse has an autoformatter (Source → Format) whose default rules are similar to Google style.
But we won’t dictate where to put your curly braces in this class. That’s a personal decision that each programmer should make. It’s important to be self-consistent, however, and it’s very important to follow the conventions of the project you’re working on. If you’re the programmer who reformats every module you touch to match your personal style, your teammates will hate you, and rightly so. Be a team player.
But there are some rules that are quite sensible and target our big three properties, in a stronger way than placing curly braces. The rest of this reading talks about some of these rules, at least the ones that are relevant at this point in the course, where we’re mostly talking about writing basic Java. These are some things you should start to look for when you’re code reviewing other students, and when you’re looking at your own code for improvement. Don’t consider it an exhaustive list of code style guidelines, however. Over the course of the semester, we’ll talk about a lot more things — specifications, abstract data types with representation invariants, concurrency and thread safety — which will then become fodder for code review.
Smelly example #1
Programmers often describe bad code as having a “bad smell” that needs to be removed. “Code hygiene” is another word for this. Let’s start with some smelly code.
public static int dayOfYear(int month, int dayOfMonth, int year) {
if (month == 2) {
dayOfMonth += 31;
} else if (month == 3) {
dayOfMonth += 59;
} else if (month == 4) {
dayOfMonth += 90;
} else if (month == 5) {
dayOfMonth += 31 + 28 + 31 + 30;
} else if (month == 6) {
dayOfMonth += 31 + 28 + 31 + 30 + 31;
} else if (month == 7) {
dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30;
} else if (month == 8) {
dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31;
} else if (month == 9) {
dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31;
} else if (month == 10) {
dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30;
} else if (month == 11) {
dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30 + 31;
} else if (month == 12) {
dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30 + 31 + 31;
}
return dayOfMonth;
}
The next few sections and exercises will pick out the particular smells in this code example.
Don’t repeat yourself (DRY)
Duplicated code is a risk to safety. If you have identical or very similar code in two places, then the fundamental risk is that there’s a bug in both copies, and some maintainer fixes the bug in one place but not the other.
Avoid duplication like you’d avoid crossing the street without looking. Copy-and-paste is an enormously tempting programming tool, and you should feel a frisson of danger run down your spine every time you use it. The longer the block you’re copying, the riskier it is.
Don’t Repeat Yourself, or DRY for short, has become a programmer’s mantra.
The dayOfYear
example is full of identical code. How would you DRY it out?
reading exercises
Comments where needed
Good software developers write comments in their code, and do it judiciously. Good comments should make the code easier to understand, safer from bugs (because important assumptions have been documented), and readier for change.
One kind of crucial comment is a specification, which appears above a method or above a class and documents the behavior of the method or class.
In Java, this is conventionally written as a Javadoc comment, meaning that it starts with /**
and includes @
-syntax, like @param
and @return
for methods.
Here’s an example of a spec:
/**
* Compute the hailstone sequence.
* See http://en.wikipedia.org/wiki/Collatz_conjecture#Statement_of_the_problem
* @param n starting number of sequence; requires n > 0.
* @return the hailstone sequence starting at n and ending with 1.
* For example, hailstone(3)=[3,10,5,16,8,4,2,1].
*/
public static List<Integer> hailstoneSequence(int n) {
...
}
Specifications document assumptions. We’ve already mentioned specs a few times, and there will be much more to say about them in a future reading.
Another crucial comment is one that specifies the provenance or source of a piece of code that was copied or adapted from elsewhere. This is vitally important for practicing software developers, and is required by the 6.031 collaboration policy when you adapt code you found on the web. Here is an example:
// read a web page into a string
// see http://stackoverflow.com/questions/4328711/read-url-to-string-in-few-lines-of-java-code
String mitHomepage = new Scanner(new URL("http://www.mit.edu").openStream(), "UTF-8").useDelimiter("\\A").next();
One reason for documenting sources is to avoid violations of copyright. Small snippets of code on Stack Overflow are typically in the public domain, but code copied from other sources may be proprietary or covered by other kinds of open source licenses, which are more restrictive. Another reason for documenting sources is that the code can fall out of date; the Stack Overflow answer from which this code came has evolved significantly in the years since it was first answered.
Some comments are bad and unnecessary. Direct transliterations of code into English, for example, do nothing to improve understanding, because you should assume that your reader at least knows Java:
while (n != 1) { // test whether n is 1 (don't write comments like this!)
++i; // increment i
l.add(n); // add n to l
}
But obscure code should get a comment:
int sum = n*(n+1)/2; // Gauss's formula for the sum of 1...n
// here we're using the sin x ~= x approximation, which works for very small x
double moonDiameterInMeters = moonDistanceInMeters * apparentAngleInRadians;
The dayOfYear
code needs some comments — where would you put them?
For example, where would you document whether month
runs from 0 to 11 or from 1 to 12?
reading exercises
Which comments are useful additions to the code? Consider each comment independently, as if the other comments weren’t there.
/**
* @param month month of the year, where January=1 and December=12 [C1]
*/
public static int dayOfYear(int month, int dayOfMonth, int year) {
if (month == 2) { // we're in February [C2]
dayOfMonth += 31; // add in the days of January that already passed [C3]
} else if (month == 3) {
dayOfMonth += 59; // month is 3 here [C4]
} else if (month == 4) {
dayOfMonth += 90;
}
...
} else if (month == 12) {
dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30 + 31 + 31;
}
return dayOfMonth; // the answer [C5]
}
(missing explanation)
Fail fast
Failing fast means that code should reveal its bugs as early as possible. The earlier a problem is observed (the closer to its cause), the easier it is to find and fix. As we saw in the first reading, static checking fails faster than dynamic checking, and dynamic checking fails faster than producing a wrong answer that may corrupt subsequent computation.
The dayOfYear
function doesn’t fail fast — if you pass it the arguments in the wrong order, it will quietly return the wrong answer.
In fact, the way dayOfYear
is designed, it’s highly likely that a non-American will pass the arguments in the wrong order!
It needs more checking — either static checking or dynamic checking.
reading exercises
public static int dayOfYear(int month, int dayOfMonth, int year) {
if (month == 2) {
dayOfMonth += 31;
} else if (month == 3) {
dayOfMonth += 59;
} else if (month == 4) {
dayOfMonth += 90;
} else if (month == 5) {
dayOfMonth += 31 + 28 + 31 + 30;
} else if (month == 6) {
dayOfMonth += 31 + 28 + 31 + 30 + 31;
} else if (month == 7) {
dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30;
} else if (month == 8) {
dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31;
} else if (month == 9) {
dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31;
} else if (month == 10) {
dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30;
} else if (month == 11) {
dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30 + 31;
} else if (month == 12) {
dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30 + 31 + 31;
}
return dayOfMonth;
}
Suppose the date is February 9, 2019. The correct dayOfYear()
result for this date is 40, since it’s the fortieth day of the year.
People write dates in various ways, and date libraries can represent dates in various ways.
A programmer’s prior experience can easily lead them to make wrong assumptions about the arguments of dayOfYear()
.
Here are some ways that a programmer might try to call dayOfYear()
for February 9, 2019.
Which ones are buggy, and does the bug lead to a static error, dynamic error, or wrong answer?
dayOfYear(2, 9, 2019)
(missing explanation)
dayOfYear(1, 9, 2019)
(missing explanation)
dayOfYear(9, 2, 2019)
(missing explanation)
dayOfYear("February", 9, 2019)
(missing explanation)
dayOfYear(2019, 2, 9)
(missing explanation)
dayOfYear(2, 2019, 9)
(missing explanation)
Which of the following design changes (considered separately) would make the code fail faster if it were called with arguments in the wrong order?
public static int dayOfYear(String month, int dayOfMonth, int year) {
...
}
(missing explanation)
public static int dayOfYear(int month, int dayOfMonth, int year) {
if (month < 1 || month > 12) {
return -1;
}
...
}
(missing explanation)
public static int dayOfYear(int month, int dayOfMonth, int year) {
if (month < 1 || month > 12) {
throw new IllegalArgumentException();
}
...
}
(missing explanation)
public enum Month { JANUARY, FEBRUARY, MARCH, ..., DECEMBER };
public static int dayOfYear(Month month, int dayOfMonth, int year) {
...
}
(missing explanation)
public static int dayOfYear(int month, int dayOfMonth, int year) {
if (month == 1) {
...
} else if (month == 2) {
...
}
...
} else if (month == 12) {
...
} else {
throw new IllegalArgumentException("month out of range");
}
}
(missing explanation)
Avoid magic numbers
There’s a computer science joke that the only numbers that computer scientists understand are 0, 1, and sometimes 2.
All other constants are called magic because they appear as if out of thin air with no explanation.
One way to explain a number is with a comment, but a far better way is to declare the number as a named constant with a good, clear name.
dayOfYear
is full of magic numbers, which illustrate some of the reasons they should be avoided:
A number is less readable than a name. In
dayOfYear
, the month values 2, …, 12 would be far more readable asFEBRUARY
, …,DECEMBER
.Constants may need to change in the future. Using a named constant, instead of repeating the literal number in various places, is more ready for change.
Constants may be dependent on other constants. In
dayOfYear
, the mysterious numbers 59 and 90 are particularly pernicious examples. Not only are they uncommented and undocumented, they are actually the result of a computation done by hand by the programmer – summing the lengths of certain months. This is less easy to understand, less ready for change, and definitely less safe from bugs. Don’t hardcode numbers that you’ve computed by hand. Use a named constant that visibly computes the relationship in terms of other named constants.
What about constants that seem constant and eternal, like π, or the total angle of a triangle, or the gravitational constant G
?
Is it necessary to give names to fundamental constants of mathematics and physics that don’t depend on other constants?
First, the number itself may be complex to express, so repeating it multiple times is not safe from bugs.
Is it easy to tell which is right: 3.14159265358979323846
or 3.1415926538979323846
?
Second, even these numbers encode design decisions that might change in the future, such as the number of digits of precision, or the units of measurement.
A named constant is more ready for change when those design decisions change.
If you have a profusion of magic numbers in your code, it can be a sign that you need to take a step back and treat those numbers as data rather than named constants, and put them into a data structure that allows for simpler code.
In dayOfYear
, the lengths of the months (30, 31, 28, etc.) would be more readable, and far more DRY, if they were instead stored in a data structure like an array, list, or map, e.g. MONTH_LENGTH[month]
.
reading exercises
Suppose you’re reading some code that uses a turtle graphics library that you don’t know well, and you see the code:
turtle.rotate(3);
Just from reading this line (not consulting the documentation), which of the following are likely assumptions you might make about the meaning of the number 3?
(missing explanation)
Consider this code, which is intended to draw a regular pentagon:
for (int i = 0; i < 5; ++i) {
turtle.forward(36);
turtle.turn(72);
}
The magic numbers in this code cause it to fail all three of our measures of code quality: it’s not safe from bugs (SFB), not easy to understand (ETU) and not ready for change (RFC).
For each of the following rewrites, judge whether it improves SFB, ETU, and/or RFC, or none of the above.
final int five = 5;
final int thirtySix = 36;
final int seventyTwo = 72;
for (int i = 0; i < five; ++i) {
turtle.forward(thirtySix);
turtle.turn(seventyTwo);
}
(missing explanation)
int[] numbers = new int[] { 5, 36, 72 };
for (int i = 0; i < numbers[0]; ++i) {
turtle.forward(numbers[1]);
turtle.turn(numbers[2]);
}
(missing explanation)
int x = 5;
for (int i = 0; i < x; ++i) {
turtle.forward(36);
turtle.turn(360.0 / x);
}
(missing explanation)
final double oneRevolution = 360.0;
final int numSides = 5;
final int sideLength = 36;
for (int i = 0; i < numSides; ++i) {
turtle.forward(sideLength);
turtle.turn(oneRevolution / numSides);
}
(missing explanation)
One purpose for each variable
In the dayOfYear
example, the parameter dayOfMonth
is reused to compute a very different value — the return value of the function, which is not the day of the month.
Don’t reuse parameters, and don’t reuse variables. Variables are not a scarce resource in programming. Introduce them freely, give them good names, and just stop using them when you stop needing them. You will confuse your reader if a variable that used to mean one thing suddenly starts meaning something different a few lines down.
Not only is this an ease-of-understanding question, but it’s also a safety-from-bugs and ready-for-change question.
Method parameters, in particular, should generally be left unmodified. (This is important for being ready-for-change — in the future, some other part of the method may want to know what the original parameters of the method were, so you shouldn’t blow them away while you’re computing.)
It’s a good idea to use final
for method parameters, and as many other variables as you can. The final
keyword says that the variable should never be reassigned, and the Java compiler will check it statically. For example:
public static int dayOfYear(final int month, final int dayOfMonth, final int year) {
...
}
Smelly example #2
There was a latent bug in dayOfYear
. It didn’t handle leap years at all. As part of fixing that, suppose we write a leap-year method.
public static boolean leap(int y) {
String tmp = String.valueOf(y);
if (tmp.charAt(2) == '1' || tmp.charAt(2) == '3' || tmp.charAt(2) == 5 || tmp.charAt(2) == '7' || tmp.charAt(2) == '9') {
if (tmp.charAt(3)=='2'||tmp.charAt(3)=='6') return true; /*R1*/
else
return false; /*R2*/
}else{
if (tmp.charAt(2) == '0' && tmp.charAt(3) == '0') {
return false; /*R3*/
}
if (tmp.charAt(3)=='0'||tmp.charAt(3)=='4'||tmp.charAt(3)=='8')return true; /*R4*/
}
return false; /*R5*/
}
reading exercises
Suppose you wrote the helper function:
public static boolean isDivisibleBy(int number, int factor) { return number % factor == 0; }
If leap()
were rewritten to use isDivisibleBy(year, ...)
, and to correctly follow the leap year algorithm, how many magic numbers would be in the code?
(missing explanation)
Use good names
Good method and variable names are long and self-descriptive. Comments can often be avoided entirely by making the code itself more readable, with better names that describe the methods and variables.
int tmp = 86400; // tmp is the number of seconds in a day (don't do this!)
as:
final int secondsPerDay = 86400;
In general, variable names like tmp
, temp
, and data
are awful, symptoms of extreme programmer laziness. Every local variable is temporary, and every variable is data, so those names are generally meaningless. Better to use a longer, more descriptive name, so that your code reads clearly all by itself.
Follow the lexical naming conventions of the language.
In both Python and Java, classes typically start with a capital letter, and variable names and method names start with a lowercase letter.
But the two languages differ in how multi-word phrases are rendered into a method or variable name.
Python uses snake_case (underscores separating the words of the phrase), while Java uses camelCase (capitalizing each word after the first, as in startsWith
or getFirstName
).
Java also uses capitalization to distinguish global constants (public static final
) from variables and local constants.
ALL_CAPS_WITH_UNDERSCORES
is used for static final
constants.
But the local variables declared inside a method, including local constants like secondsPerDay
above, use camelCaseNames.
In any language, method names are usually verb phrases, like getDate
or isUpperCase
, while variable and class names are usually noun phrases. Choose short words, and be concise, but avoid abbreviations. For example, message
is clearer than msg
, and word
is so much better than wd
. Keep in mind that many of your teammates in class and in the real world will not be native English speakers, and abbreviations can be even harder for non-native speakers.
Avoid single-character variable names entirely except where they are easily understood by convention.
For example, x
and y
make sense for Cartesian coordinates, and i
and j
as integer variables in for
loops.
But if your code is full of variables like e
, f
, g
, and h
, because you’re just picking them from the alphabet, then it will be incredibly hard to read.
Effectively Naming Software Thingies has some excellent advice about naming.
The leap
method has bad names: the method name itself, and the local variable name. What would you call them instead?
reading exercises
public static boolean leap(int y) {
String tmp = String.valueOf(y);
if (tmp.charAt(2) == '1' || tmp.charAt(2) == '3' || tmp.charAt(2) == 5 || tmp.charAt(2) == '7' || tmp.charAt(2) == '9') {
if (tmp.charAt(3)=='2'||tmp.charAt(3)=='6') return true;
else
return false;
}else{
if (tmp.charAt(2) == '0' && tmp.charAt(3) == '0') {
return false;
}
if (tmp.charAt(3)=='0'||tmp.charAt(3)=='4'||tmp.charAt(3)=='8')return true;
}
return false;
}
Which of the following are good names for the leap()
method?
(missing explanation)
Use whitespace to help the reader
Use consistent indentation. The leap
example is bad at this. The dayOfYear
example is much better. In fact, dayOfYear
nicely lines up all the numbers into columns, making them easy for a human reader to compare and check. That’s a great use of whitespace.
Put spaces within code lines to make them easy to read. The leap example has some lines that are packed together — put in some spaces.
Never use tab characters for indentation, only space characters. Note that we say characters, not keys. We’re not saying you should never press the Tab key, only that your editor should never put a tab character into your source file in response to your pressing the Tab key. The reason for this rule is that different tools treat tab characters differently — sometimes expanding them to 4 spaces, sometimes to 2 spaces, sometimes to 8. If you run “git diff” on the command line, or if you view your source code in a different editor, then the indentation may be completely screwed up. Just use spaces. Always set your programming editor to insert space characters when you press the Tab key. The 6.031 Eclipse installation has already been set up this way.
Smelly example #3
Here’s a third example of smelly code that will illustrate the remaining points of this reading.
public static int LONG_WORD_LENGTH = 5;
public static String longestWord;
public static void countLongWords(String text) {
String[] words = text.split(' ');
if (words.length == 0) {
System.out.println("0");
return;
}
int n = 0;
longestWord = "";
for (String word: words) {
if (word.length() > LONG_WORD_LENGTH) ++n;
if (word.length() > longestWord.length()) longestWord = word;
}
System.out.println(n);
}
Don’t use global variables
Avoid global variables. Let’s break down what we mean by global variable. A global variable is:
- a variable, a name whose value can be changed
- that is global, accessible and changeable from anywhere in the program.
Global Variables Are Bad has a good list of the dangers of global variables.
In Java, a global variable is declared public static
. The public
modifier makes it accessible anywhere, and static
means there is a single instance of the variable.
However, with one additional modifier, public static final
, and if furthermore the variable’s type is immutable, then the name becomes a global constant.
A global constant can be read anywhere in the program but never reassigned or mutated, so the risks go away.
Global constants are common and useful.
In general, convert global variables into parameters and return values, or put them inside objects that you’re calling methods on. We’ll see many techniques for doing that in future readings.
Kinds of variables in snapshot diagrams
When we’re drawing snapshot diagrams, it’s important to distinguish between different kinds of variables:
- local variables inside a method
- instance variables inside an instance of an object
- static variables associated with a class
A local variable comes into existence when a method is called, and then disappears when the method returns. If multiple calls to the same method are in progress (for example because of recursion), then each call will have its own independent local variables.
An instance variable comes into existence when an object is created with new
, and then disappears when the object is no longer accessible and becomes garbage-collected.
Each object instance has its own instance variables.
A static variable comes into existence when the program starts (or more precisely when the class containing the variable is first loaded, since that might be delayed), and exists for the rest of the life of the program.
Here is a simple piece of code that uses all three kinds of variables:
class Payment {
public double value;
public static double taxRate = 0.05;
public static void main(String[] args) {
Payment p = new Payment();
p.value = 100;
taxRate = 0.05;
System.out.println(p.value * (1 + taxRate));
}
}
A snapshot diagram shows each variable differently (see the diagram on the right).
The local variables p
and args
are shown in a stack frame representing a call to the main
method.
Stack frames are rectangular and pile on top of each other.
For example, when main
calls println
, then a stack frame for println
appears on top of the main
frame, containing the local variables for println
.
The frame disappears when println
returns.
The instance variable value
appears inside every Payment object.
The static variable taxRate
appears outside any Payment object, because it belongs to the class.
Any number of Payment
objects can be created and destroyed, and their value
variables along with them, but there is one and only one Payment class in the whole program, so there is likewise exactly one Payment.taxRate
variable.
System.out
is another static variable used in the code, so the snapshot diagram shows it as well.
reading exercises
Making a variable into a constant by adding the final
keyword can eliminate the risk of global variables. What happens to each of these when the final
keyword is added? Look carefully at how the variable is used in Smelly Example #3.
(missing explanation)
(missing explanation)
(missing explanation)
(missing explanation)
(missing explanation)
Methods should return results, not print them
countLongWords
isn’t ready for change. It sends some of its result to the console, System.out
. That means that if you want to use it in another context — where the number is needed for some other purpose, like computation rather than human eyes — it would have to be rewritten.
In general, only the highest-level parts of a program should interact with the human user or the console. Lower-level parts should take their input as parameters and return their output as results. The sole exception here is debugging output, which can of course be printed to the console. But that kind of output shouldn’t be a part of your design, only a part of how you debug your design.
reading exercises
If you were going to change countLongWords
to return the result that it currently prints to the console, which of the following method signatures would be capable of returning that value?
Which would be the best choice among the given options?
Capable of returning the result:
Best choice for returning the result:
(missing explanation)
The countLongWords
method actually has two results – the n
currently printed to the console, and longestWord
currently stored in a global variable.
Which of the following method signatures would be capable of returning both results, without printing to the console or using a global variable?
Which would be the best choice among the given options?
Capable of returning both results:
Best choice for returning both results:
(missing explanation)
Avoid special-case code
Programmers are often tempted to write special code to deal with what seem like special cases – parameters that are 0, for example, or empty lists, or empty strings.
The countLongWords
example falls into this trap by handling an empty words
list specially:
if (words.size() == 0) {
System.out.println(0);
return;
}
int n = 0;
longestWord = "";
for (String word: words) {
if (word.length() > LONG_WORD_LENGTH) ++n;
if (word.length() > longestWord.length()) longestWord = word;
}
System.out.println(n);
}
The starting if
statement is unnecessary.
If it were omitted, and the words
list is empty, then the for
loop simply does nothing, and 0
is printed anyway.
In fact, handling this special case separately has led to a possible bug – a difference in the way an empty list is handled, compared to a nonempty list that happens to have no long words on it.
Actively resist the temptation to handle special cases separately.
If you find yourself writing an if
statement for a special case, stop what you’re doing, and instead think harder about the general-case code, either to confirm that it can actually already handle the special case you’re worrying about (which is often true!), or put in a little more effort to make it handle the special case.
If you haven’t even written the general-case code yet, but are just trying to deal with the easy cases first, then you’re doing it in the wrong order.
Tackle the general case first.
Writing broader, general-case code pays off. It results in a shorter method, which is easier to understand and has fewer places for bugs to hide. It is likely to be safer from bugs, because it makes fewer assumptions about the values it is working with. And it is more ready for change, because there are fewer places to update when a change to the method’s behavior is made.
Some programmers justify handling special cases separately with a belief that it increases the overall performance of the method, by returning a hardcoded answer for a special case right away. For example, when writing a sort algorithm, it can be tempting to check whether the size of the list is 0 or 1 at the very start of the method, since you can then return immediately with no need to sort at all. Or if the size is 2, just do a comparison and a possible swap. These optimizations might indeed make sense — but not until you have evidence that they actually would matter to the speed of the program! If the sort method is almost never called with these special cases, then adding code for them just adds complexity, overhead, and hiding-places for bugs, without any practical improvement in performance. Write a clean, simple, general-case algorithm first, and optimize it later, only if it would actually help.
reading exercises
Smelly Example #3 (reproduced below) starts with an if
statement that handles a special case, prints 0
, and returns immediately.
But this makes the behavior of the special case inconsistent with other cases where the code would print 0
, for example when the list is nonempty but has no long words on it.
Which line of code from the general-case code did the special case also need to do?
if (words.size() == 0) {
System.out.println(0);
return;
}
/*A*/ int n = 0;
/*B*/ longestWord = "";
/*C*/ for (String word: words) {
/*D*/ if (word.length() > LONG_WORD_LENGTH) ++n;
/*E*/ if (word.length() > longestWord.length()) longestWord = word;
}
/*F*/ System.out.println(n);
}
(missing explanation)
Summary
Code review is a widely-used technique for improving software quality by human inspection. Code review can detect many kinds of problems in code, but as a starter, this reading talked about these general principles of good code:
- Don’t Repeat Yourself (DRY)
- Comments where needed
- Fail fast
- Avoid magic numbers
- One purpose for each variable
- Use good names
- Use whitespace to help the reader
- Don’t use global variables
- Methods should return results, not print them
- Avoid special-case code
The topics of today’s reading connect to our three key properties of good software as follows:
Safe from bugs. In general, code review uses human reviewers to find bugs. DRY code lets you fix a bug in only one place, without fear that it has propagated elsewhere. Documenting your assumptions with clear comments makes it less likely that another programmer will introduce a bug. The Fail Fast principle detects bugs as early as possible. Avoiding global variables makes it easier to localize bugs related to variable values, since non-global variables can be changed in only limited places in the code.
Easy to understand. Code review is really the only way to find obscure or confusing code, because other people are reading it and trying to understand it. Using judicious comments, avoiding magic numbers, keeping one purpose for each variable, using good names, and using whitespace well can all improve the understandability of code.
Ready for change. Code review helps here when it’s done by experienced software developers who can anticipate what might change and suggest ways to guard against it. DRY code is more ready for change, because a change only needs to be made in one place. Returning results instead of printing them makes it easier to adapt the code to a new purpose.
Code review is not the only software development practice with strong supporting evidence. Another is: sleep.
Asking questions about the reading
Have a question about the reading, or about a reading exercise?
Post your question on Piazza, and include a link that jumps directly to the part of the reading or the reading exercise that you’re asking about.
You can get that link by moving your mouse cursor into the left margin until an octothorpe #
link appears next to what you want to refer to.
Click on that #
, and your browser’s address bar now has the link you want.
reading exercises
Remember the exercises
At this point you should have completed all the reading exercises above.
Completing the reading exercises prepares you for the nanoquiz at the beginning of each class meeting, and submitting the exercises is required by 10:00pm MIT time the evening before class.
At this point you should have also completed these additional levels of the Java Tutor: