6.102
6.102 — Software Construction
Spring 2023

Reading 3: Code Review

Praxis Tutor exercises

Keep making progress on TypeScript by completing the following categories in the Praxis Tutor:

Software in 6.102

Safe from bugsEasy to understandReady for change
Correct today and correct in the unknown future. Communicating clearly with future programmers, including future you. Designed to accommodate change without rewriting.

Objectives

In today’s class, we will practice:

  • code review: reading and discussing code written by somebody else
  • general principles of good coding: things you can look for in every code review, regardless of programming language or program purpose

Code review

Code review is careful, systematic study of source code by people who are not the original author of the code. It’s analogous to proofreading a paper.

Code review really has two purposes:

  • Improving the code. Finding bugs, anticipating possible bugs, checking the clarity of the code, and checking for consistency with the project’s style standards.
  • Improving the programmers. Code review is an important way that programmers learn and teach each other — about new language features, changes in the design of the project or its coding standards, and new techniques. In open source projects, particularly, much conversation happens in the context of code reviews.

Code review is widely practiced in open source projects like Apache and Mozilla. Code review is also widely practiced in industry. In Google’s code review process, for example, you can’t push any code into the main repository until another engineer has signed off on it in a code review.

Why is code review widely used? There are many ideas out there for software development processes (some with buzzwords like Agile and Scrum). But code review is a practice with good evidence that it actually works. Hillel Wayne has collected some of the research highlights, among them that code review can find 70-90% of software defects, and that overwhelming numbers of software engineers at Google and Microsoft find it valuable and worth doing.

In 6.102, we’ll do code review on problem sets, as described in the Code Reviewing document on the course website.

Style standards

Most companies and large projects have coding style standards. These can get pretty detailed, even to the point of specifying whitespace (e.g., how deep to indent) and where curly braces and parentheses should go. These kinds of questions often lead to holy wars since they end up being a matter of taste and style.

In 6.102, we have no official style guide. If you are new to JavaScript and TypeScript and want a recommendation, Idiomatic JavaScript is appealing for its readability, and Google TypeScript Style Guide is good for TypeScript-specific concerns.

Visual Studio Code has an autoformatter (right-click and Format Document) whose default rules are similar to Google style.

But we won’t dictate where to put your curly braces in this class. That’s a personal decision that each programmer should make. It’s important to be self-consistent, however, and it’s very important to follow the conventions of the project you’re working on. If you’re the programmer who reformats every module you touch to match your personal style, your teammates will hate you, and rightly so. Be a team player.

But there are some rules that are quite sensible and target our big three properties in a stronger way than placing curly braces. The rest of this reading talks about some of these rules, at least the ones that are relevant at this point in the course, where we’re mostly talking about writing basic TypeScript. These are some things you should start to look for when you’re code reviewing other students, and when you’re looking at your own code for improvement. Don’t consider it an exhaustive list of code style guidelines, however. Over the course of the semester, we’ll talk about a lot more things — specifications, abstract data types with representation invariants, concurrency and thread safety — which will then become fodder for code review.

reading exercises

Caesar

The code-reviewing system we will use in this class is called Caesar. It’s specifically designed to automatically and randomly distribute code-reviewing to other students in the class.

We’ll use it for the first time during the Code-Reviewing class, but please visit Caesar now so that you can authenticate yourself before class starts.

Smelly example #1

Programmers often describe bad code as having a “bad smell” that needs to be removed. “Code hygiene” is another word for this. Let’s start with some smelly code.

function dayOfYear(month: number, dayOfMonth: number, year: number): number {
    if (month === 2) {
        dayOfMonth += 31;
    } else if (month === 3) {
        dayOfMonth += 59;
    } else if (month === 4) {
        dayOfMonth += 90;
    } else if (month === 5) {
        dayOfMonth += 31 + 28 + 31 + 30;
    } else if (month === 6) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31;
    } else if (month === 7) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30;
    } else if (month === 8) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31;
    } else if (month === 9) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31;
    } else if (month === 10) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30;
    } else if (month === 11) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30 + 31;
    } else if (month === 12) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30 + 31 + 31;
    }
    return dayOfMonth;
}

The next few sections and exercises will pick out the particular smells in this code example.

Don’t repeat yourself (DRY)

Duplicated code is a risk to safety. If you have identical or very similar code in two places, then the fundamental risk is that there’s a bug in both copies, and some maintainer fixes the bug in one place but not the other.

Avoid duplication like you’d avoid crossing the street without looking. Copy-and-paste is an enormously tempting programming tool, and you should feel a frisson of danger run down your spine every time you use it. The longer the block you’re copying, the riskier it is.

Don’t Repeat Yourself, or DRY for short, has become a programmer’s mantra.

The dayOfYear example is full of identical code. How would you DRY it out?

reading exercises

Don’t repeat yourself

Some of the repetition in dayOfYear() is repeated values. How many times is the number of days in April written in dayOfYear()?

(missing explanation)

Don’t repeat yourself

One reason why repeated code is bad is because a problem in the repeated code has to be fixed in many places, not just one. Suppose our calendar changed so that February really has 30 days instead of 28. How many numbers in this code have to be changed?

(missing explanation)

Don’t repeat yourself

Another kind of repetition in the code is dayOfMonth+=. Assume you have a list: const monthLength: Array<number> = [undefined, 31, 28, 31, 30, ..., 31]; so that monthLength[month] is the number of days in month, with January represented by 1.

Which of the following code skeletons could be used to DRY the code out enough so that dayOfMonth+= appears only once?

(missing explanation)

Comment where needed

Good software developers write comments in their code, and do it judiciously. Good comments should make the code easier to understand, safer from bugs (because important assumptions have been documented), and readier for change.

One kind of crucial comment is a specification, which appears above a function or above a class and documents the behavior of the function or class. In TypeScript, this is conventionally written as a documentation comment, starting with /** and including @-syntax, like @param and @returns for functions. Here’s an example of a spec:

/**
 * Compute the hailstone sequence.
 * See https://en.wikipedia.org/wiki/Collatz_conjecture#Statement_of_the_problem
 * @param n starting number of sequence; requires integer n > 0.
 * @returns the hailstone sequence starting at n and ending with 1.
 *         For example, hailstone(3) = [3, 10, 5, 16, 8, 4, 2, 1].
 */
function hailstoneSequence(n: number): Array<number> {
    ...
}

Specifications document assumptions. We’ve already mentioned specs a few times, and there will be much more to say about them in a future reading.

Another crucial comment is one that specifies the provenance or source of a piece of code that was copied or adapted from elsewhere. This is vitally important for practicing software developers, and is required by the 6.102 collaboration policy when you adapt code you found on the web. Here is an example:

// reverse a string of digits
// see https://stackoverflow.com/questions/1611427/reversing-a-string-in-javascript
const digitsReversed = digits.split('').reverse().join('');

One reason for documenting sources is to avoid violations of copyright. Small snippets of code on Stack Overflow are typically in the public domain, but code copied from other sources may be proprietary or covered by other kinds of open source licenses, which are more restrictive. Another reason for documenting sources is that the borrowed code can be buggy or fall out of date; the Stack Overflow post from which this code came has a lot of discussion about how this simple way to reverse a string does not work for strings with non-ASCII (i.e. non-English) characters in it!

Some comments are bad and unnecessary. Direct transliterations of code into English, for example, do nothing to improve understanding, because you should assume that your reader at least knows TypeScript:

while (n !== 1) { // test whether n is 1   (don't write comments like this!)
   ++i; // increment i
   l.push(n); // add n to l
}

But obscure code should get a comment:

const sum = n*(n+1)/2;  // Gauss's formula for the sum of 1...n

// here we're using the sin x ~= x approximation, which works for very small x
const moonDiameterInMeters = moonDistanceInMeters * apparentAngleInRadians;

It’s often good to divide your code into paragraphs, groups of lines with focused purposes, and start each paragraph with a comment as a topic sentence, explaining that purpose:

// prepare the rocket
...
...
...

// launch the rocket
...
...
...

// enter orbit
...
...

Once you start thinking about your code this way, you may realize that some or all of these paragraphs deserve to be helper functions, and then the comments might naturally become well-named function calls instead:

prepareRocket(...);
launchRocket(...);
enterOrbit(...);

A clear name often obviates the need for a comment.

The dayOfYear code needs some comments — where would you put them? For example, where would you document whether month runs from 0 to 11 or from 1 to 12?

reading exercises

Comment where needed

Which comments are useful additions to the code? Consider each comment independently, as if the other comments weren’t there.

/**
 * @param month month of the year, where January=1 and December=12  [C1]
 */
function dayOfYear(month: number, dayOfMonth: number, year: number): number {
    if (month === 2) {     // we're in February  [C2]
        dayOfMonth += 31;  // add in the days of January that already passed  [C3]
    } else if (month === 3) {
        dayOfMonth += 59;  // month is 3 here  [C4]
    } else if (month === 4) {
        dayOfMonth += 90;
    }
    ...
    } else if (month === 12) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30 + 31 + 31;
    }
    return dayOfMonth; // the answer  [C5]
}

(missing explanation)

Fail fast

Failing fast means that code should reveal its bugs as early as possible. The earlier a problem is observed (the closer to its cause), the easier it is to find and fix. As we saw in the first reading, static checking fails faster than dynamic checking, and dynamic checking fails faster than producing a wrong answer that may corrupt subsequent computation.

The dayOfYear function doesn’t fail fast — if you pass it the arguments in the wrong order, it will quietly return the wrong answer. In fact, the way dayOfYear is designed, it’s highly likely that a non-American will pass the arguments in the wrong order! It needs more checking — either static checking or dynamic checking.

reading exercises

Fail fast
function dayOfYear(month: number, dayOfMonth: number, year: number): number {
    if (month === 2) {
        dayOfMonth += 31;
    } else if (month === 3) {
        dayOfMonth += 59;
    } else if (month === 4) {
        dayOfMonth += 90;
    } else if (month === 5) {
        dayOfMonth += 31 + 28 + 31 + 30;
    } else if (month === 6) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31;
    } else if (month === 7) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30;
    } else if (month === 8) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31;
    } else if (month === 9) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31;
    } else if (month === 10) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30;
    } else if (month === 11) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30 + 31;
    } else if (month === 12) {
        dayOfMonth += 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30 + 31 + 31;
    }
    return dayOfMonth;
}

Suppose the date is February 9, 2019. The correct dayOfYear() result for this date is 40, since it’s the fortieth day of the year.

People write dates in various ways, and date libraries can represent dates in various ways. A programmer’s prior experience can easily lead them to make wrong assumptions about the arguments of dayOfYear().

Here are some ways that a programmer might try to call dayOfYear() for February 9, 2019. Which ones are buggy, and does the bug lead to a static error, dynamic error, or wrong answer?

dayOfYear(2, 9, 2019)

(missing explanation)

dayOfYear(1, 9, 2019)

(missing explanation)

dayOfYear(9, 2, 2019)

(missing explanation)

dayOfYear("February", 9, 2019)

(missing explanation)

dayOfYear(2019, 2, 9)

(missing explanation)

dayOfYear(2, 2019, 9)

(missing explanation)

Fail faster

Which of the following design changes (considered separately) would make the code fail faster if it were called with arguments in the wrong order?

function dayOfYear(month: string, dayOfMonth: number, year: number): number {
    ...
}

(missing explanation)

function dayOfYear(month: number, dayOfMonth: number, year: number): number {
    if (month < 1 || month > 12) {
        return -1;
    }
    ...
}

(missing explanation)

function dayOfYear(month: number, dayOfMonth: number, year: number): number {
    if (month < 1 || month > 12) {
        throw new Error("month out of range");
    }
    ...
}

(missing explanation)

enum Month { JANUARY, FEBRUARY, MARCH, ..., DECEMBER };
function dayOfYear(month: Month, dayOfMonth: number, year: number): number {
    ...
}

(missing explanation)

function dayOfYear(month: number, dayOfMonth: number, year: number): number {
    if (month === 1) {
        ...
    } else if (month === 2) {
        ...
    }
    ...
    } else if (month === 12) {
        ...
    } else {
        throw new Error("month out of range");
    }
}

(missing explanation)

Avoid magic numbers

There’s a computer science joke that the only numbers that computer scientists understand are 0, 1, and sometimes 2.

All other constants are called magic because they appear as if out of thin air with no explanation.

One way to explain a number is with a comment, but a far better way is to declare the number as a named constant with a good, clear name.

dayOfYear is full of magic numbers, which illustrate some of the reasons they should be avoided:

  • A number is less readable than a name. In dayOfYear, the month values 2, …, 12 would be far more readable as FEBRUARY, …, DECEMBER.

  • Constants may need to change in the future. Using a named constant, instead of repeating the literal number in various places, is more ready for change.

  • Constants may be dependent on other constants. In dayOfYear, the mysterious numbers 59 and 90 are particularly pernicious examples. Not only are they uncommented and undocumented, they are actually the result of a computation done by hand by the programmer – summing the lengths of certain months. This is less easy to understand, less ready for change, and definitely less safe from bugs. Don’t hardcode numbers that you’ve computed by hand. Use a named constant that visibly computes the relationship in terms of other named constants.

What about constants that seem constant and eternal, like π, or the total angle of a triangle, or the gravitational constant G? Is it necessary to give names to fundamental constants of mathematics and physics that don’t depend on other constants? First, the number itself may be complex to express, so repeating it multiple times is not safe from bugs. Is it easy to tell which is right: 3.14159265358979323846 or 3.1415926538979323846? Second, even these numbers encode design decisions that might change in the future, such as the number of digits of precision, or the units of measurement. A named constant is more ready for change when those design decisions change.

If you have a profusion of magic numbers in your code, it can be a sign that you need to take a step back and treat those numbers as data rather than named constants, and put them into a data structure that allows for simpler code. In dayOfYear, the lengths of the months (30, 31, 28, etc.) would be more readable, and far more DRY, if they were instead stored in a data structure like a list or map, e.g. monthLength[month].

reading exercises

Avoid magic numbers

In the code:

if (month === 2) { ... }

what might a reasonable programmer plausibly assume about the meaning of the magic number 2?

(missing explanation)

What happens when you assume

Suppose you’re reading some code that uses a turtle graphics library that you don’t know well, and you see the code:

turtle.rotate(3);

Just from reading this line (not consulting the documentation), which of the following are likely assumptions you might make about the meaning of the number 3?

(missing explanation)

Names instead of numbers

Consider this code, which is intended to draw a regular pentagon:

for (let i = 0; i < 5; ++i) {
    turtle.forward(36);
    turtle.turn(72);
}

The magic numbers in this code cause it to fail all three of our measures of code quality: it’s not safe from bugs (SFB), not easy to understand (ETU) and not ready for change (RFC).

For each of the following rewrites, judge whether it improves SFB, ETU, and/or RFC, or none of the above.

const five = 5;
const thirtySix = 36;
const seventyTwo = 72;
for (let i = 0; i < five; ++i) {
    turtle.forward(thirtySix);
    turtle.turn(seventyTwo);
}

(missing explanation)

const numbers: Array<number> = [5, 36, 72];
for (let i = 0; i < numbers[0]; ++i) {
    turtle.forward(numbers[1]);
    turtle.turn(numbers[2]);
}

(missing explanation)

const x = 5;
for (let i = 0; i < x; ++i) {
    turtle.forward(36);
    turtle.turn(360.0 / x);
}

(missing explanation)

const oneRevolution = 360.0;
const numSides = 5;
const sideLength = 36;
for (let i = 0; i < numSides; ++i) {
    turtle.forward(sideLength);
    turtle.turn(oneRevolution / numSides);
}

(missing explanation)

Now you see it, now you don’t

In which of the following lines of code is 0 a magic number?

(missing explanation)

Magic strings

Louis Reasoner is designing a function to pluralize words in different languages. His first version looked like this:

/** pluralize `word` using the pluralization rules of `language` */
function pluralize(word: string, language: number) { ... }

pluralize('dog', 8);  // returns 'dogs' in English
pluralize('Hund', 7); // returns 'Hunde' in German

After receiving code review comments that his design had magic numbers in it, Louis changed his design to:

/** pluralize `word` using the pluralization rules of `language` */
function pluralize(word: string, language: string) { ... }

pluralize('dog', 'English');  // returns 'dogs'
pluralize('Hund', 'Deutsch'); // returns 'Hunde'

Which of the following comments are valid for Louis’s new design?

(missing explanation)

One purpose for each variable

In the dayOfYear example, the parameter dayOfMonth is reused to compute a very different value — the return value of the function, which is not the day of the month.

Don’t reuse parameters, and don’t reuse variables. Variables are not a scarce resource in programming. Introduce them freely, give them good names, and just stop using them when you stop needing them. You will confuse your reader if a variable that used to mean one thing suddenly starts meaning something different a few lines down.

Not only is this an ease-of-understanding question, but it’s also a safety-from-bugs and readiness-for-change question.

Function parameters, in particular, should generally be left unmodified. (This is important for being ready-for-change — in the future, some other part of the function may want to know what the original parameters of the function were, so you shouldn’t blow them away while you’re computing.) It’s a good idea to use const for as many variables as you can, because the const keyword says that the variable should never be reassigned, and the TypeScript compiler will check it statically. Unfortunately, TypeScript does not (yet) support declaring a function parameter as const, to prevent it from being reassigned in the body of the function.

Smelly example #2

There was a latent bug in dayOfYear: it didn’t handle leap years at all. As part of fixing that, suppose we write a leap-year function.

function leap(y:number):boolean {
    let tmp=y.toString();
    if(tmp[2]==='1'||tmp[2]==='3'||tmp[2]==='5'||tmp[2]==='7'||tmp[2]==='9') {
        if(tmp[3]==='2'||tmp[3]==='6') return true; /*R1*/
        else
            return false; /*R2*/
    }else{
        if(tmp[2]==='0' && tmp[3]==='0') {
            return false; /*R3*/
        }
        if(tmp[3]==='0'||tmp[3]==='4'||tmp[3]==='8') return true; /*R4*/
    }
    return false; /*R5*/
}

What are the bugs hidden in this code? And what style problems that we’ve already talked about?

reading exercises

Mental execution 2016

What happens when you call:

leap(2016);

(missing explanation)

Mental execution 2017

What happens when you call:

leap(2017);

(missing explanation)

Mental execution 10016

What happens when you call:

leap(10016);

(missing explanation)

Mental execution 916

What happens when you call:

leap(916);

(missing explanation)

Magic numbers

How many magic numbers are in this code? Count every occurrence if some appear more than once.

(missing explanation)

DRYing out

Suppose you wrote the helper function:

function isDivisibleBy(num: number, factor: number): boolean {
    return num % factor === 0;
}

If leap() were rewritten to use isDivisibleBy(year, ...), and to correctly follow the leap year algorithm, how many magic numbers would be in the code?

(missing explanation)

Use good names

Good function and variable names are long and self-descriptive. Comments can often be avoided entirely by making the code itself more readable, with better names that describe the functions and variables.

For example, you can rewrite

const tmp = 86400;  // tmp is the number of seconds in a day (don't do this!)

as:

const secondsPerDay = 86400;

In general, variable names like tmp, temp, and data are awful — symptoms of extreme programmer laziness. Every local variable is temporary, and every variable is data, so those names are generally meaningless. Better to use a longer, more descriptive name, so that your code reads clearly all by itself.

Follow the lexical naming conventions of the language. In both Python and TypeScript languages, classes typically start with a capital letter, and variable names and function names start with a lowercase letter. But the two languages differ in how multi-word phrases are rendered into a function or variable name. Python uses snake_case (underscores separating the words of the phrase), while TypeScript uses camelCase (capitalizing each word after the first, as in startsWith or getFirstName).

Java-like languages also use capitalization to distinguish global constants (const declarations outside of any function) from variables and local constants. ALL_CAPS_WITH_UNDERSCORES is used for global constants. But the local variables declared inside a function, including local constants like secondsPerDay above, use camelCaseNames.

In any language, function names are usually verb phrases, like getDate or isUpperCase, while variable and class names are usually noun phrases. Choose short words, and be concise, but avoid abbreviations. For example, message is clearer than msg, and word is so much better than wd. Keep in mind that many of your teammates in class and in the real world will not be native English speakers, and abbreviations can be even harder for non-native speakers.

Avoid single-character variable names entirely except where they are easily understood by convention. For example, x and y make sense for Cartesian coordinates, and i and j as integer variables in for loops. But if your code is full of variables like e, f, g, and h, because you’re just picking them from the alphabet, then it will be incredibly hard to read. Putting a little effort into choosing good names is worth the investment, because your code will need to be read (by you and by other people) more often than it’s written, and modern code editors make it easy to autocomplete a name.

Effectively Naming Software Thingies has some excellent advice about naming. Robert Martin’s Clean Code (chapter 2) is also highly recommended.

The leap function has bad names: the function name itself, and the local variable name. What would you call them instead?

reading exercises

Better function names
function leap(y:number):boolean {
    let tmp = y.toString();
    if (tmp[2] === '1' || tmp[2] === '3' || tmp[2] === '5' || tmp[2] === '7' || tmp[2] === '9') {
        if (tmp[3] === '2' || tmp[3] === '6') return true;
        else
            return false;
    } else {
        if (tmp[2] === '0' && tmp[3] === '0') {
            return false;
        }
        if (tmp[3] === '0' || tmp[3] === '4' || tmp[3] === '8') return true;
    }
    return false;
}

Which of the following are good names for the leap() function?

(missing explanation)

Better variable names

Which of the following are good names for the tmp variable inside leap()?

(missing explanation)

Skip the comment

Rename this variable so that the comment can be deleted entirely:

const a = 18;  // this variable stores the age when you can first vote

Rename a to:

(missing explanation)

Types

Good names imply something about their type – what kinds of values they can represent. Which of these names say something to the reader about their type?

(missing explanation)

Use whitespace and punctuation to help the reader

Use consistent indentation. The leap example is bad at this; The dayOfYear example is much better. In fact, dayOfYear nicely lines up all the numbers into columns, making them easy for a human reader to compare and check. That’s a great use of whitespace.

Put spaces within code lines to make them easy to read. The leap example has many lines that are packed together unreadably:

    if(tmp[2]==='1'||tmp[2]==='3'||tmp[2]==='5'||tmp[2]==='7'||tmp[2]==='9') {

Use spaces around binary operators like === and || to set them off and make them more visible, and use alignment on multiple lines to make similarities and differences pop out to the reader:

    if (   tmp[2] === '1'
        || tmp[2] === '3'
        || tmp[2] === '5'
        || tmp[2] === '7'
        || tmp[2] === '9' ) {

Never use tab characters for indentation, only space characters. Note that we say characters, not keys. We’re not saying you should never press the Tab key; only that your editor should never put a tab character into your source file in response to your pressing the Tab key. The reason for this rule is that different tools treat tab characters differently — sometimes expanding them to 4 spaces, sometimes to 2 spaces, sometimes to 8. If you run git diff on the command line, or if you view your source code in a different editor, then the indentation may be completely screwed up. Just use spaces. Always set your programming editor to insert space characters when you press the Tab key. That’s the default for Visual Studio Code.

Put a semicolon at the end of each statement when you are writing in a C-like language like TypeScript (or JavaScript, or C, or C++, or Java). Even though TypeScript/JavaScript has “automatic semicolon insertion” that puts in missing semicolons automatically, it doesn’t always do what you expect. Be consistent and punctuate all your statements.

Similarly, always use curly braces around the body of an if, while, or for statement. Even though C-like languages allow you to omit the curly braces when the body contains only one statement, this shortcut can lead to bugs. Be consistent and put curly braces around all blocks of code, no matter how many statements they have.

Smelly example #3

Here’s a third example of smelly code that will illustrate the remaining points of this reading.

let LONG_WORD_LENGTH = 5;
let longestWord;

function countLongWords(text: string): void {
    let words: Array<string> = text.split(' ');
    if (words.length === 0) {
        console.log("0");
        return;
    }
    let n = 0;
    longestWord = "";
    for (let word of words) {
        if (word.length > LONG_WORD_LENGTH) ++n;
        if (word.length > longestWord.length) longestWord = word;
    }
    console.log(n);
}

Don’t use global variables

Avoid global variables. Let’s break down what we mean by global variable. A global variable is:

  • a variable: a name whose value can be changed
  • that is global: accessible and changeable from anywhere in the program.

Global Variables Are Bad has a good list of the dangers of global variables.

In TypeScript, a global variable is declared outside of any function definition, using let or var.

However, if it is declared with const instead, and if furthermore the variable’s type is immutable, then the name becomes a global constant. A global constant can be read anywhere but never reassigned or mutated, so the risks go away. Global constants are common and useful.

In general, convert global variables into parameters and return values, or put them inside objects that you’re calling functions on. We’ll see many techniques for doing that in future readings.

reading exercises

Identifying global variables

In Smelly Example #3 above, which of these are global variables?

(missing explanation)

Effect of const

Making a variable into a constant by declaring it with the const keyword can eliminate the risk of global variables. What happens to each of these when the const keyword is used? Look carefully at how the variable is used in Smelly Example #3.

n

(missing explanation)

LONG_WORD_LENGTH

(missing explanation)

longestWord

(missing explanation)

word

(missing explanation)

words

(missing explanation)

Kinds of variables in snapshot diagrams

When we’re drawing snapshot diagrams, it’s important to distinguish between different kinds of variables:

  • a local variable inside a function
  • an instance variable inside an instance of an object
    • an instance variable may also be called a property (particularly in TypeScript/JavaScript), a member variable (C++), an attribute (Python), or a field (Java).
  • a global variable outside of any function or object

A local variable comes into existence when a function is called, and then disappears when the function returns. If multiple calls to the same function are in progress (for example because of recursion), then each call will have its own independent local variables.

An instance variable comes into existence when an object is created with new, and then disappears when the object is no longer accessible and becomes garbage-collected. Each object instance has its own instance variables.

A global variable comes into existence when its declaration is executed, and exists for the rest of the life of the program.

Here is a simple piece of code that uses all three kinds of variables:

let taxRate = 0.05;

class Payment {
    public value: number;
}

function main(): void {
    let payment: Payment = new Payment();
    payment.value = 100;
    taxRate = 0.05;
    console.log(payment.value * (1 + taxRate));
}

A snapshot diagram shows each variable differently. In the next two exercises, you will construct a snapshot diagram that shows the state of this program.

reading exercises

Drawing the heap

Snapdown is a little language for drawing snapshot diagrams with text. Open the Snapdown help sidebar to get the details of the syntax. The first three sections of the sidebar — Primitives, Objects, and Fields — will be useful for this exercise.

Edit the Snapdown code below so that the diagram on the right shows the final state of payment and taxRate at the end of the main function above.

paymentTODOtaxRate"change me! am I even the correct type?"

(missing explanation)

Drawing the stack

The variable payment is local to the main() function. The variable taxRate is global.

Using stack frames, draw a snapshot diagram that demonstrates the difference between these two variables. The Stack frames section of the Snapdown help sidebar will be useful here.

paymentTODOtaxRateChangeMyName"change me! am I even the correct type?"

(missing explanation)

Functions should return results, not print them

countLongWords isn’t ready for change. It sends some of its result to the console, using console.log(). That means that if you want to use it in another context — where the number is needed for some other purpose, like computation rather than human eyes — it would have to be rewritten.

In general, only the highest-level parts of a program should interact with the human user or the console. Lower-level parts should take their input as parameters and return their output as results. The sole exception here is debugging output, which can of course be printed to the console. But that kind of output shouldn’t be a part of your design, only a part of how you debug your design.

reading exercises

Returning results

If you were going to change countLongWords to return the result that it currently prints to the console, which of the following function signatures would be capable of returning that value? Which would be the best choice among the given options?

Capable of returning the result:

Best choice for returning the result:

(missing explanation)

Returning two results

The countLongWords function actually has two results – the n currently printed to the console, and longestWord currently stored in a global variable. Which of the following function signatures would be capable of returning both results, without printing to the console or using a global variable? Which would be the best choice among the given options?

Capable of returning both results:

Best choice for returning both results:

(missing explanation)

Avoid special-case code

Programmers are often tempted to write special code to deal with what seem like special cases – parameters that are 0, for example, or empty arrays, or empty strings. The countLongWords example falls into this trap by handling an empty words array specially:

if (words.length === 0) {
    console.log("0");
    return;
}
let n = 0;
longestWord = "";
for (let word of words) {
    if (word.length > LONG_WORD_LENGTH) ++n;
    if (word.length > longestWord.length) longestWord = word;
}
console.log(n);

The starting if statement is unnecessary. If it were omitted, and the words array is empty, then the for loop simply does nothing, and 0 is printed anyway. In fact, handling this special case separately has led to a possible bug – a difference in the way an empty array is handled, compared to a nonempty array that happens to have no long words on it.

Actively resist the temptation to handle special cases separately. If you find yourself writing an if statement for a special case, stop what you’re doing, and instead think harder about the general-case code, either to confirm that it can actually already handle the special case you’re worrying about (which is often true!), or put in a little more effort to make it handle the special case. If you haven’t even written the general-case code yet, but are just trying to deal with the easy cases first, then you’re doing it in the wrong order. Tackle the general case first.

Writing broader, general-case code pays off. It results in a shorter function, which is easier to understand and has fewer places for bugs to hide. It is likely to be safer from bugs, because it makes fewer assumptions about the values it is working with. And it is more ready for change, because there are fewer places to update when a change to the function’s behavior is made.

Some programmers justify handling special cases separately with a belief that it increases the overall performance of the function, by returning a hardcoded answer for a special case right away. For example, when writing a sort algorithm, it can be tempting to check whether the size of the list is 0 or 1 at the very start of the function, since you can then return immediately with no need to sort at all. Or if the size is 2, just do a comparison and a possible swap. These optimizations might indeed make sense — but not until you have evidence that they actually would matter to the speed of the program! If the sort function is almost never called with these special cases, then adding code for them just adds complexity, overhead, and hiding-places for bugs, without any practical improvement in performance. Write a clean, simple, general-case algorithm first, and optimize it later, only if it would actually help.

reading exercises

Bug hiding in a special case

Smelly Example #3 (reproduced below) starts with an if statement that handles a special case, prints 0, and returns immediately. But this makes the behavior of the special case inconsistent with other cases where the code would print 0, for example when the list is nonempty but has no long words in it. Which line of code from the general-case code did the special case also need to do?

         if (words.length === 0) {
             console.log("0");
             return;
         }
/*A*/    let n = 0;
/*B*/    longestWord = "";
/*C*/    for (let word of words) {
/*D*/        if (word.length > LONG_WORD_LENGTH) ++n;
/*E*/        if (word.length > longestWord.length) longestWord = word;
         }
/*F*/    console.log(n);

(missing explanation)

Summary

Code review is a widely-used technique for improving software quality by human inspection. Code review can detect many kinds of problems in code, but as a starter, this reading talked about these general practices for good code:

  • Don’t Repeat Yourself (DRY)
  • Comment where needed
  • Fail fast
  • Avoid magic numbers
  • One purpose for each variable
  • Use good names
  • Use whitespace and punctuation to help the reader
  • Don’t use global variables
  • Functions should return results, not print them
  • Avoid special-case code

The topics of today’s reading connect to our three key properties of good software as follows:

  • Safe from bugs. In general, code review uses human reviewers to find bugs. DRY code lets you fix a bug in only one place, without fear that it has propagated elsewhere. Documenting your assumptions with clear comments makes it less likely that another programmer will introduce a bug. The Fail Fast principle detects bugs as early as possible. Avoiding global variables makes it easier to localize bugs related to variable values, since non-global variables can be changed in only limited places in the code.

  • Easy to understand. Code review is really the only way to find obscure or confusing code, because other people are reading it and trying to understand it. Using judicious comments, avoiding magic numbers, keeping one purpose for each variable, using good names, and using whitespace well can all improve the understandability of code.

  • Ready for change. Code review helps here when it’s done by experienced software developers who can anticipate what might change and suggest ways to guard against it. DRY code is more ready for change, because a change only needs to be made in one place. Returning results instead of printing them makes it easier to adapt the code to a new purpose.

Code review is not the only software development practice with strong supporting evidence. Another is: sleep.

Asking questions about the reading

Have a question about the reading, or about a reading exercise? Post your question on Piazza, and include a link that jumps directly to the part of the reading or the reading exercise that you’re asking about. You can get that link by moving your mouse cursor into the left margin until an octothorpe # link appears next to what you want to refer to. Click on that #, and your browser’s address bar now has the link you want.

reading exercises

Every question is a good question

Find the paragraph in this reading that defines camelCase, get the link that jumps right to that paragraph, and paste the link here:

(missing explanation)

Jumping in