Programming Habits
Thursday, Feb 15th, 2007
Over the years, I have discovered many things about the art of programming. Some I later discovered to be wrong, others I forgot and then rediscovered, others yet were discoveries of the extent of what I must yet discover. I also developed a set of habits in the domain of administering programming projects, whether they be my own, or by a small group, or larger, corporate affairs. These habits are, on the one hand, essential to the success of any software (that is not so small or so lucky that it manages regardless by sheer chance), on the other hand sufficiently easy to describe as to fit in a single essay, and on the third hand not known and loved as universally as they should be. Therefore, I wrote that single essay, and you are now reading it.
This document is organized as a collection of individual, bite-sized pieces of advice, grouped about six general topics. One reason for this is that they interact with each other, enough that it is impossible to write them down linearly, without forward references. The other reason is to allow you to jump around, to skip the bits you already know, to revisit bits you want to think about again, or to just run down the list without having to sift through the explanations.
Version Control is a means to systematically track the history of a piece of software as it is being developed. This task is carried out by a thing called a version control system, such as CVS or Subversion. Said thing maintains, explicitly or otherwise, a history of the code being controlled, and provides operations to acquire (usually called check out) the current version of the code, to check in or commit one's modifications to it, and to update one's working copy to incorporate changes made by others. The version control system also provides functions for examining the history of the code in various ways, for undoing various changes made and/or returning to previous points in the software's history, and for resolving conflicts (two people changing the same code at the same time in different ways).
Anyone who has ever hit the "Undo" button in a text editor or recovered a backup file knows the wonders of having the computer remember things that are not currently relevant or immediately apparent. Since code is more complicated than normal documents, this is even more important in software. And if you haven't had the pleasure of developing software with others, take my word for it that all the potential for confusion and data loss is multiplied by the presence of more than one person. For these reasons, version control is an invaluable tool in the process of producing software, and it becomes even better if used well.
So, how do you use version control well?
- Use a good version control system. My current favorite is Subversion — it's effective, it's Free, it's fairly popular and getting more so, and few of its defects are effectively corrected by any other system. I will not claim it's perfect, but it's a fine default choice until you know better.
- Version from the start. If there is anything about a project worth recording in any more permanent form than a scrap of paper, it's worth checking in to version control.
- Corollary: If you have already started a project without version control, set it up and check it in right now.
- Version even if you work alone. Yourself three weeks from now is a surprisingly different person from yourself today. Following good version control practice will make your intentions clearer as you carry them out, and will be invaluable if you ever forget anything about what you were doing (which, believe me, you will). Besides, you get great backups for free, so there's really nothing to lose, and everything to gain.
- Version everything humans create. Code (of course), test cases, build scripts, documentation, to-do lists, explanations, presentations, ideas, requirements — anything to do with the project that is created by human minds should be in the version control system unless and until you find a good reason to put it somewhere else.
- Do not version computer-generated files. Doing so only introduces another way the state of the project could become inconsistent (e.g., if someone checks in a source file and forgets to regenerate some dependent generated file). Much better to ask the version control system to ignore those generated files, and just regenerate them whenever they are needed. But do version every original human-created thing needed to generate those files, including the commands that carry out the generation process.
- Write good log messages. Every good version control system will ask you for a log message with every commit, whose purpose is to explain what you were doing when you checked that code in. Do not blow these off; write them, and write them well.
- You are writing for yourself as much as for others. Being conscientious about writing good log messages will force you to think clearly about your designs, be honest about your hacks, and remain aware of what you are doing, as well as informing anyone (including your own future self) who wants to know what was going on in the head of whoever wrote the code they are now staring at.
- Write what you did (and when needed why), not so much how. If the "how" is interesting or not clear from the change itself, write it, of course, but a lot of the time the code will reveal the "how" well enough. If anything is unclear, it is likely to be your intentions.
- Describe everything you did. The version control system can help you figure out what changes you made; do try to talk about all of them. Corollary: Try not to do things that are so complicated that you won't be able to explain them. Break such activities up into multiple smaller steps and check them in one by one.
- Never break the build. Every time you check in, the system should be functional. Someone updating to that point should be able to compile it (if applicable) and run the test suite, and have the test suite pass. Checking in errors is extremely rude to everyone you are working with (including your aforementioned future self), because it becomes impossible for them to know whether the system is broken because of something they did or because you checked in something bad. Corollary: If you do break the build, apologize, and make fixing it your top priority.
- Commit small, semantic changes. Ideally, every commit should consist of a single act, and every log message should amount to a paragraph with a single topic sentence. As a good rule of thumb for deciding whether two related things are one or two semantic changes, ask yourself whether someone might want to undo one of them without undoing the other. If the answer is "yes", commit them separately.
- Don't keep uncommitted changes for long. The longer something goes uncommitted, the more likely you are to forget what you were doing, to introduce bugs, or to collide with others working on something related. If you haven't committed your changes, you are not done for the day, unless you have a very good reason not to check whatever it is in.
A build system is a collection of automations of the tasks of software development. The most common such task, and the one from which these systems get their name, is compiling the code. Running the test suite against the code is also something the build system will allow one to do, as are other things like generating browsable documentation from code comments, putting together tarballs for release, etc. Automating these tasks saves an insane amount of human time and effort, and prevents errors that arise from these tasks being done incorrectly or neglected through laziness.
The world has tools for this. The UNIX utility make has been the standard in build automation for eons, and definitely does the job well. ant is popular in the Java world, and I personally like rake. Modern IDEs may also provide some of this functionality, or hooks for calling out to these standard tools. Whichever tool you end up using, compiling, testing, and whatever else may be applicable should happen at the push of a simple button.
- Use a real build tool. Learning a whole new tool for this task of building can be daunting, but it is worth your while. Your project will outgrow a trivial shell script in no time, whereas no project is too small for the standard tools, and very few projects are too big. 
- Automate everything. Compiling and testing should be automated right from the start; generation of documentation or code, cleanup, installation, etc., as soon as you start doing them. In general, any task that will get done more than once is a good candidate for automation.
- Communicate the automation clearly. In particular, there should be a well-defined "build" that people should not break when they commit. Typically this will consist of successful compilation and execution of the test suite, but in any case, make it clear what command runs it, make it clear that it should always work, and ensure that it is obvious whether or not it worked in any given run.
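To make the idea concrete, here is a toy sketch (in Python, with an invented project layout of src/ and tests/ directories) of the kinds of tasks a build system automates and ties together. A real project should reach for make, ant, or rake rather than grow a script like this, but the shape of the automation is the same:

```python
import subprocess
import sys

def compile():
    """Byte-compile everything under src/; check=True stops the build on errors."""
    subprocess.run([sys.executable, "-m", "compileall", "-q", "src"], check=True)

def test():
    """Run the whole test suite; a failure here is a build breakage."""
    subprocess.run([sys.executable, "-m", "unittest", "discover", "tests"], check=True)

def dist():
    """Package a release tarball, but only from a compiled, tested tree."""
    compile()
    test()
    subprocess.run(["tar", "czf", "project.tar.gz", "src"], check=True)

# One simple button per task, e.g.:  python build.py test
TASKS = {"compile": compile, "test": test, "dist": dist}

if __name__ == "__main__" and len(sys.argv) > 1:
    TASKS[sys.argv[1]]()
</imports>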
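To make the idea concrete, here is a toy sketch (in Python, with an invented project layout of src/ and tests/ directories) of the kinds of tasks a build system automates and ties together. A real project should reach for make, ant, or rake rather than grow a script like this, but the shape of the automation is the same:

```python
import subprocess
import sys

def compile():
    """Byte-compile everything under src/; check=True stops the build on errors."""
    subprocess.run([sys.executable, "-m", "compileall", "-q", "src"], check=True)

def test():
    """Run the whole test suite; a failure here is a build breakage."""
    subprocess.run([sys.executable, "-m", "unittest", "discover", "tests"], check=True)

def dist():
    """Package a release tarball, but only from a compiled, tested tree."""
    compile()
    test()
    subprocess.run(["tar", "czf", "project.tar.gz", "src"], check=True)

# One simple button per task, e.g.:  python build.py test
TASKS = {"compile": compile, "test": test, "dist": dist}

if __name__ == "__main__" and len(sys.argv) > 1:
    TASKS[sys.argv[1]]()
```

Notice that dist depends on compile and test, so the "well-defined build" is encoded once, in one place, rather than in each developer's memory.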
A test suite is a collection of tests whose purpose is to verify that code works. It is automated if it can be executed, and if its results can be evaluated, entirely by computer.
Tests can be categorized according to the amount of code they exercise: A unit test verifies the functionality of a single software component, such as a function or a class. An integration test verifies that several components, possibly the entire system, work together properly. A functional test verifies that the system (typically all or much of it) behaves correctly at a reasonably high level. These last two definitions overlap, though with somewhat different flavors, and in my experience there is no industry-wide consensus on which is which; both terms are often used mainly to distinguish such tests from unit tests.
I will not dwell on the merits of good automated tests. At present, the industry is divided as to who should be testing software, how, how much, how automated the tests should be, when they should be written, etc. This subject sees plenty of discussion on the Internet, and I will not waste your time by repeating it. Suffice it to say that I personally believe that a good test suite, consisting of adequate coverage both by unit and functional/integration tests, is an essential piece of any software project, and that it should be written at the same time and by the same people as the main code (though help from dedicated testers is a nice bonus, for projects that are large enough to afford them).
The world has tools for writing test suites: Java has JUnit, Python has PyUnit, Ruby has Test::Unit, etc. Practically every programming language has a testing library in the xUnit style, and most offer alternatives. Do not fear this profusion of options: the presence and execution of a test suite is far more important than which particular testing library you use.
- The test suite should be checked in. The tests are as much a part of the code as the stuff that runs in production, if not more. Version them. They should be shared, backed up, and tracked like all the rest of it, and everyone should run the same suite.
- The test suite should be automatic. There should be one clear button (such as the command make test) that executes all the tests and reports the results.
- The test suite should be unambiguous. After running the tests, it should be obvious whether they passed or failed, and if any failed, which ones. There should be absolutely no human judgment involved in deciding whether something is a success or a failure. The tests should not produce any output that might obscure the report from the framework.
- The tests should always pass. Checking in something that fails tests constitutes a build breakage, and should never be done (or if done accidentally, should be immediately fixed). If you know the code is right and the test is wrong, fix the test. If you need to check in but some test is failing for a stupid reason that you don't have time to debug immediately, remove it from the suite temporarily, but solve that problem as soon as possible.
- Run the tests often. When you update, before you commit, during development. Running the test suite (or at least a relevant portion of it) should be part of your development cycle.
- Write the tests first. No, really. When fixing a bug, first write a test that fails because of it, then fix the bug. When the test passes, you know the bug is fixed. When adding a feature, first write tests that require it. This helps you understand what the feature ought to do, and tells you when you have got it right.
- Tests are executable documentation. Unlike normal documentation, they never lie, because everyone runs them and they always pass. If you think a piece of code is confusing, write unit tests for it. If you write real documentation, write tests that verify every statement you make in it.
- Test everything. If it's not blindingly obvious that a piece of code works just by looking at it, write tests that check that it works. You'll be surprised at how many bugs you catch, and how many you prevent from ever forming.
- Only test what actually matters. It's all too easy to write tests of the form "when I run this program on this input, it should produce these 10,000 lines of output that I have in this file." While often better than no test at all, this is a terrible kind of test, because alongside the things that actually matter, it also tests a million irrelevant details (like floating point round-off). Updating these kinds of tests to reflect desired changes in the program is painful at best, and at worst prone to letting bugs slip in.
Code review is the process and practice of reading over code written by someone else in order to detect errors, suggest improvements, etc. It goes a long way towards cleaner, better designed, and generally superior code. A second pair of eyes and a second perspective on the problem at hand are extremely helpful for a clearer, better solution. Code reviews also help programmers educate each other about useful techniques, methods, styles, etc. As soon as there is more than one person working on a project, start reviewing one-another's code. Ideally, every line of code should be read by at least two people: its author, and a reviewer.
There are many choices to be made about code reviewing practice. When should code be reviewed? How thoroughly? Who should do the reviewing? In the limit of a large project worked on full time by many people, every piece of code should be reviewed as soon as (or before, if there is good tool support for this) it is checked in. Every piece of code should be reviewed by someone who knows the code around it, and can understand its effects (and, importantly, any mistakes it might be making). In this scenario, the review should be prompt (within a day or two), and the author should try not to change the same code further until the review is done, to avoid muddling the process.
Smaller projects with fewer people will necessarily not work this way. If there isn't much code, it evolves quickly, and the project can't afford the delay introduced by reviewing every checkin. Nevertheless, code reviews are good for the code and for the people writing it (since, among other things, they have the effect that at least two people understand any given piece of code). Use whatever pattern fits the project, but get into the habit of reviewing code and asking others to review yours.
If you are the reviewer:
- Be prompt. If someone asked you for a code review, either do it or let them know you won't (and redirect to a more qualified reviewer) quickly. Don't make the author wait.
- Be respectful. The objective of the review is to ensure code quality, not figure out who is smarter than whom. As the reviewer, you have a great deal of power — do not abuse it.
- Be thorough. If you don't understand something, it's either not coded clearly enough or not commented thoroughly enough, or both. Ask the author to clarify (not only to you personally, but in the code). If something seems wrong, it probably is; and if it's right but looks wrong, it's confusingly written. Either way, bring it up.
- Enforce policies. If your project has policies or conventions (code style, naming conventions, testing, etc), ask the author to correct any violations. This may seem like nitpicking in some cases, but those policies and conventions were presumably established for a good reason, so they should be upheld.
If you are the author:
- Be respectful. The code reviewer is your friend, and is giving you good advice. If you disagree with any of it, that may be a good subject for a constructive discussion. If the reviewer misunderstood something in your code, you probably didn't code it clearly enough.
- Don't take criticism personally. Code reviews are about improving code, not about bruising (or boosting) egos. A review will necessarily focus on the things you got wrong, because that's where the improvement can happen. The criticism in any decent code review will all be constructive (including, possibly, the constructive request to think more deeply about some aspect of the code), and if it's not, that may be a good thing to have a polite conversation about with the reviewer.
- Get reviews early and often. Few things in software are as obnoxious as having to review an enormous pile of new code — except writing one and having a reviewer discover that you did something stupid in all of it. This goes hand in hand with making and checking in small changes one at a time. Even if you have to do what seems to be a little extra work to break your piece into multiple working, reviewable chunks, the mistakes saved will pay you back a hundred-fold. Ten 100-line code reviews are much nicer to do than one 900-line one, and doing it that way will save a great deal of work finding and fixing bugs.
Refactoring is the process of rewriting a piece of code so as to leave its execution behavior intact, but improve it in some other way. Usually the purpose is to make the code clearer and more readable, or more extensible, or perhaps faster. Activities of this sort can be called refactoring at any size scale. Renaming a variable or a function is a refactoring, as is shifting around responsibilities between classes, as is splitting a 100,000-line ball of tangled dependencies into a comprehensible plugin architecture with comparatively small, independent components. Of course, the abstraction barriers that are preserved in these three scenarios are vastly different, but the theme of reworking the guts of some code without affecting its interactions with the outside world runs through all three.
Refactoring is good for the code and for the soul. Do it. There is no shame or insult in rewriting a piece of code. Think of the first version as a draft. It helps you sketch out what the problem is like, and what sorts of things you need from a solution. The existence of working code is a wonderful thing for setting up a test suite that defines exactly what problem you want to solve. And then, that in hand, you can modify that initial solution and improve it. And then improve the second draft, and the third, as long as there is room to improve. Likewise, do not think you are wasting time if you are not changing the externally visible behavior of the code. On the contrary, by making code cleaner, you make it easier to read and comprehend, easier to maintain and expand, easier to detect and correct bugs in. This investment of energy before the deadline pays off immensely when the desiderata change in the eleventh hour, as they are all too prone to do.
So, how to refactor:
- Do refactor. I cannot stress the value of this enough.
- Do not mix refactoring with actual change. The guarantee that the code's behavior is the same before you started and after you finished is an immensely powerful tool for verifying and validating your work. To the extent possible, check in your refactorings as such, one by one, and check in your actual changes as separate commits.
- Test what you refactor. With all but the simplest of refactorings, if it's complicated enough to refactor, it's complicated enough to be worth testing. Conveniently, by the definition of refactoring, there is some boundary at which the code's behavior does not change. If your test suite already has sufficient tests at that boundary, great. If not, write some, and check them in before you start. It's a great way to make sure your refactoring doesn't break anything.
- Refactor in bite-size pieces. It's all too easy to say "I need to rewrite this program" and go out and try to redo it all in one fell swoop. It never works. You never get it to be the same, and then you wonder whether the difference is a bug or a feature, and what did the old code do about this anyway, etc, etc, etc. Disaster. Especially with refactoring, when the objective is, in a sense, to stay in the same place, it is always possible to break the journey down into steps small enough to validate.
- Don't be afraid of stupid-looking intermediate states. If, for example, you want to change an interface, first add support for the new one, then convert all the clients to using it one by one, and then delete the old version. There is no reason to do this all at once — spread it out over several commits if it's too big. Sure you've got some intermediate states where the code looks stupid, and some clients are doing it one way and some the other, but the program works, the tests pass, so you can verify that what you've done so far didn't break anything unexpected. Every refactoring can be decomposed like this. Look for that decomposition, and avail yourself of it to reduce your work to manageable pieces.
- Don't get stuck in stupid intermediate states. If you started a chain of refactorings, finish it. The point is to make the code cleaner, but the intermediate states will typically have all the old nastiness plus the intended solution mixed together. Don't let the program live in that kind of state for long.
- Refactor to pave the way for actual change. How often have you started doing X only to discover that you need to change Y for it to work, and to change Y the way you want, you first need to rearrange Z, and got lost and made a huge mess? We've all been there, but to go there less often, predict that you'll need this, then refactor Z first, test it, check it in, then refactor Y, check that in, and then finally do X. You'll make nicer code and fewer mistakes, and even if you do mess something up, you'll be closer to a working state that you can go back to.
- Don't be afraid to delete code. Just because someone (possibly you) spent a long time writing it doesn't mean it's the right way to do the job. The code helped to define what the job is and what challenges arise in doing it, and its author's work is respected in the continued use of that knowledge. Even the knowledge that the job does not need to be done is useful, and sometimes worth all the effort spent writing code to do it. On the other hand, dead, unused code only clutters a program and makes it harder to read and understand, so delete it. If later you decide you needed it after all, the version control system will have it safe and sound for you to recover.
- Don't comment out code you mean to delete. I have known people to comment out blocks of code they were rewriting and check them in commented out. Don't do this. Commented blocks of code are not only unhelpful but actively confusing. If you mean it as a backup and an opportunity to return to the old state, remember that the version control system does that better. If you mean it as an explanation of what the new code is supposed to do, write that explanation in English instead. Whatever your purpose, whole blocks of commented code are not the best way to achieve it.
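The interface-change recipe can be sketched in Python (the class and method names are invented for illustration). Each numbered step is a separate commit, and the program works and the tests pass after every one of them:

```python
class Report:
    def __init__(self, rows):
        self.rows = rows

    # Step 1 (commit 1): add the new interface alongside the old one.
    def render(self, fmt="text"):
        if fmt == "text":
            return "\n".join(str(r) for r in self.rows)
        if fmt == "html":
            return "<ul>" + "".join("<li>%s</li>" % r for r in self.rows) + "</ul>"
        raise ValueError("unknown format: %r" % fmt)

    # Step 2 (commits 2..n): convert callers from render_text() to
    # render(), a few at a time; meanwhile the old entry point just
    # delegates, so both kinds of caller keep working.
    def render_text(self):
        return self.render("text")

    # Step 3 (final commit): when no callers remain, delete render_text().
```

The intermediate states look a little stupid (two ways to do the same thing), but every one of them is a working program that the test suite can vouch for.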
The phrase code style refers to the myriad of tiny, irrelevant choices we make almost without thinking when we code. How much do we indent a subblock? Do we put parentheses around the condition of an if statement? Do we put braces around for loops? Do we put spaces between the + operator and the summands? Do we compactify long-range close parentheses or indentation-match them? How long can a line get before we break it? These are little things, but doing them consistently makes a program appreciably more legible.
The world has tools for this. There are programs that will look over source code and verify that it obeys some style criteria, and programs that will even modify the source code to obey such criteria. I have never used them myself, nor been in projects that used them in earnest, so I cannot comment on their worth, but I would be remiss if I did not mention their existence. The advice below applies regardless of the use of such a program.
- Have a code style. All the little things that the compiler or interpreter doesn't care about should be done the same way throughout a project. It is not necessary to write it down at first, but do make sure that everyone knows it. As the project grows, a written document will become useful.
- Agree on the code style. Some polite, constructive discussions during the project's infancy, when it's still small enough that someone can go and carry out a change of style, could be quite helpful. There is no reason to keep doing something the way someone accidentally started doing it, if some alternative is clearly better. On the other hand, don't waste too much time. Consciously make any decisions that need making, and get on with the project.
- Respect the language's code style. If the community around the programming language(s) you are using has any code style conventions, follow them. Doing this offers the same benefits across projects as does having a code style within one project.
- Obey the code style. Even if you personally like your open curly braces on the same line, if the project has decided to put them on their own line, put them there. Consistency, and the readability it creates, are more important than personal taste.
- Maintain local consistency. If some file or component you are changing does not obey the global code style, somebody really ought to convert it, but if you don't have time for that now, obey the style prevalent around the change you are making. Local consistency is more important than global.
If any of this is new to you, I encourage you to incorporate it into your work. It may take a little while to get used to, but trust me, it's worth it. While you're figuring any of these things out, I encourage you to err on the side of overdoing rather than underdoing. By definition, not knowing about a principle constitutes a massive underdoing of it, so you haven't bracketed the problem until you find yourself in an excess of a good thing. When you know both that it is good and that you have too much, then you can trust your judgment on how much is just enough.
1. If your working copy dies, you have the repository whence to get another working copy. If your repository dies, then even if you haven't backed it up, you have a working copy (or several) from which to restore the current state and continue. It is, however, a good idea to back up the repository, so that you will be able to restore the history of the project.
2. In many programming environments it will be possible to check in experimental code in such a way that no one except the author is affected by it. Such code is not part of the build, so checking in a broken experiment does not, by definition, break the build. While this sort of thing may be necessary in some situations, I caution against abusing it — by avoiding the discipline of maintaining the build, you also avoid that discipline's benefits.
3. rake is like make, except it uses Ruby as the build specification language instead of the ad-hoc language that make interprets, and I find that that advantage outweighs rake's comparative youth and consequent lack of some sophisticated features.
4. If, nondenominational deity forbid, your project should outgrow even the tools' capacity, then you are justified in hiring build engineers to come up with a better, tailored solution, but that is a problem that strikes corporations, not people.
5. This is the code style issue that programmers of Lisp-like languages argue about, instead of the slew of those akin to the preceding three that plague their colleagues working with other tongues.
Thanks to Gregory Marton, Tanya Khovanova, and Arthur Gleckler for commentary during the evolution of this.