Pismere Coding Standards

by Danilo Almeida

NEW: New information includes C Pre-Processor macros, MIT copyright, Strings and Errors & Cleanup. The formatting information has been updated on using Visual Studio 6.0 and Emacs to help keep your code formatted properly.

Revision 0.6 - This is a draft. Please send feedback to pismere-team@mit.edu for discussion.

Comments that need to be addressed are stuck into the draft are in red.

TBD: exception handling.

NOTE: This first cut has been cobbled together from GNU and Athena coding standards. It still needs a bunch of re-org work. Please feel free to send suggestions. Currently, we only deal with C. We need to add C++-specific stuff.

Introduction A few words to get you started
Formatting Formatting Your Source Code
Header Files Header Files
C Pre-Processor Using the C Pre-Processor
Comments Commenting Your Work
Syntactic Conventions Clean Use of C Constructs
Names Naming Variables and Functions
Software Engineering Software engineering practices
Namespaces Symbol namespaces
Strings Using strings
Errors & Cleanup Error-handling and resource cleanup
Portability (3rd party) Portability considerations for code we get from third parties
MIT copyright An example of the copyright to place in source


Introducton

We want all of our code to have a consistent coding style. For code that comes from third parties, we will generally maintain the original style of that code.


Formatting Your Source Code

If possible, try to have no source code lines longer than 79 characters. Do not store tabs in a line after any non-tab character. Tab stops must be set to 8 characters in your editor. That should be independent from the indentation setting. The indentation setting should be 4 characters.

QUESTION: Do we want to outlaw tabs altogether?

Putting Visual Studio 6.0 to Work for You

You can use this registry file to set up Visual Studio 6.0 to have the proper tabbing defaults.

Putting Emacs to Work for You

You can use Emacs to help you format your source code. First, you add the emacs locker by running \\fqd.isnt.mit.edu\locker\addlocker emacs from a prompt. Then, from that prompt, start up Emacs with the command-line: runemacs.cmd (the .cmd is important).

You can then hit TAB to indent the current line or select an entire region (like, say, the entire buffer) and do M-x indent-region. You can also do M-x pismre-remove-ws to remove trailing whitespace from your files.

Back to Formatting

It is important to put the open-brace that starts the body of a C function in column zero, and avoid putting any other open-brace or open-parenthesis or open-bracket in column zero. Several tools look for open-braces in column zero to find the beginnings of C functions. These tools will not work on code not formatted that way.

It is also important for function definitions to start the name of the function in column zero. This helps people to search for function definitions, and may also help certain tools recognize them.

So as to make it easy to read, parse, and modify argument lists, we list the arguments one per line, indented at 4 characters. This makes it so that we don't have to worry about splitting arguments, etc. Thus, the proper format is this:

static char *
concat(
    char *s1,
    char *s2
    )
{
    ...
}

Put a comment explaining the function before the function definition. The format is decribed in the Commenting Your Work.

Write prototypes for all of your functions. Include argument names in your prototypes, to help the reader understand what they mean. Use the formatting described above. You should simply be able to copy the function header as the prototype. Leave the variable names in the prototype so as to make the prototypes more self-documenting (and to make it so you can just simply copy-paste when updating them).

When declaring local variables, only declare one variable per line. This makes teh code easier to read.

For the body of the function, we prefer code formatted like this:

    if (x < foo(y, z))
    {
        haha = bar[4] + 5;
    }
    else
    {
        while (z)
        {
            haha += foo(z, z);
            z--;
        }
        return ++x + bar();
    }

In an if-else clause, we always put the enclosing curly braces to reduce the possibility of error and to make the code easier to read. On a singleton if statement, the curly braces are optional.

We find it easier to read a program when it has spaces before the open parenthesis and after the commas. Especially after the commas. However, when making a function call, do not put a space before the open parenthesis. Do put spaces before the open parenthesis after "while", "if", or "for", though.

When you split an expression into multiple lines, split it after an operator, not after one. Also include extra ()s to make the operator precedence more explicit. This helps make the code easier to read. Here is the right way:

    if (foo_this_is_long && and_so_is_this_variable && (bar > win(x, y, z)) &&
        remaining_condition)

Try to avoid having two operators of different precedence at the same level of indentation. For example, don't write this:

    mode = (inmode[j] == VOIDmode ||
            GET_MODE_SIZE(outmode[j]) > GET_MODE_SIZE(inmode[j]) ?
            outmode[j] : inmode[j]);

Instead, use extra parentheses so that the indentation shows the nesting:

    mode = ((inmode[j] == VOIDmode) ||
            (GET_MODE_SIZE(outmode[j]) > GET_MODE_SIZE(inmode[j]))) ?
            outmode[j] : inmode[j]);

Insert extra parentheses so that Emacs will indent the code properly. For example, the following indentation looks nice if you do it by hand, but Emacs would mess it up:

    v = rup->ru_utime.tv_sec*1000 + rup->ru_utime.tv_usec/1000 +
        rup->ru_stime.tv_sec*1000 + rup->ru_stime.tv_usec/1000;

But adding a set of parentheses solves the problem:

    v = (rup->ru_utime.tv_sec*1000 + rup->ru_utime.tv_usec/1000 +
         rup->ru_stime.tv_sec*1000 + rup->ru_stime.tv_usec/1000);

Format do-while statements like this:

    do
    {
        a = foo(a);
    }
    while (a > 0);

When writing return statements, do not put parentheses around the return value; just write "return foo;".


Header Files

Each header file should check for and define a unique symbol to prevent it from being included multiple times during compilation. For example:

#ifndef __NameOfHeaderFile_ExtensionOfHeaderFile__
#define __NameOfHeaderFile_ExtensionOfHeaderFile__
...
#endif /* __NameOfHeaderFile_ExtensionOfHeaderFile__ */

If your header file name is not particularly unique, you can prefix it with the name of the compoment to which it belongs. Or you might add some unique string to the end.


Using the C Pre-Processor

Macros

When defining macros, use all-caps names. If you cannot do that for some reason, append _macro at the end of the macro. This will help people debugging your code to identify what are macros and what are real symbols.

PWhen writing a multi-line macro, please use the following convention:

  1. In general, a macro should be one of the following:
    a) a declaration macro (used in type of variable declarations)
    b) a data structure builder macro (to be used in an initializer)
    c) an expression
    d) a single statement
  2. A macro should NOT terminate in a semicolon.

  3. If you have a macro that looks like a block, you can convert it into a single C statement by enclosing the block with a do/while(0):
    #define SWAP(a, b, t) { (t) = (a); (a) = (b); (b) = (t); }
    

    would then become:
    #define SWAP(a, b, t) do { (t) = (a); (a) = (b); (b) = (t); } while(0)
    
    The reason to convert a block into a single statement is so that someone can write code that uses the macro as though it were a function returning void:
    if (...)
        SWAP(...);
    else
        SWAP(...);
    
    If the SWAP() macro were not enclosed in do/while(0), the above code would be an error.

  4. In general, a source code macro should be as readable as the corresponding source code (according to our coding standards). This means that you will need to break up multi-statement macros into multiple lines. To do line continuations, use a backslash at the end of each line. Line up all the backslashes in the same column. Using the above SWAP() macro as an example:
    #define SWAP(a, b, t) \
       do {               \
           (temp) = (a);  \
           (a) = (b);     \
           (b) = (temp);  \
       } while(0)
    

Make sure you comment any pre-processor symbols or macros that you define.

Defines

Never put an #ifdef around the start or end of a control block. That will screw up automagic indenting tools. So, never do something like this:

#ifdef FOO
    if (foo_func(x))
#else
    if (normal_func(x))
#endif

Instead, if you really need to do this, use:

    if (
#ifdef FOO
        foo_test(x)
#else
        normal_test(x)
#endif
        )

Avoid sprinkling your code with #ifdefs. Instead, use an abstraction for the functionality you wish to conditionally use. For example, instead of having the above code fragment sprikled thorughout a source file, use the following:

#ifdef FOO
#define TEST(x) foo_test(x)
#else
#define TEST(x) normal_test(x)
#endif

...

    if (TEST(x))

Under Win32, _WIN32 is always defined by the compiler. Test for that rather than WIN32.

The pismere source tree use NODEBUG in Makefiles to check whether we are building debug or not. If NODEBUG is set, we are building release binaries. If NODEBUG is not set, we are building debug binaries. When debug binaries are built, our Makefiles define _DEBUG. For release, we define NDEBUG. Only check for _DEBUG. We do this to resemble the Platform SDK definitions.


Commenting Your Work

Each C source and header file should begin with an MIT copyright statement, a comment describing what the file does. MIT copyright statements may be omitted for source code which is not distributed outside of MIT.

Every program should start with a comment saying briefly what it is for. Example: fmt - filter for simple filling of text.

Please write the comments in a program in English, because English is the one language that nearly all programmers in all countries can read. If you do not write English well, please write comments in English as well as you can, then ask other people to help rewrite them. If you can't write comments in English, please find someone to work with you and translate your comments into English.

Please put a comment on each function saying what the function does, what sorts of arguments it gets, and what the possible values of arguments mean and are used for. It is not necessary to duplicate in words the meaning of the C argument declarations, if a C type is being used in its customary fashion. If there is anything nonstandard about its use (such as an argument of type char * which is really the address of the second character of a string, not the first), or any possible values that would not work the way one would expect (such as, that strings containing newlines are not guaranteed to work), be sure to say so.

Also explain the significance of the return value, if there is one.

Please put two spaces after the end of a sentence in your comments, so that the Emacs sentence commands will work. Also, please write complete sentences and capitalize the first word. If a lower-case identifier comes at the beginning of a sentence, don't capitalize it! Changing the spelling makes it a different identifier. If you don't like starting a sentence with a lower case letter, write the sentence differently (e.g., "The identifier lower-case is ...").

The comment on a function is much clearer if you use the argument names to speak about the argument values. The variable name itself should be lower case, but write it in upper case when you are speaking about the value rather than the variable itself. Thus, "the inode number NODE_NUM" rather than "an inode".

Here is a simple example:

/*
 *  concat
 *
 *    Concatenates two strings and returns a pointer to the new string.
 *
 *  Parameters:
 *
 *    s1 - Pointer to a null-terminated string.
 *
 *    s2 - Pointer to a null-terminated string.
 *
 *  Returns:
 *
 *    A new null-terminated string that consists of the first string 
 *    followed by the second string.  If both s1 and s2 are null, null
 *    is returned.  If only one of them is null (or an empty string), the 
 *    returned string only contains the non-null string.
 *
 *  Remarks:
 *
 *    Use free_string to free the returned string.
 *
 *  Implementation:
 *
 *    We use a clever algorithm where...
 *
 */

Another big-block style you can use is

/******************************************************************************
 *
 *  concat
 *
 *  ...
 *
 *****************************************************************************/

Try to avoid in-function comments, explaining things up-front when appropriate. Do put comments inline when things may be tricky.

There should be a comment on each static variable as well, like this:

/* Nonzero means truncate lines in the display;
   zero means continue them.  */
int truncate_lines;

Every #endif should have a comment, except in the case of short conditionals (just a few lines) that are not nested. The comment should state the condition of the conditional that is ending, including its sense. #else should have a comment describing the condition and sense of the code that follows. For example:

#ifdef FOO
  ...
#else /* !FOO */
  ...
#endif /* !FOO */
#ifdef FOO
  ...
#endif /* FOO */

but, by contrast, write the comments this way for a #ifndef:

#ifndef FOO
  ...
#else /* FOO */
  ...
#endif /* FOO */
#ifndef FOO
  ...
#endif /* !FOO */

Clean Use of C Constructs

Please explicitly declare all arguments to functions. Don't omit them just because they are ints.

Declarations of external functions and functions to appear later in the source file should all go in one place near the beginning of the file (somewhere before the first function definition in the file), or else should go in a header file. Don't put extern declarations inside functions.

It used to be common practice to use the same local variables (with names like tem) over and over for different values within one function. Instead of doing this, it is better declare a separate local variable for each distinct purpose, and give it a name which is meaningful. This not only makes programs easier to understand, it also facilitates optimization by good compilers. You can also move the declaration of each local variable into the smallest scope that includes all its uses. This makes the program even cleaner.

Don't use local variables or parameters that shadow global identifiers.

Don't declare multiple variables in one declaration that spans lines. Start a new declaration on each line, instead. Only declare one variable per line. For example, instead of:

int    foo,
       bar;
or this:
int foo, bar;
write this:
int foo;
int bar;
(If they are global variables, each should have a comment preceding it anyway.)

When you have an if-else statement nested in another if statement, always put braces around the if-else. Thus, never write like this:

    if (foo)
        if (bar)
            win();
        else
            lose();
always like this:
    if (foo)
    {
        if (bar)
	{
            win();
	} else {
           lose();
	}
    }

If you have an if statement nested inside of an else statement, either write else if on one line, like this,

    if (foo)
    {
        ...
    }
    else if (bar)
    {
        ...
with its then-part indented like the preceding then-part, or write the nested if within braces like this:
    if (foo)
    {
        ...
    } else {
        if (bar)
            ...
    }

Don't declare both a structure tag and variables or typedefs in the same declaration. Instead, declare the structure tag separately and then use it to declare the variables or typedefs.

Try to avoid assignments inside if-conditions. For example, don't write this:

    if ((foo = (char *) malloc(sizeof *foo)) == 0)
        fatal("virtual memory exhausted");

instead, write this:

    foo = (char *) malloc(sizeof *foo);
    if (foo == 0)
        fatal("virtual memory exhausted");

This makes the program easier to debug in a debugger.

When using a switch statement:

    switch(foo) {
    case A:
    {
	...
	break;
    }
    case B:
    {
	...
	break;
    }
    default:
    {
	...
    } 
    } /* swtich */

Don't make the program ugly to placate lint. Please don't insert any casts to void. Zero without a cast is perfectly fine as a null pointer constant, except when calling a varargs function.


Naming Variables and Functions

The names of global variables and functions in a program serve as comments of a sort. So don't choose terse names -- instead, look for names that give useful information about the meaning of the variable or function. In a program, names should be English, like other comments.

Local variable names can be shorter, because they are used only within one context, where (presumably) comments explain their purpose.

Please use underscores to separate words in a name, so that the Emacs word commands can be useful within them. Stick to lower case; reserve upper case for macros and enum constants, and for name-prefixes that follow a uniform convention.

For example, you should use names like ignore_space_change_flag; don't use names like iCantReadThis.

Actually, in mixed case names are not so bad. However, never use names with multiple capital letters together.

For example, never use something like DCEIsAGoodAcronym. Rather, use something like dce_is_a_good_acronym or DceIsGoodAcronym. (In fact, you probably want something less verbose in the first place.)

Variables that indicate whether command-line options have been specified should be named according the meaning of the option, not after the option-letter. A comment should state both the exact meaning of the option and its letter. For example,

/* Ignore changes in horizontal whitespace (-b).  */
int ignore_space_change_flag;

When you want to define names with constant integer values, use enum rather than #define. This makes debugging easier.

Use filenames that will go over to UNIX easily. Remember that UNIX is case-sensitive and that ls on UNIX has a default way of sorting things. In general, keep file names lowercase, except for informational files like ChangeLog, NEWS, README, TODO, etc.


Software engineering practices

The following software engineering practices are strongly encouraged:

The following software engineering practices are discouraged:


Symbol namespaces

If you are writing a library, you should pick a prefix for your library. You should ensure that all symbols which interact with the application's namespace (both at link time and on inclusion of the header file) begin with your prefix (or the all-caps form of your prefix, in the case of preprocessor symbols and enumeration constants) followed by an underscore. Symbols which are not intended for use by the application should begin with your prefix followed by two underscores.

For instance, if your prefix is "hes", then functions exported to the user should begin with "hes_" (e.g. "hes_resolve"). Functions used internally by the library should begin with "hes__" (e.g. "hes__resolve" for the internal resolver function). Preprocessor symbols intended for use by the application should begin with "HES_" (e.g. "HES_ER_OK"); preprocessor symbols not intended for use by the application should begin with "HES__" (e.g. "HES__HESIOD_H", used to protect hesiod.h from multiple inclusion).

Names of structures should begin with your prefix, but structure fields don't need to. Strictly speaking, structure fields interact with the user's namespace because the user might have "#define"'d them to something before including your header file, but applications generally shouldn't be "#define"ing lots of lowercase symbols, so this is not a real worry.


Strings

Here are a few guidelines you should follow:

  1. In general, a function should dynamically allocate strings rather than having fixed arbitrary sizes for strings. This applies to buffers used inside a function, buffers passed into a function (i.e., the function should have a buffer size argument), and buffers returned by a function (i.e., the function allocates the buffer and the caller calls a function to free the buffer after the buffer is no longer needed).

  2. If a function is using fixed-size buffers, the function must check that it does not write past the end of the buffer no matter what the caller feeds the function. This can be done by using the "n" versions of the string handling routines and keeping track of buffer sizes when doing string operations. Failing to do so can result in unexpected crashes and even security holes.

Errors & Cleanup

A function will need to cleanup resources that it uses internally. However, a function must still be easy. It is not acceptable for a function's "normal" code path to be a sequence of nested if's that check for errors. Neither is it acceptable to embed resource cleanup at every point where an error might occur.

Rather, a function should initialize all resource variables to a known initial state and clean up all allocated resources in a single location. Critical errors should cause the function to execute this single block of cleanup code. The cleanup code will have a cleanup: label. On an error, the main function body should set an error flag and go to the label (i.e., goto cleanup:).

To properly be able to do this, the function must initialize all resources and error variables that may need to be checked or cleaned up on returning from the function. Here is an example:

int
check_dir_file(
    char* dir,
    char* file
    )
{
    int result = 0;
    char* buffer = 0;

    buffer = malloc(strlen(first) + strlen(file) + sizeof(MY_PATH_SEPARATOR));
    if (!buffer)
    {
        result = MY_ERROR_NO_MEMORY;
        goto cleanup;
    }
    /*
     * We know that first and file won't be changing lengths, so we can
     * safely write into the buffer w/o checking sizes.  We choose to use
     * strcat() to the start of the buffer each time because we prefer to
     * sacrifice a little speed to have the code a little more readable.
     * 
     */
    buffer[0] = 0;
    strcat(buffer, dir);
    strcat(buffer, MY_PATH_SEPARATOR);
    strcat(buffer, file);
    result = check_filepath(buffer);

  cleanup:
    if (buffer)
        free(buffer);
    return result;
}

Portability considerations for code we get from third parties

The overriding principle for code obtained from third parties is to make as few changes as possible. A lot of third-party code has a very bad approach to portability, or cares about a lot of platforms we don't care about. You should attempt to follow the portability approach used by the rest of the program, such as it may be. Ideally, any changes you make should be made in such a manner that they can be incorporated into the source tree maintained by the third party.


An example MIT copyright

Here is an example of the MIT copyright statement. Make sure it appears somewhere in the source of each application, service or other deliverable.

/* Copyright 2003 by the Massachusetts Institute of Technology.
 *
 * Permission to use, copy, modify, and distribute this
 * software and its documentation for any purpose and without
 * fee is hereby granted, provided that the above copyright
 * notice appear in all copies and that both that copyright
 * notice and this permission notice appear in supporting
 * documentation, and that the name of M.I.T. not be used in
 * advertising or publicity pertaining to distribution of the
 * software without specific, written prior permission.
 * M.I.T. makes no representations about the suitability of
 * this software for any purpose.  It is provided "as is"
 * without express or implied warranty.
 */