NEW: New information includes C Pre-Processor macros, MIT copyright, Strings and Errors & Cleanup. The formatting information has been updated on using Visual Studio 6.0 and Emacs to help keep your code formatted properly.
Revision 0.6 - This is a draft. Please send feedback to pismere-team for discussion.
Comments that need to be addressed are stuck into the draft are in red.
TBD: exception handling.
NOTE: This first cut has been cobbled together from GNU and Athena coding standards. It still needs a bunch of re-org work. Please feel free to send suggestions. Currently, we only deal with C. We need to add C++-specific stuff.
Introduction | A few words to get you started |
Formatting | Formatting Your Source Code |
Header Files | Header Files |
C Pre-Processor | Using the C Pre-Processor |
Comments | Commenting Your Work |
Syntactic Conventions | Clean Use of C Constructs |
Names | Naming Variables and Functions |
Software Engineering | Software engineering practices |
Namespaces | Symbol namespaces |
Strings | Using strings |
Errors & Cleanup | Error-handling and resource cleanup |
Portability (3rd party) | Portability considerations for code we get from third parties |
MIT copyright | An example of the copyright to place in source |
We want all of our code to have a consistent coding style. For code that comes from third parties, we will generally maintain the original style of that code.
If possible, try to have no source code lines longer than 79 characters. Do not store tabs in a line after any non-tab character. Tab stops must be set to 8 characters in your editor. That should be independent from the indentation setting. The indentation setting should be 4 characters.
QUESTION: Do we want to outlaw tabs altogether?
You can use this registry file to set up Visual Studio 6.0 to have the proper tabbing defaults.
You can use Emacs to help you format your source code. First, you add
the emacs locker by running \\fqd.isnt.mit.edu\locker\addlocker
emacs
from a prompt. Then, from that prompt, start up Emacs
with the command-line: runemacs.cmd
(the .cmd is
important).
You can then hit TAB to indent the current line or select an entire
region (like, say, the entire buffer) and do M-x
indent-region
. You can also do M-x
pismre-remove-ws
to remove trailing whitespace from your files.
It is important to put the open-brace that starts the body of a C function in column zero, and avoid putting any other open-brace or open-parenthesis or open-bracket in column zero. Several tools look for open-braces in column zero to find the beginnings of C functions. These tools will not work on code not formatted that way.
It is also important for function definitions to start the name of the function in column zero. This helps people to search for function definitions, and may also help certain tools recognize them.
So as to make it easy to read, parse, and modify argument lists, we list the arguments one per line, indented at 4 characters. This makes it so that we don't have to worry about splitting arguments, etc. Thus, the proper format is this:
static char * concat( char *s1, char *s2 ) { ... }
Put a comment explaining the function before the function definition. The format is decribed in the Commenting Your Work.
Write prototypes for all of your functions. Include argument names in your prototypes, to help the reader understand what they mean. Use the formatting described above. You should simply be able to copy the function header as the prototype. Leave the variable names in the prototype so as to make the prototypes more self-documenting (and to make it so you can just simply copy-paste when updating them).
When declaring local variables, only declare one variable per line. This makes teh code easier to read.
For the body of the function, we prefer code formatted like this:
if (x < foo(y, z)) { haha = bar[4] + 5; } else { while (z) { haha += foo(z, z); z--; } return ++x + bar(); }
In an if-else clause, we always put the enclosing curly braces to reduce the possibility of error and to make the code easier to read. On a singleton if statement, the curly braces are optional.
We find it easier to read a program when it has spaces before the open parenthesis and after the commas. Especially after the commas. However, when making a function call, do not put a space before the open parenthesis. Do put spaces before the open parenthesis after "while", "if", or "for", though.
When you split an expression into multiple lines, split it after an operator, not after one. Also include extra ()s to make the operator precedence more explicit. This helps make the code easier to read. Here is the right way:
if (foo_this_is_long && and_so_is_this_variable && (bar > win(x, y, z)) && remaining_condition)
Try to avoid having two operators of different precedence at the same level of indentation. For example, don't write this:
mode = (inmode[j] == VOIDmode || GET_MODE_SIZE(outmode[j]) > GET_MODE_SIZE(inmode[j]) ? outmode[j] : inmode[j]);
Instead, use extra parentheses so that the indentation shows the nesting:
mode = ((inmode[j] == VOIDmode) || (GET_MODE_SIZE(outmode[j]) > GET_MODE_SIZE(inmode[j]))) ? outmode[j] : inmode[j]);
Insert extra parentheses so that Emacs will indent the code properly. For example, the following indentation looks nice if you do it by hand, but Emacs would mess it up:
v = rup->ru_utime.tv_sec*1000 + rup->ru_utime.tv_usec/1000 + rup->ru_stime.tv_sec*1000 + rup->ru_stime.tv_usec/1000;
But adding a set of parentheses solves the problem:
v = (rup->ru_utime.tv_sec*1000 + rup->ru_utime.tv_usec/1000 + rup->ru_stime.tv_sec*1000 + rup->ru_stime.tv_usec/1000);
Format do-while statements like this:
do { a = foo(a); } while (a > 0);
When writing return statements, do not put parentheses around the return value; just write "return foo;".
Each header file should check for and define a unique symbol to prevent it from being included multiple times during compilation. For example:
#ifndef __NameOfHeaderFile_ExtensionOfHeaderFile__ #define __NameOfHeaderFile_ExtensionOfHeaderFile__ ... #endif /* __NameOfHeaderFile_ExtensionOfHeaderFile__ */
If your header file name is not particularly unique, you can prefix it with the name of the compoment to which it belongs. Or you might add some unique string to the end.
When defining macros, use all-caps names. If you cannot do that for some reason, append _macro at the end of the macro. This will help people debugging your code to identify what are macros and what are real symbols.
PWhen writing a multi-line macro, please use the following convention:
a) a declaration macro (used in type of variable declarations)
b) a data structure builder macro (to be used in an initializer)
c) an expression
d) a single statement
#define SWAP(a, b, t) { (t) = (a); (a) = (b); (b) = (t); }
The reason to convert a block into a single statement is so that someone can write code that uses the macro as though it were a function returning void:#define SWAP(a, b, t) do { (t) = (a); (a) = (b); (b) = (t); } while(0)
If the SWAP() macro were not enclosed in do/while(0), the above code would be an error.if (...) SWAP(...); else SWAP(...);
#define SWAP(a, b, t) \ do { \ (temp) = (a); \ (a) = (b); \ (b) = (temp); \ } while(0)
Make sure you comment any pre-processor symbols or macros that you define.
Never put an #ifdef
around the start or end of a control
block. That will screw up automagic indenting tools. So, never do
something like this:
#ifdef FOO if (foo_func(x)) #else if (normal_func(x)) #endif
Instead, if you really need to do this, use:
if ( #ifdef FOO foo_test(x) #else normal_test(x) #endif )
Avoid sprinkling your code with #ifdef
s. Instead, use an
abstraction for the functionality you wish to conditionally use. For
example, instead of having the above code fragment sprikled thorughout
a source file, use the following:
#ifdef FOO #define TEST(x) foo_test(x) #else #define TEST(x) normal_test(x) #endif ... if (TEST(x))
Under Win32, _WIN32
is always defined by the compiler.
Test for that rather than WIN32
.
The pismere source tree use NODEBUG
in Makefiles to check
whether we are building debug or not. If NODEBUG
is set,
we are building release binaries. If NODEBUG
is not set,
we are building debug binaries. When debug binaries are built, our
Makefiles define _DEBUG
. For release, we define
NDEBUG
. Only check for _DEBUG
. We do this
to resemble the Platform SDK definitions.
Each C source and header file should begin with an MIT copyright statement, a comment describing what the file does. MIT copyright statements may be omitted for source code which is not distributed outside of MIT.
Every program should start with a comment saying briefly what it is for. Example: fmt - filter for simple filling of text.
Please write the comments in a program in English, because English is the one language that nearly all programmers in all countries can read. If you do not write English well, please write comments in English as well as you can, then ask other people to help rewrite them. If you can't write comments in English, please find someone to work with you and translate your comments into English.
Please put a comment on each function saying what the function does,
what sorts of arguments it gets, and what the possible values of
arguments mean and are used for. It is not necessary to duplicate in
words the meaning of the C argument declarations, if a C type is being
used in its customary fashion. If there is anything nonstandard about
its use (such as an argument of type char *
which is really the
address of the second character of a string, not the first), or any
possible values that would not work the way one would expect (such as,
that strings containing newlines are not guaranteed to work), be sure
to say so.
Also explain the significance of the return value, if there is one.
Please put two spaces after the end of a sentence in your comments, so that the Emacs sentence commands will work. Also, please write complete sentences and capitalize the first word. If a lower-case identifier comes at the beginning of a sentence, don't capitalize it! Changing the spelling makes it a different identifier. If you don't like starting a sentence with a lower case letter, write the sentence differently (e.g., "The identifier lower-case is ...").
The comment on a function is much clearer if you use the argument names to speak about the argument values. The variable name itself should be lower case, but write it in upper case when you are speaking about the value rather than the variable itself. Thus, "the inode number NODE_NUM" rather than "an inode".
Here is a simple example:
/* * concat * * Concatenates two strings and returns a pointer to the new string. * * Parameters: * * s1 - Pointer to a null-terminated string. * * s2 - Pointer to a null-terminated string. * * Returns: * * A new null-terminated string that consists of the first string * followed by the second string. If both s1 and s2 are null, null * is returned. If only one of them is null (or an empty string), the * returned string only contains the non-null string. * * Remarks: * * Use free_string to free the returned string. * * Implementation: * * We use a clever algorithm where... * */
Another big-block style you can use is
/****************************************************************************** * * concat * * ... * *****************************************************************************/
Try to avoid in-function comments, explaining things up-front when appropriate. Do put comments inline when things may be tricky.
There should be a comment on each static variable as well, like this:
/* Nonzero means truncate lines in the display; zero means continue them. */ int truncate_lines;
Every #endif
should have a comment, except in the case of
short conditionals (just a few lines) that are not nested. The
comment should state the condition of the conditional that is ending,
including its sense. #else
should have
a comment describing the condition and sense of the
code that follows. For example:
#ifdef FOO ... #else /* !FOO */ ... #endif /* !FOO */ #ifdef FOO ... #endif /* FOO */
but, by contrast, write the comments this way for a #ifndef
:
#ifndef FOO ... #else /* FOO */ ... #endif /* FOO */ #ifndef FOO ... #endif /* !FOO */
Please explicitly declare all arguments to functions. Don't omit
them just because they are int
s.
Declarations of external functions and functions to appear later in
the source file should all go in one place near the beginning of the
file (somewhere before the first function definition in the file), or
else should go in a header file. Don't put extern
declarations inside functions.
It used to be common practice to use the same local variables (with
names like tem
) over and over for different values within one
function. Instead of doing this, it is better declare a separate local
variable for each distinct purpose, and give it a name which is
meaningful. This not only makes programs easier to understand, it also
facilitates optimization by good compilers. You can also move the
declaration of each local variable into the smallest scope that includes
all its uses. This makes the program even cleaner.
Don't use local variables or parameters that shadow global identifiers.
Don't declare multiple variables in one declaration that spans lines. Start a new declaration on each line, instead. Only declare one variable per line. For example, instead of:
or this:int foo, bar;
write this:int foo, bar;
(If they are global variables, each should have a comment preceding it anyway.)int foo; int bar;
When you have an if-else
statement nested in another
if
statement, always put braces around the
if-else
. Thus, never write like this:
always like this:if (foo) if (bar) win(); else lose();
if (foo) { if (bar) { win(); } else { lose(); } }
If you have an if
statement nested inside of an
else
statement, either write else if
on one
line, like this,
with itsif (foo) { ... } else if (bar) { ...
then
-part indented like the preceding
then
-part, or write the nested if
within
braces like this:
if (foo) { ... } else { if (bar) ... }
Don't declare both a structure tag and variables or typedefs in the same declaration. Instead, declare the structure tag separately and then use it to declare the variables or typedefs.
Try to avoid assignments inside if
-conditions. For
example, don't write this:
if ((foo = (char *) malloc(sizeof *foo)) == 0) fatal("virtual memory exhausted");
instead, write this:
foo = (char *) malloc(sizeof *foo); if (foo == 0) fatal("virtual memory exhausted");
This makes the program easier to debug in a debugger.
When using a switch
statement:
switch(foo) { case A: { ... break; } case B: { ... break; } default: { ... } } /* swtich */
Don't make the program ugly to placate lint
. Please
don't insert any casts to void
. Zero without a cast is
perfectly fine as a null pointer constant, except when calling a
varargs function.
The names of global variables and functions in a program serve as comments of a sort. So don't choose terse names -- instead, look for names that give useful information about the meaning of the variable or function. In a program, names should be English, like other comments.
Local variable names can be shorter, because they are used only within one context, where (presumably) comments explain their purpose.
Please use underscores to separate words in a name, so that the
Emacs word commands can be useful within them. Stick to lower case;
reserve upper case for macros and enum
constants, and for
name-prefixes that follow a uniform convention.
For example, you should use names like
ignore_space_change_flag
; don't use names like
iCantReadThis
.
Actually, in mixed case names are not so bad. However, never use names with multiple capital letters together.
For example, never use something like DCEIsAGoodAcronym
.
Rather, use something like dce_is_a_good_acronym
or
DceIsGoodAcronym
. (In fact, you probably want something
less verbose in the first place.)
Variables that indicate whether command-line options have been specified should be named according the meaning of the option, not after the option-letter. A comment should state both the exact meaning of the option and its letter. For example,
/* Ignore changes in horizontal whitespace (-b). */ int ignore_space_change_flag;
When you want to define names with constant integer values, use
enum
rather than #define
. This makes
debugging easier.
Use filenames that will go over to UNIX easily. Remember that UNIX is case-sensitive and that ls on UNIX has a default way of sorting things. In general, keep file names lowercase, except for informational files like ChangeLog, NEWS, README, TODO, etc.
The following software engineering practices are strongly encouraged:
The following software engineering practices are discouraged:
If you are writing a library, you should pick a prefix for your library. You should ensure that all symbols which interact with the application's namespace (both at link time and on inclusion of the header file) begin with your prefix (or the all-caps form of your prefix, in the case of preprocessor symbols and enumeration constants) followed by an underscore. Symbols which are not intended for use by the application should begin with your prefix followed by two underscores.
For instance, if your prefix is "hes", then functions exported to the user should begin with "hes_" (e.g. "hes_resolve"). Functions used internally by the library should begin with "hes__" (e.g. "hes__resolve" for the internal resolver function). Preprocessor symbols intended for use by the application should begin with "HES_" (e.g. "HES_ER_OK"); preprocessor symbols not intended for use by the application should begin with "HES__" (e.g. "HES__HESIOD_H", used to protect hesiod.h from multiple inclusion).
Names of structures should begin with your prefix, but structure fields don't need to. Strictly speaking, structure fields interact with the user's namespace because the user might have "#define"'d them to something before including your header file, but applications generally shouldn't be "#define"ing lots of lowercase symbols, so this is not a real worry.
Here are a few guidelines you should follow:
A function will need to cleanup resources that it uses internally. However, a function must still be easy. It is not acceptable for a function's "normal" code path to be a sequence of nested if's that check for errors. Neither is it acceptable to embed resource cleanup at every point where an error might occur.
Rather, a function should initialize all resource variables to a known
initial state and clean up all allocated resources in a single
location. Critical errors should cause the function to execute this
single block of cleanup code. The cleanup code will have a
cleanup:
label. On an error, the main function body
should set an error flag and go to the label (i.e., goto
cleanup:
).
To properly be able to do this, the function must initialize all resources and error variables that may need to be checked or cleaned up on returning from the function. Here is an example:
int check_dir_file( char* dir, char* file ) { int result = 0; char* buffer = 0; buffer = malloc(strlen(first) + strlen(file) + sizeof(MY_PATH_SEPARATOR)); if (!buffer) { result = MY_ERROR_NO_MEMORY; goto cleanup; } /* * We know that first and file won't be changing lengths, so we can * safely write into the buffer w/o checking sizes. We choose to use * strcat() to the start of the buffer each time because we prefer to * sacrifice a little speed to have the code a little more readable. * */ buffer[0] = 0; strcat(buffer, dir); strcat(buffer, MY_PATH_SEPARATOR); strcat(buffer, file); result = check_filepath(buffer); cleanup: if (buffer) free(buffer); return result; }
The overriding principle for code obtained from third parties is to make as few changes as possible. A lot of third-party code has a very bad approach to portability, or cares about a lot of platforms we don't care about. You should attempt to follow the portability approach used by the rest of the program, such as it may be. Ideally, any changes you make should be made in such a manner that they can be incorporated into the source tree maintained by the third party.
Here is an example of the MIT copyright statement. Make sure it appears somewhere in the source of each application, service or other deliverable.
/* Copyright 2003 by the Massachusetts Institute of Technology. * * Permission to use, copy, modify, and distribute this * software and its documentation for any purpose and without * fee is hereby granted, provided that the above copyright * notice appear in all copies and that both that copyright * notice and this permission notice appear in supporting * documentation, and that the name of M.I.T. not be used in * advertising or publicity pertaining to distribution of the * software without specific, written prior permission. * M.I.T. makes no representations about the suitability of * this software for any purpose. It is provided "as is" * without express or implied warranty. */