This document was written by Greg Hudson, and is in the public domain.

Why not to use cpp
------------------

This document has two independent parts, because there are two
contexts in which there are good reasons not to use cpp: C source and
non-C source. The arguments are stronger for the latter case, but
there are good arguments for both. We'll start with the stronger
case.

Why not to use cpp on files other than C source code
----------------------------------------------------

People sometimes try to use cpp as a general-purpose preprocessor,
since the basic features (#ifdef and #define) are useful in contexts
other than C code (or C++, Objective C, and assembly code, which cpp
implementations are generally designed to work with). This approach
is fundamentally flawed, because cpp treats its input and output as
sequences of C tokens rather than sequences of lines or bytes.

Some practical problems you are likely to run into with this approach:

* Determining the command to invoke the C preprocessor is a somewhat
  nonportable task. /lib/cpp works on most systems, but not modern
  BSD variants. Using something like "cc -E" also works on most
  systems, but can have odd effects; for instance, the Sun WorkShop
  cc appends an "#ident" line to the output of cc -E, even if given
  the -P option.

* If you're using something like "gcc -E", then the input file must
  typically have a ".c" extension.

* Unless you give cpp the "-P" option (which seems mostly portable if
  you've found the actual "cpp" command; it's less portable if you
  are using cc -E), the output will generally contain line number
  declarations like:

      # 1 "foo"

  intended to give a C compiler backend the information to give
  proper error messages. These declarations appear every time the
  input and output line numbers need to be synchronized.

* The preprocessor may predefine symbols with everyday names like
  "sun".
  This can yield unexpected results like:

      % echo "The sun is hot" | /lib/cpp -
      # 1 ""
      The 1 is hot

  (The particular symbol "sun" is only predefined on a Sun system, of
  course; if you were working on some other platform to start with,
  you wouldn't notice this problem until your work started being used
  on a Sun.)

* Some versions of C preprocessors can be picky about input which
  doesn't form a valid C token stream. For instance, on a GNU/Linux
  system with a modern (circa 2.95.2) version of gcc:

      % echo "'blah" | /lib/cpp
      :1:1: missing terminating ' character
      # 1 ""
      'blah'

  (Notice the change between input and output as well as the error
  message.)

* Lines ending with a backslash are folded by cpp. This can be a bit
  of a pickle if you want to actually end an output line with a
  backslash.

* Some cpp implementations insert spaces into their output between C
  tokens. This is perfectly valid for preprocessing C code, but
  usually a complete non-starter for non-C code.

You may run into other problems as well, and into new problems as cpp
implementations evolve. The "missing terminating ' character" problem
only surfaced around 1999 or 2000, for instance, due to a change in
gcc.

Why not to use cpp constructs in C source
-----------------------------------------

As implied at the beginning, the arguments here are weaker than the
arguments in the previous section. Still, using most cpp constructs
(other than #include) is often avoidable in C programs, and you may
save yourself trouble if you do.

Here's what can go wrong using #define in particular:

* Pollution of all namespaces. Most identifiers in a C program live
  in a particular namespace, such as the namespace of structure tags
  or the namespace of identifiers within a particular scope. cpp
  macro names live in all namespaces. A common problem arises when a
  cpp macro targeted for one namespace is used in another namespace;
  for instance:

      struct hostent {
          [...]
          char **h_addr_list;
      #define h_addr h_addr_list[0] /* for compatibility */
      };

  Although h_addr is only intended to be used in the namespace of
  fields of a struct hostent, it will be substituted with
  h_addr_list[0] everywhere. This problem can be mitigated by using
  common prefixes to segregate symbols from each other (such as the
  "h_" prefix in the above identifiers); still, it is better to avoid
  it altogether when possible.

* Difficulty of debugging. Because object files rarely include C
  preprocessor information (other than line numbers), it is generally
  harder to debug code which makes extensive use of the preprocessor.
  One cannot step into a complicated function macro or use a
  cpp-defined symbol in a debugger assignment. This problem can be
  partially mitigated (for gdb) through the use of .gdbinit files,
  but then those files have to be kept in sync with the preprocessor
  macros. A similar problem applies to compiler errors, since the
  back end of the C compiler generally doesn't know much about the
  un-preprocessed input and can't give error messages that are as
  good.

* ABI complication. If you use cpp function macros in the header
  file for a library, every operation used by those macros becomes
  part of your ABI (the binary interface used by programs which link
  to your library), even if it is not part of the API (the
  source-level interface). Using real live functions will result in
  fewer headaches when you want to make changes to your library.

The use of #ifdef (or #if) is more straightforward and has few
pitfalls, but is a notorious readability killer.

Here are some ways to avoid using cpp constructs:

* Use untagged enums instead of #defines for constant values.
  Instead of:

      #define FOO 26
      #define BAR 33

  use:

      enum { FOO = 26, BAR = 33 };

  It will work just as well, and a debugger will be able to
  understand FOO and BAR.

* Consider using inline functions instead of preprocessor function
  macros.
  Inline functions aren't portable in C89, but it's not too difficult
  to use them portably with the assistance of autoconf, and they have
  fewer pitfalls. Note that they still have the ABI complication
  problem.

* Better yet, don't prematurely optimize. It's dishearteningly
  common to see preprocessor function macros used for operations
  which will only be invoked once in the lifetime of a program.

* Try to write portable code instead of using #ifdef. When this
  proves impossible, at least segregate the platform-specific code so
  that it doesn't harm the readability of your central logic.

Some of the ideas in this section (but no text) were taken from _The
Practice of Programming_, a highly recommended book by Kernighan and
Pike.
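
As a rough illustration of the enum and inline-function suggestions
above, here is a minimal sketch, assuming a C99 or later compiler so
that "static inline" is available. FOO and BAR come from the enum
example in the text; SQUARE_MACRO, square, next_value, and the two
evaluations_with_* helpers are hypothetical names made up for this
sketch. It shows one pitfall a function macro has and an inline
function does not: the macro expands, and so evaluates, its argument
twice.

```c
/* Untagged enum instead of "#define FOO 26" and "#define BAR 33";
 * a debugger can refer to FOO and BAR by name. */
enum { FOO = 26, BAR = 33 };

/* Function macro, kept only for contrast: the argument text is
 * pasted in twice, so it is evaluated twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Hypothetical inline replacement (assumes C99 or later): the
 * argument is evaluated exactly once. */
static inline int square(int x)
{
    return x * x;
}

/* Hypothetical helper with a visible side effect, used to count how
 * often an argument expression is evaluated. */
static int calls;

static int next_value(void)
{
    calls++;
    return FOO;
}

/* Number of times next_value() runs when passed to the macro. */
static int evaluations_with_macro(void)
{
    calls = 0;
    (void) SQUARE_MACRO(next_value());
    return calls;
}

/* Number of times next_value() runs when passed to the function. */
static int evaluations_with_inline(void)
{
    calls = 0;
    (void) square(next_value());
    return calls;
}
```

Under these assumptions, evaluations_with_macro() returns 2 and
evaluations_with_inline() returns 1. The definitions are static so
that they can sit in a header without creating external symbols; as
noted above, putting them in a library header still makes their
bodies part of the library's ABI.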