This document was written by Greg Hudson <ghudson@mit.edu>, and is in
the public domain.

Why not to use cpp
------------------

This document has two independent parts, because there are two
contexts in which there are good reasons not to use cpp: C source and
non-C source.  The arguments are stronger for the latter case, but
there are good arguments for both.  We'll start with the stronger
case.

Why not to use cpp on files other than C source code
----------------------------------------------------

People sometimes try to use cpp as a general-purpose preprocessor,
since the basic features (#ifdef and #define) are useful in contexts
other than C code (or C++, Objective-C, and assembly code, which cpp
implementations are generally designed to work with).  This approach
is fundamentally flawed, because cpp treats its input and output as
sequences of C tokens rather than sequences of lines or bytes.

Some practical problems you are likely to run into with this approach:

	* Determining the command to invoke the C preprocessor is a
	  somewhat nonportable task.  /lib/cpp works on most systems,
	  but not on modern BSD variants.  Using something like "cc -E"
	  also works on most systems, but can have odd effects; for
	  instance, the Sun WorkShop cc appends an "#ident" line to
	  the output of cc -E, even if given the -P option.

	* If you're using something like "gcc -E", then the input file
	  must typically have a ".c" extension.

	* Unless you give cpp the "-P" option (which seems mostly
	  portable if you've found the actual "cpp" command; it's less
	  portable if you are using cc -E), the output will generally
	  contain line number declarations like:

		# 1 "foo"

	  intended to give a C compiler backend the information to
	  give proper error messages.  These declarations appear every
	  time the input and output line numbers need to be
	  synchronized.

	* The preprocessor may predefine symbols with everyday names
	  like "sun".  This can yield unexpected results like:

		% echo "The sun is hot" | /lib/cpp -
		# 1 "" 
		The 1 is hot

	  (The particular symbol "sun" is only predefined on a Sun
	  system, of course; if you were working on some other
	  platform to start with, you wouldn't notice this problem
	  until your work started being used on a Sun.)

	* Some versions of C preprocessors can be picky about input
	  which doesn't form a valid C token stream.  For instance,
	  on a GNU/Linux system with a modern (circa 2.95.2) version
	  of gcc:

		% echo "'blah" | /lib/cpp
		<stdin>:1:1: missing terminating ' character
		# 1 ""
		'blah'

	  (Notice the change between input and output as well as the
	  error message.)

	* Lines ending with a backslash are folded by cpp.  This can
	  be a bit of a pickle if you want to actually end an output
	  line with a backslash.

	* Some cpp implementations insert spaces into their output
	  between C tokens.  This is perfectly valid for preprocessing
	  C code, but usually a complete non-starter for non-C code.

You may run into other problems as well, and into new problems as cpp
implementations evolve.  The "missing terminating ' character" problem
only surfaced around 1999 or 2000, for instance, due to a change in
gcc.

Why not to use cpp constructs in C source
-----------------------------------------

As implied at the beginning, the arguments here are weaker than the
arguments in the previous section.  Still, most cpp constructs (other
than #include) can often be avoided in C programs, and avoiding them
may save you trouble.  Here's what can go wrong when using #define in
particular:

	* Pollution of all namespaces.  Most identifiers in a C
	  program live in a particular namespace, such as the
	  namespace of structure tags or the namespace of identifiers
	  within a particular scope.  cpp macro names live in all
	  namespaces.  A common problem arises when a cpp macro
	  targeted for one namespace is used in another namespace; for
	  instance:

		struct hostent
		{
		  [...]
		  char **h_addr_list;
		#define h_addr  h_addr_list[0] /* for compatibility */
		};

	  Although h_addr is only intended to be used in the namespace
	  of fields of a struct hostent, it will be substituted with
	  h_addr_list[0] everywhere.

	  This problem can be mitigated by using common prefixes to
	  segregate symbols from each other (such as the "h_" prefix
	  in the above identifiers); still, it is better to avoid it
	  altogether when possible.

	* Difficulty of debugging.  Because object files rarely
	  include C preprocessor information (other than line
	  numbers), it is generally harder to debug code which makes
	  extensive use of the preprocessor.  One cannot step into a
	  complicated function macro or use a cpp-defined symbol in a
	  debugger assignment.  This problem can be partially
	  mitigated (for gdb) through the use of .gdbinit files, but
	  then those files have to be kept in sync with the
	  preprocessor macros.

	  A similar problem applies to compiler errors: the back end
	  of the C compiler generally knows little about the
	  un-preprocessed input, so its error messages are less
	  helpful.

	* ABI complication.  If you use cpp function macros in
	  the header file for a library, every operation used by those
	  macros becomes part of your ABI (the binary interface used
	  by programs which link to your library), even if it is not
	  part of the API (the source-level interface).  Using real
	  live functions will result in fewer headaches when you want
	  to make changes to your library.
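
To make the namespace pollution point concrete, here is a minimal
sketch.  The struct and macro names (host_entry, he_addr,
he_addr_list) are hypothetical stand-ins modeled on the hostent
example above, not taken from any real header.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct hostent. */
struct host_entry
{
  char **he_addr_list;
#define he_addr he_addr_list[0] /* intended only as a field shorthand */
};

/* Here the macro does what was intended: e->he_addr expands to
 * e->he_addr_list[0]. */
const char *first_address(const struct host_entry *e)
{
  return e->he_addr;
}

/* But the macro is not confined to struct fields.  Uncommenting the
 * declaration below would fail to compile, because it expands to
 * "int he_addr_list[0] = 0;":
 *
 *   int he_addr = 0;
 */
```

The prefix keeps the collision unlikely in practice, but nothing in
the language prevents it.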

The use of #ifdef (or #if) is more straightforward and has few
pitfalls, but is a notorious readability killer.
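
As a rough illustration of keeping #ifdef away from the main logic,
the sketch below confines a platform test to one small helper.  The
function names (path_separator, join_path) are invented for this
example, and it assumes a C99 library for snprintf; _WIN32 is the
macro Windows compilers conventionally predefine.

```c
#include <stdio.h>

/* The only platform-specific code lives in this helper. */
static char path_separator(void)
{
#ifdef _WIN32
  return '\\';
#else
  return '/';
#endif
}

/* The central logic reads straight through, with no #ifdef in sight.
 * Returns what snprintf returns: the length the joined path needs. */
int join_path(char *out, size_t out_size, const char *dir, const char *file)
{
  return snprintf(out, out_size, "%s%c%s", dir, path_separator(), file);
}
```

If more platform differences appear later, they can be added to the
helper (or to a separate source file) without touching join_path.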

Here are some ways to avoid using cpp constructs:

	* Use untagged enums instead of #defines for constant values.
	  Instead of:

		#define FOO 26
		#define BAR 33

	  use:

		enum {
		  FOO = 26,
		  BAR = 33
		};

	  It will work just as well, and a debugger will be able to
	  understand FOO and BAR.

	* Consider using inline functions instead of preprocessor
	  function macros.  Inline functions aren't portable in C89,
	  but it's not too difficult to use them portably with the
	  assistance of autoconf, and they have fewer pitfalls.  Note
	  that they still have the ABI complication problem.

	* Better yet, don't prematurely optimize.  It's
	  dishearteningly common to see preprocessor function macros
	  used for operations which will only be invoked once in the
	  lifetime of a program.

	* Try to write portable code instead of using #ifdef.  When
	  this proves impossible, at least segregate the
	  platform-specific code so that it doesn't harm the
	  readability of your central logic.
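
The first two suggestions can be sketched as follows.  This assumes a
compiler that accepts C99 "static inline"; all of the names
(MAX_RETRIES, BUFFER_SIZE, SQUARE_MACRO, square, next_value) are
made up for illustration.  The macro is kept only to show one of the
pitfalls that the inline function avoids.

```c
/* Constants as an untagged enum: visible to the debugger, and still
 * usable wherever an integer constant expression is required. */
enum {
  MAX_RETRIES = 5,
  BUFFER_SIZE = 512
};

/* A function macro evaluates its argument once per use in the body,
 * so side effects in the argument happen twice here. */
#define SQUARE_MACRO(x) ((x) * (x))

/* The inline function evaluates its argument exactly once. */
static inline int square(int x)
{
  return x * x;
}

/* A helper with a side effect, to make the difference observable. */
int next_value(int *counter)
{
  return ++*counter;
}
```

Calling square(next_value(&c)) advances the counter once, while
SQUARE_MACRO(next_value(&c)) advances it twice.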

Some of the ideas in this section (but no text) were taken from _The
Practice of Programming_, a highly recommended book by Kernighan and
Pike.