Appendix A: A Five-Minute Introduction to C

This appendix provides a brief introduction to the part of the C programming language used in the examples presented in this book. It is not a reference manual, nor does it describe all of C. Rather, it assumes that you are familiar with other programming languages in general and just need a nudge or two to follow the examples. If you want to learn more about the C language, consult ANSI (1990) or Kernighan (1978). If you are interested in C, you should also look into the C++ language.

The best way to think of the C programming language is as providing the programmer with three things: a mechanism for allocating and naming memory, a set of control structures, and an extension mechanism. Many people consider it barely a high-level language. Perhaps for that very reason, it is ideally suited for systems and utilities such as text editors.

The declaration mechanism provides the programmer a way to allocate and name memory. Data types are oriented around what is best suited to the hardware.

The language statements provide a control structure mechanism. All of the usual control structures are available. In addition, a full suite of arithmetic and bit operators is available, again focused on what is best suited to the hardware.

The procedure definition and call mechanism provide the extension mechanism. Many "standard" C operations such as string copy and input/output are implemented in terms of this mechanism.

Comments are enclosed in "/* ... */".

Case Conventions

The names in this book follow the convention that UPPERCASE names are pre-defined constants, MixedCase names are procedures, and lowercase names are variables. Again, these are conventions, not requirements.

Data Types and Declarations

All variables and procedures used are declared. Declarations are of the following form:

type variable;

The language supports the following data types:

char: The variable holds one character. Characters are typically 8 bits wide. They may either be signed (range -128 to +127) or unsigned (range 0 to 255) at the implementation's discretion.
int: The variable holds an integer of a size convenient for the hardware. This size is typically 16 or 32 bits.
float: The variable holds a floating point number.
FILE *: The variable holds a file descriptor (a pointer to an open file, as returned by fopen).
struct name { <list of declarations> };: The declarations are combined into a single, larger data type named name.
type (*name)();: The name is the address of a procedure that returns a value of type type.
void: No value. Used to indicate that a procedure does not return anything or accepts no arguments.

A declaration of the form

type *

means that the variable holds the address of an object of the specified type. The variable is called a pointer to the specified type. A declaration of the form

type name[constant]

means that the variable holds an array of objects of the specified data type. The array is constant objects long. The form

type name[constant1][constant2]

is used for a two-dimensional array.
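As a minimal sketch of these declaration forms, consider the following. The names (point, Twice, DeclarationDemo, and the variables) are hypothetical and chosen only for illustration; they do not come from the examples in this book.

```c
/* A structure that combines two declarations into one larger type. */
struct point {
    int row;
    int column;
};

/* A small procedure whose address is stored in a variable below. */
int Twice(int value)
{
    return(value * 2);
}

/* Hypothetical demonstration; the names are illustrative only. */
int DeclarationDemo(void)
{
    int count = 3;              /* an integer */
    int *countptr = &count;     /* a pointer to int: the address of count */
    int totals[3];              /* an array 3 objects long */
    int grid[10][20];           /* a two-dimensional array */
    struct point origin;        /* a structure variable */
    int (*fn)(int) = Twice;     /* the address of a procedure returning int */

    *countptr = *countptr + 1;  /* count is now 4 */
    totals[0] = 5;              /* array indices start at 0 */
    grid[9][19] = 7;            /* the last element of the grid */
    origin.row = 1;
    origin.column = 2;

    return(fn(count) + totals[0] + grid[9][19] + origin.row + origin.column);
}
```

Note that an array declared with a size of 3 has elements 0, 1, and 2; there is no element 3.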

The following data types are not part of the language, but represent types used in examples. An implementation would define these in terms of existing language data types.

FLAG: The variable holds a True or False value. (In C a 0 value is considered to be False and any non-zero value is considered to be True.)
status: The variable holds a success or failure status value. This value may include warning or error information.
location: The variable holds a value that represents a point or mark location within a buffer.
time: The variable holds a value that represents the date and time.
private: You get to define this.
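One way an implementation might define some of these types is sketched below. The underlying types chosen here (int and long) and the LocationIsValid procedure are assumptions made for illustration, not definitions taken from this book.

```c
typedef int FLAG;        /* 0 is False, any non-zero value is True */
typedef int status;      /* assumed: 0 for success, non-zero for failure */
typedef long location;   /* assumed: an offset into a buffer */

/* "time" and "private" are left out: their definitions depend on the
   system and on the implementation. */

/* Hypothetical procedure: reports whether a location lies within a
   buffer that is length characters long. */
FLAG LocationIsValid(location where, long length)
{
    return(where >= 0 && where <= length);
}
```

Keeping these names separate from the built-in types lets an implementation change the representation later without touching the procedures that use them.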

Constants

Integers are written as themselves (e.g., "56" means the value fifty-six).

Hexadecimal constants are written in the form "0x##", where the ##s are hexadecimal digits.

Character strings are enclosed in double quotes (" "). A NUL-terminator (a byte of decimal value 0) is automatically appended to the string by the compiler. Character strings are considered to have the type "array of char."

Character constants are enclosed in single quotes (''). They are automatically converted to integers whose value is that of the character. For example:

"a": is an array of two characters, consisting of the characters 'a' and NUL (values 97 and 0 decimal, assuming ASCII).
'a': is an integer whose value is 97, assuming ASCII.

while:

"abc" is an array of four characters, consisting of the characters 'a', 'b', 'c', and NUL (values 97, 98, 99, and 0 decimal, assuming ASCII).

The value of the form 'abc' is officially implementation-defined (some compilers might consider this as an integer whose value is 97 * 65536 + 98 * 256 + 99, but don't count on it).

The character '\b' refers to the ASCII Back Space character (8 decimal).
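The difference between the string and character forms can be checked with a few short procedures. This is only a sketch: the procedure names are hypothetical, and the character values assume ASCII.

```c
#include <stddef.h>

/* "a" occupies two characters: 'a' and the NUL terminator. */
size_t SizeOfA(void)
{
    return(sizeof("a"));
}

/* "abc" occupies four characters: 'a', 'b', 'c', and NUL. */
size_t SizeOfAbc(void)
{
    return(sizeof("abc"));
}

/* A character constant is an integer (97 for 'a', assuming ASCII). */
int ValueOfA(void)
{
    return('a');
}

/* The Back Space character (8 decimal, assuming ASCII). */
int ValueOfBackSpace(void)
{
    return('\b');
}

/* A hexadecimal constant: 0x38 is the value fifty-six. */
int ValueOfHex(void)
{
    return(0x38);
}
```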

Note: the "" / '' syntax is used in all code excerpts. However, the normal English typographical conventions of using "" are followed in the body of the text.

Pre-defined Constants

NEWLINE: The character string that represents a system-specific newline, written "\n".
NL: The character that represents a newline, 10 decimal.
SP: The space character, 32 decimal.
TAB: The horizontal tab character, 9 decimal.
NUL: The nul character, 0 decimal. Character strings are terminated by this character.
NULL: The null pointer: no data object can be at this address.
BUFFERNAMEMAX: The size of the longest possible buffer name plus 1 for the trailing NUL. Possibly 33.
FILENAMEMAX: The size of the longest possible file name plus 1 for the trailing NUL. Typically 1,025.
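These constants could be supplied with the preprocessor, roughly as follows. The definitions are a sketch that assumes ASCII and uses the "possibly" and "typically" sizes given above; IsLineEnd is a hypothetical procedure added only to show the constants in use. NULL is not defined here because the standard headers already supply it.

```c
#include <stddef.h>             /* supplies NULL */

#define NEWLINE         "\n"    /* system-specific newline string */
#define NL              '\n'    /* newline character, 10 decimal in ASCII */
#define SP              ' '     /* space, 32 decimal in ASCII */
#define TAB             '\t'    /* horizontal tab, 9 decimal in ASCII */
#define NUL             '\0'    /* string terminator, 0 decimal */
#define BUFFERNAMEMAX   33      /* assumed: 32 characters plus the NUL */
#define FILENAMEMAX     1025    /* assumed: 1,024 characters plus the NUL */

/* Hypothetical procedure: True if c ends a line or a string. */
int IsLineEnd(int c)
{
    return(c == NL || c == NUL);
}
```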

Procedure Structure

Procedures have the following structure:

type Name(<arguments>)
		{
		<local variables>

		<statements>
		}

The procedure is named Name and returns data of type type (type can be a structure or pointer as well as a basic type). The argument list contains a list of declarations or the keyword void if the procedure takes no arguments. The local variables are then declared (and may be initialized at each procedure invocation). Last are the procedure statements.
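For instance, a procedure that follows this structure might look like the sketch below. CountChar is a hypothetical name used for illustration; it is not one of the procedures defined in this book.

```c
/* Returns the number of times the character c appears in string. */
int CountChar(char *string, char c)
{
    int count = 0;              /* local variable, set on each invocation */

    while (*string != '\0') {   /* stop at the terminating NUL */
        if (*string == c)
            count++;
        string++;               /* advance to the next character */
    }
    return(count);
}
```

A call such as CountChar("banana", 'a') returns 3.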

Statements

The statements are the usual ones. A semi-colon (";") terminates a statement. Comments start with "/*" and end with "*/". Statements can be grouped with "{" and "}" characters, so the sequence

	{
	<statement 1>
	<statement 2>
		...
	<statement n>
	}

is equivalent to one statement. White space and columns are not significant.

if (condition) then-statement

if (condition) then-statement
else else-statement

for (initializer; end-test; increment) statements
			execute the initializer, then the end test, then
			repeat the statements, the increment, and the end
			test until the end test becomes False

	break;	exit a loop immediately

	continue;	skip the rest of the loop body, but don't
			 exit the loop

while (end-test) statements
			repeat the end test and statements until the end
			test becomes False

for (;;) statements
			repeat the statements forever: a break or return
			statement is used to exit the loop

switch (expression) {

case LABEL1:
	statements
	break;

case LABEL2:
	statements
	break;

	...

default:
	statements
	break;
	}
			execute the statements after the label whose value
			matches the expression, or the statements after
			"default" if present and no label matches

return(expression);
				return a value from a procedure
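The following sketch puts several of these statements together. Both procedures are hypothetical illustrations. In C, a loop keeps running for as long as its test is non-zero.

```c
/* Counts the lowercase vowels in text, skipping spaces and stopping
   at the first period. */
int CountVowels(char *text)
{
    int i;
    int vowels = 0;

    for (i = 0; text[i] != '\0'; i++) {
        if (text[i] == ' ')
            continue;           /* skip the rest of the body, keep looping */
        if (text[i] == '.')
            break;              /* exit the loop immediately */
        switch (text[i]) {
        case 'a':
        case 'e':
        case 'i':
        case 'o':
        case 'u':
            vowels++;
            break;              /* leaves the switch, not the for loop */
        default:
            break;
        }
    }
    return(vowels);
}

/* Adds up the integers from 1 to n using a while loop. */
int SumTo(int n)
{
    int total = 0;

    while (n > 0) {             /* runs as long as the test is True */
        total += n;
        n--;
    }
    return(total);
}
```

Note that a break inside a switch only leaves the switch. Without it, execution would continue into the statements of the next case.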

Operators

The (possibly unusual) language operators are these:

=: assignment; not test for equality
==: test for equality; not assignment
!=: test for not equal
!: logical negation: !FALSE becomes TRUE
a+b: returns the sum of a and b
a-b: returns the difference of a and b
a*b: returns the product of a and b
a/b: returns a divided by b
a%b: returns a modulo b
a&b: returns the bitwise and of a and b
a|b: returns the bitwise or of a and b
a^b: returns the bitwise exclusive or of a and b

The construct "a @= b" where "@" is any of the operators "+" through "^" does the same as "a = a @ b", except that "a" is only evaluated once.

Operators that return True / False results return 1 for True and 0 for False. However, when a True / False value is required (say, in an if-condition), any non-zero value means True and zero means False.
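A short sketch of these operators in use follows. The procedure names are hypothetical, and the operands are kept non-negative to keep the arithmetic simple.

```c
/* Returns 1 (True) if value is even, 0 (False) otherwise. */
int IsEven(int value)
{
    return((value % 2) == 0);
}

/* Turns on the bits in mask; "flags |= mask" is "flags = flags | mask". */
int SetBits(int flags, int mask)
{
    flags |= mask;
    return(flags);
}

/* Flips the bits in mask using exclusive or. */
int ToggleBits(int flags, int mask)
{
    flags ^= mask;
    return(flags);
}

/* Any non-zero result of "flags & mask" counts as True in the if. */
int HasBits(int flags, int mask)
{
    if (flags & mask)
        return(1);
    return(0);
}
```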

&v: returns the address of v
s.m: selects member m of the structure s (s is of type "struct")
p->m: selects member m of the structure pointed to by p (p is of type "struct *")
++v: increment the value in v and return the new value
v++: increment the value in v and return the pre-increment value
--v: decrement the value in v and return the new value
v--: decrement the value in v and return the pre-decrement value

The construct (type)expression (called a "cast") converts the value returned by the expression to the specified type.
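The address, member selection, increment, and cast operators can be sketched as follows. The mark structure and the procedure names are hypothetical and used only for illustration.

```c
struct mark {
    int line;
    int column;
};

/* Moves the mark pointed to by p to the start of the next line. */
void NextLine(struct mark *p)
{
    p->line++;                  /* p is a pointer, so use -> */
    p->column = 0;
}

int MarkDemo(void)
{
    struct mark m;
    int before;
    int after;

    m.line = 4;                 /* m is a structure, so use . */
    m.column = 9;
    NextLine(&m);               /* pass the address of m; m.line is now 5 */

    before = m.line++;          /* before is 5, m.line becomes 6 */
    after = ++m.line;           /* m.line becomes 7, after is 7 */

    return(before * 10 + after);
}

/* A cast from float to int discards the fractional part. */
int Truncate(float value)
{
    return((int)value);
}
```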

Standard Library Functions Used in This Book

fclose(<fileptr>) closes a file opened earlier with fopen.

fgets(<buffer>, <length>, <fileptr>) reads one line from a file opened earlier with fopen.

fopen(<name>, <mode>) opens a file for reading. A <mode> of "r" means "read-only."

free(<ptr>) frees memory previously allocated by malloc.

isprint(<key>) returns True if <key> is a printing character or False if not.

malloc(<size>) allocates a block of memory at least <size> characters long.

memmove(<to>, <from>, <length>) moves <length> characters from <from> to <to>, working properly if the areas overlap.

memset(<start>, <char>, <length>) sets <length> characters starting from <start> to the character <char>.

printf(<format>) or printf(<format>, <arg>) copies the characters from the format string to the screen "as is" until a '%' character is encountered. The sequence "%s" means to take the next argument and send it as a string to the screen. The sequence "%c" means to take the next argument and send it as a single character to the screen. (The routine does a lot more, but the examples in this book don't use the extra functionality.)

strcpy(<to>, <from>) copies the <from> string to the <to> string.

strlen(<string>) returns the number of characters in the string, not counting the terminating NUL.
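The sketch below shows how several of these routines fit together. The three procedures are hypothetical helpers written for this illustration. It also uses strchr, a standard routine not listed above, to look for the newline in each line read.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of string, or NULL if no memory is
   available. The caller releases the copy with free. */
char *CopyString(char *string)
{
    char *copy = malloc(strlen(string) + 1);    /* +1 for the NUL */

    if (copy != NULL)
        strcpy(copy, string);
    return(copy);
}

/* Shifts the text in buffer right by gap characters and fills the
   opening with spaces. The buffer must have room for the result. */
void IndentText(char *buffer, size_t gap)
{
    /* the areas overlap, so memmove is required here */
    memmove(buffer + gap, buffer, strlen(buffer) + 1);
    memset(buffer, ' ', gap);
}

/* Returns the number of newline-terminated lines in the named file,
   or -1 if the file cannot be opened. */
long CountLines(char *name)
{
    FILE *fptr;
    char buffer[256];
    long lines = 0;

    fptr = fopen(name, "r");
    if (fptr == NULL)
        return(-1);
    while (fgets(buffer, sizeof(buffer), fptr) != NULL) {
        if (strchr(buffer, '\n') != NULL)
            lines++;
    }
    fclose(fptr);
    return(lines);
}
```

Note that fgets stores at most one less than <length> characters, so a very long line arrives in several pieces. Only the final piece contains the newline.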

Non-Standard Library Functions Used in This Book

Fatal(<message>) handles a fatal error.

xiswhite(<c>) returns True if the supplied character is a white-space character (space or tab) or False if not.

xstrcpy(<to>, <from>) works like the C strcpy routine to copy one string to another, but is defined to work properly if the strings overlap.
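One possible way to write these routines is sketched below. The bodies are assumptions, not the implementations used in this book. In particular, the behavior of Fatal (print the message and exit) is a guess at what "handles a fatal error" might mean.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed behavior: report the message and stop the program. */
void Fatal(char *message)
{
    fprintf(stderr, "%s\n", message);
    exit(EXIT_FAILURE);
}

/* True if c is a space or a tab, False otherwise. */
int xiswhite(int c)
{
    return(c == ' ' || c == '\t');
}

/* Copies from to to, including the NUL, even if the strings overlap. */
char *xstrcpy(char *to, char *from)
{
    /* the length is measured first; memmove then copes with overlap */
    memmove(to, from, strlen(from) + 1);
    return(to);
}
```

Building xstrcpy on memmove is a design choice for this sketch: memmove is already defined to work properly when the areas overlap.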
