The preprocessor gives a programmable way to change the text of your program before it is passed to the compiler.
To see the preprocessor in action, run the command cc -std=gnu99 -E myprogram.c: this stops the toolchain after preprocessing and prints the final program text (in "pure" C) in your shell window.
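As a small illustration, consider the file below (the name tiny.c and everything in it are made up for this example). Running cc -std=gnu99 -E tiny.c on it should print the function with the macro already replaced by its text and the comments removed, typically surrounded by a few extra lines beginning with '#' that record file names and line numbers.

```c
/* tiny.c: a hypothetical file for experimenting with the -E flag. */
#define ANSWER 42

int answer_plus_one(void)
{
    /* After preprocessing, the next line reads: return 42 + 1; */
    return ANSWER + 1;
}
```

The file deliberately has no main() function: preprocessing works on any text, whether or not it would link into a complete program.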
#include (Header files)
The #include directive instructs the preprocessor to replace the line
where the directive is found with the text in some file. It commonly
takes one of two forms:
#include <stdio.h>
#include "my_header.h"
The first of these instructs the preprocessor to look in the default system
path for a file called stdio.h, which defines the interfaces
to the standard library functions for input and output
(the STandarD Input/Output library). On login.stud.ntnu.no
this path is /usr/include. If you like, you can read the
contents of stdio.h using a command like
less /usr/include/stdio.h
(press 'q' to exit the 'less' file viewer when you are done). Its contents
are not particularly readable, but at least you know that all of this text
goes into your program before it compiles.
The second form makes the preprocessor look first in the directory where your code is (falling back to the system path if the file is not found there), so you can structure your code by splitting it up into multiple files.
There are in principle no limits to what kind of contents can be included, so it is possible to split your code into two files right in the middle of an expression and glue them together with an appropriate #include. It is just a simple copy/paste operation, and the compiler is none the wiser: when the preprocessor is done, the result will be as if there was only ever one long file to begin with.
In practice, the point is to save you the work of typing or copying the same code into each and every program file (and keeping changes consistent in many places). Thus, in the name of readability, #include is by convention used almost exclusively for header files ('something.h'), and the information in these files should not include any of the program logic which you want to execute when the program runs (this would result in the same code being compiled in as many places as it is included, costing compilation time and disk space for no good reason, and ordinary function definitions duplicated this way will also collide when the object files are linked together).
What should go in a header file is information on data structures, function names (prototypes), global variable names, and anything else that other code needs in order to call the functions your code defines. In OO terms, the header file should include public information and interfaces.
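A sketch of this division might look as follows. The two files are shown in one listing for brevity, and all names (temperature.h, temperature.c, reading_t, reading_count, reading_to_kelvin) are invented for illustration. In a real project the first part would live in the header and the second part in a source file that includes it.

```c
/* ---- temperature.h (hypothetical header): public interface only ---- */

/* A data structure that callers need to know about. */
typedef struct {
    double celsius;
} reading_t;

/* Declaration of a global variable: says it exists, does not create it. */
extern int reading_count;

/* Function prototype: name, parameters and return type, but no body. */
double reading_to_kelvin(reading_t r);

/* ---- temperature.c (hypothetical source file) ----
 * In a real project this file would begin with
 *     #include "temperature.h"
 * and contain the program logic below. */

/* Definition of the global variable declared in the header. */
int reading_count = 0;

/* Convert a reading to kelvin and count how many were converted. */
double reading_to_kelvin(reading_t r)
{
    reading_count++;
    return r.celsius + 273.15;
}
```

Any other source file that includes the header can then call reading_to_kelvin() without ever seeing its body.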
#define (Macros)
The #define directive lets you define patterns of text which should be
substituted by the preprocessor. The simplest form of this is a static
pattern: after seeing
#define BUFFER_SIZE 65536
the preprocessor will slavishly replace every later occurrence of the word
'BUFFER_SIZE' with the text '65536'. Symbols defined by these directives
are known as macros, and have a number of uses.
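One common use is to give a name to a constant that appears in several places, so that it only has to be changed once. The following is a minimal sketch; the array buffer and the two functions are hypothetical names made up for the example.

```c
#include <stddef.h>

#define BUFFER_SIZE 65536

/* The compiler sees: static char buffer[65536]; */
static char buffer[BUFFER_SIZE];

/* Number of bytes available in the buffer. */
size_t buffer_capacity(void)
{
    return sizeof buffer;
}

/* Set every byte in the buffer to zero; the loop bound is also 65536. */
void buffer_clear(void)
{
    for (size_t i = 0; i < BUFFER_SIZE; i++) {
        buffer[i] = 0;
    }
}
```

Changing the number in the #define line changes both the array size and the loop bound the next time the file is compiled.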
Macros can also be defined with parameters: if the preprocessor sees
#define SUMSQUARES(a,b) ((a)*(a)+(b)*(b))
and subsequently
hypotenuse = sqrt ( SUMSQUARES(x,10) );
the preprocessor substitution will pass
hypotenuse = sqrt ( ((x)*(x)+(10)*(10)) );
on to the compiler. (The compiler will typically simplify the right-hand
side into sqrt(x*x+100), but the preprocessor only deals in
text, and not numbers.)
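The parentheses in the definition of SUMSQUARES matter precisely because the substitution is purely textual. The sketch below compares it with a hypothetical variant, SUMSQUARES_BAD, which leaves them out; the two wrapper functions are also invented for the example.

```c
#define SUMSQUARES(a,b)     ((a)*(a)+(b)*(b))
#define SUMSQUARES_BAD(a,b) a*a+b*b   /* hypothetical: parentheses omitted */

/* Expands to ((x + 1)*(x + 1)+(2)*(2)), as intended. */
int with_parentheses(int x)
{
    return SUMSQUARES(x + 1, 2);
}

/* Expands to x + 1*x + 1+2*2, which operator precedence turns into 2*x + 5. */
int without_parentheses(int x)
{
    return SUMSQUARES_BAD(x + 1, 2);
}
```

With x equal to 3, the first function returns 20 and the second returns 11, even though the two calls look identical in the source code.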
The preprocessor does not require that macro names are in capital letters. Capitals are used by convention, however, to make the distinction between function calls and macro substitutions obvious in the code.
#ifdef, #ifndef, #else, #endif (Conditional inclusion)
The #ifdef/#else directives make the preprocessor go through the macros which have been #define-d, and check whether a particular one exists. If it does, the text between #ifdef and #else is passed on to the compiler; otherwise, the text between #else and #endif is passed on instead. (The #else part is optional. #ifndef is the logical opposite, triggering when a macro is not defined.)
This is most easily illustrated by a simple example:
#ifdef FAHRENHEIT_INPUT
// If input values are on the Fahrenheit scale, convert them
#define CENTIGRADE(t) ((5.0/9.0)*((t)-32.0))
#else
// If input values are already on the Celsius scale, leave them alone
#define CENTIGRADE(t) (t)
#endif
If the code following this faithfully applies CENTIGRADE() to each of its input temperature values before calculating anything, it can peacefully do everything on the Celsius scale internally. Compiling separate binaries for use with different kinds of input is then simply a matter of compiling one version where FAHRENHEIT_INPUT is defined, and one where it isn't. (There are, of course, a million ways to deal with different input formats - this is just to illustrate #ifdef.)
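To make this concrete, here is a sketch of code that applies CENTIGRADE() to every input value before using it. The function mean_celsius is a hypothetical helper made up for this example; only the macro definitions come from the text above.

```c
#ifdef FAHRENHEIT_INPUT
/* Input values are on the Fahrenheit scale: convert them. */
#define CENTIGRADE(t) ((5.0/9.0)*((t)-32.0))
#else
/* Input values are already on the Celsius scale: leave them alone. */
#define CENTIGRADE(t) (t)
#endif

/* Hypothetical helper: mean of n input temperatures, in degrees Celsius.
 * Returns 0.0 for an empty input. */
double mean_celsius(const double *input, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += CENTIGRADE(input[i]);   /* convert (or not) on the way in */
    }
    return n > 0 ? sum / n : 0.0;
}
```

The body of the function never mentions which scale the input uses; that decision is made entirely by whether FAHRENHEIT_INPUT is defined when the file is compiled.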
For the purpose of compiling many versions of the same code, it is not
necessary to edit a #define directive into the code, save, and
recompile each time. Simple definitions can be made from the command line, using
the -D flag: compiling with
cc -std=gnu99 -o myprog -DFAHRENHEIT_INPUT myprog.c
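will have the same effect on the #ifdef test as putting
#define FAHRENHEIT_INPUT
inside the code. (Strictly speaking, -DFAHRENHEIT_INPUT gives the macro
the value 1, while the directive above leaves it empty; #ifdef only asks
whether the macro is defined.) In this case, even though the macro's text
is not used anywhere in the program itself, the fact that it is defined
can be exploited to make the preprocessor generate different code.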
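The -D flag can also give a macro a value of your choice, written as -DNAME=value. A minimal sketch, assuming we want BUFFER_SIZE from the earlier example to be adjustable at compile time: #ifndef supplies a default which applies only when nothing was given on the command line. The array buffer and the function configured_buffer_size are hypothetical names.

```c
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 65536   /* default, used when no -DBUFFER_SIZE=... is given */
#endif

static char buffer[BUFFER_SIZE];

/* Hypothetical accessor reporting the size chosen at compile time. */
unsigned long configured_buffer_size(void)
{
    return (unsigned long) sizeof buffer;
}
```

Compiling this with a command like cc -std=gnu99 -DBUFFER_SIZE=1024 -c buffer.c (buffer.c being a made-up file name) would make the array 1024 bytes long; without the flag it is 65536 bytes.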