1. Preprocessing: fix the source.
The first thing the compiler needs to do is fix the source. It needs to add in
any extra header files it’s been told about using the #include directive.
It might also need to expand or skip over some sections of the program.
Once it’s done, the source code will be ready for the actual compilation.
2. Compilation: translate into assembly.
The C programming language probably seems pretty low level, but
the truth is it’s not low level enough for the computer to understand. The
computer only really understands very low-level machine code
instructions, and the first step to generate machine code is to convert
the C source code into assembly language symbols like this:
movq -24(%rbp), %rax
movzbl (%rax), %eax
movl %eax, %edx
Looks pretty obscure? Assembly language describes the individual
instructions the central processor will have to follow when running the
program. The C compiler has a whole set of recipes for each of the
different parts of the C language. These recipes will tell the compiler how
to convert an if statement or a function call into a sequence of assembly
language instructions. But even assembly isn’t low level enough for the
computer. That’s why it needs…
3. Assembly: generate the object code.
The compiler will need to assemble the symbol codes into machine or
object code. This is the actual binary code that will be executed by
the circuits inside the CPU.
10010101 00100101 11010101 01011100
So are you all done? After all, you’ve taken the original C source code
and converted it into the 1s and 0s that the computer’s circuits need.
But no, there’s still one more step. If you give the computer several files
to compile for a program, the compiler will generate a piece of object
code for each source file. But in order for these separate object files to
form a single executable program, one more thing has to occur…
4. Linking: put it all together.
Once you have all of the separate pieces of object code, you need to
fit them together like jigsaw pieces to form the executable program.
The compiler will connect the code in one piece of object code that
calls a function in another piece of object code. Linking will also make
sure that the program is able to call library code properly. Finally, the
program will be written out into the executable program file using
a format that is supported by the operating system. The file format
is important, because it will allow the operating system to load the
program into memory and make it run.