
Chapter 02 From Source to Binary
In programming, everything starts with source code. Source code, sometimes called the code base, usually consists of a number of text files, each containing textual instructions written in a programming language.
A CPU cannot execute textual instructions. They must first be compiled (or translated) into machine-level instructions that the CPU can execute, and executing those instructions is what eventually gives us a running program.
In this chapter, we go through the steps needed to get a final product out of C source code. This chapter goes into the subject in great depth, and as such we've split it into five distinct sections:
- The standard C compilation pipeline: In the first section, we are going to cover standard C compilation, the various steps in the pipeline, and how they contribute to producing the final product from C source code.
- Preprocessor: In this section, we are going to talk about the preprocessor component, which drives the preprocessing step, in greater depth.
- Compiler: In this section, we are going to have a deeper look at compilers. We will explain how compilers, driving the compilation step, produce intermediate representations from source code and then translate them into assembly language.
- Assemblers: After compilers, we also talk about assemblers, which play a significant role in translating the assembly instructions received from the compiler into machine-level instructions. The assembler component drives the assembly step.
- Linker: In the last section, we will discuss the linker component, which drives the linking step, in greater depth. The linker is the build component that finally creates the actual products of a C project. Some build errors are specific to this component, and sufficient knowledge of the linker will help us to prevent and resolve them. We also discuss the various final products of a C project, and we give some hints about disassembling an object file and reading its content. Finally, we briefly discuss what C++ name mangling is and how it prevents certain defects in the linking step when building C++ code.
Our discussions in this chapter focus mainly on Unix-like systems, but we also point out some differences on other operating systems, such as Microsoft Windows.
We begin, in the first section, with the C compilation pipeline. It is vital to know how the pipeline produces executable and library files from the source code. Multiple concepts and steps are involved, and a thorough understanding of them prepares us for the content of both this and future chapters. Note that the various products of a C project are discussed thoroughly in the next chapter, Object Files.