Hands-On System Programming with C++

How a C program starts

One part of the standard that is relevant to system programming, but is not widely discussed in the literature, is how a C program starts. A common misconception is that a C program starts at one of the following two entry points:

int main(void) {}
int main(int argc, char *argv[]) {}

Although main() is, in fact, the first function that a C programmer typically provides, it is not the first function called when your C program starts. It is not the first code that executes, and it is not necessarily the first user-provided code that executes either.

A lot of work is carried out by the operating system, the standard C environment, and possibly the user's own code before the main() function ever executes.

Let's look at how your compiler creates a simple Hello World\n example:

#include <stdio.h>

int main(void)
{
    printf("Hello World\n");
}

To better understand the startup process of a C program, let's look at how this simple program is compiled:

> gcc -v scratchpad.c; ./a.out

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ...
...

By adding the -v option to GCC, we are able to see each step the compiler takes to compile our simple Hello World\n program.

To start, the compiler proper (cc1) translates the program into assembly, a format that can be processed by the GNU assembler (as):

/usr/lib/gcc/x86_64-linux-gnu/7/cc1 -quiet -v -imultiarch x86_64-linux-gnu scratchpad.c -quiet -dumpbase scratchpad.c -mtune=generic -march=x86-64 -auxbase scratchpad -version -fstack-protector-strong -Wformat -Wformat-security -o /tmp/ccMSWHgC.s

Not only can you see how the initial compilation is performed, but you can also see the default flags that your system's build of GCC adds, such as -fstack-protector-strong and -Wformat-security in this output.

Next, the assembler converts that output to an object file, as follows:

/usr/bin/x86_64-linux-gnu-as -v --64 -o /tmp/cc9oaJWV.o /tmp/ccMSWHgC.s

Finally, the resulting object files are linked into a single executable using the collect2 utility, which is a wrapper around the linker:

/usr/lib/gcc/x86_64-linux-gnu/7/collect2 -plugin /usr/lib/gcc/x86_64-linux-gnu/7/liblto_plugin.so -plugin-opt=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper -plugin-opt=-fresolution=/tmp/ccWQB2Gf.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --sysroot=/ --build-id --eh-frame-hdr -m elf_x86_64 --hash-style=gnu --as-needed -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie -z now -z relro /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o -L/usr/lib/gcc/x86_64-linux-gnu/7 -L/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/7/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/7/../../.. /tmp/cc9oaJWV.o -lgcc --push-state --as-needed -lgcc_s --pop-state -lc -lgcc --push-state --as-needed -lgcc_s --pop-state /usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o

There are several important things to note here about how the program is linked:

  • -lc: This flag tells the linker to link in libc. As with the other libraries discussed here, we never asked the compiler to link against libc; GCC does so by default.
  • -lgcc_s: This is the shared version of libgcc (the static version is linked through -lgcc). GCC links it automatically to provide support for compiler-specific operations, including 64-bit operations on a 32-bit CPU, and facilities such as exception unwinding (a topic that will be discussed in Chapter 13, Error - Handling with Exceptions).
  • Scrt1.o, crti.o, crtbeginS.o, crtendS.o, and crtn.o: These object files provide the code needed to start and stop your application.

Specifically, the C runtime (CRT) object files are the ones of interest here. They provide the code that is needed to bootstrap the application, including:

  • Executing global constructors and destructors (as GCC supports constructors and destructors in C, even though this is not a standard C facility).
  • Setting up unwinding to support exception handling. Although this is mainly needed for C++ exceptions, which are not used in a standard C-only application, it is still needed for linking in the set jump (setjmp) exception logic, a topic that will be explained in Chapter 13, Error - Handling with Exceptions.
  • Providing the _start function, which is the actual entry point of any C-based application built with GCC's default settings.

Finally, all of this startup code is responsible for providing the main() function with the arguments that are passed to it, for capturing the return value of the main() function, and for calling the exit() function with that value on your behalf.

The most important takeaway here is that the first piece of code to execute in your program is not the main() function, and if you register a global constructor, main() is not even the first piece of your own code that executes. When doing system programming, if you experience issues with the initialization of your program, this is where to look first.