8.6 Assembly Code and How a Compiler Works

If you want to use a compiler's advanced features, you should have an idea of how the compiler operates. Here is a brief summary:

  1. The compiler reads a source code file and builds an internal representation of the code inside the file. If there's a problem with the source code, the compiler reports the error and exits.

  2. The compiler analyzes the internal representation and generates assembly code for the target processor.

  3. An assembler converts the assembly code into an object file.

  4. The linker gathers object files and libraries into an executable.

You may be specifically interested in steps 2 and 3 of this process. Assembly code is one step away from the raw binary machine code that the processor runs; it is a textual representation of the processor instructions. Here is an excerpt of a program in x86 assembly code:

.L5:
        movl -8(%ebp),%eax
        imull -16(%ebp),%eax
        movl -4(%ebp),%edx
        addl %eax,%edx
        movl %edx,-12(%ebp)
        incl -16(%ebp)
        jmp .L3
        .p2align 4,,7

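Each line of assembly code usually represents a single instruction. The exceptions in this excerpt are .L5:, which is a label marking a jump target, and .p2align, which is an assembler directive rather than a processor instruction. To manually generate assembly code from a C source file, use the compiler's -S option: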

cc -S -o prog.S prog.c

Here, prog.c is the C source file and prog.S is the assembly code output. You can turn an assembly code file into an object file by running the assembler, a program named as, like this:

as -o prog.o prog.S

For more information about x86 assembly code, see The Art of Assembly Language [Hyde]. RISC assembly code is a little more comprehensible; see MIPS RISC Architecture [Kane]. If you are interested in how to design and implement a compiler, two good books are Compilers: Principles, Techniques, and Tools [Aho 1986] and Modern Compiler Implementation in ML [Appel].