The Not So Secret Life of C

The idea of this class is to understand how C++ is translated into assembly (how assembly is translated into object code is left as an exercise for the student).

So first we will talk about how C is translated into assembly, because C++ is mostly a superset of C, and because it is a good way to learn the techniques for achieving this understanding.

Hello World

First let's look at what hello world looks like in assembly.


        

        

        

        

        
Or on the Mac:
Test`helloWorld() at main.cpp:12:
0x100000c80:  pushq  %rbp
0x100000c81:  movq   %rsp, %rbp
0x100000c84:  leaq   0x2b6(%rip), %rdi         ; "Hello, World"
0x100000c8b:  popq   %rbp
0x100000c8c:  jmp    0x100000ec0               ; symbol stub for: puts
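
For orientation, here is a minimal sketch of C source that could plausibly produce a listing like the one above. The original source file is not shown here, so this is an assumption: only the name helloWorld and the string come from the debugger output, while the body and the int return type are guesses. Because the call to puts is the last thing the function does, the compiler is free to replace a call followed by ret with the single jmp seen in the listing.

```c
#include <stdio.h>

/* Assumed source for the listing above: the name and the string are taken
 * from the debugger output, everything else is a guess. */
int helloWorld(void)
{
    /* puts is the last action, so the compiler may emit "jmp puts"
     * (a tail call) instead of "call puts" followed by "ret". */
    return puts("Hello, World");
}
```

The string itself lives in the program image, which is why the listing loads its address with leaq relative to %rip rather than building it at run time.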
          

Generating assembly and listings

How do we do that? We run cc -S -o hello-world.s hello-world.c (note the capital -S; a lowercase -s asks for symbols to be stripped instead of emitting assembly). Or we set a breakpoint in Xcode and check Debug -> Debug Workflow -> Always Show Assembly. Or we get the built objects and run objdump -d on them. If building with Xcode they will be at /Users/tibbetts/Library/Developer/Xcode/DerivedData/Test-gklyipaixaqhfmaztcrgfnmrvniq/Build/Intermediates/Test.build/Debug/Test.build/Objects-normal/x86_64

We can also see it with the source interleaved by running cc -g -c -Wa,-adhls=hello-world.listing hello-world.c (the -a listing options belong to the GNU assembler). The output of that will look like hello-world.listing

Our makefile looks like this:


        

        

Crash Course in x86_64 Assembly

http://www.cs.cmu.edu/~fp/courses/15213-s07/misc/asm64-handout.pdf

Registers

64 bit  32 bit  16 bit  Second 8 bit  8 bit   Usage
%rax    %eax    %ax     %ah           %al     Return value
%rbx    %ebx    %bx     %bh           %bl     Callee saved
%rcx    %ecx    %cx     %ch           %cl     4th argument
%rdx    %edx    %dx     %dh           %dl     3rd argument
%rsi    %esi    %si     -             %sil    2nd argument
%rdi    %edi    %di     -             %dil    1st argument
%rbp    %ebp    %bp     -             %bpl    Base pointer, callee saved
%rsp    %esp    %sp     -             %spl    Stack pointer
%r8     %r8d    %r8w    -             %r8b    5th argument
%r9     %r9d    %r9w    -             %r9b    6th argument
%r10    %r10d   %r10w   -             %r10b   Caller saved (scratch)
%r11    %r11d   %r11w   -             %r11b   Caller saved, used for linking
%r12    %r12d   %r12w   -             %r12b   Callee saved
%r13    %r13d   %r13w   -             %r13b   Callee saved
%r14    %r14d   %r14w   -             %r14b   Callee saved
%r15    %r15d   %r15w   -             %r15b   Callee saved

Operations

Operation prefix  Arguments              Suffixes                    Description
mov               Source, Dest           s/z extend, size (q/l/w/b)  Move
push/pop          Source/Dest            Size                        Push to/pop from the stack at %rsp
lea               Address, Dest          Size                        Load effective address
inc/dec           Dest                   Size                        Increment/Decrement
neg/not           Dest                   Size                        Negate, Complement
add/sub/imul      Argument, Accumulator  Size                        Add/Subtract/Multiply
and/or/xor        Argument, Accumulator  Size                        Bitwise And/Or/Xor
sal/shl/sar/shr   Argument, Accumulator  Size                        Shift Left/Right, Arith/Logical (only different for right)
cmp               Arg2, Arg1             Size                        Numerical comparison by subtraction (Arg1 - Arg2)
test              Arg1, Arg2             Size                        Bitwise AND and set flags
jmp               Address                -                           Basic jump
j*                Address                -                           Conditional jump: Equal/Not Equal, Greater/Less, Zero, etc
call              Address                -                           Call subroutine
ret               -                      -                           Return from subroutine
Control flow reference: X86 Assembly Control Flow

Types

C declaration  Intel data type     GAS suffix  x86-64 Size (Bytes)
char           Byte                b           1
short          Word                w           2
int            Double word         l           4
unsigned       Double word         l           4
long int       Quad word           q           8
unsigned long  Quad word           q           8
char *         Quad word           q           8
float          Single precision    s           4
double         Double precision    d           8
long double    Extended precision  t           16
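
The sizes in the table can be checked against whatever compiler is at hand. The sketch below is only an illustration: the names type_sizes and size_of_declaration are made up for this example, and the numbers it reports match the table only on a platform that follows the same x86-64 conventions.

```c
#include <stddef.h>
#include <string.h>

struct type_size {
    const char *declaration;  /* C declaration, spelled as in the table */
    size_t bytes;             /* size reported by this compiler */
};

/* Hypothetical lookup table, filled in by the compiler via sizeof. */
static const struct type_size type_sizes[] = {
    { "char", sizeof(char) },
    { "short", sizeof(short) },
    { "int", sizeof(int) },
    { "unsigned", sizeof(unsigned) },
    { "long int", sizeof(long int) },
    { "unsigned long", sizeof(unsigned long) },
    { "char *", sizeof(char *) },
    { "float", sizeof(float) },
    { "double", sizeof(double) },
    { "long double", sizeof(long double) },
};

/* Returns the size in bytes of the named declaration, or 0 if unknown. */
size_t size_of_declaration(const char *declaration)
{
    size_t count = sizeof type_sizes / sizeof type_sizes[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(type_sizes[i].declaration, declaration) == 0)
            return type_sizes[i].bytes;
    }
    return 0;
}
```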

Some notes about memory

In a running process there are 5 parts to memory:
Text
The area where the program code is found.
Data
Area where initialized global variables are loaded.
BSS
Area for uninitialized global variables, which are zero-filled when the program is loaded.
Heap
Area where memory allocators get their memory.
Stack
Place where stack frames and local variables are allocated.
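
To tie the five areas to source code, here is a small sketch in which each object is annotated with the area it would normally end up in. The names are invented for this example, and the placement described in the comments is what typical compilers and loaders do, not something the C language guarantees.

```c
#include <stdlib.h>

int initialized_global = 42;   /* Data: the value 42 is stored in the program image */
int uninitialized_global;      /* BSS: no stored value, zero-filled at load time */

/* The machine code for this function lives in Text. */
int sum_from_each_area(void)
{
    int local = 1;                                 /* Stack: part of this call's frame */
    int *heap_value = malloc(sizeof *heap_value);  /* Heap: obtained from the allocator */
    if (heap_value == NULL)
        return -1;
    *heap_value = 2;

    int total = initialized_global + uninitialized_global + local + *heap_value;
    free(heap_value);
    return total;
}
```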

Global Variable

We can look at how global variables are defined in assembly. They are basically symbols that name fixed addresses in these memory areas. Sometimes they are exposed to the linker, ld (external linkage), and sometimes not (static).
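
As a rough illustration, the sketch below declares three globals and notes in comments the kind of assembler directives a compiler typically emits for each. The variable names are made up for this example, and the exact directives vary between compilers and platforms, so treat the comments as approximate.

```c
/* Visible to the linker. Typically emitted roughly as:
 *     .globl exported_counter
 *     .data
 * exported_counter:
 *     .long 7
 */
int exported_counter = 7;

/* static: same storage in the data area, but no .globl directive,
 * so other object files cannot refer to the symbol. */
static int private_counter = 3;

/* No initializer: space is reserved in BSS rather than stored in the file. */
int zeroed_total;

/* Each access below compiles to a load or store at the symbol's address. */
int bump_counters(void)
{
    exported_counter++;
    private_counter++;
    zeroed_total = exported_counter + private_counter;
    return zeroed_total;
}
```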

        

        

        

        

        

Functions

Functions are chunks of code that get called using the call instruction. The basic idea is that we jump into that location, run the code, and jump back to where we came from.

We use the stack to store where we came from, so that we can get back. But since the code we call might trash the registers, we also want to save some registers on the stack. And we need to pass arguments: on x86_64 the first few go in registers, and any that don't fit in the registers go on the stack too.

We also pass return values back through a register: %rax on x86_64 (%eax when the value is 32 bits wide).
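
A minimal sketch of the call and return sequence, with the registers involved noted in comments. The function names are invented for this example, and the instruction sequence in the comments is what an unoptimized x86_64 build would roughly produce; an optimizer may well inline the call away.

```c
/* A leaf function: arguments arrive in registers, the result leaves in %rax. */
long add_three(long a, long b, long c)   /* a: %rdi, b: %rsi, c: %rdx */
{
    return a + b + c;                    /* the sum is left in %rax */
}

long double_sum(void)
{
    /* Without optimization, roughly:
     *     movl $1, %edi
     *     movl $2, %esi
     *     movl $3, %edx
     *     call add_three
     * "call" pushes the return address on the stack and jumps;
     * the "ret" at the end of add_three pops it and jumps back here. */
    long sum = add_three(1, 2, 3);
    return sum * 2;
}
```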

Calling Convention

Calling convention just means identifying which registers the caller saves, which registers the callee saves, where to put the arguments and where to put the return value.
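
To make the argument rules concrete, here is a sketch of a function with more integer arguments than there are argument registers. The function name is made up, and the register assignments in the comments follow the register table earlier in these notes; the weights exist only so that swapping two arguments would change the result.

```c
/* Under the convention described in the register table:
 *   a1..a6 arrive in %rdi, %rsi, %rdx, %rcx, %r8, %r9
 *   a7 and a8 do not fit in registers, so the caller puts them on the stack
 *   the result goes back in %rax
 */
long weighted_sum(long a1, long a2, long a3, long a4,
                  long a5, long a6, long a7, long a8)
{
    return 1 * a1 + 2 * a2 + 3 * a3 + 4 * a4
         + 5 * a5 + 6 * a6 + 7 * a7 + 8 * a8;
}
```

Compiling a caller of this function with cc -S and looking for the stores just before the call instruction is a quick way to see the stack-passed arguments.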

        

        

        

        

        

Control Structures

Control structures such as if/else, while and do-while are all implemented in terms of conditional jumps in assembly. This is fairly straightforward.
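
One way to see this without reading any assembly is to rewrite a loop using only if and goto, which is close to the shape a compiler gives it. The sketch below shows the same function twice; the function and label names are invented for this example.

```c
/* The loop as normally written. */
long sum_to_while(long n)
{
    long total = 0;
    long i = 1;
    while (i <= n) {
        total += i;
        i++;
    }
    return total;
}

/* The same loop with only a conditional jump, similar to the compiled form:
 * jump to the check, then jump back to the body while the condition holds. */
long sum_to_goto(long n)
{
    long total = 0;
    long i = 1;
    goto check;            /* jmp check */
body:
    total += i;
    i++;
check:
    if (i <= n)            /* cmp, then a conditional jump (jle) to body */
        goto body;
    return total;
}
```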

          

          

          

          

          

Pointers

You may have noticed that assembly already deals with pointers a great deal. So it is pretty obvious how pointers and pointer arithmetic are dealt with when they are compiled.
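
The one detail worth spelling out is that pointer arithmetic is scaled by the size of the pointed-to type. A small sketch, with invented function names and an approximate instruction in the comment:

```c
#include <stddef.h>

/* values + 2 means "two ints further on", that is 2 * sizeof(int) bytes.
 * With 4-byte ints this is typically a single load such as:
 *     movl 8(%rdi), %eax
 */
int read_third(const int *values)
{
    return *(values + 2);
}

/* Measures the distance between two int pointers in bytes rather than in
 * elements, to make the scaling visible. */
ptrdiff_t byte_distance(const int *from, const int *to)
{
    return (const char *)to - (const char *)from;
}
```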

          

          

          

          

          

Array

At the assembly level, an array is just the address of its first element. To read or write one of its values, you just calculate the offset.
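
The offset calculation is easiest to see with a two-dimensional array, where the compiler has to combine both indices. The sketch below does the arithmetic by hand; ROWS, COLS and the function names are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

enum { ROWS = 3, COLS = 4 };

/* Byte offset of grid[row][col] from the start of the array:
 * rows are stored one after another, each COLS ints long. */
size_t cell_offset_bytes(size_t row, size_t col)
{
    return (row * COLS + col) * sizeof(int);
}

/* Reads grid[row][col] using only the base address and the offset. */
int read_cell_by_offset(int grid[ROWS][COLS], size_t row, size_t col)
{
    const unsigned char *base = (const unsigned char *)grid;
    int value;
    memcpy(&value, base + cell_offset_bytes(row, col), sizeof value);
    return value;
}
```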

Structures

Like arrays, structures are just going to be pointers to the start of a memory area. To access the contents, you just calculate the offset.

Since they are of a fixed size, you can put them on the stack easily enough.
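
A short sketch of both points, using a made-up struct record. The offsets are whatever the compiler chose, including any padding, which is why the code asks for them with offsetof instead of hard-coding numbers.

```c
#include <stddef.h>
#include <string.h>

struct record {
    char tag;
    int count;      /* usually preceded by padding so that it is aligned */
    double weight;
};

/* Reads r->count the way compiled code does: base address plus a fixed offset. */
int read_count_by_offset(const struct record *r)
{
    const unsigned char *base = (const unsigned char *)r;
    int count;
    memcpy(&count, base + offsetof(struct record, count), sizeof count);
    return count;
}

/* The size is fixed at compile time, so a structure can be an ordinary local
 * variable in the stack frame. */
double stack_record_weight(void)
{
    struct record local = { 'a', 7, 2.5 };
    return local.weight;
}
```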

Unions

And of course unions are like structures, except that every member starts at offset zero, so the members overlap in the same memory.
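
In C every member of a union starts at the beginning of the union, which the sketch below checks with offsetof. The union number and the function float_bits are invented for this example, and the comment about the bit pattern assumes a 4-byte IEEE 754 float, as on x86_64.

```c
#include <stddef.h>
#include <stdint.h>

union number {
    uint32_t bits;
    float real;
    unsigned char bytes[4];
};

/* Stores a float and reads the same memory back as an integer.
 * Assuming a 4-byte IEEE 754 float, 1.0f comes back as 0x3F800000. */
uint32_t float_bits(float value)
{
    union number n;
    n.real = value;
    return n.bits;
}
```

Since all members share one address, accessing any of them compiles to a load or store at the same location, just with a different operand size or register class.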


That's it for our introduction to how C is compiled. As you can see, C is very simple to compile, assuming you aren't doing any optimization.