Before we can get started with reverse engineering, we need to understand the language required for reverse engineering: the language of the processor. In this section, we will focus on learning this language and also perform some reverse engineering.
The language of computers¶
There are several spoken languages in the world. Similarly, in computers, there are several programming languages to interact with the computer. Of these, the language understood by the processor is the assembly programming language. Compared to most programming languages, assembly programming language is very low level: a lot more details are visible at this level as we shall see shortly. Similar to how different people in the world speak different languages, different processors speak different assembly languages. Here, we will focus on the 32 bit version of language spoken by Intel, AMD and similar processors, popular known as x86 assembly language. However, most languages have similar concepts: they only differ in syntax and few other implementation details. If you know one language, it is easy to learn another assembly language with some effort.
Why do we need to learn assembly programming?
Very often in reverse engineering, we have access only to the executable: the source code is rarely available. In all these cases, we have only the assembly language instructions in the executable to work with. Thus, if you do not know assembly language, you will have a very difficult time analysing and reverse engineering the executable.
x86 assembly language is a collection of instructions, a command that instructs the processor to perform a specific action. It was created by Intel in 1972 and has been continually improved over the years. You can think of instructions as words of a language: if you put several words in a specific order, it forms meaningful sentences. Similarly, if you put several chosen instructions carefully in a specific order, it will form a meaningful computation. Despite the limited number of instructions available, the x86 assembly language can perform nearly all possible computations by combining and sequencing the instructions in a particular order.
Today, x86 assembly language is has been superseded by the bigger 64 bit version but it is a subset of 64 bit version and thus very much relevant. Additionally, several computer processors still exist that can speak only the 32 bit x86 language and thus, it is still very relevant today.
Processor vs programmer view of memory¶
As said above, there are a limited set of instructions in the assembly language. This set of instructions is called the instruction set architecture(ISA) of the x86 language or x86 ISA for short. We will not learn the entire x86 ISA but learn a handful of important instructions that account for majority of the instructions used in real world executables. By understanding these instructions and their effects, you can combine the effects of several instructions together and understand their cumulative effects. This is commonly done in reverse engineering to understand the binary.
If you have completed the C programming section, you will remember encountering several data types of varying sizes. This is the how the programmer views memory - as a collection of several integers, floats, characters, short integers, structures etc.
However, at the processor level, these data types are mostly non-existent: the processor has a very different view of memory compared to the programmer view. The processor views the memory as a collection of bytes: it has no idea if any sequence of bytes are an integer, a character array or two short integers etc.
There are only 3 possible types in the processor level, depending on the number of bytes the processor accesses: Byte(8 bits), Word(16 bits) and Double Word(32 bits). Here is a table showing the correspondence between some data types in C to types in x86 assembly:
|Data type in C||Data type in x86 assembly||Size of data type|
|int||Double word||4 bytes|
|pointers||Double word||4 bytes|
Frequently in assembly code, you will see byte, word and dword used, especially in instructions that read or write to some memory location. These words indicate the number of bytes being read or written to by the instruction. Memorize the words and what sizes they correspond to.
Registers: processor's high speed local storage¶
In addition to the main memory, processors also feature a limited number of local storage called registers. These registers are extremely fast: reading from memory is several times much slower than reading from registers. Programmers in high level languages are often unaware(or unconcerned) about existence of registers but when you come to assembly(and hence, reversing), registers become extremely visible and important. Many of the processor's instructions involve manipulating registers and sometimes, intermediate results are temporarily stored in registers. In addition, registers also play a very important role in function calls as we will see later in this series.
Registers are also extremely expensive to manufacture and hence, there are only a limited number of registers available in every process. The number of registers is fixed during manufacturing and each register is assigned a name as per the x86 assembly standard. Below is a list of all the general purpose registers(GPRs) available in x86 processors:
|Register name||Register name|
Of these above GPRs, the following have fixed purpose at all times:
- EIP: Contains the address of the next address that will be executed.
- ESP: Points to the top of the program stack. More on this later.
- EFLAGS: Contains several 1 bit flags used for decision making. More on this later.
These registers have one fixed purpose and used for no other reason(eg: store intermediate values). The other registers have specific uses in some scenarios but not always. In addition, the EIP register cannot be directly read or modified and EFLAGS register cannot be directly modified. There are specific instructions that permit reading or modifying their values, which we shall encounter later. Sometimes, even the EBP register is not available for storing intermediate results. We will discuss more about registers when we start discussing x86 assembly programs.
In addition to the above GPRs, there are several special purpose registers present for supporting a specific set of instructions. We will not be discussing them but there are several resources available online which describe and discuss these.
All of the above GPRs are 32 bits in size and accessible in x86 assembly code using their names. In addition, it is also possible to access a subset of bits in those registers using a different name. Below is a table listing which symbol can be used to access specific bits of a register:
|All 32 bits||Bits 1 to 16||Bits 9 to 16||Bits 1 to 8|
In the above list, the number of bits start from right to left. The rightmost bit is bit 1 and leftmost bit is bit 32.
"Hello world" in x86 assembly¶
Alright, we've been looking at a lot of material so now let's start looking at some example programs. Just like how every programming language lesson starts with the venerable "Hello world" program, we shall do so in x86 assembly as well. We will be writing code in the popular Intel syntax for x86 assembly and will be using the netwide assembler(nasm) and the GNU C compiler(gcc) to compile assembly programs to executables. The entire "Hello world" program x86 assembly code is included below and also available for download here.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ; ; ; Author : dnivra ; ; Program : print "Hello world" using printf ; ; ; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; BITS 32 extern printf section .rodata hello_world: db "Hello, world!", 10, 0 section .text global main main: push ebp mov ebp, esp push hello_world call printf add esp, 4 mov eax, 0 mov esp, ebp pop ebp ret
This is a very simple program that simply prints the string "Hello, world!" to stdout using printf. This is an ideal and simple program to learn and understand the basics of x86 assembly. We will also develop a reverse engineering mindset through multiple example programs like this.
Compiling and running assembly programs¶
Before we start walking through the code, let's compile and run the assembly code to see the output. In order to compile the code, we will need netwide assembler(nasm) and GNU C Compiler installed. On Debian based machines(including Ubuntu), you can install the packages nasm and gcc to install both the tools.
The compilation is a two step process. In the first step, we convert the assembly source to an object file, a special intermediate file that gcc understands, using nasm. The command is as follows:
$ nasm -f elf hello-world.asm
In the above command, we ask nasm to generate an ELF object file. ELF is a standard file format used in Linux based OS. The above command will generate a file called "hello-world.o". Now, we use gcc to convert the "hello-world.o" to an executable file as follows:
$ gcc -m32 hello-world.o -o hello-world.out
The above command accepts the object file as input and produces an executable file named "hello-world.out". You can view the output of the assembly program by running "hello-world.out".
$ ./hello-world.out Hello, world!
Take your time to understand the 2 step compilation process to obtain the final executable and viewing the output. It's important to complete the above before starting the walkthrough.
Okay so now that we have executed the binary and seen the output, let's dive into the source code and try understand it better. Lines numbered 1 to 6 are comments since they started with the character ';'. Any characters after ';' is considered a comment and never processed: they are simply discarded.
In line 8, we declare that this is a 32 bit assembly program(remember we said earlier that 64 bit version also exists? 32 bit programs are not compatible with 64 bit and thus, it is good to declare upfront which version is the program compatible with).
In line 9, we declare the function printf for using in the program below. If you are familiar with extern keyword in C, it is more or less the same. If you do not know, you can think of extern as similar to the #include directive in C. In C programming, before we could use any library function, we had to include certain header files. For instance, before using printf and scanf, we had to include the header file stdio.h and similarly for other library functions. By including the headers, we tell the compiler to check in the header files for the definition of many functions. Similarly, by using extern here, we tell the compiler that the function printf is not written by us and it is present elsewhere. If you wish to use any library function, you can declare it in the code similar to printf and invoke the function just like in C.
In line 12, we inform the compiler that the next one or more lines declare contents of the rodata section of the binary. The rodata section consists of global variables and constants that cannot be modified during execution(rodata stands for "Read Only Data"). Typically, most strings hardcoded into the program are placed in the rodata section. All this was done by the compiler during compilation of C programs so you did not have to worry about how this is performed. The appendix has more information about the various sections present in the binary.
Here, the rodata section in the example program has only 1 member named "hello_world". Also, by looking at the declaration, we can infer that it is a string since it is enclosed in double quotes. There are 3 directives to declare values in nasm:
- db: Declares byte values(of size 1 byte).
- dw: Declares word values(of size 2 bytes).
- dd: Declares double word values(of size 4 bytes).
The same declaration is used for declaring array of values of the above types as well: there is no distinction. This is apparent from line 12: we are declaring a string, which is an array of characters or bytes. Thus, hello_world is actually a pointer to an array of characters, which together form the string "Hello, world". Also, you can specify which characters to include by using their ASCII value instead of typing the characters out, as shown in the declaration of hello_world. Notice at the end of the string, there are two characters 10 and 0. These are ASCII values for the new line('\n' in C) and null character('\0' in C). Thus, line 12 is equivalent to the following C code:
char *hello_world = "Hello, world!\n";
ASCII stands for American Standard Code for Information Interchange. It is a standard that maps different characters to different 7 bit numbers since numbers are easier to exchange in digital communication and was formulated in 1963. Today, Unicode, which supports many languages and bigger sized numbers(16 bits), is the more prevalent today: ASCII mappings are a subset of Unicode mappings.
That's a lot of information about just a few lines. We recommend you briefly pause here, go back and explain lines 1 to 12 on your own without looking at the description above. You can do this either verbally or by writing down your explanation as comments on the side. This is a good exercise that will help test your understanding.
Hopefully you have revised and are comfortable with everything discussed till now. Let's now take a look at the actual assembly code in the program.
Similar to line 12, line 15 is declaring that text section starts here. The text section contains the program code that you wrote i.e. all the functions that you wrote. For more information about the text section, we recommend reading the appendix. In the following line 16, we see that the symbol main is being declared as global. This is necessary so that gcc can find the function main when compiling the object file to generate the executable. If this is missing, gcc will refuse to generate the executable file.
Try removing the line 16("global main") and compiling the code to see what happens.
In line 18, we can see that the function main is declared. This is how function are defined in nasm assembly code: the name of the function followed by ':'. This main function is same as the C function main, which is the first function that gets executed. If you carefully analyse the lines that follow(lines 19 to 29), we see that the entire assembly code is divided into 3 groups of instructions. The first group is called the function prologue, the set of instructions executed at the start of the function and the final group is function epilogue, the set of instructions executed at the end of the function just before it returns to the caller. In order to understand them, we need to learn about the program stack and how it's used during execution.
Diving into the program stack¶
A data structure represents a technique or method of organizing and storing data in memory. It defines specific rules of how the data can be accessed and modified. The most popular data structures are queues and stacks.
You have probably encountered several stacks in daily life: stack of books, papers, clothes and much more! In all those stacks there are two distinct characteristics:
- You can remove an item from the top of stack. If you try to take an item from anywhere else, the stack will collapse.
- You can add an item only to the top of the stack. If you try to add an item anywhere else, the stack will collapse.
Figure: Push and pop operations on a stack visualized. Image courtesy Maxtremus from Wikimedia Commons
Program stacks in computer science have a very similar behaviour and the items they store are values needed by functions. These can be arguments, variables, constants or some other runtime values not normally visible to programmers. A program stack is an integral concept for function execution and thus, it is very essential to understand how it works and how it's used.
In x86 processors, the stack grows from higher addresses to lower memory addresses - it's fixed that way in the processor. In stack data structure, there is a variable that keeps track of the next available location on the stack to store data. In x86 processors, this is done by the ESP register. By modifying the value in ESP, the stack grows or shrinks.
Figure: Growth direction of program stack in x86. Image copyright Team bi0s.
There are two special instructions in x86 assembly for adding and removing values onto the program stack:
- push: This instruction takes 1 argument and pushes it onto the top of the stack. The argument is usually a constant or a register. Implicitly, the value of ESP is decremented(remember stack grows towards lower memory addresses).
- pop: This instruction takes 1 argument into which the top most value on the stack is stored. The argument is usually a register. Implicitly, the value of ESP is incremented(remember stack grows towards lower memory addresses).
The program stack is unique to a program: it cannot used by another programs. However, the program stack is shared by all functions with a program. Since several functions use the stack during program execution, it is important to ensure that functions are assigned non-overlapping areas in the program stack for them to use and no function can manipulate the contents of stack portion of another function. This is achieved by allocating a new stack frame for the callee function invoked by the caller on top of the caller's stack frame. Similarly, before a function returns, its stack frame is deallocated since it's no longer needed.
Figure: Stack frame creation and destruction during function calls. Image copyright Team bi0s.
In x86 assembly, the stack frame allocation and deallocation is performed using the ESP and EBP registers. Let's analyse the prologue and epilogue to understand how these registers help with this process. As per the x86 assembly language convention:
- ESP register always points to the top of the program stack.
- EBP register always points to the base of the current function's stack frame.
Thus, when a function is executing, it's stack frame might be similar to this:
Figure: Stack frame, EBP and ESP. Image copyright Team bi0s.
Let's assume the above function is invoking the function main in above example. The function calls main using the call instruction. The call instruction does two specific actions:
- Saves the address of the next instruction after the call onto the stack.
- Updates value of EIP register to the argument passed.
Thus, the stack would look like the diagram below after the call instruction is executed and control passed onto main:
Figure: Stack after call instruction. Image copyright Team bi0s.
We recommend that you attempt drawing the stack layout after each instruction to ensure you understand what is happening. You can either attempt drawing stack layout and then validating with diagrams here or draw along with the walkthrough and explain to yourself.
After the call, the first instruction of main is executed - "push ebp"(line 19). After this instruction is executed, the value of EBP register is saved on the stack as show below:
Figure: Stack after EBP is pushed onto stack. Image copyright Team bi0s.
After line 19 is executed, in line 20, the mov instruction is executed which, unlike what the name suggests, copies the value of ESP register to the EBP register. After this instruction is executed, the stack would look similar to the following:
Figure: Stack after ESP is copied to EBP. Image copyright Team bi0s.
In x86 assembly, the destination is mentioned first and then the source is mentioned. Thus, in line 20, the value of ESP is copied to EBP because EBP is destination and ESP is source.
Thus, the mov instruction allocated a new stack frame for main on top of the existing stack frame. Note that since base of main's stack frame is also the top of the stack frame, the size of main's stack frame is 0. This is because main does not use any local variables and thus doesn't need a stack frame for itself. If it were using the stack frame, then space would have been appropriately allocated by modifying the ESP register.
Now that we have a good understanding of stack frame allocation, let's look at how the stack frame is deallocated when the function is about to return. Consider the same stack frame as the diagram above and the function epilogue is about the execute(lines 27 to 29).
In line 27, the value of EBP register is copied to the ESP register. Here, it makes no difference because size of stack frame is 0. However, if stack frame were of non-zero size, then this would reduce the stack size - the entire frame would be deallocated. After this instruction, line 28 removes a 4 byte value from top of the stack and store it in the EBP register. As a result, the stack will look as follows:
Figure: Stack after EBP modified by top value on stack. Image copyright Team bi0s.
As we can see from the above diagram, the instruction restores the stack frame of main's caller. After this, the ret instruction executed(line 29). The ret is the dual of the call instruction: it removes a 4 byte value from top of the stack and stores in the EIP register. The value on top of the stack is the address of the next instruction saved by the call instruction before starting main. After this instruction is executed, main has finished executing and control to main's caller with the stack restored to the original state before main was called:
Figure: Stack after main returns to its caller. Image copyright Team bi0s.