

title: Introduction to Buffer Overflow
show-content: 1

layout: console

Introduction

In computer security and programming, a buffer overflow, or buffer overrun, is an anomaly where a program, while writing data to a buffer, overruns the buffer's boundary and overwrites adjacent memory. This is a special case of a memory safety violation.

Buffer overflows can be triggered by inputs that are designed to execute code or alter the way the program operates. This may result in erratic program behaviour, including memory access errors, incorrect results, a crash, or a breach of system security. They are thus the basis of many software vulnerabilities and can be maliciously exploited.

Programming languages commonly associated with buffer overflows include C and C++, which provide no built-in protection against accessing or overwriting data in any part of memory and do not automatically check that data written to an array (the built-in buffer type) is within the boundaries of that array. Bounds checking can prevent buffer overflows.
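As a minimal sketch of what bounds checking means in C, the helper below refuses a write that would land outside the array instead of silently corrupting adjacent memory. The name `checked_store` is hypothetical and not part of any standard library; it is only an illustration of the idea.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: stores value at arr[idx] only if idx is in range.
 * Returns 0 on success, -1 if the write would fall outside the array. */
int checked_store(int *arr, size_t len, size_t idx, int value) {
    assert(arr != NULL);    /* precondition: a real array was passed in */
    if (idx >= len) {
        return -1;          /* out of bounds: refuse the write */
    }
    arr[idx] = value;       /* in bounds: safe to store */
    return 0;
}
```

Plain C array indexing performs no such check, which is why every write has to be guarded by hand like this or by a library routine that takes the buffer size.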

You probably have a lot of questions:

What is the reason for this crazy bug?

It's a programming error.

Is it important to learn or is it just a small and stupid exploit?

This type of exploit is what separates professional hackers from ordinary ones. I will explain why later.

Can you give me a technical description?

A buffer overflow occurs when data written to a buffer, due to insufficient bounds checking, corrupts data values in memory addresses adjacent to the allocated buffer. Most commonly this occurs when copying strings from one buffer to another.

Basic example

In the following example, a program has defined two data items which are adjacent in memory: an 8-byte-long string buffer (A) and a two-byte integer (B). Initially, A contains nothing but zero bytes, and B contains the number 1979. Characters are one byte wide:

| Variable name | A | B |
|---|---|---|
| Value | [null string] | 1979 |
| Hex value | 00 00 00 00 00 00 00 00 | 07 BB |

Now, the program attempts to store the null-terminated string "excessive" in the A buffer. By failing to check the length of the string, it overwrites the value of B:

| Variable name | A | B |
|---|---|---|
| Value | 'e' 'x' 'c' 'e' 's' 's' 'i' 'v' | 25856 |
| Hex value | 65 78 63 65 73 73 69 76 | 65 00 |

Although the programmer did not intend to change B at all, B's value has now been replaced by a number formed from part of the character string. In this example, on a big-endian system that uses ASCII, "e" followed by a zero byte would become the number 25856. If B was the only other variable data item defined by the program, writing an even longer string that went past the end of B could cause an error such as a segmentation fault, terminating the process.
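The tables above can be reproduced with a small simulation. This is only a sketch under stated assumptions: A and B are modelled as bytes 0..7 and 8..9 of a single 10-byte array (so the demo itself never writes outside its own memory), B is stored big-endian as in the text, and the function name `simulate_unchecked_copy` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { A_SIZE = 8, B_SIZE = 2, MEM_SIZE = A_SIZE + B_SIZE };

/* Simulates the layout from the tables: bytes 0..7 are A, bytes 8..9 are B.
 * The copy deliberately ignores A_SIZE (that is the bug being modelled) but
 * is capped at MEM_SIZE so the simulation stays inside one array. */
unsigned simulate_unchecked_copy(const char *input) {
    unsigned char mem[MEM_SIZE] = {0};   /* A starts as all zero bytes */
    size_t n;

    assert(input != NULL);
    n = strlen(input) + 1;               /* include the terminating zero byte */

    mem[A_SIZE] = 0x07;                  /* B = 1979 = 0x07BB, big-endian */
    mem[A_SIZE + 1] = 0xBB;

    if (n > MEM_SIZE) {
        n = MEM_SIZE;                    /* keep the demo within the array */
    }
    memcpy(mem, input, n);               /* no check against A_SIZE */

    /* Read B back as a big-endian 16-bit number. */
    return ((unsigned)mem[A_SIZE] << 8) | mem[A_SIZE + 1];
}
```

Calling `simulate_unchecked_copy("excessive")` returns 25856, matching the second table, while an input that fits in A, such as `"hi"`, leaves B at 1979. An exactly 8-character input is already too long: its terminating zero byte lands on the first byte of B.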

Buffer overflow exploits

Let's talk about it.

A buffer overflow is a problem with the memory where the program stores its data.

Why is that?

What a buffer overflow does is overwrite specific memory locations with data you choose, so that the program does something you want instead of what it was written to do.

Let's follow a program and try to find and fix the buffer overflow:

Discovering and attacking buffer overflows

The thing you should know is that anyone can use ready-made exploits. Just go to sites like SecurityFocus, Exploit DB, fyodor's exploit world or Injector, download one, run it and then get busted. But why doesn't everybody write exploits and shellcode? Well, the problem is that many people don't know how to spot a vulnerability in source code, and even if they can, they are not able to write an exploit for it.

Let's take a look at the following code:

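#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
        char *somevar;
        char *important;
        if (argc < 2) return 1; // The program expects one argument
        somevar = (char *)malloc(sizeof(char)*4);
        important = (char *)malloc(sizeof(char)*14);
        strcpy(important, "command"); // This one is the important variable
        strcpy(somevar, argv[1]); // Unchecked copy of user input into 4 bytes
}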

So, let's say that the variable "important" stores a system command such as "chmod o-r file", and that, since the file is owned by root, the program runs as root too. This means that if you can control that command, you can execute ANY system command (mkdir, ls -la, cd ...). You will play with the server like a doll, so you can start thinking "how the hell can I put something that I want in the important variable?". Well, the way is to overflow memory until we reach it. But first let's look at the variables' memory addresses.

To do that, you need to re-write the code. Check the following one:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
        char *somevar;
        char *important;
        somevar = (char *) malloc(sizeof(char) * 4);
        important = (char *) malloc(sizeof(char) * 14);
        printf("%p\n%p\n", (void *) somevar, (void *) important);
        exit(0);
}

Well, I added two new lines to the source code, dropped the two strcpy() calls, and left the rest unchanged. Let's see what these two lines do:

  • The printf("%p\n%p\n", somevar, important) line prints the memory addresses held in the somevar and important variables.

  • The exit(0) line ends the program right there. You don't need the rest of it to run; your goal was only to find out where the variables are stored.

After running the program, you would get output like the following (you will probably not get the same memory addresses):

0x5556d165b2a0 <---- This is the address of somevar
0x5556d165b2c0 <---- This is the address of important

As we can see, important is placed shortly after somevar in memory (0x20, or 32, bytes later). This will let us use our buffer overflow skills, since somevar is filled from argv[1]. Now we know that one follows the other, but let's check each memory address so we can get a precise picture of how the data is stored. To do this, let's rewrite the code again:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
        char *somevar;
        char *important;
        char *temp; /* We'll need another variable */

        if (argc < 2) return 1; /* The program expects one argument */
        somevar = (char *) malloc(sizeof(char) * 4);
        important = (char *) malloc(sizeof(char) * 14);
        strcpy(important, "command"); /* This one is the important variable */
        strcpy(somevar, argv[1]); /* Unchecked copy of user input */
        printf("%p\n%p\n", (void *) somevar, (void *) important);
        printf("Starting to print memory addresses:\n");

        temp = somevar; // This will put temp at the first memory address we want

        while (temp < important + 14) {
                /**
                 * This loop ends when we get past the last memory address
                 * we want, the last byte of the important variable.
                 * Each line shows the address, the byte stored there as a
                 * character, and the 4 bytes starting there as a hex number.
                 */
                printf("%p: %c (0x%x)\n", (void *) temp, *temp, *(unsigned int *)temp);
                temp++;
        }

        exit(0);
}

Now let's say that in normal use argv[1] is send. So you just compile and run the program from your prompt:

gcc overflow.c -o overflow
./overflow send

You'll get an output like:

0x55c8cf4c82a0
0x55c8cf4c82c0
Starting to print memory addresses:
0x55c8cf4c82a0: c (0x6d6d6f63)
0x55c8cf4c82a1: o (0x616d6d6f)
0x55c8cf4c82a2: m (0x6e616d6d)
0x55c8cf4c82a3: m (0x646e616d)
0x55c8cf4c82a4: a (0x646e61)
0x55c8cf4c82a5: n (0x646e)
0x55c8cf4c82a6: d (0x64)
0x55c8cf4c82a7:  (0x0)
0x55c8cf4c82a8:  (0x0)
0x55c8cf4c82a9:  (0x0)
0x55c8cf4c82aa:  (0x0)
0x55c8cf4c82ab:  (0x0)
0x55c8cf4c82ac:  (0x0)
0x55c8cf4c82ad:  (0x0)
0x55c8cf4c82ae:  (0x0)
0x55c8cf4c82af:  (0x0)
0x55c8cf4c82b0:  (0x0)
0x55c8cf4c82b1:  (0x0)
0x55c8cf4c82b2:  (0x0)
0x55c8cf4c82b3:  (0x0)
0x55c8cf4c82b4:  (0x0)
0x55c8cf4c82b5:  (0x21000000)
0x55c8cf4c82b6:  (0x210000)
0x55c8cf4c82b7:  (0x2100)
0x55c8cf4c82b8: ! (0x21)
0x55c8cf4c82b9:  (0x0)
0x55c8cf4c82ba:  (0x0)
0x55c8cf4c82bb:  (0x0)
0x55c8cf4c82bc:  (0x0)
0x55c8cf4c82bd:  (0x73000000)
0x55c8cf4c82be:  (0x65730000)
0x55c8cf4c82bf:  (0x6e657300)
0x55c8cf4c82c0: s (0x646e6573) <-- This line represents a memory address
0x55c8cf4c82c1: e (0x646e65) <-- This line represents a memory address
0x55c8cf4c82c2: n (0x646e) <-- This line represents a memory address
0x55c8cf4c82c3: d (0x64) <-- This line represents a memory address
0x55c8cf4c82c4:  (0x0)
0x55c8cf4c82c5:  (0x0)
0x55c8cf4c82c6:  (0x0)
0x55c8cf4c82c7:  (0x0)
0x55c8cf4c82c8:  (0x0)
0x55c8cf4c82c9:  (0x0)
0x55c8cf4c82ca:  (0x0)
0x55c8cf4c82cb:  (0x0)
0x55c8cf4c82cc:  (0x0)
0x55c8cf4c82cd:  (0x0)

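Nice, isn't it? You can now see that important starts 32 bytes after somevar: the 4 bytes of somevar are followed by 28 more bytes before important begins. Most of them are empty, but not all (note the 0x21 at 0x55c8cf4c82b8). So, let's say that you run the program with a command line like: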

./overflow send---------------------------newcommand

You'll get an output like:

0x563d882382a0
0x563d882382c0
Starting to print memory addresses:
0x563d882382a0: s (0x646e6573) <-- somevar
0x563d882382a1: e (0x2d646e65) <-- somevar
0x563d882382a2: n (0x2d2d646e) <-- somevar
0x563d882382a3: d (0x2d2d2d64) <-- somevar
0x563d882382a4: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382a5: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382a6: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382a7: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382a8: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382a9: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382aa: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382ab: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382ac: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382ad: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382ae: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382af: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382b0: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382b1: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382b2: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382b3: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382b4: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382b5: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382b6: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382b7: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382b8: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382b9: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382ba: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382bb: - (0x2d2d2d2d) <-- past the end of somevar
0x563d882382bc: - (0x6e2d2d2d) <-- past the end of somevar
0x563d882382bd: - (0x656e2d2d) <-- past the end of somevar
0x563d882382be: - (0x77656e2d) <-- past the end of somevar
0x563d882382bf: n (0x6377656e) <-- past the end of somevar
0x563d882382c0: e (0x6f637765) <-- important variable
0x563d882382c1: w (0x6d6f6377) <-- important variable
0x563d882382c2: c (0x6d6d6f63) <-- important variable
0x563d882382c3: o (0x616d6d6f) <-- important variable
0x563d882382c4: m (0x6e616d6d) <-- important variable
0x563d882382c5: m (0x646e616d) <-- important variable
0x563d882382c6: a (0x646e61) <-- important variable
0x563d882382c7: n (0x646e) <-- important variable
0x563d882382c8: d (0x64) <-- important variable
0x563d882382c9:  (0x0)
0x563d882382ca:  (0x0)
0x563d882382cb:  (0x0)
0x563d882382cc:  (0x0)
0x563d882382cd:  (0x0)

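The new command has overwritten command. Now the program does something you want, instead of something it was supposed to do. Note that with 27 filler characters the copy reaches important one byte early, so important actually holds "ewcommand"; one more filler character would place "newcommand" exactly at its start.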

NOTE: Remember, the space between somevar and important is not always empty. It can hold other variables or bookkeeping data, so check those values and write the same values back to the same addresses, or the program can crash before it gets to use the variable that you modified.
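The arithmetic behind such an argument can be sketched in code. The helper below is hypothetical (the name `build_argument` and its parameters are made up for this example) and assumes the 32-byte offset between the two addresses printed earlier; on another system or allocator the offset will differ. It writes the normal input, pads with filler up to the offset, and then appends the text that should land in the second buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes `prefix`, then '-' filler up to `offset`,
 * then `tail`, into `out`. Returns 0 on success, -1 if the prefix is
 * longer than the offset or the result does not fit in `out`. */
int build_argument(char *out, size_t out_size, const char *prefix,
                   size_t offset, const char *tail) {
    size_t prefix_len, tail_len;

    assert(out != NULL && prefix != NULL && tail != NULL);
    prefix_len = strlen(prefix);
    tail_len = strlen(tail);

    if (prefix_len > offset || offset + tail_len + 1 > out_size) {
        return -1;
    }
    memcpy(out, prefix, prefix_len);                    /* normal input */
    memset(out + prefix_len, '-', offset - prefix_len); /* filler up to offset */
    memcpy(out + offset, tail, tail_len + 1);           /* tail plus zero byte */
    return 0;
}
```

With a prefix of "send" and an offset of 32, the helper emits 32 - 4 = 28 filler characters, so the tail starts exactly at byte 32 of the argument.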

Now let's think a little.

Why does this happen?

As you can see in the source code, somevar is allocated before important, which most of the time means that somevar comes first in memory. Now, let's check how each one gets its value.

somevar gets its value from argv[1] and important gets it from a fixed string, both through strcpy(). The real problem is that important is assigned first, so the later unchecked copy into somevar, which sits before it in memory, can run on and overwrite it. This program could be patched against this particular overwrite by switching those two lines, becoming:

strcpy(somevar, argv[1]);
strcpy(important, "command");

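If the program had been written this way, then even if you gave an argument long enough to reach the memory of important, it would be overwritten by the true command, since command is copied into important after somevar is filled. The overflow itself is still there, though: the copy into somevar is still unchecked, so the real fix is to check the length of the input.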
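Swapping the two lines only protects the value of important. A more robust patch checks the input length before copying. The sketch below shows one way to do that; `copy_bounded` is a hypothetical helper written for this example, not a standard function, and rejecting oversized input (rather than truncating it) is a design choice assumed here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if it fits, including the
 * terminating zero byte. Returns 0 on success, -1 if src is too long
 * (dst is then left as an empty string). */
int copy_bounded(char *dst, size_t dst_size, const char *src) {
    size_t len;

    assert(dst != NULL && src != NULL && dst_size > 0);
    len = strlen(src);
    if (len >= dst_size) {
        dst[0] = '\0';      /* reject oversized input instead of truncating */
        return -1;
    }
    memcpy(dst, src, len + 1);
    return 0;
}
```

In the example program the call would look like `copy_bounded(somevar, 4, argv[1])`, with the program refusing to continue when it returns -1. Note that even "send" needs 5 bytes once its terminating zero byte is counted, so it does not fit in the 4-byte somevar buffer either.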

This kind of buffer overflow is a heap buffer overflow. As you have probably seen, they are really easy to do in theory, but in the real world it's not so easy; after all, the example I gave was a really dumb program, right? It's a real pain to find those important variables, and to overflow one you need to be able to write to a buffer that sits at a lower memory address.

Buffer overflows are a topic as big as the sea. If you are really interested and want to learn everything about them, you can check the entry on Wikipedia.