Address Space Layout Randomization (ASLR)

A history lesson

Before we describe what ASLR is or how it's used to prevent exploitation, we need to understand why it was created.

In 1996, a hacker by the name of Aleph One published an article in Phrack Magazine titled "Smashing the Stack for Fun and Profit". In it, Aleph One describes in great detail how buffer overflow vulnerabilities can be exploited to gain arbitrary code execution. The technique uses a stack buffer overflow to overwrite a function's saved return address with an address on the stack. When the vulnerable function executes its final ret instruction, the process jumps into the stack and begins executing whatever data resides at that address: the attacker's shellcode. [1]

The hacker Solar Designer took this a step further by introducing the ret2libc technique. ret2libc also leverages a stack buffer overflow to gain control of a function's saved return address. However, instead of returning into the stack, the process returns into libc's system() function. The attacker crafts the stack so that system() interprets a pointer to the string '/bin/sh' as its first argument, hijacking the process into calling system('/bin/sh'). [2]
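To make the stack layout described above concrete, here is a minimal sketch of how such an overwrite is usually arranged on 32-bit x86. The helper name, the padding length, and every address are made-up placeholders for illustration; none of them come from the original articles.

```python
import struct

def build_ret2libc_payload(pad_len, system_addr, exit_addr, binsh_addr):
    """Lay out a classic 32-bit x86 ret2libc overwrite (illustrative only)."""
    payload = b"A" * pad_len                   # filler up to the saved return address
    payload += struct.pack("<I", system_addr)  # replaces the saved return address
    payload += struct.pack("<I", exit_addr)    # where system() returns to afterwards
    payload += struct.pack("<I", binsh_addr)   # read by system() as its first argument
    return payload

# All values below are hypothetical placeholders, not real addresses.
payload = build_ret2libc_payload(
    pad_len=76,
    system_addr=0xB7E42DA0,
    exit_addr=0xB7E369D0,
    binsh_addr=0xB7F63A0B,
)
print(payload[76:].hex())
```

Note that every value after the padding is an absolute address. That is exactly the assumption that the mitigations discussed next set out to break.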

In order to thwart these stack buffer overflow exploitation techniques, the PaX Team released PaX, a patch for the Linux kernel that implements the W^X protection for memory pages and ASLR. [3] In a future section we'll cover the concept of W^X and the non-executable stack.

So what is ASLR?

"The goal of Address Space Layout Randomization is to introduce randomness into addresses used by a given task. This will make a class of exploit techniques fail with a quantifiable probability and also allow their detection since failed attempts will most likely crash the attacked task." - The PaX Team [4]

The exploit techniques of the past relied on the memory image of the target process being the same for every exploitation attempt, with only minor differences due to changes in the environment. Attackers could expect the libc system() symbol and a '/bin/sh' string to be at static locations in memory every time. ASLR randomizes the base addresses of the heap, shared objects, and the stack when these segments are mapped into process memory, attempting to introduce enough entropy to prevent an attacker from successfully brute forcing their locations. Without knowledge of the base address of libc within a target's memory, an attacker attempting ret2libc has to guess the addresses of both system() and the '/bin/sh' string.

ASLR in action

To see ASLR for yourself, let's first turn it off. This technique works on Ubuntu 18.04; if it doesn't work for you, look up how your operating system implements ASLR and how you can enable or disable it. To turn off ASLR globally for all processes, execute the following command:

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

Now, let's see how this affects the mappings of our processes. Run strace on /bin/ls a couple of times and look at the results of the mmap() calls made to set up the process's memory mappings. Each call to mmap() returns the same address on every run, right? Let's re-enable ASLR on your operating system:

echo 2 | sudo tee /proc/sys/kernel/randomize_va_space

Now, run strace on /bin/ls a couple of times again. You should see that the results of the mmap() calls are no longer the same each time /bin/ls is invoked. The memory map of our /bin/ls process is being randomized each time we execute it.
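If you would rather compare the mappings programmatically than read strace output, the following is a rough sketch that does the same comparison by reading /proc/self/maps from a few freshly spawned processes. It assumes a Linux system with /proc mounted and a cat binary on the PATH; the function names are our own and are not part of any tool mentioned above.

```python
import os
import subprocess

def first_mapping_base(maps_text, needle):
    """Return the lowest start address of a mapping whose path contains needle."""
    bases = []
    for line in maps_text.splitlines():
        fields = line.split()
        # A maps line looks like: start-end perms offset dev inode [path]
        if len(fields) >= 6 and needle in fields[5]:
            start = fields[0].split("-")[0]
            bases.append(int(start, 16))
    return min(bases) if bases else None

def sample_bases(needle="libc", runs=3):
    """Spawn `cat /proc/self/maps` several times and collect one base per run."""
    results = []
    for _ in range(runs):
        out = subprocess.run(
            ["cat", "/proc/self/maps"], capture_output=True, text=True, check=True
        ).stdout
        results.append(first_mapping_base(out, needle))
    return results

if __name__ == "__main__":
    if os.path.exists("/proc/self/maps"):
        for base in sample_bases():
            print(hex(base) if base is not None else "no matching mapping found")
    else:
        print("no /proc/self/maps here; this sketch is Linux-only")
```

With randomize_va_space set to 0 the printed bases should match across runs, and with it set to 2 they should differ. Passing "[stack]" or "[heap]" as the needle lets you check the other segments the same way.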

Breaking ASLR

Is it possible to brute force?

Well, it depends. In this paper, researchers demonstrated that, with enough tries against the same memory mapping, attackers could brute force the delta_mmap value described in [4]. Symbols in libc sit at fixed offsets from the library's base address, and we can find these offsets with objdump. If we know the usual base address of libc within process memory, we can compute the address of our target symbol and then account for the entropy introduced by ASLR. The condition that allowed the researchers to keep brute forcing a target until they determined its delta_mmap value was that the parent process called fork() each time a connection was made. When a process fork()s, the child process's memory mapping is identical to the parent's, so every child shares the same randomized layout. The researchers could crash the child process as many times as necessary to discover the delta_mmap value. Once they had it, they computed the base address of libc within the target's memory, which let them compute the location of any symbol in the target's libc and build the final payload.
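The attack described above can be simulated in a few lines. This is only a toy model under stated assumptions: the 16 bits of entropy, the "usual" libc base, the system() offset, and the probe() oracle standing in for "did the forked child survive?" are all invented for illustration and are not taken from the paper.

```python
import random

PAGE_SHIFT = 12                 # 4 KiB pages; the random delta is page-aligned
ENTROPY_BITS = 16               # assumed number of randomized bits
USUAL_LIBC_BASE = 0x40000000    # made-up unrandomized libc base
SYSTEM_OFFSET = 0x3ADA0         # made-up offset of system() inside libc

def make_forking_target(rng):
    """Simulate a server whose forked children all share one hidden delta."""
    hidden_delta = rng.randrange(1 << ENTROPY_BITS) << PAGE_SHIFT
    real_system = USUAL_LIBC_BASE + hidden_delta + SYSTEM_OFFSET

    def probe(guessed_system_addr):
        # True stands for "the child survived", False for "the child crashed".
        return guessed_system_addr == real_system

    return probe

def brute_force_delta(probe):
    """Try every candidate delta; return (delta, attempts) on the first hit."""
    for attempts, candidate in enumerate(range(1 << ENTROPY_BITS), start=1):
        delta = candidate << PAGE_SHIFT
        if probe(USUAL_LIBC_BASE + delta + SYSTEM_OFFSET):
            return delta, attempts
    return None, 1 << ENTROPY_BITS

rng = random.Random(1234)
probe = make_forking_target(rng)
delta, attempts = brute_force_delta(probe)
print(f"delta found: {delta:#x} after {attempts} attempts")
print(f"libc base:   {USUAL_LIBC_BASE + delta:#x}")
```

Because the hidden delta never changes between probes, each failed guess permanently rules out one candidate, so the search is guaranteed to finish within 2^16 attempts in this model.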

Other methods?

Another interesting thing to think about is the amount of entropy introduced by a system's implementation of ASLR. If the number of bits randomized within the virtual address space of a process is small enough, we could still feasibly brute force the location of libc symbols in memory with enough guesses, even if the process doesn't fork() as in the example above and the mapping is different on each invocation.
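A quick back-of-the-envelope calculation shows why the number of randomized bits matters. The sketch below compares the two situations discussed so far: a layout that is re-randomized on every attempt, and a layout that stays fixed across attempts (as with forked children). The bit counts in the loop are arbitrary examples, not measurements of any particular system.

```python
def p_success_rerandomized(bits, guesses):
    """Layout changes every attempt: each guess is an independent 1-in-2**bits shot."""
    return 1.0 - (1.0 - 2.0 ** -bits) ** guesses

def p_success_fixed(bits, guesses):
    """Layout stays put: every wrong guess eliminates one candidate for good."""
    return min(1.0, guesses / 2.0 ** bits)

for bits in (8, 16, 28):
    guesses = 2 ** bits
    print(
        f"{bits:2d} bits, {guesses} guesses: "
        f"re-randomized {p_success_rerandomized(bits, guesses):.3f}, "
        f"fixed {p_success_fixed(bits, guesses):.3f}"
    )
```

With n randomized bits, a fixed layout falls after at most 2^n guesses, while a re-randomized layout needs 2^n guesses on average. Either way, each additional bit doubles the attacker's work, so a handful of bits is brute forceable in practice while a few dozen is not.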

A common method of beating ASLR is to use an information leak, such as one obtained through a format string vulnerability. [6] If an attacker can leak the address of a libc symbol from the process's memory, they can subtract that symbol's known offset to calculate the base of libc within the process's memory, beating ASLR. [7]
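The arithmetic behind a leak-based bypass is short enough to sketch. Everything concrete here is hypothetical: the leaked address, the symbol offsets, and the helper name are invented for illustration, and real offsets depend on the exact libc build on the target.

```python
PAGE_MASK = 0xFFF  # libc is mapped on a page boundary (4 KiB pages assumed)

# Hypothetical offsets of symbols from the start of libc.
OFFSETS = {"puts": 0x80A30, "system": 0x4F4E0, "str_bin_sh": 0x1B40FA}

def libc_base_from_leak(leaked_addr, symbol_offset):
    """Recover the libc base from one leaked symbol address."""
    base = leaked_addr - symbol_offset
    if base & PAGE_MASK:
        # A misaligned base usually means the wrong offset or the wrong libc.
        raise ValueError(f"computed base {base:#x} is not page-aligned")
    return base

leaked_puts = 0x7F3C5A2B1A30  # pretend this value came out of the leak
base = libc_base_from_leak(leaked_puts, OFFSETS["puts"])
system_addr = base + OFFSETS["system"]
binsh_addr = base + OFFSETS["str_bin_sh"]

print(f"libc base: {base:#x}")
print(f"system():  {system_addr:#x}")
print(f"/bin/sh:   {binsh_addr:#x}")
```

The page-alignment check is a cheap sanity test: since the library's base is always page-aligned, a result with non-zero low bits is a strong hint that the offsets do not match the target's libc.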

Introduction to the Dark Arts