Tags: cache serialization

**Description**

> Ray said that the challenge "Leaf-Similar Trees" from last LeetCode Weekly was really same-fringe problem and wrote it in the form of coroutine which he learned from a Stanford friend. Can you decrypt the cache file dumped from a language server without reading the source code? The flag is not in the form of rwctf{} because special characters cannot be used.

**Files provided**

- [ccls-fringe.tar.xz](https://s3-us-west-1.amazonaws.com/realworldctf/ccls-fringe.tar.xz)

**Solution**

First let's examine the contents of the archive:

```
$ tar xzfv ccls-fringe.tar.xz
x .ccls-cache/
x .ccls-cache/@home@flag@/
x .ccls-cache/@home@flag@/fringe.cc.blob
$ xxd .ccls-cache/\@home\@flag\@/fringe.cc.blob | head
0000000: 2202 ff00 5832 c065 8487 2a04 002f 686f  "...X2.e..*../ho
0000010: 6d65 2f66 6c61 672f 6672 696e 6765 2e63  me/flag/fringe.c
0000020: 6300 0225 636c 616e 6700 2f68 6f6d 652f  c..%clang./home/
0000030: 666c 6167 2f66 7269 6e67 652e 6363 00a9  flag/fringe.cc..
0000040: 2f75 7372 2f69 6e63 6c75 6465 2f63 2b2b  /usr/include/c++
0000050: 2f38 2e31 2e31 2f65 7874 2f61 746f 6d69  /8.1.1/ext/atomi
0000060: 6369 7479 2e68 00ff 007c 5bae 8205 682a  city.h...|[...h*
0000070: 2f75 7372 2f69 6e63 6c75 6465 2f61 736d  /usr/include/asm
0000080: 2d67 656e 6572 6963 2f65 7272 6e6f 2e68  -generic/errno.h
0000090: 00ff 0008 b092 b81c 472a 2f75 7372 2f69  ........G*/usr/i
```

([full fringe.cc.blob file](https://github.com/Aurel300/empirectf/blob/master/writeups/2018-07-28-Real-World-CTF-Quals/scripts/fringe.cc.blob))

The .blob file contains a lot of C++ library names, and even some fragments of code, but it is clearly neither a source code file nor an executable. With some googling ([ccls-cache format](https://github.com/MaskRay/ccls/wiki/Initialization-options)) we can easily identify the cache system: the file was created by [ccls](https://github.com/MaskRay/ccls/). The documentation mentions two serialisation formats, JSON and binary, but it doesn't go into the specifics of the binary format. However, after some skimming through the repository, we can find the [key files](https://github.com/MaskRay/ccls/tree/master/src/serializers) for the actual serialisation formats.

These essentially only specify how to encode various primitives and standard library types; the meat of the process, i.e. destructuring the classes used internally by ccls, is done [here](https://github.com/MaskRay/ccls/blob/master/src/serializer.cc). In particular:

```cpp
std::string Serialize(SerializeFormat format, IndexFile& file) {
  switch (format) {
    case SerializeFormat::Binary: {
      BinaryWriter writer;
      int major = IndexFile::kMajorVersion;
      int minor = IndexFile::kMinorVersion;
      Reflect(writer, major);
      Reflect(writer, minor);
      Reflect(writer, file);
      return writer.Take();
    }
    // ...
  }
}
```

And the `Reflect` overload for `IndexFile` itself:

```cpp
// IndexFile
bool ReflectMemberStart(Writer& visitor, IndexFile& value) {
  visitor.StartObject();
  return true;
}
template <typename TVisitor>
void Reflect(TVisitor& visitor, IndexFile& value) {
  REFLECT_MEMBER_START();
  if (!gTestOutputMode) {
    REFLECT_MEMBER(last_write_time);
    REFLECT_MEMBER(language);
    REFLECT_MEMBER(lid2path);
    REFLECT_MEMBER(import_file);
    REFLECT_MEMBER(args);
    REFLECT_MEMBER(dependencies);
  }
  REFLECT_MEMBER(includes);
  REFLECT_MEMBER(skipped_ranges);
  REFLECT_MEMBER(usr2func);
  REFLECT_MEMBER(usr2type);
  REFLECT_MEMBER(usr2var);
  REFLECT_MEMBER_END();
}
```

With this, we can start writing a deserialiser. It might have been faster to just clone the repo and see if it could be used to convert from the binary format into the JSON format, but I was worried the build would be problematic, since ccls depends on LLVM.
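A minimal C++ sketch of the first steps of such a deserialiser follows; it is not the author's tool (which was written in Haxe). The encoding rules in it are inferred from the hex dump above rather than taken from the ccls sources, so treat them as assumptions: bytes below `0xfd` appear to be values on their own, `0xff` appears to introduce an 8-byte little-endian value (see the timestamp after each dependency path), `0xfd`/`0xfe` are assumed to do the same for 2- and 4-byte values, signed integers look zigzag-encoded (the leading `22 02` decodes to versions 17 and 1), and strings such as `/home/flag/fringe.cc` end in a `00` byte. `BlobReader`, `IndexFilePrefix` and `ReadPrefix` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Minimal reader for the ccls binary cache. Encoding rules are ASSUMED from
// the hex dump, not taken from ccls:
//   * unsigned ints: one byte if < 0xfd, else 0xfd/0xfe/0xff followed by a
//     2/4/8-byte little-endian value
//   * signed ints: zigzag-encoded on top of the unsigned form
//   * strings: raw bytes terminated by a NUL byte
class BlobReader {
 public:
  explicit BlobReader(std::vector<uint8_t> data) : data_(std::move(data)) {}

  uint64_t VarUInt() {
    uint8_t tag = Byte();
    if (tag < 0xfd) return tag;
    if (tag == 0xfd) return FixedLE(2);
    if (tag == 0xfe) return FixedLE(4);
    return FixedLE(8);
  }

  // Zigzag decoding: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
  int64_t VarInt() {
    uint64_t x = VarUInt();
    return static_cast<int64_t>(x >> 1) ^ -static_cast<int64_t>(x & 1);
  }

  std::string String() {
    std::string s;
    for (uint8_t c = Byte(); c != 0; c = Byte()) s.push_back(static_cast<char>(c));
    return s;
  }

  std::size_t Pos() const { return pos_; }

 private:
  uint8_t Byte() {
    if (pos_ >= data_.size()) throw std::out_of_range("unexpected end of blob");
    return data_[pos_++];
  }

  // Reads an n-byte little-endian unsigned integer.
  uint64_t FixedLE(int n) {
    uint64_t v = 0;
    for (int i = 0; i < n; ++i) v |= static_cast<uint64_t>(Byte()) << (8 * i);
    return v;
  }

  std::vector<uint8_t> data_;
  std::size_t pos_ = 0;
};

// The leading members of IndexFile, in the REFLECT_MEMBER order shown above.
struct IndexFilePrefix {
  int64_t major = 0, minor = 0;
  int64_t last_write_time = 0;
  int64_t language = 0;
  std::string import_file;
  std::vector<std::string> args;
  uint64_t dependency_count = 0;  // entries appear to follow as (path, timestamp)
};

IndexFilePrefix ReadPrefix(BlobReader& r) {
  IndexFilePrefix p;
  p.major = r.VarInt();
  p.minor = r.VarInt();
  p.last_write_time = r.VarInt();
  p.language = r.VarInt();
  // lid2path is assumed to be a count-prefixed container; this sketch only
  // handles the empty case seen in this blob.
  if (r.VarUInt() != 0) throw std::runtime_error("non-empty lid2path not handled");
  p.import_file = r.String();
  uint64_t nargs = r.VarUInt();
  for (uint64_t i = 0; i < nargs; ++i) p.args.push_back(r.String());
  p.dependency_count = r.VarUInt();
  return p;
}
```

Fed the first bytes of the dump, `ReadPrefix` yields the import file `/home/flag/fringe.cc`, the arguments `%clang` and `/home/flag/fringe.cc`, and a dependency count of `0xa9` (169); each dependency can then be read as a `String()` followed by a `VarInt()` timestamp. The remaining members (`includes`, `skipped_ranges`, `usr2func`, `usr2type`, `usr2var`) would presumably be read the same way, following the struct definitions in the ccls and cquery sources.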

Some more relevant source code files, from cquery, the project ccls was forked from:

- https://github.com/cquery-project/cquery/blob/master/src/lsp.h
- https://github.com/cquery-project/cquery/blob/master/src/symbol.h

So with the data deserialised, I had all the information known to the caching system, except the original source code, of course. The data includes the C++ includes, classes, functions, and variables defined in the file. One thing I noticed while writing the deserialiser is that there is a "comments" field in all defined members (classes, functions, variables).

One of these comment fields says "flag is here" (though this can clearly be seen in the file with a hex editor as well). With the deserialised data, we can tell which member this comment is attached to. Interestingly, it was a field declared as `int b`; its 32-bit value clearly cannot contain the actual flag, so what could this mean?

Another useful piece of information in the data is `spell`, presumably the place where the name of each member is first given (i.e. its declaration). `spell` includes a range, i.e. the line-column positions delimiting the beginning and end of the member name.

At this point I was thinking my best bet would be to reconstruct as much of the original source code as possible from the positional data, then deduce control structures from the article mentioned in the challenge description and hope that the code somehow produces the flag.
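The layout step itself is simple once every named entity has a position. A sketch, assuming each decoded `spell` range has been reduced to a hypothetical `Spelled` record holding the name plus the 0-based line and column where it starts: every name is written into a grid of spaces at its position, which gives a skeleton of the source with everything except the declared names blanked out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record: a declared name and the start of its spell range.
// Line and column are ASSUMED to be 0-based; shift them if the decoded
// positions turn out to be 1-based.
struct Spelled {
  std::string name;
  int line;
  int column;
};

// Writes each name at its recorded position in a grid of spaces, producing a
// skeleton of the original file that contains only the declared names.
std::vector<std::string> Skeleton(const std::vector<Spelled>& symbols) {
  std::vector<std::string> lines;
  for (const Spelled& s : symbols) {
    if (s.line < 0 || s.column < 0) continue;  // skip entries without a position
    if (static_cast<int>(lines.size()) <= s.line) lines.resize(s.line + 1);
    std::string& row = lines[s.line];
    std::size_t end = static_cast<std::size_t>(s.column) + s.name.size();
    if (row.size() < end) row.resize(end, ' ');
    row.replace(static_cast<std::size_t>(s.column), s.name.size(), s.name);
  }
  return lines;
}
```

Lines with no named entity stay empty, which is why such a skeleton is mostly whitespace with a few identifiers scattered across it.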

Well, in the process of doing this, I got a file that looked like this:

```


TreeNode
val
left
right

Co
c
stack
ret
l
e
s
s

yield x w
o
d

dfs c x w
h
o
i
s

Solution

leafSimilar root1 root2 i
c n
c2 c1 h
k

insert x y

main
xs
ys
zs
tx ty tz
x
y
z
s
```


([full deserialiser script](https://github.com/Aurel300/empirectf/blob/master/writeups/2018-07-28-Real-World-CTF-Quals/scripts/Fringe.hx))

Most of it seems normal enough, but some variables in the rightmost columns spell out `bless wod whois inhk`. This clearly wasn't a coincidence, so I tried it as the flag ... and sure enough, it was!

`blesswodwhoisinhk`
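Reading the hidden message amounts to collecting, top to bottom, the identifiers that start at or beyond some cutoff column. The sketch below assumes the skeleton is available as a vector of lines and that the cutoff is picked by eye, as was done here; `ReadRightColumn` is a hypothetical helper, not part of the author's script.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// True if c is whitespace (safe for any char value).
bool IsSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

// Hypothetical helper: walks the skeleton top to bottom and concatenates every
// token that starts at or after `min_column`. Tokens that begin left of the
// cutoff and merely extend past it are ignored.
std::string ReadRightColumn(const std::vector<std::string>& lines,
                            std::size_t min_column) {
  std::string out;
  for (const std::string& row : lines) {
    std::size_t i = min_column;
    while (i < row.size()) {
      bool starts_token = !IsSpace(row[i]) && (i == 0 || IsSpace(row[i - 1]));
      if (!starts_token) {
        ++i;
        continue;
      }
      std::size_t j = i;
      while (j < row.size() && !IsSpace(row[j])) ++j;
      out += row.substr(i, j - i);
      i = j;
    }
  }
  return out;
}
```

On a toy skeleton where the single letters `b`, `l`, `e`, `s`, `s` sit in column 8 next to longer names such as `stack` and `ret`, a cutoff of 8 returns `"bless"`.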