Tags: boost bruteforce rev pe c++ 

Rating:

```
$ file bass_boosted.exe
bass_boosted.exe: PE32 executable (console) Intel 80386, for MS Windows
```

The binary is not obfuscated in any way, but does contain a few anti-debug functions. We can just nop them out.

While the whole activity is happening in the `main` function, it is not clear what is going on because of heavy use of C++ templates — we get a ton of functions nested deeply. We can find a string that mentions Boost library, which explains a bit why the code looks like a garbage.

It is pretty hard to explain the exact process of reverse-engineering here: we just click our way through the endless stream of functions trying to find something that will catch out sight. In this case, such helpful things were the constants: for example, in one case we see a number 16807, and googling "boost 16807" reveals `typedef random::linear_congruential<int32_t, 16807, 0, 2147483647, 1043618065> minstd_rand0;`. That means, we found a linear congruental generator. This generator was used deeply in functions that also mentioned number 607 — again, searching for "boost minstd_rand0 607" gets us `typedef lagged_fibonacci_01_engine< double, 48, 607, 273 > lagged_fibonacci607;`. The other algoritms found were mt19937 and mt11213b — variants of Mersenne twister RNG.

Going back to `main`, what happens there? The binary asks for a key, splits it to characters, filters them (only characters `Aro{}0123456789abcdef` are permitted), reverses the array and uses some mapping to change those values. After that, the mentioned `lagged_fibonacci607` is used in following way: it is initialized with some seed computed from our input, and each value in array is XORed with `int(lagged_fibonacci607::next() * 100) % 100`. We know the encrypted flag, it is stored in the binary, so we need to reverse this process somehow. Everything before `lagged_fibonacci607` part is easily reversible, and there are two ways to deal with seed computing: either understand how it works and feed some constraints to z3, or just brute-force it, since the seed is only 32 bits long (more than that, `lagged_fibonacci607` actually uses `seed & 0x7FFFFFFF`, so only 31 bits are used). I decided to go with the second approach, while the bruteforcer is not really fast, we can easily parallelize it. The answer can be computed in several hours using one core (the exact seed is 0x132744ab). Here is my code (note that you will need to reverse the string afterwards):
``` cpp
#include <iostream>
#include <thread>
#include <mutex>
#include <boost/random/lagged_fibonacci.hpp>

std::mutex io_mutex;

void worker(size_t n) {
uint8_t valid[] = {
0x11, 0xc9, 0xe9, 0x0e, 0xd3, 0x43, 0xb8, 0xd9,
0x6d, 0x33, 0xdd, 0xa3, 0xeb, 0x59, 0xbd, 0xb2,
0x00, 0xc2, 0x29, 0x18, 0x50, 0x00, 0xfe, 0x1b,
0x60, 0x4d, 0x0f, 0x76, 0x15, 0x0c, 0xe0, 0xd2,
0xe0, 0x9b, 0x1f, 0xae, 0x3b, 0xf4, 0xa0, 0xd6,
0xea, 0x71, 0xc2, 0xae, 0x20, 0xe0, 0xcd, 0x31,
0xf2, 0x23, 0x33, 0x77, 0xa5, 0xcb, 0x2f, 0xc3,
0x2d, 0x76, 0xd8, 0xb6, 0x56, 0x8e, 0x32, 0xb2,
0x2c, 0x6a, 0x76, 0x24, 0xa5, 0x25 };

constexpr auto sz = sizeof(valid);
uint8_t res[sz + 1] = { 0 };

uint8_t dict[256] = { 0 };
dict[0x7A] = 'A';
dict[0x6B] = 'r';
dict[0x73] = 'o';
dict[0x58] = '{';
dict[0x4C] = '}';
dict[0x63] = '0';
dict[0x49] = '1';
dict[0x1E] = '2';
dict[0xE7] = '3';
dict[0xBA] = '4';
dict[0x97] = '5';
dict[0x15] = '6';
dict[0x9D] = '7';
dict[0xE8] = '8';
dict[0xE2] = '9';
dict[0x33] = 'a';
dict[0xD2] = 'b';
dict[0x29] = 'c';
dict[0x74] = 'd';
dict[0x83] = 'e';
dict[0x04] = 'f';

for (size_t i = n << 27; i < (n + 1) << 27; ++i) {
if ((i & 0xFFFFF) == 0) {
std::lock_guard<std::mutex> lock(io_mutex);
std::cout << '\r' << std::hex << i;
std::cout.flush();
}
boost::random::lagged_fibonacci607 kek(i);
bool ok = true;
for (int j = 0; j < sz && ok; ++j) {
res[j] = dict[valid[j] ^ (int(kek() * 100) % 100)];
if (res[j] == 0) {
ok = false;
}
}
if (ok) {
std::lock_guard<std::mutex> lock(io_mutex);
std::cout << std::endl << std::hex << i << ' ' << (char*)res << std::endl;
}
}
}

int main() {
std::thread threads[16];
for (size_t i = 0; i < 16; ++i) {
threads[i] = std::thread(worker, i);
}
for (size_t i = 0; i < 16; ++i) {
threads[i].join();
}
return 0;
}
```