Tags: android reversing binary 

Rating:

# Tools
- [Ghidra](https://ghidra-sre.org) for reverse engineering
- [Ghidra bridge](https://github.com/justfoxing/ghidra_bridge) to run our script from outside ghidra
- `gdb-multiarch`, `gcc-aarch64-linux-gnu`, `qemu-user-aarch64` Debian packages to use and debug the native library

# Getting the code

An apk file is actually just a zip file. To get the contents just unzip it. The code is in a file called `classes.dex`, which is the binary format used for the android jvm runtime.

So let's load this file into ghidra and get started.

# Search for the flag

We know the format of the flag. Let's see if we can find that string in memory.

Indeed, we've got a match! We jumped to the memory location and searched for references to this string in the code.

It's used in the `onClick` Handler of the Deepest class. We can see that the flag is assembled in here.

```java
void onClick(Deepest this,View p1)

{
boolean bVar1;
View ref;
CharSequence ref_00;
String input;
byte[] pbVar2;
Context pCVar3;
Toast ref_01;
StringBuilder stringbuilder;

ref = this.findViewById(0x7f0a0164);
checkCast(ref,TextView);
ref_00 = ref.getText();
input = ref_00.toString();
bVar1 = this.isPassword(input);
if (bVar1 != false) {
pbVar2 = Base64.decode(input,0);
stringbuilder = new StringBuilder();
stringbuilder.append("CSR{");
input = new String(pbVar2,"UTF-8");
stringbuilder.append(input);
stringbuilder.append("}");
input = stringbuilder.toString();
pCVar3 = this.getApplicationContext();
ref_01 = Toast.makeText(pCVar3,input,0);
ref_01.show();
return;
}
pCVar3 = this.getApplicationContext();
ref_01 = Toast.makeText(pCVar3,"Nope",0);
ref_01.show();
return;
}
```

So the Flag is `CSR{pwd}`. Where pwd is accepted by `isPassword`. Unfortunately, isPassword is an external function.

So we should look for a native library. There's `assets/deeper/encrypted.dex` and `assets/deepest/encrypted`, we'll have to decrypt them somehow.

Where do we get the password from? When we look at the `MainActivity` class, we'll see that the app starts with a WebView displaying `assets/andsoitbegins.html`.

```java
void onCreate(MainActivity this,Bundle p1)

{
View ref;
WebSettings ref_00;
WebChromeClient ref_01;
e ref_02;

super.onCreate(p1);
this.setContentView(0x7f0d001e);
ref = this.findViewById(0x7f0a0193);
if (ref != null) {
checkCast(ref,WebView);
WebView.setWebContentsDebuggingEnabled(true);
ref_01 = new WebChromeClient();
ref.setWebChromeClient(ref_01);
ref_00 = ref.getSettings();
ref_00.setJavaScriptEnabled(true);
ref.loadUrl("file:///android_asset/andsoitbegins.html");
ref.addJavascriptInterface(this,"x500");
return;
}
ref_02 = new e("null cannot be cast to non-null typeandroid.webkit.WebView");
throwException(ref_02);
return;
}
```

We also see the `weMustGoDeeper` Method, which seems to be the entry point for retrieving the flag, as it sets the `pwd` parameter to its argument, but it opens the `Deeper` class instead of `Deepest`, so there's another layer in between. We'll get to that layer later.

A naive `grep -R assets -e weMustGoDeeper` didn't work to find the javascript source, but a `grep -R assets -e x500` did (That's the Javascript object that is exposed by the MainActivity). We find this in `assets/vendor/bootstrap/bootstrap.js`:

```javascript
$("#theform").on("submit",function(){var t=$("#txtUser"),e=$("#txtPass"),n=atob("d2VNdXN0R29EZWVwZXI=");return t.val()==atob("TGVvbmFyZG8=")&&e.val()==atob("VGhlQWN0b3JOb3RUaGVOaW5qYVR1cnRsZQ==")?window.x500[n](!0,t.val()+e.val()):window.x500[n](!1,""),!1});
```

It looks like there are some things base64 encoded:

`d2VNdXN0R29EZWVwZXI=` = weMustGoDeeper
`TGVvbmFyZG8=` = Leonard
`VGhlQWN0b3JOb3RUaGVOaW5qYVR1cnRsZQ==` = TheActorNotTheNinjaTurtle

This is the prettified function:

```javascript
$("#theform").on("submit", function() {
var t = $("#txtUser"),
e = $("#txtPass"),
n = "weMustGoDeeper";
return t.val() == "Leonardo" && e.val() == "TheActorNotTheNinjaTurtle" ? window.x500[n](!0, t.val() + e.val()) : window.x500[n](!1, ""), !1
```

It's calling `weMustGoDeeper` with `pwd` as `LeonardoTheActorNotTheNinjaTurtle`. Let's see where this password takes us.

# Decrypting the files

Let's resume where we left in our code. We've been seeing that `pwd` is being passed to the `Deeper` class. If we look at the onCreate method, we'll see that it calls `d.n` with an InputStream reading from `assets/deeper/encrypted.dex` and an OutputStream to a temporary location.

We see a lot of crypto code in here:

```java
int n(char[] data,InputStream in,OutputStream out)

{
boolean bVar1;
int iVar2;
SecretKeyFactory ref;
SecretKey ref_00;
byte[] pbVar3;
Cipher cipher;
int iVar4;
byte[] pbVar5;
AlgorithmParameterSpec ref_01;
SecretKey ref_02;
KeySpec key;
String pSVar6;
a ref_03;
c ref_04;
Key pKVar7;
b ref_05;

iVar2 = in.read();
iVar2 = iVar2 * 8;
if (((iVar2 != 0x80) && (iVar2 != 0xc0)) && (iVar2 != 0x100)) {
ref_03 = new(a);
iVar2 = ref_03.a();
throwException(ref_03);
return iVar2;
}
pbVar5 = new byte[0x10];
in.read(pbVar5);
ref = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
key = new KeySpec(data,pbVar5,0x8000,iVar2 + 0x40);
ref_00 = ref.generateSecret(key);
pbVar5 = ref_00.getEncoded();
pbVar3 = Arrays.copyOfRange(pbVar5,0,8);
pSVar6 = "AES";
ref_00 = new SecretKey(pbVar3,pSVar6);
pbVar5 = Arrays.copyOfRange(pbVar5,8,pbVar5.length);
ref_02 = new SecretKey(pbVar5,pSVar6);
ref_04 = new c(ref_02,ref_00);
pbVar3 = new byte[8];
in.read(pbVar3);
ref_00 = ref_04.b;
pbVar5 = ref_00.getEncoded();
bVar1 = Arrays.equals(pbVar5,pbVar3);
if (bVar1 == false) {
ref_05 = new(b);
iVar2 = ref_05.b();
throwException(ref_05);
return iVar2;
}
pbVar5 = new byte[0x10];
in.read(pbVar5);
cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
pKVar7 = ref_04.a;
ref_01 = new AlgorithmParameterSpec(pbVar5);
cipher.init(2,pKVar7,ref_01);
pbVar5 = new byte[0x400];
while (iVar4 = in.read(pbVar5), 0 < iVar4) {
pbVar3 = cipher.update(pbVar5,0,iVar4);
if (pbVar3 != null) {
out.write(pbVar3);
}
}
pbVar5 = cipher.doFinal();
if (pbVar5 != null) {
out.write(pbVar5);
}
return iVar2;
}
```

We've got to reimplement this in python:

```python
from hashlib import pbkdf2_hmac
from Crypto.Cipher import AES
import binascii

f = open('assets/deeper/encrypted.dex', 'rb').read()

l = f[0] * 8 + 0x40

pwd = b"LeonardoTheActorNotTheNinjaTurtle"
salt = f[1:0x11]

key = pbkdf2_hmac(
hash_name = 'sha1',
password = pwd,
salt = salt,
iterations = 0x8000,
dklen = l // 8
)

iv = f[0x19:0x29]

print(len(key))
print("KEY: ", binascii.hexlify(key))
print("IV: ", binascii.hexlify(iv))

assert (key[:8] == f[0x11:0x19])

cipher = AES.new(key[8:], AES.MODE_CBC, IV=iv)

out = open('assets/deeper/decrypted.dex', 'wb')

f = f[0x29:]

out.write(cipher.decrypt(f))

out.close()
```

# Reversing decrypted.dex
Load that file into Ghidra and you'll find a `isPassword` Method.

```java
boolean isPassword(String str)

{
boolean bVar1;
int len;
int last_value;
String slice;
Iterator ref;
Integer pIVar2;
Object ref_00;
BigInteger i;
ArrayList factors;
int iteration;
BigInteger target;
Vault vault;
boolean ret;

ret = false;
vault = new Vault();
factors = new ArrayList();
i = BigInteger.ONE;
iteration = 0;
last_value = 0;
while( true ) {
len = str.length();
if (len + -2 <= iteration) {
target = new BigInteger("591320437755618967257798351702211492893040132184058172662568576456153139621");
bVar1 = target.equals(i);
if ((bVar1 != false) && (last_value = str.length(), last_value== 0x17)) {
ret = true;
}
return ret;
}
slice = str.substring(iteration,iteration + 3);
len = vault.getValue(slice);
if (len <= last_value) break;
ref = factors.iterator();
while (bVar1 = ref.hasNext(), bVar1 != false) {
ref_00 = ref.next();
checkCast(ref_00,Integer);
last_value = ref_00.intValue();
bVar1 = TheDream.FUN__0050033cd0(len,last_value);
if (bVar1 == false) {
return false;
}
}
pIVar2 = Integer.valueOf(len);
factors.add(pIVar2);
target = BigInteger.valueOf((long)len & -0x100000000 |ZEXT48(len));
i = i.multiply(target);
iteration = iteration + 1;
last_value = len;
}
return false;
}
```

This one's a bit difficult to grasp at first, but what it does is get all substrings of length 3 and multiplies the result of `vault.getValue(substring)` together and this should result in `591320437755618967257798351702211492893040132184058172662568576456153139621`.

So what does `vault.getValue` do?

```java

int getValue(Vault this,String s)

{
boolean bVar1;
Class ref;
Method ref_00;
Object ref_01;
int iVar2;
Object[] ppOVar3;
Class[] ppCVar4;

bVar1 = s.isEmpty();
if (bVar1 == false) {
ref = this.getClass();
ppCVar4 = new Class[0];
ref_00 = ref.getDeclaredMethod(s,ppCVar4);
ppOVar3 = new Object[0];
ref_01 = ref_00.invoke(this,ppOVar3);
checkCast(ref_01,Integer);
iVar2 = ref_01.intValue();
}
else {
iVar2 = 1;
}
return iVar2;
}
```

It calls the method with the name of the substring and returns that result.

There are a lot of 3 character functions in the binary. Extracting this information by hand would take hours. We'll have to write a script that does so. Using GhidraBridge to connect to our Ghidra instance makes this a pleasant experience.

We can factorize the number and see if we can create the functions return value using the primes.

There's a command line utility that comes in handy to determine the primes `factor`, which is part of coreutils.

```python
from ghidra_bridge import *

b = ghidra_bridge.GhidraBridge(namespace=globals())

primes = [11, 739, 857, 1013, 1879, 1907, 2017, 2503, 3793, 5869, 6043, 6619, 8101, 9857, 11593, 11933, 12241, 12781, 13963, 14033, 15361]

target = 591320437755618967257798351702211492893040132184058172662568576456153139621

addr = []

b.remote_exec("import string,itertools")

for val in b.remote_eval("[ [i.getName(), int(str(getInstructionAt(i.entryPoint).getOpObjects(1)[0]), 16)] for i in currentProgram.getFunctionManager().getFunctions(True) if len(i.getName()) == 3]"):
for x in primes:
if val[1] % x == 0:
val[1] /= x

if val[1] == 1:
addr.append(val[0])

s = ""

x = [""] * len(addr)

print(addr)
```

We're using remote_eval here, because the bridge is quite slow and looping over many functions is just so slow.

The nice thing is we can just decode the first instruction as this loads the return value into a register and they all look the same.

If we run it, we get these functions names:

```python
['did', 'dth', 'ebj', 'ems', 'eto', 'het', 'idt', 'ing', 'inn', 'mhf', 'mst', 'nax', 'nin', 'nni', 'ops', 'ote', 'pin', 'psp', 'spi', 'sto', 'tem', 'the', 'top', 'tot', 'ujo']
```

We'll have to reassemble them to get our password. We know it's `0x17` in length and they are overlapping by two characters. So we need only 21 of these functions and 4 are not needed.

If we try to combine them we'll find the password is `didthetotemstopspinning`. In general, the combinations can be found as Euler paths in a graph containing the 2 letter overlaps as nodes and the 3 letter names as directed edges. But here guessing that `'the'` is part of an english sentence and knowing the movie was quicker. We can use this to decrypt our last file using the same algorithm we used for encrypted.dex.

# Reversing decrypted

This time it's a native library. One thing to note is that the Java native interface prefixes functions with "Java_" and the package. So instead of looking for a `isPassword` we need to look for `Java_be_dauntless_inception_Deepest_isPassword`.

In this case it's a thin wrapper around the `ip` function.

```c
ulonglong ip(char *param_1)

{
longlong lVar1;
size_t len;
int extracted;
longlong lVar2;
int i;
int local_318;
int index_bits;
byte local_301;
int bits [160];
uint32_t sizes [12];

sizes._40_8_ = 0x60000000d;
sizes._32_8_ = 0x1b0000000c;
sizes._24_8_ = 0x70000000f;
sizes._16_8_ = 0xa00000004;
sizes._8_8_ = 0x1f00000018;
sizes._0_8_ = 0x700000004;
index_bits = 0;
local_318 = 0;
len = strlen(param_1);
if (len == 0x14) {
s2ba(param_1,bits);
i = 0;
while (i < 0xc) {
extracted = b2i(bits,index_bits,sizes[i]);
extracted = rm(i + 1,extracted);
if (extracted == 0) {
local_318 = local_318 + 1;
}
index_bits = index_bits + sizes[i];
i = i + 1;
}
if (local_318 +
((local_318 / 0x11 + (local_318 >> 0x1f)) -
(int)((longlong)local_318 * 0x78787879 >> 0x3f)) * -0x11 == 0)
{
local_301 = 1;
}
else {
local_301 = 0;
}
}
else {
local_301 = 0;
}
return (ulonglong)local_301;
}
```

The `s2ba` (string to bit array maybe?) extracts each bit as either a `1` or `0` int into the given array on the stack. The `b2i` rebuilds an int from a specific amount of bits. The size array holds the number of bits to extract each time.

Therefore the input is split into: [4 bits, 7 bits, 24 bits, 31 bits, 4 bits, 10 bits, 15 bits, 7 bits, 12 bits, 27 bits, 13 bits, 6 bits]

The extracted bits are then validated in the `rm` function. This function was basically a giant switch-case over to call the right validation function for the given index.

The validation functions were quite complicated, so we wanted to brute force them. ARM code can be run on other platforms with `qemu-user`. Problem: We haven't got an `aarch64` android setup and we needed to improvise. The Debian repository has an arm cross compilation toolchain in the package `gcc-aarch64-linux-gnu`, which is much easier to install than the android-ndk.

We tried to link the native library into our brute force code, but faced a lot of troubles, because we haven't got the android libraries it needed.

So we just built our own "dynamic" loader for `libdeepest.so`...

# Implementing dynamic linkers, the definitive guide

The core of loading a library is loading the sections (e.g. code, data) from the ELF-file, mapping them into memory with the correct access mode, and adding base addresses to the GOT (global offset table). In our case, `objdump -x libdeepest.so` told us
```
[...]
Sections:
Idx Name Size VMA LMA File off Algn
[...]
9 .text 00000e0c 0000000000000d90 0000000000000d90 00000d90 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
[...]
10 .rodata 00000080 0000000000001b9c 0000000000001b9c 00001b9c 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
[...]
17 .got 000000f8 0000000000002f08 0000000000002f08 00002f08 2**3
CONTENTS, ALLOC, LOAD, DATA
```

The `VMA` column contains addresses where the sections are expected at runtime or, for position independent code, an offset from an base address. Here it contains the same values as the `File off`-sets, so we can load the library in one piece. We want to run code from it, and we want to alter the GOT, so it is mapped readable, writable, and executable.

As we already had reversed the first validation function, `z`, which checks if it's argument is 8, we have a first quick test:

```c
#include <stdarg.h>
#include <sys/mman.h>
#include <sys/fcntl.h>
#include <sys/stat.h>
#include <string.h>
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <err.h>

int main(){
int fd = open("libdeepest.so", O_RDONLY);
if (fd == -1) err(EXIT_FAILURE, "open");

struct stat sb;
if (fstat(fd, &sb) == -1) err(EXIT_FAILURE, "stat");

char *lib = (char*) mmap(NULL, sb.st_size, PROT_EXEC | PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
if(lib == NULL) err(EXIT_FAILURE, "mmap");
printf("base address: %p\n", lib);

uint64_t (*z)(unsigned) = (void*)(lib + 0x1388);
printf("z(2) = %lx, z(8) = %lx\n", z(2), z(8));
}
```

We compiled with `aarch64-linux-gnu-gcc -Wall -o test test.c`.
Now `qemu-aarch64 ./test` outputs `z(2) = 0, z(8) = 1` - yay:)

We don't want to call all validation functions by hand, but use `rm(index, secret)`. So we have to set up the GOT:

```c
const uint64_t got_offset = 0x2f08;
const size_t got_size = 0xf8;
uint64_t got[] = { // values copied from ghidra
0x0,0x0,0x0,0x001054,0x0016cc,0x001388,0x003000,0x001844,0x003008,0x003010,
0x0013e4,0x001948,0x0016f0,0x001b14,0x0017c8,0x003018,0x001250,0x003020,
0x000e3c,0x001484,0x00177c,0x001120,0x00196c,0x003028,0x0017ec,0x003030,
0x0015e0,0x0011ac,0x000e7c,0x003038,0x2d28};

assert(sizeof(got) == got_size);

for(int i=0; i<got_size/8; i++){
got[i] += (uint64_t)lib;
}
memcpy(lib + got_offset, got, sizeof(got));
```

and now we can call `rm` in a loop:

```c
uint64_t (*rm)(int, unsigned) = (void*)(lib + 0x1250);

const int bitlengths[] = {4, 7, 24, 31, 4, 10, 15, 7, 12, 27, 13, 6}; // see above.
for(int index = 0; index < sizeof(bitlengths)/sizeof(*bitlengths); index++){
for(unsigned number=0; number < (1u<<bitlengths[index]); number++){
uint64_t res = rm(index+1, number);
if(res) printf("pass[%d] = %u\n", index+1, number);
}
}
```

This gives us `pass[1] = 8`, `pass[2] = 22`, and a segfault. To debug this, we use `gdb-multiarch` and attach to qemus gdb-server, which can started by passing `-singlestep -g 1234` to the `qemu-aarch64` command. We find a call to `__vsnprintf_chk` via the GOT, which was not resolved by us. A google search for `android libc __vsnprintf_chk` leads to it's signature and source, so we can supply it to the library:

```c
int my_vsnprintf_chk(char* dest, int flags,
size_t supplied_size, const char* format, va_list va) {
(void) flags;
return vsnprintf(dest, supplied_size, format, va);
}
```

And before the `memcpy` for the GOT, insert:

```c
const int VSNPRINTF_CHK_INDEX = 23;
assert(got[VSNPRINTF_CHK_INDEX] == (uint64_t)lib + 0x3028); // this was called by the binary
got[VSNPRINTF_CHK_INDEX] = (uint64_t)my_vsnprintf_chk;
```

putting all together, we get the output:

```
pass[1] = 8
pass[2] = 22
pass[3] = 13215336
pass[4] = 1772505672
pass[5] = 11
pass[6] = 329
pass[7] = 25196
pass[8] = 20
pass[9] = 1425
pass[10] = 110318673
pass[11] = 2746
pass[12] = 18
```

as the 3rd number took quite long and the 4th has more bit's, and it was quite late, we decided to let it run while going to sleep. But then the other results fell out very quick, so we we had to to postpone sleeping to get the flag.

# assembling the flag

We can copy the values into a python script and extend it to reverse the `s2ba` and `b2i` calls:

```python
lens = [4, 7, 24, 31, 4, 10, 15, 7, 12, 27, 13, 6]
answer= [8, 22, 13215336, 1772505672, 11, 329, 25196, 20, 1425,110318673, 2746, 18, ]

# the array as used by s2ba
a = bytearray(160)

#inverse of b2i
def i2b(i, l): return [(i>>j)&1 for j in range(l)][::-1]

done = 0
for l,ans in zip(lens, answer):
if done == 0:
a[-l:] = i2b(ans, l)
else:
a[-done-l:-done] = i2b(ans, l)
done += l

# invert s2ba
n=0
for x in a:
n *= 2
n += x

password = n.to_bytes(20, "big")
print(password)

# the password is base64 decoded to get the flag
import base64
print("CSR{" + base64.b64decode(password).decode() + "}")
```

The bit assembly yields `IWZMQEdJblRoM2RFM3Ah`, and by decoding it we find the flag `CSR{!fL@GInTh3dE3p!}`.

Now we finally had time for a short rest.