Tags: rev logic 

Rating: 5.0

# Smart Solver solved without SMT and Z3
Well the heading is intentional since I did not know about SMT and Z3 when I solved this problem during the CTF, and solved it logically. So read this if you are interested in a logical solution to the problem.

# Overview

The binary that we are provided uses a lot of instructions to compare each character of a string with all other characters of the string, if any of the test fails it just jumps to a location that exits. So we have to find that string which passes all tests.

# Observations
- The string length is `73`
- The operations compare the numeric value of characters with each other so it defines a lexographic order for a character with all other characters of the string.

# Idea

Given a string with 73 characters `s[73]`, Consider a `73 X 73` 2-d array `a[73][73]`. Define `a[i][j]` in the following manner -:
```
if(s[i] == s[j]):
a[i][j] = -1
elif(s[i] < s[j]):
a[i][j] = 0
else:
a[i][j] = 1
```

Now note that we can obtain the entire 2-d array `a` from the binary if we are able to parse the instructions in the disassembled code of the binary. This is because each character is compared with a number of other characters using either the `jbe` or `jae` instruction and if it is not compared with some character then it is safe to assume that both of them are equal, since no other possibility exists.

## Code to parse the binary and get the 2-d array

### Some debugging aids which can be safely ignored
The code includes some `try-except` and `assert` statements to verify that what we are doing is right. The variable `ct` is the error counter. The code throws one error at an instruction before the comparisons are even started so `ct` is not allowed to be greater than 1.

```
import re
import json
import sys
a=[[-1 for i in range(73)] for i in range(73)]

def dump_data():
# Objective a[i][j] = 0 if s[i] < s[j] , 1 if(s[i] > s[j]), -1 if (s[i] == s[j])

regex = '.*movzx e([ad])x,BYTE PTR \[rbp-(0x[0-9a-f]*)\].*' # regex to match src and dest
jmp_reg = '.*j([ab])e.*' # regex to match operator (less or greater)

src,dst=-1,-1
ct=0
with open("disassembly",'r') as f:
for line in f.readlines():
# print(line)
try:
reg, pos = re.match(regex, line).groups()
pos = int(pos, 16) - 0xd8

if(reg == 'd'):
assert src == -1
src = pos
print("Setting src to ",pos)
elif(reg == 'a'):
assert dst == -1
dst = pos
print("Setting dst to ", pos)


except Exception as e:

if isinstance(e, AssertionError):
ct += 1
print(src, dst)
print(line)
if(ct>1):
sys.exit(-1)
try:
oper = re.match(jmp_reg, line).groups()[0]
assert src != -1 and dst != -1 and a[src][dst] == -1
if oper == 'b':
print("oper is b")
a[src][dst] = 0
else:
print("Oper is a")
a[src][dst] = 1
src, dst = -1, -1
except Exception as e:
if isinstance(e,AssertionError):
ct += 1
print(src, dst)
print(line)
if(ct > 1):
sys.exit(-1)


open("out", 'w').write(json.dumps(a))
```

# Reversing the 2-d array to obtain flag
So now we have `a[73][73]`. How to obtain the string `s[73]` from this 2-d array. So the idea is this, we can first group indices that map to the same character i.e indexes `x` and `y` belong to the same group if `s[x][y] == -1 == s[y][x]`.

**Note:** We do not need to ensure the second condition since that is a problem constraint and it cannot fail otherwise the data would be wrong. So what I mean is that for data to be correct the following invariant needs to be satisfied
```
s[i][j] = ~s[j][i] if (s[i][j] != -1) else s[i][j] == -1 == s[j][i]
```

- Once we have grouped the characters we know that characters at indices in the same group are same.
- Also note that suppose for an index `i` we can compute `less_ct[i] = count({j: s[i][j] == 0})`, `gt_ct[i] = count({j: s[i][j] == 1})` and `eq_ct[i] = count({j: a[i][j] == 0})` . `less_ct[i]` is a count of the number of characters which are lexographically **larger** than `s[i]`.
- Now suppose character indices `i1, i2, i3` belong to a group `G` i.e they map to same character. So they will have the same values of these quantities since the character comparison will yield equivalent results. So the variables `less_ct, gt_ct` are indeed a characterstic of the group and not of a particular index.
- Now suppose a group is mapped to the character `'b'`. How will its `less_ct` compare to the group that maps to the character `'c'`? It is obvious that the number of characters that are lexographically larger than `'c'` are less than the number of characters that are lexographically larger than `'b'` since the latter set will also include the characters that are equal to `'c'`.
- So it is convincing enough that if we sort these groups on the basis of `less_ct` and start assigning them the characters from `'a'` we could complete the entire string.

**Note:** We have made an assumption that all characters from `'a'` to `'z'` are being used. And this assumption only holds true if the number of groups that we are able to form are >= 26. In our case we got 28 groups i.e. all 26 alpahabets and `'{'` and `'}'`

## Code to obtain string from 2-d array

**Note:** It seemed like that we need to reverse the flag after obtaining the entire string to have the format `flag{...}`
```
def rev():
data=json.loads(open('out','r').read())
ingroup = [False for i in range(73)]

groups=[]

for i in range(73):
if(ingroup[i]):
continue
ele = data[i]
eq_ct,less_ct,gt_ct=0,0,0
same_indexes = []
for j in range(73):
b=ele[j]
if b==-1:
eq_ct+=1
same_indexes.append(j)
ingroup[j] = True
elif (b==0):
less_ct+=1
else:
gt_ct+=1
groups.append({"eq_ct":eq_ct,"less_ct":less_ct,"gt_ct":gt_ct,"same_indexes":same_indexes})
groups.sort(key=lambda x: x['less_ct'])
flag=['' for i in range(73)]
for i in range(len(groups)):
c = chr(ord('a')+i) if(i!=len(groups)-1) else '}'
group = groups[i]
for index in group['same_indexes']:
flag[index] = c
flag="".join(flag)[::-1]
print(flag)
```

# Result
`flag{thequickbrownfoxjumpedoverthelazydogandlearnedhowtoautomateanalysis}`