Tags: c++ reversing graphs

Rating:

# Algorithm

**Category**: Reverse Engineering

**Points**: 994 (26 solves)

**Author**: hukc

## Challenge

**Description**: I made this well-known algorithm into a reversing problem.

**Attachments**: algorithm

## Overview

### Basic Info


$file algorithm algorithm: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=48a3c96e612fd1fed7ce748c3e25f2982b0a0473, for GNU/Linux 4.4.0, not stripped  The biggest thing to note is that the binary wasn't stripped, which made reversing easier. Running strings produced a bunch of lines like this:  _ZNSt8_Rb_treeIiSt4pairIKiSt6vectorIiSaIiEEESt10_Select1stIS5_ESt4lessIiESaIS5_EE12_M_drop_nodeEPSt13_Rb_tree_nodeIS5_E _ZNSt15_Deque_iteratorISt4pairIiiERS1_PS1_EC2Ev _ZSt14__copy_move_a1ILb0EPKiPiET1_T0_S4_S3_  These mangled function names, in addition to std:: in the demangled names, let me know that this is a C++ binary. While doing this write up, I also noticed a line that I missed during the CTF that would have been super helpful (you'll see why later): graphs.cpp ### First Execution I ran the binary for the first time: $ ./algorithm
Input a number:
4
Input 4 pairs of numbers:
1 4
2 4
3 4
4 4
Wrong!
$ So the program prompts for num_pairs, then num_pairs pairs of numbers (integers specifically). With that background info, I loaded the binary into Ghidra. **Note that I have also included in this repo the adjusted decompilation output with all my variable names from Ghidra for both the [main()](ghidra-main.c) and [compute()](ghidra-compute.c) functions**. Since the binary isn't stripped, main was already found. Ghidra helpfully demangled the function names to provide type info in the decompilation view, but with all the object-oriented generics the view became very cluttered. Since this is the first public write-up I've posted for a C++ reversing challenge, in the next section I point out a few code C++ code patterns that I used to understand the decompilation better. There are many other introductions to reversing C++ online, but here I only use excerpts from the challenge binary (specifically main() and compute()). ### C++ Patterns in Ghidra #### std:: and Namespaces cpp std::cin  :: is the scope resolution operator. When prefixed with std, it indicates use of the standard library (which has [plenty](https://www.cplusplus.com/reference/) of [documentation](https://en.cppreference.com/w/) online). Scope resolution is also commonly used to reference class member functions: cpp std::vector<int,std::allocator<int>>::push_back(pvVar6,(int *)&local_440);  Ignoring the stuff between the <> for now, this line calls the push_back member function of the vector class of the standard library. #### Operators cpp std::operator<<((basic_ostream *)std::cout,"That\'s not a number\n");  This function, and any other function name prefixed with operator, indicates the use of operator overloading. The original C++ looked similar to this: cpp std::cout << "That's not a number\n";  #### Containers The C++ standard library includes a bunch of commonly used data structures. Looking over the challenge binary, we see the use of: * [std::map](https://en.cppreference.com/w/cpp/container/map) * [std::vector](https://en.cppreference.com/w/cpp/container/vector) * [std::pair](https://en.cppreference.com/w/cpp/utility/pair) * [std::set](https://en.cppreference.com/w/cpp/container/set) * [std::queue](https://en.cppreference.com/w/cpp/container/queue) * [std::deque](https://en.cppreference.com/w/cpp/container/deque) #### Generics C++ programmers can avoid rewriting a general algorithm for each type using templates. When using generic functions, the actual type is specified and wrapped in <> as follows: cpp std::pair<int,int>::pair<int,int,true>((int *)&local_b0,(int *)&local_c8);  Templating can even be nested: cpp local_c0 = (vector<int,std::allocator<int>> *) std::map< int, std::vector< int, std::allocator<int> >, std::less<int>, std::allocator< std::pair< int_const, std::vector< int, std::allocator<int> > > > > ::operator[](this,&local_e0);  For the purposes of reversing, generics/templating can mostly be ignored unless you need type information. Understanding this reduces the amount of code you have to look at to understand the binary. #### (Con|De)structors Functions with the same name as the class are constructors. Functions with the same name as the class but prefixed with ~ are destructors. So this mess which takes up 2 lines in Ghidra: cpp std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>>::~basic_string ((basic_string<char,std::char_traits<char>,std::allocator<char>> *)local_1c8);  just destroys the basic_string at local_1c8. #### this Pointer Calling member functions on C++ objects makes use of an implicit this pointer which typically [isn't shown in the function prototype in documentation](https://www.cplusplus.com/reference/queue/queue/pop/) but shows up as the first parameter in Ghidra. For example, the following decompilation output: cpp std:: queue<std::pair<int,int>,std::deque<std::pair<int,int>,std::allocator<std::pair<int,int>>>>:: pop(local_78);  roughly correlates to: cpp local_78.pop();  #### Operator Chaining The below code keeps using the stream insertion operator (operator<<) with pbVar5 and storing the result in pbVar5: cpp pbVar5 = std::operator<<((basic_ostream *)std::cout,"Input "); pbVar5 = (basic_ostream *) std::basic_ostream<char,std::char_traits<char>>::operator<< ((basic_ostream<char,std::char_traits<char>> *)pbVar5,local_458); std::operator<<(pbVar5," pairs of numbers:\n");  operator<< expects a std::basic_ostream as its first parameter, which it also returns, allowing chaining. The original code for the above was something like the following: cpp cout << "Input " << local_458 << " pairs of numbers:\n");  #### Iterators C++ uses iterators to loop over items. Usually the begin() and end() member functions of classes return an iterator pointing to the first and last element of the container respectively and operator++ advances an iterator. Seeing these functions used like below: cpp local_d0 = std::vector<int,std::allocator<int>>::begin(local_c0); local_c8 = std::vector<int,std::allocator<int>>::end(local_c0); while (bVar2 = __gnu_cxx::operator!= ((__normal_iterator *)&local_d0,(__normal_iterator *)&local_c8), bVar2 != false) { piVar4 = (int *)__gnu_cxx::__normal_iterator<int*,std::vector<int,std::allocator<int>>>:: operator*((__normal_iterator<int*,std::vector<int,std::allocator<int>>> *) &local_d0); local_dc = *piVar4; local_d8 = local_d4 + 1; std::pair<int,int>::pair<int&,int,true>((pair<int,int> *)&local_b0,&local_dc,&local_d8); std:: queue<std::pair<int,int>,std::deque<std::pair<int,int>,std::allocator<std::pair<int,int>>>>:: emplace<std::pair<int,int>>(local_78,(pair *)&local_b0); __gnu_cxx::__normal_iterator<int*,std::vector<int,std::allocator<int>>>::operator++ ((__normal_iterator<int*,std::vector<int,std::allocator<int>>> *)&local_d0); }  represents iterating over a container (with some logic inside the loop too). #### Missing Parameters This aspect isn't C++ specific but it showed up in this binary so I wanted to include it here. Take the following code: cpp plVar3 = (long *)std:: queue<std::pair<int,int>,std::deque<std::pair<int,int>,std::allocator<std::pair<int,int>>>> ::front();  std::queue::front() should have the implicit this parameter but we don't see one. The solution is to check the corresponding listing (disassembly) output: asm 00102437 48 8d 45 90 LEA RAX=>local_78,[RBP + -0x70] 0010243b 48 89 c7 MOV param_1,RAX 0010243e e8 83 13 CALL std::queue<std::pair<int,int>,std::deque<std:: undefined front(void) 00 00  Per the System V ABI (since this binary is an ELF), the first parameter should be placed in rdi/edi, and hovering over param_1 shows us exactly that, so the parameter for front() is rbp - 0x70. This example just goes to show you that decompilation output isn't perfect and you should consider the assembly to be closer to ground truth (though it could also be incorrect!). ### General Program Flow Applying the above C++ knowledge made it a lot easier to quickly see what actually mattered. Before analyzing more intensely, a quick glance showed the general structure of main(): * read in user input strings and parse pairs of numbers from the strings * construct a map<int, vector<int>> based on the input pairs * pass the map to compute() * if the call to compute() is non-zero, pull pairs out of the map and use them to generate a string * pass the string to SHA1() * XOR the SHA1 hash and output as a string This binary makes heavy use of data structures, which made sense because it allegedly implements some algorithm. Since I didn't immediately recognize it, I set about trying to figure out how exactly the map is built from input and what map to pass to compute() to get it to return a non-zero value. ### A Deeper Look #### map Building The binary: * uses getline() to read a string and std::basic_istream with std::operator>> to either parse an int (which I called num_pairs) or fail out * creates a map<int, vector<int>> which I called pair_map * loops num_pairs times: * again uses getline() and std::basic_stream with std::operator>>, this time to read 2 integers (which I called num_key and num_val) or fail out * if pair_map[num_key] doesn't exist yet, assigns it to an empty vector<int> * if pair_map[num_val] doesn't exist yet, assigns it to an empty vector<int> * adds num_val to the pair_map[num_key] vector * adds num_key to the pair_map[num_val] vector So if I entered: $ ./algorithm
Input a number:
5
Input 5 pairs of numbers:
1 2
2 3
2 4
3 4
4 5
Wrong!
$ it made: | Key | Value Vector | |:---:|:-------------| | 1 | < 2 > | | 2 | < 1, 3, 4 > | | 3 | < 2, 4 > | | 4 | < 2, 3, 5 > | | 5 | < 4 > | After a while of staring at this code and compute(), it slowly dawned on me that the code is creating an [adjacency list](https://en.wikipedia.org/wiki/Adjacency_list)! An adjacency list is a common way to store a graph. This is the part where noticing graphs.cpp would have saved a bit of time. Regardless, realizing this made me realize that compute() is implementing some sort of graph algorithm, and I had to provide a graph with a certain structure to pass the check. #### The compute() Check Now that I had a semantic understanding of the map and its values, figuring out compute() was much faster. The function creates a set, queue, and hardcoded first key/value pair (4, 0) which gets added to the queue. The data structures are then used as follows: * pairs which I denote (curr_key, curr_val) are removed from the queue until one is found which isn't already in the set * every curr_mapval_vec_val in the vector pair_map[curr_key] is added to the queue as pair (curr_mapval_vec_val, curr_val + 1) If that process sounds familiar, that's because this function implements Breadth-First Search! The map is the adjacency list, the queue indicates upcoming nodes to visit, and the set contains already-visited nodes. The value which gets incremented by 1 every iteration of the loop is the cost to reach nodes! That means each pair is (node_#, node_cost). Some of this reversing was made more confusing by the compiler's reuse of the same stack area for different purposes in different parts of the function, which I haven't quite figured out how to label cleanly yet. ## Solution Now that I understood the data structure and the algorithm, I just had to figure out the structure of the graph that would pass the additional checks. The main loop in compute() exits in 3 cases: * returns failure if the node_# is ever not equivalent to (node_cost + 4) * returns success if the queue becomes empty when (set size == map size == 6) * returns failure if the queue becomes empty otherwise The check for node_# == (node_cost + 4) confused me for a bit until I remembered that the first node is node 4 with cost 0. This check basically ensures that the nodes must be found in order of their numbering, i.e. node 5 must come next with cost 1, node 6 must come after that with cost 2, etc. This means the structure of the graph is just a boring line. Since the size of the map is the number of lists (std::vectors) in the adjacency list, there must be 6 nodes in this line. The following graph passes all the checks: ![Image demonstrating valid graph](valid-graph.svg) The adjacency list for this graph is: | Node | Connected Nodes| |:----:|:---------------| | 4 | 5 | | 5 | 4, 6 | | 6 | 5, 7 | | 7 | 6, 8 | | 8 | 7, 9 | | 9 | 8 | I entered the edges to the program as follows to get the flag: $ ./algorithm
Input a number:
5
Input 5 pairs of numbers:
4 5
5 6
6 7
7 8
8 9
utflag{graphs_lol}
$ It's important to note that while the order of the two nodes in each edge doesn't matter, the order of the edges DOES matter. In other words, this still produces the flag: $ ./algorithm
Input a number:
5
Input 5 pairs of numbers:
5 4
5 6
6 7
8 7
9 8
utflag{graphs_lol}
$ while this passes the checks but outputs an incorrect flag: $ ./algorithm
Input a number:
5
Input 5 pairs of numbers:
4 5
6 7
5 6
7 8
8 9
4\7!>Xu3d3
\$


Original writeup (https://github.com/cscosu/ctf-writeups/tree/master/2021/utctf/Algorithm).