Tags: steganography 

Rating: 5.0

# War and Pieces
## The Problem

We are given a photo of some toy soldiers of varying colours and poses, and an instruction that there’s nothing in the EXIF data.

![Toy soldiers of varying colours and poses](https://i.imgur.com/oW1FUcQ.png)

All signs seem to point to the code existing in the actual soldiers themselves. One of the soldiers is even turned around to face backwards. But what could the code actually be?

## Theory 1: Binary?

Maybe the colours are binary? Well, that seems to break down for a few reasons. We would expect the message to arrive in whole bytes, yet there is one row with two figures instead of eight. Also, if each row were a byte, we would expect every leading bit to be 0, because all the normal printable characters are in the lower half of the ASCII table (codes below 128).
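The leading-bit expectation is easy to confirm. The following is a minimal sketch using only the standard library; it is not part of the original solve, just a check of the claim.

```python
import string

# Printable ASCII characters (letters, digits, punctuation, whitespace).
printable = string.printable

# Eight-bit binary form of each character code.
bits = {ch: format(ord(ch), "08b") for ch in printable}

# Every printable character should start with a 0 bit.
all_leading_zero = all(b[0] == "0" for b in bits.values())
print(all_leading_zero)
print(bits["U"], bits["{"])
```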

I thought at this point that maybe each figure has its own binary value associated with it -- for instance, there are five or six different poses, so that could be encoded in three bits, and then the colour of the figure could be one bit. This would then give some binary code to work with.

## Theory 2: Wait, is there a quicker way?

After starting the process of painstakingly encoding this way, I realised I could just number the figures, giving each distinct figure a code based on where it first appears in the photo, and then work out the real encoding afterwards.
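The numbering was done by hand on the photo, but the idea can be sketched in a few lines. The figure descriptions below are hypothetical placeholders, not the actual soldiers, and `label_by_first_appearance` is a name made up for this illustration.

```python
def label_by_first_appearance(figures):
    """Give each distinct figure a small integer in order of first appearance."""
    labels = {}
    encoded = []
    for fig in figures:
        if fig not in labels:
            labels[fig] = len(labels)  # next unused number
        encoded.append(labels[fig])
    return encoded, labels

# Hypothetical (colour, pose) descriptions, for illustration only.
row = [
    ("green", "kneeling"),
    ("tan", "standing"),
    ("green", "kneeling"),
    ("tan", "crawling"),
]
encoded, labels = label_by_first_appearance(row)
print(encoded)
```

The numbers themselves carry no meaning yet; they only record which figures are the same, which is all the later substitution step needs.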

![The same toy soldiers of varying colours and poses, now with numbers on them](https://i.imgur.com/k7av98Q.png)

(There are actually two mistakes on the second last row -- it should read 6 7 6 12 4 1 8 4. Both of these are fixed in the text file at the bottom of the writeup.)

## Pattern Recognition
After encoding in this way, some patterns became obvious: there is a repeat of the 04 04, which would be consistent with being the repeating Ss in the UMASS{flag_here} format. We also see that the two curly braces have the same leading digit, an 8. This makes sense because they sit in the same row of the ASCII table: { is 0x7b and } is 0x7d.
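To see why these patterns are expected, here is a short sketch that prints the hex form of the characters we know must be in the flag. It uses only built-ins.

```python
# Characters that the UMASS{...} flag format guarantees.
known = "UMASS{}"

# Two-digit hex code of each character.
hex_pairs = [format(ord(c), "02x") for c in known]
print(hex_pairs)

# The two S characters share a code, and both braces start with 7.
same_s = hex_pairs[3] == hex_pairs[4]
brace_leading = {hex_pairs[5][0], hex_pairs[6][0]}
print(same_s, brace_leading)
```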

If we look through all our leading digits, there are only five unique values. This is consistent with an ASCII encoding: letters, digits, underscores, and braces all have a leading hex digit between 3 and 7.

We also see that we have more than 10 unique encodings, so we’re probably looking at hex values.

## Code Breaking
So what is the code we’re looking at here?

Each distinct soldier corresponds to a unique hex digit. Each pair of soldiers is a pair of hex digits, i.e. one byte, which can be converted to a printable character.
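As a small sketch of that last step, this is how a list of two-digit hex pairs turns back into text once the real digits are known. `decode_pairs` is a helper name chosen for this example.

```python
def decode_pairs(pairs):
    """Convert two-digit hex strings into the characters they encode."""
    return "".join(chr(int(p, 16)) for p in pairs)

decoded = decode_pairs(["55", "4d", "41", "53", "53", "7b"])
print(decoded)
```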

By counting them in the way we’ve counted, we have substituted all the hex values, so they’re all jumbled up. With a bit of code, we can cross-reference the ones we know to be true with the ones we still need.

We don’t actually have enough information yet to magically bring a flag into existence. But we can write some code that prints out all the possible flags with the information that we have.

We can edit the control string on line 10, replacing Zs with characters that we’re more confident about.
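The approach can be sketched on a toy example. Everything in it is hypothetical: the tokens use made-up letter symbols instead of the real figure numbers, the hidden message is `UMASS{a}`, and the function names are chosen for illustration. Known characters in the control string pin down some symbols, and the rest are brute-forced over the unused hex digits.

```python
import itertools

HEX_DIGITS = "0123456789abcdef"

def mapping_from_crib(tokens, crib, wildcard="Z"):
    """Derive symbol -> hex digit from positions where the plaintext is known."""
    mapping = {}
    for token, ch in zip(tokens, crib):
        if ch == wildcard:
            continue
        for symbol, digit in zip(token, format(ord(ch), "02x")):
            mapping[symbol] = digit
    return mapping

def candidates(tokens, mapping):
    """Yield every decoding that fills the unknown symbols with unused digits."""
    unknown = sorted({s for t in tokens for s in t} - set(mapping))
    unused = sorted(set(HEX_DIGITS) - set(mapping.values()))
    for combo in itertools.permutations(unused, len(unknown)):
        full = {**mapping, **dict(zip(unknown, combo))}
        yield "".join(chr(int("".join(full[s] for s in t), 16)) for t in tokens)

# Hypothetical toy ciphertext: one two-symbol token per character.
tokens = "qq we wr qt qt yu ir ye".split()
mapping = mapping_from_crib(tokens, "UMASS{Z}")
results = list(candidates(tokens, mapping))
print(len(results))
```

With one unknown symbol and nine unused hex digits, the toy yields nine candidates, one of which is the intended message. The real photo has more unknown symbols, so the list is much longer and needs the filtering described next.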

## Tying up loose observations
There are a few observations to follow up here:

Remember how we first observed that our leading digits are in a set of size 5? We said that was because they’re likely in the regular ASCII range. Let’s print out the mappings of all the leading hexcodes so far (this is already in the code). We see that 6’s mapping isn’t revealed yet. We also see that the mappings for our other four leading hexcodes are 3, 4, 5, and 7. That leaves 6 as the only remaining leading digit for regular ASCII characters. Let’s put in a line that updates our dictionary to always map 6 to 6. We should now only see the expected English ASCII characters in our flag options.
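The elimination can be written out as a short sketch. It assumes the flag only uses letters, digits, underscores, and braces; the set of already-mapped leading digits is taken from the paragraph above.

```python
import string

# Assumed flag alphabet: letters, digits, underscore, and braces.
flag_alphabet = string.ascii_letters + string.digits + "_{}"

# Leading hex digit of every character in that alphabet.
possible_leading = {format(ord(c), "x")[0] for c in flag_alphabet}

# Leading digits that the known characters have already accounted for.
already_mapped = {"3", "4", "5", "7"}

# Whatever is left is the only option for the unmapped leading figure.
remaining = possible_leading - already_mapped
print(sorted(possible_leading), remaining)

# Pin figure 6 to hex digit 6, as described above.
substitution_dict = {}
if len(remaining) == 1:
    substitution_dict["6"] = next(iter(remaining))
print(substitution_dict)
```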

Aha! Now we have a lot fewer unprintable characters. If we run again, we can see that there’s really only one obvious delimiter that stands out, and that’s the underscore (although I did think maybe the colon was spelling something like “ifiXt:vXs:il4s” for a little while.)

## The Home Stretch
We see that the underscore appears as the 4th and 8th character of some of the strings. We can edit the underscore into the string on line 10, and we’ll now have only those lines appear. And what do you know, a word suddenly leaps out from the list at the end of the flag! If we look through our options, we see s0lj4s at least once (which sounds like soldiers).
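The script narrows the list by rebuilding the mapping from the control string. A simpler way to picture the effect is as a filter over candidate strings, sketched here with a hypothetical `matches` helper where Z is a wildcard. The two sample candidates are strings quoted in this write-up.

```python
def matches(candidate, pattern, wildcard="Z"):
    """True if candidate agrees with pattern at every non-wildcard position."""
    return len(candidate) == len(pattern) and all(
        p == wildcard or p == c for c, p in zip(candidate, pattern)
    )

# Inner flag strings that appear in this write-up.
options = ["ifiXt:vXs:il4s", "lfl_t0v_s0lj4s"]

# Underscores as the 4th and 8th characters.
with_underscores = [o for o in options if matches(o, "ZZZ_ZZZ_ZZZZZZ")]
print(with_underscores)

# Then fix the final word as well.
with_word = [o for o in options if matches(o, "ZZZ_ZZZ_s0lj4s")]
print(with_word)
```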

So we change the control string to now read s0lj4s, and hit run. We get a strange gibberish string: UMASS{lfl_t0v_s0lj4s}. That is our only option. Now, we might think “Oh, it should be a y there instead of a v”, but the way the code is written, it will just write the y back to being a v anyway. So, this is the flag.

(It turns out that there is an error in the original photo because two different hexcodes were encoded using the same figure. The flag is supposed to be UMASS{lil_t0y_s0lj4s}. I leave it as an exercise for the reader to check their understanding by determining which two figures were incorrect.)
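One way to approach that exercise is to look for a symbol that the known plaintext forces onto two different hex digits. The sketch below is not part of the original script; `find_conflicts` is a hypothetical helper and the tokens are toy data unrelated to the real photo.

```python
def find_conflicts(tokens, crib, wildcard="Z"):
    """Report symbols that the crib maps to more than one hex digit."""
    seen = {}
    conflicts = []
    for token, ch in zip(tokens, crib):
        if ch == wildcard:
            continue
        for symbol, digit in zip(token, format(ord(ch), "02x")):
            if symbol in seen and seen[symbol] != digit:
                conflicts.append((symbol, seen[symbol], digit))
            seen.setdefault(symbol, digit)  # keep the first digit we saw
    return conflicts

# Toy data: the same token cannot encode both 'a' (0x61) and 'b' (0x62).
conflicts = find_conflicts(["xy", "xy"], "ab")
print(conflicts)
```

Running a check like this with the intended flag as the control string would point at the figure that stands in for two digits.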

# Files
## Code written in python3
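```python
import itertools

# Assumption: army.txt holds whitespace-separated two-symbol tokens, one per flag character.
a = []
with open('army.txt', 'r') as savefile:
    for line in savefile:
        a.extend(line.split())

# Edit the Zs in this string to be something we know appears in that position.
b = "UMASS{ZZZZZZZZZZZZZZ}"

# Takes the hex values from our .txt file input and saves it in a dictionary
# that maps it to the hex values of the characters in the string.
substitution_dict = {}
for x, y in zip(a, b):
    if y != "Z":
        for i, j in zip(x, hex(ord(y))[2:]):
            substitution_dict[i] = j

# Uncomment to pin figure 6 to hex digit 6, as described in the write-up.
# substitution_dict["6"] = "6"

# Prints out all the LEADING digits, and whether they currently have a mapping.
for x in set(p[0] for p in a):
    if x not in substitution_dict:
        print(x, "-> not mapped yet")
    else:
        print(x, "->", substitution_dict[x])

# Prepare all the hex values that are found in the original, but aren't
# found in the current dictionary.
used_hex = set()
for x in a:
    used_hex |= set(x)
allhex = set([hex(x)[2] for x in range(16)])
e = sorted(used_hex - set(substitution_dict))           # symbols with no mapping yet
f = sorted(allhex - set(substitution_dict.values()))    # hex digits not assigned yet

# Iterate through all the possible mappings.
for com in itertools.permutations(f, len(e)):
    for symbol, digit in zip(e, com):
        substitution_dict[symbol] = digit

    new_array = []
    for char in a:
        s = ""
        for p in char:
            s += substitution_dict[p]
        new_array.append(s)

    try:
        print("".join([chr(int(x, 16)) for x in new_array]))
    except UnicodeEncodeError:
        # Skip candidates the terminal cannot display.
        pass
```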

## army.txt input file