Rating:

# Poor Stego Guy Writeup (en)

## Problem

A PNG file and its generation script are given. The generation script firstly creates a random JPEG file with the following command:

```
docker run -v `pwd`:/mnt dpokidov/imagemagick:7.0.10-9 -size 128x128 xc:"gray(50%)" +noise random -level -20%,120%,1.0 -quality 50 /mnt/noise.jpg
```

and XORs the LSB of each pixel of the generated image with the flags, and outputs it as a PNG file.

It is typical technique of steganography to fill LSB of each bit, but this is XOR, so you need the original JPEG file to recover the flags... So the question here is how to do it.

## Solution

First of all, let's make sure that this question has solution in principle. If the first image generated by the Imagemagick command is a PNG file, then this question is not answerable in principle. This is because XORing the plaintext with randomly generated pixel values is essentially the same as a one-time pad, and cannot be deciphered unless the original random values can be predicted.

It's important to note that, of course, the first image generated is not a PNG file, but a JPEG file, which is an irreversible compression. This is achieved by dropping the amount of information in the image. In other words, the possible states of the image pixels in the reduced information state are sparser than those of a mere random noise image, and it is possible, in principle, to recover the original image from an image that has one of these possible states with a specific operation.

It is possible to use this property to recover the flags by obtaining the original image and comparing it to the generated output.png. However, this requires in-depth knowledge of JPEG. The specific methods are described below.

First, let's briefly discuss the principle of JPEG compression, which is roughly as follows

1. Divide the image into blocks of 8x8 pixels (each block is then processed individually)
1. Two-dimensional discrete cosine transformation is performed on pixel values to convert them into frequency components
1. Quantize the frequency component values and round them to a multiple of the specified vector

The two-dimensional discrete cosine transform is a transformation technique that maps the pixel values of an image to the frequency domain as discrete signals, which, without getting too complicated, are ultimately represented by a linear superposition of 64 different patterns, as shown below.

![](https://abyx.be/images/2431007f3188e37f4e156a8374b4f611.png)

The DCT coefficients are usually real values, but in JPEG they are further compressed and rounded to a multiple of a specific value for each pattern. This is called quantization, and this list of "specific values" for each pattern is called a quantization table.

Since the quantization table is defined within the JPEG file, various values can be taken, but it is common for standard encoders such as libjpeg to use a fixed quantization table defined in advance. In particular, when the image quality parameter is set to 50, it is easy to know the quantization table of the JPEG file generated by the above Imagemagick command because it uses [the quantization table described in Section K1 of the JPEG specification](https://github.com/LuaDist/libjpeg/blob/6c0fcb8ddee365e7abc4d332662b06900612e923/jcparam.c#L64-L87).

Now, let's consider this JPEG compression technique again: the pixel pattern output by the JPEG is represented by a linear superposition of the 64 patterns above, and the coefficients are rounded to a specific multiple of the value, which those of you who are familiar with the crypto problem may have figured out. As you might expect, a lattice is a mathematical structure with these spatial properties. **That is, the value of each block encoded in the JPEG can be considered as a specific state on a 64-dimensional lattice as it is.**

Let us then reflect on the purpose of this problem. Our goal is to recover the original value given a possible value in the mathematical structure of the JPEG, plus a small noise, from which we can recover the original value. Such problems are called Closest Vector Problems (CVP) in lattice theory and are known to be somewhat efficient in solving them using LLL. In particular, when the dimension is about 64 dimensions, the original coefficients can be recovered in a realistic time.

We now know the patterns for constructing the lattice, its quantization table, and the target vector values. It is therefore possible to recover the pixel values of the original JPEG image, from which the flags can be restored.

Note that the inverse discrete cosine transform by JPEG decoders usually contains some errors, so the flags can only be obtained by completely mimicking the decoding by the Pillow library given in the problem (a highly accurate inverse discrete cosine transform calculation will not produce the flags in reverse). The easiest thing to do would be to get the DCT coefficients, export them as a JPEG file, and then reload them in Pillow in Python.

An example of the solver script with SageMath is placed in [solve.sage](https://github.com/tsg-ut/tsgctf2020/blob/master/misc/poorguy/solver/solve.sage).

Translated with www.DeepL.com/Translator (free version)

Original writeup (https://github.com/tsg-ut/tsgctf2020/blob/master/misc/poorguy/writeup.en.md).