Tags: ocr hash image misc 


## myopia

The challenge code uses the [abonander/img_hash](https://github.com/abonander/img_hash) library to compute the perceptual hash of an image. To get this flag, we need two images with the same perceptual hash, `ERsrE6nTHhI=` (in Base64): one showing the text "sudo please" and the other showing the text "give me the flag". The text is read by OCR using the tesseract library.
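As a quick sanity check on the target, the Base64 string can be decoded with the Python standard library. This is only a sketch: the helper `hamming_distance` is our own illustrative name and not part of `img_hash`, and the challenge itself requires the two hashes to be equal, i.e. a distance of 0.

```python
import base64

# Target hash from the challenge, as given in Base64.
TARGET_B64 = "ERsrE6nTHhI="
target = base64.b64decode(TARGET_B64)

print(len(target) * 8, "bits:", target.hex())


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length hashes (illustrative helper)."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Two images "collide" for this challenge when the distance is exactly 0.
print(hamming_distance(target, target))
```

The decoded value is 8 bytes long, so the second image has to reproduce all 64 bits of the hash.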

Since the given image `img1.png`, with the text "sudo please", already has the target perceptual hash, what remains is to generate a second image that the OCR library recognizes as "give me the flag" while `img_hash` computes the same perceptual hash for it. We decided against trying to trick the OCR library, since its codebase is large and widely used, so we started by looking at the `img_hash` library.

The first thing the `img_hash` library does is convert the image to grayscale, using the `image` library. Looking at the source code, this is a basic luminance calculation. However, we found that although the `image` library can read PNG files with an alpha channel (RGBA), it ignores the alpha value in that calculation. Near-transparent pixels therefore contribute their full RGB values to the grayscale image. We guessed that tesseract, on the other hand, handles transparent images correctly, meaning it effectively ignores near-transparent pixels.
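The difference between the two interpretations can be sketched in a few lines of pure Python. Everything here is an assumption made for illustration: the integer Rec. 601 weights stand in for whatever weights the `image` library actually uses, and compositing onto a white background is just one plausible way an OCR pipeline might treat transparency.

```python
def luma_ignoring_alpha(pixel):
    """Grayscale value computed from RGB only; the alpha component is dropped."""
    r, g, b, _alpha = pixel
    # Integer Rec. 601 weights (assumption; the real library's weights may differ).
    return (299 * r + 587 * g + 114 * b) // 1000


def luma_over_white(pixel):
    """Grayscale value after alpha-compositing the pixel onto a white background."""
    r, g, b, alpha = pixel
    blended = tuple((c * alpha + 255 * (255 - alpha)) // 255 for c in (r, g, b))
    return luma_ignoring_alpha(blended + (255,))


nearly_transparent_black = (0, 0, 0, 1)
print(luma_ignoring_alpha(nearly_transparent_black))  # 0: still counts as black
print(luma_over_white(nearly_transparent_black))      # 254: almost white
```

The same pixel is black to a consumer that drops alpha and almost white to one that composites it, which is the gap the rest of this section relies on.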

We generated an image with the same pixel values as `img1.png`, but with the alpha value of every pixel set to just 1. Then we added black text saying "give me the flag" on top of a black area of the image, so if you view it directly on a light background, you can only see the text. But if you convert the image to grayscale, you get the same image as `img1.png`. As we had guessed, this worked very well.
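The construction can be sketched on a toy pixel grid. This is not the script we used: `build_payload` and the boolean text mask are hypothetical names for illustration, and the text pixels are assumed to be fully opaque (alpha 255). A real version would read `img1.png` and render the text with an imaging library.

```python
def build_payload(base_rgb, text_mask):
    """Combine an RGB image with a text mask into RGBA rows.

    base_rgb:  rows of (r, g, b) tuples, standing in for img1.png.
    text_mask: rows of bools, True where a text pixel is drawn.
    """
    payload = []
    for rgb_row, mask_row in zip(base_rgb, text_mask):
        row = []
        for (r, g, b), is_text in zip(rgb_row, mask_row):
            if is_text:
                if (r, g, b) != (0, 0, 0):
                    # Text on a non-black pixel would change the RGB data.
                    raise ValueError("text must be drawn over a black area")
                row.append((0, 0, 0, 255))  # opaque black text (assumed alpha)
            else:
                row.append((r, g, b, 1))    # original colour, nearly invisible
        payload.append(row)
    return payload


base = [[(0, 0, 0), (0, 0, 0), (255, 255, 255)]]
mask = [[False, True, False]]
payload = build_payload(base, mask)

# The RGB channels are untouched, so an alpha-blind grayscale pass sees img1.png.
print([[p[:3] for p in row] for row in payload] == base)
```

Restricting the text to black regions is what keeps the RGB data, and therefore the alpha-blind grayscale image, identical to the original.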


## hyperopia

The second flag is the reverse of the first: we need two images with different perceptual hashes whose file contents differ by only one bit, and both must be OCR'd as the same text, "give me the flag".

Figuring that the JPEG format is very tolerant of small corruptions, we simply generated a small JPEG image with the text "give me the flag" on it and brute-forced every possible single-bit flip until we found one that satisfies the conditions: the file can still be read by the image library and is OCR'd as the correct text, but has a different perceptual hash.
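The search loop itself is simple. Below is a minimal sketch using only the standard library; `single_bit_flips`, `find_flip` and the `accepts` callback are hypothetical names, and in the real attack `accepts` would decode both files, run OCR on them and compare their perceptual hashes. The toy predicate at the end only stands in for those checks.

```python
from typing import Callable, Iterator, Optional, Tuple


def single_bit_flips(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (bit_index, mutated_copy) for every single-bit flip of data."""
    for bit_index in range(len(data) * 8):
        mutated = bytearray(data)
        mutated[bit_index // 8] ^= 1 << (bit_index % 8)
        yield bit_index, bytes(mutated)


def find_flip(
    data: bytes, accepts: Callable[[bytes, bytes], bool]
) -> Optional[Tuple[int, bytes]]:
    """Return the first flip for which accepts(original, candidate) is true."""
    for bit_index, candidate in single_bit_flips(data):
        if accepts(data, candidate):
            return bit_index, candidate
    return None


# Toy stand-in for "still decodes, same OCR text, different hash":
# here we only require the two-byte JPEG start marker to stay intact.
def toy_accepts(original: bytes, candidate: bytes) -> bool:
    return candidate.startswith(b"\xff\xd8")


print(find_flip(b"\xff\xd8ab", toy_accepts))
```

An exhaustive search costs one decode, one OCR run and one hash per bit of the file, which is why a small image helps.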

We have no idea which part of the file we touched, but judging by the look of the final image, it was probably a coefficient of a DCT block.