Tags: haskell code-golf np-complete 


Code Golf

Google CTF 2019


Acknowledgements

This writeup was written by Cole Kurashige.

I'd like to thank Kye Shi for his help designing a faster algorithm, and Giselle Serate for verifying the algorithm and actually running the code, as well as for showing me the CTF.

The problem

As a foreword/warning, this is a lengthy writeup. If you just want the TL;DR highlights, first read the problem statement if you aren't familiar with the problem. I think the most interesting highlights are the golfing tips, specifically this one, and the NP-Completeness proof.

Or just skim the headings and read whatever looks interesting. A lot of the length comes from headers, newlines, and code blocks; as far as the text goes, I've tried to edit things so they're to the point.

Problem statement

The problem could be found here as of 6/27/19. In case it gets moved or taken down, I've described it below.

Given a list of strings with gaps in them (denoted by a space), you're asked to combine them into a single string. Imagine all of the strings stacked on top of each other. You want to shift the strings so that at most one character occupies each position. You also want to minimize the length of the resulting string (ignoring trailing and leading gaps). The tie-breaker between solutions of the same length is lexicographic order. Your solution is a function g :: [String] -> String.

An example given is the strings "ha  m" and "ck e":

If you overlay them:

"ha  m"
"ck e"

Shift them by an appropriate offset:

"ha  m"
  "ck e"

You get the resulting string "hackme".
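
In terms of the required function, that looks like the following (assuming a finished, working g):

g ["ha  m", "ck e"] = "hackme"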

The catch to all of this is that it must be done in Haskell in 181 bytes or fewer.

Oh, also, it is NP-Complete.

Some background

I spent a long time on this problem. Its theme for me was "comfort's a killer." I really like Haskell, and have been using it recently. I really like code golf, and have golfed code in the past. I probably should've spent more time thinking through an algorithm, but I dove in headfirst. I knew there was a thread of tips for golfing in Haskell on the Code Golf Stack Exchange, but I neglected to use it.

And so, I spent an entire weekend on one problem. Welcome to my hell.

Lessons

I learned three important lessons from this endeavor.

Lesson #1

Code golf skills don't just magically manifest themselves in another language.

I primarily code golf in J and ><>. Haskell is neither of these. I used pretty much none of my existing code golf knowledge in golfing this challenge.

Lesson #2

Imports are useful.

I should've made more use of imported functions, especially those from Data.List. Our final solution used only two imports: join from Control.Monad and (\\) from Data.List. It can probably be reduced to just using Prelude, while still maintaining an acceptable byte count.

Lesson #3

Think before you write code.

My first algorithm was far from perfect, and even had a flaw in its logic. When this was fixed and it was golfed, we learned, much to my chagrin, that it used too much memory and was silently failing. Soooooo ... we had to come up with a new one from scratch. And then golf it again. Did I mention that I spent a lot of time on this problem?

The solution

A first pass

The first thing I did was write a half-golfed program.

The algorithm was essentially to find every possible way to shift each of the strings and overlay them. Some of these shifts would be invalid, so I discarded those.

I ignored the possibility of strings having trailing spaces, since that would just be cruel. Later solutions also ignored the possibility of strings having leading spaces. This turned out to be an OK assumption to make (though I wish it had been stated in the prompt).

There is a mistake in this code: I am only taking the lexicographically minimal solution, not the minimal-length solution. This is fixed in a later golfed version (the first correct code below), at the cost of many bytes.

import Data.List
import Data.Maybe
import Control.Monad

-- | the solution function
g :: [String] -> String
g xs = minimum . catMaybes $ [sequence . dropWhile (==Nothing) 
                             . map collapseW . transpose $ s | s <- shifts l xs]
  where
    l = sum $ map length xs

-- | find all possible ways to shift a string 's' with at most 'l'
-- characters of padding.
wordShifts :: Int -> String -> [String]
wordShifts l s = [space <> s | space <- spaces]
  where
    spaces = inits $ replicate l ' '

-- | find every possible way to shift a list of strings, where
-- each string can be shifted at most 'l' characters to the right
shifts :: Int -> [String] -> [[String]]
shifts l [] = [[]]
shifts l (w:ws) = do
  shifted <- wordShifts l w
  rest <- shifts l ws
  return $ shifted : rest

-- | try to collapse a string into a single character (think of this as reducing
-- a column to one character; it fails if the column has more than one non-space
-- character or no non-space characters at all)
collapseW :: String -> Maybe Char
collapseW s = do
  [y] <- foldMap convert s
  pure y
  where
    convert ' ' = Nothing
    convert c   = Just [c]

I golfed this down to 178 bytes.

f=foldMap
h ' '=[]
h c=[c]
k l s=[f<>s|f<-inits l]
z l[]=[[]]
z l(x:y)=[a:b|a<-k l x,b<-z l y]
g x=minimum[concat y|s<-transpose<$>z(f(>>" ")x)x,let y=f h<$>s,all((<=1).length)y]

The first correct code

At precisely 181 bytes, this was the first code that picked the correct solution, sorting the possible ones by length, then lexicographically.

h=length
f[]=" "
f l=l
z l[]=[[]]
z l(x:y)=[a:b|a<-[i<>x|i<-inits l],b<-z l y]
g x=snd$minimum[(h a,a)|s<-z((>>" ")=<<x)x,let y=concat.words<$>transpose s,all((<=1).h)y,let a=f=<<y]

The problem now was that the code was failing silently! Usually the server would tell you if you had an error (compilation, runtime, byte length), but this time it didn't respond and instead failed after about 25-30 seconds.

We tested with code that looped forever, which got cut off at exactly one minute, so after some more testing we concluded that our code was using too much memory and getting killed.

(We tested this with foldl (+) 0 [1..], which leaks memory and died after 20 or so seconds, and confirmed that memory was what caused the silent failure by running foldl' (+) 0 [1..], which runs in constant space and only timed out after a full minute. Ahhh, Haskell, where one character makes a huge difference.)
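
For reference, here's roughly what that one-character difference looks like (a sketch of the test, not our exact code):

import Data.List (foldl')

-- foldl builds a chain of unevaluated (+) thunks before forcing any of
-- them, so on an infinite list the heap grows until the process is killed
leaky :: Integer
leaky = foldl (+) 0 [1..]

-- foldl' forces the accumulator at every step, so it runs in constant
-- space and simply spins until the time limit instead
strict :: Integer
strict = foldl' (+) 0 [1..]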

More compact code

I further golfed the first correct solution to 151 bytes. I don't remember why I did this. It was golfed during the phase where we were trying to figure out why the server wouldn't accept our solution. Maybe I was trying to fit space stripping into the code, but didn't get around to it.

I added some whitespace to the last function g so it doesn't wrap so hard, but it isn't part of the byte count.

h=length
f""=" "
f l=l
g x=snd$minimum[(h a,a)|s<-forM[(>>" ")=<<x|_<-x]inits,
  let y=concat.words<$>transpose(zipWith(<>)s x),all((<=1).h)y,let a=f=<<y]

The first accepted code

The first correct code took somewhere between 2 and 4 hours of effort to reach. By midday Saturday, I had something that was morally right but didn't pass the tests. That evening, Giselle and I tracked the root of the problem down to space issues.

I complained to Kye about my solution not being good enough despite being short enough, and he devised an algorithm that was better (at least memory-wise) than my roughly O(n^n)-space algorithm. This was maybe around midnight on Saturday (which was 3 AM his time...).

He, however, did not golf it, so I still had to reduce his ~500 bytes to the below.

I spent an hour or two golfing his solution after I woke up on Sunday and brought it down to exactly 181 bytes.

m(a:b)(x:y)=max a x:m b y
m x y=x<>y
f s[]=[s]
f s r=join[f(m s x)$r\\[x]|x<-r,and$zipWith(\x y->x==' '||y==' ')s x]<>[h:y|(h:t)<-[s],y<-f t r]
g y=snd$minimum[(length x,x)|x<-f""y]

Yup, this code is a lot different. It ended up being a lot more inefficient than his original code, too, since I cut out all of his optimizations when I golfed it.

Kye ended up writing an even more efficient version, which thankfully I didn't need to golf.

I just wish we had known earlier that the "make it snappy" flavortext didn't mean to make the code short, but instead meant "efficient enough." I'm used to seeing time/space restrictions given upfront on the Code Golf Stack Exchange, where there can be answers that are correct by inspection but too inefficient to run on anything but the simplest of test cases.

Golf you a Haskell

Here are some of the more inventive or useful golfing transformations I used. Since I foolishly neglected to use any resources other than the documentation, I found all of these myself.

Infix your code!

Haskell has a lot of infix operators (operators like + or - that go between their arguments). This tendency is made fun of in this article (warning: somewhat NSFW language), which includes the following code that produces an infinite list of powers of 2.

import Data.Function (fix)
-- [1, 2, 4, 8, 16, ..]
fix$(<$>)<$>(:)<*>((<$>((:[])<$>))(=<<)<$>(*)<$>(*2))$1

The next few points are on how you can use these operators.

$

A simple transformation that I often use even in code read by people other than myself is $. $ is a function that just applies its left argument to its right argument.

f $ x = f x

It has really low precedence, though, so it can save you parentheses, as in

-- these are the same
gcd (1+2)  (3*4)
gcd (1+2) $ 3*4

map

<$> and map are the same for lists, but the former has lower precedence than function application, which lets you reap some of the benefits of $ while also not needing spaces between it and its arguments (unlike map).
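
For example, these two expressions are equivalent, but the second needs no parentheses and fewer spaces:

map succ (filter even [1..9]) = [3,5,7,9]
succ<$>filter even[1..9]      = [3,5,7,9]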

concatMap

I found myself using concatMap often. I first shortened it to foldMap, and then to the infix =<<. All three behave the same on lists.
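
For example, on lists all three agree:

concatMap words ["a b","c d"] = ["a","b","c","d"]
foldMap words ["a b","c d"]   = ["a","b","c","d"]
words =<< ["a b","c d"]       = ["a","b","c","d"]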

return

I used return to convert an element x to a singleton list. This became pure and then finally [x] once I realized I was being a moron.
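
All three build the same singleton list:

return 'x' :: String = "x"
pure 'x' :: String   = "x"
['x']                = "x"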

Filling holes

A tricky part of this problem was combining two strings with holes (spaces) in them to produce one where the holes are filled. As it turns out, every ASCII character below space (ASCII 32) is a non-printable control character, so I figured none of those would show up in the strings. Thus, given two chars in the same column, we can take their maximum to find the non-space char (if it exists).
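
For example, merging two rows (padded to equal length, since zipWith stops at the shorter one) column by column with max recovers the combined string:

zipWith max "ha  m " "  ck e" = "hackme"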

Cheeky pattern matching

I had code that I wanted to give a list if x pattern matched one thing and the empty list if it did not. Something that looked like

case x of
  (h:t) -> foo h t
  [] -> []

I converted this to

[foo h t|(h:t)<-[x]]

This abuses the fact that when the pattern match (h:t) fails in the context of a list comprehension, an empty list is returned instead of an error. This is a special case of how pattern matching is desugared inside of a do block.

N.B. foo has a different type between these examples.
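
You can see the behaviour directly with small examples:

[t | (_:t) <- ["abc"]] = ["bc"]
[t | (_:t) <- [""]]    = []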

Finding the possible shifts

Given strings :: [String], I wanted to find all of the ways these strings could be shifted. I eventually boiled this down to finding paddings :: [[String]], where each padding :: [String] in the list was the same length as strings and had a varying number of spaces in each element. So each padding, when combined element-wise with strings, would give a different shift.

A way of doing this would be

import Data.List (inits)

cartesianProduct :: [[a]] -> [[a]]
cartesianProduct []       = [[]]
cartesianProduct (xs:xss) = [x : ys | x <- xs, ys <- cartesianProduct xss]

paddings :: [String] -> [[String]]
paddings strings = cartesianProduct $ replicate (length strings) (inits maxPadding)
  where
    maxPadding = replicate (sum $ map length strings) ' '

Let's take a look at

maxPadding strings = replicate (sum $ map length strings) ' '

In order to make sure that I was shifting enough, I found a maximum padding that was equal to the sum of the lengths of the strings.

We can reframe this as converting each element in strings to a string that is the same length but only consisting of spaces, then concatenating all of these elements together.

maxPadding strings = concatMap (\str -> replicate (length str) ' ') strings

replicate . length is way too long, so let's replace it with (>> " ").
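
This works because, in the list monad, xs >> ys repeats ys once for every element of xs:

"abc" >> " "   = "   "
[1,2] >> [7,8] = [7,8,7,8]

So str >> " " is a string of spaces with the same length as str, which is exactly replicate (length str) ' '. With that, the definition becomes: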

maxPadding strings = concatMap (\str -> str >> " ") strings

Eta reduce and obfuscate concatMap to give

maxPadding strings = (>> " ") =<< strings

Much better.

This is only the max padding for a single element though. I wanted to find paddings. inits :: [a] -> [[a]] will get us part of the way there, since it will give all possible prepended spaces, from 0 to maxPadding.

inits [1,2,3] = [[],[1],[1,2],[1,2,3]]
inits (maxPadding ["a","bc","d"]) = ["", " ", "  ", "   ", "    "]

We then want the cartesian product of inits maxPadding repeated length strings times. cartesianProduct is a long definition, so why don't we use the list Monad some more?

paddings strings = sequence (replicate (length strings) (inits maxPadding))
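
In the list monad, sequence is exactly the n-ary cartesian product:

sequence [[1,2],[3,4]] = [[1,3],[1,4],[2,3],[2,4]]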

sequence is the same as \x -> forM x id or \x -> mapM id x, so we can convert to

paddings strings = forM (replicate (length strings) (inits maxPadding)) id

We want to be applying inits to every element anyway, so we can pull it out.

paddings strings = forM (replicate (length strings) maxPadding) inits

Then get rid of this replicate nonsense by using a list comprehension that ignores all of the values of strings.

paddings strings = forM [maxPadding | _ <- strings ] inits

Substitute the definition of maxPadding and we're done.

paddings strings = forM[ (>> " ") =<< strings | _ <- strings] inits
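
As a sanity check (any of the definitions above, with inits from Data.List and forM from Control.Monad in scope), a tiny input gives all nine combinations of left-paddings:

paddings ["a","b"] =
  [["",""],[""," "],["","  "],[" ",""],[" "," "],[" ","  "],["  ",""],["  "," "],["  ","  "]]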

354 bytes of (reasonably) readable code down to 67 bytes of nonalphanumeric soup.

Don't you love code golfing?

NP-Completeness

On Saturday evening, I was banging my head against a wall trying to optimize the space and time complexity of my algorithm. But every time I tried to think through a faster algorithm, something felt wrong. I had a feeling that the problem was NP-Complete, and so the only thing I could realistically optimize was space. I eventually sat down and proved it NP-Complete.

Proof

The proof is by reduction from the Bin Packing Problem (BPP). This is a pretty rigorous proof, but I tried to make it understandable. I think the Reduction, visualized section does a pretty good job of building intuition for how the proof works, but if you haven't seen a reduction before, this all might seem kind of obtuse.

Bin Packing Problem (decision version)

The BPP asks: given m bins of size V and a set S of elements with an associated cost function f : S -> N (where N is the natural numbers), can S be partitioned into at most m subsets, each of whose total cost is less than or equal to V? In plain English: given m bins, each with capacity V, can we pack every element of S into the bins without exceeding any bin's capacity?

Crypto Problem (decision version)

First, I will state the decision variant of the Crypto Problem (CP). Given a "set" T of strings that have holes in them represented by spaces and a target length l, the Crypto Problem asks whether all elements of T can be overlaid such that there is at most one non-hole per column and the length of the overall result (after removing trailing and leading spaces) is l or less.

(this "set" may have duplicates - I don't want to be as rigorous as in the BPP)

Reduction

We can construct a CP from an arbitrary BPP as follows.

We first will construct the set T. Create m strings, each of length 3V + 2. The first and last V+1 characters of these strings are #, and the rest (the center) are spaces (holes). Character choice doesn't matter here, so I pick an arbitrary one. These strings will represent our bins, and I will refer to them as "bin strings".

For each element s in the set S from the BPP, we want to make an analogous element for the CP. This element will be f(s) characters long, and consist only of the character *. Again, character choice doesn't matter here. I'll refer to these as "* strings".

Now we pick the target length: l = (3V + 2) * m.

This reduction clearly takes polynomial time, as it only involves iterating over the elements of S and over the bins.
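
For concreteness, here is a small Haskell sketch of the construction (reduceBPP is my own name for it, not part of the challenge):

-- build the CP instance from m bins of capacity v and the element costs
reduceBPP :: Int -> Int -> [Int] -> ([String], Int)
reduceBPP m v costs = (binStrings ++ starStrings, targetLength)
  where
    wall         = replicate (v + 1) '#'            -- V+1 solid characters
    binString    = wall ++ replicate v ' ' ++ wall  -- length 3V+2, V holes in the middle
    binStrings   = replicate m binString
    starStrings  = [replicate c '*' | c <- costs]   -- one * string per element of S
    targetLength = (3 * v + 2) * m

On the example in the next section, reduceBPP 3 3 [1,1,2,2] produces three copies of "####   ####", the strings "*", "*", "**", and "**", and target length 33.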

We claim that the given BPP is solvable if and only if this constructed CP is solvable.

Reduction, visualized

Let's consider a BPP with V = 3, m = 3, and elements of size 1, 1, 2, and 2.

The bins:

# #  # #  # #
# #  # #  # #
# #  # #  # #
###  ###  ###

The elements:

*  *  *  *
      *  *

A valid placement of these is putting a 1 in its own bin, a 2 in its own bin, and a 1 and 2 in the last bin. Another valid placement would be to fill two bins entirely and leave one empty.

# #  # #  #*#
# #  #*#  #*#
#*#  #*#  #*#
###  ###  ###

The equivalent problem in CP is the following strings:

####   #### (x3)
* (x2)
** (x2)

And an equivalent solution is the following:

####   ####
    *      
           ####   ####
               **
                      ####   ####
                          *
                           **

which, when flattened, looks like

####*  ########** ########***####

Note how the length is (3V + 2) * m = 33.

Reduction proof (forward direction)

Given a solution to the BPP, we can make a solution to the CP. Suppose in the BPP solution that some arbitrary bin B was filled to a volume V' <= V using the elements in the set S'. This implies that

sum(f(s') for s' in S') = V' <= V

Since we created a bijection from the elements of S in the BPP to the * strings of T in the CP, find the elements of T corresponding to S' and put them in the set T'. We claim that the elements of the set T' can be made to fit into a single bin string.

Why is this? Well, the sum of the lengths of these strings is at most V, which is exactly the number of holes in the middle of each bin string. Therefore, we can just concatenate all of these strings together and place them in the center of the bin string.

Since we can do this for an arbitrary bin, and the bins in the BPP solution together contain every element of S, we can fit every * string inside some bin string. This means that there exists a solution in which we place all of the * strings inside of the bin strings. The overall length of this solution is just the length of all the bin strings combined, which is (3V + 2) * m, as desired.

Reduction proof (backwards direction)

We'll prove the contrapositive of the backwards direction: if there doesn't exist a solution to the BPP, we cannot make a solution to the CP.

First, it should be reasonably obvious from the previous direction that the strategy of placing the * strings in the bin strings won't work, since the BPP is unsolvable. If we could place them in the bins, then that would imply that the BPP was solvable, which contradicts the premise.

So we would have to find a solution to the CP that doesn't involve just putting the * strings inside of the bin strings. Since our target length is (3V + 2) * m, we need to fill all of the holes in the bin strings. If we aren't just putting * strings inside of bin strings, this means we would have to have bin strings overlap each other. This is impossible: the V-character hole in the center of each bin string is flanked by V + 1 #s on either side, so any overlap of two bin strings puts two #s in the same column.

Visually, for V = 3, we get alignments that look like this.

top bin    | ####   ####  | ####   ####   | ####   ####    | ####   ####     | ...
bottom bin |  ####   #### |   ####   #### |    ####   #### |     ####   #### | ...

At every offset, the two bin strings put a # in the same column, which is not allowed in an answer.

Therefore, it is impossible to have a solution to the CP.

Conclusion

The CP is in NP: a list of offsets, one per string, is a certificate that can be checked in polynomial time. Since we also showed a polynomial-time reduction from the BPP to the CP such that the constructed CP is solvable if and only if the corresponding BPP is, we conclude that the Crypto Problem is NP-Complete.

Original writeup (https://github.com/doublestandardctf/GoogleCTF2019/blob/master/Code-Golf.md).