Tags: linear_algebra 

Rating:

```
#################################################################################
## Problem statement
#################################################################################
# Fishy is trying to hack into a secret government website again!!!
#
# They learned from fishy's attempts last year, so they created a password of
# 10^5 integers, and each integer x in the password satisfies 0 <= x <= 2^31.
# However, this password was too long for them to check,
# so they made up a method that they hoped was quicker.
# They included 10^5 tests that the password checker will do.
#
# In each test, the computer checks from some left bound L to some right bound R,
# where 1 <= L <= R <= 10^5. The computer takes the xor of each value in the
# range L to R inclusive of the password and checks if that equals the expected
# value V. Fishy has found the the values for L, R, V for each test, but he
# needs your help to find a valid password. Can you help him?
#
# You will be given the L, R, and V values for each test,
#################################################################################

#################################################################################
# insights about problem
#################################################################################
#
# 1. taking the xor of two 1-bit numbers is the same as addition modulo 2
# 2. Taking the xor of two 31-bit numbers is like adding two 31-dimensional
# vectors modulo 2
# 3. The test the computer does is, thus, equivalent to a matrix multiply
# modulo 2 (over the field Z_2, if you're into that sort of thing)
# 4. The "password" is a (10^5)x31 (rowsxcolumns) matrix over Z_2
# (the field of integers modulo 2). Lets call this "P"
# 5. The L and R parts of the checks are a weird sparse representation of
# a (10^5)x(10^5) matrix over Z_2 -- let's call this "C" for "checks"
# 6. The V parts of the checks are another (10^5)x31 matrix over Z_2
# We'll re-use the letter V for this, pardon the confusion, and call it "V"
# 7. So our goal is to come up with some P (not neccessarily unique) satisfying
# V = C * P

#################################################################################
# Solution high-level overview
#################################################################################
# So, we ended up hand-writing something that smells like a super-custom
# implementation of Gauss-Jordan-Elimination, except over Z_2 and exploiting the
# sparse representation of the matrix C
#
# The solution first "subtracts" rows of the matrix equation from one another
# to make an equivalent matrix equation where each column of the revised C
# has exactly one row that starts in that column.
# This is analogous to the "make upper triangular" step of solving a matrix
# equation through GJE
#
# This can plausibly produce invalid rows if the matrix C wasn't invertible
# to begin with. This is fine. We just ignore those rows.
#
# Then it scans through the checks of C from the check whose start position is
# greatest to that whose start position is least.
# For each check, it satisfies that check by setting the value of the solution vector
# at the (unique) start position of the check (recall that at most one check starts at
# this position), given the solutions already set (recall, also, that any solution entry
# at any position past the start of the current check has already been set).
#
# Satisfying any check given all solutions that start later just entails
# xor-ing together all solutions whose position intersects the range of the check,
# then xor-ing in the value required by the check.
#
# This is, if you squint at it, analogous to the GJE step of solving
# a matrix equation with an upper triangular matrix.
#
# The last step uses a mildly clever acceleration trick of keeping an array of
# running xors of all solutions computed up to each point. This makes it fast
# to find the xor of all solutions over a range (provided the range is contained
# in the set of solutions we've finalized so far)

#################################################################################
# Actual implementation
#################################################################################

# When we first started working on this problem we extracted sample outputs to text files
# in an easily python-readable format. This saves whoever is writing the solver from having
# to worry about the parsing. The following three lines are that.
filename = "input2.txt"
# filename = "expohash.txt"
entries = eval(open(filename).readlines()[0])
# Feel free to replace them with whatever you use to read the data in and parse it.
# Result should be a python list of no more than 10^5 python lists of three elements,
# [L, R, V]

#################################################################################
# Compute equivalent modified set of checks where each check starts at a unique
# location
#################################################################################

# So, a normal person would first sort the checks by start point, and then
# perform all the transformations. We didn't do that.
# we, instead, keep an "indices" array where the "ith" item
# either points to a check whose range starts at i, or to -1
# if we haven't encountered such a check yet
#
# One clever trick we used to get around the fact that the problem
# uses 1-based indexing : we allocate our "indices" array to be 1 larger
# than it has to be, and then just ignore the entry at index 0
#
# This saves us from doing the kind of math we don't want to have to do.
indices = [-1 for i in range(len(entries) + 1)]

# I apologize for this function. It is far too general
# When we first started down this solution path, I thought we might
# need to also perform a "make sure each end is unique" step.

def handle_start_or_end(idx, s_or_e=0):
""" Given the index (idx) of a constraint, searches in indices
to see if we've already encountered a constraint with its start pos.
If yes, replace (in situ) the constraint with one produced by xoring
the constraint indexed by idx with the colliding constraint and start again,
if no, update the indices entry at the "start" value of the constraint at idx
and return.
Returns : the number of such merges we had to do.
"""
s = entries[idx][s_or_e]
merge_count = 0
entry = entries[idx]
# Checks like this enable us to short-circuit invalid constraints,
# which can be created when, e.g., merging two duplicate checks
if entry[0] > entry[1]:
return 0
while indices[s] >= 0: # while there's something else that starts where entries[idx] does
entry = entries[idx]
if entry[0] > entry[1]:
# print("Eliminated row " + str(idx))
return merge_count
alt = entries[indices[s]]
if alt[0] > alt[1]:
indices[s] = idx
return merge_count
merged = [min(entry[1 - s_or_e], alt[1 - s_or_e]) + 1,
max(entry[1 - s_or_e], alt[1 - s_or_e]), entry[2] ^ alt[2]]
# The above line is the merge magic. It's equivalent to adding two rows of the
# matrix equation modulo 2. The unique span structure of our constraints
# make it easy to compute that xor and keep it in terms of start and end.
entries[idx] = merged
s = merged[0]
merge_count += 1
if merged[0] > merged[1]:
return merge_count # completely eliminated row, no need to keep merging
indices[s] = idx
return merge_count

print("ingested raw data")

merge_sum = 0
indices = [-1 for i in range(len(entries) + 1)]
for i in range(len(entries)): # for each row
# xor it with existing rows until its start is unique so far
merge_sum += handle_start_or_end(i, s_or_e=0)
# Recall that while this loop looks linear, it may not be
# handle_start_or_end contains an internal while loop
# It seems to be fast in practice though.
# I haven't timed which part of this code takes the longest.

# printing cruft
print("We did " + str(merge_sum) + " merges")
remaining_entries = [e for e in entries if e[0] <= e[1]]
print("There are " + str(len(remaining_entries)) + ' entries left')
len_sum = sum([e[1] for e in remaining_entries]) - sum([e[0] for e in remaining_entries])
print("Average cover per num is " + str(len_sum/len(entries)))
print("Average constraint length is " + str(len_sum/len(remaining_entries)))

#################################################################################
# Now compute solutions from modified checks
#################################################################################

# recall that "indices[i]" already contains the index of the unique check starting
# at each location i, or it contains a -1 if no such check exists

# Initialize solutions to 0. Use the same trick of making the array too big
# so we can pretend we're using 1-based indexing
solutions = [0 for i in range(len(entries) + 1)]

# This is our speedup trick.
# xors[i] contains the xor of all solutions[j] for j >= i
# This allows us to quickly compute the xor of all solutions
# in a span
xors = [0 for i in range(len(entries) + 2)]

# and this is our running xor that we use to update the next entry in xors
running_xor = 0

# we compute from last to first, reverse order
for i in range(len(indices)-1, 0, -1):
idx = indices[i]
if idx < 0: # if no check starts at i, continue
xors[i] = running_xor
continue
entry = entries[idx] # grab the unique (valid) check whose start is i
span_xor = running_xor ^ xors[entry[1] + 1] # quickly compute the xor over the span
sol = entry[2] ^ span_xor # and, since none of the filled-in solutoins depend on i
# we can set that to whatever we want
solutions[i] = sol
running_xor = running_xor ^ sol # then update the running xor
xors[i] = running_xor # then update the list of running xors

#################################################################################
# Now we run some quick tests just to make sure our solution is valid
# Because, of course, we were developing this separately before hooking it up
print("Now to test solutions")
import random

ntests = 5

for iter in range(ntests):
print("On test " + str(iter) + " of " + str(ntests))
idx = random.randrange(len(entries))
entry = entries[idx]
running_xor = 0
for i in range(entry[0], entry[1]+1): # for the test we compute runnin xors by hand
running_xor = running_xor ^ solutions[i]
print("entry is " + str(entry) + ", sol xor is " + str(running_xor))
# then print out running xor and check
# Inspect visually

# $ time python3 process.py
# ingested raw data
# We did 334067 merges
# There are 83776 entries left
# Average cover per num is 21101.08604
# Average constraint length is 25187.507209702064
# Now to test solutions
# On test 0 of 5
# entry is [99980, 99979, 0], sol xor is 0
# On test 1 of 5
# entry is [90458, 93028, 1342293332], sol xor is 1342293332
# On test 2 of 5
# entry is [36810, 58348, 406642781], sol xor is 406642781
# On test 3 of 5
# entry is [24777, 51769, 184547010], sol xor is 184547010
# On test 4 of 5
# entry is [84267, 96086, 286881875], sol xor is 286881875

# real 0m1.086s
# user 0m0.986s
# sys 0m0.100s

```

Original writeup (https://gist.github.com/mds2/c45f2f47b8f866d16496fc32eb09d5e2).