Tags: python regex
Rating:
Author: syyntax Points: 100 Category: SQL
Without counting duplicates, how many courses are being offered in the FALL2020
term at Shallow Grave University?
Submit the flag in the following format:
flag{#}
Use the file from Address Book.
Max attempts: 10
We are given a database dump containing the database schema and instances. The challenge category suggests solving the task using SQL, but I thought it would be pretty straightforward to use regex on the database dump to get the number of courses offered in the FALL2020 semester.
Examining the database dump (with line numbers), we see that courses and terms are stored in the
term_courses
table:
303 CREATE TABLE `term_courses` (
304 `term_crs_id` int NOT NULL AUTO_INCREMENT,
305 `course_id` int NOT NULL,
306 `term_id` int NOT NULL,
307 `instructor` int NOT NULL,
(...)
The instances of this table are in the format:
324 INSERT INTO `term_courses` VALUES (1,6547,1,130),(2,6804,1,491),(3,6233,2,171)
(...)
To find courses offered in the FALL2020 term, we need to find out the term_id
of FALL2020.
$ grep "FALL2020" shallowgraveu.sql
INSERT INTO `terms` VALUES (1,'SPRING2020','2020-04-06','2020-07-20','Spring semester 2020'),
(2,'FALL2020','2020-08-03','2020-11-20','Fall semester 2020');
So FALL2020 has term_id=2
.
Now we can make a regex that matches against the instances of term_courses
added
in line 324.
As a reminder, term_courses
instances are in the format:
(term_crs_id,course_id,term_id,instructor)
Therefore, the regex should match the following format:
(num,num,2,num)
However, we should only extract pairs matching course_id,term_id
(for this
purpose we will add a capture group to the regex).
We can use python to solve the challenge. Python has the re
library for regex.
The re.findall
function will be useful:
>>> import re
>>> help(re.findall)
Help on function findall in module re:
findall(pattern, string, flags=0)
Return a list of all non-overlapping matches in the string.
If one or more capturing groups are present in the pattern, return
a list of groups (...)
The following python code will help us to find the flag:
First, we import the re
library, and read database dump line 324 (containing all
term_courses instances) into a variable, the_line
:
>>> import re
>>>
>>> the_line=""
>>> with open("../shallowgraveu.sql") as f:
... for i, line in enumerate(f):
... if i == 323: # i is 0-indexed so we start at line 0 instead of 1
... the_line = line
... break
Next, we can use re.findall
to match our regex against the_line
.
Due to embedding course_id and term_id in parentheses in the regex (thus using a capture group),
the match variable will be a list, containing all matching 'course_id,term_id'
pairs.
>>> match = re.findall("\([0-9]{1,4},([0-9]{1,4},2),[0-9]{1,4}\)", the_line)
>>> print(match)
['6233,2', '6468,2', '6157,2', '6469,2', '6187,2', '5936,2', '6243,2', (...), ]
To remove duplicate pairs in the match
variable, we transform it from a list to a set,
before printing the number of matching 'course_id,term_id'
instances.
>>> len(set(match))
401
So now we know that 401 courses were offered in the FALL2020 semester.
Thus, according to the flag format, given in the description, the flag is
flag{401}
.