Tags: python regex

Rating:

# Fall Classes
Author: syyntax
Points: 100
Category: SQL

## Problem description

Without counting duplicates, how many courses are being offered in the FALL2020
term at Shallow Grave University? \
Submit the flag in the following format:
flag{#}

Use the [file](https://tinyurl.com/yxv5qbla) from Address Book.

Max attempts: 10

## Concepts
* [Regular expression](https://en.wikipedia.org/wiki/Regular_expression)
* [Python3 regex](https://docs.python.org/3/howto/regex.html)

## Solution

We are given a database dump containing the database schema and instances.
The challenge category suggests solving the task using SQL, but I thought it would
be pretty straightforward to use regex on the database dump to get the number of courses
offered in the FALL2020 semester.

Examining the database dump (with line numbers), we see that courses and terms are stored in the
term_courses table:


303 CREATE TABLE term_courses (
304 term_crs_id int NOT NULL AUTO_INCREMENT,
305 course_id int NOT NULL,
306 term_id int NOT NULL,
307 instructor int NOT NULL,
(...)


The instances of this table are in the format:

324 INSERT INTO term_courses VALUES (1,6547,1,130),(2,6804,1,491),(3,6233,2,171)
(...)


To find courses offered in the FALL2020 term, we need to find out the term_id
of FALL2020.


\$ grep "FALL2020" shallowgraveu.sql
INSERT INTO terms VALUES (1,'SPRING2020','2020-04-06','2020-07-20','Spring semester 2020'),
(2,'FALL2020','2020-08-03','2020-11-20','Fall semester 2020');


So FALL2020 has term_id=2.

Now we can make a regex that matches against the instances of term_courses added
in line 324.

As a reminder, term_courses instances are in the format: \
(term_crs_id,course_id,term_id,instructor)

Therefore, the regex should match the following format: \
(num,num,2,num)

However, we should only extract pairs matching course_id,term_id (for this
purpose we will add a capture group to the regex).

We can use python to solve the challenge. Python has the re library for regex.
The re.findall function will be useful:


>>> import re
>>> help(re.findall)
Help on function findall in module re:

findall(pattern, string, flags=0)
Return a list of all non-overlapping matches in the string.

If one or more capturing groups are present in the pattern, return
a list of groups (...)


The following python code will help us to find the flag:

First, we import the re library, and read database dump line 324 (containing all
term_courses instances) into a variable, the_line:

>>> import re
>>>
>>> the_line=""
>>> with open("../shallowgraveu.sql") as f:
... for i, line in enumerate(f):
... if i == 323: # i is 0-indexed so we start at line 0 instead of 1
... the_line = line
... break


Next, we can use re.findall to match our regex against the_line.
Due to embedding course_id and term_id in parentheses in the regex (thus using a capture group),
the match variable will be a list, containing all matching 'course_id,term_id' pairs.

>>> match = re.findall("$$[0-9]{1,4},([0-9]{1,4},2),[0-9]{1,4}$$", the_line)
>>> print(match)
['6233,2', '6468,2', '6157,2', '6469,2', '6187,2', '5936,2', '6243,2', (...), ]


To remove duplicate pairs in the match variable, we transform it from a list to a set,
before printing the number of matching 'course_id,term_id' instances.


>>> len(set(match))
401


So now we know that 401 courses were offered in the FALL2020 semester.

Thus, according to the flag format, given in the description, the flag is \
flag{401}.

Original writeup (https://github.com/jeanettesa/ctf-writeups/blob/master/2020/HacktoberCTF/fall_classes/fall_classes.md).