Tags: misc context-free-grammar 

Rating: 3.0

## Grammar Nazi

> The flag is in format `dctf{7_4_2_3_7_4}` where numbers represent number of characters between underscores. For clarification: you will get the whole flag as a result, including the `dctf{}` part.

### Files

### What's going on?

We have `cfg.zip`, which contains 2134 folders. An example of the structure is shown below:


- Some folders contain only subfolders. Subfolders name is not random: subfolders have the same name of a folder from the upper layer.
- Some folders contain only `<letter>.txt` files.
- Some folders contain both `txt` files and subfolders.

Between challenge tags, we saw `context-free-grammar`. In order to make the solution more understandable, we briefly introduce what a context free grammar is.

### Context free grammar
A context free grammar is a formal grammar which is used to generate all possible strings in a given formal language. Sentences are not influenced by context, but only by a set of rules, which define the syntax and the structure of sentences in these languages.

Context free grammars have the following components:

- A set of **terminal symbols** which are the characters that appear in the language/strings generated by the grammar (e.g. characters, words etc). Terminal symbols never appear on the left-hand side of the production rule and are **always** on the right-hand side.

- A set of **non-terminal** symbols (or variables), which are placeholders for patterns of terminal symbols that can be generated by the non-terminal symbols. These are the symbols that will always appear on the left-hand side of the production rules, though they can be included on the right-hand side. The strings that a CFG produces will contain only symbols from the set of nonterminal symbols.

- A **set of production rules** which are the rules for replacing nonterminal symbols. Production rules have the following form: $variable \rightarrow string$ (of variables and terminals).

- A **start symbol** $S$, which is a special non-terminal symbol that appears in the initial string generated by the grammar.

To create a string from a context-free grammar, we follow these steps:

- Begin the string with a start symbol $S$.

- Apply one of the production rules to the start symbol on the left-hand side, by replacing the start symbol with the right-hand side of the production.

- Repeat the process of selecting nonterminal symbols in the string, and replacing them with the right-hand side of some corresponding production, _until all non-terminals have been replaced by terminal symbols_. Note, *it could be that not all production rules are used*.

#### Example
Given the following rules
S --> nounPhrase verbPhrase
nounPhrase --> adj noun
verbPhrase --> verb nounPhrase
adj --> the
noun --> monkey | banana
verb --> ate

We consider:

- Terminal symbols: `the`, `monkey`, `banana`, `ate`
- Non terminal symbols: `nounPhrase`, `verbPhrase`, `adj`, `noun`, `verb`
- Start symbol: `S`

We substitute in this way:
S --> adj noun verb nounPhrase
--> adj noun verb adj noun
--> the <monkey | banana> ate the <monkey | banana>
--> the monkey ate the banana
### Solution

We found out folders were non-terminal symbols, and files with letters and numbers with `.txt` inside some folders were terminal symbols. We understood we had to replace all the folders with their content. This action could be recursive in case of more subdirectories in directory.

First, we got the `cfg` folder tree:

tree . > tree.txt

Applying what we understand from Context Free Grammar:

- Start symbol is the `S` folder.
- Terminal symbols are `.txt` files, which will compose our flag.
- Non-terminal symbols are folders and subfolders.
- Rules consist in substituting all folders with their content, recursively.


- We can see `S` contains seven times `ZC` folders
- Flag format is `dctf{7_4_2_3_74}`, which has seven symbols (`{` `}` and five `_`)
- `ZC` contains `AMT/`, `HC/`, `XL/`

- `AMT -> }.txt` (the last `ZC` is `}`)
- `HC -> {.txt` (the first `ZC` is `{`)
- `XL -> _.txt` (the middle `ZC` are `_`)

- `CZ` will be `dctf` (warning: the characters are in the wrong order!)

And so on.

We just `Ctrl + F` on `tree.txt` file, ordered the characters, and got the flag:

> `dctf{c0nt3xt_fr33_15_n07_m34n1ng_fr33}`

### Learning sources

Thanks to the authors of these sources, who helped us understand better how context free grammar works:
- [https://brilliant.org/wiki/context-free-grammars/](https://brilliant.org/wiki/context-free-grammars/)
- [https://www.freecodecamp.org/news/context-free-grammar/](https://www.freecodecamp.org/news/context-free-grammar/)
- [https://shiffman.net/a2z/cfg/](https://shiffman.net/a2z/cfg/)

Original writeup (https://wiki-ulisse.fuo.fi/en/CTFs/Dragonsec-2022/Grammar-Nazi).