Rating: 5.0

# MetaCTF 2021
## Online Presence (550pts)
### Description:
>Sources say Vedder may have used Reddit the past two weeks. It would make sense for him to post some comments on some major subreddits related to his main career interests or near his local area. Can you help find his account?
The answer format is the username followed by the post ID and comment ID in the URL to their shortest comment, underscore delimited. For example, for [this comment](https://www.reddit.com/r/AskReddit/comments/r5mmdj/comment/hmnszk4/), the answer would be MetaCTF{Specialistimran_r5mmdj_hmnszk4}.
Hint: There's a logical way to solve this challenge that doesn't require much time. There are two major (both have 500k+ subscribers) work-related subreddits.

### Step 1 - Baseline OSINT:
We need to dig up some information on Vedder himself, his interests, and his career to begin to piece together where he could be on Reddit.
By doing some quick research we can find a Twitter account and a personal website:
>https://veddercasyn.me/

>https://twitter.com/veddercasyn

Knowing he's in Hammond, IN, who's closest metro area is Chicago, and his interests in fitness and nutrition, we can begin to piece together some places he might be.

### Step 2 - Data Collection:
We built out a list of the most common subreddits using this site, sort by 'SFW':
>https://frontpagemetrics.com/top-sfw-subreddits

We figured in a challenge like this, Vedder would not be posting in NSFW subreddits and his presence would be relatively clean.
We filtered the subreddits that had more than 500k subscribers, but kept some that were very pertinent, like /r/Chicago.

With a list of the subreddits, we could now either triangulate him manually... which would take forever, or use a nifty tool called [Google Colaboratory](https://research.google.com/colaboratory/) for doing 'Big Data' queries. (Not such big data in this case)

### Step 3 - Triangulation with Google Colab
First, we need to get the Reddit API Wrapper set up with Google Colab to crawl these subreddits. Set up a Reddit account and fill in the API:
```python
pip install praw
---------------------------------------------------------------
import praw

reddit = praw.Reddit(
user_agent="Comment Extraction (by u/AccountYouSetUpToFindVedder)",
client_id="x",
client_secret="y",
username="AccountYouSetUpToFindVedder",
password="PasswordYouSetUpToFindVedder",
check_for_async=False
)
```
Then we have can gather submissions from subreddits and store usernames into an array. We began by collecting usernames from the last 150 fitness posts:
```python
submissions = reddit.subreddit("fitness").new(limit=150)
cusernames1 = []

i = 1
for submission in submissions:
redditor1 = submission.author
print("~~~~~~~~~~~~~~~Submission " + str(i) + " ID= " + submission.id + " Title: " + submission.title + "~~~~~~~~~~~~~~~")
print("Posted by:" + str(redditor1.name))
submission.comments.replace_more(limit=None)
i=i+1
for comment in submission.comments.list():
cusernames1.append(comment.author)
print("ID: " + comment.id + "; " + str(comment.author) + ": " + comment.body)
print("\n")
```

The code block above was then repurposed for each subreddit we wanted to search, changing out the `cusernames1` variable with a new one to separate our lists of posters.
After running it on a few different subreddits, we could determine who were the common posters between two subreddits using the following codeblock.

```
cu1set = set(cusernames1)
set(cuset2).intersection(cu1set)
```

This is basically all of the code required, we just need to try different combinations of subreddits and find the users that are posting comments in each one. Store the users in different arrays and intersect those sets to filter it down.

See the entire script that was ran (with comments along the way!) over on [Google Colab](https://colab.research.google.com/drive/1IhywVzP5E_ifFLWg_WkgkmqhrKD-LyVp?usp=sharing).

### Step 4 - Some Hits!
When we were filtering between /r/bodybuilding, /r/fitness, and other health-related subreddits, there were tons of users in the list to go through. Who would've thought people who comment in one subreddit comment in another related one?
We decided it needed to be filtered down to the metro area, since we were getting very few users who posted in a city and a health subreddit.
Eventually, after intersecting /r/fitness and /r/Indiana, we had only two users to look at:
>{Redditor(name='Dptwin'), None, Redditor(name='dtdnumberone')}

Dptwin questionable content, but dtdnumberone was enticing. It's reddit birthday was Nov 30, 5 days before this challenge. It had very few posts, and all were within fitness, indiana, chicago, chicagosuburbs, and related subreddits.
We were confident this was Vedder, and now just had to fulfill the flag formatting.

### Step 5 - Generate the Flag
The flag was stipulated to be the username, then post ID, then comment ID of his shortest comment.
That comment would be:



![image](https://user-images.githubusercontent.com/43623870/144759695-a6a5c992-75d0-45b4-abf9-0c08008a854f.png)

Going to the exact comment URL gives us the following information:
```
Username: dtdnumberone
Post ID: r4rant
Comment ID: hmqpa52
```
Therefore our flag is
<details>
<summary>Flag Spoiler:</summary>

MetaCTF{dtdnumberone_r4rant_hmqpa52}
</details>

## Learning Takeaways
We learned how to use the Reddit PRAW API for crawling comments and subreddits, and how to filter that data. Correlative logic skills were improved when thinking about where he would be posting.

Original writeup (https://github.com/team23ctf/writeups/blob/main/metactf2021/Online%20Presence/Online%20Presence.md).