4CAT: Capture and Analysis Toolkit is a tool for scraping incoming posts from various forums, image boards and web platforms (including Reddit, Telegram, and 4chan) and processing them for further analysis.
It lets users query a corpus of posts selectively by keyword, date range or other criteria, and then output the results. The tool was inspired by TCAT, a tool with comparable functionality for scraping and analysing Twitter data.
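To make the idea of selective querying concrete, the following is a minimal Python sketch of filtering a small corpus by keyword and date range. It does not use 4CAT's own code or API. The function name `filter_posts`, the record fields `id`, `body` and `timestamp`, and the sample posts are all hypothetical and chosen only for illustration.

```python
from datetime import datetime


def filter_posts(posts, keyword=None, start=None, end=None):
    """Return posts matching an optional keyword and an inclusive date range.

    Hypothetical helper for illustration; not part of 4CAT.
    """
    matches = []
    for post in posts:
        # Case-insensitive substring match on the post text.
        if keyword is not None and keyword.lower() not in post["body"].lower():
            continue
        # Drop posts that fall outside the requested date range.
        if start is not None and post["timestamp"] < start:
            continue
        if end is not None and post["timestamp"] > end:
            continue
        matches.append(post)
    return matches


# Hypothetical sample corpus (assumed record layout).
posts = [
    {"id": 1, "body": "Discussion about climate policy", "timestamp": datetime(2021, 3, 1)},
    {"id": 2, "body": "Unrelated thread", "timestamp": datetime(2021, 3, 5)},
    {"id": 3, "body": "More on CLIMATE models", "timestamp": datetime(2021, 6, 10)},
]

selected = filter_posts(
    posts,
    keyword="climate",
    start=datetime(2021, 1, 1),
    end=datetime(2021, 4, 1),
)
print([post["id"] for post in selected])  # prints [1]
```

Only the first post is selected here: the second lacks the keyword and the third falls outside the date range.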
Currently, access to DMI's 4CAT instance is restricted. Collaborators can request an account via the instance. It is also possible (and quite straightforward) to install 4CAT yourself with Docker. The GitHub repository has more instructions. We also have a playlist of video tutorials.