Dataset Notes

2021-07-26


The Upworthy Research Archive

The Upworthy Research Archive is an open dataset of thousands of headline A/B tests that Upworthy ran from January 2013 to April 2015. As of August 2021, the archive is public and is intended to advance knowledge in political science, communication, and other fields. It can also support research on large-scale online experiments.
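To make "headline A/B test" concrete, here is a minimal sketch of how two headlines from one test might be compared on click-through rate. It uses a pooled two-proportion z-test, which is only one of several reasonable choices. The function name and the counts are made up for illustration; nothing here assumes the archive's actual column names or file layout.

```python
import math


def two_proportion_z_test(clicks_a, impressions_a, clicks_b, impressions_b):
    """Compare two click-through rates with a pooled two-proportion z-test.

    Returns (rate_a, rate_b, z, two_sided_p_value).
    """
    if impressions_a <= 0 or impressions_b <= 0:
        raise ValueError("impressions must be positive")

    rate_a = clicks_a / impressions_a
    rate_b = clicks_b / impressions_b

    # Pooled click rate under the null hypothesis that both headlines perform equally.
    pooled = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / impressions_a + 1 / impressions_b))
    if se == 0:
        raise ValueError("cannot test when the pooled rate is exactly 0 or 1")

    z = (rate_a - rate_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Hypothetical counts for two headlines shown to equally sized audiences.
rate_a, rate_b, z, p_value = two_proportion_z_test(120, 3000, 90, 3000)
print(f"CTR A={rate_a:.3f}, CTR B={rate_b:.3f}, z={z:.2f}, p={p_value:.3f}")
```

Many tests in a dataset like this have more than two headline variants, so a real analysis would need to handle multiple comparisons rather than a single pairwise test.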

The Pushshift Reddit Dataset

Pushshift is a social media data collection, analysis, and archiving platform that has collected Reddit data since 2015 and made it available to researchers. Pushshift’s Reddit dataset is updated in real time and includes historical data back to Reddit’s inception. In addition to monthly dumps, Pushshift provides computational tools to aid in searching, aggregating, and performing exploratory analysis on the entire dataset. The dataset lets social media researchers spend less time on the data collection, cleaning, and storage phases of their projects.
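As a rough sketch of the kind of aggregation a monthly dump allows, the snippet below counts records per subreddit. It assumes the dump has already been decompressed into newline-delimited JSON and that each record carries a `subreddit` field; the function name and the sample records are made up for illustration.

```python
import json
from collections import Counter


def count_by_subreddit(lines):
    """Count records per subreddit from an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        subreddit = record.get("subreddit")
        if subreddit is not None:
            counts[subreddit] += 1
    return counts


# Made-up sample records standing in for lines of a decompressed dump.
sample_lines = [
    '{"subreddit": "askscience", "body": "first comment"}',
    '{"subreddit": "datasets", "body": "second comment"}',
    '{"subreddit": "askscience", "body": "third comment"}',
]

for name, n in count_by_subreddit(sample_lines).most_common():
    print(name, n)
```

Because the function only needs an iterable of lines, an open file handle can be passed in directly, so a large dump is streamed line by line instead of being loaded into memory at once.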

What does the Reddit universe look like?

Isaac Waller and Ashton Anderson (2019) made a data-driven map of Reddit communities that shows how similar different communities are to one another. They also built a website to present the visualization. That’s cool!

Last modified on 2021-07-26