When the Facebook–Cambridge Analytica (CA) scandal burst into the headlines in March 2018, it cast a spotlight on tech companies’ uses and abuses of personal data (Cadwalladr & Graham-Harrison, 2018). For years, platforms such as Facebook and Google had been gobbling up user data, tracking the minutiae of our daily lives, and selling or sharing that information with companies that desired our attention. Whether these companies sought to influence our consumer choices or, like CA, impact our political behavior, they found a treasure trove of information with such specificity that it could reveal a user’s race, religion, wealth, partisanship, and physical and mental health status, among many other sensitive personal characteristics.
Feeling the weight of public scrutiny in the wake of this scandal, many of the platforms quickly moved to restrict access to what were perhaps the most generous and least scrutinized sources of digital data: their Application Programming Interfaces (APIs). These APIs allowed anyone with basic programming skills to gather massive volumes of data about a given platform’s users and content, academics included. From anthropology to psychology, economics to health science, scholars from a wide variety of disciplines relied on these APIs to gather large amounts of data for research into the content and behaviors found in digital spaces, and the CA-inspired restrictions significantly undermined multiple lines of research (Freelon, 2018; Hemsley, 2019).
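To make concrete what this kind of access looked like, the sketch below illustrates the general request-and-paginate pattern that many pre-2018 research pipelines followed. The endpoint, access token, field names, and pagination scheme are hypothetical stand-ins, not any specific platform’s API; real interfaces such as Facebook’s Graph API differed in detail but followed the same basic pattern.

```python
# A minimal sketch of pre-2018 API-based data collection. The endpoint,
# parameters, and response shape are hypothetical stand-ins for the
# request-paginate-store pattern common to platform APIs of that era.
import requests

BASE_URL = "https://api.example-platform.com/v1/posts"  # hypothetical endpoint
ACCESS_TOKEN = "researcher-access-token"                # typically obtained via free registration

def collect_posts(query, max_pages=10):
    """Page through public posts matching a query, accumulating results
    the way many research pipelines did before access was restricted."""
    posts, cursor = [], None
    for _ in range(max_pages):
        params = {"q": query, "access_token": ACCESS_TOKEN}
        if cursor:
            params["after"] = cursor  # cursor-based pagination (hypothetical field)
        resp = requests.get(BASE_URL, params=params, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        posts.extend(payload.get("data", []))
        cursor = payload.get("paging", {}).get("next_cursor")
        if not cursor:  # no further pages
            break
    return posts

if __name__ == "__main__":
    sample = collect_posts("election")
    print(f"Collected {len(sample)} posts")
```

With little more than a loop like this, a researcher could assemble a corpus of thousands of posts in minutes, which is precisely the low barrier to entry that the post-scandal restrictions eliminated.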