Social media are a promising new data source for real-world behavioral monitoring. Despite clear advantages, analyses of social media data face several challenges. In this paper, we seek to elucidate some of these challenges and draw relevant lessons from more traditional survey techniques. Going beyond standard machine learning approaches, we make the case that studies conducting statistical analyses of social media data should carefully consider elements of study design, and we provide behavioral examples throughout. Specifically, we focus on issues surrounding the validity of statistical conclusions that may be drawn from social media data. We discuss common pitfalls and techniques for avoiding them, so that researchers may mitigate potential problems of design.
Validating Social Media Monitoring: Statistical Pitfalls and Opportunities from Public Opinion
October 11, 2020