Publications
Cornell University

A Multi-Modal Method for Satire Detection using Textual and Visual Cues

Satire is a form of humorous critique, but it is sometimes misinterpreted by readers as legitimate news, which can lead to harmful consequences. We observe that the images used in satirical news articles often contain absurd or ridiculous content and that image manipulation is used to create fictional scenarios. While previous work has studied text-based methods, in this work we propose a multi-modal approach based on ViLBERT, a state-of-the-art visiolinguistic model. To this end, we created a new dataset consisting of images and headlines of regular and satirical news for the task of satire detection. We fine-tune ViLBERT on the dataset and train a convolutional neural network that uses an image forensics technique. Evaluation on the dataset shows that our proposed multi-modal approach outperforms image-only, text-only, and simple fusion baselines.
Springer

Validating Social Media Monitoring: Statistical Pitfalls and Opportunities from Public Opinion

Social media are a promising new data source for real-world behavioral monitoring. Despite clear advantages, analyses of social media data face some challenges. In this paper, we seek to elucidate some of these challenges and draw relevant lessons from more traditional survey techniques. Beyond standard machine learning approaches, we make the case that studies that conduct statistical analyses of social media data should carefully consider elements of study design, providing behavioral examples throughout. Specifically, we focus on issues surrounding the validity of statistical conclusions that may be drawn from social media data. We discuss common pitfalls and techniques to avoid these pitfalls, so researchers may mitigate potential problems of design.
Problems of Post-Communism

Pandemic Politics in Eurasia: Roadmap for a New Research Subfield

The sudden onset of COVID-19 has challenged many social scientists to proceed without a robust theoretical and empirical foundation upon which to build. Addressing this challenge, particularly as it pertains to Eurasia, our multinational group of scholars draws on past and ongoing research to suggest a roadmap for a new pandemic politics research subfield. Key research questions include not only how states are responding to the new coronavirus, but also reciprocal interactions between the pandemic and society, political economy, regime type, center-periphery relations, and international security. The Foucauldian concept of “biopolitics” holds out particular promise as a theoretical framework.
The Disinformation Age

The Disinformation Age

The Disinformation Age is a book co-edited by W. Lance Bennett and Steven Livingston
AJPH

Facebook Pages, the “Disneyland” Measles Outbreak, and Promotion of Vaccine Refusal as a Civil Right, 2009–2019

The dynamics of health misinformation on Facebook pose a threat to vaccination programs. Social media exposure is theorized to amplify vaccine skepticism, exposing billions of users to misinformation about vaccines, increasing hesitancy and delay, eroding trust in health care providers and public health experts, and reducing vaccination rates, with repeated exposures potentially exacerbating this hesitancy.
AJPH

Adapting and Extending a Typology to Identify Vaccine Misinformation on Twitter

The rise in vaccine hesitancy—the delay and refusal of vaccines despite the availability of vaccination services—may be fueled, in part, by online claims that vaccines are ineffective, unnecessary, and dangerous. While opposition to vaccines is not new, these arguments have been reborn via new technologies that enable the spread of false claims with unprecedented ease, speed, and reach.
Cornell University logo

Not sure? Handling hesitancy of COVID-19 vaccines

From the moment the first COVID-19 vaccines are rolled out, a large fraction of the global population will need to be ready to take them. It is therefore crucial to start managing the growing global hesitancy toward any such COVID-19 vaccine. The current approach of trying to convince the "no"s cannot work quickly enough, nor can the current policy of trying to find, remove, and/or rebut every individual piece of COVID-19 and vaccine misinformation. Instead, we show how this can be done in a simpler way by moving away from chasing misinformation content and focusing instead on managing the "yes-no-not-sure" hesitancy ecosystem.
Harvard Kennedy School Misinformation Review

Not just conspiracy theories: Vaccine opponents and proponents add to the COVID-19 ‘infodemic’ on Twitter

In February 2020, the World Health Organization announced an ‘infodemic’ — a deluge of both accurate and inaccurate health information — that accompanied the global pandemic of COVID-19 as a major challenge to effective health communication.
Science Direct logo

Unifying casualty distributions within and across conflicts

The distribution of whole-war sizes and the distribution of event sizes within individual wars can both be well approximated by power laws, where size is measured by the number of fatalities. However, the power-law exponent for whole wars has a substantially smaller magnitude, and hence a flatter distribution, than for individual wars.
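
The power-law fits described above can be sketched with the standard continuous maximum-likelihood estimator for the exponent. This is a generic illustration under our own assumptions (synthetic data, our variable names), not the authors' fitting code.

```python
import math
import random

def power_law_alpha(sizes, xmin):
    """Continuous maximum-likelihood estimate of the exponent alpha in
    p(x) ~ x^(-alpha) for x >= xmin (the Clauset-Shalizi-Newman estimator)."""
    tail = [x for x in sizes if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

# Synthetic check: draw from a pure power law with alpha = 2.5 by
# inverse-transform sampling, then recover the exponent from the sample.
random.seed(0)
alpha_true, xmin = 2.5, 1.0
draws = [xmin * (1 - random.random()) ** (-1 / (alpha_true - 1))
         for _ in range(50_000)]
print(round(power_law_alpha(draws, xmin), 2))  # close to 2.5
```

Comparing exponents for within-war and whole-war samples then reduces to two calls to `power_law_alpha`, one per dataset.
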
Cornell University logo

Covid-19 infodemic reveals new tipping point epidemiology and a revised R formula

Many governments have managed to control their COVID-19 outbreak with a simple message: keep the effective 'R number' R<1 to prevent widespread contagion and flatten the curve. This raises the question whether a similar policy could control dangerous online 'infodemics' of information, misinformation and disinformation.
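
The R < 1 threshold the abstract refers to is easy to see in a minimal branching-process sketch; the seed counts and R values below are illustrative, not taken from the paper.

```python
def generations(r, seed_cases=100, steps=10):
    """New cases (or new shares of a piece of misinformation) per
    generation under a constant effective reproduction number r."""
    cases, history = seed_cases, []
    for _ in range(steps):
        history.append(cases)
        cases *= r
    return history

print(generations(1.5)[:4])  # [100, 150.0, 225.0, 337.5] -- grows
print(generations(0.5)[:4])  # [100, 50.0, 25.0, 12.5] -- dies out
```

Any revision to the R formula changes how r is estimated from platform data, not this threshold behaviour: above 1 the cascade grows each generation, below 1 it decays.
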
Cornell University logo

Hidden order in online extremism and its disruption by nudging collective chemistry

Here researchers show that the eclectic "Boogaloo" extremist movement now rising to prominence in the U.S. has a hidden online mathematical order identical to that of ISIS during its early development, despite stark ideological, geographical, and cultural differences. The evolution of each across scales follows a single shockwave equation that accounts for individual heterogeneity in online interactions. This equation predicts how to disrupt the onset and 'flatten the curve' of such online extremism by nudging its collective chemistry.
Cornell University logo

The COVID-19 Social Media Infodemic Reflects Uncertainty and State-Sponsored Propaganda

Significant attention has been devoted to determining the credibility of online misinformation about the COVID-19 pandemic on social media. Here, researchers compare the credibility of tweets about COVID-19 to datasets pertaining to other health issues.
Cornell University logo

Iterative Effect-Size Bias in Ridehailing: Measuring Social Bias in Dynamic Pricing of 100 Million Rides

Algorithmic bias is the systematic preferential or discriminatory treatment of a group of people by an artificial intelligence system. In this work, researchers developed a random-effects based metric for the analysis of social bias in supervised machine learning prediction models where model outputs depend on U.S. locations. They defined a methodology for using U.S. Census data to measure social bias on user attributes legally protected against discrimination, such as ethnicity, sex, and religion, also known as protected attributes.
Cornell University logo

Content analysis of Persian/Farsi Tweets during COVID-19 pandemic in Iran using NLP

Iran, along with China, South Korea, and Italy was among the countries that were hit hard in the first wave of the COVID-19 spread. Twitter is one of the widely-used online platforms by Iranians inside and abroad for sharing their opinion, thoughts, and feelings about a wide range of issues. In this study, using more than 530,000 original tweets in Persian/Farsi on COVID-19, we analyzed the topics discussed among users, who are mainly Iranians, to gauge and track the response to the pandemic and how it evolved over time.
IEEE Xplore logo

Quantifying COVID-19 content in the online health opinion war using machine learning

A huge amount of potentially dangerous COVID-19 misinformation is appearing online. Here we use machine learning to quantify COVID-19 content among online opponents of establishment health guidance, in particular vaccinations ("anti-vax"). We find that the anti-vax community is developing a less focused debate around COVID-19 than its counterpart, the pro-vaccination (“pro-vax”) community. However, the anti-vax community exhibits a broader range of “flavors” of COVID-19 topics, and hence can appeal to a broader cross-section of individuals seeking COVID-19 guidance online, e.g. individuals wary of a mandatory fast-tracked COVID-19 vaccine or those seeking alternative remedies.
Nature logo

The online competition between pro- and anti-vaccination views

Distrust in scientific expertise is dangerous. Opposition to vaccination with a future vaccine against SARS-CoV-2, the causal agent of COVID-19, for example, could amplify outbreaks as happened for measles in 2019. Homemade remedies and falsehoods are being shared widely on the Internet, as well as dismissals of expert advice. There is a lack of understanding about how this distrust evolves at the system level. Here we provide a map of the contention surrounding vaccines that has emerged from the global pool of around three billion Facebook users. Its core reveals a multi-sided landscape of unprecedented intricacy that involves nearly 100 million individuals partitioned into highly dynamic, interconnected clusters across cities, countries, continents and languages. Although smaller in overall size, anti-vaccination clusters manage to become highly entangled with undecided clusters in the main online network, whereas pro-vaccination clusters are more peripheral.
OSF Preprints logo

Misinformation on the Facebook News Feed: Experimental Evidence

As concerns about the spread of misinformation have mounted, scholars have found that fact-checks can reduce the extent to which people believe misinformation. Whether this finding extends to social media is unclear. Social media is a high-choice environment in which the cognitive effort required to separate truth from fiction, individuals' penchant for select exposure, and motivated reasoning may render fact checks ineffective. Furthermore, large social media companies have not permitted external researchers to administer experiments on their platforms. To investigate whether fact-checking can rebut misinformation on social media, we administer two experiments using a novel platform designed to closely mimic Facebook's news feed.
Cornell University logo

Detecting East Asian Prejudice on Social Media

The outbreak of COVID-19 has transformed societies across the world as governments tackle the health, economic and social costs of the pandemic. It has also raised concerns about the spread of hateful language and prejudice online, especially hostility directed against East Asia. In this paper we report on the creation of a classifier that detects and categorizes social media posts from Twitter into four classes: Hostility against East Asia, Criticism of East Asia, Meta-discussions of East Asian prejudice and a neutral class. The classifier achieves an F1 score of 0.83 across all four classes. We provide our final model (coded in Python), as well as a new 20,000 tweet training dataset used to make the classifier, two analyses of hashtags associated with East Asian prejudice and the annotation codebook. The classifier can be implemented by other researchers, assisting with both online content moderation processes and further research into the dynamics, prevalence and impact of East Asian prejudice online during this global pandemic.
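
An F1 of 0.83 "across all four classes" is a macro-average: per-class F1 scores averaged with equal weight. The computation can be sketched in plain Python; the toy gold and predicted labels below are illustrative stand-ins, not the paper's data.

```python
def macro_f1(y_true, y_pred, labels):
    """Macro-averaged F1: compute F1 per class, then average with equal weight."""
    scores = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

LABELS = ["hostility", "criticism", "meta", "neutral"]  # the four classes
y_true = ["hostility", "criticism", "meta", "neutral", "neutral", "hostility"]
y_pred = ["hostility", "criticism", "meta", "neutral", "hostility", "hostility"]
print(round(macro_f1(y_true, y_pred, LABELS), 3))  # 0.867
```

Macro-averaging weights each class equally, which matters here because the neutral class would otherwise dominate a tweet-level average.
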
Cornell University logo

Hate multiverse spreads malicious COVID-19 content online beyond individual platform control

We show that malicious COVID-19 content, including hate speech, disinformation, and misinformation, exploits the multiverse of online hate to spread quickly beyond the control of any individual social media platform. Machine learning topic analysis shows quantitatively how online hate communities are weaponizing COVID-19, with topics evolving rapidly and content becoming increasingly coherent. Our mathematical analysis provides a generalized form of the public health R0 predicting the tipping point for multiverse-wide viral spreading, which suggests new policy options to mitigate the global spread of malicious COVID-19 content without relying on future coordination between all online platforms.
Cornell University logo

Pro-Russian Biases in Anti-Chinese Tweets about the Novel Coronavirus

The recent COVID-19 pandemic, which was first detected in Wuhan, China, has been linked to increased anti-Chinese sentiment in the United States. Recently, Broniatowski et al. found that foreign powers, and especially Russia, were implicated in information operations using public health crises to promote discord, including racial conflict, in American society (Broniatowski et al., 2018).
Cornell University logo

The Twitter Social Mobility Index: Measuring Social Distancing Practices from Geolocated Tweets

Social distancing is an important component of the response to the novel Coronavirus (COVID-19) pandemic. Minimizing social interactions and travel reduces the rate at which the infection spreads, and "flattens the curve" such that the medical system can better treat infected individuals. However, it remains unclear how the public will respond to these policies. This paper presents the Twitter Social Mobility Index, a measure of social distancing and travel derived from Twitter data. We use public geolocated Twitter data to measure how much a user travels in a given week. We find a large reduction in travel in the United States after the implementation of social distancing policies, with larger reductions in states that were early adopters and smaller changes in states without policies. Our findings are presented online, and we will continue to update our analysis during the pandemic.
Science Direct logo

Chinese social media suggest decreased vaccine acceptance in China: An observational study on Weibo following the 2018 Changchun Changsheng vaccine incident

China is home to the world’s largest population, with the potential for disease outbreaks to affect billions. However, knowledge of Chinese vaccine acceptance trends is limited. In this work we use Chinese social media to track responses to the recent Changchun Changsheng Biotechnology vaccine scandal, which led to extensive discussion regarding vaccine safety and regulation in China. We analyzed messages from the popular Chinese microblogging platform Sina Weibo in July 2018 (n = 11,085), and August 2019 (n = 500). Thus, we consider Chinese vaccine acceptance, before, during, immediately after, and one year after the scandal occurred. Results show that expressions of distrust in government pertaining to vaccines increased significantly during and immediately after the scandal. Self-reports of vaccination occurred both before, and one year after, the scandal; however, these self-reports changed from positive endorsements of vaccination to concerns about vaccine harms. Data suggest that expressed support for vaccine acceptance in China may be decreasing.
Cornell University

Machines Learn Appearance Bias in Face Recognition

We seek to determine whether state-of-the-art, black box face recognition techniques can learn first-impression appearance bias from human annotations. With FaceNet, a popular face recognition architecture, we train a transfer learning model on human subjects' first impressions of personality traits in other faces. We measure the extent to which this appearance bias is embedded and benchmark learning performance for six different perceived traits. In particular, we find that our model is better at judging a person's dominance based on their face than other traits like trustworthiness or likeability, even for emotionally neutral faces. We also find that our model tends to predict emotions for deliberately manipulated faces with higher accuracy than for randomly generated faces, just like a human subject. Our results lend insight into the manner in which appearance biases may be propagated by standard face recognition models.
Science Direct logo

A computational science approach to understanding human conflict

These findings show that a unified computational science framework can be used to understand and quantitatively describe collective human conflict.
SSRN

Surfacing the Submerged State: Operational Transparency Increases Trust in and Engagement with Government

As trust in government reaches historic lows, frustration with government performance approaches record highs. We propose that in co-productive settings like government services, people’s trust and engagement can be enhanced by designing service interactions to let them see the often-hidden work being performed in response to their engagement, that is, by increasing operational transparency. Three studies, conducted in the field and in the lab, show that surfacing the “submerged state” through operational transparency improves citizens’ attitudes and behavior.
Science Direct logo

Vaccine-related advertising in the Facebook Ad Archive

In 2018, Facebook introduced Ad Archive as a platform to improve transparency in advertisements related to politics and “issues of national importance.” Vaccine-related Facebook advertising is publicly available for the first time. After measles outbreaks in the US brought renewed attention to the possible role of Facebook advertising in the spread of vaccine-related misinformation, Facebook announced steps to limit vaccine-related misinformation. This study serves as a baseline of advertising before new policies went into effect.
ACM Digital Library logo

GraphOne: A Data Store for Real-time Analytics on Evolving Graphs

There is a growing need to perform a diverse set of real-time analytics (batch and stream analytics) on evolving graphs to deliver the values of big data to users. The key requirement from such applications is to have a data store to support their diverse data access efficiently, while concurrently ingesting fine-grained updates at a high velocity. Unfortunately, current graph systems, either graph databases or analytics engines, are not designed to achieve high performance for both operations; rather, they excel in one area that keeps a private data store in a specialized way to favor their operations only. To address this challenge, we have designed and developed GraphOne, a graph data store that abstracts the graph data store away from the specialized systems to solve the fundamental research problems associated with the data store design.
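
The core requirement, one store serving both fine-grained ingestion and consistent batch analytics, can be sketched as an append-only edge log plus an adjacency index, where readers pin a log offset as a snapshot while writers keep appending. This is a hypothetical toy, not GraphOne's actual design.

```python
from collections import defaultdict

class TinyGraphStore:
    """Append-only edge log + adjacency index. Readers pin a log offset
    to get a consistent snapshot while writers continue ingesting."""

    def __init__(self):
        self.log = []                 # (src, dst) edges in arrival order
        self.adj = defaultdict(list)  # src -> [(dst, log_position)]

    def add_edge(self, src, dst):
        self.adj[src].append((dst, len(self.log)))
        self.log.append((src, dst))

    def snapshot(self):
        """Capture the current log length; later writes stay invisible."""
        return len(self.log)

    def neighbors(self, v, snap):
        return [d for d, pos in self.adj[v] if pos < snap]

g = TinyGraphStore()
g.add_edge("a", "b")
snap = g.snapshot()
g.add_edge("a", "c")                  # ingested after the snapshot
print(g.neighbors("a", snap))         # ['b']
print(g.neighbors("a", g.snapshot())) # ['b', 'c']
```

The point of the design is that ingestion is a cheap append while analytics see a stable view, rather than forcing one side to pay for the other's access pattern.
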
ACL Anthology logo

Identifying Nuances in Fake News vs. Satire: Using Semantic and Linguistic Cues

The blurry line between nefarious fake news and protected-speech satire has been a notorious struggle for social media platforms. Further to the efforts of reducing exposure to misinformation on social media, purveyors of fake news have begun to masquerade as satire sites to avoid being demoted. In this work, we address the challenge of automatically classifying fake news versus satire. Previous work has studied whether fake news and satire can be distinguished based on language differences. Contrary to fake news, satire stories are usually humorous and carry some political or social message. We hypothesize that these nuances could be identified using semantic and linguistic cues. Consequently, we train a machine learning method using semantic representation, with a state-of-the-art contextual language model, and with linguistic features based on textual coherence metrics.
Journal of Peace Research logo

Crimea come what may: Do economic sanctions backfire politically?

Do international economic sanctions backfire politically, resulting in increased rather than decreased domestic support for targeted state leaders? Backfire arguments are common, but researchers have only recently begun systematically studying sanctions’ impact on target-state public opinion, not yet fully unpacking different possible backfire mechanisms. We formulate backfire logic explicitly, distinguishing between ‘scapegoating’ and ‘rallying’ mechanisms and considering the special case of ‘smart sanctions’ aimed at crony elites rather than the masses. We test five resulting hypotheses using an experimental design and pooled survey data spanning the imposition of sanctions in one of the most substantively important cases where the backfire argument has been prominent: Western sanctions on Russia in 2014. We find no evidence of broad sanctions backfire. Instead, sanctions have forced Russia’s president to pay a political price.
Cornell University

Health Wars and Beyond: The Rapidly Expanding and Efficient Network Insurgency Interlinking Local and Global Online Crowds of Distrust

We present preliminary results on the online war surrounding distrust of expertise in medical science -- specifically, the issue of vaccinations. While distrust and misinformation in politics can damage democratic elections, in the medical context it may also endanger lives through missed vaccinations and DIY cancer cures. We find that this online health war has evolved into a highly efficient network insurgency with direct inter-crowd links across countries, continents and cultures. The online anti-vax crowds (referred to as Red) now appear better positioned to groom new recruits (Green) than those supporting established expertise (Blue). We also present preliminary results from a mathematically-grounded, crowd-based analysis of the war's evolution, which offers an explanation for how Red seems to be turning the tide on Blue.
False Alarm cover

False Alarm: The Truth about Political Mistruths in the Trump Era

False Alarm is a book co-authored by Ethan Porter and Thomas J. Wood.
Project Muse

Framing and Strategic Narratives: Synthesis and Analytical Framework

Standard journalistic tropes were of little use after the 2016 US elections. Talk of blue and red states was eclipsed by distressed discussions (or out-of-hand dismissals) of Russian troll farms, bots, and hacked DNC email servers. The new president's peculiar obsession with the size of his inaugural crowd and his baseless claims of massive voter fraud only deepened the sense that America had awoken to both a new president and an altered sense of reality.
Jama Pediatrics logo

Government Role in Regulating Vaccine Misinformation on Social Media Platforms

Freedom of speech is one of the most fundamental rights in the United States. The challenges of balancing free speech against harms caused by misinformation on social media are well illustrated by antivaccine activists, who claim, contrary to the evidence, that vaccines cause death or other harmful adverse effects. These activists use social media platforms, such as Facebook and Twitter, to share misleading information supporting their views on vaccines. In fact, half of all parents with children younger than 5 years have been exposed to misinformation about vaccines on social media. A 2019 study even found that neutral searches of the word vaccine by a new user with no friends or likes yielded overwhelmingly antivaccine content unsupported by science on both Facebook and YouTube.
First Monday logo

Elites and foreign actors among the alt-right: The Gab social media platform

Content regulation and censorship of social media platforms is increasingly discussed by governments and the platforms themselves. To date, there has been little data-driven analysis of the effects of regulated content deemed inappropriate on online user behavior. We therefore compared Twitter — a popular social media platform that occasionally removes content in violation of its Terms of Service — to Gab — a platform that markets itself as completely unregulated.
Journal of Digital Social Research logo

In Search of Meaning: Why We Still Don't Know What Digital Data Represent

In the early years, researchers greeted the internet and digital data with almost wide-eyed wonder and excitement. The opportunities provided by digital media such as websites, bulletin boards, and blogs—and later by social media platforms and mobile apps—seemed nearly endless, and researchers were suddenly awash in data. The bounty was so great that it required new methods for processing, organizing, and analysis. Yet in all the excitement, it seems that the digital research community largely lost sight of something fundamental: a sense of what all these data actually represent.
Post Soviet Affairs logo

A surprising connection between civilizational identity and succession expectations among Russian elites

We know from prior research that non-democratic regimes can become vulnerable when elites anticipate succession at the top, but we know little about what shapes these elites’ expectations. This study examines connections between such expectations and Russia’s relationships to the outside world. Analysis of elite opinion data from the 2016 Survey of Russian Elites reveals strong associations between identifying Russia with European civilization and expecting Russian politics to display behaviors more like those believed to characterize European polities, including more frequent dominant party turnover. Elites appear not to expect their top political leadership to pay a political price for what they perceive as foreign policy blunders in a consistent way, though opposition elites critical of Russia’s actions in Ukraine are found to expect an earlier United Russia Party exit. Variations in threat perceptions are not found to influence predictions of leadership tenure.
IEEE Xplore logo

Securing Malware Cognitive Systems against Adversarial Attacks

Cognitive systems, together with machine learning techniques, have provided significant improvements for many applications. However, recent adversarial attacks, such as data poisoning, evasion attacks, and exploratory attacks, have been shown to be able to either cause machine learning methods to misbehave or leak sensitive model parameters. In this work, we have devised a prototype of a malware cognitive system, called DeepArmour, which performs robust malware classification against adversarial attacks. At the heart of our method is a voting system with three different machine learning malware classifiers: random forest, multi-layer perceptron, and structure2vec. In addition, DeepArmour applies several adversarial countermeasures, such as feature reconstruction and adversarial retraining, to strengthen its robustness.
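
The voting core described above can be sketched as a simple majority vote over the three classifiers' predicted labels. The classifier outputs below are hypothetical stand-ins, not DeepArmour's models.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most classifiers; ties are broken
    in favor of the label that reached the top count first."""
    return Counter(predictions).most_common(1)[0][0]

# Stand-ins for the three models (random forest, MLP, structure2vec):
votes = ["malware", "benign", "malware"]
print(majority_vote(votes))  # malware
```

The robustness intuition is that an adversarial example crafted against one feature space (say, the MLP's) is unlikely to simultaneously fool classifiers built on different representations, so the vote dampens single-model failures.
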
Springer logo

To illuminate and motivate: a fuzzy-trace model of the spread of information online

We propose, and test, a model of online media platform users’ decisions to act on, and share, received information. Specifically, we focus on how mental representations of message content drive its spread. Our model is based on fuzzy-trace theory (FTT), a leading theory of decision under risk. Per FTT, online content is mentally represented in two ways: verbatim (objective, but decontextualized, facts), and gist (subjective, but meaningful, interpretation). Although encoded in parallel, gist tends to drive behaviors more strongly than verbatim representations for most individuals. Our model uses factors derived from FTT to make predictions regarding which content is more likely to be shared, namely: (a) different levels of mental representation, (b) the motivational content of a message, (c) difficulty of information processing (e.g., the ease with which a given message may be comprehended and, therefore, its gist extracted), and (d) social values.
Springer logo

Helping the Homeless: The Role of Empathy, Race and Deservingness in Motivating Policy Support and Charitable Giving

What will motivate citizens to support efforts to help those in need? Charitable organizations seeking support for their cause will often use the story of a specific individual to illustrate the problem and generate support. We explore the effectiveness of this strategy using the issue of homelessness. Specifically, we examine the role that the race of beneficiaries featured in a message, and the inclusion of deservingness cues highlighting external attributions for an individual’s homelessness have on willingness to donate to the homeless and support government efforts to address homelessness. Utilizing two experiments with a nationally representative probability sample and an online opt-in quota sample, we find significant effects of deservingness information on expressions of sympathy, and on support for government efforts to address homelessness when viewing individuals from one’s own racial group.
Research and Politics logo

Can presidential misinformation on climate change be corrected? Evidence from Internet and phone experiments

Can presidential misinformation affect political knowledge and policy views of the mass public, even when that misinformation is followed by a fact-check? We present results from two experiments, conducted online and over the telephone, in which respondents were presented with Trump misstatements on climate change. While Trump’s misstatements on their own reduced factual accuracy, corrections prompted the average subject to become more accurate. Republicans were not as affected by a correction as their Democratic counterparts, but their factual beliefs about climate change were never more affected by Trump than by the facts. In neither experiment did corrections affect policy preferences. Debunking treatments can improve factual accuracy even among co-partisans subjected to presidential misinformation. Yet an increase in climate-related factual accuracy does not sway climate-related attitudes. Fact-checks can limit the effects of presidential misinformation, but have no impact on the president’s capacity to shape policy preferences.
ACL Anthology logo

Challenges and frontiers in abusive content detection

Online abusive content detection is an inherently difficult task. It has received considerable attention from academia, particularly within the computational linguistics community, and performance appears to have improved as the field has matured. However, considerable challenges and unaddressed frontiers remain, spanning technical, social and ethical dimensions. These issues constrain the performance, efficiency and generalizability of abusive content detection systems. In this article we delineate and clarify the main challenges and frontiers in the field, critically evaluate their implications and discuss potential solutions. We also highlight ways in which social scientific insights can advance research. We discuss the lack of support given to researchers working with abusive content and provide guidelines for ethical research.

How Should We Now Conceptualize Protest, Diffusion, and Regime Change?

Brancati and Lucardi’s findings on the absence of “democracy protest” diffusion across borders raise important questions for the future of protest studies. I argue that this subfield would benefit from a stronger engagement with theory (in general) and from a “patronal politics” perspective (in particular) when it comes to researching protest in non-democratic regimes. This means curtailing a widespread practice of linking the study of protest with the study of democratization, questioning the dominant “contentious politics” framework as commonly conceptualized, and instead focusing more on the central role of patronal network coordination dynamics (especially elite splits) in driving both protest and the potential for regime change. This perspective emphasizes the role of domestically generated succession expectations and public opinion in generating the most meaningful elite splits, and reveals how protests can be important instruments in the resulting power struggles among rival networks.
Usenix logo

SIMD-X: Programming and Processing of Graph Algorithms on GPUs

With high computation power and memory bandwidth, graphics processing units (GPUs) lend themselves to accelerating data-intensive analytics, especially when such applications fit the single instruction multiple data (SIMD) model. However, graph algorithms such as breadth-first search and k-core often fail to take full advantage of GPUs, due to irregularity in memory access and control flow. To address this challenge, we have developed SIMD-X for programming and processing of single instruction multiple, complex, data on GPUs.
Sciendo

Git Blame Who?: Stylistic Authorship Attribution of Small, Incomplete Source Code Fragments

Program authorship attribution has implications for the privacy of programmers who wish to contribute code anonymously. While previous work has shown that individually authored complete files can be attributed, these efforts have focused on such ideal data sets as contest submissions and student assignments. We explore the problem of authorship attribution “in the wild,” examining source code obtained from open-source version control systems, and investigate how contributions can be attributed to their authors, either on an individual or a per-account basis.
Cornell University

How we do things with words: Analyzing text as social and cultural data

In this article we describe our experiences with computational text analysis. We hope to achieve three primary goals. First, we aim to shed light on thorny issues not always at the forefront of discussions about computational text analysis methods. Second, we hope to provide a set of best practices for working with thick social and cultural concepts. Our guidance is based on our own experiences and is therefore inherently imperfect. Still, given our diversity of disciplinary backgrounds and research practices, we hope to capture a range of ideas and identify commonalities that will resonate for many. And this leads to our final goal: to help promote interdisciplinary collaborations. Interdisciplinary insights and partnerships are essential for realizing the full potential of any computational text analysis that involves social and cultural concepts, and the more we are able to bridge these divides, the more fruitful we believe our work will be.
Cambridge University Press

The Future Impact of Artificial Intelligence on Humans and Human Rights

What are the implications of artificial intelligence (AI) on human rights in the next three decades? Precise answers to this question are made difficult by the rapid rate of innovation in AI research and by the effects of human practices on the adaption of new technologies. Precise answers are also challenged by imprecise usages of the term “AI.”
Political Communication logo

You Break It, You Buy It: The Naiveté of Social Engineering in Tech – And How to Fix It

Facebook is not alone in espousing social goals. Twitter aims to promote conversation – their core values statement reads, “We believe in free expression and think every voice has the power to impact the world” (Twitter Values, n.d.). WhatsApp, responsible for spreading viral misinformation that has led to mob killings in India (Goel, Raj, & Ravichandran, 2018), claims that “behind every product decision is our desire to let people communicate anywhere in the world without barriers” (WhatsApp, n.d.). Google, perhaps most ubiquitous of all, is guided by the principle “to organize the world’s information and make it universally accessible and useful” (Google, n.d.).
SSRN

The (Mis)Informed Citizen: Indicators for Examining the Quality of Online News

Current discourses about the spread of misinformation tend to juxtapose misinformation with "quality" news — assuming that one is the opposite of the other and suggesting that, if we simply ensure people have better access to quality information, the harmful effects of misinformation will be mitigated.
American Public Health Association logo

Malicious Actors on Twitter: A Guide for Public Health Researchers

Social bots and other malicious actors have a significant presence on Twitter. It is increasingly clear that some of their activities can have a negative impact on public health. This guide provides an overview of the types of malicious actors currently active on Twitter by highlighting the characteristic behaviors and strategies employed. It covers both automated accounts (including traditional spambots, social spambots, content polluters, and fake followers) and human users (primarily trolls). It also addresses the unique threat of state-sponsored trolls. We utilize examples from our own research on vaccination to illustrate.
Journal of Medical Internet Research logo

Characterizing Trends in Human Papillomavirus Vaccine Discourse on Reddit (2007-2015): An Observational Study

Despite the introduction of the human papillomavirus (HPV) vaccination in 2006 as a preventive measure for cervical and other cancers, uptake rates remain suboptimal, resulting in preventable cancer mortality. Social media, widely used for information seeking, can influence users’ knowledge and attitudes regarding HPV vaccination. Little is known regarding attitudes related to HPV vaccination on Reddit (a popular news aggregation site and online community), particularly related to cancer risk and sexual activity. Examining HPV vaccine–related messages on Reddit may provide insight into how HPV discussions are characterized on online forums and influence decision making related to vaccination.
Policy Insights from the Behavioral and Brain Sciences logo

Communicating Meaning in the Intelligence Enterprise

Intelligence community experts face challenges communicating the results of analysis products to policy makers. Given the high-stakes nature of intelligence analyses, the consequences of misinformation may be dire, potentially leading to costly, ill-informed policies or lasting damage to national security. Much is known regarding how to effectively communicate complex analysis products to policy makers possessing different sources of expertise. Fuzzy-Trace Theory, an empirically-validated psychological account of how decision makers derive meaning from complex stimuli, emphasizes the importance of communicating the essential bottom-line of an analysis (“gist”), in parallel with precise details (“verbatim”). Verbatim details can be prone to misinterpretation when presented out of context.
BMJ Open logo

Can online self-reports assist in real-time identification of influenza vaccination uptake? A cross-sectional study of influenza vaccine-related tweets in the USA, 2013–2017

The Centers for Disease Control and Prevention (CDC) spend significant time and resources to track influenza vaccination coverage each influenza season using national surveys. Emerging data from social media provide an alternative solution to surveillance at both national and local levels of influenza vaccination coverage in near real time.
Political Communication logo

Identifying Media Effects Through Low-Cost, Multiwave Field Experiments

Field experiments are notoriously difficult to implement when studying media effects. They are often prohibitively expensive, require the cooperation of a nonacademic entity, and measure effects some time after exposure to treatment. In this article, we outline a design for low-cost, multiwave field experiments of media effects. Researchers can implement this design on their own and can control the timing of when they measure effects. We demonstrate the feasibility of the design with an application to the study of presidential debates.
European Journal of Communication logo

The disinformation order: Disruptive communication and the decline of democratic institutions

Many democratic nations are experiencing increased levels of false information circulating through social media and political websites that mimic journalism formats. In many cases, this disinformation is associated with the efforts of movements and parties on the radical right to mobilize supporters against centre parties and the mainstream press that carries their messages. The spread of disinformation can be traced to growing legitimacy problems in many democracies.