Some new research on Big Data, from different sources, sheds new light on issues raised in our previous article, The Big Data Debate: “N=All” or “Complete bollocks. Absolute nonsense.”

A November 27, 2014 ScienceDaily article, Social media data contain pitfalls for understanding human behavior, notes that “A growing number of academic researchers are mining social media data to learn about both online and offline human behavior. In recent years, studies have claimed the ability to predict everything from summer blockbusters to fluctuations in the stock market. But mounting evidence of flaws in many of these studies points to a need for researchers to be wary of serious pitfalls that arise when working with huge social media data sets.” The article quotes from a publication by Derek Ruths of McGill University in Montreal and Jürgen Pfeffer of Carnegie Mellon University in Pittsburgh, whose research highlights specific issues in using social media data sets and also provides strategies for addressing them.

Challenges include:

“Different social media platforms attract different users”

“Publicly available data feeds used in social media research don’t always provide an accurate representation of the platform’s overall data”

“The design of social media platforms can dictate how users behave and, therefore, what behavior can be measured.”

“Large numbers of spammers and bots, which masquerade as normal users on social media”

“Researchers often report results for groups of easy-to-classify users, topics, and events, making new methods seem more accurate than they actually are.”

The researchers compare the current situation with “The infamous Dewey Defeats Truman headline of 1948”, an early example of pollsters stumbling on a sampling pitfall. They note that “Rather than permanently discrediting the practice of polling, that glaring error led to today’s more sophisticated techniques, higher standards, and more accurate polls.” Of course, as our earlier article points out, a key factor in raising public confidence in polling came earlier, in 1936, when George Gallup’s poll successfully predicted the Roosevelt/Landon result based on a sample of 3,000, whereas The Literary Digest got it wrong based on a survey yielding 2.4 million returns. That article quotes from Big data: are we making a big mistake? by Tim Harford, in which he notes that “The big data craze threatens to be The Literary Digest all over again.”
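The point that a small representative sample can beat a huge unrepresentative one can be illustrated with a short simulation. The sketch below is only an illustration: the true level of support (55%) and the selection bias (supporters of candidate A being 0.6 times as likely to end up in the large sample) are made-up assumptions, not the historical figures, and the function name `simulate_poll` is hypothetical. Only the two sample sizes are taken from the text above.

```python
import random


def simulate_poll(true_support, sample_size, selection_bias, rng):
    """Estimate support for candidate A from a simulated poll.

    selection_bias is the relative chance that an A supporter ends up in
    the sample compared with a B supporter (1.0 means an unbiased sample).
    """
    # Share of A supporters among the people who actually get sampled.
    weighted_a = true_support * selection_bias
    p_sampled_a = weighted_a / (weighted_a + (1 - true_support))
    hits = sum(1 for _ in range(sample_size) if rng.random() < p_sampled_a)
    return hits / sample_size


rng = random.Random(1936)  # fixed seed so the run is repeatable
TRUE_SUPPORT = 0.55        # assumed true share for candidate A (hypothetical)

# Small sample, no selection bias.
small_estimate = simulate_poll(TRUE_SUPPORT, 3_000, 1.0, rng)

# Huge sample, but A supporters are under-represented (hypothetical bias).
large_estimate = simulate_poll(TRUE_SUPPORT, 2_400_000, 0.6, rng)

print(f"True support:            {TRUE_SUPPORT:.3f}")
print(f"3,000 unbiased sample:   {small_estimate:.3f}")
print(f"2.4 million biased poll: {large_estimate:.3f}")
```

Under these assumptions the small sample lands close to the true figure and calls the winner correctly, while the large sample is very precise about the wrong answer: adding more respondents shrinks the random error but does nothing about the bias in who responds.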

The researchers have good reason to be upbeat in the conclusion that “By tackling the issues we face, we’ll be able to realize the tremendous potential for good promised by social media-based research.” However, some 2014 research on the political use of social media indicates the potential for voters to be misled. In an October 21, 2014 analysis of a new Pew Research Center study, Political Polarization & Media Habits by Amy Mitchell, Jeffrey Gottfried, Jocelyn Kiley and Katerina Eva Matsa, the authors note that “When it comes to getting news about politics and government, liberals and conservatives inhabit different worlds. There is little overlap in the news sources they turn to and trust. And whether discussing politics online or with friends, they are more likely than others to interact with like-minded individuals.”

The Pew Research Journalism Project is part of a year-long effort to shed light on political polarization in America. It looks at the ways people get information about government and politics in three different settings: the news media, social media and the way people talk about politics with friends and family. The authors of the analysis note that “In all three areas, the study finds that those with the most consistent ideological views on the left and right have information streams that are distinct from those of individuals with more mixed political views – and very distinct from each other.”

The authors also warn that “These cleavages can be overstated. The study also suggests that in America today, it is virtually impossible to live in an ideological bubble. Most Americans rely on an array of outlets – with varying audience profiles – for political news. And many consistent conservatives and liberals hear dissenting political views in their everyday lives.”

However, the overall findings of the study suggest that consistent conservatives:

“Are tightly clustered around a single news source, far more than any other group in the survey, with 47% citing Fox News as their main source for news about government and politics.”

“Express greater distrust than trust of 24 of the 36 news sources measured in the survey. At the same time, fully 88% of consistent conservatives trust Fox News.”

“Are, when on Facebook, more likely than those in other ideological groups to hear political opinions that are in line with their own views.”

“Are more likely to have friends who share their own political views. Two-thirds (66%) say most of their close friends share their views on government and politics.”

By contrast, those with consistently liberal views:

“Are less unified in their media loyalty; they rely on a greater range of news outlets, including some – like NPR and the New York Times – that others use far less.”

“Express more trust than distrust of 28 of the 36 news outlets in the survey. NPR, PBS and the BBC are the most trusted news sources for consistent liberals.”

“Are more likely than those in other ideological groups to block or ‘defriend’ someone on a social network – as well as to end a personal friendship – because of politics.”

“Are more likely to follow issue-based groups, rather than political parties or candidates, in their Facebook feeds.”

The analysis provides a lot more detail, including where those with down-the-line conservative and liberal views do share some common ground.

Another recent survey is reported in The London School of Economics and Political Science (LSE) blog Survey finds correlation between strength of scientists’ political beliefs and social media use for sharing research. Authors Sara K. Yeo, Michael A. Cacciatore, Dominique Brossard, Dietram A. Scheufele, and Michael A. Xenos conducted a survey of tenure-track scientists on their use of social media for science-related purposes, their attitudes toward such use, and their political ideology. They found that the stronger the scientists’ political beliefs, the more likely they were to use Facebook or Twitter to talk about their work.

The authors note that “Liberals tended to use Facebook more than conservatives, consistent with charges from the political right in the U.S. that Facebook has a liberal bias and is an echo chamber for left-leaning thinkers. Aside from political ideology, the perceived effectiveness and barriers to use of social media for science-related purposes predicted use of Twitter, but not Facebook.” They suggest that “One potential explanation for why Twitter seems to be the social medium of choice for scientists is that it appears to be viewed as a more professional outlet, while Facebook is more often perceived as a space for personal information. Scientists may also avoid Facebook as a tool for sharing research because of the emergence of a host of other social networks specifically tailored to researchers.”

The authors conclude that “as our data show, scientists are only beginning to get their feet wet in this new communication world. Given the controversial nature of many recent scientific debates, researchers will have to do much more to connect directly with public audiences.”

So while we can be optimistic about the “tremendous potential for good promised by social media-based research”, it appears that political bias may be one of the social media data pitfalls we have to be wary of when trying to understand human behaviour.