Perils of Data Without Context
Nick Bilton’s post on the New York Times Bits Blog (“Disruptions: Data Without Context Tells a Misleading Story”, February 24, 2013) highlights the broader risk of drawing conclusions from counts of occurrences (“looking only at the numbers”) instead of scoring intrinsic value. It’s a challenge facing anyone trying to sort quality from quantity, especially where data from social media is concerned: unlike the traditional Web, where (for example) the value of a backlink is easily determined, social media has no built-in quality measures that apply universally across all networks.
Bilton’s post was about Google Flu Trends, a tool Google uses to estimate how many people have influenza. He noted that Google calculates this by “… people’s location + flu-related search queries on Google + some really smart algorithms = the number of people with the flu in the United States.” During the flu season’s peak in mid-January, the algorithms estimated that nearly 11 percent of the U.S. population had influenza, which was almost double the estimate made by the Centers for Disease Control and Prevention, as quoted in the science journal Nature. Researchers put the mismatch down to the fact that, thanks to widespread media coverage of this year’s severe U.S. flu season, many people may have been discussing or searching for flu without actually having it, and it was these phantom occurrences that distorted the real count. As Declan Butler commented in the Nature article, social media helped news of the flu spread more quickly than the virus itself.
In other words, as Bilton concluded, “Google’s algorithm was looking only at the numbers, not at the context of the search results.” Anyone who’s ploughed through ten thousand verbatim postings from their social media monitoring service will recognize the problem of teasing out quality from quantity. In social media analytics, the signal-to-noise ratio is very low.