by Staff Writers
Houston TX (SPX) Mar 18, 2014
Numbers and data can be critical tools in bringing complex issues into crisp focus. The understanding of diseases, for example, benefits from algorithms that help monitor their spread. But without context, a number may just be a number, or worse, misleading.
"The Parable of Google Flu: Traps in Big Data Analysis," a study published in the journal Science and funded in part by a grant from the National Science Foundation, takes up that problem. Specifically, the authors examine Google's data-aggregating tool Google Flu Trends (GFT), which was designed to provide real-time monitoring of flu cases around the world based on Google searches that matched terms for flu-related activity.
"Google Flu Trends is an amazing piece of engineering and a very useful tool, but it also illustrates where 'big data' analysis can go wrong," said Ryan Kennedy, University of Houston political science professor. He and co-researchers David Lazer (Northeastern University/Harvard University), Alex Vespignani (Northeastern University) and Gary King (Harvard University) detail new research about the problematic use of big data from aggregators such as Google.
Even with modifications to the GFT over many years, the tool that set out to improve response to flu outbreaks has overestimated peak flu cases in the U.S. over the past two years.
"Many sources of 'big data' come from private companies, who, just like Google, are constantly changing their service in accordance with their business model," said Kennedy, who also teaches research methods and statistics for political scientists. "We need a better understanding of how this affects the data they produce; otherwise we run the risk of drawing incorrect conclusions and adopting improper policies."
According to the research, GFT overestimated the prevalence of flu in both the 2011-2012 and 2012-2013 seasons by more than 50 percent. Additionally, from August 2011 to September 2013, GFT over-predicted the prevalence of flu in 100 out of 108 weeks.
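To put those figures in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not the researchers' method: the function names are hypothetical, and the 1.5 and 1.0 values in the second calculation are made-up numbers chosen to show what a 50 percent overestimate looks like. Only the 100-of-108-weeks count comes from the study as reported here.

```python
def share_overpredicted(weeks_over: int, weeks_total: int) -> float:
    """Fraction of weeks in which the estimate exceeded the observed level."""
    if weeks_total <= 0:
        raise ValueError("weeks_total must be positive")
    return weeks_over / weeks_total


def relative_overestimate(estimate: float, observed: float) -> float:
    """Overestimate expressed as a fraction of the observed value."""
    if observed <= 0:
        raise ValueError("observed must be positive")
    return (estimate - observed) / observed


# Figure reported in the article: over-prediction in 100 of 108 weeks.
share = share_overpredicted(100, 108)
print(f"Weeks over-predicted: {share:.1%}")

# Hypothetical values (not from the study): an estimate of 1.5 against an
# observed level of 1.0 is a 50 percent overestimate.
example = relative_overestimate(1.5, 1.0)
print(f"Example overestimate: {example:.0%}")
```

In other words, GFT ran high in roughly nine of every ten weeks during that period.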
The team also questions the reliability of data collected from platforms such as Twitter and Facebook (used, for example, to gauge polling trends and market popularity), since campaigns and companies can manipulate these platforms to ensure their products are trending.
Still, the article contends there is room for data from the Googles and Twitters of the Internet to combine with more traditional methodologies, in the name of creating a deeper and more accurate understanding of human behavior.
"Our analysis of Google Flu demonstrates that the best results come from combining information and techniques from both sources," Kennedy said. "Instead of talking about a 'big data revolution,' we should be discussing an 'all data revolution,' where new technologies and techniques allow us to do more and better analysis of all kinds."
University of Houston