Using Google search data may increase the accuracy of predictive models for real-time influenza monitoring, according to recently published data.
Tobias Preis, PhD, and Susannah Moat, PhD, of the business school at the University of Warwick, United Kingdom, created a disease reporting model based on historical influenza data collected from the CDC’s US Outpatient Influenza-like Illness Surveillance Network and weekly Google Flu Trends data on searches relating to influenza symptoms. They then compared their integrated model’s predictive accuracy with that of a model based solely on CDC data. Because the algorithm used by Google Flu Trends was revised near the end of the study period, which ran from Jan. 3, 2010, to Sept. 21, 2013, much of the data collected reflects the older search algorithm.
The mean absolute error of the researchers’ integrated model was lower than that of the baseline model both within the sample period (0.114 vs. 0.131) and when predicting illness outside the sample (0.133 vs. 0.162). By using search data, the integrated model reduced the mean absolute error of predictions by between 16% and 52.7%, depending on how many weeks of data were used to train the model.
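For readers who want to see how such figures relate, the following is a minimal Python sketch, not the researchers’ code. It shows how a mean absolute error is computed and how a relative reduction can be derived from the overall error values quoted above. The function names are hypothetical, chosen only for illustration, and the 16% to 52.7% range tied to the length of the training window is not reproduced here.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between reported and estimated values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_reduction(baseline_mae, integrated_mae):
    """Percentage by which the integrated model's error is below the baseline's."""
    return (baseline_mae - integrated_mae) / baseline_mae * 100


# Overall error values quoted in the article (baseline vs. integrated model).
in_sample = relative_reduction(0.131, 0.114)
out_of_sample = relative_reduction(0.162, 0.133)

print(f"In-sample reduction: {in_sample:.1f}%")
print(f"Out-of-sample reduction: {out_of_sample:.1f}%")
```

On these overall figures, the reduction works out to roughly 13% within the sample and roughly 18% for out-of-sample predictions.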
“Our results show that public health professionals can indeed use data on the number of Google searches for flu-related symptoms to improve their estimates of how many people have the flu right now, as long as their analysis takes simple precautions to allow for the fact that human behavior can change across time,” Preis said in a press release.
“It’s true that simply using the number of searches as an estimate of flu levels can result in misleading figures. However, simple models can be built to watch out for increases in searches that do not correspond to increases in reports of flu, and which use this information to improve upcoming estimates.”
Disclosure: The researchers report no relevant financial disclosures.