An influenza forecasting model that incorporates Twitter data could reduce the errors made by models that rely solely on historical surveillance records, according to a recent study.
Researchers examined outbreak surveillance data from the CDC's influenza-like illness surveillance network (ILINet), along with prediction models built on those data. They then combined ILINet data with a Twitter-based influenza surveillance system, previously developed by other researchers, that filters out tweets driven by media awareness campaigns and other confounders, to create a combined predictive model. For additional comparison, the researchers also collected data from Google Flu Trends, a surveillance system based on Google search volume.
Forecasts were made during the 2011-2012, 2012-2013 and 2013-2014 influenza seasons (Nov. 27-April 5) using week-by-week observational and historical data. The forecasts were compared with ILINet reports at two points: when the reports were first released, 1 week after data collection, and when the CDC released its more accurate final estimates weeks later.
“Our analysis is the first to systematically characterize the limitations of ILINet data,” the researchers wrote. “We have found that forecasting studies that use historical ILINet data must account for the fact that these data are often initially inaccurate and undergo frequent revision, effectively increasing the lag between data collection and the time that accurate numbers are available to health professionals.”
Researchers found that a model combining Twitter and historical data outperformed one that relied only on historical data. Compared with the initial ILINet reports, the Twitter model reduced nowcasting error by 29.6%; measured against the CDC's final estimates, the reduction shrank to 6.09%. The Twitter model was also consistently more accurate than the baseline when forecasting future outbreak activity: its 10-week forecasts had fewer errors than the baseline model's 4-week forecasts.
For estimates of current activity, Google Flu Trends reduced error relative to the baseline in only one influenza season, and it was outperformed when making predictions of future activity. These results conflict with those of previous studies, a discrepancy that could be explained by the inclusion of one season in which Google Flu Trends performed much worse than usual, the researchers wrote. The algorithm used by the service has since been updated.
“There are several benefits to using Twitter over [Google Flu Trends], including the ubiquity, openness, public availability and ease of use of Twitter data,” the researchers wrote. “These factors have led the wider academic community to focus on Twitter, especially in light of recent poor performance of [Google Flu Trends], and the attendant concerns about using metrics based on proprietary data and algorithms. As we collect additional years of tweets, we will be able to make broader claims about the relative utility of Google and Twitter data.”
Disclosure: One researcher reported support from a Microsoft Research PhD fellowship, and currently serves on the advisory board for Sickweather. Another reported receipt of compensation for talks and consultation from Directing Medicine, Progeny Systems and Sickweather.