Long-Term Influenza Outbreak Forecast Using Time-Precedence Correlation of Web Data

Influenza causes many deaths every year and poses a serious threat to human health. To support effective prevention, traditional national-scale statistical surveillance systems have been developed, and numerous studies have sought to predict influenza outbreaks using web data. Most of these studies capture only short-term signs of an outbreak, such as one-week-ahead prediction based on the characteristics of web data uploaded in real time; however, long-term predictions, on the order of 2-10 weeks ahead, are required to cope effectively with influenza outbreaks. In this study, we determined that web data uploaded in real time have a time-precedence relationship with influenza outbreaks. For example, a few weeks before an influenza pandemic, the word "colds" appears frequently in web data. The web data that follow the appearance of the word "colds" can therefore serve as information for forecasting future influenza outbreaks, which can improve the accuracy of long-term influenza prediction. We propose a novel long-term influenza outbreak forecast model that exploits the time precedence between the emergence of web data and an influenza outbreak. Based on the proposed model, we conducted experiments on: 1) selecting suitable web data for long-term influenza prediction; 2) determining whether the proposed model is regionally dependent; and 3) evaluating the accuracy according to the prediction timeframe. The proposed model achieved a correlation of 0.87 for long-term prediction ten weeks ahead, significantly outperforming other state-of-the-art methods.
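
The notion of time precedence can be illustrated with a lagged correlation: the weekly frequency of a keyword is compared against the influenza series shifted forward by a candidate number of weeks, and the shift with the highest correlation is taken as the estimated lead time. The following is a minimal sketch under stated assumptions, not the model proposed in this study; the function names (`pearson`, `lagged_correlation`, `best_lead`) and the two weekly series are hypothetical and synthetic, chosen only to show the idea.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def lagged_correlation(web, flu, lag):
    """Correlate web[t] with flu[t + lag], i.e. web data leading influenza by `lag` weeks."""
    if len(web) != len(flu):
        raise ValueError("series must have the same length")
    if lag < 0 or len(web) - lag < 2:
        raise ValueError("lag must leave at least two overlapping weeks")
    if lag == 0:
        return pearson(web, flu)
    return pearson(web[:-lag], flu[lag:])


def best_lead(web, flu, max_lag):
    """Return the lag (in weeks) with the highest correlation, plus all scores."""
    scores = {lag: lagged_correlation(web, flu, lag) for lag in range(max_lag + 1)}
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic weekly series (assumption): the keyword count peaks three weeks
# before the influenza incidence does.
flu = [0, 0, 0, 0, 0, 1, 3, 6, 3, 1, 0, 0, 0, 0, 0, 0]
web = flu[3:] + [0, 0, 0]

best, scores = best_lead(web, flu, max_lag=6)
print("estimated lead (weeks):", best)
```

In such a sketch, a keyword whose best lag is several weeks and whose correlation at that lag is high would be a candidate input for long-term forecasting, whereas a keyword that correlates best at lag zero would carry only short-term information.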