Objectives: Characterizing and predicting the evolutionary process of influenza remain challenging, yet both are important for capturing patterns of influenza activity and for developing prevention and control strategies. In this study, we quantified genetic mutation activity and developed a statistical model to predict the dominant influenza A serotype from limited sequencing data.
Data and methods: A total of 8097 A/H1N1 and 7090 A/H3N2 hemagglutinin (HA) sequences were collected from the 2008/09 to 2018/19 flu seasons in seven countries or regions. The g-measure, which reflects the overall level of genetic activity over time, was used to predict the dominant flu serotype in the population.
Results: The model discriminated the influenza serotypes well, with sensitivity = 0.84, precision = 0.79, and AUC = 0.78 (95% CI: 0.54-0.97), and explained 42% of the variability in serotypes (R² = 0.42).
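For readers less familiar with the metrics reported above, the following is a minimal sketch of how sensitivity, precision, and AUC are conventionally computed for a binary classifier. It is not the study's code: the function names, the 0.5 threshold, and the toy labels and scores are all hypothetical and serve only to illustrate the definitions.

```python
# Illustrative sketch only: standard definitions of the reported metrics,
# applied to made-up data (not the study's sequences or model output).

def sensitivity_precision(y_true, y_pred):
    """Return (sensitivity, precision) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, precision

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count one half), i.e. the Mann-Whitney formulation."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example: 1 = one serotype dominant in a season, 0 = the other.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.1]            # made-up predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

sens, prec = sensitivity_precision(y_true, y_pred)
print(f"sensitivity={sens:.2f} precision={prec:.2f} AUC={auc(y_true, scores):.2f}")
```

Sensitivity and precision depend on the chosen cut-off, whereas AUC summarizes discrimination across all cut-offs, which is why the three values are reported together.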
Conclusions: Our study suggests that the dominant flu serotype in a population can be well discriminated by the genetic mutation activity of sampled strains. With this data-driven computational framework, genetic mutations can be quantified to track genetic activity in real time and to provide early warning for the coming flu season.