Background: Influenza A viruses (IAV) are highly mutable, have great zoonotic potential to infect avian and mammalian hosts, and have been responsible for several pandemics. A key computational problem in influenza prevention and control is identifying molecular signatures with cross-species transmission potential. We propose an adjusted entropy-based method for identifying host-specific signatures; it uses a similarity coefficient to incorporate amino acid substitution information and thereby improve identification performance. Mutations in the polymerase genes (e.g., PB2) are known to play a major role in the adaptation of avian influenza viruses to mammalian hosts. We therefore focus on PB2 protein sequences and identify host-specific PB2 amino acid signatures.
Results: In validation on a set of H5N1 PB2 sequences from 1996 to 2006, adjusted entropy has a 40% false negative rate, compared with 60% for unadjusted entropy. In simulations across different levels of sequence divergence, adjusted entropy has a false negative rate no higher than 10%, whereas that of unadjusted entropy ranges from 9% to 100%. In addition, at every level of divergence the false positive rate of adjusted entropy never exceeds 9%. Adjusted entropy also identifies important H1N1pdm PB2 mutations, previously reported in the literature, that explain changes in divergence between 2008 and 2009 and that unadjusted entropy could not identify.
Conclusions: These results indicate that adjusted entropy provides a reliable and widely applicable approach to host signature identification that is useful for IAV monitoring and vaccine development.