Computational identification of physicochemical signatures for host tropism of influenza A virus

Avian influenza viruses from migratory birds have crossed host species barriers and infected various hosts, such as humans and swine. Epidemics and pandemics may occur when influenza viruses adapt to humans, causing deaths and enormous economic loss. The receptor-binding specificity of the virus is one of the key factors in the transmission of influenza viruses across species. Determining host tropism and understanding the underlying molecular properties would help identify the mechanisms by which zoonotic influenza viruses cross the species barrier and infect humans. In this study, we constructed computational models based on random forest to predict host tropism for human-adapted subtypes of influenza HA proteins. The feature vectors of the prediction models were generated from seven physicochemical properties of amino acids in influenza sequences from three major hosts. Feature aggregation and associative rules were then applied to select the top 20 features and extract host-associated physicochemical signatures from the combined model of nonspecific subtypes. The prediction model achieved high performance (Accuracy = 0.948, Precision = 0.954, MCC = 0.922). Support and confidence rates were calculated for the host class-association rules. The results indicated that secondary structure and normalized van der Waals volume were the more important physicochemical signatures in determining host tropism.
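
To illustrate how support and confidence rates are commonly defined for a class-association rule of the form "feature condition → host", the following minimal Python sketch computes both quantities on a toy data set. The records, the feature names (`secondary_structure`, `vdw_volume`), the discretized levels ("high", "low"), and the function `rule_support_confidence` are hypothetical and chosen only for illustration; they are not the features, thresholds, or implementation used in this study.

```python
from typing import Dict, List, Tuple

# Hypothetical record type: (discretized feature values, host label).
Record = Tuple[Dict[str, str], str]


def rule_support_confidence(
    records: List[Record], antecedent: Dict[str, str], host: str
) -> Tuple[float, float]:
    """Return (support, confidence) for the rule `antecedent -> host`.

    support    = fraction of all records matching antecedent AND host
    confidence = fraction of antecedent-matching records labeled host
    """
    if not records:
        raise ValueError("records must not be empty")
    # Host labels of the records that satisfy every antecedent condition.
    matched_labels = [
        label
        for features, label in records
        if all(features.get(k) == v for k, v in antecedent.items())
    ]
    n_both = sum(1 for label in matched_labels if label == host)
    support = n_both / len(records)
    confidence = n_both / len(matched_labels) if matched_labels else 0.0
    return support, confidence


# Toy data, invented for illustration only.
records: List[Record] = [
    ({"secondary_structure": "high", "vdw_volume": "low"}, "human"),
    ({"secondary_structure": "high", "vdw_volume": "low"}, "human"),
    ({"secondary_structure": "high", "vdw_volume": "high"}, "avian"),
    ({"secondary_structure": "low", "vdw_volume": "low"}, "swine"),
]

support, confidence = rule_support_confidence(
    records, {"secondary_structure": "high"}, "human"
)
```

In this toy example, two of the four records satisfy both the condition and the host label, and three records satisfy the condition, so the rule has a support of 2/4 and a confidence of 2/3.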