Alejandro Uscanga Junco, Lorena Díaz-González, Bla. K-FluDB: A Novel K-Mer Based Database for Enhanced Genomic Surveillance of Influenza A Viruses. Bioinformatics Advances, 2025
Motivation
Influenza A viruses frequently cause seasonal outbreaks and pandemics due to their genetic diversity and reassortment potential. Existing genomic surveillance tools are hampered by redundant databases, which delay subtype identification and obscure reassortment dynamics. K-FluDB, a novel k-mer-based database, addresses these issues by enhancing subtype identification, capturing genomic diversity, and assisting in the detection of reassortment events, which are critical for understanding viral evolution and improving proactive outbreak measures.
Results
K-FluDB provides a comprehensive pangenome for Influenza A, including complete and subtype-specific subsequences from 50 subtype combinations across all 18 hemagglutinin (HA) and 11 neuraminidase (NA) subtypes. Achieving 99.64% compression, K-FluDB eliminates redundancy while preserving essential information. Validation with real-world datasets showed high recovery indices (up to 96.24%) and correct subtype prediction ratios (exceeding 99% for HA and NA). K-FluDB also assists in the detection of reassortment events.
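To make the idea of k-mer-based redundancy removal concrete, the following is a minimal sketch, not the K-FluDB implementation. It assumes that redundancy can be measured as the fraction of overlapping k-mers that are duplicates across a set of sequences; the abstract does not state how the 99.64% compression figure is defined, so this metric, the function names, and the toy sequences are all hypothetical.

```python
from typing import Iterable, Iterator, Set


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq, uppercased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def unique_kmers(seqs: Iterable[str], k: int) -> Set[str]:
    """Collect the set of distinct k-mers seen across all sequences."""
    seen: Set[str] = set()
    for s in seqs:
        seen.update(kmers(s, k))
    return seen


def redundancy_removed(seqs: Iterable[str], k: int) -> float:
    """Fraction of k-mer occurrences that are duplicates.

    Assumed metric (not taken from the paper):
    1 - (distinct k-mers) / (total k-mer occurrences).
    """
    seqs = list(seqs)
    total = sum(max(len(s) - k + 1, 0) for s in seqs)
    if total == 0:
        return 0.0
    return 1.0 - len(unique_kmers(seqs, k)) / total


# Hypothetical toy input: three short, highly similar sequences.
toy = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
print(f"{redundancy_removed(toy, 4):.2%} of k-mer occurrences are redundant")
```

On highly similar inputs, such as many isolates of the same subtype, most k-mer occurrences are repeats, which is why storing only distinct k-mers can shrink a sequence collection substantially.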
Availability and implementation
Three versions of K-FluDB, optimized for read lengths of 75, 150, and 300 nucleotides, are freely available at https://zenodo.org/records/17203072, and the source code is available at https://github.com/usjunco/pangen.