Publication Date:
2013
Abstract:
In this work we introduce a novel alignment-free genomic classification approach based on probabilistic topic modeling. Using a k-mer decomposition of DNA sequences (a k-mer being a short fragment of length k) and the latent Dirichlet allocation (LDA) algorithm, we built a classifier for bacterial 16S rRNA gene sequences.
We tested our method with a ten-fold cross-validation procedure on a bacterial dataset of 3000 elements belonging to the most numerous bacterial phyla: Actinobacteria, Firmicutes and Proteobacteria.
Our results, in terms of precision scores and for different numbers of topics, range from 100% at class level to 77% at genus level, considering k-mers of length 8. These results demonstrate the effectiveness of our approach; as future work, we plan to tune our methodology to improve classification results at genus level by implementing a consensus mechanism.
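To illustrate the k-mer decomposition mentioned above, the following is a minimal Python sketch that turns a DNA sequence into k-mer counts and into a space-separated "document" of k-mer "words", the kind of input a topic model such as LDA typically consumes. The function names `kmer_counts` and `to_document` are hypothetical, and the use of overlapping windows and the skipping of windows with non-ACGT symbols are assumptions; the abstract does not specify these details.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def kmer_counts(sequence, k=8):
    """Count overlapping k-mers, skipping windows with non-ACGT symbols.

    Overlapping windows and the ACGT filter are assumptions made for
    this sketch, not details taken from the abstract.
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID_BASES:
            counts[kmer] += 1
    return counts


def to_document(sequence, k=8):
    """Render a sequence as a space-separated string of k-mer 'words'."""
    seq = sequence.upper()
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))
```

For example, `kmer_counts("ACGTACGT", k=4)` counts the k-mer `ACGT` twice, because it occurs at both ends of the sequence. The default `k=8` matches the k-mer length reported in the abstract.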
Iris type:
04.01 Contribution in conference proceedings
Keywords:
Genomic classification; Alignment-free analysis; 16S rRNA; DNA k-mers; Topic modeling; LDA
List of contributors:
Rizzo, Riccardo; Urso, Alfonso; Fiannaca, Antonino; La Rosa, Massimo
Book title:
Computational Intelligence Methods for Bioinformatics and Biostatistics