Genomic Sequence Classification using Probabilistic Topic Modeling

Conference proceedings paper
Publication date:
2013
Abstract:
In this work we introduce a novel alignment-free genomic classification approach based on probabilistic topic modeling. Using a k-mer decomposition of DNA sequences (k-mers are short fragments of length k) and the LDA algorithm, we built a classifier for 16S rRNA bacterial gene sequences. We tested our method with a ten-fold cross-validation procedure on a dataset of 3000 bacterial sequences belonging to the most numerous bacterial phyla: Actinobacteria, Firmicutes and Proteobacteria. Our results, in terms of precision scores and for different numbers of topics, range from 100% at class level to 77% at genus level, considering k-mers of length 8. These results demonstrate the effectiveness of our approach. As future work, we plan to tune our methodology to improve classification results at genus level by implementing a consensus mechanism.
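The k-mer decomposition mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe how the k-mers are extracted, so the overlapping sliding window with step 1 is an assumption, and the function names `kmers` and `kmer_counts` are hypothetical. The resulting counts give a bag-of-words view of a sequence, which is the kind of input a topic model such as LDA typically consumes; the LDA step itself is not shown.

```python
from collections import Counter


def kmers(sequence, k=8):
    """Return all overlapping k-mers of a DNA sequence.

    Assumption: sliding window with step 1. The source does not
    state the extraction scheme actually used.
    """
    seq = sequence.upper()
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(sequence, k=8):
    """Bag-of-words view of a sequence: k-mer -> occurrence count."""
    return Counter(kmers(sequence, k))


# Toy sequence for illustration only, not real 16S rRNA data.
example = "ACGTACGTAC"
print(kmers(example, k=8))
print(kmer_counts(example, k=4))
```

In this view each sequence plays the role of a document and each k-mer the role of a word, so a collection of sequences becomes a document-term count matrix.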
CRIS type:
04.01 Conference proceedings paper
Keywords:
Genomic classification; Alignment-free analysis; 16S rRNA; DNA k-mers; Topic modeling; LDA
Author list:
Rizzo, Riccardo; Urso, Alfonso; Fiannaca, Antonino; La Rosa, Massimo
Institution authors:
FIANNACA ANTONINO
LA ROSA MASSIMO
RIZZO RICCARDO
URSO ALFONSO
Link to the full record:
https://iris.cnr.it/handle/20.500.14243/278030
Book title:
Computational Intelligence Methods for Bioinformatics and Biostatistics