Genomic sequence classification using probabilistic topic modeling

Book chapter
Publication date:
2014
Abstract:
Taxonomic classification of genomic sequences is usually based on evolutionary distance obtained by alignment. In this work we introduce a novel alignment-free classification approach based on probabilistic topic modeling. Using a k-mer (small fragments of length k) decomposition of DNA sequences and the Latent Dirichlet Allocation algorithm, we built a classifier for 16S rRNA bacterial gene sequences. We tested our method with a tenfold cross-validation procedure on a bacteria dataset of 3000 elements belonging to the most numerous bacteria phyla: Actinobacteria, Firmicutes and Proteobacteria. Experiments were carried out using complete and 400 bp long 16S sequences, in order to test the robustness of the proposed methodology. Our results, in terms of precision scores and for different numbers of topics, range from 100 % at class level to 77 % at genus level, for both full-length and 400 bp sequences, considering k-mers of length 8. These results demonstrate the effectiveness of the proposed approach. © 2014 Springer International Publishing Switzerland.
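The k-mer decomposition mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `kmer_counts`, the use of overlapping windows with step 1, and the choice to skip windows containing symbols other than A, C, G, T are all assumptions made for illustration. The resulting counts form a "bag of k-mers" that a topic model such as Latent Dirichlet Allocation could take as a document representation.

```python
from collections import Counter

def kmer_counts(sequence: str, k: int = 8) -> Counter:
    """Count overlapping k-mers in a DNA sequence.

    Assumptions (not stated in the abstract): windows slide one base
    at a time, and windows containing non-ACGT symbols are skipped.
    """
    seq = sequence.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:
            counts[kmer] += 1
    return counts

# Toy sequence (hypothetical, not taken from the dataset in the abstract).
toy = "ACGTACGTAC"
counts = kmer_counts(toy, k=8)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

With k = 8 there are at most 4^8 = 65536 distinct k-mers, so each sequence maps to a sparse count vector over a fixed vocabulary, in the same way a text document maps to word counts.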
CRIS type:
02.01 Contribution to a volume (chapter or essay)
Keywords:
16S rRNA; Alignment-free analysis; DNA k-mers; Genomic classification; LDA; Topic modeling
Author list:
Rizzo, Riccardo; Urso, Alfonso; Fiannaca, Antonino; LA ROSA, Massimo
Institutional authors:
FIANNACA ANTONINO
LA ROSA MASSIMO
RIZZO RICCARDO
URSO ALFONSO
Link to the full record:
https://iris.cnr.it/handle/20.500.14243/275669
Book title:
Computational Intelligence Methods for Bioinformatics and Biostatistics
General data

URL

http://www.scopus.com/record/display.url?eid=2-s2.0-84905394884&origin=inward