Skip to Main Content (Press Enter)

Logo CNR
  • ×
  • Home
  • People
  • Outputs
  • Organizations
  • Expertise & Skills

UNI-FIND
Logo CNR

|

UNI-FIND

cnr.it
  • ×
  • Home
  • People
  • Outputs
  • Organizations
  • Expertise & Skills
  1. Outputs

A k-mer-based barcode DNA classification methodology based on spectral representation and a neural gas network

Academic Article
Publication Date:
2015
abstract:
OBJECTIVES: In this paper, an alignment-free method for DNA barcode classification that is based on both a spectral representation and a neural gas network for unsupervised clustering is proposed. METHODS: In the proposed methodology, distinctive words are identified from a spectral representation of DNA sequences. A taxonomic classification of the DNA sequence is then performed using the sequence signature, i.e., the smallest set of k-mers that can assign a DNA sequence to its proper taxonomic category. Experiments were then performed to compare our method with other supervised machine learning classification algorithms, such as support vector machine, random forest, ripper, naïve Bayes, ridor, and classification tree, which also consider short DNA sequence fragments of 200 and 300 base pairs (bp). The experimental tests were conducted over 10 real barcode datasets belonging to different animal species, which were provided by the on-line resource "Barcode of Life Database". RESULTS: The experimental results showed that our k-mer-based approach is directly comparable, in terms of accuracy, recall and precision metrics, with the other classifiers when considering full-length sequences. In addition, we demonstrate the robustness of our method when a classification is performed task with a set of short DNA sequences that were randomly extracted from the original data. For example, the proposed method can reach the accuracy of 64.8% at the species level with 200-bp fragments. Under the same conditions, the best other classifier (random forest) reaches the accuracy of 20.9%. CONCLUSIONS: Our results indicate that we obtained a clear improvement over the other classifiers for the study of short DNA barcode sequence fragments.
Iris type:
01.01 Articolo in rivista
Keywords:
Alignment-free analysis; DNA barcode classification; k-Mer representation; Neural gas
List of contributors:
Rizzo, Riccardo; Urso, Alfonso; Fiannaca, Antonino; LA ROSA, Massimo
Authors of the University:
FIANNACA ANTONINO
LA ROSA MASSIMO
RIZZO RICCARDO
URSO ALFONSO
Handle:
https://iris.cnr.it/handle/20.500.14243/300847
Published in:
ARTIFICIAL INTELLIGENCE IN MEDICINE
Journal
  • Use of cookies

Powered by VIVO | Designed by Cineca | 26.5.0.0 | Sorgente dati: PREPROD (Ribaltamento disabilitato)