LAF Barcoding: classifying DNA Barcode multi-locus sequences with feature vectors and supervised approaches
Book chapter
Publication date:
2015
Abstract:
DNA barcodes, one or more very short gene sequences, have proven
effective for classifying a specimen to species. To handle this task in the plant and fungus
kingdoms, both multi-locus DNA barcode data and sequence analysis techniques are
required, which poses new challenges.
In this work, we describe LAF-BARCODING, a Logic Alignment Free technique that
counts the number of fixed-length substrings (k-mers) of the input sequences, represents
them in feature vectors, and classifies them through a rule-based approach in order to
specifically assign multi-locus DNA barcode sequences to their corresponding species.
We use LAF to classify several sets of DNA barcode sequences belonging to the
plant and fungus kingdoms, obtaining compact and meaningful classification models
(if-then rules) with high accuracy rates. In contrast to the widespread alignment-based
(e.g., character, tree, and similarity) methods, we highlight that LAF can be successfully
applied to multi-locus DNA barcode sequences.
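The k-mer counting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `kmer_feature_vector`, the default `k=2`, and the restriction to the `ACGT` alphabet are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product


def kmer_feature_vector(sequence, k=2, alphabet="ACGT"):
    """Count overlapping k-mers and return (kmer_names, counts).

    Hypothetical helper: name, default k, and alphabet are assumptions,
    not taken from the LAF-BARCODING software.
    """
    seq = sequence.upper()
    # Slide a window of length k over the sequence and tally each substring.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fix the feature order: every possible k-mer over the alphabet, in sorted order.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # k-mers containing symbols outside the alphabet (e.g. N) are left out of the vector.
    return kmers, [counts[m] for m in kmers]


kmers, vector = kmer_feature_vector("ACGTAC", k=2)
print(dict(zip(kmers, vector)))
```

Each sequence becomes a vector of fixed length (4^k entries for this alphabet) regardless of its own length, so no alignment is needed before handing the vectors to a supervised classifier.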
CRIS type:
02.01 Contribution in a volume (Chapter or Essay)
Keywords:
DNA Barcoding; alignment-free; classification; supervised machine learning.
Authors:
Weitschek, Emanuel; Fiscon, Giulia; Bertolazzi, Paola; Cestarelli, Valerio; Felici, Giovanni
Book title:
12th International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics