Data di Pubblicazione:
2013
Abstract:
Spaced seeds are used in approximate pattern matching algorithms to quickly discard regions where a match is not likely to occur. We propose a family of lossless spaced seeds based on Quadratic Residues modulo a prime number. Our seeds work with a threshold t>1 in the sense that two regions are considered similar only if the seed hits t times within the regions. We prove that, for any number of errors, our seeds have an exponentially smaller probability of producing false positive matches than any traditional seed using a threshold t=1. To establish our result we introduce a formal notion of selectivity that generalizes the concept of seed weight, and we relate it to the minimum coverage and to a new structural property defined in terms on seed rotations. This groundwork will be useful for further analysis on seeds with threshold and we use it to provide improved bounds for approximate matching with 2 or 3 errors. Our results show that the use of a single seed with a threshold t>1 should be considered as a possible alternative to single or multiple seeds with t=1. © 2013 Elsevier Inc. All rights reserved.
Tipologia CRIS:
01.01 Articolo in rivista
Keywords:
Approximate matching; Hamming distance; Homology search; Lossless filtration; Sequence comparison; Spaced seeds
Elenco autori:
Manzini, Giovanni
Link alla scheda completa:
Pubblicato in: