Publication date:
2011
Abstract:
Background: Analysis of the human genome has revealed that as much as an order of magnitude more of the
genomic sequence is transcribed than accounted for by the predicted and characterized genes. A number of these
transcripts are alternatively spliced forms of known protein coding genes; however, it is becoming clear that many
of them do not necessarily correspond to a functional protein.
Results: In this study we analyze alternative splicing isoforms of human gene products that are unambiguously
identified by mass spectrometry and compare their properties with those of isoforms of the same genes for which
no peptide was found in publicly available mass spectrometry datasets. We analyze them in detail for the presence
of uninterrupted functional domains and active sites, and for the plausibility of their predicted structure. We report
how well each of these strategies and their combination can correctly identify translated isoforms and derive a
lower limit for their specificity, that is, their ability to correctly identify non-translated products.
Conclusions: The most effective strategy for correctly identifying translated products relies on the conservation of
active sites, but it can only be applied to a small fraction of isoforms, while a reasonably high coverage, sensitivity
and specificity can be achieved by analyzing the presence of non-truncated functional domains. Combining the
latter with an assessment of the plausibility of the modeled structure of the isoform increases both coverage and
specificity with a moderate cost in terms of sensitivity.
CRIS type:
01.01 Journal article
Keywords:
Alternative Splice; Pfam Domain; Negative Dataset; Protein coding
Author list:
LE PERA, Loredana
Link to full record:
Published in: