A NOVEL GENERAL-PURPOSE RNA-SEQ PROTOCOL OPTIMIZING THE DETECTION OF TRANSCRIPTOME EXPRESSION COMPLEXITY
Publication Date:
2012
abstract:
Recent studies have demonstrated an unexpected complexity of transcription in eukaryotes.
Indeed, the majority of the genome is transcribed, and only a small fraction of these transcripts is
annotated as protein-coding genes and their splice variants. High-throughput transcriptome
sequencing therefore continues to identify novel RNAs and novel classes of RNAs arising from
antisense, overlapping and non-coding RNA expression, demonstrating that the transcriptome
captures a level of complexity that the genome sequence alone may not (1).
Among next-generation sequencing platforms, the latest series of the Roche 454 GS Sequencer, the GS
FLX Titanium FLX+, yields over a million reads per run, each up to 700 bases long.
Reads of this length provide connectivity information among splice sites and, in addition
to enabling accurate mapping and relative quantification of mRNAs, are particularly suitable for the
characterization of full-length splice variants that may be differentially expressed in
physiopathological conditions (2). On the other hand, the higher throughput of the Illumina HiSeq
1000 (150 bp) and ABI SOLiD (75 bp) platforms makes them particularly suitable for
transcript-level quantification and for small RNA sequencing.
Irrespective of the NGS platform used, the first step in transcriptome sequencing is the
construction of a cDNA library. Several protocols have been developed for this purpose, and each
of them is suitable for sequencing on one specific platform only.
Here we describe a new, fast and simple method (patent pending, RM2010A000293-
PCT/IB2011/052369) to prepare and amplify a representative, strand-specific cDNA library
starting from a low input of total RNA (500 ng) for RNA-Seq applications, which may be implemented on
all major platforms currently available (Roche 454, Illumina, ABI SOLiD).
Our method includes the following steps: a) rRNA removal from total RNA; b) reverse transcription of
the rRNA-depleted RNA to cDNA with custom-designed, 5'-phosphorylated Tag-random-octamers
capable of preserving strand information; c) purification of the single-stranded cDNAs; d) ligation and
amplification of the purified cDNAs, giving a high yield of concatemers approximately 20 kb long.
These DNA molecules can be sequenced equally well on both the Illumina and Roche 454
platforms, allowing not only the quantitative but also the qualitative assessment of transcriptome
complexity.
Moreover, we developed a dedicated bioinformatic pipeline for the analysis of the sequences produced
with this protocol. Specifically, we developed an in-house Python script, named Tag_Find
(available upon request), that recognizes the position and the type of each tag found within a read
sequence. The program outputs two files: one listing the types of tags found and their positions in the
reads, and one FASTQ file containing the reads with the tags removed. The efficiency of Tag_Find was
tested on an artificial dataset of 454 reads, constructed to mimic the specific structure of the cDNA
libraries used in this experiment. All reads obtained after tag removal were mapped onto
the hg18/NCBI36 release of the human genome using the NGS-Trex system (http://www.ngstrex.org/)
with a user-defined preset of parameters. The edgeR (3) and goseq (4) Bioconductor packages
were used for differential gene expression analysis of the reads mapped to genes.
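The tag-recognition step described above might be sketched as follows. This is a minimal illustration, not the Tag_Find script itself: the tag sequences, the function names (`find_tags`, `split_on_tags`) and the use of exact string matching are all assumptions, since the abstract does not give the actual tags or the matching strategy. File output (the tag report and the cleaned FASTQ) is omitted for brevity.

```python
from typing import Dict, List, Tuple

# Hypothetical tag sequences for illustration only; the real tags used by
# the protocol are not stated in the abstract.
TAGS: Dict[str, str] = {
    "tagA": "ACGTTGCA",
    "tagB": "TTGGCCAA",
}

# (tag name, start, end) with 0-based start and exclusive end
Hit = Tuple[str, int, int]


def find_tags(seq: str, tags: Dict[str, str] = TAGS) -> List[Hit]:
    """Return every exact occurrence of each tag in seq, sorted by start."""
    hits: List[Hit] = []
    for name, tag in tags.items():
        start = seq.find(tag)
        while start != -1:
            hits.append((name, start, start + len(tag)))
            start = seq.find(tag, start + 1)
    return sorted(hits, key=lambda h: h[1])


def split_on_tags(seq: str, qual: str, hits: List[Hit]) -> List[Tuple[str, str]]:
    """Cut a read at each tag; return the tag-free (sequence, quality) pieces."""
    fragments: List[Tuple[str, str]] = []
    pos = 0
    for _, start, end in hits:
        if start > pos:
            fragments.append((seq[pos:start], qual[pos:start]))
        pos = max(pos, end)  # also handles overlapping tag hits
    if pos < len(seq):
        fragments.append((seq[pos:], qual[pos:]))
    return fragments


# Toy concatemer-like read: fragment, tagA, fragment, tagB, fragment
read_seq = "GGGG" + "ACGTTGCA" + "CCCCC" + "TTGGCCAA" + "AA"
read_qual = "I" * len(read_seq)

hits = find_tags(read_seq)
fragments = split_on_tags(read_seq, read_qual, hits)
```

Here `hits` records the type and position of each tag, and `fragments` holds the tag-free sub-reads that would be written to the cleaned FASTQ file. A production tool for 454 data would probably need to tolerate mismatches and indels in the tags, which exact matching does not.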
To validate this strategy, we analyzed the transcriptome of
two xenograft tumor masses derived from the injection into nude mice of an osteosarcoma cell line
(OSC) with a nearly homoplasmic, disruptive mitochondrial Complex I mutation (m.3571insC) in the
MT-ND1 gene. The xenografts shared the same nuclear genome, but c
Iris type:
04.02 Abstract in conference proceedings
Keywords:
NGS; 454 ROCHE; Genomics; Epigenomics; Transcriptomics