A NOVEL GENERAL-PURPOSE RNA-SEQ PROTOCOL OPTIMIZING THE DETECTION OF TRANSCRIPTOME EXPRESSION COMPLEXITY

Abstract

Publication Date:

2012

Abstract:

Recent studies have demonstrated an unexpected complexity of transcription in eukaryotes. Indeed, the majority of the genome is transcribed, and only a small fraction of these transcripts is annotated as protein-coding genes and their splice variants. High-throughput transcriptome sequencing therefore continues to identify novel RNAs and novel classes of RNAs resulting from antisense, overlapping and non-coding RNA expression, demonstrating that the transcriptome captures a level of complexity that the genome sequence alone may not (1). Among next-generation sequencing (NGS) platforms, the latest Roche 454 GS sequencer, the GS FLX Titanium FLX+, yields over a million reads per run, each up to 700 bases long. Reads of this length provide connectivity information among splice sites and, in addition to enabling accurate mapping and relative quantification of mRNAs, are particularly suitable for characterizing full-length splicing variants that may be differentially expressed in physiopathological conditions (2). On the other hand, the higher throughput of the Illumina HiSeq 1000 (150 bp) and ABI SOLiD (75 bp) platforms makes them particularly suitable for transcript-level quantification and for small RNA sequencing. Irrespective of the NGS platform used, the first step in transcriptome sequencing is the construction of a cDNA library. Several protocols have been developed for this purpose, but each is suitable for sequencing on one specific platform only. Here we describe a new, fast and simple method (patent pending RM2010A000293 - PCT/IB2011/052369) to prepare and amplify a representative, strand-specific cDNA library from low-input total RNA (500 ng) for RNA-Seq applications, which can be used with all major platforms currently available (Roche 454, Illumina, ABI SOLiD).
Our method includes the following steps: a) rRNA removal from total RNA; b) reverse transcription of the rRNA-depleted RNA to cDNA with custom-designed, 5'-phosphorylated Tag-random-octamers capable of preserving strand information; c) purification of the single-stranded cDNAs; d) ligation and amplification of the purified cDNAs, giving a high yield of concatemers around 20 kb long. These DNA molecules can be sequenced equally well on Illumina and Roche 454 platforms, allowing both quantitative and qualitative assessment of transcriptome complexity. Moreover, we developed a bioinformatic pipeline for analyzing the sequences produced with this protocol. In particular, we developed an in-house Python script, named Tag_Find (available upon request), that recognizes the position and type of each tag found within a read sequence. The program outputs two files: one listing the types of tags found and their positions in the reads, and one FASTQ file containing the reads with the tags removed. The efficiency of Tag_Find was tested on an artificial dataset of 454 reads constructed to mimic the specific structure of the cDNA libraries used in this experiment. All reads obtained after tag removal were mapped onto the hg18/NCBI36 release of the human genome using the NGS-Trex system (http://www.ngstrex.org/) with a user-defined preset of parameters. The edgeR (3) and goseq (4) Bioconductor packages were used for differential gene expression analysis of the reads mapped to genic regions. For validation purposes, we tested the efficiency of this strategy by analyzing the transcriptome of two xenograft tumor masses derived from the injection into nude mice of an osteosarcoma cell line (OSC) with a nearly homoplasmic mitochondrial Complex I disruptive mutation (m.3571insC) in the MT-ND1 gene. The xenografts shared the same nuclear genome, but c
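
The tag-recognition step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Tag_Find script, which is available only on request. The tag names and sequences, the function names, exact-match searching, and the choice to split each read into tag-free fragments are all assumptions made for illustration; the real tag sequences and matching rules are not given in the abstract.

```python
# Illustrative sketch only: NOT the authors' Tag_Find script.
# Tag names and sequences below are hypothetical placeholders.
TAGS = {
    "tagA": "ACGTTGCA",
    "tagB": "GGATCCTA",
}


def find_tags(seq, tags=TAGS):
    """Return a sorted list of (start, tag_name) for each exact tag match in seq."""
    hits = []
    for name, tag in tags.items():
        start = seq.find(tag)
        while start != -1:
            hits.append((start, name))
            start = seq.find(tag, start + 1)
    return sorted(hits)


def split_read(seq, qual, tags=TAGS):
    """Cut a read at each tag; return (hits, fragments).

    fragments is a list of (sequence, quality) pairs with tag bases removed.
    Splitting at tags is an assumption about how tag-free reads are produced.
    """
    hits = find_tags(seq, tags)
    fragments = []
    cursor = 0
    for start, name in hits:
        if start > cursor:
            fragments.append((seq[cursor:start], qual[cursor:start]))
        cursor = max(cursor, start + len(tags[name]))
    if cursor < len(seq):
        fragments.append((seq[cursor:], qual[cursor:]))
    return hits, fragments


def process_fastq(lines):
    """Take FASTQ lines (4 per record); return (tag_report, cleaned_fastq).

    tag_report holds tab-separated "read_id, tag, position" lines;
    cleaned_fastq holds FASTQ lines for the tag-free fragments.
    """
    report, cleaned = [], []
    for i in range(0, len(lines) - 3, 4):
        read_id = lines[i].strip()[1:]  # drop the leading '@'
        seq = lines[i + 1].strip()
        qual = lines[i + 3].strip()
        hits, fragments = split_read(seq, qual)
        for start, name in hits:
            report.append(f"{read_id}\t{name}\t{start}")
        for n, (frag_seq, frag_qual) in enumerate(fragments, 1):
            cleaned.extend([f"@{read_id}/{n}", frag_seq, "+", frag_qual])
    return report, cleaned
```

Calling `process_fastq` on the lines of a FASTQ file returns the two outputs that the abstract describes as separate files: a tag report and a set of tag-free reads. A production version would also need to tolerate sequencing errors within the tags, which exact string matching does not.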

Iris type:

04.02 Abstract in conference proceedings

Keywords:

NGS; 454 ROCHE; Genomics; Epigenomics; Transcriptomics

List of contributors:

Caratozzolo, Mariano Francesco; Paluscio, Annamaria; Marzano, Flaviana; Tullo, Apollonia; D'Elia, Domenica; Licciulli, Vito Flavio; Pesole, Graziano; Liuni, Sabino; Manzari, Caterina; Sbisa', Elisabetta

Authors of the University:

CARATOZZOLO MARIANO FRANCESCO

D'ELIA DOMENICA

LICCIULLI VITO FLAVIO

MANZARI CATERINA

MARZANO FLAVIANA

PESOLE GRAZIANO

SBISA' ELISABETTA

TULLO APOLLONIA

Handle:

https://iris.cnr.it/handle/20.500.14243/310843