A Grid based solution for Management and Analysis of Microarrays in distributed experiments
Article
Publication date:
2007
Abstract:
Several systems have been presented in recent years to manage the complexity of large
microarray experiments. Although good results have been achieved, most systems fall short in
one or more areas. A Grid based approach may provide a shared, standardized and reliable solution
for storage and analysis of biological data, in order to maximize the results of experimental efforts.
A Grid framework has therefore been adopted, both because large amounts of distributed data
must be accessed remotely and because computational performance must scale to terabyte datasets.
Two different biological studies have been planned in order to highlight the benefits that can
emerge from our Grid based platform. The described environment relies on storage services and
computational services provided by the gLite Grid middleware. The Grid environment is also able
to exploit the added value of metadata, letting users better classify and search experiments.
A state-of-the-art Grid portal has been implemented to hide the complexity of the framework
from end users and to give them easy access to the available services and data. The functional
architecture of the portal is described. As a first test of system performance, a gene expression
analysis has been performed on a dataset of Affymetrix GeneChip® Rat Expression Array
RAE230A, from the ArrayExpress database. The analysis consists of three steps: (i) group
opening and image set uploading, (ii) normalization, and (iii) model based gene expression (based
on the PM/MM difference model). Two Linux versions (sequential and parallel) of the dChip
software have been developed to implement the analysis and have been tested on a cluster. The
results show that parallelizing the analysis process and executing parallel jobs
on distributed computational resources improves performance. Moreover, the Grid
environment has been tested both for uploading and accessing distributed
datasets through the Grid middleware and for its ability to manage the execution of jobs on
distributed computational resources. Results from the Grid test will be discussed in a further paper.
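As an illustration of step (iii), the sketch below fits a PM/MM difference model for a single probe set. The abstract only names the model, so this assumes the multiplicative form commonly associated with dChip, PM_ij - MM_ij = theta_i * phi_j + error, with the constraint that the squared probe sensitivities phi_j sum to the number of probe pairs. The function name, the alternating least squares fit and the example values are hypothetical and are not taken from the paper or from the dChip source.

```python
import math


def fit_pm_mm_difference(diffs, iterations=50):
    """Fit diffs[i][j] ~ theta[i] * phi[j] by alternating least squares.

    diffs[i][j] is the PM - MM difference for array i and probe pair j
    of one probe set. Returns (theta, phi), where theta[i] is the
    expression index of array i and phi[j] the sensitivity of probe j,
    scaled so that sum(phi[j] ** 2) equals the number of probe pairs.
    """
    n_arrays = len(diffs)
    n_probes = len(diffs[0])
    phi = [1.0] * n_probes  # starting point already satisfies the constraint

    def update_theta(phi):
        # Least squares slope through the origin, one per array.
        phi_sq = sum(p * p for p in phi)
        return [
            sum(row[j] * phi[j] for j in range(n_probes)) / phi_sq
            for row in diffs
        ]

    for _ in range(iterations):
        theta = update_theta(phi)
        theta_sq = sum(t * t for t in theta)
        if theta_sq == 0.0:
            break  # all differences are zero; nothing to fit
        # Least squares update of each probe sensitivity given theta.
        new_phi = [
            sum(diffs[i][j] * theta[i] for i in range(n_arrays)) / theta_sq
            for j in range(n_probes)
        ]
        norm = sum(p * p for p in new_phi)
        if norm == 0.0:
            break
        # Rescale so that the squared sensitivities sum to n_probes.
        scale = math.sqrt(n_probes / norm)
        phi = [p * scale for p in new_phi]

    # Final theta update so theta is consistent with the rescaled phi.
    return update_theta(phi), phi


# Hypothetical example: 3 arrays, 4 probe pairs, noise-free differences.
true_theta = [100.0, 200.0, 50.0]
raw_phi = [0.5, 1.0, 1.5, 2.0]
example_diffs = [[t * p for p in raw_phi] for t in true_theta]
theta, phi = fit_pm_mm_difference(example_diffs)
```

Each probe set is fitted independently of the others, which is one reason this step lends itself to the kind of parallel execution described above.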
CRIS type:
01.01 Journal article
Authors:
Viti, Federica; Scaglione, Silvia