A Web-based and Grid-enabled dChip version for the analysis of large sets of gene expression data
Article
Publication date:
2008
Abstract:
Background: Microarray techniques are among the main methods used to investigate thousands
of gene expression profiles in order to shed light on the complex biological processes responsible for serious
diseases, with great scientific impact and a wide application area. Several standalone applications
have been developed to analyze microarray data. Two of the best-known free analysis
software packages are the R-based Bioconductor and dChip. The part of the dChip software
that computes and analyzes gene expression has been modified to permit its
execution on both cluster environments (supercomputers) and Grid infrastructures (distributed
computing).
This work is not aimed at replacing existing tools, but it provides researchers with a method to
analyze large datasets without any hardware or software constraints.
Results: An application able to perform the computation and the analysis of gene expression on
large datasets has been developed using algorithms provided by dChip. Several tests have been
carried out to validate the results and to compare the performance obtained on different
infrastructures. Validation tests have been performed using a small dataset related to the
comparison of HUVEC (Human Umbilical Vein Endothelial Cells) and Fibroblasts, derived from
the same donors, treated with IFN-?.
Moreover, performance tests have been executed to compare performance across different
environments using a large dataset of about 1000 samples from Breast Cancer patients.
Conclusion: A Grid-enabled software application for the analysis of large microarray datasets has
been proposed. The dChip software has been ported to the Linux platform and modified, using
appropriate parallelization strategies, to permit its execution on both cluster environments and
Grid infrastructures. The added value provided by the use of Grid technologies is the possibility of
exploiting both computational and data Grid infrastructures to analyze large datasets of distributed
data. The software has been validated, and its performance on cluster and Grid environments has
been compared, showing good scalability.
CRIS type:
01.01 Journal article
Author list:
Scaglione, Silvia
Link to full record:
Published in: