Analysis and visualization of microarray experiment data integrating Pipeline Pilot, Spotfire and R
Analysis and visualization of microarray experiment data
integrating Pipeline Pilot, Spotfire and R
Vladimir Morozov, ALS Therapy Development Institute
2009 - Boston, MA
Abstract
More than 30 public and proprietary microarray experiments have been analyzed using in-house software. Pipeline Pilot workflows were developed to integrate the analysis results into the company's gene target Knowledge Sphere platform. The gene expression values are analyzed and plotted via the R connector and custom R scripts. Pipeline Pilot workflows are embedded as Spotfire guides to retrieve gene annotation from NCBI and to produce visualizations of differential expression statistics and biological pathway regulation.
R/Bioconductor pipeline
[Flow diagram. Box labels: Affymetrix experiment data & annotation; AffyBatch and design; Quality control (array quality, QC images); Normalization; ExprSet; Gene expression values stored on SQL server; Modeling gene expression by biological variables; Gene statistics; Pathway analysis of the gene models; Pathway statistics; R data files; Images; Tables; RDBMS]
The data are available via the company portal
Access to experiment data set
Links call a Pipeline Pilot protocol
The PP protocol parses a directory with the experiment files and exposes them through a web page
Automatically generates volcano plots for all statistical comparisons in the design file
Under the hood:
The Guide page points to the PP protocol to start
Modified Discngine connector as components
Custom JavaScript using the Spotfire API
Parsing experiment design data
JavaScript with JSON input is exposed on the Spotfire Guide panel
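As a rough illustration of the steps above, the sketch below enumerates the comparisons listed in a design file and computes volcano-plot coordinates (log2 fold change against -log10 p-value), then serializes them as JSON for a JavaScript consumer. It is written in Python rather than as a Pipeline Pilot protocol. The design-file layout (tab-separated columns `comparison`, `group_a`, `group_b`), the function names, the cutoffs and the toy values are all assumptions made for illustration; the real design file format is not described in these slides.

```python
import csv
import io
import json
import math


def read_comparisons(design_text):
    """Return the comparisons listed in a tab-separated design table.

    Assumed (hypothetical) columns: comparison, group_a, group_b.
    """
    reader = csv.DictReader(io.StringIO(design_text), delimiter="\t")
    return [dict(row) for row in reader]


def volcano_points(stats, fc_cutoff=1.0, p_cutoff=0.05):
    """Convert per-gene statistics into volcano-plot coordinates.

    stats: iterable of (gene, log2_fold_change, p_value) tuples.
    x is the log2 fold change, y is -log10(p). A gene is flagged only
    when it passes both cutoffs (the cutoff values are illustrative).
    """
    points = []
    for gene, log2fc, p in stats:
        points.append({
            "gene": gene,
            "x": log2fc,
            "y": -math.log10(p),
            "significant": abs(log2fc) >= fc_cutoff and p <= p_cutoff,
        })
    return points


# Toy inputs, not real experiment data.
design = "comparison\tgroup_a\tgroup_b\ntreated_vs_control\ttreated\tcontrol\n"
example_stats = [("GeneA", 2.0, 0.001), ("GeneB", 0.2, 0.5)]

comparisons = read_comparisons(design)

# One plot specification per comparison, serialized as JSON so that a
# script on the Guide panel could read it.
plot_specs = [
    {"comparison": c["comparison"], "points": volcano_points(example_stats)}
    for c in comparisons
]
payload = json.dumps(plot_specs)
```

Keeping the comparison list in the design file means a new contrast only needs a new row there; the plotting code does not change.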
Pathway analysis visualizations are generated via similar Pipeline Pilot-JavaScript-Spotfire framework
Search for genes inside Spotfire visualization
Under the hood:
User query sent to NCBI via GET ESearch
(couldn’t get the web service to work in PP)
Propagated to all mammalian ortholog IDs via the SQL Server HomoloGene database. The PP “ODBC Select for Each Data” component is slow, so the join is done via a temporary text table and SQL BULK INSERT
Entrez IDs extracted
The gene IDs are passed to the Spotfire Guide JavaScript via a JSON file
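The gene search steps above can be sketched roughly as follows: build an ESearch GET URL, expand the returned gene IDs to their orthologs with one set-based join against a temporary table, and serialize the result as JSON. This is a minimal sketch under stated assumptions, not the actual protocol. An in-memory SQLite database stands in for SQL Server, a batched insert stands in for BULK INSERT, and the `homologene` table layout, the `geneIds` key and the toy gene IDs are hypothetical. No network request is made; only the URL is constructed.

```python
import json
import sqlite3
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, retmax=100):
    """Build an E-utilities ESearch GET URL against the Gene database."""
    params = {"db": "gene", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


def expand_to_orthologs(conn, gene_ids):
    """Expand gene IDs to every member of their homology groups.

    The IDs are loaded into a temporary table in one batch and joined
    set-wise, instead of issuing one SELECT per ID. IDs that have no
    row in the homology table are dropped by the join.
    """
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE query_ids (gene_id INTEGER PRIMARY KEY)")
    cur.executemany(
        "INSERT OR IGNORE INTO query_ids VALUES (?)",
        [(g,) for g in gene_ids],
    )
    cur.execute(
        """
        SELECT DISTINCT h2.gene_id
        FROM query_ids q
        JOIN homologene h1 ON h1.gene_id = q.gene_id
        JOIN homologene h2 ON h2.hid = h1.hid
        ORDER BY h2.gene_id
        """
    )
    rows = [r[0] for r in cur.fetchall()]
    cur.execute("DROP TABLE query_ids")
    return rows


# Toy homology table (hypothetical layout): group ID, taxonomy ID, gene ID.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE homologene (hid INTEGER, tax_id INTEGER, gene_id INTEGER)")
conn.executemany(
    "INSERT INTO homologene VALUES (?, ?, ?)",
    [(1, 9606, 101), (1, 10090, 102), (1, 10116, 103),
     (2, 9606, 201), (2, 10090, 202)],
)

url = esearch_url("SOD1")
orthologs = expand_to_orthologs(conn, [101])

# JSON text that could be written to the file read by the Guide script.
guide_input = json.dumps({"geneIds": orthologs})
```

The point of the temporary table is the same as in the slide: a single join over a bulk-loaded ID list avoids one database round trip per gene.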
Analysis of individual gene expression
Call to the PP protocol with gene and experiment IDs as parameters
Opens the PP protocol that reads gene expression data from MS SQL Server
Under the hood:
Custom R scripts using the “pairwiseCI”, “Hmisc” and “ggplot2” R packages. The PP “Run R” component is modified to accept command-line arguments:
Expression values from SQL server
Experiment design file
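The two inputs listed above suggest a script interface along these lines. The sketch is in Python purely to illustrate the command-line contract; the actual scripts are written in R and use pairwiseCI, Hmisc and ggplot2, none of which is reproduced here. The option names `--expression` and `--design`, the file names and the tab-separated layouts are assumptions.

```python
import argparse
import csv
import io
from statistics import mean


def parse_args(argv):
    """Parse the two inputs the script is assumed to receive."""
    parser = argparse.ArgumentParser(description="Summarize expression of one gene")
    parser.add_argument("--expression", required=True,
                        help="file with expression values exported from SQL Server")
    parser.add_argument("--design", required=True,
                        help="experiment design file")
    return parser.parse_args(argv)


def group_means(expression_text, design_text):
    """Average expression per design group.

    Assumed (hypothetical) tab-separated layouts:
      expression: sample, value
      design:     sample, group
    """
    design_rows = csv.DictReader(io.StringIO(design_text), delimiter="\t")
    groups = {row["sample"]: row["group"] for row in design_rows}
    values = {}
    for row in csv.DictReader(io.StringIO(expression_text), delimiter="\t"):
        values.setdefault(groups[row["sample"]], []).append(float(row["value"]))
    return {group: mean(vals) for group, vals in values.items()}


# Toy invocation and toy data, for illustration only.
args = parse_args(["--expression", "expr.tsv", "--design", "design.tsv"])
means = group_means(
    "sample\tvalue\ns1\t8.0\ns2\t9.0\ns3\t5.0\n",
    "sample\tgroup\ns1\ttreated\ns2\ttreated\ns3\tcontrol\n",
)
```

Passing both files as arguments keeps the script independent of the calling protocol, so it can also be run by hand on exported data.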
Acknowledgments
• Shawn Sullivan & Bashar Alnakhala (ALS-TDI) for providing SQL Server storage and the Web front-end
• Eric Le Roux (Discngine) for the Spotfire connector
• All ALS-TDI scientists for feedback