Analysis and visualization of microarray experiment data integrating Pipeline Pilot, Spotfire and R

Vladimir Morozov, ALS Therapy Development Institute

2009 - Boston, MA

Abstract

More than 30 public and proprietary microarray experiments have been analyzed using in-house software. Pipeline Pilot workflows were developed to integrate the analysis results into the company's gene target Knowledge Sphere platform. The gene expression values are analyzed and plotted via the R connector and custom R scripts. Pipeline Pilot workflows are embedded as Spotfire guides to retrieve gene annotation from NCBI and to visualize differential expression statistics and biological pathway regulation.

R/Bioconductor pipeline

[Flow diagram] Affymetrix experiment data & annotation (AffyBatch and design) → Quality control (array quality, QC images) → Normalization (ExprSet; gene expression values stored on SQL server) → Modeling gene expression by biological variables (gene statistics) → Pathway analysis of the gene models (pathway statistics)

Outputs: R data files, images, tables, RDBMS

The data are available via the company portal

Access to experiment data set

Links call a Pipeline Pilot protocol

A PP protocol parses a directory with the experiment files and exposes them through a web page
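As an illustration of the directory-to-web-page step, here is a minimal Python sketch, not the Pipeline Pilot protocol itself. The function name build_index and the flat directory layout are assumptions made for the example; the actual protocol and file layout are not described on the slide.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_index(experiment_dir):
    """Return an HTML list with one link per file in experiment_dir.

    Hypothetical helper: assumes all experiment files sit directly in
    one directory and are served from the same relative location.
    """
    root = Path(experiment_dir)
    items = []
    for path in sorted(p for p in root.iterdir() if p.is_file()):
        href = quote(path.name)          # URL-encode the file name for the link
        label = html.escape(path.name)   # HTML-escape the visible text
        items.append(f'<li><a href="{href}">{label}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Calling build_index on an experiment directory returns a fragment that could be embedded in a portal page; subdirectories are skipped in this sketch.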

Volcano plots are generated automatically for all statistical comparisons in the design file
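A volcano plot places each gene at (log2 fold change, -log10 p-value). The sketch below computes those coordinates for every comparison; it is a Python illustration only, since the plots on the slides are drawn in Spotfire. The cutoffs, the input layout, and the names volcano_points and volcano_for_all are assumptions.

```python
import math


def volcano_points(stats, fc_cutoff=1.0, p_cutoff=0.05):
    """Convert (gene, log2_fold_change, p_value) rows to plot coordinates.

    Assumes every p-value is greater than zero. The cutoffs are
    illustrative defaults, not values taken from the presentation.
    """
    points = []
    for gene, log2_fc, p_value in stats:
        points.append({
            "gene": gene,
            "x": log2_fc,                  # horizontal axis: effect size
            "y": -math.log10(p_value),     # vertical axis: significance
            "significant": abs(log2_fc) >= fc_cutoff and p_value <= p_cutoff,
        })
    return points


def volcano_for_all(comparisons):
    """Apply volcano_points to each named comparison (hypothetical layout)."""
    return {name: volcano_points(rows) for name, rows in comparisons.items()}
```

Here comparisons would map a comparison name from the design file to its per-gene statistics, so one plot data set is produced per comparison.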

Under the hood:

The Guide page provides a link that starts the PP protocol

Modified Discngine connector used as components

Custom JavaScript using the Spotfire API

Parsing experiment design data

JavaScript scripts with JSON input are exposed on the Spotfire Guide panel
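The hand-off from a parsed design file to a JSON input for the Guide scripts could look roughly like the following Python sketch. The design file format is not shown on the slides, so the tab-separated layout with "sample" and "group" columns, the JSON structure, and the function name design_to_json are all assumptions.

```python
import csv
import io
import json


def design_to_json(design_text):
    """Group sample names by experimental group and serialize as JSON.

    Assumes a tab-separated design file with 'sample' and 'group'
    header columns (hypothetical format).
    """
    reader = csv.DictReader(io.StringIO(design_text), delimiter="\t")
    groups = {}
    for row in reader:
        groups.setdefault(row["group"], []).append(row["sample"])
    return json.dumps({"groups": groups}, indent=2, sort_keys=True)
```

A script on the Guide panel could then read this JSON to decide which samples belong to which comparison.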

Pathway analysis visualizations are generated via a similar Pipeline Pilot-JavaScript-Spotfire framework

Search for genes inside Spotfire visualization

Under the hood:

The user query is sent to NCBI ESearch via HTTP GET

(couldn’t get the web service to work in PP)
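A plain GET request to ESearch can be sketched as below. This is a Python illustration under assumptions, not the protocol used in the talk: the parameter choices (db=gene, retmax) and the function names are hypothetical, and the sample response in the usage note is hand-written with placeholder IDs.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="gene", retmax=20):
    """Build an ESearch GET URL for a free-text query (assumed parameters)."""
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


def parse_id_list(xml_text):
    """Extract the IDs listed under <IdList> in an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("./IdList/Id")]
```

For example, parse_id_list("<eSearchResult><IdList><Id>111</Id><Id>222</Id></IdList></eSearchResult>") returns the two placeholder IDs as strings. Fetching the URL, for instance with urllib.request.urlopen, is left out so the sketch runs without network access.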

The IDs are propagated to all mammalian ortholog IDs via the HomoloGene database on SQL Server. The PP “ODBC Select for Each Data” component is slow, so the join is done via a temporary text table and SQL BULK INSERT
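The idea of loading the query IDs into a temporary table and joining once, rather than issuing one SELECT per ID, can be sketched as follows. This Python example uses sqlite3 as a stand-in for SQL Server, with executemany in place of BULK INSERT; the table homologene(hid, tax_id, gene_id) and the function name orthologs are assumptions, and no filter for mammalian taxa is applied.

```python
import sqlite3


def orthologs(conn, gene_ids):
    """Return all gene IDs sharing a homology group with any query ID.

    Assumes a table homologene(hid, tax_id, gene_id), where hid is the
    homology group ID (hypothetical schema).
    """
    cur = conn.cursor()
    # Load every query ID in one batch instead of one SELECT per ID.
    cur.execute("CREATE TEMP TABLE query_genes (gene_id INTEGER PRIMARY KEY)")
    cur.executemany(
        "INSERT INTO query_genes VALUES (?)",
        [(g,) for g in sorted(set(gene_ids))],
    )
    # A single set-based join: query ID -> homology group -> group members.
    cur.execute(
        """
        SELECT DISTINCT h2.gene_id
        FROM query_genes q
        JOIN homologene h1 ON h1.gene_id = q.gene_id
        JOIN homologene h2 ON h2.hid = h1.hid
        ORDER BY h2.gene_id
        """
    )
    result = [row[0] for row in cur.fetchall()]
    cur.execute("DROP TABLE query_genes")
    return result
```

The single join lets the database process all IDs in one pass, which is the reason given on the slide for avoiding the per-record component.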

Entrez IDs extracted

The gene IDs are passed to the Spotfire Guide JavaScript via a JSON file

Analysis of individual gene expression

Call to the PP protocol with gene and experiment IDs as parameters

Opens the PP protocol that reads gene expression data from MS SQL Server

Under the hood:

Custom R scripts using the “pairwiseCI”, “Hmisc” and “ggplot2” R packages. The PP “Run R” component is modified to accept command-line arguments:

Expression values from SQL server

Experiment design file
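The two-argument interface can be illustrated with the sketch below. It is written in Python for illustration, whereas the scripts in the talk are in R (where commandArgs(trailingOnly = TRUE) would supply the arguments). The file formats (tab-separated, with "sample"/"group" and "sample"/"value" columns) and the per-group mean summary are assumptions, not the author's analysis.

```python
import argparse
import csv
from collections import defaultdict
from statistics import mean


def parse_args(argv=None):
    """Accept the two inputs named on the slide as positional arguments."""
    parser = argparse.ArgumentParser(description="Summarize one gene's expression")
    parser.add_argument("expression_file", help="expression values exported from SQL Server")
    parser.add_argument("design_file", help="experiment design file")
    return parser.parse_args(argv)


def group_means(expression_file, design_file):
    """Mean expression value per design group (hypothetical file formats)."""
    with open(design_file, newline="") as fh:
        group_of = {r["sample"]: r["group"] for r in csv.DictReader(fh, delimiter="\t")}
    values = defaultdict(list)
    with open(expression_file, newline="") as fh:
        for r in csv.DictReader(fh, delimiter="\t"):
            values[group_of[r["sample"]]].append(float(r["value"]))
    return {group: mean(vals) for group, vals in values.items()}


def main(argv=None):
    args = parse_args(argv)
    for group, avg in sorted(group_means(args.expression_file, args.design_file).items()):
        print(f"{group}\t{avg:.3f}")
```

A wrapper component would call main with the two file paths; confidence intervals and plots, which the talk delegates to the R packages, are outside this sketch.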

Acknowledgments

• Shawn Sullivan & Bashar Alnakhala (ALS-TDI) for providing SQL Server storage and the Web front-end

• Eric Le Roux (Discngine) for the Spotfire connector

• All ALS-TDI scientists for feedback
