Systems Biology Visualization Surabhi Agarwal There has been a rapid accumulation of data from...


Systems Biology Visualization

Surabhi Agarwal

There has been a rapid accumulation of data from protein interaction, gene expression and metabolic pathway analyses. To derive meaningful information from these data, we need to develop integrative visualization techniques that provide insight into their biological relevance.


Definition of the Problem

Static image: display the image and read the narration.

We will consider the case study of glioma, a group of brain tumors. In the first part of the animation, we gain insight into the regulation of genes in glioma through gene expression data analysis, which identifies the genes that are modulated (up- or down-regulated) in glioma. In the second part, we identify the metabolic pathways involved in glioma by studying protein interaction data. In the third part, we explore pathway databases and their features to study the pathways retrieved from the gene expression and protein interaction studies.


Master Layout (Part 1)

This animation consists of three parts:

Part 1: Gene Expression Data Analysis

Part 2: Protein Interaction Data Analysis

Part 3: Metabolic Profile Databases

http://www.genome.jp/kegg/

Choose the problem to study and extract the relevant data

Send the gene expression profile data as input to the tool

Compute the features related to gene regulation

Gene up- or down-regulation


Definitions of the components: Part 1 – Gene Expression Data Analysis

1. Interaction Data: Interaction data refers to information about the nature and type of relationships between biological components. It can be protein interaction data, gene expression data or metabolic pathway data.

2. Visualization tools: Software tools that read interaction data and represent it graphically, giving a simplified, visual insight into the biology, e.g. Cytoscape for protein interaction data and GeneSpring for gene expression data.

3. Microarray: Microarrays are printed on a solid surface, typically glass, and are used to analyze the expression of thousands of genes simultaneously in a high-throughput manner.


Gene Expression Profile Data – Option

INPUT OUTPUT

Option for the user to view Input or Output

The Data Generation box should be linked to Step 1, the Input box to the Step 2 input slides, the Output box to the Step 3 output slides, and the Visualization box to Step 4. This slide gives the user the option to go through only specific parts of the animation.

To view the protocol for submitting files, click on Input. To view the protocol for retrieving and analyzing output files, click on Output. To proceed to the full animation, click on the arrow.

Proceed to Full Animation

DATA GENERATION

VISUALIZATION


Step 1.a - Gene Expression Profile Data – Data Extraction from Experiments

Users can extract gene expression data from microarray experiments. Normalized microarray data give insight into the regulation of genes, which is examined by analyzing the data with gene expression profile analysis software. For a detailed look at the microarray technique, see the OSCAR animation on microarray technologies.

Schematic for extracting the data for defined problem

Follow the animation. Re-draw the figures.

Biological Samples e.g. gliomas

Microarray Chips

Scanned Slides

Biochemistry by A. L. Lehninger et al., 3rd edition


Step 1.b - Gene Expression Profile Data – Data Extraction from Databases

Users can obtain microarray data directly from experiments or from public repositories such as the GEO DataSets of NCBI. Major microarray research institutes also maintain dedicated databases for the microarray data generated in their labs. Because the files are large, these data are distributed as compressed files, which must be downloaded and stored on a local computer. As an example, we will study the regulation of genes in the brain tumor known as glioma. Gene expression data analysis will show us which genes are modulated (up- or down-regulated) in glioma.

Microarray Data Repository

Query term: High-Grade glioma

Columns: PMID | ACCESSION NUMBER | PROTEIN NAME | GLIOMA TYPE | VALIDATION | FOLD CHANGE | p-VALUE

Schematic for extracting the data for defined problem

Follow the animation and show the files being stored on the local system.

Microarray Data file

Input - Extracting microarray data

For analysis
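Repository downloads are usually gzip-compressed text files. As a minimal sketch of reading one after it has been stored locally, the Python below assumes a tab-delimited table with one header row and skips lines starting with `!` or `#` as metadata (assumed to follow the GEO series-matrix convention of `!`-prefixed lines). The file name and contents in the usage lines are made up for illustration.

```python
import csv
import gzip
import tempfile
from pathlib import Path


def read_gzipped_table(path):
    """Read a gzip-compressed, tab-delimited text file into a list of dicts.

    Lines starting with '!' or '#' are skipped as metadata (an assumption);
    the first remaining line is used as the header row.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        data_lines = (line for line in handle if not line.startswith(("!", "#")))
        return list(csv.DictReader(data_lines, delimiter="\t"))


# Usage with a small, made-up file written to the system temp directory.
demo_path = Path(tempfile.gettempdir()) / "demo_expression.txt.gz"
with gzip.open(demo_path, mode="wt", encoding="utf-8") as out:
    out.write("!Series_title\tdemo\n")
    out.write("ID_REF\tGSM_A\tGSM_B\n")
    out.write("probe_1\t5.2\t7.9\n")

rows = read_gzipped_table(demo_path)
print(rows[0]["ID_REF"], rows[0]["GSM_B"])  # probe_1 7.9
```

Reading through `gzip.open` in text mode avoids unpacking the whole archive to disk first, which matters for large microarray files.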


Step 2: Gene Expression Profile Data - Input


ADD PROJECT

ADD EXPERIMENT

UPLOAD DATA

SELECT PLATFORM

Name of the Project

Select Experimental Type

Agilent Single Color, Agilent Two Color, Affymetrix Copy Number, Affymetrix Expression, Illumina Association Analysis, Illumina Copy Number, Illumina Single Color, Real-Time PCR

Affymetrix Expression

Select Technology (if applicable)

Barley, Bovine, E. coli, B. subtilis, Drosophila, Human, Mouse, Maize

Human

Browse File Folder A/GSE123/GSM456.CEL

Glioma

The software guides the user through the input procedure step by step. The first steps are to add a new project and an experiment. When adding an experiment, the user must define the experiment type. Because microarray data are not stored in a standardized format, they are saved in various file formats such as CEL, GPR, GAL and CDT, and each tool supports one or more of these formats.

Schematic for entering data and setting parameters

Follow the animation and re-draw images to replicate the working of a software environment

In microarray experiments, the technology refers to the reference organism used to make the microarray chip.
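Because each tool accepts only some microarray file formats, checking the file extension is a natural first step before uploading. A minimal Python sketch, assuming the format can be inferred from the extension alone; the descriptions in `MICROARRAY_FORMATS` are short summaries, and `detect_format` and `supported_files` are hypothetical helper names, not functions of any specific tool.

```python
from pathlib import Path

# File extensions mentioned above, with short descriptions.
MICROARRAY_FORMATS = {
    ".cel": "Affymetrix cell intensity file (CEL)",
    ".gpr": "GenePix results file (GPR)",
    ".gal": "GenePix array list file (GAL)",
    ".cdt": "clustered data table (CDT)",
}


def detect_format(filename):
    """Return a description of the file's format, or None if the extension is unknown."""
    return MICROARRAY_FORMATS.get(Path(filename).suffix.lower())


def supported_files(filenames, accepted_suffixes):
    """Keep only files whose extension is in the set accepted by a given tool."""
    return [name for name in filenames if Path(name).suffix.lower() in accepted_suffixes]


print(detect_format("Folder A/GSE123/GSM456.CEL"))  # Affymetrix cell intensity file (CEL)
print(supported_files(["a.CEL", "b.gpr", "c.txt"], {".cel"}))  # ['a.CEL']
```

Lower-casing the suffix lets `GSM456.CEL` and `gsm456.cel` be treated the same way.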


Step 3.a - Gene Expression Profile Data - Output


Fold Change Cutoff: ≥ 8

Schematic for interpreting the results of Gene Expression Data Analysis

Show a simulation of the software. In each slide, the highlighted tab is ACTIVE. In the animation, a tab should be highlighted when it is clicked, followed by the content of the slide. The mouse should then move to the next tab and click on it, making the first tab inactive and the second tab active. Active and inactive tabs can be distinguished by different colors.

Summary Statistics

Filter data - Fold Change

Functional Analysis - GO

Heat Map

High cutoff to give significant results.

A high cutoff is used to obtain significant results. During comparison, probe sets whose fold change is 8 or more in at least one condition pair are displayed in the result. Regulation is reported as the ratio of condition 1 to condition 2. Thus, the highlighted gene HMGCS1 is up-regulated in sample GSM34580 compared with GSM34586.
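The fold-change filter can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's own algorithm: intensities are positive and on a linear (not log) scale, every pair of samples is compared, and a probe set is kept if its fold change reaches the cutoff in either direction for at least one pair. The intensities in the usage lines are made up.

```python
from itertools import combinations


def fold_change(a, b):
    """Return (fold change >= 1, direction) for intensity a relative to b.

    Assumes positive, linear-scale (not log-transformed) intensities.
    """
    ratio = a / b
    if ratio >= 1:
        return ratio, "up"
    return 1 / ratio, "down"


def filter_by_fold_change(expression, cutoff=8.0):
    """Keep probe sets whose fold change reaches the cutoff in at least one sample pair.

    expression maps probe set -> {sample ID: intensity}.
    Returns probe set -> list of (sample_1, sample_2, fold_change, direction),
    where direction describes sample_1 relative to sample_2.
    """
    hits = {}
    for probe, values in expression.items():
        for s1, s2 in combinations(sorted(values), 2):
            fc, direction = fold_change(values[s1], values[s2])
            if fc >= cutoff:
                hits.setdefault(probe, []).append((s1, s2, round(fc, 2), direction))
    return hits


# Made-up intensities for illustration only.
example = {
    "HMGCS1": {"GSM34580": 1200.0, "GSM34586": 100.0},
    "GENE_X": {"GSM34580": 300.0, "GSM34586": 250.0},
}
print(filter_by_fold_change(example))
# {'HMGCS1': [('GSM34580', 'GSM34586', 12.0, 'up')]}
```

Reporting the fold change as a value of at least 1 plus a direction keeps up- and down-regulation on the same scale, so a single cutoff applies to both.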


Step 3.b - Gene Expression Profile Data - Output

Schematic for interpreting the results of Gene Expression Data Analysis

The animator needs to re-draw all screenshots, as they have been taken from the reference software; the animator must not copy the images, or any part of them, into the final animation. Show a simulation of the software. In each slide, the highlighted tab is ACTIVE. In the animation, a tab should be highlighted when it is clicked, followed by the content of the slide. The mouse should then move to the next tab and click on it, making the first tab inactive and the second tab active. Active and inactive tabs can be distinguished by different colors.

Filter data - Fold Change

Functional Analysis - GO

Summary Statistics

Heat Map

The heat map is a graphical visualization of gene regulation, as determined by the fold-change cutoff provided by the user. Up-regulated genes are marked in red and down-regulated genes in blue, as explained in the figure legend.

upregulated

downregulated

Legend for color coding of regulation
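The red/blue coding can be illustrated with a small function that maps an expression ratio to a color. This Python sketch assumes the ratio is condition 1 over condition 2 on a linear scale. It leaves genes below the fold-change cutoff white and deepens the shade with |log2 ratio| up to an arbitrary saturation point (`max_log2`). Real tools use their own color scales.

```python
import math


def heatmap_color(ratio, cutoff=8.0, max_log2=5.0):
    """Map an expression ratio (condition 1 / condition 2) to an (R, G, B) color.

    Ratios at or beyond the fold-change cutoff become red (up) or blue (down);
    everything in between stays white. The shade deepens with |log2(ratio)|
    and saturates at max_log2, an arbitrary choice for this sketch.
    """
    log2_ratio = math.log2(ratio)
    if abs(log2_ratio) < math.log2(cutoff):
        return (255, 255, 255)  # not regulated beyond the cutoff
    depth = min(abs(log2_ratio), max_log2) / max_log2
    fade = round(255 * (1 - depth))
    if log2_ratio > 0:
        return (255, fade, fade)  # red: up-regulated
    return (fade, fade, 255)  # blue: down-regulated


for r in (32, 1 / 32, 2):
    print(r, heatmap_color(r))
```

Working on the log2 scale makes a 32-fold increase and a 32-fold decrease equally intense, in red and blue respectively.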


Step 3.c - Gene Expression Profile Data - Output


Schematic for interpreting the results of Gene Expression Data Analysis

Show a simulation of the software. In each slide, the highlighted tab is ACTIVE. In the animation, a tab should be highlighted when it is clicked, followed by the content of the slide. The mouse should then move to the next tab and click on it, making the first tab inactive and the second tab active. Active and inactive tabs can be distinguished by different colors.

Filter data - Fold Change

Functional Analysis - GO

Summary Statistics

Heat Map

The Summary Statistics tab gives a statistical overview of the genes that pass the cutoff specified to the gene expression analysis server. This includes the number of genes found to be regulated and the statistical significance of the corresponding fold changes.
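A minimal Python sketch of such a summary, assuming each screened gene arrives as a `(gene, fold_change, direction, p_value)` tuple and that significance means `p_value < 0.05`. Both the record layout and the threshold are assumptions for illustration, and the records in the usage lines are made up.

```python
from collections import Counter


def summarize(results, alpha=0.05):
    """Summarize screened genes.

    results: iterable of (gene, fold_change, direction, p_value) tuples, with
    direction "up" or "down". Returns how many genes passed the cutoff, how
    many went in each direction, and how many have p_value < alpha.
    """
    results = list(results)
    directions = Counter(direction for _, _, direction, _ in results)
    return {
        "regulated": len(results),
        "up": directions["up"],
        "down": directions["down"],
        "significant": sum(1 for *_, p_value in results if p_value < alpha),
    }


# Made-up records for illustration only.
screened = [
    ("HMGCS1", 12.0, "up", 0.01),
    ("GENE_Y", 9.5, "down", 0.20),
    ("GENE_Z", 8.1, "up", 0.03),
]
print(summarize(screened))
# {'regulated': 3, 'up': 2, 'down': 1, 'significant': 2}
```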


Step 3.d - Gene Expression Profile Data - Results

Schematic for interpreting the results of Gene Expression Data Analysis

Show a simulation of the software. In each slide, the highlighted tab is ACTIVE. In the animation, a tab should be highlighted when it is clicked, followed by the content of the slide. The mouse should then move to the next tab and click on it, making the first tab inactive and the second tab active. Active and inactive tabs can be distinguished by different colors.

Molecular Functions
1. catalytic activity
2. hydroxymethylglutaryl-CoA synthase activity
3. cytokine activity
4. protein binding
5. chemokine activity
6. G-protein-coupled receptor binding
7. signal transducer activity

Biological Functions
1. lipid metabolic process
2. fatty acid metabolic process
3. positive regulation of endothelial cell proliferation
4. angiogenesis
5. apoptosis
6. cell adhesion
7. response to hypoxia

Cellular Components affected
1. endoplasmic reticulum
2. extracellular region
3. soluble fraction
4. cytoplasm
5. membrane fraction

Filter data - Fold Change

Functional Analysis - GO

Summary Statistics

Heat Map

The Functional Analysis tool (based on the Gene Ontology, GO) lists the functions in which the regulated genes are involved at the molecular and biological levels, together with the cellular components they affect.
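How a given tool ranks GO terms is not described here. One common approach is an over-representation test. For each GO term, it computes the hypergeometric probability of seeing at least as many regulated genes annotated with that term as were observed. The Python sketch below implements that test with `math.comb`. The gene sets in the usage lines are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeometric_sf(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n draws)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / total


def go_enrichment(regulated, annotations, background_size):
    """Rank GO terms by over-representation among the regulated genes.

    regulated: set of regulated gene symbols (n draws).
    annotations: dict GO term -> set of background genes annotated with it (K).
    background_size: number of genes measured on the array (N).
    Returns (term, sorted overlapping genes, p-value) tuples, smallest p first.
    No multiple-testing correction is applied.
    """
    n = len(regulated)
    rows = []
    for term, genes in annotations.items():
        overlap = regulated & genes
        if overlap:
            p_value = hypergeometric_sf(len(overlap), n, len(genes), background_size)
            rows.append((term, sorted(overlap), p_value))
    return sorted(rows, key=lambda row: row[2])


# Hypothetical annotations on a 10-gene background.
annotations = {
    "lipid metabolic process": {"HMGCS1", "GENE_B"},
    "angiogenesis": {"GENE_C", "GENE_D", "GENE_E"},
}
print(go_enrichment({"HMGCS1", "GENE_B"}, annotations, background_size=10))
```

A small p-value means the term is annotated on more of the regulated genes than random sampling from the array would be expected to produce.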


Step 4 - Gene Expression Profile Data - Visualization

http://www.ingenuity.com/


Step 4 - Gene Expression Profile Data - Visualization

Static slide. The animator needs to re-draw all screenshots, as they have been taken from the reference software; the animator must not copy the images, or any part of them, into the final animation. Show the image with the audio narration, and show the zooming effect as in the animation.

Pathway information relevant to glioma can be extracted from the input data. Here we show the merged gene regulatory pathway, zoom into the pathway titled “Cell Cycle, Cellular Assembly and Organization, DNA Replication, Recombination, and Repair”, and look at the interactions of the TP53 pathway.


Master Layout (Part 2)

This animation consists of three parts:

Part 1: Gene Expression Data Analysis

Part 2: Protein Interaction Data Analysis

Part 3: Metabolic Profile Databases

Retrieve protein interaction data from experiments or public repositories

Input the data into the software tool in the correct format

View, download and interpret the results


Definitions of the components: Part 2 – Protein Interaction Data Analysis

1. Knowledgebase: Protein interaction network tools accept user data and map it to their own repository. This repository of a tool is called its knowledgebase.

2. Accession Number: The accession number of a protein is a unique identifier that acts as a common link between the data provided by the user and the tool's knowledgebase.

3. Protein microarray: These are miniaturized arrays, commonly printed on glass, polyacrylamide gel pads or microwells, onto which small quantities of thousands of proteins can be simultaneously immobilized for high-throughput assaying.


Protein Interaction Data – Option

INPUT OUTPUT

Option for the user to view Input or Output

The Data Generation box should be linked to Step 1, the Input box to the Step 2 input slides, the Output box to the Step 3 output slides, and the Visualization box to Step 4. This slide gives the user the option to go through only specific parts of the animation.

To view the protocol for submitting files, click on Input. To view the protocol for retrieving and analyzing output files, click on Output. To proceed to the full animation, click on the arrow.

Proceed to Full Animation

DATA GENERATION

VISUALIZATION


Step 1.a - Protein Molecular Interaction Network – Data Extraction

Protein Microarray Chips

Scanned Slides

Protein Samples

Users can extract protein microarray data from protein microarray experiments. Normalized protein microarray data give insight into the regulation of the proteins, which is examined by analyzing the data with protein interaction analysis software. For a detailed look at the microarray technique, see the OSCAR animation on microarray technologies.

Schematic for extracting the data for defined problem

Follow the animation. Re-draw the figures.


Step 1.b - Protein Molecular Interaction Network – Data Extraction

Protein molecular interaction software is used to build and analyze networks of proteins from their accession numbers. The networks are built by mapping the input data to the software's knowledgebase. Here, we use as an example a list of proteins modulated in glioma, extracted from:
1. literature resources
2. microarray databases

As output, we get a spreadsheet containing the extracted data.

Columns: PMID | ACCESSION NUMBER | PROTEIN NAME | GLIOMA TYPE | VALIDATION | FOLD CHANGE | p-VALUE

Extract Data from Literature sources and store it in a spreadsheet

Schematic for extracting the data for defined problem

The first panel shows information being extracted from web resources: show the required PDFs being downloaded and read to extract data. Follow this with a screenshot of microarray databases. At the end, show the “Rawdata.xls” file being created.

Rawdata.xls

Literature Resource

Query term: High-Grade glioma

Extract data from Microarray Data repositories


Step 1.c - Protein Molecular Interaction Network – Data Extraction

Protein molecular interaction software is used to build and analyze networks of proteins from their accession numbers. The networks are built by mapping the input data to the software's knowledgebase. Here, we use as an example a list of proteins modulated in glioma, extracted from literature resources or databases.

Columns: PMID | ACCESSION NUMBER | PROTEIN NAME | GLIOMA TYPE | VALIDATION | FOLD CHANGE | p-VALUE

Extract data from Microarray Data repositories

Schematic for extracting the data for defined problem

The first panel shows information being extracted from web resources: show the required PDFs being downloaded and read, and the specific data being stored in spreadsheets.

Rawdata.xls


Step 2.a - Protein Molecular Interaction Network – Input

CREATE PROJECT UPLOAD DATA MAP DATA

Enter Project Name

Enter Experiment Type

Biomarker Analysis, Core Analysis, Toxicology Analysis, Metabolic Analysis

Core Analysis

Schematic for Input

Show a simulation of the software. In each slide, the highlighted tab is ACTIVE. In the animation, a tab should be highlighted when it is clicked, followed by the content of the slide. The mouse should then move to the next tab and click on it, making the first tab inactive and the second tab active. Active and inactive tabs can be distinguished by different colors.

The user must enter the project and experiment names so that the software can save the current state of the work. Under experiment type, the user selects the type of analysis to be conducted on the dataset. For this glioma case study, we run a core analysis of the data to identify its network.

Project Glioma


Step 2.b - Protein Molecular Interaction Network – Input

CREATE PROJECT UPLOAD DATA MAP DATA

Upload Excel File Folder1/Rawdata.xls

PMID | Protein Name | Accession Number | Glioma Type
17653765 | Fructose bisphosphate aldolase | 78070601 | anaplastic oligodendroglioma
17653765 | Phosphoglycerate mutase 1 | 56081766 | anaplastic oligodendroglioma
17653765 | Carbonic anhydrase II | 443135 | anaplastic oligodendroglioma
- | Enolase 1 | 4503571 | Glioblastoma multiforme
- | Enolase | 693933 | Glioblastoma multiforme
- | a-Enolase like 1 | 3282243 | Glioblastoma multiforme
- | Enolase 1 | 4503571 | Glioblastoma multiforme
- | Aldolase C, fructose biphosphate | P09972 | glioblastoma, Grade II, III, IV
- | Enolase 1 | P06733 | glioblastoma, Grade II, III, IV
- | Enolase 2 | P09104 | glioblastoma, Grade II, III, IV
- | Glyceraldehyde-3-phosphate dehydrogenase, liver | P04406 | glioblastoma, Grade II, III, IV
- | Lactate dehydrogenase B | P07195 | glioblastoma, Grade II, III, IV
- | Phosphoglycerate kinase 1 | P00558 | glioblastoma, Grade II, III, IV
- | Phosphoglycerate mutase 1, brain | Q6P6D7 | glioblastoma, Grade II, III, IV
- | Pyruvate kinase, isozymes M1/M2 | P14618-2 | glioblastoma, Grade II, III, IV
- | Pyruvate kinase, isozymes M1/M2, splice isoform M1 | P14618 | glioblastoma, Grade II, III, IV
- | Triosephosphate isomerase | P60174 | glioblastoma, Grade II, III, IV
- | Pyruvate kinase | NI | Malignant Glioma
- | Glyceraldehyde 3-phosphate dehydrogenase | P04406 | Malignant Glioma
- | Triosephosphate isomerase | P60174 | Malignant Glioma
- | Enolase 1 | P06733 | Malignant Glioma
- | Aldolase A | NI | Malignant Glioma
19109410 | GAPDH | P16858 | Glioma grade III, IV
19109410 | Pyruvate kinase isozyme M1/M2 | P52480 | Glioma grade III, IV
19109410 | Alpha-Enolase | P17182 | Glioma grade III, IV
19109410 | Phosphoglycerate kinase 1 | P09411 | Glioma grade III, IV
19109410 | GAPDH | P16858 | Glioma grade III, IV

MENTION THE TYPE OF IDENTIFIER, SUCH AS UNIPROT, GENBANK ID, REFSEQ ID, ENTREZ GENE, ETC.


Step 2.c - Protein Molecular Interaction Network – Input

Schematic for Input

Show a simulation of the software. In each slide, the highlighted tab is ACTIVE. In the animation, a tab should be highlighted when it is clicked, followed by the content of the slide. The mouse should then move to the next tab and click on it, making the first tab inactive and the second tab active. Active and inactive tabs can be distinguished by different colors.

Upload the raw data file that was created after reviewing the papers. The required format of the raw data file varies between software tools. Although most tools recognize spreadsheet formats, some have their own specific input formats, such as the .sif (Simple Interaction Format) file for Cytoscape. Once the raw data file is uploaded, the tool displays all of its columns, and the user selects the columns to be passed to the tool. Of all the columns, the ACCESSION NUMBER (OR ANY OTHER PROTEIN IDENTIFIER) column is compulsory; it is highlighted in red. These identifiers can be of several types, and the type must be specified so that the tool can match the user's data to its dictionary of identifiers, called the knowledgebase. All other information is optional and can be provided depending on the nature of the analysis.
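The raw spreadsheet in this example mixes identifier types: numeric IDs, UniProt accessions (some with isoform suffixes such as P14618-2) and "NI" entries. The type therefore has to be determined before it can be declared to the tool. A minimal Python sketch follows. The UniProt pattern is the accession format published by UniProt. Treating purely numeric IDs as NCBI GI numbers, and "NI" as "not identified", are assumptions about this dataset.

```python
import re
from collections import Counter

# UniProtKB accession format published by UniProt, plus an optional
# isoform suffix such as "-2" (as in P14618-2).
UNIPROT = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})(?:-\d+)?$"
)
# Assumption for this dataset: purely numeric IDs are NCBI GI numbers.
NUMERIC = re.compile(r"^\d+$")


def identifier_type(accession):
    """Guess the identifier type of a single accession string."""
    value = accession.strip()
    if value.upper() == "NI":  # presumably "not identified" in the raw sheet
        return "missing"
    if UNIPROT.match(value):
        return "UniProt"
    if NUMERIC.match(value):
        return "GI number (assumed)"
    return "unknown"


def tally_identifier_types(accessions):
    """Count how many accessions of each guessed type a column contains."""
    return Counter(identifier_type(a) for a in accessions)


column = ["78070601", "443135", "P09972", "P14618-2", "Q6P6D7", "NI"]
print(tally_identifier_types(column))
```

A mixed tally like this one suggests either splitting the column by type or converting everything to one identifier scheme before upload.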


Step 2.d - Protein Molecular Interaction Network – Input

CREATE PROJECT UPLOAD DATA MAP DATA

PMID | Protein Name | Accession Number | Glioma Type
17653765 | Fructose bisphosphate aldolase | 78070601 | anaplastic oligodendroglioma
17653765 | Phosphoglycerate mutase 1 | 56081766 | anaplastic oligodendroglioma
17653765 | Carbonic anhydrase II | 443135 | anaplastic oligodendroglioma
- | Enolase 1 | 4503571 | Glioblastoma multiforme
- | Enolase | 693933 | Glioblastoma multiforme
- | a-Enolase like 1 | 3282243 | Glioblastoma multiforme
- | Enolase 1 | 4503571 | Glioblastoma multiforme
- | Aldolase C, fructose biphosphate | P09972 | glioblastoma, Grade II, III, IV
- | Enolase 1 | P06733 | glioblastoma, Grade II, III, IV
- | Enolase 2 | P09104 | glioblastoma, Grade II, III, IV
- | Glyceraldehyde-3-phosphate dehydrogenase, liver | P04406 | glioblastoma, Grade II, III, IV
- | Lactate dehydrogenase B | P07195 | glioblastoma, Grade II, III, IV
- | Phosphoglycerate kinase 1 | P00558 | glioblastoma, Grade II, III, IV
- | Phosphoglycerate mutase 1, brain | Q6P6D7 | glioblastoma, Grade II, III, IV
- | Pyruvate kinase, isozymes M1/M2 | P14618-2 | glioblastoma, Grade II, III, IV
- | Pyruvate kinase, isozymes M1/M2, splice isoform M1 | P14618 | glioblastoma, Grade II, III, IV
- | Triosephosphate isomerase | P60174 | glioblastoma, Grade II, III, IV
- | Pyruvate kinase | NI | Malignant Glioma
- | Glyceraldehyde 3-phosphate dehydrogenase | P04406 | Malignant Glioma
- | Triosephosphate isomerase | P60174 | Malignant Glioma
- | Enolase 1 | P06733 | Malignant Glioma
- | Aldolase A | NI | Malignant Glioma
19109410 | GAPDH | P16858 | Glioma grade III, IV
19109410 | Pyruvate kinase isozyme M1/M2 | P52480 | Glioma grade III, IV
19109410 | Alpha-Enolase | P17182 | Glioma grade III, IV
19109410 | Phosphoglycerate kinase 1 | P09411 | Glioma grade III, IV
19109410 | GAPDH | P16858 | Glioma grade III, IV


Step 2.d - Protein Molecular Interaction Network – Input

The input raw data are mapped to the software's knowledgebase to obtain a uniform set of IDs for building the network. IDs from the input file that do not match the knowledgebase are highlighted in red.

Schematic for Input

This file is the same as the input file; only the entries that are not mapped need to be highlighted in the animation.
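Mapping and flagging can be pictured as a dictionary lookup. In this Python sketch, the knowledgebase is a plain `dict` from accession to gene symbol, standing in for the tool's curated repository. The three entries in the usage lines are copied from the mapped output table, and `map_to_knowledgebase` is a hypothetical name.

```python
def map_to_knowledgebase(accessions, knowledgebase):
    """Split input IDs into mapped IDs and unmapped IDs.

    knowledgebase: dict accession -> gene symbol, a stand-in for the tool's
    curated repository. Unmapped IDs are the ones to highlight in red.
    """
    mapped, unmapped = {}, []
    for accession in accessions:
        symbol = knowledgebase.get(accession.strip())
        if symbol is None:
            unmapped.append(accession)
        else:
            mapped[accession] = symbol
    return mapped, unmapped


# Toy knowledgebase with three entries copied from the mapped output table.
toy_kb = {"78070601": "ALDOC", "P06733": "ENO1", "P09104": "ENO2"}
mapped, unmapped = map_to_knowledgebase(["78070601", "443135", "P06733", "NI"], toy_kb)
print(mapped)    # {'78070601': 'ALDOC', 'P06733': 'ENO1'}
print(unmapped)  # ['443135', 'NI']
```

In this toy knowledgebase, 443135 and NI have no entry, so they land in the list that would be highlighted in red. The gene symbols in `mapped` give the uniform naming used to build the network.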


Step 2.e - Protein Molecular Interaction Network – Input

CREATE PROJECT UPLOAD DATA MAP DATA

Data are mapped to the software's knowledgebase to produce the output files

ID | Gene | Description | Location | Family
78070601 | ALDOC* | aldolase C, fructose-bisphosphate | Cytoplasm | enzyme
56081766 | PGAM1* | phosphoglycerate mutase 1 (brain) | Cytoplasm | phosphatase
4503571 | ENO1* | enolase 1, (alpha) | Cytoplasm | transcription regulator
693933 | ENO1* | enolase 1, (alpha) | Cytoplasm | transcription regulator
P09972 | ALDOC* | aldolase C, fructose-bisphosphate | Cytoplasm | enzyme
P06733 | ENO1* | enolase 1, (alpha) | Cytoplasm | transcription regulator
P09104 | ENO2 | enolase 2 (gamma, neuronal) | Cytoplasm | enzyme
P04406 | GAPDH (includes EG:2597)* | glyceraldehyde-3-phosphate dehydrogenase | Cytoplasm | enzyme
P07195 | LDHB | lactate dehydrogenase B | Cytoplasm | enzyme
P00558 | PGK1* | phosphoglycerate kinase 1 | Cytoplasm | kinase
Q6P6D7 | PGAM1* | phosphoglycerate mutase 1 (brain) | Cytoplasm | phosphatase
P14618-2 | PKM2* | pyruvate kinase, muscle | Cytoplasm | kinase
P14618 | PKM2* | pyruvate kinase, muscle | Cytoplasm | kinase
P60174 | TPI1* | triosephosphate isomerase 1 | Cytoplasm | enzyme
P04406 | GAPDH (includes EG:2597)* | glyceraldehyde-3-phosphate dehydrogenase | Cytoplasm | enzyme
P60174 | TPI1* | triosephosphate isomerase 1 | Cytoplasm | enzyme
P06733 | ENO1* | enolase 1, (alpha) | Cytoplasm | transcription regulator
P16858 | GAPDH (includes EG:14433)* | glyceraldehyde-3-phosphate dehydrogenase | Plasma Membrane | enzyme
P52480 | PKM2* | pyruvate kinase, muscle | Cytoplasm | kinase
P17182 | ENO1* | enolase 1, (alpha) | Cytoplasm | transcription regulator


Step 2.e - Protein Molecular Interaction Network – Input

The tool also retrieves other relevant information corresponding to each ID from its knowledgebase. The uniform IDs and the new columns are displayed in a new spreadsheet containing the refined data. The columns highlighted in blue are newly added; the red column provides uniformity by applying a single naming scheme to the identifiers.

Schematic for Input

Show a simulation of the software. In each slide, the highlighted tab is ACTIVE. In the animation, a tab should be highlighted when it is clicked, followed by the content of the slide. The mouse should then move to the next tab and click on it, making the first tab inactive and the second tab active. Active and inactive tabs can be distinguished by different colors.


Step 3 - Protein Interaction Data Analysis - Output


BUILD PATHWAY OUTPUT NETWORK OUTPUT PATHWAY

TOP NETWORK FUNCTIONS
1. Genetic Disorder, Neurological Disease, Nucleic Acid Metabolism
2. Cell-To-Cell Signaling and Interaction, Nervous System Development and Function, Cellular Assembly and Organization
3. Cancer, Reproductive System Disease, Gastrointestinal Disease

TOP DISEASE NETWORK
1. Cancer
2. Gastrointestinal Disease
3. Neurological Disease

TOP PHYSIOLOGICAL NETWORK
1. Nervous System Development and Function
2. Hematological System Development and Function
3. Immune Cell Trafficking

TOP CANONICAL PATHWAY
1. Glycolysis/Gluconeogenesis
2. Mitochondrial Dysfunction
3. 14-3-3-mediated Signaling

The tool provides a summary of results showing the top networks in each category. The ranking is based on the number of molecules from the user's input dataset that map to the software's knowledgebase. The prediction of “Neurological Disease”, “Cancer” and “Nervous System Development and Function” among the top networks supports our data analysis. The analysis also shows that “Glycolysis/Gluconeogenesis” is the top pathway modulated by our list of proteins.

Schematic for Output summary

Follow the animation
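The ranking step can be sketched as counting, for each candidate network, how many of the user's mapped molecules it contains. This Python sketch uses only that count, with alphabetical tie-breaking. A real tool may combine it with its own statistical scoring, so treat this as an illustration of the idea, not of any specific product. Network names and memberships in the usage lines are hypothetical.

```python
def rank_networks(networks, user_molecules, top=3):
    """Rank candidate networks by how many of the user's molecules they contain.

    networks: dict network name -> set of molecules in that network.
    user_molecules: set of molecules mapped from the input dataset.
    Returns up to `top` (name, overlap) pairs, largest overlap first;
    ties are broken alphabetically so the order is deterministic.
    """
    scored = [(name, len(members & user_molecules)) for name, members in networks.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top]


# Hypothetical networks; memberships are invented for illustration.
user = {"ALDOC", "ENO1", "ENO2", "GAPDH", "LDHB", "PGAM1", "PGK1", "PKM2", "TPI1"}
candidates = {
    "network_A": {"ALDOC", "ENO1", "GAPDH", "PGK1", "TPI1", "OTHER_1"},
    "network_B": {"GAPDH", "OTHER_2", "OTHER_3"},
    "network_C": {"ENO1", "PKM2", "OTHER_4"},
}
print(rank_networks(candidates, user))
# [('network_A', 5), ('network_C', 2), ('network_B', 1)]
```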


Step 4.a -Protein Molecular Interaction Network - Output

Audio Narration


BUILD PATHWAY OUTPUT NETWORK OUTPUT PATHWAY

Action

Description of the action

Users can set parameters that define the number and size of the networks to be built. Users can also choose whether to include molecules other than genes, proteins or RNA. Molecules that have known relationships with other genes or proteins in the knowledgebase are mapped into the network. Repeated IDs point to the same node in the network.

Select the number of networks to be constructed: 1

Select the maximum number of molecules in the network: 70

Select endogenous chemicals: No
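The three settings above, and the rule that repeated IDs point to the same node, can be mimicked with a small preprocessing step. This is a sketch under stated assumptions: `NetworkSettings`, `prepare_nodes` and the molecule-type lookup are hypothetical names, not part of any tool's API, and the simple truncation to the maximum size stands in for whatever selection rule a real tool applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkSettings:
    """Hypothetical container for the three settings shown on the slide."""
    num_networks: int = 1  # consumed by the network-growing step, not here
    max_molecules: int = 70
    include_endogenous_chemicals: bool = False


def prepare_nodes(ids, molecule_type, settings):
    """Return the node list for network construction.

    ids: user input identifiers, possibly with repeats.
    molecule_type: hypothetical knowledgebase lookup, ID -> "gene", "protein",
        "rna" or "chemical"; IDs missing from it are treated as unmapped.
    """
    # Repeated IDs collapse to a single node (dict keeps first-seen order).
    unique_ids = list(dict.fromkeys(ids))
    allowed = {"gene", "protein", "rna"}
    if settings.include_endogenous_chemicals:
        allowed.add("chemical")
    # Unmapped IDs (molecule_type.get returns None) never match an allowed type.
    mapped = [i for i in unique_ids if molecule_type.get(i) in allowed]
    # Simplification: truncate in input order; a real tool may rank molecules differently.
    return mapped[: settings.max_molecules]
```

With the default settings, an endogenous chemical in the input is dropped; switching `include_endogenous_chemicals` to `True` keeps it as a node.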

Schematic for Output summary

Follow the animation

Page 30

Step 4.b - Protein Molecular Interaction Network - Output

Audio Narration


BUILD PATHWAY OUTPUT NETWORK OUTPUT PATHWAY

Action

Description of the action

Schematic for Output summary

Follow the animation. Highlight the yellow boxes in the animation as well.

From the input given by the user, the tool identifies the molecules that are present in its metabolic network database. The molecules found to occur most frequently are used as seeds, which are connected to other such molecules. Networks are also extended by merging two small networks that interact with each other into a larger network. This analysis depends on the parameters set by the user in the initial steps. Based on this information, the tool predicts the pathway to which the molecules most likely belong. These pathways can be analyzed further using metabolic profile databases.
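The seed-and-extend idea in the narration can be sketched as breadth-first growth from the most frequent input molecules, followed by merging of small networks that are joined by an interaction. This is an illustrative sketch only, not the algorithm of any particular tool: "frequency" is assumed to mean the number of interactions a molecule takes part in, and the two size limits `small_size` and `max_size` are hypothetical parameters.

```python
from collections import Counter, deque


def build_adjacency(interactions):
    """Undirected adjacency sets from (molecule_a, molecule_b) pairs."""
    adj = {}
    for a, b in interactions:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def seed_and_extend(interactions, focus, small_size, max_size):
    """Grow small networks from the most frequent focus molecules, then merge.

    interactions: hypothetical knowledgebase pairs.
    focus: user input molecules found in the database.
    small_size / max_size: hypothetical limits for seed and merged networks.
    """
    adj = build_adjacency(interactions)
    # "Frequency" = number of interactions a focus molecule takes part in.
    freq = Counter(m for pair in interactions for m in pair if m in focus)
    networks, assigned = [], set()
    for seed in sorted(freq, key=lambda m: (-freq[m], m)):
        if seed in assigned:
            continue
        net, queue = {seed}, deque([seed])
        # Breadth-first growth through other focus molecules only.
        while queue and len(net) < small_size:
            for nb in sorted(adj[queue.popleft()]):
                if len(net) >= small_size:
                    break
                if nb in focus and nb not in assigned and nb not in net:
                    net.add(nb)
                    queue.append(nb)
        assigned |= net
        networks.append(net)
    return merge_networks(networks, adj, max_size)


def merge_networks(networks, adj, max_size):
    """Repeatedly merge two networks joined by an interaction, within max_size."""
    changed = True
    while changed:
        changed = False
        for i in range(len(networks)):
            for j in range(i + 1, len(networks)):
                a, b = networks[i], networks[j]
                linked = any(adj[n] & b for n in a)
                if linked and len(a | b) <= max_size:
                    networks[i] = a | b
                    del networks[j]
                    changed = True
                    break
            if changed:
                break
    return networks
```

Raising `max_size` lets two adjacent small networks fuse into one larger network, which mirrors how the parameters chosen in the earlier steps change the output.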

Figure: schematic of network construction. Legend: Seed molecules; Molecular interaction; Another small interaction network; Network interaction. Nodes: α-D-Glucose-6P, β-D-Fructose-6P, β-D-Fructose-1,6-P2, Glyceraldehyde-3P, α-D-Glucose-1P, Glycerate-2P, Phosphoenolpyruvate, Glycerone-P, β-D-Glucose, α-D-Glucose. Connected pathways: Starch and sucrose metabolism; Pentose Phosphate Pathway.

Page 31

Step 4.c - Protein Molecular Interaction Network - Visualization


http://www.ingenuity.com/

Page 32

Step 3.d - Gene Expression Profile Data - Visualization

Audio Narration

http://www.ingenuity.com/

Action

Description of the action

Zoom effect. The animator needs to redraw all screenshots, as they have been taken from the reference software. The animator must not copy the image, or any part of it, in the animation. Show the image with each part zooming in and then appearing as a zoomed image.

Pathway information relevant to Glioma studies can be extracted from the input data. In this pathway, we can observe the role of isocitrate dehydrogenase (IDH) in the regulation of metabolism during Glioma. A recently published study has also shown the involvement of IDH in Glioma-related pathways. Most such software tools are linked to protein pathway databases, which are described in detail in the next part of the animation.

Page 33

Master Layout (Part 3)


This animation consists of 3 parts:
Part 1: Gene Expression Data Analysis
Part 2: Protein Interaction Data Analysis
Part 3: Metabolic Profile Databases

Select the level of organization of the biological system to study

Select from one of the publicly available databases

Select the relevant options in the database to view the pathway network and interaction data of the system under consideration

http://www.genome.jp/kegg/

Page 34

Definitions of the components: Part 3 – Metabolic profile databases


1. Biological System: In the biological context, a system is an entity that exists through the mutual interactions between its components.

2. Level of organization: The level of organization describes the complexity of the biological system being studied. Components of one system can be made up of constituent parts, which in turn form another system at a different level of organization. For example, a cell is a system in itself; however, within a larger physiological system, a cell is only one component.

3. Visualization: To explore protein-protein interactions, researchers must interpret lists of protein interaction data, which are usually retrieved as large spreadsheets that make the analysis cumbersome. Mapping such data into a diagram makes it easier for scientists to gain biological insight from the interaction data.

4. Functional annotation: By examining the maps of protein–protein interaction data, researchers can discover new biological relationships between proteins or predict their functions based on specific interactions.

5. Graphical Notation: The first step in the analysis of protein interaction data is the identification of protein complexes and groups of complexes. In a simple graphical notation, a “Node” represents a protein, while an “Edge” represents an interaction between two proteins.

Page 35


6. Pathway: In biology, a pathway is a series of interrelated metabolic reactions that depicts the order in which one entity is converted into another.

7. Meta node: A single node onto which all members of a protein cluster are collapsed. Meta nodes help in interpreting the biological function of the networks collapsed into them.
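The graphical notation (proteins as nodes, interactions as edges) and the meta-node idea from the definitions above can be illustrated with a plain edge set. This is a minimal sketch, assuming undirected interactions stored as pairs; `nodes_of` and `collapse_cluster` are hypothetical helper names, not functions from Cytoscape or any other tool.

```python
def nodes_of(edges):
    """Every protein that appears in an interaction is a node."""
    return {protein for edge in edges for protein in edge}


def collapse_cluster(edges, cluster, meta_name):
    """Collapse all members of `cluster` into one meta node.

    edges: undirected (protein_a, protein_b) interactions.
    Edges inside the cluster (and self-loops) disappear into the meta node,
    edges to outside proteins are redirected to it, and duplicates merge.
    """
    collapsed = set()
    for a, b in edges:
        a2 = meta_name if a in cluster else a
        b2 = meta_name if b in cluster else b
        if a2 != b2:
            # Store undirected edges in a canonical (sorted) order.
            collapsed.add(tuple(sorted((a2, b2))))
    return collapsed
```

After collapsing, the meta node keeps one edge to each outside protein that any cluster member interacted with, so the overall shape of the network stays readable.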


Page 36

Step 1: Pathway Databases – Input

Action Audio Narration


Description of the action

Choose the system

Tabs: ORGANISM, ENZYMES, DISEASE, PATHWAY

PATHWAY: METABOLISM; GENETIC INFORMATION PROCESSING; ENVIRONMENTAL INFORMATION PROCESSING; CELLULAR PROCESSES; ORGANISMAL SYSTEMS

DISEASE: CANCER; IMMUNE SYSTEM DISEASE; NEURODEGENERATIVE DISEASE; CARDIOVASCULAR DISEASE; METABOLIC DISEASES; INFECTIOUS DISEASES

ENZYMES: ENZYME NAME; EC NUMBER; SYNONYMS

ORGANISM: PROKARYOTES; PROTISTS; FUNGI; PLANTS; ANIMALS

Animation of the Input search strategies for Pathway databases

Follow the steps in the animation. Redraw the images. The audio narration must be read as the cursor in the animation moves to the 4 headings of the web page.

The pathway databases are repositories that give a visual insight into the biological interactions of genes and proteins. The general features of these databases include searching by:
1. Pathway: The entire network information in the web-based database can be searched by selecting the pathway category of interest, such as cellular processes, genetic information processing, etc.
2. Disease: Here, all the networks are grouped by the diseases that are caused by their modulation.
3. Enzyme: The enzymes in the pathway database are grouped, and pathways can be searched by giving enzyme information as a query.
4. Organism: Each organism is given a unique identifier. Users can also select an organism and then study the pathway as it occurs in that organism.
http://www.genome.jp/kegg/
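The four search routes above (pathway, disease, enzyme, organism) can also be reached programmatically. The sketch below assumes the public KEGG REST API at rest.kegg.jp, which the slides do not mention; the query terms are only examples, and `kegg_url`, `parse_kegg_list` and `fetch` are hypothetical helper names. Live responses depend on the current database, so nothing here claims specific results.

```python
from urllib.parse import quote
from urllib.request import urlopen

KEGG_REST = "https://rest.kegg.jp"  # assumption: KEGG REST API, not named in the slides


def kegg_url(operation, *args):
    """Build a KEGG REST URL of the form /<operation>/<arg1>/<arg2>..."""
    # Keep ':' (e.g. ec:2.7.1.41) and '+' (KEGG's multi-term separator) unescaped.
    parts = [quote(arg, safe=":+") for arg in args]
    return "/".join([KEGG_REST, operation, *parts])


# One example query per search route described in the narration.
SEARCH_ROUTES = {
    "pathway": kegg_url("find", "pathway", "glycolysis"),
    "disease": kegg_url("find", "disease", "glioma"),
    "enzyme": kegg_url("find", "enzyme", "phosphodismutase"),
    "organism": kegg_url("list", "pathway", "hsa"),  # pathways of Homo sapiens
}


def parse_kegg_list(text):
    """Split KEGG REST output into (identifier, description) pairs, one per line."""
    records = []
    for line in text.strip().splitlines():
        identifier, _, description = line.partition("\t")
        records.append((identifier, description))
    return records


def fetch(url, timeout=30):
    """Download a KEGG REST response as text (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For instance, `parse_kegg_list(fetch(SEARCH_ROUTES["disease"]))` would return identifier and description pairs for disease entries matching "glioma"; keeping URL building separate from fetching makes the routes easy to inspect offline.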


Page 37

Step 2.a - Pathway Databases – Visualization of Pathways for Glioma

Action


Zoomed images. The animator needs to redraw all screenshots, as they have been taken from the reference software. The animator must not copy the image, or any part of it, in the final animation. Display the image. Highlight the “nodes” and “edges” as shown in the animation. The red box zooms to show the area of the network being zoomed into. This is followed by the zoomed image of that part of the network. Each zoomed image is followed by the narration in the order given.

We use pathway databases to study one of the pathways from our Glioma studies in protein interaction networks, namely “Cell Cycle, Cellular Assembly and Organization, DNA Replication, Recombination, and Repair”. Here, we highlight the nodes and edges within the pathway: the nodes are the corresponding genes, and the edges are the interactions between them. Users can also obtain images from such visualization tools for the interactions of a specific gene; in this case, we depict the interactions of TP53 derived from the Glioma studies.
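Listing the interactions of a single gene such as TP53 amounts to filtering the edge list for that node. The sketch below assumes the network is stored in Cytoscape's Simple Interaction Format (SIF, the .sif files mentioned later in the interactivity slides), where each line reads "source relation target [target ...]"; `parse_sif` and `interactions_of` are hypothetical helper names.

```python
def parse_sif(text):
    """Parse Simple Interaction Format (SIF) text into nodes and edges.

    Each line is 'source relation target [target ...]'; a line with a single
    name declares a node without interactions. Tabs, if present, delimit fields.
    """
    nodes, edges = set(), []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        fields = [f.strip() for f in (line.split("\t") if "\t" in line else line.split())]
        source = fields[0]
        nodes.add(source)
        if len(fields) >= 3:
            relation = fields[1]
            for target in fields[2:]:
                nodes.add(target)
                edges.append((source, relation, target))
    return nodes, edges


def interactions_of(edges, gene):
    """(relation, partner) pairs for every edge touching `gene`, sorted."""
    return sorted((rel, b if a == gene else a) for a, rel, b in edges if gene in (a, b))
```

Calling `interactions_of(edges, "TP53")` on a parsed network returns TP53's partners together with the relation type of each edge, which is the information the zoomed TP53 view presents graphically.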

http://www.ingenuity.com/, http://www.cytoscape.org/

Nodes

Edges

Audio Narration

Description of the action

Page 38


http://www.genome.jp/kegg/

Step 2.c - Pathway Databases – Interpretation

ORTHOLOGY ENZYME REACTION

Definition: glucose-1-phosphate phosphodismutase

Pathways it belongs to: Pathway_ID1: Glycolysis / Gluconeogenesis; Pathway_ID2: Starch and sucrose metabolism

Genes involved: Gene_ID123: BSU; Gene_ID273: BLI; Gene_ID987: BLD; Gene_ID789: SPZ

Action

Description of the action

Audio Narration

Options given once you click on a particular entity of Pathway

Pathways can also be obtained for protein interaction networks. In such networks, the metabolites are the “nodes” and the reactions between them are the “edges”. Each node, such as a substrate, reactant or enzyme, is hyperlinked to another page that gives detailed information about that entity. Each element of the pathway, including the pathway itself, is assigned an identifier so that it can be referred to from anywhere in the database. The page also gives all the information related to the molecule or reaction, such as its orthology, the pathways it belongs to and the corresponding gene IDs.
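The identifier scheme described above, where every element has an ID and each entry links to related entries, behaves like a dictionary keyed by identifier. This is a toy sketch, not how any real pathway database is implemented: `Entry`, `PathwayDatabase` and the identifier `Enzyme_ID1` are hypothetical, while the pathway and gene placeholders reuse the IDs shown on the slide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    entry_id: str
    kind: str  # e.g. "enzyme", "pathway", "gene", "compound"
    name: str
    links: list[str] = field(default_factory=list)  # identifiers this entry links to


class PathwayDatabase:
    """Toy identifier-keyed store; every entry is reachable by its ID."""

    def __init__(self):
        self._entries: dict[str, Entry] = {}

    def add(self, entry: Entry) -> None:
        # Identifiers must be unique so that any reference resolves to one entry.
        if entry.entry_id in self._entries:
            raise ValueError(f"duplicate identifier: {entry.entry_id}")
        self._entries[entry.entry_id] = entry

    def get(self, entry_id: str) -> Entry:
        return self._entries[entry_id]

    def follow(self, entry_id: str, kind: Optional[str] = None) -> list[Entry]:
        """Resolve an entry's links (its 'hyperlinks'), optionally by kind."""
        linked = (self._entries.get(t) for t in self.get(entry_id).links)
        return [e for e in linked if e is not None and (kind is None or e.kind == kind)]
```

Adding the enzyme with links to "Pathway_ID1", "Pathway_ID2" and "Gene_ID123" and then calling `follow("Enzyme_ID1", "pathway")` returns the two pathway entries, mirroring the "Pathways it belongs to" panel.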

In each slide, the highlighted tab is ACTIVE. In the animation, the tab should highlight when it is clicked, followed by the content of the slide. The mouse should then move to the second tab and click on it, leaving the first tab inactive and the second tab active. Active and inactive tabs can be distinguished by different colors.

Page 39

Action Audio Narration


Description of the action


http://www.genome.jp/kegg/

Step 2.d - Pathway Databases – Interpretation

ORTHOLOGY ENZYME REACTION

Enzyme Commission Number: 2.7.1.41

Class of Enzyme: Transferases; Transferring phosphorus-containing groups; Phosphotransferases with an alcohol group as acceptor

Substrate: D-glucose 1-phosphate

Products: D-glucose, D-glucose 1,6-bisphosphate

Options given once you click on a particular entity of Pathway

The database also gives all the enzyme-related information for the reaction, such as the enzyme nomenclature, Enzyme Commission (EC) number, enzyme class, substrates and products.
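The "Class of Enzyme" field follows directly from the EC number, because each of its four dot-separated fields narrows the class: main class, subclass, sub-subclass, then a serial number. The sketch below splits an EC number into those levels and names the ones it knows; the small name table holds only the seven top-level EC classes and the two subclasses quoted on the slide, and `ec_levels` and `describe_ec` are hypothetical helper names.

```python
import re

# Top-level EC classes plus the sub-classes quoted on the slide.
EC_NAMES = {
    "1": "Oxidoreductases",
    "2": "Transferases",
    "3": "Hydrolases",
    "4": "Lyases",
    "5": "Isomerases",
    "6": "Ligases",
    "7": "Translocases",
    "2.7": "Transferring phosphorus-containing groups",
    "2.7.1": "Phosphotransferases with an alcohol group as acceptor",
}

# Four dot-separated fields; '-' marks an unspecified level, 'n' a preliminary number.
EC_PATTERN = re.compile(r"^(\d+)\.(\d+|-)\.(\d+|-)\.(n?\d+|-)$")


def ec_levels(ec_number):
    """Return the hierarchy prefixes of an EC number, e.g. '2', '2.7', ..."""
    match = EC_PATTERN.match(ec_number.strip())
    if match is None:
        raise ValueError(f"not an EC number: {ec_number!r}")
    fields = match.groups()
    levels = []
    for depth in range(1, 5):
        if fields[depth - 1] == "-":
            break  # deeper levels are unspecified
        levels.append(".".join(fields[:depth]))
    return levels


def describe_ec(ec_number):
    """Pair each level with its name when this small table knows it ('?' otherwise)."""
    return [(level, EC_NAMES.get(level, "?")) for level in ec_levels(ec_number)]
```

Applied to any EC number under 2.7.1, `describe_ec` returns the three class names quoted on the slide, followed by "?" for the serial number, which this small table does not name.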

In each slide, the highlighted tab is ACTIVE. In the animation, the tab should highlight when it is clicked, followed by the content of the slide. The mouse should then move to the second tab and click on it, leaving the first tab inactive and the second tab active. Active and inactive tabs can be distinguished by different colors.

Page 40

Action Audio Narration


Description of the action


Step 2.e - Pathway Databases – Interpretation

ORTHOLOGY ENZYME REACTION

Metabolic Reaction: Metabolism; Carbohydrate Metabolism; Glycolysis / Gluconeogenesis

Equation: 2 D-Glucose 1-phosphate <=> alpha-D-Glucose 1,6-bisphosphate + alpha-D-Glucose

http://www.genome.jp/dbget-bin/www_bget?R00960+RP00303+RC00078

Options given once you click on a particular entity of Pathway

Redraw the equation. In each slide, the highlighted tab is ACTIVE. In the animation, the tab should highlight when it is clicked, followed by the content of the slide. The mouse should then move to the second tab and click on it, leaving the first tab inactive and the second tab active. Active and inactive tabs can be distinguished by different colors.

The metabolic reaction in which the enzyme is involved is also provided in equation form, along with the structures of the reaction substrates.
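A reaction equation in this form can be checked for atom balance by counting the atoms on each side. The sketch below assumes neutral molecular formulas for the three compounds (C6H13O9P for D-glucose 1-phosphate, C6H14O12P2 for alpha-D-glucose 1,6-bisphosphate, C6H12O6 for alpha-D-glucose) and terms separated by " + "; the helper names are hypothetical, and the simple formula parser does not handle brackets or charges.

```python
import re
from collections import Counter

# Assumed molecular formulas (neutral forms) for the compounds in the equation.
FORMULAS = {
    "D-Glucose 1-phosphate": "C6H13O9P",
    "alpha-D-Glucose 1,6-bisphosphate": "C6H14O12P2",
    "alpha-D-Glucose": "C6H12O6",
}


def atom_counts(formula):
    """Count atoms in a simple formula such as 'C6H13O9P' (no brackets)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_counts(side):
    """Sum atoms over one side; terms must be separated by ' + '."""
    total = Counter()
    for term in side.split(" + "):
        term = term.strip()
        # An optional leading integer followed by whitespace is the coefficient.
        match = re.match(r"(\d+)\s+(.+)", term)
        coefficient, name = (int(match.group(1)), match.group(2)) if match else (1, term)
        for element, count in atom_counts(FORMULAS[name]).items():
            total[element] += coefficient * count
    return total


def is_balanced(equation):
    left, right = equation.split("<=>")
    return side_counts(left) == side_counts(right)
```

For the equation on this slide, both sides add up to C12H26O18P2 (two glucose 1-phosphate molecules on the left; one bisphosphate plus one free glucose on the right), so `is_balanced` returns `True`; dropping the coefficient 2 makes it return `False`.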

Page 41

Interactivity option 1: Step No. 1 - Assignment

Boundary/limits

Interactivity Type

Options

Results


.gal Files

Gene Expression Profile Data

Input

Protein Interaction Data

Input

Metabolic Profile Data

Input

.cel Files

Name of Enzyme

Name of Pathway

List of Protein Identifiers

Name of Disease

.gpr Files

.sif Files

.cdt Files

Drag each yellow button into one of the 3 analysis tools. The correct results are given in the next slide.

Type of Input Data

Type of Analysis Tools

If the user drags a button into the correct box, the animation should flash a “Tick” sign. If the box is incorrect, flash a “Cross” sign and ask the user to “Try Again”.

Drag and Drop.
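The tick/cross feedback rule for this drag-and-drop exercise can be prototyped as a lookup against an answer key. The answer key below is an assumption inferred from the slides (microarray and clustered-data files to gene expression, SIF networks and protein ID lists to protein interaction, enzyme, pathway and disease names to the metabolic profile databases); the slides do not spell it out, and `check_drop` and `all_placed_correctly` are hypothetical names.

```python
# Hypothetical answer key inferred from the slides; verify against the results slide.
ANSWER_KEY = {
    ".gal Files": "Gene Expression Profile Data",
    ".gpr Files": "Gene Expression Profile Data",
    ".cel Files": "Gene Expression Profile Data",
    ".cdt Files": "Gene Expression Profile Data",
    ".sif Files": "Protein Interaction Data",
    "List of Protein Identifiers": "Protein Interaction Data",
    "Name of Enzyme": "Metabolic Profile Data",
    "Name of Pathway": "Metabolic Profile Data",
    "Name of Disease": "Metabolic Profile Data",
}


def check_drop(button, box):
    """Return the feedback the animation should flash for one drop."""
    if button not in ANSWER_KEY:
        raise KeyError(f"unknown button: {button}")
    return "Tick" if ANSWER_KEY[button] == box else "Cross: Try Again"


def all_placed_correctly(placements):
    """placements: dict of button -> box chosen by the user."""
    return set(placements) == set(ANSWER_KEY) and all(
        check_drop(button, box) == "Tick" for button, box in placements.items()
    )
```

Keeping the answer key as data means the animator can correct a mapping without touching the feedback logic.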

Page 42

Interactivity option 1: Step No. 2 - Results

Boundary/limits

Interactivity Type

Options

Results


Gene Expression Profile Data

Input

Protein Interaction Data

Input

Metabolic Profile Data

Input

Name of Enzyme

Name of Pathway

List of Protein Identifiers

Name of Disease

.gal Files

.cel Files

.gpr Files

.cdt Files

.sif Files

Drag each yellow button into one of the 3 analysis tools. The correct results are given in the next slide.

If the user drags a button into the correct box, the animation should flash a “Tick” sign. If the box is incorrect, flash a “Cross” sign and ask the user to “Try Again”.

Drag and Drop.

Page 43

Questionnaire - 1


1. Which of these is not a feature of a protein network?
a. Edges
b. Nodes
c. Metanodes
d. Antinodes

2. What are the results of gene expression analysis?
a. Heat map
b. Fold change
c. P-value
d. All of the above

3. Protein pathways can be studied using:
a. Stand-alone tools
b. Web-based tools
c. Both
d. None

Page 44

Questionnaire - 2


4. Which is a mandatory entry to study protein interaction pathways?
a. Fold change
b. p-value
c. Unique identifier, such as an accession number
d. All of the above

5. In gene expression data analysis, what does a heat map represent?
a. Significance of the gene
b. Fold change
c. p-value
d. Gene Ontology

6. Which of these is a valid microarray file extension?
a. GAL
b. GPR
c. CEL
d. All of the above

Page 45

Links for further reading

Books

Systems Biology: An Approach. P. Kohl, E.J. Crampin, T.A. Quinn and D. Noble.

An Introduction to Systems Biology: Design Principles of Biological Circuits. Uri Alon. Chapman & Hall/CRC, Taylor and Francis Group, June 2006.

Introduction to Systems Biology. Sangdun Choi (California Institute of Technology). Humana Press, July 2007.

Research Papers

Visualizing biological pathways: requirements analysis, systems evaluation, and research agenda. Saraiya, P., North, C. & Duca, K. (2005).

Tools for visually exploring biological networks. Suderman, M. & Hallett, M. (2007).

A survey of visualization tools for biological network analysis. Pavlopoulos, G.A., Wegener, A.L. & Schneider, R. (2008).

Visualization of omics data for systems biology. Gehlenborg, N., O’Donoghue, S.I., Baliga, N.S., Goesmann, A., Hibbs, M.A., Kitano, H., Kohlbacher, O., Neuweger, H., Schneider, R., Tenenbaum, D. & Gavin, A.-C. (2010). Nature Methods.

Page 46

Links for further reading

Webliography

http://www.genome.jp/kegg/

http://www.chem.agilent.com/Library/usermanuals/Public/GeneSpring-manual.pdf

http://www.moleculardevices.com/pages/software/gn_genepix_pro.html

http://www.cytoscape.org/

http://www.ingenuity.com/

http://www.genego.com/metacore.php

http://www.ece.cmu.edu/~brunos/Lecture3.pdf

http://pathways.embl.de/

http://www.biocyc.org/

http://www.arena3d.org/

http://spotfire.tibco.com/

http://www.bioconductor.org/

http://www.chem.agilent.com/en-US/Products/software/lifesciencesinformatics/genespringgx/pages/gp34727.aspx

http://www.cytoscape.org/download.php

Page 47

Links for further reading

The following sources were used for the animations:

http://www.genome.jp/kegg/

Biochemistry by A.L.Lehninger et al., 3rd edition

http://www.ingenuity.com/

http://www.cytoscape.org/

http://www.genome.jp/dbget-bin/www_bget?R00960+RP00303+RC00078

http://www.genego.com/metacore.php

http://www.ece.cmu.edu/~brunos/Lecture3.pdf

http://pathways.embl.de/

http://www.chem.agilent.com/Library/usermanuals/Public/GeneSpring-manual.pdf

http://www.moleculardevices.com/pages/software/gn_genepix_pro.html